Conversation

allenschmaltz

I believe these two papers and two blog posts will be of interest to LLM interpretability researchers, as they cover a rather different set of approaches to interpretability for models with non-identifiable parameters (e.g., LLMs), and they work well in practice. At a high level, these can be characterized as "uncertainty-aware interpretability-by-exemplar" methods.

Feel free to include them or not, at your discretion. (I also fixed a minor typo.)
