Hello InterPLM team,
First of all, thank you for this amazing work! The ability to interpret PLM features via SAEs is incredibly useful.
I am currently using the pre-trained ESM-2-650M SAEs (specifically Layer 33) for my research. I can load the SAE weights and extract features successfully, but the biological concept annotations (the mapping from feature IDs to biological concepts such as Swiss-Prot keywords, domains, etc.) appear to be available and visualized on the dashboard mainly for the ESM-2-8M model.
Since your paper highlights that the 650M model captures significantly more, and richer, biological concepts than the 8M model (especially in deeper layers), access to the pre-computed annotations for the 650M model would be extremely valuable.
Could you please provide or point me to where I can download the feature-to-concept annotation files for the ESM-2-650M model (especially Layer 33)?
Having these annotations would allow us to better interpret the discriminative features we've identified in our downstream tasks without running the full annotation pipeline from scratch.
Thank you for your time and help!