I tried to run this with Gemma 2 27b it and found that it doesn't quite work. I verified that everything works with qwen/qwen-1_8b-chat.
I get this error message:
Assertion error: All scores have been filtered out
The KL scores also seem to be very large (>10).
I tried to track down the cause but have not found a solution so far.
However, I did verify that the chat template works correctly. I could also sample text from the model normally when I placed a breakpoint in the function get_mean_activations, which measures the activations.
What seemed odd was that the mean_diff of activations between harmful and harmless prompts was quite large, often between -200 and +200. In comparison, the mean_diff for Qwen was more like -2 to 2. So possibly there is an issue with the hooks?
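One way to tell whether the large mean_diff points to a hook problem or just to a larger overall activation scale is to compare it against the typical activation norm. The sketch below is plain Python on lists of floats; the function names are hypothetical and not part of this repo, and the inputs stand in for the per-prompt activations collected in get_mean_activations. If the ratio for Gemma 2 is similar to the ratio for Qwen, the raw -200 to +200 range may simply reflect bigger activations rather than broken hooks.

```python
import math

def mean_vector(vectors):
    # Element-wise mean over a list of equal-length activation vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def relative_mean_diff(harmful_acts, harmless_acts):
    # Hypothetical helper: norm of the mean difference divided by the
    # average activation norm, so the result is independent of overall scale.
    mean_harmful = mean_vector(harmful_acts)
    mean_harmless = mean_vector(harmless_acts)
    mean_diff = [a - b for a, b in zip(mean_harmful, mean_harmless)]
    all_acts = harmful_acts + harmless_acts
    typical_norm = sum(l2_norm(v) for v in all_acts) / len(all_acts)
    return l2_norm(mean_diff) / typical_norm
```

Comparing this ratio between the two models at the same relative layer depth should be more informative than comparing the raw mean_diff values.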
The current GemmaModel is designed for Gemma 1 models. As far as I can tell, the only architectural change in Gemma 2 is an additional RMS norm before and after the MLP. I am not familiar with the details of the Gemma2RMSNorm implementation.
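For reference, here is a minimal sketch of what an RMS norm of this kind computes, in plain Python. It assumes the Gemma-style convention of scaling by (1 + weight) with the weight initialized to zero, which is my understanding of Gemma2RMSNorm in Hugging Face transformers; that detail, the function name, and the eps value should be checked against the actual source.

```python
import math

def gemma_style_rms_norm(x, weight, eps=1e-6):
    # Normalize x by its root mean square, then scale by (1 + weight).
    # Assumption: Gemma-style norms use (1 + weight) rather than weight,
    # so an all-zero weight leaves the normalized vector unchanged.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]
```

Because the output scale does not depend on the input scale, activations read from a hook point before such a norm can be much larger than those read after it. That makes the hook position relative to the extra norms worth checking.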