Currently not working with Gemma 2 models

I tried to run this with Gemma 2 27b it and found that it does not work. I verified that the same setup works with qwen/qwen-1_8b-chat.

I get this error message:

```
Assertion error: All scores have been filtered out
```

The KL scores also seem very large (>10).

I tried to find the cause but have not found a solution so far.

However, I did verify that the chat template is applied correctly, and I could sample text from the model normally when I placed a breakpoint in `get_mean_activations`, the function that measures the activations.

What seemed odd was that the `mean_diff` of activations between harmful and harmless prompts was quite large, often between -200 and +200. For comparison, the mean diff for Qwen was roughly -2 to 2. So possibly there is an issue with the hooks?
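One way to tell whether a large `mean_diff` points to a hook problem or just to a larger overall activation scale is to divide its norm by the typical norm of the activations it was computed from. The sketch below is plain Python over nested lists and is not taken from the repository; `mean_vector`, `l2_norm`, and `relative_mean_diff` are hypothetical helper names, and the inputs stand in for per-prompt activation vectors at one layer and position.

```python
import math


def mean_vector(rows):
    """Element-wise mean over a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def l2_norm(vec):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vec))


def relative_mean_diff(harmful, harmless):
    """Return (mean_diff, ratio), where ratio is the norm of mean_diff
    divided by the average norm of all individual activation vectors."""
    diff = [a - b for a, b in zip(mean_vector(harmful), mean_vector(harmless))]
    all_rows = harmful + harmless
    typical_norm = sum(l2_norm(r) for r in all_rows) / len(all_rows)
    return diff, l2_norm(diff) / typical_norm


# Toy data (made up for illustration): the second pair is the first scaled by 100.
small = relative_mean_diff([[3.0, 4.0], [3.0, 4.0]], [[0.0, 5.0], [0.0, 5.0]])
large = relative_mean_diff([[300.0, 400.0], [300.0, 400.0]], [[0.0, 500.0], [0.0, 500.0]])
print(small[1], large[1])  # the two ratios agree up to floating-point error
```

If the ratio for Gemma 2 is in the same range as for Qwen, the raw -200 to +200 values may simply reflect a larger residual-stream scale; a ratio that differs by orders of magnitude would be a stronger hint that the hooks capture the wrong tensor.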

The current GemmaModel is designed for Gemma 1 models. It seems the only architectural change in Gemma 2 is an added RMS norm before and after the MLP. I am not familiar with the details of the Gemma2RMSNorm implementation.
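For reference, a minimal plain-Python sketch of a Gemma-style RMS norm follows. It assumes the behaviour of the Hugging Face `transformers` implementation, where the learned weight is applied as `(1 + weight)` rather than `weight`; this should be checked against the installed version. The function name `gemma_style_rms_norm` and the default `eps` are illustrative, not the library's API.

```python
import math


def gemma_style_rms_norm(x, weight, eps=1e-6):
    """Normalize x by its root mean square, then scale by (1 + weight).

    Assumption: mirrors the (1 + weight) scaling used by Gemma's RMS norm
    in Hugging Face transformers; verify against the actual source.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


# With zero weights the output has unit root mean square (up to eps).
out = gemma_style_rms_norm([3.0, 4.0], [0.0, 0.0], eps=0.0)
print(out)
```

Because the output depends only on the direction of `x` (up to `eps`), a hook placed before such a norm can see values at a very different scale than a hook placed after it, so which side of the norm the hook reads from matters when comparing activation magnitudes.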


Currently not working with Gemma 2 models #4
