Here are my suggestions:
- Change the Block indicator into 12 block buttons, so it is clear to the user/viewer that these blocks run in series.
- Move the Attention Head switch to the Y axis, so it is clear to the user/viewer that Attention Head is a DIFFERENT dimension from Block. Also highlight the active Head.
- The UI should be fixed: Positional Encoding should NOT be part of the embedding. It should be part of the Attention Block, because each Attention Block applies its own RoPE.
- Show more of the identical Blocks. There should be another, similar indicator at the beginning of the residual stream, so it is clear to the user that embedding is DONE only once.
- The start and end of the residual stream should have the same color.
- Also, modify the GPT-2 model so that the probability section can show the decoded token for different positions at the end of the residual stream. (This will take more effort.)
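
To make the last point concrete, here is a rough sketch of what "decode every position" could mean. It is only an illustration under my own assumptions, not the repo's code: `decode_all_positions`, `hidden`, `unembed`, and `vocab` are hypothetical names, the data is a toy example, and the final layer norm that GPT-2 applies before the output projection is left out for brevity.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_all_positions(hidden_states, unembed, vocab):
    """Return the top (token, probability) pair for every position.

    hidden_states: T vectors of length d (end of the residual stream)
    unembed:       V vectors of length d (one row per vocabulary token)
    vocab:         V token strings
    All three are hypothetical inputs used only for this sketch.
    """
    results = []
    for h in hidden_states:
        # Project this position's hidden vector onto every vocabulary row.
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((vocab[best], probs[best]))
    return results


# Toy data: 3 positions, hidden size 2, vocabulary of 3 tokens.
hidden = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
unembed = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
vocab = ["a", "b", "c"]

top = decode_all_positions(hidden, unembed, vocab)
for position, (token, prob) in enumerate(top):
    print(position, token, round(prob, 3))
```

The UI currently only needs the last row of this result; showing the other rows would let the viewer pick any position and see its predicted next token.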
Thanks for your great work. I love this repo.
