[bugfix] fix glm5.2 indexer_type 'shared' #134
Code Review
This pull request updates the GPT bridge to conditionally handle the indexer state dict update when using the 'dsa' experimental attention variant by checking and reducing the indexer's presence across the pipeline parallel group. Feedback suggests using defensive programming with getattr to avoid potential AttributeError exceptions when accessing core_attention and indexer on mg_attn.
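The getattr suggestion can be sketched roughly as follows. This is only an illustration: `find_indexer` is a hypothetical helper name, and the stand-in objects are built with `SimpleNamespace` rather than the real attention modules. Only the attribute names `core_attention` and `indexer` come from the review.

```python
from types import SimpleNamespace


def find_indexer(mg_attn):
    """Return mg_attn.core_attention.indexer, or None if either attribute is absent.

    Hypothetical helper: chained getattr with a None default avoids an
    AttributeError when a layer has no core_attention or no indexer.
    """
    core_attention = getattr(mg_attn, "core_attention", None)
    # getattr on None with a default simply returns the default.
    return getattr(core_attention, "indexer", None)


# Stand-in objects for illustration only (not the real modules).
with_indexer = SimpleNamespace(core_attention=SimpleNamespace(indexer="indexer-module"))
without_indexer = SimpleNamespace(core_attention=SimpleNamespace())

has_indexer = find_indexer(with_indexer) is not None
lacks_indexer = find_indexer(without_indexer) is None
```

The resulting presence flag is the kind of per-stage value that could then be reduced across the pipeline parallel group, as the summary describes.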
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
/gemini review
Code Review
This pull request introduces support for GLM 5.2 with a shared DSA (DeepSeek Sparse Attention) indexer by adding the glm_moe_dsa model implementation. This includes custom attention, GPT model, transformer block, and loader classes that manage shared indexer state across pipeline parallel stages, as well as bridge updates for handling indexer state. The feedback suggests two changes: retrieve indexer_types defensively to prevent a potential AttributeError, and pass an explicit dimension to squeeze() on the attention mask so that the batch dimension is not squeezed away when the batch size is 1.
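A minimal sketch of the defensive indexer_types lookup, assuming the config may lack `hf_config` or that `hf_config` may lack `indexer_types`. The helper name `get_indexer_types`, the `default` parameter, and the sample values are hypothetical and not part of this PR.

```python
from types import SimpleNamespace


def get_indexer_types(config, default=None):
    """Return config.hf_config.indexer_types, or `default` when it is not defined.

    Hypothetical helper: older configs without the field fall back to
    `default` instead of raising AttributeError.
    """
    hf_config = getattr(config, "hf_config", None)
    return getattr(hf_config, "indexer_types", default)


# Stand-in configs for illustration; the values are made up.
new_style = SimpleNamespace(hf_config=SimpleNamespace(indexer_types=["full", "shared"]))
old_style = SimpleNamespace(hf_config=SimpleNamespace())

types_new = get_indexer_types(new_style)
types_old = get_indexer_types(old_style, default=[])
```

Callers would still need to decide what an absent field means for the model; the fallback value here is only a placeholder.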
    def __init__(self, config, submodules, layer_number, *args, **kwargs):
        super().__init__(config, submodules, layer_number, *args, **kwargs)
        indexer_types = config.hf_config.indexer_types
    else:
        b = query.size(1)
        assert attention_mask.shape == (b, 1, sq, skv)
        mask = attention_mask.squeeze()
Calling attention_mask.squeeze() without a dimension is risky: when the batch size b is 1, it removes the batch dimension as well, producing a tensor of shape (sq, skv) instead of (1, sq, skv). Squeezing only dimension 1 (the singleton dimension for attention heads/groups) keeps the shape consistent regardless of the batch size.
Suggested change
-         mask = attention_mask.squeeze()
+         mask = attention_mask.squeeze(1)
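To make the shape difference concrete, here is a small pure-Python model of the shape rule. It mirrors the documented behavior of `torch.Tensor.squeeze` without importing PyTorch. The function `squeezed_shape` and the sizes `sq = 4`, `skv = 6` are hypothetical, chosen only for illustration.

```python
from typing import Optional, Tuple


def squeezed_shape(shape: Tuple[int, ...], dim: Optional[int] = None) -> Tuple[int, ...]:
    """Model the result shape of squeeze() and squeeze(dim)."""
    if dim is None:
        # No dim given: every size-1 dimension is dropped.
        return tuple(s for s in shape if s != 1)
    dim = dim % len(shape)  # allow negative dims
    if shape[dim] != 1:
        # The requested dimension is not a singleton, so the shape is unchanged.
        return shape
    return shape[:dim] + shape[dim + 1:]


sq, skv = 4, 6  # illustrative sequence lengths

# Batch size 2: both calls agree.
batch2_all = squeezed_shape((2, 1, sq, skv))      # (2, 4, 6)
batch2_dim1 = squeezed_shape((2, 1, sq, skv), 1)  # (2, 4, 6)

# Batch size 1: squeeze() also drops the batch dimension.
batch1_all = squeezed_shape((1, 1, sq, skv))      # (4, 6)
batch1_dim1 = squeezed_shape((1, 1, sq, skv), 1)  # (1, 4, 6)
```

Only the batch-size-1 case diverges, which is why this kind of bug tends to surface late, for example during single-sample inference.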
    def get_transformer_layer_spec(self, vp_stage: Optional[int] = None):
        transformer_layer_spec = super().get_transformer_layer_spec(vp_stage)

        indexer_types = self.config.hf_config.indexer_types
No description provided.