
Fix multiple critical bugs in Transformer implementation #1

Merged
SauersML merged 1 commit into main from fix-transformer-bugs
Jul 29, 2025
Conversation

@SauersML
Owner

This commit fixes six critical bugs in the Transformer model implementation that prevented it from training correctly.

The fixes include:

1.  **Backpropagation Design:** The backpropagation pass was fundamentally flawed as it lacked the necessary input activations. The `backward` methods for all layers have been refactored to accept the forward pass inputs, and the training loop now caches these activations.

2.  **Gradient Calculations:** The gradient calculation logic in `LayerNorm::backward` and `MultiHeadAttention::backward` was mathematically incorrect. These functions have been rewritten with the correct backpropagation implementations.

3.  **Unstable Tokenizer:** The tokenizer was rebuilding its vocabulary on every call, making the token-to-word mapping unstable. The vocabulary building is now a separate, one-time step (`build_vocab`), and the `tokenize` function only uses the frozen vocabulary.

4.  **Deterministic Sampling:** The random number generator was re-initialized with a fixed seed in the sampling functions, making predictions deterministic. The RNG is now created once and passed as a mutable reference to ensure stateful, non-deterministic sampling.

5.  **Identical Embeddings:** The embedding vectors were all identical due to a bug in the initialization loop. This had already been corrected in the provided code, so the existing fix was verified rather than changed.

6.  **Unused Custom `exp`:** The `softmax` function was using the standard library's `.exp()` method instead of the custom `exp` function. The call has been corrected.
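A minimal sketch of the refactored backward contract from fix 1, using a plain dense layer: the training loop caches the forward input and hands it back to `backward`. The `Linear` type and field names here are illustrative, not the repository's actual API.

```rust
// Minimal dense layer illustrating the "caller supplies cached activations" design.
struct Linear {
    w: Vec<Vec<f32>>,      // weights, indexed [out][in]
    grad_w: Vec<Vec<f32>>, // accumulated weight gradients, same shape as w
}

impl Linear {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.w
            .iter()
            .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum())
            .collect()
    }

    // `input` is the activation the training loop cached during the forward
    // pass; without it the weight gradient dL/dW = grad_out * input cannot
    // be formed, which was the original design flaw.
    fn backward(&mut self, input: &[f32], grad_out: &[f32]) -> Vec<f32> {
        let mut grad_in = vec![0.0; input.len()];
        for (o, &g) in grad_out.iter().enumerate() {
            for (i, &xi) in input.iter().enumerate() {
                self.grad_w[o][i] += g * xi;    // dL/dW = g * x
                grad_in[i] += g * self.w[o][i]; // dL/dx = W^T g
            }
        }
        grad_in
    }
}
```

The same pattern extends to every layer: forward returns its output, the loop keeps the input alive, and backward takes both the cached input and the incoming gradient.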
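For fix 2, the standard LayerNorm input gradient (shown here for one row, without the affine scale/shift, as a sketch of the corrected math rather than the repo's exact signature) is dx = (1 / (n·std)) · (n·dy − Σdy − x̂·Σ(dy·x̂)):

```rust
// LayerNorm backward for a single row; `eps` matches the forward pass.
fn layer_norm_backward(x: &[f32], grad_out: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    let x_hat: Vec<f32> = x.iter().map(|v| (v - mean) * inv_std).collect();

    let sum_g: f32 = grad_out.iter().sum();
    let sum_g_xhat: f32 = grad_out.iter().zip(&x_hat).map(|(g, h)| g * h).sum();

    // dx = inv_std / n * (n * dy - sum(dy) - x_hat * sum(dy * x_hat))
    x_hat
        .iter()
        .zip(grad_out)
        .map(|(h, g)| inv_std / n * (n * g - sum_g - h * sum_g_xhat))
        .collect()
}
```

Two quick sanity properties of the correct formula: the returned gradient sums to ~0 (shifting all inputs by a constant leaves the normalized output unchanged), and it is ~orthogonal to x̂ (uniformly scaling the inputs is likewise a no-op).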
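The frozen-vocabulary design from fix 3 can be sketched like this: `build_vocab` runs once and assigns ids, and `tokenize` only reads the map, so the same word always gets the same id. Struct and method names mirror the description above but are assumptions about the exact API.

```rust
use std::collections::HashMap;

struct Tokenizer {
    vocab: HashMap<String, usize>,
}

impl Tokenizer {
    // One-time vocabulary construction: ids are assigned in first-seen order
    // and never change afterwards.
    fn build_vocab(corpus: &str) -> Self {
        let mut vocab = HashMap::new();
        for word in corpus.split_whitespace() {
            let next_id = vocab.len();
            vocab.entry(word.to_string()).or_insert(next_id);
        }
        Tokenizer { vocab }
    }

    // Read-only lookup: unknown words map to None instead of silently
    // growing (and thereby destabilizing) the vocabulary.
    fn tokenize(&self, text: &str) -> Vec<Option<usize>> {
        text.split_whitespace()
            .map(|w| self.vocab.get(w).copied())
            .collect()
    }
}
```

Because `tokenize` takes `&self` rather than `&mut self`, the compiler itself now guarantees the token-to-word mapping cannot drift between calls.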
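Fix 4's shape, sketched with a tiny xorshift generator standing in for whatever RNG the project actually uses: the RNG is created once by the caller and threaded through as `&mut`, so its state advances across calls instead of being reset to a fixed seed inside each sampler.

```rust
// Minimal stand-in RNG; the point is the &mut-threading, not the generator.
struct XorShift(u64);

impl XorShift {
    fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Take the top 24 bits to get a float in [0, 1).
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }
}

// Multinomial sampling over a probability vector. The RNG comes in by
// mutable reference; it is NOT constructed (or re-seeded) in here.
fn sample(probs: &[f32], rng: &mut XorShift) -> usize {
    let mut r = rng.next_f32();
    for (i, &p) in probs.iter().enumerate() {
        if r < p {
            return i;
        }
        r -= p;
    }
    probs.len() - 1
}
```

Passing `&mut XorShift` down the call chain is also what surfaced the borrow-checker issues mentioned below: only one mutable borrow of the RNG may be live at a time.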
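And for fix 6, a softmax routed through a custom `exp`. The repository has its own `exp`; here a simple (1 + x/2¹⁰)^(2¹⁰) approximation stands in so the sketch is self-contained.

```rust
// Stand-in for the project's custom exp: e^x ~ (1 + x/1024)^1024,
// computed by squaring ten times. Illustrative, not the repo's version.
fn exp(x: f32) -> f32 {
    let mut y = 1.0 + x / 1024.0;
    for _ in 0..10 {
        y *= y;
    }
    y
}

fn softmax(logits: &[f32]) -> Vec<f32> {
    // Subtract the max for numerical stability, then call the custom exp
    // (the bug was calling the stdlib .exp() method here instead).
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&v| exp(v - max)).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```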

Additionally, compilation errors related to file paths and borrow checking that arose from the fixes have been resolved.
@SauersML SauersML merged commit cadb5eb into main Jul 29, 2025
1 check passed
@SauersML SauersML deleted the fix-transformer-bugs branch July 29, 2025 23:01