Fix multiple critical bugs in Transformer implementation #1
Merged
This commit fixes six critical bugs in the Transformer model implementation that prevented it from training correctly. The fixes include:

1. **Backpropagation Design:** The backpropagation pass was fundamentally flawed, as it lacked the necessary input activations. The `backward` methods for all layers have been refactored to accept the forward-pass inputs, and the training loop now caches these activations.
2. **Gradient Calculations:** The gradient calculation logic in `LayerNorm::backward` and `MultiHeadAttention::backward` was mathematically incorrect. These functions have been rewritten with correct backpropagation implementations.
3. **Unstable Tokenizer:** The tokenizer was rebuilding its vocabulary on every call, making the token-to-word mapping unstable. Vocabulary building is now a separate, one-time step (`build_vocab`), and the `tokenize` function uses only the frozen vocabulary.
4. **Deterministic Sampling:** The random number generator was re-initialized with a fixed seed in the sampling functions, making predictions deterministic. The RNG is now created once and passed as a mutable reference to ensure stateful, non-deterministic sampling.
5. **Identical Embeddings:** The embedding vectors were identical due to a bug in the initialization loop. This was already fixed in the provided code, so the fix was verified.
6. **Unused Custom `exp`:** The `softmax` function was using the standard library's `.exp()` method instead of the custom `exp` function. The call has been corrected.

Additionally, compilation errors related to file paths and borrow checking that arose from the fixes have been resolved.
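The activation-caching pattern behind the backpropagation fix can be sketched as follows. This is a minimal illustration using a hypothetical scalar `Linear` layer, not the PR's actual code: the caller keeps the forward input around and passes it back to `backward`, which needs it to compute the weight gradient.

```rust
// Hypothetical scalar layer: y = w * x.
struct Linear {
    w: f64,
}

impl Linear {
    fn forward(&self, x: f64) -> f64 {
        self.w * x
    }

    // `backward` takes the cached forward input: without `cached_x` the
    // weight gradient d(loss)/dw = grad_out * x cannot be computed.
    // Returns (grad_w, grad_x).
    fn backward(&self, grad_out: f64, cached_x: f64) -> (f64, f64) {
        (grad_out * cached_x, grad_out * self.w)
    }
}

fn main() {
    let layer = Linear { w: 3.0 };
    let x = 2.0;
    let y = layer.forward(x); // training loop caches x alongside y
    let (grad_w, grad_x) = layer.backward(1.0, x);
    println!("{y} {grad_w} {grad_x}"); // 6 2 3
}
```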
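A correct `LayerNorm` backward follows the standard derivation: with normalized values x̂ and standard deviation σ, the input gradient is dx = (dy − mean(dy) − x̂ · mean(dy ⊙ x̂)) / σ. The sketch below (no affine parameters, not the PR's exact code) implements that formula and verifies it against a central finite difference:

```rust
// Forward: return the normalized vector and the standard deviation.
fn layer_norm_forward(x: &[f64], eps: f64) -> (Vec<f64>, f64) {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let sigma = (var + eps).sqrt();
    let xhat: Vec<f64> = x.iter().map(|v| (v - mean) / sigma).collect();
    (xhat, sigma)
}

// Backward: dx_i = (dy_i - mean(dy) - xhat_i * mean(dy .* xhat)) / sigma.
fn layer_norm_backward(dy: &[f64], xhat: &[f64], sigma: f64) -> Vec<f64> {
    let n = dy.len() as f64;
    let mean_dy = dy.iter().sum::<f64>() / n;
    let mean_dy_xhat = dy.iter().zip(xhat).map(|(a, b)| a * b).sum::<f64>() / n;
    dy.iter()
        .zip(xhat)
        .map(|(d, h)| (d - mean_dy - h * mean_dy_xhat) / sigma)
        .collect()
}

fn main() {
    let x = [1.0, 2.0, 4.0, 7.0];
    let dy = [0.3, -0.5, 0.2, 0.1];
    let eps = 1e-5;
    let (xhat, sigma) = layer_norm_forward(&x, eps);
    let dx = layer_norm_backward(&dy, &xhat, sigma);

    // Check against a central finite difference of L = sum(dy_i * y_i).
    let loss = |x: &[f64]| -> f64 {
        let (y, _) = layer_norm_forward(x, eps);
        y.iter().zip(dy.iter()).map(|(a, b)| a * b).sum()
    };
    let h = 1e-6;
    for i in 0..x.len() {
        let mut xp = x;
        xp[i] += h;
        let mut xm = x;
        xm[i] -= h;
        let numeric = (loss(&xp) - loss(&xm)) / (2.0 * h);
        assert!((numeric - dx[i]).abs() < 1e-4);
    }
    println!("layer-norm backward matches finite differences");
}
```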
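The tokenizer split might look roughly like this (hypothetical struct and signatures, not the repository's actual code): building the vocabulary is a one-time constructor, and `tokenize` only reads the frozen map, so repeated calls always produce the same ids.

```rust
use std::collections::HashMap;

struct Tokenizer {
    vocab: HashMap<String, usize>,
}

impl Tokenizer {
    // One-time step: assign each new word the next free id.
    fn build_vocab(corpus: &str) -> Self {
        let mut vocab = HashMap::new();
        for word in corpus.split_whitespace() {
            let next = vocab.len();
            vocab.entry(word.to_string()).or_insert(next);
        }
        Tokenizer { vocab }
    }

    // Uses only the frozen vocabulary; unknown words map to None here.
    fn tokenize(&self, text: &str) -> Vec<Option<usize>> {
        text.split_whitespace()
            .map(|w| self.vocab.get(w).copied())
            .collect()
    }
}

fn main() {
    let tok = Tokenizer::build_vocab("the cat sat on the mat");
    // Repeated calls yield identical ids for the same words.
    assert_eq!(tok.tokenize("the cat"), tok.tokenize("the cat"));
    println!("{:?}", tok.tokenize("the cat dog")); // [Some(0), Some(1), None]
}
```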
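The sampling fix is the pattern "create the RNG once, thread it through as `&mut`". A dependency-free sketch with a tiny xorshift generator standing in for whatever RNG the project actually uses: re-seeding inside the sampler (the old bug) pins it to one value, while a shared mutable reference keeps it stateful.

```rust
// Minimal xorshift64 generator; stand-in for the real RNG.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Buggy pattern: re-seeds on every call, so it always returns the same token.
fn sample_reseeded(n: usize) -> usize {
    let mut rng = XorShift(42);
    (rng.next() % n as u64) as usize
}

// Fixed pattern: the caller owns the RNG and passes it as a mutable reference.
fn sample(rng: &mut XorShift, n: usize) -> usize {
    (rng.next() % n as u64) as usize
}

fn main() {
    // The re-seeded sampler is stuck on a single value...
    assert_eq!(sample_reseeded(1000), sample_reseeded(1000));
    // ...while a single RNG threaded through calls keeps advancing.
    let mut rng = XorShift(42);
    assert_ne!(rng.next(), rng.next());
    let (a, b) = (sample(&mut rng, 1000), sample(&mut rng, 1000));
    println!("samples: {a} {b}");
}
```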
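For the `softmax`/`exp` fix, a softmax that routes through a custom `exp` might look like the sketch below. `exp_approx` here is a hypothetical truncated Taylor series standing in for the project's custom `exp`; shifting by the max keeps its arguments small and non-positive, where the series is accurate.

```rust
// Hypothetical custom exp: truncated Taylor series e^x = sum x^k / k!.
fn exp_approx(x: f64) -> f64 {
    let mut term = 1.0;
    let mut sum = 1.0;
    for k in 1..20 {
        term *= x / k as f64;
        sum += term;
    }
    sum
}

// Numerically stable softmax that calls the custom exp, not f64::exp.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&v| exp_approx(v - max)).collect();
    let z: f64 = exps.iter().sum();
    exps.iter().map(|e| e / z).collect()
}

fn main() {
    let p = softmax(&[1.0, 2.0, 3.0]);
    // Probabilities sum to one and preserve the logit ordering.
    assert!((p.iter().sum::<f64>() - 1.0).abs() < 1e-9);
    assert!(p[2] > p[1] && p[1] > p[0]);
    println!("{p:?}");
}
```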