feat: refactored online model to use t5 architecture #4

Merged
imabeastdrew merged 3 commits into main from decoder-transformer-refactor
Dec 4, 2025

@schennam714 (Collaborator) commented Dec 4, 2025

Summary

Replaces the custom vanilla decoder-only transformer in the online model with HuggingFace's T5Stack.

Changes

Architecture (src/musicagent/models/online.py)

Removed:

  • RMSNorm, RelativePositionBias, SelfAttentionWithRelPos, DecoderBlock — custom vanilla components

Added:

  • T5Stack configured as decoder-only (is_decoder=True, is_encoder_decoder=False)
  • Explicit attention_mask parameter in forward() for correct batched generation
  • Cumulative attention mask tracking in generate() to handle variable-length padded batches
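The cumulative mask bookkeeping in generate() can be illustrated with a minimal sketch. It uses plain Python lists instead of tensors, and the helper name `extend_attention_mask` and the left-padded prompt layout are assumptions for illustration, not code taken from the PR.

```python
from typing import List


def extend_attention_mask(mask: List[List[int]]) -> List[List[int]]:
    """Append a 1 to every row: a token generated this step is never padding."""
    return [row + [1] for row in mask]


# Two prompts of length 3 and 1, left-padded to length 3 (0 = padding).
# Left padding is an assumption; the PR does not state the padding side.
mask = [
    [1, 1, 1],
    [0, 0, 1],
]

for _ in range(2):  # two generation steps
    mask = extend_attention_mask(mask)

print(mask)  # [[1, 1, 1, 1, 1], [0, 0, 1, 1, 1]]
```

Because the original padding positions stay 0 at every step, the shorter sequence never attends to its pad tokens, which is the property a variable-length batch test would check.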

Preserved:

  • Separate melody/chord embeddings with is_melody mask
  • Interleaved sequence format [SOS, y₁, x₁, y₂, x₂, ...]
  • Public API: forward(), generate(), enable_gradient_checkpointing()
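As a rough illustration of the preserved sequence format, the sketch below builds [SOS, y₁, x₁, y₂, x₂, ...] together with a parallel is_melody mask. It assumes that y denotes chord tokens and x denotes melody tokens, and that SOS is not a melody position; the PR does not state either point, and the function name `interleave` is hypothetical.

```python
from typing import List, Tuple


def interleave(
    melody: List[int], chords: List[int], sos: int
) -> Tuple[List[int], List[bool]]:
    """Build [SOS, y1, x1, y2, x2, ...] plus a parallel is_melody mask."""
    if len(melody) != len(chords):
        raise ValueError("melody and chords must have the same number of frames")
    tokens = [sos]
    is_melody = [False]  # assumption: SOS is not treated as a melody token
    for y, x in zip(chords, melody):
        tokens += [y, x]            # chord first, then melody (assumed order)
        is_melody += [False, True]  # True selects the melody embedding
    return tokens, is_melody


tokens, is_melody = interleave(melody=[60, 62, 64], chords=[1, 5, 6], sos=0)
print(tokens)     # [0, 1, 60, 5, 62, 6, 64]
print(is_melody)  # [False, False, True, False, True, False, True]
```

A mask like this lets a single forward pass route each position to the matching embedding table while the stack itself sees one flat sequence.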

Tests (tests/test_model.py)

  • Replaced test_online_causal_mask with test_online_causal_behavior (tests observable behavior, not internal implementation)
  • Added test_online_generate_with_variable_length_batch (verifies attention mask fix)
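One way to test causality through observable behavior, rather than by inspecting a mask, is to perturb a later position and check that earlier outputs do not change. The sketch below shows the idea on toy functions; `is_causal`, `running_sum`, and `global_mean` are hypothetical stand-ins and not the actual test code from tests/test_model.py.

```python
from typing import Callable, List, Sequence


def is_causal(
    model: Callable[[Sequence[float]], List[float]],
    seq: Sequence[float],
    t: int,
    new_value: float,
) -> bool:
    """Return True if changing seq[t] leaves the outputs at positions < t unchanged."""
    perturbed = list(seq)
    perturbed[t] = new_value
    return model(seq)[:t] == model(perturbed)[:t]


def running_sum(seq: Sequence[float]) -> List[float]:
    """Causal toy model: output i depends only on inputs 0..i."""
    out, total = [], 0.0
    for value in seq:
        total += value
        out.append(total)
    return out


def global_mean(seq: Sequence[float]) -> List[float]:
    """Non-causal toy model: every output depends on the whole sequence."""
    mean = sum(seq) / len(seq)
    return [mean] * len(seq)


sequence = [1.0, 2.0, 3.0, 4.0]
print(is_causal(running_sum, sequence, t=2, new_value=99.0))  # True
print(is_causal(global_mean, sequence, t=2, new_value=99.0))  # False
```

For a real model the exact equality would typically be replaced by a tolerance-based comparison of logits, with dropout disabled.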

Breaking Changes

  • Not checkpoint-compatible with vanilla model weights (fc_out now has bias=False to match T5 convention)

Notes

  • Default FFN is ReLU (same as vanilla). Set feed_forward_proj="gated-gelu" in T5Config for T5 1.1 gated variant.
  • Cross-attention layers exist in T5Stack but are skipped (no encoder hidden states provided)
  • All 55 tests pass
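For reference, a decoder-only T5Stack along the lines described above might be configured as in the following sketch. All sizes are placeholder values rather than the ones used in this PR, and the snippet assumes a transformers release in which T5Stack is importable from transformers.models.t5.modeling_t5 and can be constructed from the config alone.

```python
from transformers import T5Config
from transformers.models.t5.modeling_t5 import T5Stack

# Placeholder hyperparameters for illustration only (not the PR's values).
config = T5Config(
    d_model=256,
    d_kv=32,
    d_ff=1024,
    num_layers=4,
    num_heads=8,
    feed_forward_proj="relu",   # default; use "gated-gelu" for the T5 1.1 gated variant
    is_decoder=True,            # causal self-attention
    is_encoder_decoder=False,   # no encoder is paired with this stack
    use_cache=False,
)

# No shared token embedding is passed here, on the assumption that the model
# computes its own melody/chord embeddings and feeds them in as inputs_embeds.
stack = T5Stack(config)
```

With this setup the cross-attention sublayers are still instantiated, which is consistent with the note above that they exist but are skipped when no encoder hidden states are supplied.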

@imabeastdrew imabeastdrew merged commit cdb8aad into main Dec 4, 2025
2 checks passed