[codex] Add sparse filesystem weight broadcast by samsja · Pull Request #2607 · PrimeIntellect-ai/prime-rl

samsja · 2026-05-23T21:22:22Z

Summary

Adds sparse checkpoint-format transfer as an opt-in mode of the existing filesystem weight broadcast backend.

Adds shared, trainer, orchestrator, and inference config support for weight_broadcast.type = "filesystem" with weight_broadcast.sparse = true.
Implements a trainer backend that writes full checkpoints on first/forced syncs and layerwise sparse delta directories otherwise, without gathering the full model state dict on rank 0.
Adds an inference worker that materializes sparse deltas into a private local checkpoint cache before reusing the existing vLLM checkpoint reload path.
Logs sparse broadcast metrics from the trainer backend; orchestrator W&B logging emits only sparse_broadcast_ratio so the main run shows the delta/full-weight size ratio directly.
Preserves full base checkpoints needed by surviving sparse delta chains during broadcast cleanup.
Continues to reject LoRA for sparse filesystem broadcast until adapter semantics are defined.
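
For readers unfamiliar with the idea, the delta write, the worker-side materialization, and the size ratio described above can be sketched roughly as follows. This is an illustration only and not the code in this PR: `SparseDelta`, `encode_delta`, `apply_delta`, and `broadcast_ratio` are hypothetical names, tensors are modeled as flat Python lists, and the byte sizes (4-byte indices, 2-byte values) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SparseDelta:
    """Changed positions of one flattened tensor (hypothetical layout)."""

    indices: list[int]
    values: list[float]


def encode_delta(old: list[float], new: list[float]) -> SparseDelta:
    """Record only the positions whose value changed since the last sync."""
    if len(old) != len(new):
        raise ValueError("tensor sizes must match")
    changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    return SparseDelta(changed, [new[i] for i in changed])


def apply_delta(base: list[float], delta: SparseDelta) -> list[float]:
    """Materialize the new tensor from a cached base plus a delta."""
    out = list(base)  # copy so the cached base is left untouched
    for i, v in zip(delta.indices, delta.values):
        out[i] = v
    return out


def broadcast_ratio(
    deltas: dict[str, SparseDelta],
    full: dict[str, list[float]],
    index_bytes: int = 4,  # assumed index width
    value_bytes: int = 2,  # assumed value width (e.g. a 16-bit dtype)
) -> float:
    """Delta payload bytes divided by full-weight bytes."""
    delta_size = sum(len(d.indices) * (index_bytes + value_bytes) for d in deltas.values())
    full_size = sum(len(t) * value_bytes for t in full.values())
    return delta_size / full_size


# Two "layers"; only one element of the first layer changes between syncs.
old = {"layer0.weight": [0.0, 1.0, 2.0, 3.0], "layer1.weight": [4.0, 5.0, 6.0, 7.0]}
new = {"layer0.weight": [0.0, 1.5, 2.0, 3.0], "layer1.weight": [4.0, 5.0, 6.0, 7.0]}

# Encode layer by layer, so no single full state dict has to be gathered.
deltas = {name: encode_delta(old[name], new[name]) for name in new}
restored = {name: apply_delta(old[name], deltas[name]) for name in old}
ratio = broadcast_ratio(deltas, new)
```

In this toy case `restored` equals `new`, and `ratio` is the 6 delta bytes over the 16 full-weight bytes. The real backend may store deltas in a different layout and measure size differently.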

Validation

CLI parse check against examples/reverse_text/rl.toml with --weight-broadcast.type filesystem --weight-broadcast.sparse true --weight-broadcast.full-sync-interval 10
uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py packages/prime-rl-configs/src/prime_rl/configs/inference.py packages/prime-rl-configs/src/prime_rl/configs/rl.py packages/prime-rl-configs/src/prime_rl/utils/validation.py src/prime_rl/trainer/rl/broadcast/__init__.py src/prime_rl/trainer/rl/broadcast/sparse_filesystem.py src/prime_rl/trainer/rl/train.py src/prime_rl/inference/vllm/server.py tests/unit/test_configs.py
uv run pytest tests/unit/test_configs.py tests/unit/utils/test_sparse_weights.py tests/unit/orchestrator/test_scheduler.py -q
uv run ruff check src/prime_rl/orchestrator/scheduler.py tests/unit/orchestrator/test_scheduler.py
uv run pytest tests/unit/orchestrator/test_scheduler.py tests/unit/utils/test_sparse_weights.py -q
uv run ruff check src/prime_rl/utils/sparse_weights.py src/prime_rl/trainer/rl/broadcast/base.py src/prime_rl/trainer/rl/broadcast/sparse_filesystem.py src/prime_rl/trainer/rl/train.py src/prime_rl/orchestrator/scheduler.py tests/unit/utils/test_sparse_weights.py
uv run ruff check src/prime_rl/utils/sparse_weights.py src/prime_rl/trainer/rl/broadcast/sparse_filesystem.py src/prime_rl/inference/vllm/worker/sparse_filesystem.py src/prime_rl/trainer/rl/train.py packages/prime-rl-configs/src/prime_rl/configs/{trainer,orchestrator,inference,rl}.py tests/unit/utils/test_sparse_weights.py tests/unit/test_configs.py
uv run pytest tests/unit/utils/test_sparse_weights.py tests/unit/test_configs.py -q
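
The `--weight-broadcast.full-sync-interval 10` flag in the CLI parse check, together with the cleanup behavior noted in the summary, suggests a policy along the lines of the sketch below. It is a guess at the semantics, not the PR's implementation: `sync_kind` and `steps_to_keep` are hypothetical helpers, and the sketch assumes that a full checkpoint is written on the first step, on forced syncs, and on every interval-th step, and that a surviving delta only needs its nearest preceding full checkpoint to be kept.

```python
from __future__ import annotations


def sync_kind(
    step: int,
    full_sync_interval: int,
    force_full: bool = False,
    first_step: int = 0,
) -> str:
    """Return "full" or "delta" for a broadcast step (assumed policy)."""
    if force_full or step == first_step:
        return "full"
    if full_sync_interval > 0 and (step - first_step) % full_sync_interval == 0:
        return "full"
    return "delta"


def steps_to_keep(kinds: dict[int, str], keep_last: int) -> set[int]:
    """Keep the newest `keep_last` steps plus the full base of each kept delta."""
    ordered = sorted(kinds)
    keep = set(ordered[-keep_last:]) if keep_last > 0 else set()
    for step in list(keep):
        if kinds[step] == "delta":
            # Nearest full checkpoint at or before this delta step.
            base = max(
                (s for s in ordered if s <= step and kinds[s] == "full"),
                default=None,
            )
            if base is not None:
                keep.add(base)
    return keep


# Steps 0..24 with a full sync every 10 steps: 0, 10, and 20 are full.
kinds = {step: sync_kind(step, full_sync_interval=10) for step in range(25)}
kept = steps_to_keep(kinds, keep_last=2)
```

With these inputs `kept` holds steps 23 and 24 plus the full checkpoint at step 20 that both deltas build on; everything older could be cleaned up under the stated assumptions.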

samsja added 5 commits May 23, 2026 21:21

a923a4a Add sparse filesystem weight broadcast
bfbce5c Add sparse broadcast metrics
d596e18 Log sparse broadcast ratio from orchestrator
68ac0b6 Rename orchestrator sparse ratio metric
797eb0e Fold sparse broadcast into filesystem config

samsja wants to merge 5 commits into main from feat/sparse-filesystem-broadcast
