Releases: instructlab/training
v0.12.0 - GPT-OSS Support
Full fine-tuning now supports gpt-oss models. The release also includes minor bug fixes that keep loss calculations correct at higher gradient accumulation settings.
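The notes above do not say what the loss fix changed. As general background only, the sketch below shows one common way loss can drift under gradient accumulation: averaging per-micro-batch means instead of weighting by token count. The function names and numbers are hypothetical and are not taken from instructlab/training.

```python
# Hypothetical illustration of a gradient-accumulation pitfall.
# This is NOT the code changed in this release.

def mean_of_means(microbatches):
    """Average each micro-batch's mean loss, then average those means."""
    per_batch = [sum(b) / len(b) for b in microbatches]
    return sum(per_batch) / len(per_batch)


def token_weighted_mean(microbatches):
    """Sum every per-token loss, then divide by the total token count."""
    total_loss = sum(sum(b) for b in microbatches)
    total_tokens = sum(len(b) for b in microbatches)
    return total_loss / total_tokens


# Two accumulated micro-batches with different numbers of loss-bearing tokens.
microbatches = [[2.0, 2.0, 2.0, 2.0], [4.0]]

naive = mean_of_means(microbatches)        # (2.0 + 4.0) / 2
exact = token_weighted_mean(microbatches)  # (8.0 + 4.0) / 5

print(f"mean of means: {naive}, token-weighted mean: {exact}")
```

The two results differ whenever micro-batches carry unequal token counts, and the mismatch can grow as more micro-batches are accumulated per optimizer step.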
What's Changed
- Disable workflow runs on forks by default by @fynnsu in #632
- Adding GPT OSS Support by @Maxusmusti in #646
- Update numpy from <2.0 to <2.3 by @Maxusmusti in #656
- Add kernels>0.9.0 to CUDA requirements by @Maxusmusti in #658
Full Changelog: v0.11.1...v0.12.0
v0.11.1
What's Changed
- Add general logging implementation by @fynnsu in #500
- docs: add CI documentation by @nathan-weinberg in #555
- fix: Use default torch timeout for nccl watchdog unless overridden by @booxter in #521
- fix: Fix markdown-lint violations by @booxter in #559
- ci: add 3.12 smoke workflow flavor by @booxter in #535
- adds barriers after checkpoint saving by @JamesKunstle in #566
- ci: Fix smoke failures due to `pre` not available in local actions by @booxter in #565
- Checkout correct branch on `pull_request_target` trigger by @fynnsu in #549
- Logging Fixes & Enhancements by @RobotSail in #571
- docs: Remove badge for a no longer existing job by @booxter in #542
- uses `__name__` in logging.getLogger by @JamesKunstle in #573
- ci: stop reporting results to slack by @ktdreyer in #574
- CI: Constrain all dependencies; introduce a Monday workflow to update pins by @booxter in #558
- ci: Run jobs on constraints-dev.txt change by @booxter in #580
- chore: update constraints-dev.txt (2025-05-30) by @courtneypacheco in #579
- remove old Deepspeed-native code by @JamesKunstle in #567
- add DCO.txt by @ktdreyer in #588
- ci: Disable dependabot for pip dependencies by @booxter in #587
- feat: refactor main_ds.py (1/n) Model class by @cdoern in #572
- ci: do not require DCO job by @ktdreyer in #595
- 'granite-3.3-2b-instruct' for smoketest; smaller smoke dataset by @JamesKunstle in #590
- fixes unit tests requiring cuda by @JamesKunstle in #586
- chore: update constraints-dev.txt (2025-06-02) by @courtneypacheco in #584
- ci: Cover more test dependencies with pins by @booxter in #581
- ci: Introduce python 3.12 e2e large job flavor by @booxter in #563
- Implicit distributed backend selection by @booxter in #516
- ci: Fix incorrect indent in workflow steps by @booxter in #599
- feat: refactor main_ds.py (2/n) Accelerator class by @cdoern in #594
- chore: update constraints-dev.txt (2025-06-09) by @courtneypacheco in #602
- feat: add medium e2e CI job for each PR by @cdoern in #551
- test: fix e2e target by @cdoern in #610
- chore: update constraints-dev.txt (2025-06-16) by @courtneypacheco in #612
- Remove Dolomite support by @booxter in #616
- Revert "test: fix e2e target" by @bbrowning in #620
- ci: Remove harden-runner steps from jobs by @booxter in #617
- test: disable per-PR test by @cdoern in #631
- fix edge case for qwen3 data processing by @RobotSail in #626
- uncap accelerate in `requirements-cuda.txt` by @ktdreyer in #628
- chore: update constraints-dev.txt (2025-06-30) by @courtneypacheco in #623
- Fix a mistake in formatting a floating-point value by @mtake in #639
- Add a tutorial for fine-tuning and interpolation by @mtake in #640
New Contributors
- @bbrowning made their first contribution in #620
- @mtake made their first contribution in #639
Full Changelog: v0.11...v0.11.1
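Several entries in this release touch logging, including the switch to `__name__` in `logging.getLogger` (#573). The sketch below shows the general standard-library pattern and why it is useful. The package name `example_pkg` is a hypothetical stand-in, and nothing here reproduces the project's actual logging setup.

```python
import logging

# Typical module-level pattern: the logger is named after the module,
# so inside example_pkg/trainer.py it would be "example_pkg.trainer".
logger = logging.getLogger(__name__)

# Simulate that hierarchy with explicit (hypothetical) names.
pkg_logger = logging.getLogger("example_pkg")
mod_logger = logging.getLogger("example_pkg.trainer")

# Configure the package logger once; dotted child loggers inherit its level.
pkg_logger.setLevel(logging.WARNING)

effective = mod_logger.getEffectiveLevel()
is_child = mod_logger.parent is pkg_logger

print(logging.getLevelName(effective), is_child)
```

Because logger names follow the dotted module path, an application can raise or lower verbosity for a whole package with a single call on the package-level logger.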
v0.10.4
v0.10.3
v0.11
What's Changed
- ci: Remove workflow that doesn't utilize training library (medium, -mp) by @booxter in #478
- Obey the FSDP sharding option default by @Maxusmusti in #486
- Change default internal sharding strategy to HYBRID_SHARD by @Maxusmusti in #488
- chore: Update the large e2e job to use fallback logic for selecting EC2 instances by @courtneypacheco in #491
- moves deepspeed requirements into their own file; add deepspeed extras by @JamesKunstle in #455
- chore: introduce dummy workflow by @cdoern in #497
- ci: Search for necessary instance for smoke job in multiple AZs by @booxter in #481
- ci: Fix -sdk fake workflow failure on actionlint by @booxter in #501
- build(deps): Bump actions/setup-python from 5.5.0 to 5.6.0 by @dependabot in #493
- use instructlab `constraints-dev.txt` in e2e test by @ktdreyer in #499
- build(deps): Bump step-security/harden-runner from 2.11.1 to 2.12.0 by @dependabot in #490
- ci: Use tox-current-env to reuse prepared venv with torch by @booxter in #482
- fix: extend nccl timeout by @cdoern in #507
- always log storage by @RobotSail in #510
- deps: Remove caps on ROCm dependencies by @courtneypacheco in #517
- ci: don't trigger pull_request_target job on its own workflow by @booxter in #519
- Enable pylint 'unused-argument' check by @fynnsu in #528
New Contributors
Full Changelog: v0.10.0...v0.11
v0.10.2 - Remove ROCm dependency caps
What's Changed
Full Changelog: v0.10.1...v0.10.2
v0.10.1 - Updating Default FSDP Sharding
What's Changed
- ci: Remove workflow that doesn't utilize training library (medium, -mp) by @booxter in #478
- Obey the FSDP sharding option default (backport #486) by @mergify in #487
- Change default internal sharding strategy to HYBRID_SHARD (backport #488) by @mergify in #489
Full Changelog: v0.10.0...v0.10.1
v0.10.0 - Updated FSDP Mixed Precision and Liger Kernel Model Option Support
What's Changed
- disables e2e-nvidia-l4-x1 test by @JamesKunstle in #454
- ci: Fix unit test run due to no tests found to execute by @booxter in #466
- ci: Don't run smoke tests when only irrelevant files are touched by @booxter in #460
- ci: don't waste ec2 resources on unit tests by @booxter in #464
- ci: Trigger unit test run on tox.ini change by @booxter in #469
- ci: Fix path filter for unit tests for the workflow file by @booxter in #461
- chore: Don't install pytest dependencies for coverage reports by @booxter in #468
- chore: Remove spell checks from the repo by @booxter in #458
- chore: Don't set ec2_runner_variant for unit tests by @booxter in #475
- Remove CHANGELOG.md by @booxter in #457
- Fix FSDP mixed precision setting and loss w/ accelerate by @Maxusmusti in #465
- fixes non-granite model instantiation with Liger Kernel by @JamesKunstle in #476
- ci: Install torch before flash-attn by @booxter in #474
- ci: Use pull_request as trigger for unit tests by @booxter in #473
- ci: Run unit tests for all supported python version, 3.11+ by @booxter in #472
- chore: Require python3.11+ by @booxter in #470
- chore: Drop pytest-asyncio by @booxter in #467
- chore: don't trigger unit tests for cuda and rocm requirements changes by @booxter in #463
- build(deps): Bump step-security/harden-runner from 2.10.4 to 2.11.1 by @dependabot in #452
- build(deps): Bump machulav/ec2-github-runner from 2.3.8 to 2.3.9 by @dependabot in #450
- build(deps): Bump aws-actions/configure-aws-credentials from 4.0.2 to 4.1.0 by @dependabot in #451
Full Changelog: v0.9.0...v0.10.0
v0.9.0
What's Changed
- build(deps): Bump machulav/ec2-github-runner from 2.3.8 to 2.3.9 by @dependabot in #431
- build(deps): Bump step-security/harden-runner from 2.11.0 to 2.11.1 by @dependabot in #439
- Adds Liger Kernels as optional optimization by @JamesKunstle in #441
- fix: model.forward now accepts return_dict via kwargs by @booxter in #443
- Adds smoke test workflow and tests by @JamesKunstle in #424
- change pytest targets. `test-unit` and `test-smoke` to `unit` and `smoke` by @JamesKunstle in #453
Full Changelog: v0.8.0...v0.9.0
Training Release v0.8.1
What's Changed
Full Changelog: v0.8.0...v0.8.1