0.0.42

@erfanzar released this 18 Jul 15:35
· 70 commits to main since this release

Release Notes (0.0.42, 0.0.41, 0.0.40, ...)

Major Features & Enhancements

  • Distributed Execution & Cluster Management

    • Added RayExecutor for running remote functions, with support for multi-slice and resumable execution.
    • Implemented Ray TPU/GPU/CPU Cluster Setup utilities.
    • Enhanced TPU patcher and cluster utility functions, including dynamic patching and improved command-line configuration.
  • Sharding & Partitioning

    • Introduced new dynamic sharding axes and enhanced partition manager functionality.
    • Added flexible sharding strategies: Data Parallelism (DP), Fully Sharded Data Parallel (FSDP), Tensor Parallelism (TP), Expert Parallelism (EP), and Sequence Parallelism (SP).
    • Improved partition axis handling, including helper functions and dataclass-based refactors.
  • PyTree & Serialization

    • Added FrozenPyTree and improved PyTree module for better JAX compatibility.
    • Enhanced serialization capabilities for JAX PyTree-compatible dataclasses.
    • Improved error handling and docstrings in state management.
  • Optimized Operations & Quantization

    • Improved Triton call logging and error handling for more consistent output.
    • Enhanced quantization functions and support for float8, float16, bfloat16, and dynamic loss scaling.
    • Added support for 8-bit and NF4 quantization for efficient model deployment.
  • Documentation & Usability

    • Updated and expanded documentation, including project structure, key features, and API references.
    • Improved README and Sphinx documentation structure.
    • Added license headers and improved code readability and maintainability.
  • General Refactoring & Maintenance

    • Refactored codebase for improved clarity, maintainability, and Python 3.10+ compatibility.
    • Updated dependencies and switched from Poetry to uv for build management.
    • Removed deprecated and obsolete modules, streamlined imports, and improved module exports.
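
For readers new to the sharding strategies listed above, the following is a minimal sketch of how DP, FSDP, and TP axes can be expressed with plain JAX primitives (`Mesh`, `NamedSharding`, `PartitionSpec`). It does not use this library's partition manager, whose API is not shown in these notes. The axis names "dp", "fsdp", and "tp" and the array shapes are assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assumption: axis names are illustrative and not taken from the release.
devices = np.array(jax.devices())
n = devices.size

# Place every available device on the "fsdp" axis; "dp" and "tp" have size 1.
mesh = Mesh(devices.reshape(1, n, 1), axis_names=("dp", "fsdp", "tp"))

# Activations: batch dimension split across both data axes, features replicated.
batch_spec = P(("dp", "fsdp"), None)
# Weights: rows split across "fsdp", columns split across "tp".
weight_spec = P("fsdp", "tp")

# Shapes are multiples of n so every sharded dimension divides evenly.
x = jax.device_put(jnp.ones((4 * n, 16 * n)), NamedSharding(mesh, batch_spec))
w = jax.device_put(jnp.ones((16 * n, 8)), NamedSharding(mesh, weight_spec))


@jax.jit
def forward(x, w):
    # The compiler inserts any collectives needed to combine the shards.
    return x @ w


y = forward(x, w)
print(y.shape, y.sharding)
```

On a single-device host all three axes have size 1 and the example still runs; on a multi-device host the same code shards the inputs across the "fsdp" axis without changes.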

Notable Fixes

  • Fixed issues with mesh creation for multi-slice environments.
  • Enhanced error handling for Ray command execution in TPU patcher.
  • Fixed Python 3.10 compatibility issues.
  • Improved logging and validation in sharding and partitioning utilities.