Release Notes (0.0.42, 0.0.41, 0.0.40, ...)
Major Features & Enhancements
Distributed Execution & Cluster Management
- Added RayExecutor for executing remote functions with support for multi-slice and resumable executions.
- Implemented Ray TPU/GPU/CPU Cluster Setup utilities.
- Enhanced TPU patcher and cluster utility functions, including dynamic patching and improved command-line configuration.
Sharding & Partitioning
- Introduced new dynamic sharding axes and enhanced partition manager functionality.
- Added flexible sharding strategies: Data Parallelism (DP), Fully Sharded Data Parallel (FSDP), Tensor Parallelism (TP), Expert Parallelism (EP), Sequence Parallelism (SP).
- Improved partition axis handling, including helper functions and dataclass-based refactors.
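To make the sharding strategies above concrete, here is a minimal pure-Python sketch of how named mesh axes translate into per-device array shapes. It is not this project's API: the helper `shard_shape`, the axis names `dp`, `fsdp`, and `tp`, and the mesh sizes are all hypothetical and chosen only for illustration.

```python
from math import prod


def shard_shape(global_shape, spec, mesh_axes):
    """Return the per-device shape of an array partitioned over a named mesh.

    `spec` holds one entry per array dimension: None (replicated), a single
    axis name, or a tuple of axis names that jointly split the dimension.
    All names here are hypothetical, not taken from the library.
    """
    if len(spec) != len(global_shape):
        raise ValueError("spec must have one entry per array dimension")
    local = []
    for dim, axes in zip(global_shape, spec):
        if axes is None:
            names = ()
        elif isinstance(axes, str):
            names = (axes,)
        else:
            names = tuple(axes)
        ways = prod(mesh_axes[name] for name in names)  # 1 when replicated
        if dim % ways != 0:
            raise ValueError(f"dimension {dim} is not divisible by {ways}")
        local.append(dim // ways)
    return tuple(local)


# Hypothetical 16-device mesh: 2-way DP, 4-way FSDP, 2-way TP.
mesh_axes = {"dp": 2, "fsdp": 4, "tp": 2}

# Activations: batch split over both data-like axes, hidden size replicated.
activation_shard = shard_shape((64, 1024), (("dp", "fsdp"), None), mesh_axes)
# -> (8, 1024)

# Weight matrix: rows split over FSDP, columns split over TP.
weight_shard = shard_shape((1024, 4096), ("fsdp", "tp"), mesh_axes)
# -> (256, 2048)
```

Under these assumptions, data-parallel axes only ever divide the batch dimension, while FSDP and TP divide parameter dimensions, which is why the strategies can be combined on one mesh.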
PyTree & Serialization
- Added FrozenPyTree and improved PyTree module for better JAX compatibility.
- Enhanced serialization capabilities for JAX PyTree-compatible dataclasses.
- Improved error handling and docstrings in state management.
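As a rough illustration of what a PyTree-compatible frozen dataclass looks like, the sketch below defines an immutable container with the `tree_flatten` / `tree_unflatten` pair that JAX's `jax.tree_util.register_pytree_node_class` decorator expects. It is an assumption-laden sketch, not the FrozenPyTree implementation: the class `LayerParams` and its fields are hypothetical, and JAX itself is left out so the example runs with the standard library alone.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class LayerParams:
    """Hypothetical immutable parameter container (not the library's class)."""

    weight: list
    bias: list
    name: str = "dense"  # static metadata, not a leaf

    def tree_flatten(self):
        # Children are the leaves a tree transformation may rewrite;
        # aux_data is carried through unchanged.
        children = (self.weight, self.bias)
        aux_data = (self.name,)
        return children, aux_data

    @classmethod
    def tree_unflatten(cls, aux_data, children):
        weight, bias = children
        (name,) = aux_data
        return cls(weight=weight, bias=bias, name=name)


params = LayerParams(weight=[1.0, 2.0], bias=[0.5])

# Emulate a tree map by hand: flatten, transform the leaves, rebuild.
children, aux = params.tree_flatten()
doubled_children = tuple([2 * x for x in child] for child in children)
doubled = LayerParams.tree_unflatten(aux, doubled_children)

# Frozen dataclasses reject in-place mutation.
try:
    params.name = "other"
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

With JAX installed, decorating such a class with `jax.tree_util.register_pytree_node_class` would let functions like `jax.tree_util.tree_map` perform the flatten, transform, and rebuild steps automatically.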
Optimized Operations & Quantization
- Improved Triton call logging and error handling for more consistent output.
- Enhanced quantization functions and support for float8, float16, and bfloat16, along with dynamic loss scaling.
- Added support for 8-bit and NF4 quantization for efficient model deployment.
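For readers unfamiliar with 8-bit quantization, the following sketch shows the basic idea using symmetric absmax scaling in plain Python. It is a generic textbook scheme, not the library's implementation (which is not shown in these notes), and the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric absmax quantization of floats to integer codes in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale per tensor; fall back to 1.0 for an all-zero or empty input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.0, 0.5, -1.0, 0.25]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored as one signed byte plus a shared float scale, which is where the memory savings come from. NF4 follows the same store-codes-plus-scale pattern but uses a 4-bit codebook with non-uniform levels instead of evenly spaced integers.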
Documentation & Usability
- Updated and expanded documentation, including project structure, key features, and API references.
- Improved README and Sphinx documentation structure.
- Added license headers and improved code readability and maintainability.
General Refactoring & Maintenance
- Refactored codebase for improved clarity, maintainability, and Python 3.10+ compatibility.
- Updated dependencies and switched from poetry to uv for build management.
- Removed deprecated and obsolete modules, streamlined imports, and improved module exports.
Notable Fixes
- Fixed issues with mesh creation for multi-slice environments.
- Enhanced error handling for Ray command execution in TPU patcher.
- Fixed Python 3.10 compatibility issues.
- Improved logging and validation in sharding and partitioning utilities.