155 commits
71344f9
Implemented a distributed solution which refactors current run_model …
coketaste Jul 3, 2025
ea2dc0c
Fixed the test cases for distributed solution
coketaste Jul 3, 2025
bb64b73
Updated the interface and fix the issue due to updating
coketaste Jul 3, 2025
86d1790
Reorganize the cli interface of distributed solution
coketaste Jul 3, 2025
dd71dfa
Updated the interface of distributed solution and refine the code wit…
coketaste Jul 4, 2025
8236a7b
Added setup.py for installation with dev
coketaste Jul 4, 2025
0c42bbf
Fix the test case of distributed cli
coketaste Jul 4, 2025
f942a45
Fixed the flow of manifest and run_phase to work properly
coketaste Jul 4, 2025
08ff29b
Updated setup.py for the cases of modern and legacy installation
coketaste Jul 4, 2025
d82d78e
Fixed and enhanced live log in build and run phases
coketaste Jul 4, 2025
c848419
Fixed log generation for the different phases and corrected the log name
coketaste Jul 4, 2025
3e2a44c
Fix the perf.csv generation in distributed execution
coketaste Jul 5, 2025
8a359ac
Fixed the data written to perf.csv
coketaste Jul 5, 2025
ac32cbe
Fixed the columns in perf.csv due to parsing issue
coketaste Jul 5, 2025
bb6d3fc
Fix the incorrect regex escaping in the container runner that prevent…
coketaste Jul 5, 2025
a255c50
Update the patterns of performance and metric
coketaste Jul 5, 2025
8988508
Fixed the issue of docker_image column in perf.csv
coketaste Jul 5, 2025
f0a10a7
Improve the interface and reduce errors in registry flow
coketaste Jul 5, 2025
8caae5c
Updated the flow of run phase, fix the docker pull, fix the creds ver…
coketaste Jul 5, 2025
942e666
updated the tagged name for docker image and add a docker_image_tagge…
coketaste Jul 5, 2025
a7baa17
Updated the sequence of operations in build phase
coketaste Jul 6, 2025
2e613ca
Fixed the registry_image
coketaste Jul 6, 2025
b4e7d22
Update the registry_image
coketaste Jul 6, 2025
d1ecb97
Updated the process of run phase
coketaste Jul 6, 2025
799cce7
Refactored the file structure of package for distributed_cli
coketaste Jul 6, 2025
9875fda
Fixed the errors in unit tests
coketaste Jul 6, 2025
168ffe5
Fix the error in unit test of distributed cli
coketaste Jul 6, 2025
756d82a
Refactored constants to make design as best practices
coketaste Jul 6, 2025
9431d7f
Cleanup:
coketaste Jul 6, 2025
cf50f13
Fixed the regex pattern
coketaste Jul 6, 2025
c909a93
Fix ensures that distributed_cli logs will now contain the same detai…
coketaste Jul 6, 2025
c04435d
Implemented new test cases for pre/post scripts and profiling cases
coketaste Jul 6, 2025
72bc7bc
Debug the test cases
coketaste Jul 6, 2025
c0dd6ca
Fixed the test cases in distributed integration
coketaste Jul 6, 2025
92db9fb
Refactor context class to make it work on usages of build-only on cpu…
coketaste Jul 6, 2025
9628a01
Update the validation function and GPU detection in additional context
coketaste Jul 6, 2025
68e19fb
tests now automatically detect machine capabilities and skip GPU-depe…
coketaste Jul 6, 2025
5dfa775
Create a new madengine CLI application
coketaste Jul 6, 2025
901c12b
Fixed the test cases of distributed integration and profiling
coketaste Jul 6, 2025
6caf244
Fix the python version compatible issue
coketaste Jul 7, 2025
d87e9b0
Fixed the error of model dict
coketaste Jul 7, 2025
61ac4f7
Update the input arg of clean docker cache and its guide
coketaste Jul 7, 2025
9469d8b
Updated distributed-execution-solution
coketaste Jul 7, 2025
b94a118
Ensures that when you run the example command on a build-only node, t…
coketaste Jul 7, 2025
802a36c
Fix the docker env vars set during build phase
coketaste Jul 7, 2025
50267e7
Filter out redundant MAD env vars
coketaste Jul 7, 2025
a52f853
Refine the docs and add diagrams of flow
coketaste Jul 7, 2025
c77cee7
Updated images of flow chart
coketaste Jul 7, 2025
df8cb08
Updated the madengine cli guide
coketaste Jul 7, 2025
2d1ae9d
Removed the execution config and enhanced implementation of manifest.…
coketaste Jul 7, 2025
9ee383b
clean up the code
coketaste Jul 7, 2025
3c1da45
Updated the distributed cli interface and clean up the code
coketaste Jul 7, 2025
0fb0e53
Fix the pulling issue from registry
coketaste Jul 7, 2025
ab0bbe6
Updated the docs
coketaste Jul 7, 2025
81bc4e4
Created a professional, comprehensive, and maintainable documentation…
coketaste Jul 8, 2025
ab36c76
make a well-formatted documentation of README
coketaste Jul 8, 2025
85c66de
Fix the MODEL_DIR setup issue
coketaste Jul 8, 2025
91805ae
Fixed the out of date unit tests in distributed cli
coketaste Jul 8, 2025
0a1a679
All syntax errors resolved - file compiles successfully in distribute…
coketaste Jul 8, 2025
ef64de6
Fix the test case of distributed integration
coketaste Jul 8, 2025
23b3bbb
Fixed the test profiling
coketaste Jul 8, 2025
0fec233
Updated the fix to handle permission error
coketaste Jul 8, 2025
b5f6486
Refine the assertion
coketaste Jul 8, 2025
7060f76
Added test cases of mad_cli and distributed integration
coketaste Jul 8, 2025
b65bf0d
Massively enhanced distributed execution with runners of SSH, Ansible…
coketaste Jul 9, 2025
661a9ae
Reverted some missing functions
coketaste Jul 9, 2025
29ac831
new functionality allows users to provide Docker Hub credentials via …
coketaste Jul 9, 2025
8e26033
Merge branch 'coketaste/refactor' into coketaste/refactor-runners
coketaste Jul 9, 2025
db75808
Changed docker.io to dockerhub
coketaste Jul 9, 2025
14cc12e
Merge branch 'coketaste/refactor' into coketaste/refactor-runners
coketaste Jul 9, 2025
9b09f01
Fix the test case of context
coketaste Jul 9, 2025
2a26dbf
Updated README.md
coketaste Jul 9, 2025
b35508b
Fix the unit test of e2e distributed run with profiling
coketaste Jul 9, 2025
a61c287
Fixed the issue of mocks gpu
coketaste Jul 9, 2025
96d7e27
Rewrite the unit test gpu version
coketaste Jul 9, 2025
566f1cb
Fixed the manifest name error
coketaste Jul 10, 2025
cbd86c1
Fixed the missing manifest file
coketaste Jul 10, 2025
b3052f5
Updated the warning message of missing cred
coketaste Jul 10, 2025
4955bcf
Merge pull request #14 from ROCm/coketaste/refactor-runners
coketaste Jul 10, 2025
71fe348
Updated the MAD_DOCKERHUB_ creds parsing logic
coketaste Jul 10, 2025
49f60dc
Merge branch 'coketaste/refactor' of https://github.com/ROCm/madengin…
coketaste Jul 10, 2025
32b5ff7
Updated README
coketaste Jul 11, 2025
b22bc7b
Implemented a batch input arg for madengine-cli build
coketaste Jul 11, 2025
768dcf9
enhanced logging system is now active and will automatically highligh…
coketaste Jul 11, 2025
a4b324f
Fix the error local variable docker_image referenced before assignment
coketaste Jul 11, 2025
ebfb472
Updated the perf dataframe output
coketaste Jul 11, 2025
e47572e
The fixes are backward compatible and maintain existing functionality…
coketaste Jul 11, 2025
3a73edc
Fixed the problematic log
coketaste Jul 11, 2025
e1000a4
Fixed the error pattern, removed the wrong string
coketaste Jul 11, 2025
06934d3
Fixed the error of test prof
coketaste Jul 12, 2025
59dd584
Updated the interface of mad_cli
coketaste Jul 12, 2025
d696784
Merge pull request #17 from ROCm/coketaste/refactor-stage
coketaste Jul 12, 2025
5821b3b
Update README.md
coketaste Jul 14, 2025
30f1329
ensure that the DistributedOrchestrator.build_phase method and the un…
coketaste Jul 21, 2025
f6c18fa
Updated the build batch manifest to distributed orchestrator
coketaste Jul 21, 2025
11895f9
Debug the batch manifest
coketaste Jul 21, 2025
27627aa
Update the flow use per-model registry settings for both build and ru…
coketaste Jul 23, 2025
c7c6d37
correct registry image will be used for each model as intended
coketaste Jul 23, 2025
7449493
The push_image function now accepts and uses the explicit registry_im…
coketaste Jul 23, 2025
7f2c63b
Updated the explicit_registry_image assignment
coketaste Jul 23, 2025
9f50d04
Debug the registry info setting
coketaste Jul 23, 2025
05f8a26
Updated the function of export build manifest
coketaste Jul 23, 2025
8f8dc88
Add verbose for debugging
coketaste Jul 24, 2025
de6b49c
Debug the export build manifest
coketaste Jul 24, 2025
f1a3905
Debug the registry extract from batch build metadata
coketaste Jul 24, 2025
d412956
Debug the extraction
coketaste Jul 24, 2025
a03fa0d
Merge pull request #23 from ROCm/coketaste/refactor-batch
coketaste Jul 24, 2025
624cc29
Corrected the content of synthetic image which built_new is false in …
coketaste Jul 24, 2025
af7ddb4
Fixed the type error in additional context
coketaste Jul 25, 2025
b5a800b
Debug the parsing of gpu vendor and guest os
coketaste Jul 25, 2025
bc18784
Correct the pattern of Dockerfile
coketaste Jul 25, 2025
558b7af
Updated the print
coketaste Jul 25, 2025
0b7eba6
Update the rich print
coketaste Jul 25, 2025
f4778ec
Merge pull request #25 from ROCm/coketaste/refactor-cleanup
coketaste Jul 25, 2025
57c4bce
Figured out a critical issue about dual CLI implementation creating m…
Jul 26, 2025
7ca3147
Fixed the dockerfile matched
coketaste Jul 27, 2025
55f630d
Resolved conflicts of merge
coketaste Jul 27, 2025
56eda87
refactored the logic in _process_batch_manifest_entries() to include …
coketaste Jul 27, 2025
6b60a37
Added unit tests for new unified error handlers
coketaste Jul 27, 2025
bc9153e
Updated README.md
coketaste Jul 28, 2025
55d378d
Implemented a SLURM runner follows the same comprehensive pattern as …
coketaste Jul 28, 2025
e369f1f
Fixed the errors in unit tests
coketaste Jul 28, 2025
aa9d39f
Merge pull request #29 from ROCm/coketaste/refactor-update-runner
coketaste Jul 28, 2025
90ec534
Used Rich console print to replace part of regular print to enhance t…
coketaste Jul 31, 2025
4256588
Updated rich console print to enhance the log readability
coketaste Jul 31, 2025
226b6a4
Update the new line
coketaste Jul 31, 2025
9090d23
Updated the new line for all sections
coketaste Jul 31, 2025
279223a
Updated final table of dataframe
coketaste Jul 31, 2025
bd16f88
Updated the display of dataframe from head to tail
coketaste Jul 31, 2025
af89326
Updated the checking gpu status
coketaste Jul 31, 2025
1c8f17c
Cleanup
coketaste Jul 31, 2025
1445618
Merge pull request #30 from ROCm/coketaste/refactor-update-log
coketaste Jul 31, 2025
4b57f4b
Merge pull request #27 from ROCm/coketaste/refactor-update
coketaste Aug 5, 2025
72982f8
Updated README
coketaste Aug 5, 2025
b6b79ca
Added discover command to mad_cli
coketaste Aug 5, 2025
00f4a5e
Implemented CLI detect MAD_CONTAINER_IMAGE in additional context, pro…
coketaste Aug 6, 2025
ee5740d
Merge pull request #31 from ROCm/coketaste/refactor-interface
coketaste Aug 7, 2025
364bef4
Implemented the core multi-GPU architectures support for docker image…
coketaste Aug 8, 2025
156bcfe
Implemented unit tests for the feature of multi-gpu arch
coketaste Aug 8, 2025
8457257
Debug and fix the unit test of multi gpu arch
coketaste Aug 8, 2025
3a0b4c7
Debug the issue of display results table
coketaste Aug 8, 2025
682bec2
Enhanced the results table, and improved the flow of handle gpu arch …
coketaste Aug 8, 2025
89784ca
Creates architecture-specific images with proper naming and metadata,…
coketaste Aug 9, 2025
23bbf57
Fixed the syntax error
coketaste Aug 9, 2025
4e61147
Merge pull request #32 from ROCm/coketaste/refactor-multi-gpu-archs
coketaste Aug 13, 2025
5444a67
ported changes from coketaste/amd-smi
Boss2002n Oct 3, 2025
9dfe5d8
Revert "ported changes from coketaste/amd-smi"
Boss2002n Oct 3, 2025
d5c3402
Resolved merging conflicts
coketaste Oct 20, 2025
e9202c2
Fixed the tools for distributed mode
coketaste Oct 21, 2025
b49ed4b
Fixed the cleanup
coketaste Oct 21, 2025
0ac1855
Merge pull request #52 from ROCm/coketaste/refactor-tools
coketaste Oct 21, 2025
15cbeaa
Fixed the table of results
coketaste Oct 21, 2025
026fec3
Fixed the GPU Product Name
coketaste Nov 27, 2025
9b7b347
Fixed the issue in selftest
coketaste Nov 27, 2025
eca075a
Enhanced unit tests and cleanup
coketaste Nov 27, 2025
35 changes: 34 additions & 1 deletion .gitignore
@@ -6,6 +6,22 @@ __pycache__/
# C extensions
*.so

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# IDE files
.vscode/
.idea/
*.swp
*.swo
*~

# Distribution / packaging
.Python
build/
@@ -36,7 +52,7 @@ MANIFEST
pip-log.txt
pip-delete-this-directory.txt

-# Unit test / coverage reports
+# Testing and coverage
htmlcov/
.tox/
.nox/
@@ -49,6 +65,23 @@ coverage.xml
*.py,cover
.hypothesis/
.pytest_cache/

# MADEngine specific
credential.json
data.json
*.log
*.csv
*.html
library_trace.csv
library_perf.csv
perf.csv
perf.html

# Temporary and build files
temp/
tmp/
*.tmp
.pytest_cache/
cover/

# Translations
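
The new "MADEngine specific" block mixes wildcard patterns with exact file names. As a rough illustration only, the sketch below uses Python's `fnmatchcase` to approximate how those entries match a file's basename. Real gitignore matching has extra rules (directory patterns, negation, anchoring) that this does not model, and the helper name `matching_patterns` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List

# Patterns copied from the "MADEngine specific" block added in this diff.
MADENGINE_PATTERNS = [
    "credential.json",
    "data.json",
    "*.log",
    "*.csv",
    "*.html",
    "library_trace.csv",
    "library_perf.csv",
    "perf.csv",
    "perf.html",
]


def matching_patterns(path: str) -> List[str]:
    """Return the patterns whose glob matches the basename of `path`.

    Hypothetical helper: an approximation of gitignore behaviour that is
    only reasonable for simple, slash-free patterns like the ones above.
    """
    name = PurePosixPath(path).name
    return [p for p in MADENGINE_PATTERNS if fnmatchcase(name, p)]


if __name__ == "__main__":
    for candidate in ("results/perf.csv", "credential.json", "src/main.py"):
        print(candidate, "->", matching_patterns(candidate))
```

Under this approximation, `perf.csv` is already covered by `*.csv`, so the exact-name entries for the CSV and HTML outputs appear to be redundant with the wildcards, though harmless.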
36 changes: 36 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,36 @@
# Pre-commit hooks configuration
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-toml
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: debug-statements

  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
        language_version: python3

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort

  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.3.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests, types-PyYAML]
        exclude: ^(tests/|scripts/)
68 changes: 68 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,68 @@
# Changelog

All notable changes to MADEngine will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Comprehensive development tooling and configuration
- Pre-commit hooks for code quality
- Makefile for common development tasks
- Developer guide with coding standards
- Type checking with mypy
- Code formatting with black and isort
- Enhanced .gitignore for better file exclusions
- CI/CD configuration templates
- **Major Documentation Refactor**: Complete integration of distributed execution and CLI guides into README.md
- Professional open-source project structure with badges and table of contents
- Comprehensive MAD package integration documentation
- Enhanced model discovery and tag system documentation
- Modern deployment scenarios and configuration examples

### Changed
- Improved package initialization and imports
- Replaced print statements with proper logging in main CLI
- Enhanced error handling and logging throughout codebase
- Cleaned up setup.py for better maintainability
- Updated development dependencies in pyproject.toml
- **Complete README.md overhaul**: Merged all documentation into a single, comprehensive source
- Restructured documentation to emphasize MAD package integration
- Enhanced CLI usage examples and distributed execution workflows
- Improved developer contribution guidelines and legacy compatibility notes

### Fixed
- Removed Python cache files from repository
- Fixed import organization and structure
- Improved docstring formatting and consistency

### Removed
- Unnecessary debug print statements
- Python cache files and build artifacts
- **Legacy documentation files**: `docs/distributed-execution-solution.md` and `docs/madengine-cli-guide.md`
- Redundant documentation scattered across multiple files

## [Previous Versions]

For changes in previous versions, please refer to the git history.

---

## Guidelines for Changelog Updates

### Categories
- **Added** for new features
- **Changed** for changes in existing functionality
- **Deprecated** for soon-to-be removed features
- **Removed** for now removed features
- **Fixed** for any bug fixes
- **Security** for vulnerability fixes

### Format
- Keep entries brief but descriptive
- Include ticket/issue numbers when applicable
- Group related changes together
- Use present tense ("Add feature" not "Added feature")
- Target audience: users and developers of the project
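
The category list above could be checked mechanically. The following is a minimal sketch, assuming release sections are introduced by `## [` headings and categories by `### ` headings, as in this file. The function name `unknown_categories` is hypothetical and is not part of MADEngine.

```python
from typing import List

# Categories listed under "Guidelines for Changelog Updates".
ALLOWED_CATEGORIES = {
    "Added", "Changed", "Deprecated", "Removed", "Fixed", "Security",
}


def unknown_categories(changelog_text: str) -> List[str]:
    """Return `###` headings inside release sections that are not allowed.

    Only sections whose `##` heading starts with "## [" (for example
    "## [Unreleased]") are inspected, so the guideline headings such as
    "### Categories" and "### Format" are ignored.
    """
    unknown = []
    in_release = False
    for line in changelog_text.splitlines():
        if line.startswith("## "):
            # A new second-level section: is it a release section?
            in_release = line.startswith("## [")
        elif in_release and line.startswith("### "):
            name = line[4:].strip()
            if name not in ALLOWED_CATEGORIES:
                unknown.append(name)
    return unknown


if __name__ == "__main__":
    sample = (
        "## [Unreleased]\n"
        "### Added\n- New CLI\n"
        "### Improved\n- Faster builds\n"
        "## Guidelines for Changelog Updates\n"
        "### Format\n- Keep entries brief\n"
    )
    print(unknown_categories(sample))
```

A check like this could run in CI or as a local hook, but wiring it in is left open here since the diff does not show how the project validates its changelog.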