Two-tool optimization pipeline for AMD ROCm GPUs: Analyzer (profiling + bottleneck detection) and Optimizer (AITER-first kernel replacement + Triton custom kernels + fusion).
User Model (PyTorch / vLLM)
│
┌────▼────┐
│ Analyzer │ env setup → baseline → profiling → roofline analysis
└────┬────┘
│ bottlenecks.json, analysis_summary.json
┌────▼─────┐
│ Optimizer │ AITER match → fusion detect → Triton fallback → integrate
└──────────┘
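The handoff between the two tools goes through the JSON artifacts shown in the diagram. A minimal sketch of what a `bottlenecks.json` record could look like and how the Optimizer might rank targets — the field names here are illustrative assumptions, not the tool's actual schema:

```python
import json

# Hypothetical bottleneck records, illustrating the kind of data the
# Analyzer could hand to the Optimizer (field names are assumptions).
bottlenecks = [
    {"kernel": "rms_norm_kernel", "time_pct": 18.4,
     "bound": "memory", "roofline_efficiency": 0.41,
     "strategy": "aiter_replace"},
    {"kernel": "fused_add_rms_norm", "time_pct": 9.1,
     "bound": "memory", "roofline_efficiency": 0.55,
     "strategy": "triton_custom"},
]

# Rank targets by their share of total GPU time, highest first:
targets = sorted(bottlenecks, key=lambda b: b["time_pct"], reverse=True)
print(json.dumps(targets, indent=2))
```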
- Environment setup (Docker `rocm/vllm-dev` or venv)
- Baseline benchmarking (`vllm bench serve` or `torch.profiler`)
- Kernel trace collection (`--enforce-eager` + `torch_profiler_record_shapes`)
- TraceLens roofline analysis (TFLOP/s, TB/s, arithmetic intensity, bound type)
- Bottleneck ranking with optimization strategy tags
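The roofline step above can be illustrated with a small helper that classifies a kernel as compute- or memory-bound from its arithmetic intensity. This is a sketch of the idea, not `trace_analyzer.py`'s actual code; the default peaks are the MI300X BF16 numbers from the spec table at the end of this README:

```python
def classify_kernel(flops, bytes_moved, peak_tflops=708.0, peak_tbps=5.3):
    """Classify a kernel against a simple roofline model.

    flops: floating-point ops performed; bytes_moved: DRAM traffic in bytes.
    Peaks default to MI300X BF16 (708 TFLOP/s, 5.3 TB/s).
    """
    ai = flops / bytes_moved                            # arithmetic intensity, FLOP/byte
    ridge = peak_tflops / peak_tbps                     # ridge point, FLOP/byte
    bound = "compute" if ai >= ridge else "memory"
    attainable = min(peak_tflops, ai * peak_tbps)       # roofline ceiling, TFLOP/s
    return bound, ai, attainable

# An RMSNorm-style kernel does few FLOPs per byte moved, so it lands
# far left of the ridge point (~134 FLOP/byte on MI300X BF16):
bound, ai, attainable = classify_kernel(flops=2e9, bytes_moved=1e9)
print(bound, ai, attainable)  # memory 2.0 10.6
```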
- Goal 1 — Kernel optimization: Target high time-proportion and low roofline-efficiency kernels
- Goal 2 — Kernel fusion: Detect and apply operator fusion patterns
- AITER-first: Check ROCm/aiter for pre-optimized AMD kernels before writing custom code
- Triton fallback: `@triton.jit` with `@triton.autotune` for remaining targets
- Integration: vLLM `CustomOp.register_oot()` or `VLLM_ROCM_USE_AITER` env vars
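The AITER-first decision can be sketched as a lookup against an operator map before falling back to Triton. The mapping entries below are hypothetical examples for illustration only; the real mappings live in `aiter_catalog/operator_map.json`:

```python
# Hypothetical bottleneck-name → AITER-operator map (illustrative entries;
# the actual knowledge base is aiter_catalog/operator_map.json).
OPERATOR_MAP = {
    "rms_norm": "aiter.rms_norm",
    "paged_attention": "aiter.pa_fwd",
}

def pick_strategy(kernel_name):
    """Return (strategy, target) for a profiled kernel name."""
    for pattern, aiter_op in OPERATOR_MAP.items():
        if pattern in kernel_name:
            return "aiter", aiter_op     # a pre-optimized AMD kernel exists
    return "triton", None                # fall back to a custom Triton kernel

print(pick_strategy("vllm::rms_norm_kernel"))   # → ('aiter', 'aiter.rms_norm')
print(pick_strategy("custom_gather_scatter"))   # → ('triton', None)
```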
```bash
# 1. Analyze a vLLM model
bash scripts/analyze.sh --model Qwen/Qwen3-8B --output /tmp/rocm_agent_qwen

# 2. Optimize based on analysis results
bash scripts/optimize.sh --project /tmp/rocm_agent_qwen

# 3. Or run the full pipeline
bash scripts/run_pipeline.sh --model Qwen/Qwen3-8B --output /tmp/rocm_agent_qwen
```

ROCm-Agent/
├── analyzer/ # Tool 1: profiling + bottleneck analysis
│ ├── SKILL.md # Agent instructions
│ ├── env_setup.py # Docker/venv environment setup
│ ├── baseline_runner.py # vLLM or PyTorch baseline benchmarks
│ ├── profiler.py # Trace collection
│ ├── trace_analyzer.py # TraceLens integration + roofline
│ ├── bottleneck_ranker.py # Bottleneck ranking + strategy tags
│ └── platform_specs.py # AMD GPU peak specs
│
├── optimizer/ # Tool 2: AITER + Triton optimization
│ ├── SKILL.md # Agent instructions
│ ├── aiter_matcher.py # Map bottlenecks to AITER operators
│ ├── fusion_analyzer.py # Detect fusion opportunities
│ ├── triton_optimizer.py # Custom Triton kernel generation
│ ├── kernel_tester.py # Correctness + benchmark
│ └── integrator.py # vLLM CustomOp / AITER env vars
│
├── aiter_catalog/ # AITER operator knowledge base
│ ├── operator_map.json # Bottleneck → AITER mapping
│ └── templates/ # Ready-to-use integration code
│
├── agent_workdir/ # CUDA-Agent style workspace template
│ ├── SKILL.md # ROCm-adapted agent instructions
│ ├── model.py # Original model template
│ ├── model_new.py # Optimized model template
│ └── kernels/ # Custom Triton kernels
│
├── utils/ # Shared utilities
│ ├── compile.py # Triton compilation helpers
│ ├── verification.py # Output correctness checks
│ ├── profiling.py # Performance comparison
│ └── gpu_info.py # AMD GPU detection
│
├── scripts/ # CLI entry points
│ ├── analyze.sh
│ ├── optimize.sh
│ └── run_pipeline.sh
│
└── examples/ # Working examples
├── rmsnorm/
└── vllm_qwen/
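`utils/verification.py` checks that an optimized kernel's output matches the baseline. A minimal sketch of that kind of check, using NumPy as a stand-in (the real checks would compare torch tensors, and `outputs_match` is an illustrative name, not the module's API):

```python
import numpy as np

def outputs_match(baseline, optimized, rtol=1e-3, atol=1e-3):
    """Elementwise closeness check in the spirit of utils/verification.py.

    Optimized kernels are usually verified with loose tolerances rather
    than bit-exactness, since reordered reductions change the rounding.
    """
    baseline = np.asarray(baseline, dtype=np.float32)
    optimized = np.asarray(optimized, dtype=np.float32)
    return bool(np.allclose(baseline, optimized, rtol=rtol, atol=atol))

ref = np.random.default_rng(0).standard_normal(1024)
print(outputs_match(ref, ref + 1e-5))   # tiny numeric drift passes
print(outputs_match(ref, ref + 1.0))    # real divergence fails
```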
| GPU | Memory BW | BF16 TFLOPS | FP8 TFLOPS |
|---|---|---|---|
| MI300X | 5.3 TB/s | 708 | 1273 |
| MI325X | 6.0 TB/s | 843 | 1519 |
| MI355X | 8.0 TB/s | 1686 | 3567 |
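These peaks determine the roofline ridge points the Analyzer classifies against: peak compute divided by peak bandwidth gives the arithmetic intensity above which a kernel can be compute-bound. Computed directly from the table (BF16):

```python
# Ridge point = peak compute / peak bandwidth (TFLOP/s ÷ TB/s = FLOP/byte).
# Numbers taken from the spec table above.
specs = {
    "MI300X": (5.3, 708),
    "MI325X": (6.0, 843),
    "MI355X": (8.0, 1686),
}
for gpu, (tbps, bf16_tflops) in specs.items():
    ridge = bf16_tflops / tbps
    print(f"{gpu}: BF16 ridge = {ridge:.2f} FLOP/byte")
```

Kernels below their GPU's ridge point (most normalization and attention kernels) are memory-bound and benefit from fusion and traffic reduction; kernels above it benefit from compute-side optimization.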