Merged
@@ -18,15 +18,15 @@ Keywords that suggest modify mode: update, fix, change, add [kernel] to [existin

### Stage 1: Analyze

-Spawn a **Model Analyzer** agent (read [model-analyzer.md](model-analyzer.md)).
+Follow the **Model Analyzer** workflow in [model-analyzer.md](model-analyzer.md). If the host runtime supports parallel subagents, this stage may be delegated to one; otherwise execute the workflow directly.

-The agent reads the HF `modeling_*.py` source and produces a **model profile** answering 12 architectural questions from [decision-matrix.md](decision-matrix.md).
+This stage reads the HF `modeling_*.py` source and produces a **model profile** answering 12 architectural questions from [decision-matrix.md](decision-matrix.md).

**Human checkpoint:** Present the profile. Confirm before proceeding.

### Stage 2: Generate

-Spawn a **Code Generator** agent (read [code-generator.md](code-generator.md)).
+Follow the **Code Generator** workflow in [code-generator.md](code-generator.md).

Generates/modifies up to 13 files:

@@ -48,7 +48,7 @@ Generates/modifies up to 13 files:

### Stage 3: Validate

-Spawn a **Validator** agent (read [validator.md](validator.md)).
+Follow the **Validator** workflow in [validator.md](validator.md).

Runs instance patching test, convergence test, and lint check. Retries up to 3 times on failure.
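
The retry behaviour described above could be sketched roughly as follows. This is only an illustration, not the validator's actual implementation: `validate_with_retries`, `checks`, and `attempt_fix` are hypothetical names, and "retries up to 3 times" is read here as one initial run plus at most three reruns.

```python
from typing import Callable, Dict, List

# Taken from the skill text; interpreted as 3 reruns after the first run (assumption).
MAX_RETRIES = 3


def validate_with_retries(
    checks: Dict[str, Callable[[], bool]],
    attempt_fix: Callable[[List[str]], None],
    max_retries: int = MAX_RETRIES,
) -> List[str]:
    """Run every check; on failure call attempt_fix and rerun.

    Returns the names of the checks that still fail after the last run
    (an empty list means everything passed).
    """
    failed: List[str] = []
    for attempt in range(max_retries + 1):
        # Each check is a zero-argument callable returning True on success,
        # e.g. a wrapper around the patching test, convergence test, or lint.
        failed = [name for name, check in checks.items() if not check()]
        if not failed:
            return []
        if attempt < max_retries:
            # Hypothetical hook: give the caller a chance to repair the
            # generated code before the next run.
            attempt_fix(failed)
    return failed
```

In practice each entry in `checks` would wrap one of the three validation commands; the sketch keeps them as plain callables so the control flow stays visible.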

@@ -69,13 +69,13 @@ Read the existing `apply_liger_kernel_to_{model_type}` function in `monkey_patch

### Stage 2: Apply Changes

-Spawn the **Code Generator** agent (read [code-generator.md](code-generator.md)) in **modify mode**.
+Follow the **Code Generator** workflow in [code-generator.md](code-generator.md) in **modify mode**.

**Human checkpoint:** Present changes for review.

### Stage 3: Validate

-Spawn the **Validator** agent (read [validator.md](validator.md)). This stage is **mandatory** — do not skip it. At minimum, run:
+Follow the **Validator** workflow in [validator.md](validator.md). This stage is **mandatory** — do not skip it. At minimum, run:

1. Instance patching test: `pytest test/transformers/test_monkey_patch.py -k "{model_type}" -xvs`
2. All convergence tests for the model:
@@ -16,15 +16,15 @@ Develops Triton kernels for Liger Kernel through a 3-stage pipeline with human r

### Stage 1: Analyze

-Spawn an **Analyzer** agent (read [analyzer.md](analyzer.md)).
+Follow the **Analyzer** workflow in [analyzer.md](analyzer.md). If the host runtime supports parallel subagents, this stage may be delegated to one; otherwise execute the workflow directly.

Accepts any input: local file, URL, code snippet, natural language description, or model component reference. Produces a standalone PyTorch reference implementation and a kernel profile.

**Human checkpoint:** Present PyTorch reference + kernel profile. Confirm before proceeding.

### Stage 2: Generate

-Spawn a **Generator** agent (read [generator.md](generator.md)).
+Follow the **Generator** workflow in [generator.md](generator.md).

Generates/modifies up to 8 files:

@@ -41,7 +41,7 @@ Generates/modifies up to 8 files:

### Stage 3: Validate

-Spawn a **Validator** agent (read [validator.md](validator.md)).
+Follow the **Validator** workflow in [validator.md](validator.md).

Runs checkstyle, unit tests (hard gate — stops on persistent failure), benchmarks, and generates plots. Optionally runs ncu profiling.

@@ -42,9 +42,9 @@ If any validation fails, report clearly and stop.

### Stage 1: Profile

-Spawn a **Profiler** agent (read [profiler.md](profiler.md)).
+Follow the **Profiler** workflow in [profiler.md](profiler.md). If the host runtime supports parallel subagents, this stage may be delegated to one; otherwise execute the workflow directly.

-The agent:
+This stage:
1. Creates the workspace directory `optimization/{kernel}/`
2. Copies the original kernel as a snapshot
3. Runs baseline benchmarks using the existing benchmark script
@@ -59,9 +59,9 @@ The agent:

### Stage 2: Optimize

-Spawn an **Optimizer** agent (read [optimizer.md](optimizer.md)).
+Follow the **Optimizer** workflow in [optimizer.md](optimizer.md).

-The agent runs an autonomous optimization loop:
+This stage runs an autonomous optimization loop:

1. Read the optimization profile and original kernel
2. **Always try parameter tuning first** (BLOCK_SIZE, num_warps, num_stages manual sweep -- NOT @triton.autotune)
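
A manual sweep of the kind step 2 asks for might look like the sketch below. It is an illustration under stated assumptions, not the optimizer's actual loop: `manual_sweep` and `benchmark_ms` are hypothetical names, and the caller is assumed to supply a function that launches the kernel with the given configuration and returns a timing in milliseconds (for instance by wrapping `triton.testing.do_bench`).

```python
import itertools
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, int]


def manual_sweep(
    benchmark_ms: Callable[[Config], float],
    block_sizes: Iterable[int],
    num_warps_options: Iterable[int],
    num_stages_options: Iterable[int],
) -> Tuple[Optional[Config], float]:
    """Try every (BLOCK_SIZE, num_warps, num_stages) combination explicitly.

    Returns the fastest configuration and its time. Unlike
    @triton.autotune, the chosen values can then be hard-coded
    into the kernel launch.
    """
    best_cfg: Optional[Config] = None
    best_ms = float("inf")
    for bs, nw, ns in itertools.product(
        block_sizes, num_warps_options, num_stages_options
    ):
        cfg = {"BLOCK_SIZE": bs, "num_warps": nw, "num_stages": ns}
        ms = benchmark_ms(cfg)  # hypothetical: caller-provided timing function
        if ms < best_ms:
            best_cfg, best_ms = cfg, ms
    return best_cfg, best_ms
```

Keeping the benchmark as a plain callable means the same loop can be exercised with a stand-in cost function before running on a GPU.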
@@ -81,9 +81,9 @@ The agent runs an autonomous optimization loop:

### Stage 3: Finalize

-Spawn a **Finalizer** agent (read [finalizer.md](finalizer.md)).
+Follow the **Finalizer** workflow in [finalizer.md](finalizer.md).

-The agent:
+This stage:
1. Applies the winning variant in-place to `src/liger_kernel/ops/{kernel}.py`
2. Runs the full test suite: `python -m pytest test/transformers/test_{kernel}.py -xvs` (hard gate)
3. Runs checkstyle: `make checkstyle` (auto-fix with `ruff check . --fix && ruff format .`)
@@ -276,12 +276,12 @@ Benchmark results ({gpu_name}):
- Speed (backward): {delta}% {faster/slower}
- Speed (full): {delta}% {faster/slower}
- Memory: {delta}% {reduction/increase}

-Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

+If the host AI runtime has its own commit-attribution convention (e.g., a `Co-Authored-By` trailer), append it per that runtime's guidelines.

#### Step 7c: Push and Create PR

```bash
@@ -351,12 +351,12 @@ Tested on {gpu_name}. Values are median ms (speed) or MB (memory).
- [x] All existing unit tests pass
- [x] Benchmarks show improvement across all input sizes
- [x] No regression on non-target metrics (speed/memory balance maintained)

-🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```

+If the host AI runtime appends a "Generated with" footer (e.g., Claude Code, Cursor, Copilot), include it per that runtime's guidelines.

**Important**: Do NOT include plots as image attachments in the PR. Plots are for local review only and live in the optimization workspace.

### Step 8: Present the Before/After Summary
1 change: 1 addition & 0 deletions .claude/skills
33 changes: 33 additions & 0 deletions AGENTS.md
@@ -0,0 +1,33 @@
# AGENTS.md

Guidance for AI coding assistants working in this repository.

## Skills

Repository-specific workflow guides ("skills") live in [`.agents/skills/`](.agents/skills/). Each subdirectory is a self-contained guide for a multi-stage workflow. Read the `SKILL.md` at the top of each subdirectory for an overview; the other files in that directory are referenced from `SKILL.md` and should be read on demand.

| Skill | What it does |
|-------|--------------|
| [`liger-kernel-dev`](.agents/skills/liger-kernel-dev/SKILL.md) | Develops new Triton kernels from a PyTorch reference (or modifies existing kernels). 3-stage pipeline: Analyze → Generate → Validate. NVIDIA GPUs only. |
| [`liger-autopatch`](.agents/skills/liger-autopatch/SKILL.md) | Adds Liger Kernel support for a new HuggingFace Transformers model, or modifies an existing monkey-patch. 3-stage pipeline: Analyze → Generate → Validate. |
| [`liger-kernel-perf`](.agents/skills/liger-kernel-perf/SKILL.md) | Optimizes the performance of an existing Liger Triton kernel. 3-stage pipeline: Profile → Optimize → Finalize. NVIDIA GPUs only. |

The skills are written to be runtime-agnostic — they describe the workflow as a sequence of stages a competent agent (or human) can follow. Where a stage says "Follow the X workflow in `x.md`", that's a directive to read and execute that file's instructions; runtimes that support parallel subagents may delegate the stage, but it is not required.

## Vendor-specific shortcuts

For convenience, some assistants auto-discover skills from vendor-specific paths. These point at the canonical `.agents/skills/` directory:

- `.claude/skills` → symlink → `.agents/skills` (for Claude Code)

If you're adding support for another assistant, add a symlink (or your tool's preferred adapter) pointing to `.agents/skills/`. Do not duplicate the content.
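
As a rough sketch of the symlink approach described above, the helper below creates a vendor-specific `skills` link that points back at `.agents/skills`. The function name `link_skills` and the `vendor_dir` parameter are hypothetical; only the `.claude/skills` → `.agents/skills` mapping comes from this file. It assumes a filesystem that permits symlinks.

```python
import os
from pathlib import Path


def link_skills(repo_root: Path, vendor_dir: str) -> Path:
    """Create <repo_root>/<vendor_dir>/skills as a symlink to .agents/skills.

    `vendor_dir` is whatever directory the assistant auto-discovers,
    e.g. ".claude". The content itself is never copied.
    """
    target = repo_root / ".agents" / "skills"
    link = repo_root / vendor_dir / "skills"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        return link  # already linked; leave it alone
    # A relative target keeps the link valid when the repository is
    # cloned to a different absolute path.
    relative_target = os.path.relpath(target, link.parent)
    link.symlink_to(relative_target, target_is_directory=True)
    return link
```

Calling `link_skills(Path("."), ".claude")` from the repository root would reproduce the existing Claude Code shortcut; another assistant's directory name can be passed the same way.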

## Repo conventions

- Source layout: `src/liger_kernel/{ops,transformers}/` for Triton ops and `nn.Module` / HF wrappers respectively
- Tests: `test/transformers/` (unit) and `test/convergence/{bf16,fp32}/` (model convergence)
- Benchmarks: `benchmark/scripts/` (scripts) and `benchmark/data/all_benchmark_data.csv` (results)
- Lint/format: `make checkstyle` (uses `ruff`)
- Install dev mode: `pip install -e ".[dev]"`

See `README.md` for the project overview and contribution guide.