linkedin · vaibhavjindal · May 15, 2026 · May 14, 2026
diff --git a/.claude/skills/liger-autopatch/SKILL.md → .agents/skills/liger-autopatch/SKILL.md b/.claude/skills/liger-autopatch/SKILL.md → .agents/skills/liger-autopatch/SKILL.md
@@ -18,15 +18,15 @@ Keywords that suggest modify mode: update, fix, change, add [kernel] to [existin
 
 ### Stage 1: Analyze
 
-Spawn a **Model Analyzer** agent (read [model-analyzer.md](model-analyzer.md)).
+Follow the **Model Analyzer** workflow in [model-analyzer.md](model-analyzer.md). If the host runtime supports parallel subagents, this stage may be delegated to one; otherwise execute the workflow directly.
 
-The agent reads the HF `modeling_*.py` source and produces a **model profile** answering 12 architectural questions from [decision-matrix.md](decision-matrix.md).
+This stage reads the HF `modeling_*.py` source and produces a **model profile** answering 12 architectural questions from [decision-matrix.md](decision-matrix.md).
 
 **Human checkpoint:** Present the profile. Confirm before proceeding.
 
 ### Stage 2: Generate
 
-Spawn a **Code Generator** agent (read [code-generator.md](code-generator.md)).
+Follow the **Code Generator** workflow in [code-generator.md](code-generator.md).
 
 Generates/modifies up to 13 files:
 
@@ -48,7 +48,7 @@ Generates/modifies up to 13 files:
 
 ### Stage 3: Validate
 
-Spawn a **Validator** agent (read [validator.md](validator.md)).
+Follow the **Validator** workflow in [validator.md](validator.md).
 
 Runs instance patching test, convergence test, and lint check. Retries up to 3 times on failure.
 
@@ -69,13 +69,13 @@ Read the existing `apply_liger_kernel_to_{model_type}` function in `monkey_patch
 
 ### Stage 2: Apply Changes
 
-Spawn the **Code Generator** agent (read [code-generator.md](code-generator.md)) in **modify mode**.
+Follow the **Code Generator** workflow in [code-generator.md](code-generator.md) in **modify mode**.
 
 **Human checkpoint:** Present changes for review.
 
 ### Stage 3: Validate
 
-Spawn the **Validator** agent (read [validator.md](validator.md)). This stage is **mandatory** — do not skip it. At minimum, run:
+Follow the **Validator** workflow in [validator.md](validator.md). This stage is **mandatory** — do not skip it. At minimum, run:
 
 1. Instance patching test: `pytest test/transformers/test_monkey_patch.py -k "{model_type}" -xvs`
 2. All convergence tests for the model:

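Reviewer note: the delegation rule this diff adds to Stage 1 ("delegate if the runtime supports subagents, otherwise execute directly") and the Validator's retry limit can be sketched as follows. This is a minimal illustration, not code from the repository. `run_stage`, `run_with_retries`, and the `Runner` type are hypothetical names, and reading "retries up to 3 times" as one initial attempt plus three retries is an assumption.

```python
from typing import Callable, Optional

# Hypothetical: a runner takes a workflow file name and returns True on success.
Runner = Callable[[str], bool]


def run_stage(workflow: str, direct: Runner, subagent: Optional[Runner] = None) -> bool:
    """Delegate to a subagent if the runtime provides one; otherwise run directly."""
    runner = subagent if subagent is not None else direct
    return runner(workflow)


def run_with_retries(workflow: str, direct: Runner, max_retries: int = 3) -> bool:
    """Run once, then retry up to max_retries more times while the stage fails."""
    for _ in range(1 + max_retries):
        if run_stage(workflow, direct):
            return True
    return False
```

With no `subagent` argument the stage always runs inline, which is the fallback path the new wording makes explicit.
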
diff --git a/.../skills/liger-autopatch/code-generator.md → .../skills/liger-autopatch/code-generator.md b/.../skills/liger-autopatch/code-generator.md → .../skills/liger-autopatch/code-generator.md
diff --git a/...skills/liger-autopatch/decision-matrix.md → ...skills/liger-autopatch/decision-matrix.md b/...skills/liger-autopatch/decision-matrix.md → ...skills/liger-autopatch/decision-matrix.md
diff --git a/...liger-autopatch/examples/gemma-profile.md → ...liger-autopatch/examples/gemma-profile.md b/...liger-autopatch/examples/gemma-profile.md → ...liger-autopatch/examples/gemma-profile.md
diff --git a/...liger-autopatch/examples/llama-profile.md → ...liger-autopatch/examples/llama-profile.md b/...liger-autopatch/examples/llama-profile.md → ...liger-autopatch/examples/llama-profile.md
diff --git a/.../skills/liger-autopatch/model-analyzer.md → .../skills/liger-autopatch/model-analyzer.md b/.../skills/liger-autopatch/model-analyzer.md → .../skills/liger-autopatch/model-analyzer.md
diff --git a/...-autopatch/templates/lce-forward-dense.md → ...-autopatch/templates/lce-forward-dense.md b/...-autopatch/templates/lce-forward-dense.md → ...-autopatch/templates/lce-forward-dense.md
diff --git a/...er-autopatch/templates/lce-forward-moe.md → ...er-autopatch/templates/lce-forward-moe.md b/...er-autopatch/templates/lce-forward-moe.md → ...er-autopatch/templates/lce-forward-moe.md
diff --git a/...er-autopatch/templates/monkey-patch-fn.md → ...er-autopatch/templates/monkey-patch-fn.md b/...er-autopatch/templates/monkey-patch-fn.md → ...er-autopatch/templates/monkey-patch-fn.md
diff --git a/...r-autopatch/templates/test-convergence.md → ...r-autopatch/templates/test-convergence.md b/...r-autopatch/templates/test-convergence.md → ...r-autopatch/templates/test-convergence.md
diff --git a/...utopatch/templates/test-instance-patch.md → ...utopatch/templates/test-instance-patch.md b/...utopatch/templates/test-instance-patch.md → ...utopatch/templates/test-instance-patch.md
diff --git a/.claude/skills/liger-autopatch/validator.md → .agents/skills/liger-autopatch/validator.md b/.claude/skills/liger-autopatch/validator.md → .agents/skills/liger-autopatch/validator.md
diff --git a/.claude/skills/liger-kernel-dev/SKILL.md → .agents/skills/liger-kernel-dev/SKILL.md b/.claude/skills/liger-kernel-dev/SKILL.md → .agents/skills/liger-kernel-dev/SKILL.md
@@ -16,15 +16,15 @@ Develops Triton kernels for Liger Kernel through a 3-stage pipeline with human r
 
 ### Stage 1: Analyze
 
-Spawn an **Analyzer** agent (read [analyzer.md](analyzer.md)).
+Follow the **Analyzer** workflow in [analyzer.md](analyzer.md). If the host runtime supports parallel subagents, this stage may be delegated to one; otherwise execute the workflow directly.
 
 Accepts any input: local file, URL, code snippet, natural language description, or model component reference. Produces a standalone PyTorch reference implementation and a kernel profile.
 
 **Human checkpoint:** Present PyTorch reference + kernel profile. Confirm before proceeding.
 
 ### Stage 2: Generate
 
-Spawn a **Generator** agent (read [generator.md](generator.md)).
+Follow the **Generator** workflow in [generator.md](generator.md).
 
 Generates/modifies up to 8 files:
 
@@ -41,7 +41,7 @@ Generates/modifies up to 8 files:
 
 ### Stage 3: Validate
 
-Spawn a **Validator** agent (read [validator.md](validator.md)).
+Follow the **Validator** workflow in [validator.md](validator.md).
 
 Runs checkstyle, unit tests (hard gate — stops on persistent failure), benchmarks, and generates plots. Optionally runs ncu profiling.
 

diff --git a/.claude/skills/liger-kernel-dev/analyzer.md → .agents/skills/liger-kernel-dev/analyzer.md b/.claude/skills/liger-kernel-dev/analyzer.md → .agents/skills/liger-kernel-dev/analyzer.md
diff --git a/...nel-dev/examples/cross-entropy-profile.md → ...nel-dev/examples/cross-entropy-profile.md b/...nel-dev/examples/cross-entropy-profile.md → ...nel-dev/examples/cross-entropy-profile.md
diff --git a/...r-kernel-dev/examples/rms-norm-profile.md → ...r-kernel-dev/examples/rms-norm-profile.md b/...r-kernel-dev/examples/rms-norm-profile.md → ...r-kernel-dev/examples/rms-norm-profile.md
diff --git a/...ger-kernel-dev/examples/swiglu-profile.md → ...ger-kernel-dev/examples/swiglu-profile.md b/...ger-kernel-dev/examples/swiglu-profile.md → ...ger-kernel-dev/examples/swiglu-profile.md
diff --git a/.claude/skills/liger-kernel-dev/generator.md → .agents/skills/liger-kernel-dev/generator.md b/.claude/skills/liger-kernel-dev/generator.md → .agents/skills/liger-kernel-dev/generator.md
diff --git a/...liger-kernel-dev/kernel-profile-format.md → ...liger-kernel-dev/kernel-profile-format.md b/...liger-kernel-dev/kernel-profile-format.md → ...liger-kernel-dev/kernel-profile-format.md
diff --git a/...s/liger-kernel-dev/templates/benchmark.md → ...s/liger-kernel-dev/templates/benchmark.md b/...s/liger-kernel-dev/templates/benchmark.md → ...s/liger-kernel-dev/templates/benchmark.md
diff --git a/...er-kernel-dev/templates/functional-api.md → ...er-kernel-dev/templates/functional-api.md b/...er-kernel-dev/templates/functional-api.md → ...er-kernel-dev/templates/functional-api.md
diff --git a/...er-kernel-dev/templates/module-wrapper.md → ...er-kernel-dev/templates/module-wrapper.md b/...er-kernel-dev/templates/module-wrapper.md → ...er-kernel-dev/templates/module-wrapper.md
diff --git a/.../liger-kernel-dev/templates/ops-kernel.md → .../liger-kernel-dev/templates/ops-kernel.md b/.../liger-kernel-dev/templates/ops-kernel.md → .../liger-kernel-dev/templates/ops-kernel.md
diff --git a/...s/liger-kernel-dev/templates/unit-test.md → ...s/liger-kernel-dev/templates/unit-test.md b/...s/liger-kernel-dev/templates/unit-test.md → ...s/liger-kernel-dev/templates/unit-test.md
diff --git a/.claude/skills/liger-kernel-dev/validator.md → .agents/skills/liger-kernel-dev/validator.md b/.claude/skills/liger-kernel-dev/validator.md → .agents/skills/liger-kernel-dev/validator.md
diff --git a/.claude/skills/liger-kernel-perf/SKILL.md → .agents/skills/liger-kernel-perf/SKILL.md b/.claude/skills/liger-kernel-perf/SKILL.md → .agents/skills/liger-kernel-perf/SKILL.md
@@ -42,9 +42,9 @@ If any validation fails, report clearly and stop.
 
 ### Stage 1: Profile
 
-Spawn a **Profiler** agent (read [profiler.md](profiler.md)).
+Follow the **Profiler** workflow in [profiler.md](profiler.md). If the host runtime supports parallel subagents, this stage may be delegated to one; otherwise execute the workflow directly.
 
-The agent:
+This stage:
 1. Creates the workspace directory `optimization/{kernel}/`
 2. Copies the original kernel as a snapshot
 3. Runs baseline benchmarks using the existing benchmark script
@@ -59,9 +59,9 @@ The agent:
 
 ### Stage 2: Optimize
 
-Spawn an **Optimizer** agent (read [optimizer.md](optimizer.md)).
+Follow the **Optimizer** workflow in [optimizer.md](optimizer.md).
 
-The agent runs an autonomous optimization loop:
+This stage runs an autonomous optimization loop:
 
 1. Read the optimization profile and original kernel
 2. **Always try parameter tuning first** (BLOCK_SIZE, num_warps, num_stages manual sweep -- NOT @triton.autotune)
@@ -81,9 +81,9 @@ The agent runs an autonomous optimization loop:
 
 ### Stage 3: Finalize
 
-Spawn a **Finalizer** agent (read [finalizer.md](finalizer.md)).
+Follow the **Finalizer** workflow in [finalizer.md](finalizer.md).
 
-The agent:
+This stage:
 1. Applies the winning variant in-place to `src/liger_kernel/ops/{kernel}.py`
 2. Runs the full test suite: `python -m pytest test/transformers/test_{kernel}.py -xvs` (hard gate)
 3. Runs checkstyle: `make checkstyle` (auto-fix with `ruff check . --fix && ruff format .`)

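Reviewer note: the Stage 2 instruction to try a manual sweep over BLOCK_SIZE, num_warps, and num_stages before anything else could look roughly like the sketch below. It is an assumption-laden illustration, not the skill's actual procedure: `sweep` and `measure_ms` are hypothetical names, and `measure_ms` stands in for whatever returns one timing sample in milliseconds for a given configuration.

```python
import itertools
import statistics
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[int, int, int]  # (BLOCK_SIZE, num_warps, num_stages)


def sweep(
    measure_ms: Callable[[int, int, int], float],
    block_sizes: Iterable[int],
    warps: Iterable[int],
    stages: Iterable[int],
    repeats: int = 5,
) -> Tuple[Config, Dict[Config, float]]:
    """Measure every configuration and return the one with the lowest median time."""
    results: Dict[Config, float] = {}
    for cfg in itertools.product(block_sizes, warps, stages):
        # Take several samples per configuration and keep the median.
        samples = [measure_ms(*cfg) for _ in range(repeats)]
        results[cfg] = statistics.median(samples)
    best = min(results, key=results.get)
    return best, results
```

On a real GPU, `measure_ms` would need to account for asynchronous kernel launches; a benchmarking helper such as `triton.testing.do_bench` is one common way to get a per-configuration timing.
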
diff --git a/...ude/skills/liger-kernel-perf/finalizer.md → ...nts/skills/liger-kernel-perf/finalizer.md b/...ude/skills/liger-kernel-perf/finalizer.md → ...nts/skills/liger-kernel-perf/finalizer.md
@@ -276,12 +276,12 @@ Benchmark results ({gpu_name}):
 - Speed (backward): {delta}% {faster/slower}
 - Speed (full): {delta}% {faster/slower}
 - Memory: {delta}% {reduction/increase}
-
-Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
 EOF
 )"
 ```
 
+If the host AI runtime has its own commit-attribution convention (e.g., a `Co-Authored-By` trailer), append it per that runtime's guidelines.
+
 #### Step 7c: Push and Create PR
 
 ```bash
@@ -351,12 +351,12 @@ Tested on {gpu_name}. Values are median ms (speed) or MB (memory).
 - [x] All existing unit tests pass
 - [x] Benchmarks show improvement across all input sizes
 - [x] No regression on non-target metrics (speed/memory balance maintained)
-
-🤖 Generated with [Claude Code](https://claude.com/claude-code)
 EOF
 )"
 ```
 
+If the host AI runtime appends a "Generated with" footer (e.g., Claude Code, Cursor, Copilot), include it per that runtime's guidelines.
+
 **Important**: Do NOT include plots as image attachments in the PR. Plots are for local review only and live in the optimization workspace.
 
 ### Step 8: Present the Before/After Summary

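Reviewer note: the two finalizer.md hunks replace a hard-coded trailer and footer with "append it if the host runtime has one". A minimal sketch of that optional-append behavior, assuming the trailer is supplied by the runtime as a plain string (`with_trailer` is a hypothetical helper, and the address in the usage line is a placeholder):

```python
from typing import Optional


def with_trailer(message: str, trailer: Optional[str] = None) -> str:
    """Return the commit message, appending the trailer as a final paragraph if given."""
    body = message.rstrip("\n")
    if not trailer:
        # No runtime-specific attribution: leave the message untouched.
        return body + "\n"
    # Git expects trailers in the last paragraph, separated by a blank line.
    return f"{body}\n\n{trailer}\n"
```

For example, `with_trailer(msg, "Co-Authored-By: Example Assistant <assistant@example.com>")` adds the trailer, while `with_trailer(msg)` emits the vendor-neutral message.
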
diff --git a/...er-kernel-perf/optimization-strategies.md → ...er-kernel-perf/optimization-strategies.md b/...er-kernel-perf/optimization-strategies.md → ...er-kernel-perf/optimization-strategies.md
diff --git a/...ude/skills/liger-kernel-perf/optimizer.md → ...nts/skills/liger-kernel-perf/optimizer.md b/...ude/skills/liger-kernel-perf/optimizer.md → ...nts/skills/liger-kernel-perf/optimizer.md
diff --git a/.claude/skills/liger-kernel-perf/profiler.md → .agents/skills/liger-kernel-perf/profiler.md b/.claude/skills/liger-kernel-perf/profiler.md → .agents/skills/liger-kernel-perf/profiler.md
diff --git a/...el-perf/templates/optimization-profile.md → ...el-perf/templates/optimization-profile.md b/...el-perf/templates/optimization-profile.md → ...el-perf/templates/optimization-profile.md
diff --git a/...er-kernel-perf/templates/variant-notes.md → ...er-kernel-perf/templates/variant-notes.md b/...er-kernel-perf/templates/variant-notes.md → ...er-kernel-perf/templates/variant-notes.md
diff --git a/.claude/skills b/.claude/skills
@@ -0,0 +1 @@
+../.agents/skills
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,33 @@
+# AGENTS.md
+
+Guidance for AI coding assistants working in this repository.
+
+## Skills
+
+Repository-specific workflow guides ("skills") live in [`.agents/skills/`](.agents/skills/). Each subdirectory is a self-contained guide for a multi-stage workflow. Read the `SKILL.md` at the top of each subdirectory for an overview; the other files in that directory are referenced from `SKILL.md` and should be read on demand.
+
+| Skill | What it does |
+|-------|--------------|
+| [`liger-kernel-dev`](.agents/skills/liger-kernel-dev/SKILL.md) | Develops new Triton kernels from a PyTorch reference (or modifies existing kernels). 3-stage pipeline: Analyze → Generate → Validate. NVIDIA GPUs only. |
+| [`liger-autopatch`](.agents/skills/liger-autopatch/SKILL.md) | Adds Liger Kernel support for a new HuggingFace Transformers model, or modifies an existing monkey-patch. 3-stage pipeline: Analyze → Generate → Validate. |
+| [`liger-kernel-perf`](.agents/skills/liger-kernel-perf/SKILL.md) | Optimizes the performance of an existing Liger Triton kernel. 3-stage pipeline: Profile → Optimize → Finalize. NVIDIA GPUs only. |
+
+The skills are written to be runtime-agnostic — they describe the workflow as a sequence of stages a competent agent (or human) can follow. Where a stage says "Follow the X workflow in `x.md`", that's a directive to read and execute that file's instructions; runtimes that support parallel subagents may delegate the stage, but it is not required.
+
+## Vendor-specific shortcuts
+
+For convenience, some assistants auto-discover skills from vendor-specific paths. These point at the canonical `.agents/skills/` directory:
+
+- `.claude/skills` → symlink → `.agents/skills` (for Claude Code)
+
+If you're adding support for another assistant, add a symlink (or your tool's preferred adapter) pointing to `.agents/skills/`. Do not duplicate the content.
+
+## Repo conventions
+
+- Source layout: `src/liger_kernel/{ops,transformers}/` for Triton ops and `nn.Module` / HF wrappers, respectively
+- Tests: `test/transformers/` (unit) and `test/convergence/{bf16,fp32}/` (model convergence)
+- Benchmarks: `benchmark/scripts/` (scripts) and `benchmark/data/all_benchmark_data.csv` (results)
+- Lint/format: `make checkstyle` (uses `ruff`)
+- Install dev mode: `pip install -e ".[dev]"`
+
+See `README.md` for the project overview and contribution guide.
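
Reviewer note: the "Vendor-specific shortcuts" section asks contributors to add a symlink instead of duplicating content. A minimal sketch of doing that in a portable way follows, mirroring the relative target `../.agents/skills` used by the `.claude/skills` link in this diff. `link_skills` is a hypothetical helper, and `.example-assistant` in the usage note is a placeholder directory name, not a path any particular tool is known to read.

```python
import os
from pathlib import Path


def link_skills(repo_root: Path, vendor_dir: str) -> Path:
    """Create <repo_root>/<vendor_dir>/skills as a relative symlink to .agents/skills."""
    link = repo_root / vendor_dir / "skills"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        return link  # already linked; leave it as-is
    # Relative target, resolved from the vendor directory, so the link survives
    # cloning the repository to a different location. Raises FileExistsError if
    # a real directory already occupies the path.
    os.symlink(os.path.join("..", ".agents", "skills"), link, target_is_directory=True)
    return link
```

Calling `link_skills(Path("."), ".example-assistant")` from the repository root would create `.example-assistant/skills` pointing at the canonical directory.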