
Help wanted: add before-vs-after proof entries for kernel skills #1

@KrxGu

What this is

The proof/ directory holds empirical evidence that skill files produce measurably better kernel code — same model, same prompt, with and without the skill file injected into context.

There is currently one entry:

  • proof/cuda/softmax/ — validates write-cuda-softmax-kernel (RTX 4070, Claude Sonnet 4.6, 2 bug classes caught)

Every other skill in this repo has no proof entry yet.


What we are looking for

Run a before-vs-after benchmark for any skill in skills/. The bar is intentionally low — a screenshot, a correctness table, or a chart is enough.

Skills that most need proof entries

Skill                            Category      Priority
write-cuda-reduction-kernel      cuda          high
write-cuda-gemm-kernel           cuda          high
write-cuda-layernorm-kernel      cuda          high
write-triton-softmax-kernel      triton        high
write-triton-attention-kernel    triton        high
write-int8-quantized-kernel      quantization  high
write-fp8-kernel                 quantization  high
avoid-warp-divergence            cuda          medium
write-numerically-stable-kernel  patterns      medium
handle-boundary-conditions       patterns      medium
port-cuda-kernel-to-triton       portability   medium

How to contribute a proof

  1. Pick a skill from the table above (comment below to claim it so no one duplicates effort).
  2. Generate a kernel without the skill file using any capable coding model.
  3. Generate the same kernel with the skill file injected into context. Same model, same base prompt.
  4. Run both. Compare correctness and/or performance.
  5. Create proof/<category>/<kernel-name>/ and drop in your artifacts.
  6. Open a PR.
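
Steps 2, 3, and 5 can be sketched in a few lines of Python. This is only an illustration, not the repo's tooling: `build_prompt_pair` and `proof_dir` are hypothetical helpers, and placing the skill text ahead of the base prompt is an assumption about how you inject it. The point is that the base prompt is reused verbatim, so the skill file is the only difference between the two runs.

```python
from pathlib import Path


def build_prompt_pair(base_prompt: str, skill_text: str) -> tuple[str, str]:
    """Return (baseline, with_skill) prompts that differ only by the skill text."""
    baseline = base_prompt.strip()
    # Assumed layout: skill file first, then the unchanged base prompt.
    with_skill = f"{skill_text.strip()}\n\n{baseline}"
    return baseline, with_skill


def proof_dir(root: Path, category: str, kernel_name: str) -> Path:
    """Create proof/<category>/<kernel-name>/ under root and return its path."""
    target = root / "proof" / category / kernel_name
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Send `baseline` and `with_skill` to the same model with the same settings, then save both generated kernels and your measurements into the directory returned by `proof_dir`.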

Full instructions: proof/README.md


Minimum bar

  • Same model, same base prompt — only the skill file differs between the two runs.
  • At least one correctness check (not just speed numbers).
  • Hardware model and the shapes tested noted somewhere in the entry.
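
A correctness check can be as small as comparing each kernel's output against a trusted reference within a tolerance. The sketch below uses softmax purely as an example and runs on the CPU with the standard library. `naive_softmax` is a hypothetical stand-in for a "without skill" kernel, `check_run` is an illustrative helper, and the `1e-5` tolerance is an assumption. In a real entry `kernel_fn` would wrap your compiled kernel.

```python
import math


def reference_softmax(row):
    """Numerically stable softmax over one row (pure-Python reference)."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def naive_softmax(row):
    """Hypothetical 'without skill' stand-in: no max subtraction before exp."""
    try:
        exps = [math.exp(x) for x in row]
    except OverflowError:
        # Python raises on overflow; return NaN to mimic inf/inf on a GPU.
        return [math.nan] * len(row)
    total = sum(exps)
    return [e / total for e in exps]


def max_abs_error(candidate, reference):
    """Largest elementwise absolute difference; NaN or a length mismatch is infinite error."""
    if len(candidate) != len(reference):
        return math.inf
    worst = 0.0
    for c, r in zip(candidate, reference):
        if math.isnan(c):
            return math.inf
        worst = max(worst, abs(c - r))
    return worst


def check_run(label, kernel_fn, rows, atol=1e-5):
    """Compare kernel_fn against the reference on every row and summarize the result."""
    worst = max(max_abs_error(kernel_fn(row), reference_softmax(row)) for row in rows)
    return {"run": label, "max_abs_err": worst, "passed": worst <= atol}
```

Call `check_run` once per run (without skill, with skill) on the same inputs, and include at least one input that stresses the kernel, such as large magnitudes or sizes that are not a multiple of the block size.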

A chart is nice but not required. Raw numbers in a table are fine. A screenshot works.
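
If you go the raw-numbers route, one way to produce such a table is a small formatter like the sketch below. `format_results` is a hypothetical helper, and the dictionary keys (`run`, `max_abs_err`, `passed`) are assumptions chosen for illustration. It puts the hardware and shapes at the top so that requirement is covered in the same artifact.

```python
def format_results(hardware, shapes, rows):
    """Render before/after results as an aligned plain-text table."""
    header = ("Run", "Max abs err", "Passed")
    body = [
        (r["run"], f'{r["max_abs_err"]:.2e}', "yes" if r["passed"] else "no")
        for r in rows
    ]
    table = [header, *body]
    # Column width = widest cell in that column.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = [f"Hardware: {hardware}", f"Shapes: {', '.join(shapes)}", ""]
    for row in table:
        lines.append("  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip())
    return "\n".join(lines)
```

Pass in your own measurements and paste the returned string into the entry's README or PR description.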


What a strong entry looks like

See the existing softmax proof in proof/cuda/softmax/ as a reference.

You do not need to match that level of polish for a first entry. Correctness and reproducibility matter more than visual quality.
