What this is
The proof/ directory holds empirical evidence that skill files produce measurably better kernel code — same model, same prompt, with and without the skill file injected into context.
There is currently one entry:
proof/cuda/softmax/ — validates write-cuda-softmax-kernel (RTX 4070, Claude Sonnet 4.6, 2 bug classes caught)
Every other skill in this repo has no proof entry yet.
What we are looking for
Run a before-vs-after benchmark for any skill in skills/. The bar is intentionally low — a screenshot, a correctness table, or a chart is enough.
Skills that most need proof entries
| Skill | Category | Priority |
|---|---|---|
| write-cuda-reduction-kernel | cuda | high |
| write-cuda-gemm-kernel | cuda | high |
| write-cuda-layernorm-kernel | cuda | high |
| write-triton-softmax-kernel | triton | high |
| write-triton-attention-kernel | triton | high |
| write-int8-quantized-kernel | quantization | high |
| write-fp8-kernel | quantization | high |
| avoid-warp-divergence | cuda | medium |
| write-numerically-stable-kernel | patterns | medium |
| handle-boundary-conditions | patterns | medium |
| port-cuda-kernel-to-triton | portability | medium |
How to contribute a proof
- Pick a skill from the table above (comment below to claim it so no one duplicates effort).
- Generate a kernel without the skill file using any capable coding model.
- Generate the same kernel with the skill file injected into context. Same model, same base prompt.
- Run both. Compare correctness and/or performance.
- Create proof/<category>/<kernel-name>/ and drop in your artifacts.
- Open a PR.
Full instructions: proof/README.md
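Steps 2 and 3 come down to building two prompts that differ only by the skill file. The Python sketch below is one way to make that explicit. It is not part of the repo's tooling: `make_prompts`, prepending the skill text ahead of the task, and recording a SHA-256 of the base prompt are all assumptions for illustration.

```python
import hashlib


def make_prompts(base_prompt: str, skill_text: str) -> dict:
    """Build the baseline and skill-injected prompts from one base prompt.

    Hypothetical helper: the skill file is simply prepended as context,
    separated from the task by a blank line. Adapt this to however your
    model client accepts extra context.
    """
    baseline = base_prompt
    with_skill = skill_text.rstrip("\n") + "\n\n" + base_prompt
    return {
        "baseline": baseline,
        "with_skill": with_skill,
        # Hash of the shared base prompt; worth noting in the proof entry
        "base_prompt_sha256": hashlib.sha256(base_prompt.encode("utf-8")).hexdigest(),
    }


if __name__ == "__main__":
    base = "Write a CUDA kernel that computes a row-wise softmax over a float32 matrix."
    skill = "# write-cuda-softmax-kernel\n(skill file contents go here)\n"
    prompts = make_prompts(base, skill)
    # Sanity check: the skill run is the baseline prompt plus an injected prefix
    assert prompts["with_skill"].endswith(prompts["baseline"])
    print(prompts["base_prompt_sha256"][:12])
```

Keeping the base prompt byte-identical across both runs, and recording its hash, makes it easy for a reviewer to confirm that only the skill file changed.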
Minimum bar
- Same model, same base prompt — only the skill file differs between the two runs.
- At least one correctness check (not just speed numbers).
- Hardware model + shapes tested noted somewhere.
A chart is nice but not required. Raw numbers in a table are fine. A screenshot works.
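For the correctness check and the raw-numbers table, a minimal host-side harness might look like the Python sketch below, using softmax as the example kernel. It assumes you have already copied each kernel's output back to the host as lists of rows. `reference_softmax`, `check_rows`, `markdown_table` and the `1e-5` tolerance are illustrative choices, not repo conventions, and the demo outputs are toy stand-ins rather than measured results.

```python
import math


def reference_softmax(row):
    """Numerically stable softmax on the CPU, used as ground truth."""
    m = max(row)  # subtract the row max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def check_rows(got_rows, input_rows, tol=1e-5):
    """Compare kernel output rows against the reference.

    Returns (passed, max_abs_err). A shape mismatch or any NaN/inf fails.
    """
    if len(got_rows) != len(input_rows):
        return False, math.inf
    worst = 0.0
    for got, x in zip(got_rows, input_rows):
        want = reference_softmax(x)
        if len(got) != len(want) or not all(math.isfinite(g) for g in got):
            return False, math.inf
        worst = max(worst, max(abs(g - w) for g, w in zip(got, want)))
    return worst <= tol, worst


def markdown_table(hardware, model, rows):
    """rows: list of (variant, shape, passed, max_abs_err) tuples."""
    lines = [
        f"Hardware: {hardware}  ",
        f"Model: {model}",
        "",
        "| Variant | Shape | Correct | Max abs error |",
        "|---|---|---|---|",
    ]
    for variant, shape, passed, err in rows:
        lines.append(f"| {variant} | {shape} | {'yes' if passed else 'no'} | {err:.2e} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy stand-ins: in a real entry these rows come back from the GPU kernels
    inputs = [[1.0, 2.0, 3.0], [0.5, 0.5, -1.0]]
    baseline_out = [[math.exp(v) for v in r] for r in inputs]  # forgot to normalize
    with_skill_out = [reference_softmax(r) for r in inputs]
    rows = []
    for name, out in (("baseline", baseline_out), ("with skill", with_skill_out)):
        ok, err = check_rows(out, inputs)
        rows.append((name, "2x3", ok, err))
    print(markdown_table("<GPU model>", "<model name>", rows))
```

Treating NaN and inf as automatic failures matters because a `max` over differences that include NaN can silently hide them.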
What a strong entry looks like
See the existing softmax proof in proof/cuda/softmax/ as a reference.
You do not need to match that level of polish for a first entry. Correctness and reproducibility matter more than visual quality.