@wsttiger wsttiger commented Nov 7, 2025

Add TensorRT Decoder Training Tutorial

Overview

This PR introduces comprehensive documentation for training and deploying neural network-based quantum error correction decoders using the TensorRT decoder plugin (trt_decoder), which is being released with CUDA-Q QEC v0.5.0.

Motivation

With the release of the TensorRT decoder, users need clear guidance on:

  • How to generate training data for QEC decoding tasks
  • How to train custom neural network decoders
  • How to export models to ONNX format
  • How to deploy trained models with TensorRT for accelerated GPU inference

This tutorial provides an end-to-end workflow demonstrating the complete pipeline from data generation to production deployment.
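The data-generation step of that pipeline can be sketched in a few lines. This is a hedged stand-in, not the tutorial's code: it uses a toy [7,4] Hamming parity-check matrix and i.i.d. bit-flip noise in NumPy, whereas the actual tutorial samples surface-code syndromes with Stim.

```python
import numpy as np

# Illustrative only: generate (syndrome, error) training pairs from a toy
# parity-check matrix. The real tutorial samples syndromes from a Stim
# surface-code circuit instead.
rng = np.random.default_rng(0)

# Parity-check matrix of the [7,4] Hamming code, standing in for a QEC code.
H = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=np.uint8)

def sample_batch(n_samples: int, p: float = 0.05):
    """Sample i.i.d. bit-flip errors and their syndromes s = H e mod 2."""
    errors = (rng.random((n_samples, H.shape[1])) < p).astype(np.uint8)
    syndromes = errors @ H.T % 2
    return syndromes, errors

syndromes, errors = sample_batch(1024)
print(syndromes.shape, errors.shape)  # (1024, 3) (1024, 7)
```

The (syndrome, error) pairs are exactly the supervised-learning inputs and labels a neural decoder trains on.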

Changes Made

New Documentation

  • training_ai_decoders.rst: Comprehensive tutorial covering:
    • Training neural network decoders with PyTorch and Stim
    • Surface code circuit generation and data sampling
    • MLP architecture design and training workflow
    • ONNX model export
    • TensorRT deployment with Python and C++ examples
    • Converting ONNX models to TensorRT engines using trtexec
    • Performance tuning and best practices
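The trtexec conversion step covered in the tutorial takes roughly this shape (a command-line fragment, not runnable without the TensorRT toolkit installed; the file names here are placeholders):

```shell
# Build a serialized TensorRT engine from an exported ONNX model.
# --fp16 enables half-precision kernels; other precision flags
# (e.g. --bf16, --int8) select the other modes the tutorial covers.
trtexec --onnx=mlp_decoder.onnx \
        --saveEngine=mlp_decoder.plan \
        --fp16
```

The resulting `.plan` engine file is what a pre-built TensorRT deployment loads at inference time.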

Training Script

  • train_mlp_decoder.py: Complete working example demonstrating:
    • Stim-based synthetic data generation for surface codes
    • PyTorch MLP decoder implementation
    • Training loop with validation and early stopping
    • ONNX export for TensorRT deployment
  • The script was moved from the unittest directory to docs/sphinx/examples/qec/python/
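The MLP architecture the script implements can be sketched as follows. This is an assumption-laden NumPy stand-in (toy layer sizes, hand-rolled ReLU and sigmoid), not the script itself, which builds the equivalent model in PyTorch:

```python
import numpy as np

# Hedged sketch of an MLP decoder's forward pass in plain NumPy.
# The tutorial's train_mlp_decoder.py builds the analogous model with
# PyTorch layers; sizes below are illustrative, not the surface-code ones.
rng = np.random.default_rng(1)

n_syndrome, n_hidden, n_error = 3, 32, 7  # toy dimensions

W1 = rng.normal(0.0, 0.1, (n_syndrome, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_error))
b2 = np.zeros(n_error)

def decode(s):
    """Map syndrome bits to per-bit error probabilities."""
    h = np.maximum(s @ W1 + b1, 0.0)              # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output

probs = decode(np.array([1.0, 0.0, 1.0]))
pred = (probs > 0.5).astype(int)  # hard decision per error bit
print(probs.shape)  # (7,)
```

Training then minimizes a per-bit binary cross-entropy between `probs` and the sampled error labels.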

Documentation Updates

  • examples.rst: Added new tutorial to QEC examples table of contents
  • decoders.rst: Minor formatting cleanup

Tutorial Features

  • ✅ Complete end-to-end workflow from training to deployment
  • ✅ Both Python and C++ usage examples
  • ✅ Multiple precision modes (fp16, fp32, bf16, int8, fp8, tf32, best)
  • ✅ Production deployment guidance with pre-built TensorRT engines
  • ✅ Best practices and performance tuning tips
  • ✅ Clear dependency requirements

Target Audience

  • Users training custom neural network decoders for QEC
  • Researchers experimenting with ML-based decoding approaches
  • Developers deploying production QEC systems with TensorRT acceleration

Testing

The training script has been tested and successfully:

  • Generates training data using Stim
  • Trains an MLP decoder on surface code syndromes
  • Exports models to ONNX format compatible with TensorRT
  • Achieves convergence with validation accuracy monitoring

Related Components

- Add comprehensive tutorial for training neural network decoders
- Demonstrate PyTorch/Stim workflow for surface code decoding
- Include ONNX export and TensorRT deployment examples
- Move training script to examples directory

Signed-off-by: Scott Thornton <[email protected]>
@bmhowe23 bmhowe23 added the documentation Improvements or additions to documentation label Nov 7, 2025
Consolidate AI decoder training documentation into decoders.rst and enhance
PyTorch installation guidance:

- Merge training_ai_decoders.rst content into decoders.rst for better
  organization and discoverability
- Update terminology from "Neural Network" to "AI" decoders throughout
- Add "Optional Dependencies" section in installation guide with:
  - PyTorch requirements for Tensor Network Decoder, Generative Quantum
    Eigensolver (GQE), and AI decoder training
  - Link to PyTorch installation page
  - CUDA 12.8+ requirement note
- Add cross-reference links between installation guide and AI decoder
  deployment section
- Clarify that AI decoders don't use parity check matrix (placeholder
  only) in Python and C++ code examples
- Remove standalone training_ai_decoders.rst file
- Update examples.rst table of contents

These changes improve documentation clarity and help users understand
PyTorch dependencies and AI decoder workflows.

Signed-off-by: Scott Thornton <[email protected]>
@bmhowe23
Collaborator

@wsttiger regarding the CI failures, what is the plan? Do the CI runners need an extra package installed? If so, can you add the package to the runners as part of this PR?

wsttiger and others added 2 commits November 13, 2025 22:58
Add comprehensive tutorial for training and deploying AI decoders with
TensorRT, along with improved PyTorch installation guidance:

Documentation changes:
- Add train_mlp_decoder.py example showing complete workflow for training
  neural network decoders using PyTorch and Stim
- Update decoders.rst with AI decoder deployment documentation and
  TensorRT decoder usage examples
- Add "Optional Dependencies" section in installation guide with:
  - PyTorch requirements for Tensor Network Decoder (CPU only), Generative
    Quantum Eigensolver (GQE), and AI decoder training
  - Link to PyTorch installation page
  - Note for Blackwell architecture users requiring CUDA 12.8+
- Clarify that AI decoders don't use parity check matrix (placeholder only)
  in Python and C++ code examples
- Add cross-reference links between installation guide and AI decoder section

CI/CD changes:
- Update GitHub Actions workflows to support new documentation examples
- Modify all_libs.yaml, all_libs_release.yaml, lib_qec.yaml, and
  lib_solvers.yaml for proper build integration

These changes provide users with a complete guide for training custom AI
decoders and deploying them with optimized TensorRT inference.

Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test 3e3ecc1

@bmhowe23
Collaborator

Approved, but let's hold off on merging until the release is done.

@bmhowe23 bmhowe23 enabled auto-merge (squash) November 19, 2025 00:54
@bmhowe23 bmhowe23 merged commit 3079c21 into NVIDIA:main Nov 19, 2025
22 checks passed