Merge branch 'main' into realtime_docs

bmhowe23 · bmhowe23 · commit 1c040d150104 · 2025-11-18T17:25:43.000-08:00
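
The parameter requirements this change documents for ``nv-qldpc-decoder`` (``bp_method``, ``composition``, ``gamma0``, ``gamma_dist``, ``explicit_gammas``, ``srelay_config``) can be summarized as a small pre-flight check. This is only a sketch: ``check_bp_params`` and ``VALID_STOPPING`` are hypothetical names, not part of CUDA-Q QEC, and the helper encodes only the rules stated in the docs in this diff. The decoder's own validation may differ.

```python
from typing import Any, List

# Stopping criteria listed in the srelay_config documentation.
VALID_STOPPING = ("All", "FirstConv", "NConv")


def check_bp_params(block_size: int, **params: Any) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means consistent)."""
    problems: List[str] = []
    bp_method = params.get("bp_method", 0)      # documented default: 0
    composition = params.get("composition", 0)  # documented default: 0
    explicit = params.get("explicit_gammas")
    has_gammas = "gamma_dist" in params or explicit is not None

    if bp_method == 2 and "gamma0" not in params:
        problems.append("bp_method=2 requires gamma0")
    if bp_method == 3 and not has_gammas:
        problems.append("bp_method=3 requires gamma_dist or explicit_gammas")
    # Each explicit_gammas row holds one gamma per variable node.
    if explicit is not None and any(len(row) != block_size for row in explicit):
        problems.append("each explicit_gammas row needs block_size entries")

    if composition == 1:
        if bp_method != 3:
            problems.append("composition=1 requires bp_method=3")
        if "gamma0" not in params:
            problems.append("composition=1 requires gamma0")
        cfg = params.get("srelay_config")
        if cfg is None:
            problems.append("composition=1 requires srelay_config")
        else:
            criterion = cfg.get("stopping_criterion")
            if criterion is not None and criterion not in VALID_STOPPING:
                problems.append(f"unknown stopping_criterion: {criterion!r}")
            if criterion == "NConv" and "stop_nconv" not in cfg:
                problems.append("stopping_criterion='NConv' requires stop_nconv")
            # One explicit_gammas row per relay leg.
            if explicit is not None and len(explicit) != cfg.get("num_sets"):
                problems.append("explicit_gammas needs num_sets rows")
    return problems


relay = dict(bp_method=3, composition=1, gamma0=0.3, gamma_dist=[0.1, 0.5],
             srelay_config={"pre_iter": 5, "num_sets": 3,
                            "stopping_criterion": "FirstConv"})
print(check_bp_params(7, **relay))      # []
print(check_bp_params(7, bp_method=2))  # ['bp_method=2 requires gamma0']
```

The keyword arguments mirror those passed to ``qec.get_decoder`` in the example script, so the same dictionary could be checked first and then forwarded to the decoder.
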
diff --git a/docs/sphinx/api/qec/cpp_api.rst b/docs/sphinx/api/qec/cpp_api.rst
@@ -51,6 +51,11 @@ NVIDIA QLDPC Decoder
 
 .. include:: nv_qldpc_decoder_api.rst
 
+Sliding Window Decoder
+----------------------
+
+.. include:: sliding_window_api.rst
+
 Real-Time Decoding
 ==================
 
diff --git a/docs/sphinx/api/qec/nv_qldpc_decoder_api.rst b/docs/sphinx/api/qec/nv_qldpc_decoder_api.rst
@@ -100,16 +100,53 @@
         - `clip_value` (float): Value to clip the BP messages to. Should be a
           non-negative value (defaults to 0.0, which disables clipping). Introduced in
           0.4.0.
-        - `bp_method` (int): The BP method to use. 0 for sum-product, 1 for min-sum.
-          Defaults to 0. Introduced in 0.4.0.
+        - `bp_method` (int): Core BP algorithm to use (defaults to 0). Introduced in 0.4.0,
+          expanded in 0.5.0:
+
+          - 0: sum-product
+          - 1: min-sum (introduced in 0.4.0)
+          - 2: min-sum+mem (uniform memory strength, introduced in 0.5.0)
+          - 3: min-sum+dmem (disordered memory strength, introduced in 0.5.0)
+        - `composition` (int): Iteration strategy (defaults to 0). Introduced in 0.5.0:
+
+          - 0: Standard (single run)
+          - 1: Sequential relay (multiple gamma legs). Requires: `bp_method=3`, `gamma0`, `srelay_config`, and either `gamma_dist` or `explicit_gammas`
         - `scale_factor` (float): The scale factor to use for min-sum. Defaults to 1.0.
           When set to 0.0, the scale factor is dynamically computed based on the
           number of iterations. Introduced in 0.4.0.
+        - `proc_float` (string): The processing float type to use. Defaults to
+          "fp64". Valid values are "fp32" and "fp64". Introduced in 0.5.0.
+        - `gamma0` (float): Memory strength parameter. Required for `bp_method=2`, and for
+          `composition=1` (sequential relay). Introduced in 0.5.0.
+        - `gamma_dist` (vector<float>): Gamma distribution interval [min, max] for disordered
+          memory strength. Required for `bp_method=3` if `explicit_gammas` not provided.
+          Introduced in 0.5.0.
+        - `explicit_gammas` (vector<vector<float>>): Explicit gamma values for each variable node.
+          For `bp_method=3` with `composition=0`, provide a 2D vector where each row has
+          `block_size` columns. For `composition=1` (Sequential relay), provide `num_sets` rows
+          (one per relay leg). Overrides `gamma_dist` if provided. Introduced in 0.5.0.
+        - `srelay_config` (heterogeneous_map): Sequential relay configuration (required for
+          `composition=1`). Contains the following parameters. Introduced in 0.5.0:
+
+          - `pre_iter` (int): Number of pre-iterations to run before relay legs
+          - `num_sets` (int): Number of relay sets (legs) to run
+          - `stopping_criterion` (string): When to stop relay legs:
+
+            - "All": Run all legs
+            - "FirstConv": Stop relay after first convergence
+            - "NConv": Stop after N convergences (requires `stop_nconv` parameter)
+          - `stop_nconv` (int): Number of convergences to wait for before stopping
+            (required only when `stopping_criterion="NConv"`)
+        - `bp_seed` (int): Seed for random number generation used in `bp_method=3` (disordered
+          memory BP). Optional parameter, defaults to 42 if not provided. Introduced in 0.5.0.
         - `opt_results` (heterogeneous_map): Optional results to return. This field can be
           left empty if no additional results are desired. Choices are:
-            - `bp_llr_history` (int): Return the last `bp_llr_history` iterations
-              of the BP LLR history. Minimum value is 0 and maximum value is
-              max_iterations. The actual number of returned iterations might be fewer
-              than `bp_llr_history` if BP converges before the requested number of
-              iterations. Introduced in 0.4.0.
+
+          - `bp_llr_history` (int): Return the last `bp_llr_history` iterations
+            of the BP LLR history. Minimum value is 0 and maximum value is
+            max_iterations. The actual number of returned iterations might be fewer
+            than `bp_llr_history` if BP converges before the requested number of
+            iterations. Introduced in 0.4.0. Note: Not supported for `composition=1`.
+          - `num_iter` (bool): If true, return the number of BP iterations run.
+            Introduced in 0.5.0.
 
diff --git a/docs/sphinx/api/qec/python_api.rst b/docs/sphinx/api/qec/python_api.rst
@@ -39,6 +39,11 @@ NVIDIA QLDPC Decoder
 
 .. include:: nv_qldpc_decoder_api.rst
 
+Sliding Window Decoder
+----------------------
+
+.. include:: sliding_window_api.rst
+
 .. _tensor_network_decoder_api_python:
 
 Tensor Network Decoder
diff --git a/docs/sphinx/components/qec/introduction.rst b/docs/sphinx/components/qec/introduction.rst
@@ -672,13 +672,29 @@ The Quantum Low-Density Parity-Check (QLDPC) decoder leverages GPU-accelerated b
 Since belief propagation is an iterative method which may not converge, decoding can be improved with a second-stage post-processing step. The `nv-qldpc-decoder`
 API provides various post-processing options, which can be selected through its parameters.
 
+**Belief Propagation Methods:**
+
+The decoder supports multiple BP algorithms (configured via ``bp_method``):
+
+* **Sum-Product BP** (``bp_method=0``, default): Classic belief propagation with exact sum-product message updates. The resulting marginals are exact only on cycle-free graphs.
+* **Min-Sum BP** (``bp_method=1``): Approximation to sum-product that uses min operations instead of sum. Optionally accepts ``scale_factor``.
+* **Memory-based BP** (``bp_method=2``): Min-sum with uniform memory strength across all variable nodes. **Requires:** ``gamma0``.
+* **Disordered Memory BP** (``bp_method=3``): Min-sum with per-variable memory strengths. **Requires:** ``gamma_dist`` [min, max] OR ``explicit_gammas`` (2D vector).
+
+**Sequential Relay Decoding:**
+
+Starting with version 0.5.0, the decoder supports Sequential Relay BP (configured via ``composition=1``), which combines disordered memory BP
+with multiple "relay legs", that is, sequential runs with different gamma configurations. **Requires:** ``bp_method=3``, ``gamma0``, ``srelay_config``, and either ``gamma_dist`` OR ``explicit_gammas``.
+
 The QLDPC decoder `nv-qldpc-decoder` requires a CUDA-Q compatible GPU. See the list below for dependencies and compatibility:
 https://nvidia.github.io/cuda-quantum/latest/using/install/local_installation.html#dependencies-and-compatibility
 
 The decoder is based on the following references:
 
 * https://arxiv.org/pdf/2005.07016 
-* https://github.com/quantumgizmos/ldpc
+* https://github.com/quantumgizmos/ldpc 
+* https://arxiv.org/pdf/2506.01779 
+* https://github.com/trmue/relay 
 
 
 Usage:
diff --git a/docs/sphinx/components/solvers/introduction.rst b/docs/sphinx/components/solvers/introduction.rst
@@ -75,9 +75,6 @@ The :code:`molecule_options` structure provides extensive configuration for mole
 +---------------------+---------------+------------------+------------------------------------------+
 | integrals_casscf    | bool          | false            | Use CASSCF orbitals for integrals        |
 +---------------------+---------------+------------------+------------------------------------------+
-| potfile             | optional      | nullopt          | Path to external potential file          |
-|                     | <string>      |                  |                                          |
-+---------------------+---------------+------------------+------------------------------------------+
 | verbose             | bool          | false            | Enable detailed output logging           |
 +---------------------+---------------+------------------+------------------------------------------+
 
diff --git a/docs/sphinx/examples/qec/python/nv-qldpc-decoder.py b/docs/sphinx/examples/qec/python/nv-qldpc-decoder.py
@@ -199,7 +199,154 @@ def run_decoder(filename, num_shots, run_as_batched):
     )
 
 
+def demonstrate_bp_methods():
+    """
+    Demonstrate different BP methods available in nv-qldpc-decoder.
+    Shows configurations for: sum-product, min-sum, memory BP, 
+    disordered memory BP, and sequential relay BP.
+    """
+    # Simple 3x7 parity check matrix for demonstration
+    H_list = [[1, 0, 0, 1, 0, 1, 1], [0, 1, 0, 1, 1, 0, 1],
+              [0, 0, 1, 0, 1, 1, 1]]
+    H = np.array(H_list, dtype=np.uint8)
+
+    print("=" * 60)
+    print("Demonstrating BP Methods in nv-qldpc-decoder")
+    print("=" * 60)
+
+    # Method 0: Sum-Product BP (default)
+    print("\n1. Sum-Product BP (bp_method=0, default):")
+    try:
+        decoder_sp = qec.get_decoder("nv-qldpc-decoder",
+                                     H,
+                                     bp_method=0,
+                                     max_iterations=30)
+    except Exception as e:
+        print(
+            'The nv-qldpc-decoder is not available with your current CUDA-Q ' +
+            'QEC installation.')
+        exit(0)
+    print("   Created decoder with sum-product BP")
+
+    # Method 1: Min-Sum BP
+    print("\n2. Min-Sum BP (bp_method=1):")
+    decoder_ms = qec.get_decoder("nv-qldpc-decoder",
+                                 H,
+                                 bp_method=1,
+                                 max_iterations=30,
+                                 scale_factor=1.0)
+    print("   Created decoder with min-sum BP")
+
+    # Method 2: Min-Sum with uniform Memory (Mem-BP)
+    print("\n3. Mem-BP (bp_method=2, uniform memory strength):")
+    decoder_mem = qec.get_decoder("nv-qldpc-decoder",
+                                  H,
+                                  bp_method=2,
+                                  max_iterations=30,
+                                  gamma0=0.5)
+    print("   Created decoder with Mem-BP (gamma0=0.5)")
+
+    # Method 3: Min-Sum with Disordered Memory (DMem-BP)
+    print("\n4. DMem-BP (bp_method=3, disordered memory strength):")
+    # Option A: Using gamma_dist (random gammas in range)
+    decoder_dmem = qec.get_decoder("nv-qldpc-decoder",
+                                   H,
+                                   bp_method=3,
+                                   max_iterations=30,
+                                   gamma_dist=[0.1, 0.5],
+                                   bp_seed=42)
+    print("   Created decoder with DMem-BP (gamma_dist=[0.1, 0.5])")
+
+    # Option B: Using explicit_gammas (specify exact gamma for each variable)
+    block_size = H.shape[1]
+    explicit_gammas = [[0.1 + 0.05 * i for i in range(block_size)]]
+    decoder_dmem_explicit = qec.get_decoder("nv-qldpc-decoder",
+                                            H,
+                                            bp_method=3,
+                                            max_iterations=30,
+                                            explicit_gammas=explicit_gammas)
+    print("   Created decoder with DMem-BP (explicit gammas)")
+
+    # Sequential Relay BP: a composition (composition=1) layered on bp_method=3, not a separate bp_method
+    print("\n5. Sequential Relay BP (composition=1):")
+    print("   Requires bp_method=3 and srelay_config")
+
+    # Configure relay parameters
+    srelay_config = {
+        'pre_iter': 5,  # Run 5 iterations with gamma0 before relay legs
+        'num_sets': 3,  # Use 3 relay legs
+        'stopping_criterion': 'FirstConv'  # Stop after first convergence
+    }
+
+    # Option A: Using gamma_dist for relay legs
+    decoder_relay = qec.get_decoder("nv-qldpc-decoder",
+                                    H,
+                                    bp_method=3,
+                                    composition=1,
+                                    max_iterations=50,
+                                    gamma0=0.3,
+                                    gamma_dist=[0.1, 0.5],
+                                    srelay_config=srelay_config,
+                                    bp_seed=42)
+    print("   Created decoder with Relay-BP (gamma_dist, FirstConv stopping)")
+
+    # Option B: Using explicit gammas for each relay leg
+    num_relay_legs = 3
+    explicit_relay_gammas = [
+        [0.1 + 0.02 * i for i in range(block_size)],  # First relay leg
+        [0.2 + 0.03 * i for i in range(block_size)],  # Second relay leg
+        [0.3 + 0.04 * i for i in range(block_size)]  # Third relay leg
+    ]
+
+    srelay_config_all = {
+        'pre_iter': 10,
+        'num_sets': num_relay_legs,  # One leg per row of explicit_relay_gammas
+        'stopping_criterion': 'All'  # Run all relay legs
+    }
+
+    decoder_relay_explicit = qec.get_decoder(
+        "nv-qldpc-decoder",
+        H,
+        bp_method=3,
+        composition=1,
+        max_iterations=50,
+        gamma0=0.3,
+        explicit_gammas=explicit_relay_gammas,
+        srelay_config=srelay_config_all)
+    print("   Created decoder with Relay-BP (explicit gammas, All legs)")
+
+    # Option C: NConv stopping criterion
+    srelay_config_nconv = {
+        'pre_iter': 5,
+        'num_sets': 10,
+        'stopping_criterion': 'NConv',
+        'stop_nconv': 3  # Stop after 3 convergences
+    }
+
+    decoder_relay_nconv = qec.get_decoder("nv-qldpc-decoder",
+                                          H,
+                                          bp_method=3,
+                                          composition=1,
+                                          max_iterations=50,
+                                          gamma0=0.3,
+                                          gamma_dist=[0.1, 0.6],
+                                          srelay_config=srelay_config_nconv,
+                                          bp_seed=42)
+    print("   Created decoder with Relay-BP (NConv stopping after 3)")
+
+    print("\n" + "=" * 60)
+    print("All decoder configurations created successfully!")
+    print("=" * 60)
+
+
 if __name__ == "__main__":
+    # Demonstrate different BP methods (introduced in v0.5.0)
+    print("\n### PART 1: BP Methods Demonstration ###\n")
+    demonstrate_bp_methods()
+
+    # Full decoding with test data
+    print("\n\n### PART 2: Full Decoding Example with Test Data ###\n")
+
     # See other test data options in https://github.com/NVIDIA/cudaqx/releases/tag/0.2.0
     filename = 'osd_1008_8785_0.001.json'
     bz2filename = filename + '.bz2'
diff --git a/docs/sphinx/examples/solvers/python/generate_molecular_hamiltonians.py b/docs/sphinx/examples/solvers/python/generate_molecular_hamiltonians.py
@@ -9,7 +9,7 @@
 # [Begin Documentation]
 import cudaq_solvers as solvers
 
-# Generate active space Hamiltonian using HF molecular orbitals
+# Generate active space Hamiltonian using RHF molecular orbitals
 
 geometry = [('N', (0.0, 0.0, 0.5600)), ('N', (0.0, 0.0, -0.5600))]
 molecule = solvers.create_molecule(geometry,
@@ -20,7 +20,24 @@
                                    norb_cas=3,
                                    verbose=True)
 
-print('N2 HF Hamiltonian')
+print('N2 RHF Hamiltonian')
+print('Energies : ', molecule.energies)
+print('No. of orbitals: ', molecule.n_orbitals)
+print('No. of electrons: ', molecule.n_electrons)
+
+# Generate active space Hamiltonian using UHF molecular orbitals
+
+geometry = [('N', (0.0, 0.0, 0.5600)), ('N', (0.0, 0.0, -0.5600))]
+molecule = solvers.create_molecule(geometry,
+                                   'sto-3g',
+                                   0,
+                                   0,
+                                   nele_cas=2,
+                                   norb_cas=3,
+                                   UR=True,
+                                   verbose=True)
+
+print('N2 UHF Hamiltonian')
 print('Energies : ', molecule.energies)
 print('No. of orbitals: ', molecule.n_orbitals)
 print('No. of electrons: ', molecule.n_electrons)
@@ -83,3 +100,36 @@
 print('Energies: ', molecule.energies)
 print('No. of orbitals: ', molecule.n_orbitals)
 print('No. of electrons: ', molecule.n_electrons)
+
+# For open-shell systems: Generate active space Hamiltonian using ROHF molecular orbitals
+geometry = [('N', (0.0, 0.0, 0.5600)), ('N', (0.0, 0.0, -0.5600))]
+molecule = solvers.create_molecule(geometry,
+                                   'sto-3g',
+                                   1,
+                                   1,
+                                   nele_cas=3,
+                                   norb_cas=3,
+                                   ccsd=True,
+                                   verbose=True)
+
+print('N2+ ROHF Hamiltonian')
+print('Energies : ', molecule.energies)
+print('No. of orbitals: ', molecule.n_orbitals)
+print('No. of electrons: ', molecule.n_electrons)
+
+# For open-shell systems: Generate active space Hamiltonian using UHF molecular orbitals
+geometry = [('N', (0.0, 0.0, 0.5600)), ('N', (0.0, 0.0, -0.5600))]
+molecule = solvers.create_molecule(geometry,
+                                   'sto-3g',
+                                   1,
+                                   1,
+                                   nele_cas=3,
+                                   norb_cas=3,
+                                   ccsd=True,
+                                   UR=True,
+                                   verbose=True)
+
+print('N2+ UHF Hamiltonian')
+print('Energies : ', molecule.energies)
+print('No. of orbitals: ', molecule.n_orbitals)
+print('No. of electrons: ', molecule.n_electrons)
diff --git a/docs/sphinx/examples_rst/qec/decoders.rst b/docs/sphinx/examples_rst/qec/decoders.rst
@@ -75,12 +75,33 @@ CUDA-Q QEC library. The library follows the CUDA-Q decoder Python and C++ interf
 :cpp:class:`cudaq::qec::decoder` for C++), but as documented in the API sections
 (:ref:`nv_qldpc_decoder_api_python` for Python and
 :ref:`nv_qldpc_decoder_api_cpp` for C++), there are many configuration options
-that can be passed to the constructor. The following example shows how to
-exercise the decoder using non-trivial pre-generated test data. The test data
-was generated using scripts originating from the GitHub repo for
-`BivariateBicycleCodes
-<https://github.com/sbravyi/BivariateBicycleCodes>`_ [#f1]_; it includes parity
-check matrices (PCMs) and test syndromes to exercise a decoder.
+that can be passed to the constructor.
+
+Belief Propagation Methods
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``nv-qldpc-decoder`` supports multiple belief propagation (BP) algorithms, each with different trade-offs 
+between accuracy, convergence, and speed:
+
+* **Sum-Product BP** (``bp_method=0``): The standard BP algorithm. Good baseline performance.
+* **Min-Sum BP** (``bp_method=1``): Faster approximation to sum-product. Can be tuned with ``scale_factor``.
+* **Memory-based BP** (``bp_method=2``): Adds uniform memory (``gamma0``) to help escape local minima. Useful when standard BP fails to converge.
+* **Disordered Memory BP** (``bp_method=3``): Uses per-variable memory strengths for better adaptability to code structure.
+* **Sequential Relay BP** (``composition=1``): Advanced method that runs multiple "relay legs" with different gamma configurations. See examples below for configuration.
+
+Usage Example
+~~~~~~~~~~~~~
+
+The following example shows how to exercise the decoder using non-trivial pre-generated test data. 
+The test data was generated using scripts originating from the GitHub repo for
+`BivariateBicycleCodes <https://github.com/sbravyi/BivariateBicycleCodes>`_ [#f1]_; 
+it includes parity check matrices (PCMs) and test syndromes to exercise a decoder.
+
+The example demonstrates:
+
+1. **Basic decoder configuration** with OSD post-processing
+2. **All BP methods** including Sequential Relay BP
+3. **Batched decoding** for improved performance
 
 .. literalinclude:: ../../examples/qec/python/nv-qldpc-decoder.py
     :language: python
diff --git a/docs/sphinx/examples_rst/solvers/molecular_hamiltonians.rst b/docs/sphinx/examples_rst/solvers/molecular_hamiltonians.rst