NVIDIA
diff --git a/.gitignore b/.gitignore
Lines changed: 3 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 17 additions & 9 deletions
diff --git a/docker/release/Dockerfile b/docker/release/Dockerfile
Lines changed: 5 additions & 0 deletions
diff --git a/docs/sphinx/api/qec/nv_qldpc_decoder_api.rst b/docs/sphinx/api/qec/nv_qldpc_decoder_api.rst
Lines changed: 52 additions & 14 deletions
diff --git a/docs/sphinx/api/solvers/cpp_api.rst b/docs/sphinx/api/solvers/cpp_api.rst
Lines changed: 3 additions & 0 deletions
diff --git a/docs/sphinx/api/solvers/python_api.rst b/docs/sphinx/api/solvers/python_api.rst
Lines changed: 2 additions & 0 deletions
diff --git a/docs/sphinx/components/qec/introduction.rst b/docs/sphinx/components/qec/introduction.rst
Lines changed: 17 additions & 1 deletion
diff --git a/docs/sphinx/components/solvers/introduction.rst b/docs/sphinx/components/solvers/introduction.rst
Lines changed: 77 additions & 6 deletions
@@ -98,6 +98,9 @@ apps/
 # vim files
 *.tmp
 
+# Wheel files
+*.whl
+
 # Temporary build files for metapackages
 libs/*/python/metapackages/LICENSE
 libs/*/python/metapackages/NOTICE
 
@@ -1,29 +1,37 @@
 # Welcome to the CUDA-QX repository
 
-This repository contains a set of libraries that build on 
-NVIDIA CUDA-Q. These libraries enable the rapid development of hybrid quantum-classical 
-application code leveraging state-of-the-art CPUs, GPUs, and QPUs. 
+This repository contains a set of libraries that build on
+NVIDIA CUDA-Q. These libraries enable the rapid development of hybrid quantum-classical
+application code leveraging state-of-the-art CPUs, GPUs, and QPUs.
 
 ## Getting Started
-To learn more about how to work with the CUDA-QX libraries, please take a look at the 
-[CUDA-QX Documentation][cudaqx_docs]. The page contains detailed 
-[installation instructions][official_install] for officially released packages. 
+
+To learn more about how to work with the CUDA-QX libraries, please take a look at the
+[CUDA-QX Documentation][cudaqx_docs]. The page contains detailed
+[installation instructions][official_install] for officially released packages.
 
 [cudaqx_docs]: https://nvidia.github.io/cudaqx
 [official_install]: https://nvidia.github.io/cudaqx/quickstart/installation.html
 
 ## Contributing
 
 There are many ways in which you can get involved with CUDA-QX. If you are
-interested in developing quantum applications with the CUDA-QX libraries, 
-this repository is a great place to get started! For more information about 
-contributing to the CUDA-QX platform, please take a look at 
+interested in developing quantum applications with the CUDA-QX libraries,
+this repository is a great place to get started! For more information about
+contributing to the CUDA-QX platform, please take a look at
 [Contributing.md](./Contributing.md).
 
 ## License
 
 The code in this repository is licensed under [Apache License 2.0](./LICENSE).
 
+When distributed via PyPI, GHCR, or NGC, the binaries generated from this source
+code are also distributed under the Apache License 2.0; however, the
+`libcudaq-qec-nv-qldpc-decoder.so` library is closed source and is subject to
+the [NVIDIA Software License Agreement][github_qec_license].
+
+[github_qec_license]: https://github.com/NVIDIA/cudaqx/blob/main/libs/qec/LICENSE
+
 Contributing a pull request to this repository requires accepting the
 Contributor License Agreement (CLA) declaring that you have the right to, and
 actually do, grant us the rights to use your contribution. A CLA-bot will
 
@@ -27,6 +27,11 @@ ARG TARGETARCH
 
 USER root
 
+# Update the copyright notification:
+RUN echo -e "This container also includes CUDA-Q QEC and CUDA-Q Solvers.\n"\
+"Use of this container implies consent to the NVIDIA Software License Agreement at\n"\
+"https://github.com/NVIDIA/cudaqx/blob/main/libs/qec/LICENSE\n" >> "$CUDA_QUANTUM_PATH/Copyright.txt"
+
 # Determine the appropriate zip file
 COPY installed_files-${TARGETARCH}.zip /tmp/
 
 
@@ -82,13 +82,14 @@
           (defaults to 1). Ignored unless `use_osd` is true.
         - `osd_order` (int): OSD postprocessor order (defaults to 0). Ref:
           `Decoding Across the Quantum LDPC Code Landscape <https://arxiv.org/pdf/2005.07016>`_
-            - For `osd_method=2` (Exhaustive), the number of possible
-              permutations searched after OSD-0 grows by 2^osd_order.
-            - For `osd_method=3` (Combination Sweep), this is the λ parameter. All
-              weight 1 permutations and the first λ bits worth of weight 2
-              permutations are searched after OSD-0. This is (syndrome_length -
-              block_size + λ * (λ - 1) / 2) additional permutations.
-            - For other `osd_method` values, this is ignored.
+
+          - For `osd_method=2` (Exhaustive), the number of possible
+            permutations searched after OSD-0 grows by 2^osd_order.
+          - For `osd_method=3` (Combination Sweep), this is the λ parameter. All
+            weight 1 permutations and the first λ bits worth of weight 2
+            permutations are searched after OSD-0. This is (syndrome_length -
+            block_size + λ * (λ - 1) / 2) additional permutations.
+          - For other `osd_method` values, this is ignored.
         - `bp_batch_size` (int): Number of syndromes that will be decoded in
           parallel for the BP decoder (defaults to 1)
         - `osd_batch_size` (int): Number of syndromes that will be decoded in
@@ -99,16 +100,53 @@
         - `clip_value` (float): Value to clip the BP messages to. Should be a
           non-negative value (defaults to 0.0, which disables clipping). Introduced in
           0.4.0.
-        - `bp_method` (int): The BP method to use. 0 for sum-product, 1 for min-sum.
-          Defaults to 0. Introduced in 0.4.0.
+        - `bp_method` (int): Core BP algorithm to use (defaults to 0). Introduced in 0.4.0,
+          expanded in 0.5.0:
+
+          - 0: sum-product
+          - 1: min-sum (introduced in 0.4.0)
+          - 2: min-sum+mem (uniform memory strength, introduced in 0.5.0)
+          - 3: min-sum+dmem (disordered memory strength, introduced in 0.5.0)
+        - `composition` (int): Iteration strategy (defaults to 0). Introduced in 0.5.0:
+
+          - 0: Standard (single run)
+          - 1: Sequential relay (multiple gamma legs). Requires `bp_method=3`, `gamma0`, and `srelay_config`.
         - `scale_factor` (float): The scale factor to use for min-sum. Defaults to 1.0.
           When set to 0.0, the scale factor is dynamically computed based on the
           number of iterations. Introduced in 0.4.0.
+        - `proc_float` (string): The processing float type to use. Defaults to
+          "fp64". Valid values are "fp32" and "fp64". Introduced in 0.5.0.
+        - `gamma0` (float): Memory strength parameter. Required for `bp_method=2`, and for
+          `composition=1` (sequential relay). Introduced in 0.5.0.
+        - `gamma_dist` (vector<float>): Gamma distribution interval [min, max] for disordered
+          memory strength. Required for `bp_method=3` if `explicit_gammas` is not provided.
+          Introduced in 0.5.0.
+        - `explicit_gammas` (vector<vector<float>>): Explicit gamma values for each variable node.
+          For `bp_method=3` with `composition=0`, provide a 2D vector where each row has
+          `block_size` columns. For `composition=1` (Sequential relay), provide `num_sets` rows
+          (one per relay leg). Overrides `gamma_dist` if provided. Introduced in 0.5.0.
+        - `srelay_config` (heterogeneous_map): Sequential relay configuration (required for
+          `composition=1`). Introduced in 0.5.0. Contains the following parameters:
+
+          - `pre_iter` (int): Number of pre-iterations to run before relay legs
+          - `num_sets` (int): Number of relay sets (legs) to run
+          - `stopping_criterion` (string): When to stop relay legs:
+
+            - "All": Run all legs
+            - "FirstConv": Stop relay after first convergence
+            - "NConv": Stop after N convergences (requires `stop_nconv` parameter)
+          - `stop_nconv` (int): Number of convergences to wait for before stopping
+            (required only when `stopping_criterion="NConv"`)
+        - `bp_seed` (int): Seed for random number generation used in `bp_method=3` (disordered
+          memory BP). Optional parameter, defaults to 42 if not provided. Introduced in 0.5.0.
         - `opt_results` (heterogeneous_map): Optional results to return. This field can be
           left empty if no additional results are desired. Choices are:
-            - `bp_llr_history` (int): Return the last `bp_llr_history` iterations
-              of the BP LLR history. Minimum value is 0 and maximum value is
-              max_iterations. The actual number of returned iterations might be fewer
-              than `bp_llr_history` if BP converges before the requested number of
-              iterations. Introduced in 0.4.0.
+
+          - `bp_llr_history` (int): Return the last `bp_llr_history` iterations
+            of the BP LLR history. Minimum value is 0 and maximum value is
+            max_iterations. The actual number of returned iterations might be fewer
+            than `bp_llr_history` if BP converges before the requested number of
+            iterations. Introduced in 0.4.0. Note: Not supported for `composition=1`.
+          - `num_iter` (bool): If true, return the number of BP iterations run.
+            Introduced in 0.5.0.
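The requirements for `composition=1` are spread across several of the parameter entries above. The following is a minimal sketch of how they combine, assuming only the option names documented in this hunk. `missing_relay_options` is a hypothetical helper, not part of `cudaq_qec`, and every numeric value is a placeholder rather than a recommended setting.

```python
# Hypothetical helper (NOT part of cudaq_qec): checks the documented
# requirements for sequential relay before options are handed to a decoder.
def missing_relay_options(opts: dict) -> list:
    problems = []
    if opts.get("composition", 0) != 1:
        return problems  # the rules below apply only to sequential relay
    if opts.get("bp_method", 0) != 3:
        problems.append("bp_method must be 3")
    for key in ("gamma0", "srelay_config"):
        if key not in opts:
            problems.append(f"{key} is required")
    if "gamma_dist" not in opts and "explicit_gammas" not in opts:
        problems.append("gamma_dist or explicit_gammas is required")
    cfg = opts.get("srelay_config", {})
    if cfg.get("stopping_criterion") == "NConv" and "stop_nconv" not in cfg:
        problems.append("srelay_config.stop_nconv is required for NConv")
    return problems


# Placeholder values chosen only to make the structure visible.
relay_opts = {
    "bp_method": 3,              # min-sum+dmem
    "composition": 1,            # sequential relay
    "gamma0": 0.1,
    "gamma_dist": [0.1, 0.5],    # [min, max]
    "bp_seed": 42,
    "srelay_config": {
        "pre_iter": 5,
        "num_sets": 10,
        "stopping_criterion": "NConv",
        "stop_nconv": 3,
    },
    "opt_results": {"num_iter": True},
}

print(missing_relay_options(relay_opts))  # prints []
```

Note that `bp_llr_history` is left out of `opt_results` here because the documentation above states it is not supported for `composition=1`.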
 
@@ -6,6 +6,7 @@ CUDA-Q Solvers C++ API
 
 .. doxygenclass:: cudaq::solvers::spin_complement_gsd 
 .. doxygenclass:: cudaq::solvers::uccsd 
+.. doxygenclass:: cudaq::solvers::uccgsd 
 .. doxygenclass:: cudaq::solvers::qaoa_pool 
 
 .. doxygenfunction:: cudaq::solvers::get_operator_pool 
@@ -67,6 +68,8 @@ CUDA-Q Solvers C++ API
 .. doxygenfunction:: cudaq::solvers::stateprep::double_excitation
 .. doxygenfunction:: cudaq::solvers::stateprep::uccsd(cudaq::qview<>, const std::vector<double>&, std::size_t, std::size_t)
 .. doxygenfunction:: cudaq::solvers::stateprep::uccsd(cudaq::qview<>, const std::vector<double>&, std::size_t)
+.. doxygenfunction:: cudaq::solvers::stateprep::get_uccgsd_pauli_lists
+.. doxygenfunction:: cudaq::solvers::stateprep::uccgsd(cudaq::qview<>, const std::vector<double>&, const std::vector<std::vector<cudaq::pauli_word>>&, const std::vector<std::vector<double>>&)
 
 
 .. doxygenstruct:: cudaq::solvers::qaoa_result
 
@@ -27,6 +27,8 @@ CUDA-Q Solvers Python API
 .. autofunction:: cudaq_solvers.stateprep.double_excitation
 .. autofunction:: cudaq_solvers.stateprep.get_num_uccsd_parameters
 .. autofunction:: cudaq_solvers.stateprep.get_uccsd_excitations    
+.. autofunction:: cudaq_solvers.stateprep.get_uccgsd_pauli_lists
+.. autofunction:: cudaq_solvers.stateprep.uccgsd
 
 .. autofunction:: cudaq_solvers.get_num_qaoa_parameters
 
 
@@ -640,13 +640,29 @@ The Quantum Low-Density Parity-Check (QLDPC) decoder leverages GPU-accelerated b
 Since belief propagation is an iterative method which may not converge, decoding can be improved with a second-stage post-processing step. The `nv-qldpc-decoder`
 API provides various post-processing options, which can be selected through its parameters.
 
+**Belief Propagation Methods:**
+
+The decoder supports multiple BP algorithms (configured via ``bp_method``):
+
+* **Sum-Product BP** (``bp_method=0``, default): Classic belief propagation algorithm. Its marginal probabilities are exact on cycle-free Tanner graphs and approximate otherwise.
+* **Min-Sum BP** (``bp_method=1``): Approximation to sum-product that replaces the check-node update with a minimum over incoming message magnitudes. Optionally accepts ``scale_factor``.
+* **Memory-based BP** (``bp_method=2``): Min-sum with uniform memory strength across all variable nodes. **Requires:** ``gamma0``.
+* **Disordered Memory BP** (``bp_method=3``): Min-sum with per-variable memory strengths. **Requires:** ``gamma_dist`` [min, max] OR ``explicit_gammas`` (2D vector).
+
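To illustrate the difference between the first two methods, here is a sketch of the textbook check-node update in both forms. It is not the decoder's GPU implementation, the function names are made up for this example, and `scale_factor` is modeled as a plain multiplier (the dynamic mode selected by `scale_factor=0.0` is not modeled).

```python
import math


def check_update_sum_product(llrs):
    """Sum-product check-node message: 2 * atanh(prod(tanh(l / 2)))."""
    prod = 1.0
    for l in llrs:
        prod *= math.tanh(l / 2.0)
    return 2.0 * math.atanh(prod)


def check_update_min_sum(llrs, scale_factor=1.0):
    """Min-sum approximation: product of signs times the smallest magnitude."""
    sign = 1.0
    for l in llrs:
        if l < 0:
            sign = -sign
    return scale_factor * sign * min(abs(l) for l in llrs)


incoming = [1.2, -0.8, 2.5]  # example incoming log-likelihood ratios
sp = check_update_sum_product(incoming)
ms = check_update_min_sum(incoming)
print(f"sum-product: {sp:.4f}, min-sum: {ms:.4f}")
```

Both updates agree on the sign, but min-sum never reports a smaller magnitude than sum-product, which is why a scale factor below 1 is commonly applied to the min-sum message.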
+**Sequential Relay Decoding:**
+
+Starting with version 0.5.0, the decoder supports Sequential Relay BP (configured via ``composition=1``), which combines disordered memory BP
+with multiple "relay legs", that is, sequential runs with different gamma configurations. **Requires:** ``bp_method=3``, ``gamma0``, ``srelay_config``, and either ``gamma_dist`` OR ``explicit_gammas``.
+
 The QLDPC decoder `nv-qldpc-decoder` requires a CUDA-Q compatible GPU. See the list below for dependencies and compatibility:
 https://nvidia.github.io/cuda-quantum/latest/using/install/local_installation.html#dependencies-and-compatibility
 
 The decoder is based on the following references:
 
 * https://arxiv.org/pdf/2005.07016 
-* https://github.com/quantumgizmos/ldpc
+* https://github.com/quantumgizmos/ldpc 
+* https://arxiv.org/pdf/2506.01779 
+* https://github.com/trmue/relay 
 
 
 Usage:
 
@@ -75,9 +75,6 @@ The :code:`molecule_options` structure provides extensive configuration for mole
 +---------------------+---------------+------------------+------------------------------------------+
 | integrals_casscf    | bool          | false            | Use CASSCF orbitals for integrals        |
 +---------------------+---------------+------------------+------------------------------------------+
-| potfile             | optional      | nullopt          | Path to external potential file          |
-|                     | <string>      |                  |                                          |
-+---------------------+---------------+------------------+------------------------------------------+
 | verbose             | bool          | false            | Enable detailed output logging           |
 +---------------------+---------------+------------------+------------------------------------------+
 
@@ -495,12 +492,32 @@ Available Operator Pools
 
 CUDA-QX provides several pre-built operator pools for ADAPT-VQE:
 
-* **spin_complement_gsd**: Spin-complemented generalized singles and doubles
-* **uccsd**: UCCSD operators
+* **spin_complement_gsd**: Spin-complemented generalized singles and doubles.
+  This operator pool combines generalized excitations with enforced spin symmetry. It is
+  more powerful than UCCSD because its generalized operators capture more electron correlation,
+  and it is more reliable than both UCCSD and UCCGSD because its spin-complemented
+  construction prevents unphysical spin-symmetry breaking.
+* **uccsd**: UCCSD operators.
+  The standard, chemically inspired ansatz. Its excitation space
+  is restricted: it includes only single and double excitations
+  in which electrons move from a reference-occupied orbital (i)
+  to a reference-virtual orbital (a),
+  relative to the starting Hartree-Fock state. It is well suited to capturing dynamic correlation
+  (short-range, instantaneous electron interactions).
+* **uccgsd**: UCC generalized singles and doubles.
+  More expressive than UCCSD, as it includes all possible
+  single and double excitations, regardless of the orbitals' occupied/virtual status in the reference state.
+  Capable of capturing both dynamic and static (strong) correlation,
+  but at the cost of increased circuit depth and parameter count.
-* **qaoa**: QAOA mixer excitation operators
-
+* **qaoa**: QAOA mixer excitation operators.
+  It generates all possible single-qubit X and Y terms, along with all possible
+  two-qubit interaction terms (XX, YY, XY, YX, XZ, ZX, YZ, ZY) across every pair of qubits.
+  This pool offers a rich basis for constructing the mixer Hamiltonian for ADAPT-QAOA algorithms.
+    
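To make the "regardless of occupied/virtual status" distinction concrete, here is a toy enumeration of single-excitation index pairs. It deliberately ignores spin symmetry and does not reproduce the operators or counts that `get_operator_pool` returns; both function names are hypothetical.

```python
from itertools import combinations


def restricted_singles(n_occupied, n_orbitals):
    """UCCSD-style singles: occupied index i -> virtual index a only."""
    return [(i, a)
            for i in range(n_occupied)
            for a in range(n_occupied, n_orbitals)]


def generalized_singles(n_orbitals):
    """Generalized singles: every pair p < q, whatever its occupation."""
    return list(combinations(range(n_orbitals), 2))


restricted = restricted_singles(n_occupied=2, n_orbitals=4)
generalized = generalized_singles(n_orbitals=4)
print(len(restricted), len(generalized))  # prints 4 6
```

Every restricted pair also appears in the generalized set. The extra occupied-occupied and virtual-virtual pairs are what the generalized pool adds, at the cost of more parameters.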
 .. code-block:: python
 
+    import cudaq_solvers as solvers
+
     # Generate different operator pools
     gsd_ops = solvers.get_operator_pool(
         "spin_complement_gsd",
@@ -513,6 +530,60 @@ CUDA-QX provides several pre-built operator pools for ADAPT-VQE:
         num_electrons=molecule.n_electrons
     )
 
+    uccgsd_ops = solvers.get_operator_pool(
+        "uccgsd",
+        num_orbitals=molecule.n_orbitals
+    )
+
+Available Ansatz
+^^^^^^^^^^^^^^^^^^
+
+CUDA-QX provides several state preparation ansatzes for VQE:
+
+* **uccsd**: UCCSD operators
+* **uccgsd**: UCC generalized singles and doubles
+
+.. code-block:: python
+
+    import cudaq
+    import cudaq_solvers as solvers
+
+    # Using UCCSD ansatz
+    geometry = [('H', (0., 0., 0.)), ('H', (0., 0., .7474))]
+    molecule = solvers.create_molecule(geometry, 'sto-3g', 0, 0, casci=True)
+
+    numQubits = molecule.n_orbitals * 2
+    numElectrons = molecule.n_electrons
+    spin = 0
+
+    @cudaq.kernel
+    def ansatz(thetas: list[float]):
+        q = cudaq.qvector(numQubits)
+        for i in range(numElectrons):
+            x(q[i])
+        solvers.stateprep.uccsd(q, thetas, numElectrons, spin)
+
+    
+    # Using UCCGSD ansatz
+    geometry = [('H', (0., 0., 0.)), ('H', (0., 0., .7474))]
+    molecule = solvers.create_molecule(geometry, 'sto-3g', 0, 0, casci=True)
+
+    numQubits = molecule.n_orbitals * 2
+    numElectrons = molecule.n_electrons
+
+    # Get grouped Pauli words and coefficients from UCCGSD pool
+    pauliWordsList, coefficientsList = solvers.stateprep.get_uccgsd_pauli_lists(
+        numQubits, only_singles=False, only_doubles=False)
+    
+    @cudaq.kernel
+    def ansatz(numQubits: int, numElectrons: int, thetas: list[float],
+               pauliWordsList: list[list[cudaq.pauli_word]],
+               coefficientsList: list[list[float]]):
+        q = cudaq.qvector(numQubits)
+        for i in range(numElectrons):
+            x(q[i])
+        solvers.stateprep.uccgsd(q, thetas, pauliWordsList, coefficientsList)
+
+
 Algorithm Parameters
 ^^^^^^^^^^^^^^^^^^^^^^