feat: resource estimation workflow for benchmark jobs. #566

vprusso · 2025-10-03T17:47:27Z

Closes: #559

Enables high-level resource estimates for a given benchmark before running it on a hardware device (example below)
Adds a CLI workflow to run estimates for a given benchmark
Adds tests and documentation covering the new mgym estimate workflow

Example:

uv run mgym job estimate metriq_gym/schemas/examples/wit.example.json --provider quantinuum

Gives the following:

No device specified; estimating resources without device-specific topology.
Resource estimate for WIT on quantinuum:(no device)

| Metric                  |   Value |
|-------------------------|---------|
| Jobs                    |    1    |
| Circuits                |    1    |
| Total shots             | 8192    |
| Max qubits              |    7    |
| Total 2q gates          |   24    |
| Total 1q gates          |   57    |
| Total multi-qubit gates |    0    |
| Total measurements      |    1    |
| Total resets            |    0    |
| Total HQCs              |  507.99 |

| Per-circuit metric            |     Min |     Max |     Avg |
|-------------------------------|---------|---------|---------|
| Shots per circuit             | 8192    | 8192    | 8192    |
| 2q gates per circuit          |   24    |   24    |   24    |
| 1q gates per circuit          |   57    |   57    |   57    |
| Multi-qubit gates per circuit |    0    |    0    |    0    |
| Measurements per circuit      |    1    |    1    |    1    |
| Resets per circuit            |    0    |    0    |    0    |
| Circuit depth                 |   39    |   39    |   39    |
| HQC per circuit               |  507.99 |  507.99 |  507.99 |

For certain benchmarks (e.g. BSEQ), the device topology is needed to provide a resource estimate:

uv run mgym job estimate metriq_gym/schemas/examples/bseq.example.json --provider ibm --device ibm_pittsburgh

Resource estimate for BSEQ on ibm:ibm_pittsburgh

| Metric                  | Value   |
|-------------------------|---------|
| Jobs                    | 3       |
| Circuits                | 12      |
| Total shots             | 120     |
| Max qubits              | 156     |
| Total 2q gates          | 704     |
| Total 1q gates          | 2_112   |
| Total multi-qubit gates | 0       |
| Total measurements      | 1_408   |
| Total resets            | 0       |
| Total HQCs              | n/a     |

| Per-circuit metric            |   Min |   Max |    Avg |
|-------------------------------|-------|-------|--------|
| Shots per circuit             |    10 |    10 |  10    |
| 2q gates per circuit          |    57 |    60 |  58.67 |
| 1q gates per circuit          |   114 |   240 | 176    |
| Multi-qubit gates per circuit |     0 |     0 |   0    |
| Measurements per circuit      |   114 |   120 | 117.33 |
| Resets per circuit            |     0 |     0 |   0    |
| Circuit depth                 |     4 |     5 |   4.5  |
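
The Min/Max/Avg columns in the per-circuit table can be reproduced with a simple aggregation over per-circuit values. The following is a minimal sketch, not the PR's implementation: `summarize` is a hypothetical helper and the per-circuit values passed to it are made up for illustration. The averages shown above are consistent with dividing each total by the circuit count (for example, 704 two-qubit gates over 12 circuits).

```python
from statistics import mean


def summarize(values: list[float]) -> dict[str, float]:
    """Return min, max, and average (rounded to 2 decimals) of per-circuit values."""
    return {
        "min": min(values),
        "max": max(values),
        "avg": round(mean(values), 2),
    }


# Hypothetical per-circuit 2q gate counts, for illustration only.
two_qubit_counts = [57, 60, 59]
stats = summarize(two_qubit_counts)

# Cross-check against the summary table: total 2q gates / number of circuits.
avg_from_totals = round(704 / 12, 2)
```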

(Note that HQC is n/a as the target provider was IBM and only Quantinuum uses HQC).
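
For context, HQC cost is usually described by Quantinuum as a fixed offset plus a shot-weighted sum of one-qubit gates, two-qubit gates, and measurement/state-preparation operations. The sketch below assumes that commonly published form; `hqc_estimate` is a hypothetical name, not the PR's `quantinuum_hqc_formula`, and the PR's function may count operations differently (later commits mention `n_measure` and `num_qubits`), so its results need not match the tables above.

```python
def hqc_estimate(n_1q: int, n_2q: int, n_m: int, shots: int) -> float:
    """Assumed HQC form: 5 + shots * (N_1q + 10 * N_2q + 5 * N_m) / 5000.

    n_1q:  number of one-qubit gates
    n_2q:  number of two-qubit gates
    n_m:   number of measurement / state-preparation operations
    shots: number of repetitions of the circuit
    """
    weighted_ops = n_1q + 10 * n_2q + 5 * n_m
    return 5.0 + shots * weighted_ops / 5000.0


# An empty circuit still costs the fixed offset.
baseline = hqc_estimate(0, 0, 0, shots=16)
```

Under this form, two-qubit gates dominate the estimate, which is why the summary table reports them separately from one-qubit gates.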

(tagging @nonhermitian for visibility)

Copilot

Pull Request Overview

Adds a resource estimation workflow to the CLI to compute pre-dispatch gate/shots/qubit metrics (and Quantinuum HQCs) for supported benchmarks. Key additions include a new resource_estimation module, a job estimate CLI subcommand, provider validation updates, and accompanying tests and docs.

New resource_estimation.py implementing circuit batching, gate counting, aggregation, HQC formula, and pretty-print output
Added mgym job estimate CLI workflow integrated into run.main dispatch table
Tests and documentation updated to cover estimation logic and behavior

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| metriq_gym/resource_estimation.py | Implements core resource estimation logic, aggregation, HQC formula, and formatted output. |
| metriq_gym/run.py | Adds estimate_job handler, provider allowlist, and dispatch table entry. |
| metriq_gym/cli.py | Introduces the job estimate subcommand and related arguments. |
| tests/unit/test_run.py | Adds tests for device validation and new estimate workflow behaviors. |
| tests/unit/test_resource_estimation.py | New tests validating WIT estimation counts and HQC calculation. |
| tests/unit/test_cli.py | Adjusts provider naming in expected output (ibmq -> ibm). |
| docs/source/cli_workflows.rst | Documents new estimation workflow and HQC formula. |

Co-authored-by: Copilot <[email protected]>

cosenal

It would be nice to have a single crisp proxy-to-the-cost number for each provider. However, I am not sure if this is possible with some providers. And I am not even sure if this is in the scope of metriq-gym.
Let's hold on this for now.

nathanshammah · 2025-10-22T15:56:49Z

It would be nice to explicitly spell out "Hardware Quantum Credits" as HQCs.

Changhao-Li

LGTM!

cosenal · 2025-11-07T15:10:39Z

notes from @nonhermitian:

confirm that circuit depth is 2q gates depth
more explicit on multi-qubit (3+) definition
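
One way to make both definitions explicit is sketched below. This is an illustration under stated assumptions, not code from the PR: circuits are represented by a hypothetical list of `(gate_name, qubits)` tuples, "multi-qubit" means a gate acting on three or more qubits, and "2q depth" counts only layers of two-qubit gates (one-qubit, multi-qubit, measure, and reset operations are ignored when computing depth).

```python
from collections import Counter

# Hypothetical circuit representation: (gate_name, qubit_indices).
Op = tuple[str, tuple[int, ...]]


def classify_ops(ops: list[Op]) -> Counter:
    """Count operations by kind; 'multi_qubit' means 3 or more qubits."""
    counts: Counter = Counter()
    for name, qubits in ops:
        if name == "measure":
            counts["measurements"] += 1
        elif name == "reset":
            counts["resets"] += 1
        elif len(qubits) == 1:
            counts["one_qubit"] += 1
        elif len(qubits) == 2:
            counts["two_qubit"] += 1
        elif len(qubits) >= 3:
            counts["multi_qubit"] += 1
    return counts


def two_qubit_depth(ops: list[Op], num_qubits: int) -> int:
    """Depth counting only two-qubit gates: the longest chain of 2q gates sharing qubits."""
    layer = [0] * num_qubits  # last 2q layer in which each qubit was used
    for name, qubits in ops:
        if name in ("measure", "reset") or len(qubits) != 2:
            continue
        depth = max(layer[q] for q in qubits) + 1
        for q in qubits:
            layer[q] = depth
    return max(layer, default=0)


ops: list[Op] = [
    ("h", (0,)),
    ("cx", (0, 1)),
    ("cx", (1, 2)),
    ("cx", (3, 4)),
    ("ccx", (0, 1, 2)),
    ("measure", (0,)),
]
counts = classify_ops(ops)
depth = two_qubit_depth(ops, num_qubits=5)
```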

cosenal

What's left here is that the code in the estimate handler of each benchmark duplicates the code that constructs the circuits in the dispatch_handler.
Instead, in each benchmark, we want a single function that constructs the circuits and is used by both dispatch and estimate.
Alternatively, we could design it so that the estimate action is just a "dry-run" of the dispatch function.
Either way, we don't want two sources of truth for the circuit construction phase in each benchmark.

vprusso · 2025-11-24T19:02:00Z

What's left here is that the code in the estimate handler of each benchmark duplicates the code that constructs the circuits in the dispatch_handler. Instead, in each benchmark, we want a single function that constructs the circuits and is used by both dispatch and estimate. Alternatively, we could design it so that the estimate action is just a "dry-run" of the dispatch function. Either way, we don't want two sources of truth for the circuit construction phase in each benchmark.

To make this a bit cleaner, I extracted the circuit construction into a shared helper method that both handlers call. I think this might be "cleaner" than the dry-run approach, since it gives a more explicit separation and allows for easier testing.
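
The shared-helper design described above can be sketched roughly as follows. All names here (`Circuit`, `ExampleBenchmark`, `_build_circuits`, `submit`) are hypothetical stand-ins chosen for illustration and are not the identifiers used in metriq-gym; the point is only that dispatch and estimate both call the same construction function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Circuit:
    """Hypothetical stand-in for a real circuit object."""
    num_qubits: int
    two_qubit_gates: int


class ExampleBenchmark:
    def __init__(self, num_qubits: int, shots: int) -> None:
        self.num_qubits = num_qubits
        self.shots = shots

    def _build_circuits(self) -> list[Circuit]:
        """Single source of truth for circuit construction."""
        return [Circuit(self.num_qubits, two_qubit_gates=self.num_qubits - 1)]

    def dispatch(self, submit: Callable[[list[Circuit], int], str]) -> str:
        """Build the circuits and hand them to a submission callback."""
        return submit(self._build_circuits(), self.shots)

    def estimate(self) -> dict[str, int]:
        """Build the same circuits, but only count resources."""
        circuits = self._build_circuits()
        return {
            "circuits": len(circuits),
            "total_shots": self.shots * len(circuits),
            "max_qubits": max(c.num_qubits for c in circuits),
            "total_2q_gates": sum(c.two_qubit_gates for c in circuits),
        }


bench = ExampleBenchmark(num_qubits=4, shots=10)
submitted: list[tuple[list[Circuit], int]] = []
job_id = bench.dispatch(lambda circuits, shots: submitted.append((circuits, shots)) or "job-0")
report = bench.estimate()
```

Because the builder is a plain method, a test can assert that the circuits seen by the submission callback equal the ones the estimate was computed from, without mocking a device.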

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Copilot · 2025-11-25T16:18:31Z

tests/unit/test_run.py

+# def test_estimate_job_quantinuum_defaults(monkeypatch, capsys):
+#     class DummyParams(BaseModel):
+#         benchmark_name: str = "WIT"
+#         num_qubits: int = 6
+#         shots: int = 16
+
+#     captured = {}
+
+#     monkeypatch.setattr("os.path.exists", lambda _: True)
+#     monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams())
+#     monkeypatch.setattr(
+#         "metriq_gym.run.setup_device",
+#         lambda *_, **__: SimpleNamespace(id="H1-1", profile=SimpleNamespace(basis_gates=[])),
+#     )
+
+#     def fake_estimate(job_type, params, device, hqc_fn=None):
+#         counts = GateCounts()
+#         hqc_value = hqc_fn(counts, 16) if hqc_fn else None
+#         captured["hqc"] = hqc_value
+#         circuit_estimate = CircuitEstimate(
+#             job_index=0,
+#             circuit_index=0,
+#             qubit_count=6,
+#             shots=16,
+#             gate_counts=counts,
+#             depth=1,
+#             hqc=hqc_value,
+#         )
+#         return ResourceEstimate(
+#             job_count=1,
+#             circuit_count=1,
+#             total_shots=16,
+#             max_qubits=6,
+#             total_gate_counts=counts,
+#             hqc_total=hqc_value,
+#             per_circuit=[circuit_estimate],
+#         )
+
+#     monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate)
+
+#     args = SimpleNamespace(
+#         config="foo.json",
+#         provider="quantinuum",
+#         device="H1-1",
+#     )
+
+#     estimate_job(args, MagicMock())
+
+#     expected = quantinuum_hqc_formula(GateCounts(), 16)
+#     assert abs(captured["hqc"] - expected) < 1e-6
+
+
+# def test_estimate_job_without_device_wit(monkeypatch, capsys):
+#     class DummyParams(BaseModel):
+#         benchmark_name: str = "WIT"
+#         num_qubits: int = 6
+#         shots: int = 16
+
+#     captured = {}
+
+#     monkeypatch.setattr("os.path.exists", lambda *_: True)
+#     monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams())
+
+#     def fail_setup(*_args, **_kwargs):
+#         raise AssertionError("setup_device should not be called when device is omitted")
+
+#     monkeypatch.setattr("metriq_gym.run.setup_device", fail_setup)
+
+#     def fake_estimate(job_type, params, device, hqc_fn=None):
+#         counts = GateCounts()
+#         captured["device"] = device
+#         circuit_estimate = CircuitEstimate(
+#             job_index=0,
+#             circuit_index=0,
+#             qubit_count=6,
+#             shots=16,
+#             gate_counts=counts,
+#             depth=1,
+#             hqc=None,
+#         )
+#         return ResourceEstimate(
+#             job_count=1,
+#             circuit_count=1,
+#             total_shots=16,
+#             max_qubits=6,
+#             total_gate_counts=counts,
+#             hqc_total=None,
+#             per_circuit=[circuit_estimate],
+#         )
+
+#     monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate)
+
+#     args = SimpleNamespace(
+#         config="foo.json",
+#         provider="quantinuum",
+#         device=None,
+#     )
+
+#     estimate_job(args, MagicMock())
+
+#     output = capsys.readouterr().out
+#     assert "Resource estimate for WIT" in output
+#     assert "(no device)" in output
+#     assert captured["device"] is None
+
+
+# def test_estimate_job_requires_device(monkeypatch, capsys):
+#     class DummyParams(BaseModel):
+#         benchmark_name: str = "BSEQ"
+#         shots: int = 10
+
+#     monkeypatch.setattr("os.path.exists", lambda *_: True)
+#     monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams())
+
+#     def fake_estimate(*_args, **_kwargs):
+#         raise ValueError("BSEQ benchmark requires a device to estimate resources.")
+
+#     monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate)
+
+#     args = SimpleNamespace(
+#         config="foo.json",
+#         provider="aws",
+#         device=None,
+#     )
+
+#     estimate_job(args, MagicMock())
+
+#     output = capsys.readouterr().out
+#     assert "✗ BSEQ" in output
+#     assert "requires a device" in output


Three test functions for the new estimate_job functionality are commented out. These tests cover important scenarios including Quantinuum HQC calculation, device-optional estimation, and device-required benchmarks. Uncomment and ensure these tests pass to validate the resource estimation feature.

Oh, right. @cosenal , do you recall where we landed on these? I think this was part of our collective "merge main adventure".

I think we just postponed fixing them, because we didn't know what code design we were going to land on.

Ah, yeah that makes sense. Okay, fixed in 1698686

Copilot · 2025-11-25T16:18:32Z

metriq_gym/resource_estimation.py

+        ("Total 1q gates", fmt_int(estimate.total_gate_counts.one_qubit)),
+        ("Total 2q gates", fmt_int(estimate.total_gate_counts.two_qubit)),


[nitpick] The order of 1q and 2q gate counts is inconsistent between the summary table (1q before 2q) and the per-circuit statistics table (2q before 1q, lines 206-207). Consider using the same ordering in both tables for consistency.

Copilot · 2025-11-25T16:18:32Z

metriq_gym/benchmarks/qml_kernel.py



 def create_inner_product_circuit(num_qubits: int, seed: int = 0) -> QuantumCircuit:
+    # TODO: Allow seed to be set externally for reproducibility


This TODO comment was introduced but doesn't relate to the resource estimation feature. If this is a pre-existing issue, it should be tracked separately rather than included in this PR.

Suggested change

# TODO: Allow seed to be set externally for reproducibility

This seems like it should be tracked as a separate issue. Did we add this, @cosenal ? (I don't quite recall).

I don't recall adding this comment either, but it's out of scope for this PR anyway, so feel free to delete it.

Done in fc5c71a

feat: resource estimation workflow for benchmark jobs.

9974efd

vprusso marked this pull request as ready for review October 3, 2025 17:47

Copilot AI review requested due to automatic review settings October 3, 2025 17:47

Copilot AI reviewed Oct 3, 2025

vprusso and others added 6 commits October 3, 2025 14:03

Update metriq_gym/run.py

4f19079

Co-authored-by: Copilot <[email protected]>

Update metriq_gym/resource_estimation.py

8892ac7

Co-authored-by: Copilot <[email protected]>

Update metriq_gym/resource_estimation.py

b497d18

Co-authored-by: Copilot <[email protected]>

Update docs/source/cli_workflows.rst

0ef5469

Co-authored-by: Copilot <[email protected]>

feat: resource estimation workflow for benchmark jobs.

ca3f220

feat: resource estimation workflow for benchmark jobs.

d5a541d

cosenal reviewed Oct 7, 2025

willzeng requested a review from Changhao-Li November 3, 2025 14:17

Changhao-Li approved these changes Nov 3, 2025

vprusso requested a review from cosenal November 3, 2025 21:18

cosenal added 8 commits November 4, 2025 18:37

Merge branch 'main' into 559-resource-estimation-benchmark

f420303

file added by mistake

996d24c

Merge branch 'main' into 559-resource-estimation-benchmark

3e93c4a

init device variable

790be3d

Merge branch 'main' into 559-resource-estimation-benchmark

4397573

delegate resource estimation to benchmarks

67c07df

clean up unused fns

040070f

update all benchmarks to new hook

1f8b783

cosenal and others added 2 commits November 10, 2025 18:26

better label

6c37c26

chore: support for quantinuum and bseq

db096df

bachase reviewed Nov 14, 2025

vprusso added 3 commits November 18, 2025 07:48

refactor: use misra-gries algorithm for bseq

bc40162

Merge branch 'main' into 559-resource-estimation-benchmark

3db430a

chore: update n_measure

10f9000

fix: num_qubits add to hqc formula

e4e35ee

cosenal requested changes Nov 23, 2025

vprusso added 4 commits November 24, 2025 07:17

chore: mege main

07d908d

chore: merge main

868fe2a

chore: adding resource estimates to other benchmarks

a6e17f9

fix: mypy

b28d019

cosenal requested review from bachase, Copilot and cosenal November 25, 2025 16:17

Copilot AI reviewed Nov 25, 2025

vprusso added 5 commits November 25, 2025 15:17

chore: fixing blocking dispatch and polling

6ddc73c

chore: undo wrong commit fileg

8ee389c

chore: removing TODO

fc5c71a

Merge branch 'main' into 559-resource-estimation-benchmark

17304e3

fix: commented breaking tests now work

1698686
