Skip to content

Conversation

@vprusso
Copy link
Contributor

@vprusso vprusso commented Oct 3, 2025

Closes: #559

  • Enables the ability to acquire high-level resource estimates for a given benchmark before running on a hardware device (example below)
  • CLI workflow to run estimates for a given benchmark
  • Adds tests and documentation to cover the new estimate mgym workflow

Example:

uv run mgym job estimate metriq_gym/schemas/examples/wit.example.json --provider quantinuum

Gives the following:

No device specified; estimating resources without device-specific topology.
Resource estimate for WIT on quantinuum:(no device)

| Metric                  |   Value |
|-------------------------|---------|
| Jobs                    |    1    |
| Circuits                |    1    |
| Total shots             | 8192    |
| Max qubits              |    7    |
| Total 2q gates          |   24    |
| Total 1q gates          |   57    |
| Total multi-qubit gates |    0    |
| Total measurements      |    1    |
| Total resets            |    0    |
| Total HQCs              |  507.99 |

| Per-circuit metric            |     Min |     Max |     Avg |
|-------------------------------|---------|---------|---------|
| Shots per circuit             | 8192    | 8192    | 8192    |
| 2q gates per circuit          |   24    |   24    |   24    |
| 1q gates per circuit          |   57    |   57    |   57    |
| Multi-qubit gates per circuit |    0    |    0    |    0    |
| Measurements per circuit      |    1    |    1    |    1    |
| Resets per circuit            |    0    |    0    |    0    |
| Circuit depth                 |   39    |   39    |   39    |
| HQC per circuit               |  507.99 |  507.99 |  507.99 |

For certain benchmarks (e.g. BSEQ), the device topology is needed to provide a resource estimate:

uv run mgym job estimate metriq_gym/schemas/examples/bseq.example.json --provider ibm --device ibm_pittsburgh
Resource estimate for BSEQ on ibm:ibm_pittsburgh

| Metric                  | Value   |
|-------------------------|---------|
| Jobs                    | 3       |
| Circuits                | 12      |
| Total shots             | 120     |
| Max qubits              | 156     |
| Total 2q gates          | 704     |
| Total 1q gates          | 2_112   |
| Total multi-qubit gates | 0       |
| Total measurements      | 1_408   |
| Total resets            | 0       |
| Total HQCs              | n/a     |

| Per-circuit metric            |   Min |   Max |    Avg |
|-------------------------------|-------|-------|--------|
| Shots per circuit             |    10 |    10 |  10    |
| 2q gates per circuit          |    57 |    60 |  58.67 |
| 1q gates per circuit          |   114 |   240 | 176    |
| Multi-qubit gates per circuit |     0 |     0 |   0    |
| Measurements per circuit      |   114 |   120 | 117.33 |
| Resets per circuit            |     0 |     0 |   0    |
| Circuit depth                 |     4 |     5 |   4.5  |

(Note that HQC is n/a as the target provider was IBM and only Quantinuum uses HQC).

(tagging @nonhermitian for visibility)

@vprusso vprusso marked this pull request as ready for review October 3, 2025 17:47
Copilot AI review requested due to automatic review settings October 3, 2025 17:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds a resource estimation workflow to the CLI to compute pre-dispatch gate/shots/qubit metrics (and Quantinuum HQCs) for supported benchmarks. Key additions include a new resource_estimation module, a job estimate CLI subcommand, provider validation updates, and accompanying tests and docs.

  • New resource_estimation.py implementing circuit batching, gate counting, aggregation, HQC formula, and pretty-print output
  • Added mgym job estimate CLI workflow integrated into run.main dispatch table
  • Tests and documentation updated to cover estimation logic and behavior

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
metriq_gym/resource_estimation.py Implements core resource estimation logic, aggregation, HQC formula, and formatted output.
metriq_gym/run.py Adds estimate_job handler, provider allowlist, and dispatch table entry.
metriq_gym/cli.py Introduces the job estimate subcommand and related arguments.
tests/unit/test_run.py Adds tests for device validation and new estimate workflow behaviors.
tests/unit/test_resource_estimation.py New tests validating WIT estimation counts and HQC calculation.
tests/unit/test_cli.py Adjusts provider naming in expected output (ibmq -> ibm).
docs/source/cli_workflows.rst Documents new estimation workflow and HQC formula.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@cosenal cosenal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a single crisp proxy-to-the-cost number for each provider. However, I am not sure if this is possible with some providers. And I am not even sure if this is in the scope of metriq-gym.
Let's hold on this for now.

@nathanshammah
Copy link
Member

It would be nice to explicitly spell out "Hardware Quantum Credits" as HQCs.

@willzeng willzeng requested a review from Changhao-Li November 3, 2025 14:17
Copy link
Contributor

@Changhao-Li Changhao-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vprusso vprusso requested a review from cosenal November 3, 2025 21:18
@cosenal
Copy link
Contributor

cosenal commented Nov 7, 2025

notes from @nonhermitian:

  1. confirm that circuit depth is 2q gates depth
  2. more explicit on multi-qubit (3+) definition

Copy link
Contributor

@cosenal cosenal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's left here is that the code in the estimate handler of each benchmark is a duplication of the code that construct the circuits in the dispatch_handler.
Instead, in each benchmark, we want to have a single function that construct the circuits, and which is used by both dispatch and estimate.
Alternatively, we could design it in a way that the estimate action is just a "dry-run" of the dispatch function.
Either way, we don't want two sources of truth for the circuit construction phase, in each benchmark.

@vprusso
Copy link
Contributor Author

vprusso commented Nov 24, 2025

What's left here is that the code in the estimate handler of each benchmark is a duplication of the code that construct the circuits in the dispatch_handler. Instead, in each benchmark, we want to have a single function that construct the circuits, and which is used by both dispatch and estimate. Alternatively, we could design it in a way that the estimate action is just a "dry-run" of the dispatch function. Either way, we don't want two sources of truth for the circuit construction phase, in each benchmark.

To make this a bit cleaner, I extracted the circuit construction into a shared helper method that both of the handlers call. I think this might be "cleaner" than the dry-run approach this gives more of an explicit separation and allows for easier testing.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 154 to 283
# def test_estimate_job_quantinuum_defaults(monkeypatch, capsys):
# class DummyParams(BaseModel):
# benchmark_name: str = "WIT"
# num_qubits: int = 6
# shots: int = 16

# captured = {}

# monkeypatch.setattr("os.path.exists", lambda _: True)
# monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams())
# monkeypatch.setattr(
# "metriq_gym.run.setup_device",
# lambda *_, **__: SimpleNamespace(id="H1-1", profile=SimpleNamespace(basis_gates=[])),
# )

# def fake_estimate(job_type, params, device, hqc_fn=None):
# counts = GateCounts()
# hqc_value = hqc_fn(counts, 16) if hqc_fn else None
# captured["hqc"] = hqc_value
# circuit_estimate = CircuitEstimate(
# job_index=0,
# circuit_index=0,
# qubit_count=6,
# shots=16,
# gate_counts=counts,
# depth=1,
# hqc=hqc_value,
# )
# return ResourceEstimate(
# job_count=1,
# circuit_count=1,
# total_shots=16,
# max_qubits=6,
# total_gate_counts=counts,
# hqc_total=hqc_value,
# per_circuit=[circuit_estimate],
# )

# monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate)

# args = SimpleNamespace(
# config="foo.json",
# provider="quantinuum",
# device="H1-1",
# )

# estimate_job(args, MagicMock())

# expected = quantinuum_hqc_formula(GateCounts(), 16)
# assert abs(captured["hqc"] - expected) < 1e-6


# def test_estimate_job_without_device_wit(monkeypatch, capsys):
# class DummyParams(BaseModel):
# benchmark_name: str = "WIT"
# num_qubits: int = 6
# shots: int = 16

# captured = {}

# monkeypatch.setattr("os.path.exists", lambda *_: True)
# monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams())

# def fail_setup(*_args, **_kwargs):
# raise AssertionError("setup_device should not be called when device is omitted")

# monkeypatch.setattr("metriq_gym.run.setup_device", fail_setup)

# def fake_estimate(job_type, params, device, hqc_fn=None):
# counts = GateCounts()
# captured["device"] = device
# circuit_estimate = CircuitEstimate(
# job_index=0,
# circuit_index=0,
# qubit_count=6,
# shots=16,
# gate_counts=counts,
# depth=1,
# hqc=None,
# )
# return ResourceEstimate(
# job_count=1,
# circuit_count=1,
# total_shots=16,
# max_qubits=6,
# total_gate_counts=counts,
# hqc_total=None,
# per_circuit=[circuit_estimate],
# )

# monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate)

# args = SimpleNamespace(
# config="foo.json",
# provider="quantinuum",
# device=None,
# )

# estimate_job(args, MagicMock())

# output = capsys.readouterr().out
# assert "Resource estimate for WIT" in output
# assert "(no device)" in output
# assert captured["device"] is None


# def test_estimate_job_requires_device(monkeypatch, capsys):
# class DummyParams(BaseModel):
# benchmark_name: str = "BSEQ"
# shots: int = 10

# monkeypatch.setattr("os.path.exists", lambda *_: True)
# monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams())

# def fake_estimate(*_args, **_kwargs):
# raise ValueError("BSEQ benchmark requires a device to estimate resources.")

# monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate)

# args = SimpleNamespace(
# config="foo.json",
# provider="aws",
# device=None,
# )

# estimate_job(args, MagicMock())

# output = capsys.readouterr().out
# assert "✗ BSEQ" in output
# assert "requires a device" in output
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three test functions for the new estimate_job functionality are commented out. These tests cover important scenarios including Quantinuum HQC calculation, device-optional estimation, and device-required benchmarks. Uncomment and ensure these tests pass to validate the resource estimation feature.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. @cosenal , do you recall where we landed on these? I think this was part of our collective "merge main adventure".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just postponed fixing them, because we didn't know what code design we were going to land on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah that makes sense. Okay, fixed in 1698686

Comment on lines +179 to +180
("Total 1q gates", fmt_int(estimate.total_gate_counts.one_qubit)),
("Total 2q gates", fmt_int(estimate.total_gate_counts.two_qubit)),
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The order of 1q and 2q gate counts is inconsistent between the summary table (1q before 2q) and the per-circuit statistics table (2q before 1q, lines 206-207). Consider using the same ordering in both tables for consistency.

Copilot uses AI. Check for mistakes.


def create_inner_product_circuit(num_qubits: int, seed: int = 0) -> QuantumCircuit:
# TODO: Allow seed to be set externally for reproducibility
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO comment was introduced but doesn't relate to the resource estimation feature. If this is a pre-existing issue, it should be tracked separately rather than included in this PR.

Suggested change
# TODO: Allow seed to be set externally for reproducibility

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be tracked as a separate issue. Did we add this, @cosenal ? (I don't quite recall).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall adding this comment either, but it's out of scope for this PR anyway, so feel free to delete it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fc5c71a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tests should include a resouce estimate

6 participants