-
Notifications
You must be signed in to change notification settings - Fork 12
feat: resource estimation workflow for benchmark jobs. #566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds a resource estimation workflow to the CLI to compute pre-dispatch gate/shots/qubit metrics (and Quantinuum HQCs) for supported benchmarks. Key additions include a new resource_estimation module, a job estimate CLI subcommand, provider validation updates, and accompanying tests and docs.
- New resource_estimation.py implementing circuit batching, gate counting, aggregation, HQC formula, and pretty-print output
- Added mgym job estimate CLI workflow integrated into run.main dispatch table
- Tests and documentation updated to cover estimation logic and behavior
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| metriq_gym/resource_estimation.py | Implements core resource estimation logic, aggregation, HQC formula, and formatted output. |
| metriq_gym/run.py | Adds estimate_job handler, provider allowlist, and dispatch table entry. |
| metriq_gym/cli.py | Introduces the job estimate subcommand and related arguments. |
| tests/unit/test_run.py | Adds tests for device validation and new estimate workflow behaviors. |
| tests/unit/test_resource_estimation.py | New tests validating WIT estimation counts and HQC calculation. |
| tests/unit/test_cli.py | Adjusts provider naming in expected output (ibmq -> ibm). |
| docs/source/cli_workflows.rst | Documents new estimation workflow and HQC formula. |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
cosenal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a single crisp proxy-to-the-cost number for each provider. However, I am not sure if this is possible with some providers. And I am not even sure if this is in the scope of metriq-gym.
Let's hold on this for now.
|
It would be nice to explicitly spell out "Hardware Quantum Credits" as HQCs. |
Changhao-Li
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
notes from @nonhermitian:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's left here is that the code in the estimate handler of each benchmark is a duplication of the code that construct the circuits in the dispatch_handler.
Instead, in each benchmark, we want to have a single function that construct the circuits, and which is used by both dispatch and estimate.
Alternatively, we could design it in a way that the estimate action is just a "dry-run" of the dispatch function.
Either way, we don't want two sources of truth for the circuit construction phase, in each benchmark.
To make this a bit cleaner, I extracted the circuit construction into a shared helper method that both of the handlers call. I think this might be "cleaner" than the dry-run approach this gives more of an explicit separation and allows for easier testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests/unit/test_run.py
Outdated
| # def test_estimate_job_quantinuum_defaults(monkeypatch, capsys): | ||
| # class DummyParams(BaseModel): | ||
| # benchmark_name: str = "WIT" | ||
| # num_qubits: int = 6 | ||
| # shots: int = 16 | ||
|
|
||
| # captured = {} | ||
|
|
||
| # monkeypatch.setattr("os.path.exists", lambda _: True) | ||
| # monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams()) | ||
| # monkeypatch.setattr( | ||
| # "metriq_gym.run.setup_device", | ||
| # lambda *_, **__: SimpleNamespace(id="H1-1", profile=SimpleNamespace(basis_gates=[])), | ||
| # ) | ||
|
|
||
| # def fake_estimate(job_type, params, device, hqc_fn=None): | ||
| # counts = GateCounts() | ||
| # hqc_value = hqc_fn(counts, 16) if hqc_fn else None | ||
| # captured["hqc"] = hqc_value | ||
| # circuit_estimate = CircuitEstimate( | ||
| # job_index=0, | ||
| # circuit_index=0, | ||
| # qubit_count=6, | ||
| # shots=16, | ||
| # gate_counts=counts, | ||
| # depth=1, | ||
| # hqc=hqc_value, | ||
| # ) | ||
| # return ResourceEstimate( | ||
| # job_count=1, | ||
| # circuit_count=1, | ||
| # total_shots=16, | ||
| # max_qubits=6, | ||
| # total_gate_counts=counts, | ||
| # hqc_total=hqc_value, | ||
| # per_circuit=[circuit_estimate], | ||
| # ) | ||
|
|
||
| # monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate) | ||
|
|
||
| # args = SimpleNamespace( | ||
| # config="foo.json", | ||
| # provider="quantinuum", | ||
| # device="H1-1", | ||
| # ) | ||
|
|
||
| # estimate_job(args, MagicMock()) | ||
|
|
||
| # expected = quantinuum_hqc_formula(GateCounts(), 16) | ||
| # assert abs(captured["hqc"] - expected) < 1e-6 | ||
|
|
||
|
|
||
| # def test_estimate_job_without_device_wit(monkeypatch, capsys): | ||
| # class DummyParams(BaseModel): | ||
| # benchmark_name: str = "WIT" | ||
| # num_qubits: int = 6 | ||
| # shots: int = 16 | ||
|
|
||
| # captured = {} | ||
|
|
||
| # monkeypatch.setattr("os.path.exists", lambda *_: True) | ||
| # monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams()) | ||
|
|
||
| # def fail_setup(*_args, **_kwargs): | ||
| # raise AssertionError("setup_device should not be called when device is omitted") | ||
|
|
||
| # monkeypatch.setattr("metriq_gym.run.setup_device", fail_setup) | ||
|
|
||
| # def fake_estimate(job_type, params, device, hqc_fn=None): | ||
| # counts = GateCounts() | ||
| # captured["device"] = device | ||
| # circuit_estimate = CircuitEstimate( | ||
| # job_index=0, | ||
| # circuit_index=0, | ||
| # qubit_count=6, | ||
| # shots=16, | ||
| # gate_counts=counts, | ||
| # depth=1, | ||
| # hqc=None, | ||
| # ) | ||
| # return ResourceEstimate( | ||
| # job_count=1, | ||
| # circuit_count=1, | ||
| # total_shots=16, | ||
| # max_qubits=6, | ||
| # total_gate_counts=counts, | ||
| # hqc_total=None, | ||
| # per_circuit=[circuit_estimate], | ||
| # ) | ||
|
|
||
| # monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate) | ||
|
|
||
| # args = SimpleNamespace( | ||
| # config="foo.json", | ||
| # provider="quantinuum", | ||
| # device=None, | ||
| # ) | ||
|
|
||
| # estimate_job(args, MagicMock()) | ||
|
|
||
| # output = capsys.readouterr().out | ||
| # assert "Resource estimate for WIT" in output | ||
| # assert "(no device)" in output | ||
| # assert captured["device"] is None | ||
|
|
||
|
|
||
| # def test_estimate_job_requires_device(monkeypatch, capsys): | ||
| # class DummyParams(BaseModel): | ||
| # benchmark_name: str = "BSEQ" | ||
| # shots: int = 10 | ||
|
|
||
| # monkeypatch.setattr("os.path.exists", lambda *_: True) | ||
| # monkeypatch.setattr("metriq_gym.run.load_and_validate", lambda *_: DummyParams()) | ||
|
|
||
| # def fake_estimate(*_args, **_kwargs): | ||
| # raise ValueError("BSEQ benchmark requires a device to estimate resources.") | ||
|
|
||
| # monkeypatch.setattr("metriq_gym.run.estimate_resources", fake_estimate) | ||
|
|
||
| # args = SimpleNamespace( | ||
| # config="foo.json", | ||
| # provider="aws", | ||
| # device=None, | ||
| # ) | ||
|
|
||
| # estimate_job(args, MagicMock()) | ||
|
|
||
| # output = capsys.readouterr().out | ||
| # assert "✗ BSEQ" in output | ||
| # assert "requires a device" in output |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three test functions for the new estimate_job functionality are commented out. These tests cover important scenarios including Quantinuum HQC calculation, device-optional estimation, and device-required benchmarks. Uncomment and ensure these tests pass to validate the resource estimation feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. @cosenal , do you recall where we landed on these? I think this was part of our collective "merge main adventure".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we just postponed fixing them, because we didn't know what code design we were going to land on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah that makes sense. Okay, fixed in 1698686
| ("Total 1q gates", fmt_int(estimate.total_gate_counts.one_qubit)), | ||
| ("Total 2q gates", fmt_int(estimate.total_gate_counts.two_qubit)), |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The order of 1q and 2q gate counts is inconsistent between the summary table (1q before 2q) and the per-circuit statistics table (2q before 1q, lines 206-207). Consider using the same ordering in both tables for consistency.
metriq_gym/benchmarks/qml_kernel.py
Outdated
|
|
||
|
|
||
| def create_inner_product_circuit(num_qubits: int, seed: int = 0) -> QuantumCircuit: | ||
| # TODO: Allow seed to be set externally for reproducibility |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO comment was introduced but doesn't relate to the resource estimation feature. If this is a pre-existing issue, it should be tracked separately rather than included in this PR.
| # TODO: Allow seed to be set externally for reproducibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it should be tracked as a separate issue. Did we add this, @cosenal ? (I don't quite recall).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall adding this comment either, but it's out of scope for this PR anyway, so feel free to delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in fc5c71a
Closes: #559
estimatemgym workflowExample:
Gives the following:
For certain benchmarks (e.g. BSEQ), the device topology is needed to provide a resource estimate:
(Note that HQC is n/a as the target provider was IBM and only Quantinuum uses HQC).
(tagging @nonhermitian for visibility)