feat: report golf compile cost (heartbeats + time) in /check-golf#1356
feat: report golf compile cost (heartbeats + time) in /check-golf#1356Vilin97 wants to merge 6 commits into
Conversation
Add scripts/check_golf.py and a check-golf GitHub workflow. When a `/check-golf` comment is posted on a PR, the bot compares the PR's base and head, checks that no declaration statement (signature/type) changed and only proofs/bodies changed, and upserts a single findings comment (posting it once, editing it on later runs). The Lean sources are parsed textually (no build): comments and whitespace are ignored, declarations are segmented on column-0 command starts, each statement is split from its body at the top-level `:=`, `where`, or equation `|` arm, and `by` proof terms embedded inside a type are treated as proof-irrelevant. Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
|
Thank you for this PR, which will now be reviewed. If submitting to ./Physlib or ./QuantumInfo, please see our review guidelines if you are not familiar with the process. You should expect a back and forth with a reviewer before your PR is merged. See also that link for how to add appropriate labels to your PR. The PR will also go through a number of automated checks. You can learn more about these here, including how to run them locally. If you are submitting to ./PhyslibAlpha there will be a lighter review process, though your PR must still pass the automated checks. If you want to bring attention to this PR, please write a message on this thread of the Lean Zulip. Important: If a reviewer adds an |
The report now also counts, among statement-preserving changes: - proofs where only a newline was removed (identical tokens, fewer lines), which were previously not surfaced at all; and - proofs where tactics were crammed onto one line with a `;` (excluding the `<;>` combinator), with the number of joins introduced. Also relabel "definition bodies changed" to "definition values changed" and spell out in the footer that it means a def/instance/abbrev whose type is unchanged but whose defining term changed. Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
9981797 to
78845d0
Compare
A `def`/`instance`/`abbrev` often carries proof obligations discharged by `by` blocks. Golfing only those does not change the definition's data, so it should not read as "the definition changed". The report now masks a definition's `by` proof blocks and splits its changes into: - "definition proofs golfed" (data unchanged), and - "definition values changed" (the defining term changed outside proofs). On PR leanprover-community#1332 this moves 76 of the 110 former "definition body" changes into the proof-only bucket, leaving 34 genuine value changes. Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
78845d0 to
8854ee6
Compare
A `def foo : T := by refine …` builds its *value* with tactics, so a change inside that `by` block can change the definition itself, not just a discharged proof obligation. Masking it as a proof made such edits look like harmless proof golfs. Only classify a changed def body as "proof golfed" when the term outside `by` proof blocks is identical AND the body is not itself a single `by` block (mask != "<by>"). Otherwise report it as a definition-value change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
8854ee6 to
9172d2f
Compare
…ve changed)" The bucket mixes genuine defining-term changes with defs where only proof parts changed but that can't be proven textually — e.g. a `def := by refine … ?_` whose data is untouched, or a structure-instance whose proof fields were rewritten from tactic to term mode. Asserting "data changed" over-claims those. Relabel to "Definition bodies changed (data *may* have changed, type unchanged)" and add an explanatory note; the high-confidence "Definition proofs golfed (data unchanged)" bucket is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
Extend check_golf.py with `--measure`: for every changed file it compiles the base and head versions and diffs their heartbeats and wall-clock time, adding a "Compile cost" section to the PR comment. Heartbeats are measured with Mathlib's `#count_heartbeats in`, which is only accurate under `Elab.async false` (otherwise the proof elaborates in a background task the counter cannot see). The instrumenter forces synchronous elaboration and prefixes the counter before each declaration, placing it above any attached doc comment so the declaration stays well-formed. The check-golf workflow now builds the head revision (`lake exe cache get` + `lake build`) before measuring, so the base sources can be compiled against the built environment. Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
9172d2f to
27286f3
Compare
Stacked on #1355 — please review that first. This PR adds one commit on top of #1355; its net diff over
mastertherefore also contains #1355's script. Once #1355 merges, this becomes a small delta.What
Extends the
/check-golfbot so its comment also reports the compile cost of a golf: for every changed file it compiles the base and head versions and diffs their heartbeats and wall-clock time.Changes (on top of #1355)
scripts/check_golf.py: new--measureflag.Mathlib.Util.CountHeartbeats, forcingset_option Elab.async false, and prefixing#count_heartbeats inbefore every declaration (placed above any attached doc comment so the declaration stays well-formed)..github/workflows/check-golf.yml: builds the head revision (lake exe cache get+lake build) before measuring, then runs with--measure. The base sources are compiled against the built head environment, which is sound because the check has already verified the statements are unchanged.scripts/README.md: documents--measure.Why the async gymnastics
#count_heartbeatsonly sees ~0 heartbeats under Lean's default async elaboration (the proof runs in a background task).set_option Elab.async falsemakes the count accurate — confirmed by measuring real proofs (e.g. aringproof reports ~1.5k heartbeats instead of ~40).Testing
Validated locally on golfed files, e.g.
HarmonicOscillator/Solution.lean: base 71,914 → head 71,979 heartbeats (+65) and 7.39s → 7.17s — i.e. this golf left compile cost essentially unchanged.