Skip to content

feat: report golf compile cost (heartbeats + time) in /check-golf#1356

Open
Vilin97 wants to merge 6 commits into
leanprover-community:masterfrom
Vilin97:check-golf-timings
Open

feat: report golf compile cost (heartbeats + time) in /check-golf#1356
Vilin97 wants to merge 6 commits into
leanprover-community:masterfrom
Vilin97:check-golf-timings

Conversation

@Vilin97

@Vilin97 Vilin97 commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Stacked on #1355 — please review that first. This PR adds one commit on top of #1355; its net diff over master therefore also contains #1355's script. Once #1355 merges, this becomes a small delta.

What

Extends the /check-golf bot so its comment also reports the compile cost of a golf: for every changed file it compiles the base and head versions and diffs their heartbeats and wall-clock time.

Changes (on top of #1355)

  • scripts/check_golf.py: new --measure flag.
    • Instruments each file by importing Mathlib.Util.CountHeartbeats, forcing set_option Elab.async false, and prefixing #count_heartbeats in before every declaration (placed above any attached doc comment so the declaration stays well-formed).
    • Heartbeats are deterministic; wall-clock time is reported as a secondary, noisier signal.
    • Adds a "Compile cost (base → head)" section: totals (with %) and a table of the largest heartbeat movers.
  • .github/workflows/check-golf.yml: builds the head revision (lake exe cache get + lake build) before measuring, then runs with --measure. The base sources are compiled against the built head environment, which is sound because the check has already verified the statements are unchanged.
  • scripts/README.md: documents --measure.

Why the async gymnastics

#count_heartbeats only sees ~0 heartbeats under Lean's default async elaboration (the proof runs in a background task). set_option Elab.async false makes the count accurate — confirmed by measuring real proofs (e.g. a ring proof reports ~1.5k heartbeats instead of ~40).

Testing

Validated locally on golfed files, e.g. HarmonicOscillator/Solution.lean: base 71,914 → head 71,979 heartbeats (+65) and 7.39s → 7.17s — i.e. this golf left compile cost essentially unchanged.

Add scripts/check_golf.py and a check-golf GitHub workflow. When a
`/check-golf` comment is posted on a PR, the bot compares the PR's base
and head, checks that no declaration statement (signature/type) changed
and only proofs/bodies changed, and upserts a single findings comment
(posting it once, editing it on later runs).

The Lean sources are parsed textually (no build): comments and
whitespace are ignored, declarations are segmented on column-0 command
starts, each statement is split from its body at the top-level `:=`,
`where`, or equation `|` arm, and `by` proof terms embedded inside a
type are treated as proof-irrelevant.

Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Thank you for this PR, which will now be reviewed. If submitting to ./Physlib or ./QuantumInfo, please see our review guidelines if you are not familiar with the process. You should expect a back and forth with a reviewer before your PR is merged. See also that link for how to add appropriate labels to your PR. The PR will also go through a number of automated checks. You can learn more about these here, including how to run them locally.

If you are submitting to ./PhyslibAlpha there will be a lighter review process, though your PR must still pass the automated checks.

If you want to bring attention to this PR, please write a message on this thread of the Lean Zulip.

Important: If a reviewer adds an awaiting-author label to your PR, once you have addressed the review comments, please remove that label by adding a comment with -awaiting-author. This helps us keep track of reviews.

@jstoobysmith jstoobysmith self-assigned this Jul 3, 2026
The report now also counts, among statement-preserving changes:
- proofs where only a newline was removed (identical tokens, fewer lines),
  which were previously not surfaced at all; and
- proofs where tactics were crammed onto one line with a `;` (excluding the
  `<;>` combinator), with the number of joins introduced.

Also relabel "definition bodies changed" to "definition values changed" and
spell out in the footer that it means a def/instance/abbrev whose type is
unchanged but whose defining term changed.

Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
@Vilin97 Vilin97 force-pushed the check-golf-timings branch from 9981797 to 78845d0 Compare July 3, 2026 07:27
A `def`/`instance`/`abbrev` often carries proof obligations discharged by
`by` blocks. Golfing only those does not change the definition's data, so it
should not read as "the definition changed". The report now masks a
definition's `by` proof blocks and splits its changes into:
- "definition proofs golfed" (data unchanged), and
- "definition values changed" (the defining term changed outside proofs).

On PR leanprover-community#1332 this moves 76 of the 110 former "definition body" changes into
the proof-only bucket, leaving 34 genuine value changes.

Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
@Vilin97 Vilin97 force-pushed the check-golf-timings branch from 78845d0 to 8854ee6 Compare July 3, 2026 07:39
A `def foo : T := by refine …` builds its *value* with tactics, so a
change inside that `by` block can change the definition itself, not just
a discharged proof obligation. Masking it as a proof made such edits
look like harmless proof golfs.

Only classify a changed def body as "proof golfed" when the term outside
`by` proof blocks is identical AND the body is not itself a single `by`
block (mask != "<by>"). Otherwise report it as a definition-value change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
@Vilin97 Vilin97 force-pushed the check-golf-timings branch from 8854ee6 to 9172d2f Compare July 3, 2026 17:16
Vilin97 and others added 2 commits July 3, 2026 10:21
…ve changed)"

The bucket mixes genuine defining-term changes with defs where only proof
parts changed but that can't be proven textually — e.g. a `def := by
refine … ?_` whose data is untouched, or a structure-instance whose proof
fields were rewritten from tactic to term mode. Asserting "data changed"
over-claims those. Relabel to "Definition bodies changed (data *may* have
changed, type unchanged)" and add an explanatory note; the high-confidence
"Definition proofs golfed (data unchanged)" bucket is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
Extend check_golf.py with `--measure`: for every changed file it compiles
the base and head versions and diffs their heartbeats and wall-clock time,
adding a "Compile cost" section to the PR comment.

Heartbeats are measured with Mathlib's `#count_heartbeats in`, which is only
accurate under `Elab.async false` (otherwise the proof elaborates in a
background task the counter cannot see). The instrumenter forces synchronous
elaboration and prefixes the counter before each declaration, placing it
above any attached doc comment so the declaration stays well-formed.

The check-golf workflow now builds the head revision (`lake exe cache get`
+ `lake build`) before measuring, so the base sources can be compiled
against the built environment.

Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
@Vilin97 Vilin97 force-pushed the check-golf-timings branch from 9172d2f to 27286f3 Compare July 3, 2026 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants