Skip to content

Conversation

alexkeizer
Copy link
Collaborator

@alexkeizer alexkeizer commented May 14, 2025

This PR changes the implementation of simp_alive_split---which used to call split and split_ifs and thus introduced new subgoals---with a single simp call (and some intros to preserve that behaviour). This is possible as the if statements used in the semantics follow a strict pattern, where one branch of the if is always poison. Such ifs with poison admit straightforward simplification lemmas, which this PR adds, thus removing the need to split on the condition.

NOTE: I've confirmed the new implementation of simp_alive_split successfully closes all the expected goals inAliveStatements, but we ought to wait for the full evaluation to run before merging this PR, as there could be breakage there.

Keep in mind that we do expect the numbers of solved theorems reported by the evaluation to go down, given that we are actively trying to reduce the number of subgoals generated by our automation (hence reducing the number of distinct calls to the SAT solvers).

This PR changes the implementation of `simp_alive_split`---which used to call `split` and `split_ifs` and thus introduced new subgoals---with a single simp call (and some `intros` to preserve that behaviour). This is possible as the `if` statements used in the semantics follow a strict pattern, where one branch of the if is always `poison`. Suchs ifs with poison admit straightforward simplification lemmas, which this PR adds, thus removing the need to split on the condition.
The latter is the simp normal form, so it's better we use that to define semantics
Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

bitwuzla proved and leanSAT failed theorem 2 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 2 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 25 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 26 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file gsremhcanonicalize_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 8 in file gapinthsub_proof.lean
bitwuzla provided counterexample and leanSAT failed theorem 1 in file gapinthrem2_proof.lean
bitwuzla proved and leanSAT failed theorem 12 in file grem_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla provided counterexample and leanSAT failed theorem 1 in file gapinthdiv2_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file greassociatehnuw_proof.lean
leanSAT and Bitwuzla solved: 5020
leanSAT and Bitwuzla provided 25 counterexamples
leanSAT and Bitwuzla both failed on 34 theorems
leanSAT failed and Bitwuzla succeeded on 17 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 112
ran rg 'Bitwuzla failed' | wc -l, expected 34, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 51, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 25, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 25, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 5020, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 5037, found 0, FAIL
error (kernel) deep recursion detected was raised 25 times
error None of the hypotheses are in the supported BitVec fragment after applying preprocessing. was raised 8 times
error The SAT solver timed out while solving the problem. was raised 49 times
error The SMT solver timed out while solving the problem. was raised 30 times

Copy link

Alive Statistics: 90 / 93 (3 failed)

1 similar comment
Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

bitwuzla proved and leanSAT failed theorem 2 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 2 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 25 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 26 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file gsremhcanonicalize_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 8 in file gapinthsub_proof.lean
bitwuzla provided counterexample and leanSAT failed theorem 1 in file gapinthrem2_proof.lean
bitwuzla proved and leanSAT failed theorem 12 in file grem_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla provided counterexample and leanSAT failed theorem 1 in file gapinthdiv2_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file greassociatehnuw_proof.lean
leanSAT and Bitwuzla solved: 5054
leanSAT and Bitwuzla provided 5 counterexamples
leanSAT and Bitwuzla both failed on 30 theorems
leanSAT failed and Bitwuzla succeeded on 17 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 104
ran rg 'Bitwuzla failed' | wc -l, expected 30, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 47, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 5, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 5, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 5054, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 5071, found 0, FAIL
error (kernel) deep recursion detected was raised 25 times
error The SAT solver timed out while solving the problem. was raised 49 times
error The SMT solver timed out while solving the problem. was raised 30 times

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

bitwuzla proved and leanSAT failed theorem 2 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 2 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 25 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 26 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file gsremhcanonicalize_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 8 in file gapinthsub_proof.lean
bitwuzla proved and leanSAT failed theorem 12 in file grem_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file greassociatehnuw_proof.lean
leanSAT and Bitwuzla solved: 4941
leanSAT and Bitwuzla provided 2 counterexamples
leanSAT and Bitwuzla both failed on 30 theorems
leanSAT failed and Bitwuzla succeeded on 17 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 102
ran rg 'Bitwuzla failed' | wc -l, expected 30, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 47, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 2, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 2, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 4941, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 4958, found 0, FAIL
error (kernel) deep recursion detected was raised 25 times
error The SAT solver timed out while solving the problem. was raised 47 times
error The SMT solver timed out while solving the problem. was raised 30 times

Copy link

bitwuzla proved and leanSAT failed theorem 2 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 2 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 27 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 28 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file gsremhcanonicalize_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 8 in file gapinthsub_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file grem_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file greassociatehnuw_proof.lean
leanSAT and Bitwuzla solved: 6084
leanSAT and Bitwuzla provided 3 counterexamples
leanSAT and Bitwuzla both failed on 16 theorems
leanSAT failed and Bitwuzla succeeded on 17 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 74
ran rg 'Bitwuzla failed' | wc -l, expected 16, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 33, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 3, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 3, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 6084, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 6101, found 0, FAIL
error (kernel) deep recursion detected was raised 25 times
error The SAT solver timed out while solving the problem. was raised 33 times
error The SMT solver timed out while solving the problem. was raised 16 times

@alexkeizer alexkeizer marked this pull request as draft May 21, 2025 10:46
@alexkeizer
Copy link
Collaborator Author

This seems to work well enough, but to avoid causing problems with the evaluation I'll not replace the implementation of the existing tactic, but rather make this into a new tactic, which will be specialized for the new SLLVM dialect (which models proper immediate UB).

github-merge-queue bot pushed a commit that referenced this pull request May 22, 2025
This PR adds a parser and proof automation for the new SLLVM dialect,
based on the existing LLVM parser&proof automation, and incorporating
some new stuff I proposed in #1220.

I copied AliveStatements so I wouldn't interfere with the existing
evaluation, and then semi-manually changed it to
* use the SLLVM dialect (i.e., proper UB semantics)
* instantiate rewrites to 64 bits (as I haven't bothered implementing
MetaSLLVM yet), except for one testcase where bitblasting timed-out for
any width > 7.

All test cases that were proven before are now proven with proper UB
semantics. Two rewrites that we sorried before turned out to just be
plain false (even without UB), so I noted down the counter-example and
ignored those test cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant