Skip to content

Conversation

alexkeizer
Copy link
Collaborator

This adds a dockerfile that has the right dependencies to run the experiment scripts.

Mostly taken from the paper repo, except here I use the actual files from the checkout, rather than downloading a new clone of the repo, hence I moved it to the lean-mlir repository.

I also added a run-script.sh wrapper, that you can use as ./run-script.sh collect-data-llvm-symbolic.py to run the specified script inside a docker container (assuming the image was previously built, e.g., with make docker-image).

There is still some work left; primarily making sure whatever directories we use to output data to in the scripts are actually mounted to the host fs.

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

2 similar comments
Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

2 similar comments
Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

Alive Statistics: 90 / 93 (3 failed)

Copy link

bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 19 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 12 in file gnarrowhmath_proof.lean
bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean
bitwuzla proved and leanSAT failed theorem 17 in file grem_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gtrunc_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gadd_or_sub_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file gshifthshift_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gadd2_proof.lean
bitwuzla proved and leanSAT failed theorem 5 in file g2012h08h28hudiv_ashl_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean
leanSAT and Bitwuzla solved: 7914
leanSAT and Bitwuzla provided 8 counterexamples
leanSAT and Bitwuzla both failed on 28 theorems
leanSAT failed and Bitwuzla succeeded on 26 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 107
ran rg 'Bitwuzla failed' | wc -l, expected 28, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 54, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 7914, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 7940, found 0, FAIL
error (kernel) deep recursion detected was raised 25 times
error The SAT solver timed out while solving the problem. was raised 54 times
error The SMT solver timed out while solving the problem. was raised 28 times

Copy link

bitwuzla and leanSAT provided counterexample for theorem 3 in file gapinthcast_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 19 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 12 in file gnarrowhmath_proof.lean
bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean
bitwuzla proved and leanSAT failed theorem 17 in file grem_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 13 in file gapinthselect_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 15 in file gapinthselect_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 1 in file gapinthdiv2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 5 in file gapinthdiv2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 7 in file gapinthdiv2_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gadd_or_sub_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file gshifthshift_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gtrunchinseltpoison_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gadd2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 9 in file gapinthadd_proof.lean
bitwuzla proved and leanSAT failed theorem 5 in file g2012h08h28hudiv_ashl_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 9 in file gapinthand_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean
In total we found 7976 goals.
leanSAT and Bitwuzla solved: 7914
leanSAT and Bitwuzla provided 8 counterexamples
leanSAT and Bitwuzla both failed on 29 theorems
leanSAT failed and Bitwuzla succeeded on 25 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 108
ran rg 'Bitwuzla failed' | wc -l, expected 29, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 54, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 7914, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 7939, found 0, FAIL
error (kernel) deep recursion detected was raised 25 times
error The SAT solver timed out while solving the problem. was raised 54 times
error The SMT solver timed out while solving the problem. was raised 29 times

github-merge-queue bot pushed a commit that referenced this pull request Jun 22, 2025
This PR adds a minimal dockerfile for the artifact evaluation. #898 had
a rather large diff with main and included a few changes non strictly
docker-related. However, for the time being, I'd like to start adding
this minimal docker, which I will then update step-by-step as we
finalize the structure of our evaluation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants