
Add Yul AST comparator tool #16528

Open
clonker wants to merge 3 commits into develop from yul_cmp_ast

@clonker
Member

@clonker clonker commented Mar 17, 2026

Adds a Yul AST comparator tool yulASTComparator that structurally compares two Yul ASTs, treating variable and function names as equivalent if they correspond 1:1 via a scoped bidirectional map. This is useful for verifying that optimizer changes or internal renaming (e.g., switching name-generation schemes) preserve semantic equivalence. The tool reports the first point of divergence with a path and a reason.
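To make the idea of name-insensitive structural comparison concrete, here is a minimal Python sketch. It is not the PR's C++ implementation: the tuple-based toy AST, the `ScopedBijection` class and the `compare` function are all hypothetical names made up for illustration, and function definitions are left out. It only shows how a scoped bidirectional map can accept consistent renamings while reporting the first divergence with a path and a reason.

```python
class ScopedBijection:
    """Stack of scopes; each scope holds a left->right and a right->left name map."""

    def __init__(self):
        self._scopes = [({}, {})]

    def push(self):
        self._scopes.append(({}, {}))

    def pop(self):
        self._scopes.pop()

    def declare(self, a, b):
        forward, backward = self._scopes[-1]
        forward[a] = b
        backward[b] = a

    def _lookup(self, side, name):
        # Innermost scope wins.
        for scope in reversed(self._scopes):
            if name in scope[side]:
                return scope[side][name]
        return None

    def matches(self, a, b):
        forward = self._lookup(0, a)
        backward = self._lookup(1, b)
        if forward is None and backward is None:
            # Neither side declared the name (e.g. a builtin): require equality.
            return a == b
        return forward == b and backward == a


def compare_lists(xs, ys, names, path):
    if len(xs) != len(ys):
        return path, f"length differs: {len(xs)} vs {len(ys)}"
    for i, (x, y) in enumerate(zip(xs, ys)):
        diff = compare(x, y, names, f"{path}[{i}]")
        if diff is not None:
            return diff
    return None


def compare(a, b, names=None, path="root"):
    """Return None if equivalent, else (path, reason) for the first divergence."""
    if names is None:
        names = ScopedBijection()
    if a[0] != b[0]:
        return path, f"node kind differs: {a[0]} vs {b[0]}"
    kind = a[0]
    if kind == "lit":
        return None if a[1] == b[1] else (path, f"literal differs: {a[1]} vs {b[1]}")
    if kind == "id":
        return None if names.matches(a[1], b[1]) else (path, f"name mismatch: {a[1]} vs {b[1]}")
    if kind == "let":
        # Compare the initializer before the new names become visible.
        diff = compare(a[2], b[2], names, path + "/value")
        if diff is None:
            names.declare(a[1], b[1])
        return diff
    if kind == "call":
        if not names.matches(a[1], b[1]):
            return path, f"callee mismatch: {a[1]} vs {b[1]}"
        return compare_lists(a[2], b[2], names, path + "/args")
    if kind == "block":
        names.push()
        try:
            return compare_lists(a[1], b[1], names, path + "/block")
        finally:
            names.pop()
    raise ValueError(f"unknown node kind: {kind}")


def let_add(var, first, lhs, rhs):
    """Toy AST for: { let <first> := 1  let <var> := add(<lhs>, <rhs>) }"""
    call = ("call", "add", [("id", lhs), ("id", rhs)])
    return ("block", [("let", first, ("lit", 1)), ("let", var, call)])


left = let_add("y", "x", "x", "x")
renamed = let_add("b", "a", "a", "a")    # consistent renaming of left
diverged = let_add("b", "a", "a", "b")   # second argument refers to a different variable

print(compare(left, renamed))
print(compare(left, diverged))
```

In this sketch, `renamed` compares as equivalent because `x` and `a` correspond 1:1, while `diverged` is rejected at the second call argument, where `x` would have to map to something other than `a`.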

@clonker clonker force-pushed the yul_cmp_ast branch 2 times, most recently from 05bedb8 to fc717f0 Compare March 17, 2026 13:08
@clonker clonker marked this pull request as ready for review March 17, 2026 13:13
Contributor

@msooseth msooseth left a comment


This is very cool! Maybe we could also write a printer that allows us to pretty-print the AST and then we can diff it? May be helpful in the long run to debug/understand differences?

I am stupid, that's yul.

@clonker clonker force-pushed the yul_cmp_ast branch 2 times, most recently from 2562bfa to af8eb53 Compare March 17, 2026 17:24
Collaborator

@cameel cameel left a comment


For now I only checked the overall structure. I will need another pass to actually check the implementation for correctness.

Generally the tool looks like a good idea. Would be even better if this was a proper diff that does not stop at the first difference, but that's much more work. For what we need this is already good enough.

TBH I don't like how long the Python script is, but it still looks useful. I'd just prefer it to be less fuzzy about what it does. I bet it has tons of false positives and false negatives in general usage outside of the AST ID PR.

I left some initial comments, but that's not a full review yet.

)
target_link_libraries(libYulASTComparator PUBLIC solidity)

add_executable(yulASTComparator main.cpp)
Collaborator


Suggested change
add_executable(yulASTComparator main.cpp)
add_executable(yul-ast-comparator main.cpp)

Or maybe we should give the executable a shorter name. E.g. yulcmp or yuldiff?

return object;
}

int main(int argc, char* argv[])
Collaborator

@cameel cameel Mar 27, 2026


All our executables should have standard top-level error handlers to catch failed asserts and unhandled exceptions from lower layers. See for example solc/main.cpp.

print(f" Skipped: {len(skipped)}")
print("=" * 50)

if mismatches:
Collaborator


Truthy comparisons are evil.

Suggested change
if mismatches:
if len(mismatches) != 0:
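One way to read the objection, sketched under the assumption that `mismatches` is meant to always be a list: a truthiness test treats `None` and an empty list the same, so a value of the wrong type can pass silently, while the explicit length check fails loudly. The `report` helper below is hypothetical and not part of the script.

```python
def report(mismatches):
    # len() raises TypeError for None, so a missing result cannot pass
    # for "no mismatches" the way it would with `if mismatches:`.
    if len(mismatches) != 0:
        return f"{len(mismatches)} mismatch(es)"
    return "no mismatches"


# Truthiness cannot tell these two apart:
print(bool([]) == bool(None))
print(report([]))
print(report(["a.yul"]))
```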

equiv, msg = run_comparator(comparator, yul_a, yul_b)
if equiv:
    continue
if "PARSE_ERROR" in msg or "TIMEOUT" in msg or "ERROR" in msg:
Collaborator

@cameel cameel Mar 27, 2026


I don't like how fuzzy these comparisons are. Note how "PARSE_ERROR" in msg overlaps with "ERROR" in msg. If not for the fact that you handle both the same way, they'd not be even usable, because these two errors are not distinguishable this way. Why isn't the kind of error just a separate return variable?
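A possible shape for that, as a sketch only: return the kind of outcome as its own value instead of encoding it in the message text. The `Status` enum, the `ComparisonResult` dataclass and `is_skippable` are hypothetical names, not what the script currently defines.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    EQUIVALENT = auto()
    MISMATCH = auto()
    PARSE_ERROR = auto()
    TIMEOUT = auto()
    ERROR = auto()


@dataclass(frozen=True)
class ComparisonResult:
    status: Status
    message: str = ""


def is_skippable(result):
    # Exact membership test on the status; no substring matching on the message.
    return result.status in {Status.PARSE_ERROR, Status.TIMEOUT, Status.ERROR}


# The overlap the substring checks suffer from: this is True even though
# the message describes a parse error, not a generic one.
print("ERROR" in "PARSE_ERROR: unexpected token")

# With a separate status the two kinds stay distinguishable.
print(Status.PARSE_ERROR is Status.ERROR)
```

A `run_comparator` that returned such a `ComparisonResult` would let the caller branch on `result.status` and keep `result.message` purely for display.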

Comment on lines +160 to +165
if "cmdlineTests" in path:
if path.endswith("output.json"):
return "json"
if path.endswith("/output") or path.endswith("/err"):
return "text"
return None
Collaborator


So e.g. input.json will not be classified as 'json'? Is that intentional?

Collaborator


The checks you have here are a bit too fuzzy for my taste.

You should at least use pathlib. For example your check will match somethingoutput.json while with pathlib a simple path.name == 'output.json' by default does the right thing. Same with matching the directory names - I'd not use in for those.

The fact that things that are not classified are just skipped is not great either.
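Roughly what the pathlib version could look like, as a sketch: the function name `classify_expectation` and the `strict` flag are made up here, and `PurePosixPath` is used only so the example behaves the same on every platform. Directory and file names are compared as whole path components rather than as substrings.

```python
from pathlib import PurePosixPath


def classify_expectation(path_str, strict=False):
    path = PurePosixPath(path_str)
    # Match the directory as a whole path component, not as a substring.
    if "cmdlineTests" in path.parts:
        if path.name == "output.json":
            return "json"
        if path.name in ("output", "err"):
            return "text"
    if strict:
        # Optional: fail instead of silently skipping unclassified files.
        raise ValueError(f"cannot classify {path_str}")
    return None


print(classify_expectation("test/cmdlineTests/foo/output.json"))
print(classify_expectation("test/cmdlineTests/foo/somethingoutput.json"))
print(classify_expectation("test/cmdlineTests/foo/err"))
```

Whether `input.json` should also count as `json` is still the open question from the previous comment; the sketch leaves it unclassified, as the original code does.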


3 participants