Skip to content

Add builtin erc7201#15968

Merged
matheusaaguiar merged 3 commits intodevelopfrom
builtinERC7201
Mar 11, 2026
Merged

Add builtin erc7201#15968
matheusaaguiar merged 3 commits intodevelopfrom
builtinERC7201

Conversation

@matheusaaguiar
Copy link
Copy Markdown
Contributor

part of #15727.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Apr 10, 2025
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label Apr 10, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Apr 25, 2025
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label Apr 25, 2025
@argotorg argotorg deleted a comment from github-actions bot Apr 25, 2025
@argotorg argotorg deleted a comment from github-actions bot Apr 25, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label May 10, 2025
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label May 10, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label May 25, 2025
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label May 26, 2025
@matheusaaguiar matheusaaguiar force-pushed the builtinERC7201 branch 2 times, most recently from f3cb5c0 to b7d9916 Compare May 29, 2025 09:54
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Jun 26, 2025
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label Jun 26, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Jul 23, 2025
@matheusaaguiar matheusaaguiar removed the stale The issue/PR was marked as stale because it has been open for too long. label Jul 23, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Aug 7, 2025
@matheusaaguiar matheusaaguiar removed the stale The issue/PR was marked as stale because it has been open for too long. label Aug 7, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Aug 22, 2025
@matheusaaguiar matheusaaguiar removed the stale The issue/PR was marked as stale because it has been open for too long. label Aug 22, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Sep 6, 2025
@matheusaaguiar matheusaaguiar removed the stale The issue/PR was marked as stale because it has been open for too long. label Sep 7, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Sep 22, 2025
@matheusaaguiar matheusaaguiar removed the stale The issue/PR was marked as stale because it has been open for too long. label Sep 22, 2025
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Oct 7, 2025
@matheusaaguiar
Copy link
Copy Markdown
Contributor Author

@cameel I think I have covered all of your review comments.

Copy link
Copy Markdown
Collaborator

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need another pass to check the tests, but the code already looks good for the most part.

I found one bug and the evmasm implementation can be simplified, but other than that the comments are mostly wording and readability tweaks at this point. We're pretty close to being done here.

Type const* type;
rational value;
std::variant<std::monostate, rational, std::string> value;
};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something to do now, but just FYI, seeing how tedious using this is and how the two fields can easily become inconsistent, we should turn it into a properly encapsulated class some day. That would mean:

  • a constructor that ensures that the value variant matches type (especially that it's monostate IFF type is null.
  • Helper for checking if it's empty.
  • Getters
  • Helpers that assert that variant contains a specific type and get it from the variant. E.g. asRational() or get<rational>().

<erc7201Builtin>(<namespaceID>)
)");

IRVariable stringArg = convert(*arguments[0], *TypeProvider::stringMemory());
Copy link
Copy Markdown
Collaborator

@cameel cameel Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing how the evmasm version avoids allocation by putting the encoded data past free mem ptr and not finalizing, we could optimize this by doing the same. I don't think it would be worth it just for erc7201(), but keccak256() above uses the same mechanism.

This would make erc7201() and keccak256() with storage and calldata arguments a bit more efficient.

I would not do it in this PR, but it's worth considering as a separate optimization.

@cameel
Copy link
Copy Markdown
Collaborator

cameel commented Mar 2, 2026

Regarding tests, the main missing thing is the distinction between tests for comptime and runtime environment. We should be testing both separately, because the behavior and checks are different.

The test naming could also be a bit more regular. Tests that have something in common could use a common prefix - that always makes it easier to check if we have coverage for something. E.g. tests covering args could all start with erc7201_arg_.

@matheusaaguiar
Copy link
Copy Markdown
Contributor Author

@cameel, I organized the tests as you suggested and then I noticed that the legacy codegen has a problem when the param is in storage (and is not constant).
semanticTests/builtinFunctions/erc7201_equivalent_solidity_spec_comptime.sol and syntaxTests/globalFunctions/erc7201_builtin_param_unicode_literal.sol are two tests that are failing because of that.
I spent quite some time and haven't yet figured out what is wrong. The hand-crafted version from previous commits works, but not this recent version reusing the Yul function from IR pipeline. I was suspecting something with the way CompilerUtils::packedEncode/encodeToMemory is converting from storage but can't see anything wrong.

// stack layout: string_slot
utils().fetchFreeMemoryPointer();
// stack layout: string_slot mem_start_ptr
m_context << Instruction::SWAP1 << Instruction::DUP2;
Copy link
Copy Markdown
Contributor Author

@matheusaaguiar matheusaaguiar Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cameel , Before you had suggested Instruction::DUP1, but I think that then packedEncoded was reading the duplicated mem_start_ptr value as the position of the string in storage (which I assume here is already on the stack as a result of visiting the arg previously).

Copy link
Copy Markdown
Collaborator

@cameel cameel Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, true. In #15968 (comment) I did not properly account for the fact that we don't start with an empty stack here. It contains builtin's arguments, but those were not originally shown by your "stack layout" comments, so I missed that. So it's good that we tried this actually - we were working with a wrong assumption and that would have bitten us later, possibly in a worse way.

From changes in the code I see you mostly figured that out already. I think the final piece you're missing is that a single argument does not always occupy one slot. For example, a calldata string takes two, because we also store its length on the stack. And a string literal takes zero because we store it in the code (via IR you get helpers like store_literal_in_memory_0ed2dd5b47abd17a4e44572cbe6fa1a37667205d927a6a1fba1e6feae1bdbace() that produce it on demand).

This means that using DUP to avoid reading free mem ptr twice is trickier than I assumed. One solution is to check the number of stack slots occupied by argType and choose the right DUP instruction based on that. Another is to make it simpler and just use fetchFreeMemoryPointer() again after encoding. I think I'd go for the latter. I only tried to optimize it this way because it seemed straightforward. If it's not and makes the code more complicated it's not worth it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After spending some time staring at the assembly, I finally saw what the problem was (and it was very simple and obvious now in hindsight 😄 )
I hadn't realized that encodePacked writes the data in memory but doesn't write the size first. Before, in the hand-crafted assembly, the pointer to the data and the size were directly written in the stack for the use of keccak256 instructions.
Now, using the Yul function, it expects to find the string properly encoded in memory, prefixed by the size and followed by the data.
So it was just a matter of skipping the first 32 bytes of the free memory area before encoding, and then write the size at the start.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Now we have it figured out.

Though, instead of storing the length in memory in evmasm codegen, might be better to refactor the erc7201() helper so that it just expects length on stack. Then in the IR codegen just move the bit that fetches the length to the call site.

Also, does the number of slots taken by encodePacked()'s input not make a difference? The string literal case is handled separately, but I'd expect at least calldata strings or their slices to take two slots there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am checking the code to make sure, but looks like it properly takes into account the number of slots depending on the type. Tests with calldata are passing and the size seems to be correctly calculated. Still, I will confirm it by inspecting the compiler code.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, now that we've gone back to a version where you just put free mem ptr on the stack and don't touch the string reference itself, it should work even if we're counting the slots wrong. packedEncode() will just grab however many there are on the stack for the expected type. So at most the stack layout comments could be wrong, but if tests are passing, it should be fine.

So not a blocker for merging the PR, but I think we should still understand what's happening in this case. Annoyingly, that requires inspecting the generated asm and going through the encoding routine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I checked packedEncoded and printed the encoding functions it uses (it calls abi encode Yul functions from the IR codegen). It ends up calling this function:

function abi_encode_t_string_calldata_ptr_to_t_string_memory_ptr_nonPadded_inplace_fromStack(start, length, pos) -> end {
	pos := array_storeLengthForEncoding_t_string_memory_ptr_nonPadded_inplace_fromStack(pos, length)
	copy_calldata_to_memory_with_cleanup(start, pos, length)
	end := add(pos, length)
}

I tested with many parameters and changing the position of the string, something like, function test(uint x, string calldata namespaceID, bool c) ... and it still gets the string right.

Copy link
Copy Markdown
Collaborator

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost ready to go. Just a few more test tweaks (please also check previous unresolved comments) and I'd also prefer the Yul helper to be refactored a bit to take length on stack. That would make the evmasm part simpler, which in turn would make me more confident it's correct (especially the use appendInlineAssembly() IMO muddles things a bit).

Copy link
Copy Markdown
Collaborator

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more things related to the last refactor.

But overall it's good. I think we should be able to merge this today.

Copy link
Copy Markdown
Collaborator

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants