Skip to content

Fix negative calldata offset validations#16522

Draft
cameel wants to merge 2 commits intoreorganize-calldata-semantic-testsfrom
fix-negative-calldata-offset-validations
Draft

Fix negative calldata offset validations#16522
cameel wants to merge 2 commits intoreorganize-calldata-semantic-testsfrom
fix-negative-calldata-offset-validations

Conversation

@cameel
Copy link
Copy Markdown
Collaborator

@cameel cameel commented Mar 12, 2026

Depends on #16483.

Fixes inconsistent offset validations in ABI coder v2. These sometimes miss cases that result in reading past calldatasize() and sometimes reject ones where negative offsets do not cause that and are therefore valid.

This is not a single bug, but rather a combination of multiple things. In some cases behavior even differs between evmasm and IR pipeline.

@cameel cameel self-assigned this Mar 12, 2026
@cameel cameel added codegen error Compiler generates invalid code. Critical. has dependencies The PR depends on other PRs that must be merged first labels Mar 12, 2026
@cameel
Copy link
Copy Markdown
Collaborator Author

cameel commented Mar 12, 2026

Note that this is still work in progress. For now it managed to add coverage that detects the inconsistencies and track one of the bugs (the one that caused evmasm and IR differences). The fix for that bug is still incomplete. The other inconsistencies, especially ones resulting from not using signed comparisons are not yet addressed.

Copy link
Copy Markdown
Collaborator Author

@cameel cameel Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _abi_encode.sol test variants do not add much. I will probably remove them unless someone thinks they are are worth keeping.

Each one has a corresponding, slightly simpler test that just returns the value as calldata. I wrote the _abi_encode.sol ones first and only later realized that returning calldata will not trigger decoding either.

}
m_context.callYulFunction(
m_context.utilFunctions().conversionFunction(typeOnStack, targetType),
typeOnStack.isDynamicallySized() ? 2 : 1,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix for the first bug. The problem here is that the evmasm codegen is sometimes still using code written for ABI coder v1 even when v2 is selected. This is fine in terms of correctness of decoding, but does not account for the validations, since most of them were only introduced in v2.

This will happen for example when you try to access a bytes element of bytes[] calldata. Note that this is not even a v1 type, but the routine kicks in, because we're only accessing bytes inside it and that one is.

I'm not yet sure this change (i.e. replacing everything with a call to v2) is the right solution here. It will likely significantly increase the cost and still needs adjustment for types with a different number of slots (assertions are failing in tests). It at least fix the validations though.

Comment on lines +15 to +33
// accessTop(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 1, "\xff" -> 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 1, "\xff"
// accessTop(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"
// accessTop(bytes[][][]): 0x20, 1, 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"
// accessTop(bytes[][][]): 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"

// accessMiddle(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 1, "\xff" -> 0x20, 1, 0x20, 1, 0x20, 1, "\xff"
// accessMiddle(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"
// accessMiddle(bytes[][][]): 0x20, 1, 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"
// accessMiddle(bytes[][][]): 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"

// accessBottom(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 1, "\xff" -> 0x20, 1, 0x20, 1, "\xff"
// accessBottom(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"
// accessBottom(bytes[][][]): 0x20, 1, 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"
// accessBottom(bytes[][][]): 0x20, 1, -32 -> FAILURE, hex"08c379a0", 0x20, 43, "ABI decoding: invalid calldata a", "rray offset"

// accessContent(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 1, "\xff" -> 0x20, 1, "\xff"
// accessContent(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, -32 -> 0x20, 1, "\xff"
// accessContent(bytes[][][]): 0x20, 1, 0x20, 1, -32 -> 0x20, 1, "\xff"
// accessContent(bytes[][][]): 0x20, 1, -32 -> 0x20, 1, "\xff"
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the second bug. The input has negative offsets, but still pointing within calldata. None of these should result in a revert.

Comment on lines +14 to +36
// accessTop(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 0 -> 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 0
// accessTop(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, -999 -> 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 0
// accessTop(bytes[][][]): 0x20, 1, 0x20, 1, -999 -> 0x20, 1, 0x20, 1, 0x20, 0
// accessTop(bytes[][][]): 0x20, 1, -999 -> 0x20, 1, 0x20, 0
// accessTop(bytes[][][]): -999 -> FAILURE, hex"08c379a0", 0x20, 34, "ABI decoding: invalid tuple offs", "et"

// accessMiddle(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 0 -> 0x20, 1, 0x20, 1, 0x20, 0
// accessMiddle(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, -999 -> 0x20, 1, 0x20, 1, 0x20, 0
// accessMiddle(bytes[][][]): 0x20, 1, 0x20, 1, -999 -> 0x20, 1, 0x20, 0
// accessMiddle(bytes[][][]): 0x20, 1, -999 -> 0x20, 0
// accessMiddle(bytes[][][]): -999 -> FAILURE, hex"08c379a0", 0x20, 34, "ABI decoding: invalid tuple offs", "et"

// accessBottom(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 0 -> 0x20, 1, 0x20, 0
// accessBottom(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, -999 -> 0x20, 1, 0x20, 0
// accessBottom(bytes[][][]): 0x20, 1, 0x20, 1, -999 -> 0x20, 0
// accessBottom(bytes[][][]): 0x20, 1, -999 -> FAILURE, hex"4e487b71", 0x32
// accessBottom(bytes[][][]): -999 -> FAILURE, hex"08c379a0", 0x20, 34, "ABI decoding: invalid tuple offs", "et"

// accessContent(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, 0x20, 0 -> 0x20, 0
// accessContent(bytes[][][]): 0x20, 1, 0x20, 1, 0x20, 1, -999 -> 0x20, 0
// accessContent(bytes[][][]): 0x20, 1, 0x20, 1, -999 -> FAILURE, hex"4e487b71", 0x32
// accessContent(bytes[][][]): 0x20, 1, -999 -> FAILURE, hex"4e487b71", 0x32
// accessContent(bytes[][][]): -999 -> FAILURE, hex"08c379a0", 0x20, 34, "ABI decoding: invalid tuple offs", "et"
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially the bug from the report. The output here should match negative_offset_negative_tail_position_dynamic_array.sol, i.e. every case except the first one in each group (which is there only to show what the output looks like in the equivalent non-negative-offset case) should revert.

Comment on lines -1074 to -1083
// stack: <mem start> <source ref> (variably sized) <length> <mem data pos>
if (targetType.baseType()->isValueType())
{
copyToStackTop(2 + stackSize, stackSize);
ArrayUtils(m_context).copyArrayToMemory(typeOnStack);
}
else
{
m_context << u256(0) << Instruction::SWAP1;
// stack: <mem start> <source ref> (variably sized) <length> <counter> <mem data pos>
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else here is actually yet another case that I only discovered while doing the fix - I don't have test coverage for it yet. It's the case with a dynamic array with statically-encoded elements that are not value types. E.g. S[] calldata, where S is a static struct.

// ====
// revertStrings: debug
// ----
// accessAndReturn(((uint256,bytes,uint256))): 32, -100 -> FAILURE, hex"08c379a0", 0x20, 39, "ABI decoding: invalid byte array", " length"
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a fourth bug. This one may result in accepting input that will result in reading beyond calldatasize(). This one is not covered by tests yet, but the problem is the faulty condition that produces a revert with this message.

The condition is:

if gt(add(src, length), end) { <revertStringLength>() }

Firstly, the error message is inaccurate. Note that it's not only invalid length that may cause this. It can also be the offset. If the offset is negative enough, the absolute start position ends up past the end due to underflow and the (unsigned) comparison fails.

However, when src is negative, but src + length still positive, the condition may be satisfied. This is wrong, because the initial part of the string is likely past calldatasize() in that case.

@cameel cameel force-pushed the fix-negative-calldata-offset-validations branch from 682dc66 to 13dfe30 Compare March 12, 2026 18:16
@cameel cameel force-pushed the reorganize-calldata-semantic-tests branch from d45a45e to 0df51f7 Compare March 12, 2026 18:16
@cameel cameel force-pushed the reorganize-calldata-semantic-tests branch from 0df51f7 to 8b17183 Compare March 12, 2026 18:20
@cameel cameel force-pushed the fix-negative-calldata-offset-validations branch from 13dfe30 to 20ef674 Compare March 12, 2026 18:20
@github-actions
Copy link
Copy Markdown

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Mar 27, 2026
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label Mar 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

codegen error Compiler generates invalid code. Critical. has dependencies The PR depends on other PRs that must be merged first

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant