Skip to content

Fix O(n²) in VarNameCleaner::findCleanName using per-base-name counter#16563

Open
msooseth wants to merge 1 commit intodevelopfrom
fix-on2-namecleaner
Open

Fix O(n²) in VarNameCleaner::findCleanName using per-base-name counter#16563
msooseth wants to merge 1 commit intodevelopfrom
fix-on2-namecleaner

Conversation

@msooseth
Copy link
Copy Markdown
Contributor

When many variables share the same stripped base name, findCleanName would probe name_1, name_2, ..., name_N sequentially for each variable, resulting in O(n²) calls to isUsedName (which involves regex matching via isRestrictedIdentifier). This was the dominant cost in fuzzer timeouts where the optimizer generated many variables.

Track a counter per base name (m_nextSuffix) so we resume from the last assigned suffix instead of reprobing from 1 each time.

Found by the fuzzer with custom optimization ordering.

When many variables share the same stripped base name, findCleanName
would probe name_1, name_2, ..., name_N sequentially for each
variable, resulting in O(n²) calls to isUsedName (which involves
regex matching via isRestrictedIdentifier). This was the dominant
cost in fuzzer timeouts where the optimizer generated many variables.

Track a counter per base name (m_nextSuffix) so we resume from the
last assigned suffix instead of reprobing from 1 each time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix more

Update

More tests

More tests
@msooseth
Copy link
Copy Markdown
Contributor Author

if you are very happy about this, then let's push this through instead that will fix all of this:

#15969

Copy link
Copy Markdown
Contributor

@nikola-matic nikola-matic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clonker fixed the failing arch build - please rebase to fix the build.

for (size_t i = 1; i < std::numeric_limits<decltype(i)>::max(); ++i)
// Use a per-base-name counter to avoid O(n²) probing when many
// variables share the same stripped base name.
size_t& nextSuffix = m_nextSuffix[newName];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
size_t& nextSuffix = m_nextSuffix[newName];
size_t& nextSuffix = m_nextSuffix.at(newName);

This should allow you to remove the mutable specifier from m_nextSuffix. Also, why are you storing a reference to an integral type here, i.e. size_t&. Just make it a plain size_t.

Comment on lines +108 to +109
if (nextSuffix == 0)
nextSuffix = 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better, you can just replace the whole thing with

size_t nextSuffix = util::valueOrDefault(m_nextSuffix, newName, 1);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has flaky written all over it, and I'd rather it not run in the CI every time someone pushes. You can delete this whole test, and we can benchmark separately. All of the optimizer tests are already available for benchmarking via the profiling.

* Metadata: Store the state of the experimental mode in JSON and CBOR metadata. In CBOR this broadens the meaning of the existing `experimental` field, which used to indicate only the presence of certain experimental pragmas in the source.
* Standard JSON Interface: Introduce `settings.experimental` setting required for enabling the experimental mode.
* Yul Optimizer: Improve performance of control flow side effects collector and function references resolver.
* Yul Optimizer: Fix O(n²) performance in ``VarNameCleaner`` when many variables share the same base name.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's technically no semantic changes here (or at least there shouldn't be), so the only user visible change should be the performance improvement, which would at least require a default sequence (end to end) benchmark of pre vs. post improvement. If it's significant, then I'd be fine with a changelog entry - but if it's a significant improvement for the VarNameCleaner, but has no statistically perceptible impact on total optimization, I'd rather not include it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants