Fix O(n²) in VarNameCleaner::findCleanName using per-base-name counter#16563
Fix O(n²) in VarNameCleaner::findCleanName using per-base-name counter#16563
Conversation
When many variables share the same stripped base name, findCleanName would probe name_1, name_2, ..., name_N sequentially for each variable, resulting in O(n²) calls to isUsedName (which involves regex matching via isRestrictedIdentifier). This was the dominant cost in fuzzer timeouts where the optimizer generated many variables. Track a counter per base name (m_nextSuffix) so we resume from the last assigned suffix instead of reprobing from 1 each time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Fix more Update More tests More tests
|
if you are very happy about this, then let's push this through instead that will fix all of this: |
nikola-matic
left a comment
There was a problem hiding this comment.
@clonker fixed the failing arch build - please rebase to fix the build.
| for (size_t i = 1; i < std::numeric_limits<decltype(i)>::max(); ++i) | ||
| // Use a per-base-name counter to avoid O(n²) probing when many | ||
| // variables share the same stripped base name. | ||
| size_t& nextSuffix = m_nextSuffix[newName]; |
There was a problem hiding this comment.
| size_t& nextSuffix = m_nextSuffix[newName]; | |
| size_t& nextSuffix = m_nextSuffix.at(newName); |
This should allow you to remove the mutable specifier from m_nextSuffix. Also, why are you storing a reference to an integral type here, i.e. size_t&. Just make it a plain size_t.
| if (nextSuffix == 0) | ||
| nextSuffix = 1; |
There was a problem hiding this comment.
Even better, you can just replace the whole thing with
size_t nextSuffix = util::valueOrDefault(m_nextSuffix, newName, 1);
There was a problem hiding this comment.
This has flaky written all over it, and I'd rather it not run in the CI every time someone pushes. You can delete this whole test, and we can benchmark separately. All of the optimizer tests are already available for benchmarking via the profiling.
| * Metadata: Store the state of the experimental mode in JSON and CBOR metadata. In CBOR this broadens the meaning of the existing `experimental` field, which used to indicate only the presence of certain experimental pragmas in the source. | ||
| * Standard JSON Interface: Introduce `settings.experimental` setting required for enabling the experimental mode. | ||
| * Yul Optimizer: Improve performance of control flow side effects collector and function references resolver. | ||
| * Yul Optimizer: Fix O(n²) performance in ``VarNameCleaner`` when many variables share the same base name. |
There was a problem hiding this comment.
There's technically no semantic changes here (or at least there shouldn't be), so the only user visible change should be the performance improvement, which would at least require a default sequence (end to end) benchmark of pre vs. post improvement. If it's significant, then I'd be fine with a changelog entry - but if it's a significant improvement for the VarNameCleaner, but has no statistically perceptible impact on total optimization, I'd rather not include it.
When many variables share the same stripped base name, findCleanName would probe name_1, name_2, ..., name_N sequentially for each variable, resulting in O(n²) calls to isUsedName (which involves regex matching via isRestrictedIdentifier). This was the dominant cost in fuzzer timeouts where the optimizer generated many variables.
Track a counter per base name (m_nextSuffix) so we resume from the last assigned suffix instead of reprobing from 1 each time.
Found by the fuzzer with custom optimization ordering.