perf(cli): speed up catalog merge key partitioning #2540
almsh wants to merge 2 commits into lingui:main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff            @@
##             main    #2540      +/- ##
===========================================
+ Coverage   77.05%   89.39%   +12.34%
===========================================
  Files          84      118       +34
  Lines        2157     3385     +1228
  Branches      555     1001      +446
===========================================
+ Hits         1662     3026     +1364
+ Misses        382      324       -58
+ Partials      113       35       -78
Pull request overview
Improves mergeCatalog performance in the CLI by optimizing how catalog keys are partitioned during merges, which is a hot path for large catalogs.
Changes:
- Replaces repeated `Array.includes` scans with `Set.has` lookups when both catalogs are non-empty.
- Adds a fast path for cases where either the previous or next catalog has no keys, avoiding unnecessary `Set` creation.
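Taken together, the two changes above could look roughly like the following sketch (illustrative names only; this is not the actual lingui implementation):

```typescript
// Hypothetical sketch of the fast path plus Set-based key partitioning
// (names like partitionKeys are made up for illustration).
function partitionKeys(prevKeys: string[], nextKeys: string[]) {
  // Fast path: one side is empty, so there is nothing to intersect —
  // skip Set creation entirely (initial extraction / fully-obsolete cases).
  if (prevKeys.length === 0) {
    return { obsoleteKeys: [], newKeys: nextKeys }
  }
  if (nextKeys.length === 0) {
    return { obsoleteKeys: prevKeys, newKeys: [] }
  }

  // General case: O(1) Set.has lookups instead of O(n) Array.includes scans.
  const nextSet = new Set(nextKeys)
  const prevSet = new Set(prevKeys)
  return {
    obsoleteKeys: prevKeys.filter((key) => !nextSet.has(key)),
    newKeys: nextKeys.filter((key) => !prevSet.has(key)),
  }
}
```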
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
Description
In our project, we have around 20k messages in Lingui catalogs, and this change cuts our extraction time nearly in half.
`mergeCatalog` currently partitions catalog keys with repeated `Array.includes` checks. For large catalogs, this becomes expensive because each `includes` call may scan the other key array. This PR uses `Set.has` when both catalogs contain keys, reducing key partitioning from O(N * P) repeated scans to O(N + P) Set construction and lookup work. It also keeps a fast path for empty catalogs, avoiding unnecessary Set creation during initial extraction or fully-obsolete cases.

Operation Count Estimate
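A minimal sketch contrasting the two strategies (simplified, with hypothetical function names; not the actual `mergeCatalog` source):

```typescript
// Old strategy (sketch): repeated Array.includes scans, O(N * P).
function partitionWithIncludes(prevKeys: string[], nextKeys: string[]) {
  const obsoleteKeys = prevKeys.filter((key) => !nextKeys.includes(key)) // each call scans nextKeys
  const newKeys = nextKeys.filter((key) => !prevKeys.includes(key)) // each call scans prevKeys
  return { obsoleteKeys, newKeys }
}

// New strategy (sketch): one-time Set construction, O(N + P).
function partitionWithSets(prevKeys: string[], nextKeys: string[]) {
  const nextSet = new Set(nextKeys)
  const prevSet = new Set(prevKeys)
  const obsoleteKeys = prevKeys.filter((key) => !nextSet.has(key))
  const newKeys = nextKeys.filter((key) => !prevSet.has(key))
  return { obsoleteKeys, newKeys }
}
```

Both functions partition the same way; only the lookup cost per key changes.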
Let:

- P = number of keys in the previous catalog
- N = number of keys in the next (freshly extracted) catalog

Current implementation: roughly N * P comparison steps, since each `Array.includes` call scans the other catalog's key array.

Optimized version: roughly N + P operations — one pass to build each Set, plus a constant-time `Set.has` lookup per key.
In our case, both the previous and next catalogs contain around 20k messages. With the current implementation, each extraction can require hundreds of millions of array comparison steps while partitioning catalog keys. With this change, the same step is reduced to roughly 100k Set build and lookup operations.
This keeps the behavior unchanged while making large catalog merges significantly cheaper.
One question about the checklist: since this change keeps the existing behavior covered by the current `mergeCatalog` tests and only changes the key partitioning strategy, would you like me to add a small performance check or benchmark test for this path?

Types of changes
Checklist