perf(cli): speed up catalog merge key partitioning#2540

Open
almsh wants to merge 2 commits into lingui:main from almsh:merge-catalog-keys-perf

Conversation

@almsh almsh commented Apr 30, 2026

Description

In our project, we have around 20k messages in Lingui catalogs, and this change cuts our extraction time nearly in half.

mergeCatalog currently partitions catalog keys with repeated Array.includes checks:

const newKeys = nextKeys.filter((key) => !prevKeys.includes(key))
const mergeKeys = nextKeys.filter((key) => prevKeys.includes(key))
const obsoleteKeys = prevKeys.filter((key) => !nextKeys.includes(key))

For large catalogs, this becomes expensive because each includes call may scan the other key array. This PR uses Set.has when both catalogs contain keys, reducing key partitioning from O(N * P) repeated scans to O(N + P) Set construction and lookup work. It also keeps a fast path for empty catalogs, avoiding unnecessary Set creation during initial extraction or fully-obsolete cases.
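
The partitioning described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `partitionKeys` and the `KeyPartition` type are hypothetical, and the real change lives inside mergeCatalog.

```typescript
// Hypothetical helper for illustration; `partitionKeys` and `KeyPartition`
// are assumed names, not identifiers from the Lingui codebase.
type KeyPartition = {
  newKeys: string[]
  mergeKeys: string[]
  obsoleteKeys: string[]
}

function partitionKeys(prevKeys: string[], nextKeys: string[]): KeyPartition {
  // Fast path: if either side is empty nothing can overlap,
  // so the result is known without building any Set.
  if (prevKeys.length === 0 || nextKeys.length === 0) {
    return { newKeys: [...nextKeys], mergeKeys: [], obsoleteKeys: [...prevKeys] }
  }

  // One pass over each array to build the Sets: O(P + N).
  const prevKeySet = new Set(prevKeys)
  const nextKeySet = new Set(nextKeys)

  // Each filter callback now does a constant-time Set.has lookup.
  return {
    newKeys: nextKeys.filter((key) => !prevKeySet.has(key)),
    mergeKeys: nextKeys.filter((key) => prevKeySet.has(key)),
    obsoleteKeys: prevKeys.filter((key) => !nextKeySet.has(key)),
  }
}

const partition = partitionKeys(["a", "b", "c"], ["b", "c", "d"])
console.log(partition)
```

For the sample input, "d" is new, "b" and "c" are merged, and "a" is obsolete, which matches what the three Array.includes filters produce.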

Operation Count Estimate

Let:

N = nextKeys.length
P = prevKeys.length

Current implementation:

newKeys:      N items * O(P) lookup in prevKeys
mergeKeys:    N items * O(P) lookup in prevKeys
obsoleteKeys: P items * O(N) lookup in nextKeys

total:        O(N * P)

Optimized version:

build Sets:   O(P + N)
newKeys:      N items * O(1) lookup in prevKeySet
mergeKeys:    N items * O(1) lookup in prevKeySet
obsoleteKeys: P items * O(1) lookup in nextKeySet

total:        O(N + P)

In our case, both the previous and next catalogs contain around 20k messages. With the current implementation, each extraction can require hundreds of millions of array comparison steps while partitioning catalog keys. With this change, the same step is reduced to roughly 100k Set insertions and lookups.
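
The figures above can be checked with a small back-of-the-envelope calculation. The function name `estimateOperations` is hypothetical, and the Array.includes number is a worst-case upper bound: includes stops at the first match, so real counts for mostly-overlapping catalogs are lower, which is consistent with "hundreds of millions".

```typescript
// Hypothetical estimator; it only restates the operation counts listed above.
function estimateOperations(n: number, p: number) {
  // Array.includes version, worst case: every lookup scans the whole other array.
  const includesWorstCase = n * p + n * p + p * n

  // Set version: one insertion per key to build both Sets,
  // then one lookup per filter callback across the three filters.
  const setBuild = n + p
  const setLookups = n + n + p

  return { includesWorstCase, setTotal: setBuild + setLookups }
}

const estimate = estimateOperations(20_000, 20_000)
console.log(estimate) // { includesWorstCase: 1200000000, setTotal: 100000 }
```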

This keeps the behavior unchanged while making large catalog merges significantly cheaper.

One question about the checklist: since this change keeps the existing behavior covered by the current mergeCatalog tests and only changes the key partitioning strategy, would you like me to add a small performance check or benchmark test for this path?
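
If a performance check is wanted, one possible shape is sketched below. Everything here is an assumption for illustration: the helper names, the synthetic `msg.<n>` key format, and the 5% shift between catalogs are made up, and wall-clock timings vary by machine, so a real test would more likely assert equal results than a fixed time budget.

```typescript
// Hypothetical micro-benchmark; helper names and key format are assumptions.
function makeKeys(count: number, offset: number): string[] {
  return Array.from({ length: count }, (_, i) => `msg.${i + offset}`)
}

function timeMs(fn: () => void): number {
  const start = Date.now()
  fn()
  return Date.now() - start
}

function compareNewKeyPartitioning(size: number) {
  const prevKeys = makeKeys(size, 0)
  // Shift by 5% so most keys are shared and a few are new or obsolete.
  const nextKeys = makeKeys(size, Math.floor(size / 20))

  let viaIncludes: string[] = []
  let viaSet: string[] = []

  // Baseline: linear scan of prevKeys for every key in nextKeys.
  const includesMs = timeMs(() => {
    viaIncludes = nextKeys.filter((key) => !prevKeys.includes(key))
  })

  // Candidate: build a Set once, then constant-time lookups.
  const setMs = timeMs(() => {
    const prevKeySet = new Set(prevKeys)
    viaSet = nextKeys.filter((key) => !prevKeySet.has(key))
  })

  return { viaIncludes, viaSet, includesMs, setMs }
}

const benchmark = compareNewKeyPartitioning(20_000)
console.log(`includes: ${benchmark.includesMs} ms, Set: ${benchmark.setMs} ms`)
```

Only the newKeys filter is timed here to keep the sketch short; the other two filters follow the same pattern.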

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Examples update

Checklist

  • I have read the CONTRIBUTING and CODE_OF_CONDUCT docs
  • I have added tests that prove my fix is effective or that my feature works
  • I have added the necessary documentation (if appropriate)

vercel Bot commented Apr 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project   | Deployment | Actions | Updated (UTC)
js-lingui | Ready      | Preview | Apr 30, 2026 2:53pm

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.39%. Comparing base (6bb8983) to head (81f3ca0).
⚠️ Report is 328 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2540       +/-   ##
===========================================
+ Coverage   77.05%   89.39%   +12.34%     
===========================================
  Files          84      118       +34     
  Lines        2157     3385     +1228     
  Branches      555     1001      +446     
===========================================
+ Hits         1662     3026     +1364     
+ Misses        382      324       -58     
+ Partials      113       35       -78     

Copilot AI left a comment

Pull request overview

Improves mergeCatalog performance in the CLI by optimizing how catalog keys are partitioned during merges, which is a hot path for large catalogs.

Changes:

  • Replaces repeated Array.includes scans with Set.has lookups when both catalogs are non-empty.
  • Adds a fast path for cases where either the previous or next catalog has no keys to avoid unnecessary Set creation.

Comment thread packages/cli/src/api/catalog/mergeCatalog.ts Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

