forked from chilts/mongodb-queue
-
Notifications
You must be signed in to change notification settings - Fork 0
💥 Improve database performance #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
alecgibson
added a commit
that referenced
this pull request
Feb 6, 2025
This is a non-breaking change that forms part of the migration path to improving our performance in #15 When acking a job, we now unset the `visible` property, which makes `visible` and `deleted` mutually exclusive properties, so that the presence of one implies the absence of the other. We can do this, because the only time we query on `visible` is when we *also* query for `deleted: {$exists: false}`. Similarly, the only times we query for `deleted: {$exists: true}` are times when we don't query `visible`.
3331a93 to
53b6b09
Compare
This is a **BREAKING CHANGE** that aims to improve database performance
by changing the queries and indexes used to perform operations. There
will be no changes visible to consumers of the JavaScript API.
The break will:
- require consumers to run a migration query before upgrading
- add new indexes
- allow dropping of an old index
More details on migration are at the bottom of this commit message.
Motivation
----------
This library seems to have been written with the assumption that its
collections are small. However, we have hundreds of thousands of jobs in
various queues on Production, which causes some slow queries because of
the design choices made in the schema of this library.
In particular, we aim to address two issues:
1. `{deleted: {$exists: boolean}}` calls are inefficient, but used by
practically every query in this library
2. counting inflight jobs has awful performance when there are many
available jobs
The result is that no further filtering is needed beyond the index on
any of the issued queries.
`$exists`
---------
MongoDB's `$exists` operator has [tricky index performance][1]. In the
best cases, `$exists: true` can use the index, but only if it's sparse,
and `$exists: false` can **never** just use the index: it always needs
to fetch documents.
In order to avoid the constant use of `$exists` in this library, we rely
on a logical paradigm shift: we `$unset` `visible` when acking the job,
so that `visible` and `deleted` are mutually exclusive fields. Therefore
we can:
- add a sparse index for both of these fields
- query on the field we care about, and **know** that it implies the
absence of the other field, allowing the removal of `$exists`
assertions
It should be noted that in local testing, I observed
[strange behaviour][2] when trying to use this partial index: we have to
use `ack: {$gt: ''}` instead of `ack: {$exists: true}` to get MongoDB to
leverage this index for some reason.
`inFlight()`
------------
The existing `inFlight()` query has particularly bad performance in
cases where there are many (hundreds of thousands) of jobs available to
pick up.
This is because the existing query uses the `deleted_1_visible_1` index,
but even after filtering by `deleted` and `visible` with the index, the
database will need to fetch every single job that could be picked up,
and check for `ack`, which is very slow.
We improve the performance here by:
- the removal of the `$exists` query (see above)
- the addition of a partial index that only contains unacked jobs that
have been retrieved at some point by `get()`. We can then filter
these by the current time to find in-flight jobs
Migration path
--------------
This performance improvement is built upon a shift in the assumptions
made about underlying job structure: namely, that `deleted` and
`visible` are now mutually exclusive properties (which was not true
before).
1. Bump patch version to [`7.1.1`][3]: this will start removing the
`visible` property from acked jobs in a non-breaking way
2. Deploy the patch to Production
3. Update any existing documents to match this new schema:
```js
db.collection.updateMany(
{deleted: {$exists: true}},
{$unset: {visible: 1}},
)
```
4. Bump major version to `8.0.0` and deploy
5. Drop old index `delted_1_visible_1`, which is no longer used
[1]: https://www.mongodb.com/docs/manual/reference/operator/query/exists/#use-a-sparse-index-to-improve--exists-performance
[2]: https://www.mongodb.com/community/forums/t/partial-index-is-not-used-during-search/290507/2
[3]: #16
53b6b09 to
795a8d9
Compare
dawidreedsy
approved these changes
Feb 10, 2025
Collaborator
Author
|
Migration checklist:
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a BREAKING CHANGE that aims to improve database performance by changing the queries and indexes used to perform operations. There will be no changes visible to consumers of the JavaScript API.
The break will:
More details on migration are at the bottom of this commit message.
Motivation
This library seems to have been written with the assumption that its collections are small. However, we have hundreds of thousands of jobs in various queues on Production, which causes some slow queries because of the design choices made in the schema of this library.
In particular, we aim to address two issues:
{deleted: {$exists: boolean}}calls are inefficient, but used by practically every query in this libraryThe result is that no further filtering is needed beyond the index on any of the issued queries.
$existsMongoDB's
$existsoperator has tricky index performance. In the best cases,$exists: truecan use the index, but only if it's sparse, and$exists: falsecan never just use the index: it always needs to fetch documents.In order to avoid the constant use of
$existsin this library, we rely on a logical paradigm shift: we$unsetvisiblewhen acking the job, so thatvisibleanddeletedare mutually exclusive fields. Therefore we can:$existsassertionsIt should be noted that in local testing, I observed strange behaviour when trying to use this partial index: we have to use
ack: {$gt: ''}instead ofack: {$exists: true}to get MongoDB to leverage this index for some reason.inFlight()The existing
inFlight()query has particularly bad performance in cases where there are many (hundreds of thousands) of jobs available to pick up.This is because the existing query uses the
deleted_1_visible_1index, but even after filtering bydeletedandvisiblewith the index, the database will need to fetch every single job that could be picked up, and check forack, which is very slow.We improve the performance here by:
$existsquery (see above)get(). We can then filter these by the current time to find in-flight jobsMigration path
This performance improvement is built upon a shift in the assumptions made about underlying job structure: namely, that
deletedandvisibleare now mutually exclusive properties (which was not true before).Bump patch version to
7.1.1: this will start removing thevisibleproperty from acked jobs in a non-breaking wayDeploy the patch to Production
Update any existing documents to match this new schema:
Bump major version to
8.0.0and deployDrop old index
delted_1_visible_1, which is no longer usedDepends on:
visiblewhen acking jobs #16