-
Notifications
You must be signed in to change notification settings - Fork 14
optimized filtering #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
this greatly reduces the size of delta messages in stacks with large data changes.
65ce8d9
to
1854db3
Compare
this tries to optimize filtering and match detection in two ways: - by grouping on match pattern: this means we only do detection and filtering once for a given match pattern (takes the pattern and sendMatchesOnly into account) - by doing matching before origin filter, since the latter seems more costly
1854db3
to
34004ff
Compare
this is not the prettiest of solutions, but it does achieve the same thing. bonus is it only happens once on startup
it's mentioned in the expressjs best practices (and elsewhere) that process.env should be used sparingly: | If you need to write environment-specific code, you can check the value of NODE_ENV with process.env.NODE_ENV. Be aware that checking the value of any environment variable incurs a performance penalty, and so should be done sparingly. https://expressjs.com/th/advanced/best-practice-performance.html#set-node_env-to-production it seems this is because process.env is a getter that processes the ENV array each time
} | ||
} else { | ||
if (process.env["DEBUG_DELTA_SEND"] || process.env["DEBUG_DELTA_NOT_SENDING_EMPTY"]) | ||
if (DEBUG_DELTA_SEND || DEBUG_DELTA_NOT_SENDING_EMPTY) | ||
console.log(`Changeset empty. Not sending to ${entry.callback.method} ${entry.callback.url}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept this, but I don't think this situation can occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some optimizations are possible. I added comments in adjusted code below, because the suggestions are interlinked with each other, so separate comments could become confusing:
async function informWatchers( changeSets, res, muCallIdTrail, muSessionId ){
/* This code used to be inside the full loop, but this is not needed.
* If the entry has `sendMatchesOnly` set to true, using changedTriples can be avoided (see below).
* If `sendMatchesOnly` is set to false, the code using `changedTriples` will always need all triples
* (because there is no extra "filtering").
* so there is no need to recalculate it every time.
*/
let allInserts = [];
let allDeletes = [];
changeSets.forEach( (change) => {
allInserts = [...allInserts, ...change.insert];
allDeletes = [...allDeletes, ...change.delete];
} );
const changedTriples = [...allInserts, ...allDeletes];
// Iterate over each unique match pattern
for (const matchKey in groupedServices) {
const firstEntry = groupedServices[matchKey][0];
// can use first entry since it's part of grouping
const sendMatchesOnly = firstEntry.options.sendMatchesOnly;
let maybePatternFilteredChangesets = changeSets;
if (sendMatchesOnly) {
maybePatternFilteredChangesets = filterChangesetsOnPattern(changeSets, firstEntry);
/* Currently, fully empty change sets are send too, but I don't think
* this is desired if `sendMatchesOnly` is set to true?
* Note: can also do this filtering in `filterChangesetsOnPattern` directly
* (by having an if-case or similar around `filteredChangesets.push(clonedChangeSet);`)
*/
maybePatternFilteredChangesets = maybePatternFilteredChangesets.filter(
(change) =>
change.insert.length > 0 ||
change.delete.length > 0 || change.effectiveInsert.length > 0 ||
change.effectiveDelete.length > 0
);
}
/* `filterChangesetsOnPattern` already does all the filtering for matches,
* so if `sendMatchesOnly` is true, we can just check if there are still
* non-empty change sets left to send. If all are empty (or none are left,
* in case of filtering them away as stated above), there are no matches with the spec.
*/
const someTripleMatchedSpec = sendMatchesOnly
? maybePatternFilteredChangesets.length > 0 /* makes the assumption that maybePatternFilteredChangesets has no empty change sets. */
: changedTriples.some((triple) =>
tripleMatchesSpec(triple, firstEntry.match)
);
/* The if can be taken out of the forEach loop, to avoid the loop all together
* if no triples will be send out. Note that `DEBUG_TRIPLE_MATCHES_SPEC` should
* be moved before the `if` case then. However, this does change the debug output:
* before it would print a line per match (true/false), now it will do this
* per service group. The difference is negligible, so this change could be skipped.
*/
if( someTripleMatchedSpec ) {
const matchingServices = groupedServices[matchKey];
matchingServices.forEach( async (entry) => {
/// rest of the code
});
}
// for each entity | ||
if( DEBUG_DELTA_MATCH ) | ||
console.log(`Checking if we want to send to ${entry.callback.url}`); | ||
const matchSpec = entry.match; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused code
const matchSpec = entry.match; |
builds on #20, but has some optimizations wrt to filtering:
options.sendMatchesOnly
into account as a grouping key)edit: can be tested using
nvdk/mu-delta-notifier:feature-optimized-filtering
. see https://github.com/lblod/app-lblod-harvester/pull/79/files for an example