perf: index deep wildcards by terminal tag for O(1) lookup in findMatch #1

Draft
macieklamberski wants to merge 1 commit into NaturalIntelligence:main from macieklamberski:perf/index-deep-wildcards-by-terminal-tag

Summary

findMatch() currently scans the entire _deepWildcards array on every call, which makes each lookup O(N), where N is the number of deep wildcard expressions. For configurations with many ..tagName patterns, this becomes expensive.

This PR adds a _deepByTerminalTag Map that indexes deep wildcard expressions by their last segment's tag name. Since virtually all real-world deep wildcard patterns end with a specific tag (..title, ..script, ..user), the lookup becomes O(1) instead of O(N).

Based on the benchmark below, this makes findMatch() 13x faster.

The existing _deepWildcards array is kept as a fallback for patterns that cannot be indexed by a terminal tag (..*, root..).
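To illustrate the idea, here is a minimal sketch of indexing by terminal tag. It is not the code from this PR: the class name `DeepWildcardIndex`, its `add` and `candidates` methods, and the use of plain pattern strings are assumptions made for the example. It only shows how a Map keyed by the last segment narrows the set of expressions that need a full match check, with a small fallback list for patterns that have no concrete terminal tag.

```javascript
// Hypothetical sketch, not the library's implementation: patterns are plain
// strings such as '..title', and only the final segment is used as the key.
class DeepWildcardIndex {
  constructor() {
    this.byTerminalTag = new Map(); // terminal tag -> patterns ending with it
    this.fallback = [];             // patterns without a concrete terminal tag
  }

  add(pattern) {
    const lastSegment = pattern.split('.').pop();
    if (lastSegment === '' || lastSegment === '*') {
      // e.g. 'root..' or '..*': nothing to key on, keep for a linear scan
      this.fallback.push(pattern);
      return;
    }
    const bucket = this.byTerminalTag.get(lastSegment);
    if (bucket) bucket.push(pattern);
    else this.byTerminalTag.set(lastSegment, [pattern]);
  }

  // Returns the patterns that still need a full match check for this tag:
  // one Map lookup plus the (usually tiny) fallback list.
  candidates(currentTag) {
    const indexed = this.byTerminalTag.get(currentTag) ?? [];
    return indexed.concat(this.fallback);
  }
}

const index = new DeepWildcardIndex();
for (const p of ['..title', '..ns1:title', '..script', '..*', 'root..']) {
  index.add(p);
}
console.log(index.candidates('title'));
console.log(index.candidates('nonexistent'));
```

With this layout, a tag that no indexed pattern ends with only pays for the fallback entries, which is the case the "miss" scenario in the benchmark exercises.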

Real-world example

My feed parsing library registers ~280 namespaced stopNodes as ..ns:tag patterns through fast-xml-parser.

Benchmark

```js
import { performance } from 'node:perf_hooks';
import { Expression, Matcher, ExpressionSet } from '../src/index.js';

function bench(name, fn, iterations = 5000) {
  // Warmup
  for (let i = 0; i < 500; i++) fn();

  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsed = performance.now() - start;

  const opsPerSec = Math.round(iterations / (elapsed / 1000));
  const avgMs = (elapsed / iterations).toFixed(4);
  console.log(`  ${name}: ${elapsed.toFixed(1)}ms total, ${avgMs}ms/op, ${opsPerSec.toLocaleString()} ops/sec`);
}

// ---------------------------------------------------------------------------
// Build ExpressionSet with ~280 deep wildcard expressions
// ---------------------------------------------------------------------------
const stopNodes = new ExpressionSet();
const prefixes = [
  'ns1', 'ns2', 'ns3', 'ns4', 'ns5', 'ns6', 'ns7',
  'ns8', 'ns9', 'ns10', 'ns11', 'ns12', 'ns13', 'ns14',
];
const tags = [
  'title', 'creator', 'date', 'description', 'encoded', 'link',
  'author', 'category', 'comments', 'guid', 'source', 'pubdate',
  'keywords', 'rating', 'copyright', 'summary', 'duration', 'subtitle',
  'block', 'image',
];
// 14 prefixes x 20 tags = 280 expressions
for (const prefix of prefixes) {
  for (const tag of tags) {
    stopNodes.add(new Expression(`..${prefix}:${tag}`));
  }
}
stopNodes.seal();
console.log(`ExpressionSet: ${stopNodes.size} deep wildcard expressions\n`);

// ---------------------------------------------------------------------------
// Scenario 1: matchesAny — non-matching tag (worst case: scans all deep wildcards)
// ---------------------------------------------------------------------------
console.log('--- Scenario 1: matchesAny (non-matching tag, worst case) ---');
{
  const m = new Matcher();
  m.push('root');
  m.push('container');
  m.push('child');

  bench('matchesAny miss', () => {
    m.push('nonexistent');
    stopNodes.matchesAny(m);
    m.pop();
  });
}

// ---------------------------------------------------------------------------
// Scenario 2: matchesAny — matching tag (best case for indexed: immediate hit)
// ---------------------------------------------------------------------------
console.log('\n--- Scenario 2: matchesAny (matching tag) ---');
{
  const m = new Matcher();
  m.push('root');
  m.push('container');
  m.push('child');

  bench('matchesAny hit', () => {
    m.push('ns7:title');
    stopNodes.matchesAny(m);
    m.pop();
  });
}

// ---------------------------------------------------------------------------
// Scenario 3: Document traversal simulation (50 items x 20 tags)
// Simulates how fast-xml-parser uses Matcher: one persistent instance,
// push/matchesAny/pop for every tag in the document.
// ---------------------------------------------------------------------------
console.log('\n--- Scenario 3: Document traversal (50 items x 20 tags) ---');
const leafTags = [
  'title', 'link', 'description', 'pubdate', 'guid',
  'author', 'category', 'comments', 'source', 'enclosure',
];
const prefixedTags = [
  'ns1:creator', 'ns1:date', 'ns2:encoded', 'ns3:author',
  'ns3:duration', 'ns4:content', 'ns7:comments', 'ns8:commentrss',
  'ns5:link', 'ns6:updateperiod',
];

bench('50-item document', () => {
  const m = new Matcher();
  m.push('root');
  m.push('container');
  m.push('title'); stopNodes.matchesAny(m); m.pop();
  m.push('link'); stopNodes.matchesAny(m); m.pop();

  for (let i = 0; i < 50; i++) {
    m.push('item');
    for (const tag of leafTags) {
      m.push(tag); stopNodes.matchesAny(m); m.pop();
    }
    for (const tag of prefixedTags) {
      m.push(tag); stopNodes.matchesAny(m); m.pop();
    }
    m.pop(); // item
  }
  m.pop(); // container
  m.pop(); // root
}, 2000);
```
