perf: index deep wildcards by terminal tag for O(1) lookup in findMatch by macieklamberski · Pull Request #1 · NaturalIntelligence/path-expression-matcher

macieklamberski · 2026-04-09T07:32:46Z

Summary

findMatch() currently scans the entire _deepWildcards array on every call, making each lookup O(N), where N is the number of deep wildcard expressions. For configurations with many ..tagName patterns, this becomes expensive.

This PR adds a _deepByTerminalTag Map that indexes deep wildcard expressions by their last segment's tag name. Since virtually all real-world deep wildcard patterns end with a specific tag (..title, ..script, ..user), the lookup becomes O(1) instead of O(N).

Based on the benchmark below, this makes findMatch() 13x faster.

The existing _deepWildcards array is kept as a fallback for the unindexable patterns (..*, root..).
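To illustrate the idea, here is a minimal sketch of indexing by terminal tag with a fallback list. It is a simplified model, not the code in this PR: patterns are plain strings rather than Expression objects, and the names `terminalTag`, `DeepWildcardIndex`, `byTerminalTag`, `fallback`, and `candidates` are hypothetical.

```js
// Simplified model (assumption): every pattern is a deep wildcard string such
// as '..ns1:title'. This is NOT the library's Expression class or internals.

// Returns the tag name of the last segment, or null when the pattern cannot
// be indexed (it ends in '*' or has nothing after the final '..').
function terminalTag(pattern) {
  const afterDeep = pattern.slice(pattern.lastIndexOf('..') + 2);
  const last = afterDeep.split('.').pop();
  return last === '' || last === '*' ? null : last;
}

class DeepWildcardIndex {
  constructor() {
    this.byTerminalTag = new Map(); // tag name -> patterns ending in that tag
    this.fallback = [];             // patterns without a specific terminal tag
  }

  add(pattern) {
    const tag = terminalTag(pattern);
    if (tag === null) {
      this.fallback.push(pattern);
      return;
    }
    const bucket = this.byTerminalTag.get(tag);
    if (bucket) bucket.push(pattern);
    else this.byTerminalTag.set(tag, [pattern]);
  }

  // Only patterns that can possibly match the current tag are returned:
  // one Map lookup plus the (usually short) fallback list.
  candidates(currentTag) {
    const bucket = this.byTerminalTag.get(currentTag) ?? [];
    return [...bucket, ...this.fallback];
  }
}

const index = new DeepWildcardIndex();
for (const p of ['..ns1:title', '..ns2:title', '..ns1:date', '..*', 'root..']) {
  index.add(p);
}
const hit = index.candidates('ns1:title');
const miss = index.candidates('nonexistent');
console.log(hit, miss);
```

In this model a non-matching tag only has to be checked against the fallback patterns, which is why the miss case benefits most when nearly all patterns end in a specific tag.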

Real-world example

In my feed parsing library, ~280 namespace stopNodes use ..ns:tag patterns via fast-xml-parser.

Benchmark


```js
import { performance } from 'node:perf_hooks';
import { Expression, Matcher, ExpressionSet } from '../src/index.js';

function bench(name, fn, iterations = 5000) {
  // Warmup
  for (let i = 0; i < 500; i++) fn();

  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsed = performance.now() - start;

  const opsPerSec = Math.round(iterations / (elapsed / 1000));
  const avgMs = (elapsed / iterations).toFixed(4);
  console.log(`  ${name}: ${elapsed.toFixed(1)}ms total, ${avgMs}ms/op, ${opsPerSec.toLocaleString()} ops/sec`);
}

// ---------------------------------------------------------------------------
// Build ExpressionSet with ~280 deep wildcard expressions
// ---------------------------------------------------------------------------
const stopNodes = new ExpressionSet();
const prefixes = [
  'ns1', 'ns2', 'ns3', 'ns4', 'ns5', 'ns6', 'ns7',
  'ns8', 'ns9', 'ns10', 'ns11', 'ns12', 'ns13', 'ns14',
];
const tags = [
  'title', 'creator', 'date', 'description', 'encoded', 'link',
  'author', 'category', 'comments', 'guid', 'source', 'pubdate',
  'keywords', 'rating', 'copyright', 'summary', 'duration', 'subtitle',
  'block', 'image',
];
for (const prefix of prefixes) {
  for (const tag of tags) {
    stopNodes.add(new Expression(`..${prefix}:${tag}`));
  }
}
stopNodes.seal();
console.log(`ExpressionSet: ${stopNodes.size} deep wildcard expressions\n`);

// ---------------------------------------------------------------------------
// Scenario 1: matchesAny — non-matching tag (worst case: scans all deep wildcards)
// ---------------------------------------------------------------------------
console.log('--- Scenario 1: matchesAny (non-matching tag, worst case) ---');
{
  const m = new Matcher();
  m.push('root');
  m.push('container');
  m.push('child');

  bench('matchesAny miss', () => {
    m.push('nonexistent');
    stopNodes.matchesAny(m);
    m.pop();
  });
}

// ---------------------------------------------------------------------------
// Scenario 2: matchesAny — matching tag (best case for indexed: immediate hit)
// ---------------------------------------------------------------------------
console.log('\n--- Scenario 2: matchesAny (matching tag) ---');
{
  const m = new Matcher();
  m.push('root');
  m.push('container');
  m.push('child');

  bench('matchesAny hit', () => {
    m.push('ns7:title');
    stopNodes.matchesAny(m);
    m.pop();
  });
}

// ---------------------------------------------------------------------------
// Scenario 3: Document traversal simulation (50 items x 20 tags)
// Simulates how fast-xml-parser uses Matcher: one persistent instance,
// push/matchesAny/pop for every tag in the document.
// ---------------------------------------------------------------------------
console.log('\n--- Scenario 3: Document traversal (50 items x 20 tags) ---');
const leafTags = [
  'title', 'link', 'description', 'pubdate', 'guid',
  'author', 'category', 'comments', 'source', 'enclosure',
];
const prefixedTags = [
  'ns1:creator', 'ns1:date', 'ns2:encoded', 'ns3:author',
  'ns3:duration', 'ns4:content', 'ns7:comments', 'ns8:commentrss',
  'ns5:link', 'ns6:updateperiod',
];

bench('50-item document', () => {
  const m = new Matcher();
  m.push('root');
  m.push('container');
  m.push('title'); stopNodes.matchesAny(m); m.pop();
  m.push('link'); stopNodes.matchesAny(m); m.pop();

  for (let i = 0; i < 50; i++) {
    m.push('item');
    for (const tag of leafTags) {
      m.push(tag); stopNodes.matchesAny(m); m.pop();
    }
    for (const tag of prefixedTags) {
      m.push(tag); stopNodes.matchesAny(m); m.pop();
    }
    m.pop(); // item
  }
  m.pop(); // container
  m.pop(); // root
}, 2000);
```

perf: index deep wildcards by terminal tag for O(1) lookup in findMatch

6fdc4d6

macieklamberski mentioned this pull request Apr 9, 2026

Performance regression: 37-79% slower parsing between v5.3.7 and v5.5.9 NaturalIntelligence/fast-xml-parser#816

Closed

macieklamberski wants to merge 1 commit into NaturalIntelligence:main from macieklamberski:perf/index-deep-wildcards-by-terminal-tag
