Implement Glushkov's NFA into cuDF by lingyany-nv · Pull Request #21936 · rapidsai/cudf

lingyany-nv · 2026-03-25T21:09:02Z

Description

Add bit-parallel Glushkov NFA regex engine with shared memory optimization

This PR implements a Glushkov NFA for regex string matching in cuDF as a more GPU-friendly alternative to the Thompson NFA used by the current cuDF regex engine. References: (1) the Hyperscan paper, (2) the HybridSA paper, (3) the vectorscan repo. The Glushkov engine represents NFA state as a single uint64_t bitmask (at most 64 positions), so it needs no per-thread GPU working memory, whereas the Thompson NFA needs per-thread state arrays. A shared memory cache further accelerates execution by cooperatively loading read-only program data (reach masks, shift masks, exception successors) into SMEM at kernel entry.
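To make the bitmask representation concrete, here is a minimal host-side C++ sketch of a bit-parallel Glushkov step. It is not the PR's code: the `GlushkovProgram` layout, `follow_of`, `contains`, and the hand-built program for `ab*c` are all hypothetical, and a plain per-position follow table stands in for the shift masks and exception successors mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical layout (an assumption for illustration, not the PR's tables).
struct GlushkovProgram {
  std::array<std::uint64_t, 256> reach{};  // reach[c]: positions whose symbol accepts byte c
  std::array<std::uint64_t, 64> follow{};  // follow[i]: positions allowed right after position i
  std::uint64_t first  = 0;                // positions that can start a match
  std::uint64_t accept = 0;                // positions that can end a match
};

// One bit per position; a uint64_t state caps the pattern at 64 positions.
inline std::uint64_t bit(int pos) {
  assert(pos >= 0 && pos < 64);
  return std::uint64_t{1} << pos;
}

// Union of the follow sets of every active position in `state`.
inline std::uint64_t follow_of(const GlushkovProgram& p, std::uint64_t state) {
  std::uint64_t out = 0;
  for (int i = 0; i < 64; ++i) {
    if ((state >> i) & 1u) { out |= p.follow[i]; }
  }
  return out;
}

// Unanchored "contains": start positions are injected at every character,
// and the whole NFA state lives in one register-sized integer.
inline bool contains(const GlushkovProgram& p, std::string_view text) {
  std::uint64_t state = 0;
  for (unsigned char c : text) {
    state = (follow_of(p, state) | p.first) & p.reach[c];
    if (state & p.accept) { return true; }
  }
  return false;
}

// Hand-built program for the pattern ab*c: positions 0='a', 1='b', 2='c'.
inline GlushkovProgram make_ab_star_c() {
  GlushkovProgram p;
  p.reach['a'] = bit(0);
  p.reach['b'] = bit(1);
  p.reach['c'] = bit(2);
  p.first      = bit(0);
  p.follow[0]  = bit(1) | bit(2);  // after 'a': 'b' or 'c'
  p.follow[1]  = bit(1) | bit(2);  // after 'b': another 'b' or 'c'
  p.accept     = bit(2);
  return p;
}
```

Calling `contains(make_ab_star_c(), "xxabbbcxx")` walks the text once and keeps only one 64-bit value of matcher state, which is the property the description relies on to avoid per-thread working memory.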

Key changes

Two-phase O(n) unanchored search algorithm (glushkov.inl): Phase 1 scans forward, injecting start states each character and recording provisional match ends. Phase 2 rescans only the match region to find the true leftmost start. Each character is processed at most twice.
Leftmost-first correctness via priority-kill (glushkov.inl, glushkov_regcomp.cpp): A runtime glushkov_priority_kill clears lower-priority alternative paths at accept time. A compile-time conflict detector (frontier_has_priority_conflict) conservatively falls back to Thompson when bit-index ordering cannot guarantee Thompson-compatible leftmost-first semantics.
Automatic fallback: Patterns with anchors (^, $, \b, \B), >64 positions, nullable top-level expressions, capture group requirements (extract, backref_re), or priority conflicts transparently fall back to Thompson NFA — no user intervention needed.
Shared memory DataSource abstraction (glushkov.cuh, utilities.cuh): Templates compute_follow and compute_reach over glushkov_global_source vs glushkov_shmem_source, with cooperative SMEM loading in kernel wrappers.
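The DataSource idea in the last item can be sketched in plain C++. This is only an illustration under assumptions: `GlobalSource`, `CachedSource`, `apply_reach`, and `kAlphabet` are hypothetical names, and an ordinary private array stands in for CUDA shared memory (in a real kernel the copy would typically be filled cooperatively by the threads of a block and followed by a barrier).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAlphabet = 256;

// Stand-in for a source that reads the table where it already lives
// (global memory in a CUDA kernel).
struct GlobalSource {
  const std::uint64_t* reach;
  std::uint64_t reach_mask(unsigned char c) const { return reach[c]; }
};

// Stand-in for a source backed by a private copy made once up front
// (shared memory in a CUDA kernel).
struct CachedSource {
  std::array<std::uint64_t, kAlphabet> cache{};
  explicit CachedSource(const std::uint64_t* src) {
    assert(src != nullptr);
    for (std::size_t i = 0; i < kAlphabet; ++i) { cache[i] = src[i]; }
  }
  std::uint64_t reach_mask(unsigned char c) const { return cache[c]; }
};

// The matching step is written once and instantiated per source type,
// so both variants share the same logic and differ only in where they read.
template <typename Source>
std::uint64_t apply_reach(const Source& src, std::uint64_t candidates, unsigned char c) {
  return candidates & src.reach_mask(c);
}
```

Because the source type is a template parameter, the choice between the two read paths is resolved at compile time and adds no branch inside the per-character loop.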

Limitations

Does not support capturing groups (e.g. extract, replace_with_backrefs)
Does not support zero-width assertions such as BOL/EOL/BOW/NBOW
Supports at most 64 character-consuming positions, because the state is a single uint64_t
Does not support lazy quantifiers
Rejects empty/degenerate patterns
Does not support nullable patterns or some ambiguous alternation patterns

When any of these conditions is detected, the engine falls back to the current Thompson NFA.
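The fallback rule can be read as a single eligibility check. The following C++ sketch encodes the limitations listed above; `PatternTraits`, `Engine`, and `choose_engine` are hypothetical names introduced here for illustration, and the PR's actual detection logic may be organized differently.

```cpp
#include <cassert>
#include <cstddef>

enum class Engine { Glushkov, Thompson };

// Hypothetical summary of a compiled pattern (an assumption, not a cuDF type).
struct PatternTraits {
  std::size_t positions         = 0;      // character-consuming positions
  bool has_zero_width_assertion = false;  // BOL/EOL/BOW/NBOW
  bool needs_captures           = false;  // e.g. extract, replace_with_backrefs
  bool has_lazy_quantifier      = false;
  bool nullable                 = false;  // pattern can match the empty string
  bool has_priority_conflict    = false;  // ambiguous alternation
};

// Any single unsupported feature is enough to select the Thompson engine.
inline Engine choose_engine(const PatternTraits& t) {
  const bool unsupported = t.positions == 0 ||  // empty/degenerate pattern
                           t.positions > 64 ||  // does not fit in a uint64_t
                           t.has_zero_width_assertion ||
                           t.needs_captures ||
                           t.has_lazy_quantifier ||
                           t.nullable ||
                           t.has_priority_conflict;
  return unsupported ? Engine::Thompson : Engine::Glushkov;
}
```

Under these assumptions, a pattern with 65 positions or a nullable pattern such as `a*` would select `Engine::Thompson`, while a plain three-position pattern would select `Engine::Glushkov`.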

Unit tests + benchmark

Priority-kill parity tests: Verify Glushkov matches Thompson for overlapping-prefix alternations (foo|foobar, cat|catch, a|aa) across all 5 operations (contains, count, findall, replace, split)
Nullable fallback parity: Confirm nullable patterns (a*, \d*, (ab)?) transparently fall back to Thompson and produce identical results
Mixed-engine regression test (MixedEngineReplace): Exercises multi-pattern replace where one pattern is Glushkov-backed and another falls back to Thompson
Spark-rapids compatibility: ~60 regex patterns from spark-rapids integration tests validated under both engines via parametrized Python tests
Benchmarks: 6–9 patterns per benchmark covering char classes, alternation, bounded repetition, dot wildcards, and late-failure stress patterns; state.skip() guards for Glushkov-unsupported combinations (anchors, backrefs)
Extended the existing split_re/contains/replace_re/count benchmarks with more complex regexes; these showed a 1.01-6.62x speedup.
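For readers unfamiliar with the leftmost-first semantics that the priority-kill parity tests check, the behaviour on overlapping-prefix alternations can be seen with the C++ standard library. This sketch uses `std::regex` with its default ECMAScript grammar purely as a reference point; it does not call cuDF, and `first_match` is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`, or an empty string if none.
inline std::string first_match(const std::string& pattern, const std::string& text) {
  const std::regex re(pattern);  // ECMAScript grammar by default
  std::smatch m;
  return std::regex_search(text, m, re) ? m.str() : std::string{};
}
```

With ordered alternation, `first_match("foo|foobar", "foobar")` yields `foo` rather than the longer `foobar`, because the earlier alternative wins at the same start position. A leftmost-longest engine would return `foobar` instead, which is the kind of discrepancy such parity tests are meant to catch.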

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2026-03-25T21:09:06Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

lingyany-nv requested review from a team as code owners March 25, 2026 21:09

lingyany-nv requested review from Matt711, PointKernel and wence- March 25, 2026 21:09

github-actions Bot assigned lingyany-nv Mar 25, 2026

github-actions Bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. CMake CMake build issue pylibcudf Issues specific to the pylibcudf package labels Mar 25, 2026

github-project-automation Bot added this to cuDF Python Mar 25, 2026

GPUtester moved this to In Progress in cuDF Python Mar 25, 2026

lingyany-nv marked this pull request as draft March 25, 2026 21:24

PointKernel added 2 - In Progress Currently a work in progress feature request New feature or request non-breaking Non-breaking change labels Mar 25, 2026

GregoryKimball changed the title ~~[draft] Implement Glushkov's NFA into cudf~~ [draft] Implement Glushkov's NFA into cuDF Apr 8, 2026

lingyany-nv added 11 commits April 13, 2026 11:38

feature: implement Glushkov's NFA

b39a18e

implement shared memory version of Glushkov's NFA

d90a5e6

refactor and simplify the code

7644ff4

add python regex tests from spark-rapids regex tests

b3d7ab9

add more complex regexes to the benchmarks

cca1cba

fix the correctness (leftmost-longest vs leftmost-first) issue; change the quadratic search to O(n); don't allocate working memory when Glushkov is applicable

8cacf16

add compile-time-check to disable Glushkov when it cannot produce leftmost-matching; fix the multi_re working memory issue

76fb67f

add nullable regexes unit tests

cca35f9

for affected benchmarks, keep unsupported patterns in the benchmark, but skip as in Glushkov

9edbab7

fix minor issues

f2240da

delete some dead code, update stale comments

98646b4

remove stale comments

2eaec3c

lingyany-nv force-pushed the lingyany/glushkov-nfa branch from 3712832 to 2eaec3c Compare April 13, 2026 20:27

lingyany-nv added 3 commits April 13, 2026 14:05

applied the formatter

30fa2eb

fix python format

b114480

fix clang-format again

8acb318

lingyany-nv marked this pull request as ready for review April 13, 2026 21:36

lingyany-nv changed the title ~~[draft] Implement Glushkov's NFA into cuDF~~ Implement Glushkov's NFA into cuDF Apr 13, 2026

davidwendt marked this pull request as draft April 13, 2026 21:42

lingyany-nv added 2 commits April 15, 2026 14:36

add replace_re tests, fixed minor issues

c4e1cb8

clang-format

7c4fc14

Implement Glushkov's NFA into cuDF #21936
lingyany-nv wants to merge 17 commits into rapidsai:main from lingyany-nv:lingyany/glushkov-nfa


3 participants
