Experimental JSONPath engine for querying massive streamed datasets.
The rsonpath crate provides a JSONPath parser and a query execution engine rq,
which utilizes SIMD instructions to provide massive throughput improvements over conventional engines.
Benchmarks of rsonpath against a reference no-SIMD engine on the
Pison dataset. NOTE: Scale is logarithmic!
To run a JSONPath query on a file, execute:
$ rq '$..a.b' ./file.json
If the file is omitted, the engine reads standard input. JSON can also be passed inline:
$ rq '$..a.b' --json '{"c":{"a":{"b":42}}}'
42
For details, consult rq --help or the rsonbook.
The result of running a query is a sequence of matched values, delimited by newlines.
Alternatively, passing --result count returns only the number of matches, which might be much faster.
For other result modes consult the --help usage page.
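One way to picture why a count can be cheaper than printing matches: printing has to copy every matched value to the output, while counting only bumps an integer. The sketch below models the two modes as interchangeable sinks. The Sink trait, NewlineSink, CountSink and the span_of helper are hypothetical illustrations, not rsonpath's API, and the match spans are located by hand instead of by a real engine.

```rust
use std::io::Write;
use std::ops::Range;

/// Hypothetical sink: each match is reported as a byte range of the input.
trait Sink {
    fn report(&mut self, input: &[u8], span: Range<usize>);
}

/// Newline-delimited output: copies each matched value out, followed by '\n'.
struct NewlineSink<W: Write> {
    out: W,
}

impl<W: Write> Sink for NewlineSink<W> {
    fn report(&mut self, input: &[u8], span: Range<usize>) {
        self.out.write_all(&input[span]).expect("write failed");
        self.out.write_all(b"\n").expect("write failed");
    }
}

/// Count-only output: no bytes are copied, only a counter is incremented.
struct CountSink {
    count: usize,
}

impl Sink for CountSink {
    fn report(&mut self, _input: &[u8], _span: Range<usize>) {
        self.count += 1;
    }
}

/// Finds `needle` in `haystack`; stands in for the spans an engine would discover while scanning.
fn span_of(haystack: &[u8], needle: &[u8]) -> Range<usize> {
    let start = haystack
        .windows(needle.len())
        .position(|w| w == needle)
        .expect("needle not present");
    start..start + needle.len()
}

fn main() {
    let input = br#"{"c":{"a":{"b":42}},"a":{"b":[1,2]}}"#;
    // The two values matched by $..a.b, in document order.
    let spans = [span_of(input, b"42"), span_of(input, b"[1,2]")];

    let mut text = NewlineSink { out: Vec::new() };
    let mut count = CountSink { count: 0 };
    for span in &spans {
        text.report(input, span.clone());
        count.report(input, span.clone());
    }
    print!("{}", String::from_utf8(text.out).expect("input is UTF-8"));
    println!("{}", count.count);
}
```

Because both sinks share one interface, the scanning code can stay identical while the cost of reporting differs.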
See Releases for precompiled binaries for all first-class supported targets.
The easiest way to install is via cargo.
$ cargo install rsonpath
...
If maximum speed is paramount, you should install rsonpath with support for native CPU instructions.
This will result in a binary that is not portable and may not work correctly on other machines,
but it will squeeze out every last bit of throughput.
To do this, run the following cargo install variant:
$ RUSTFLAGS="-C target-cpu=native" cargo install rsonpath
...
Check out the relevant chapter in the rsonbook.
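The portability caveat comes from when CPU features are decided: -C target-cpu=native lets the compiler assume, at compile time, every instruction-set extension of the build machine, whereas a portable binary has to check the running CPU at runtime. The small program below is illustrative only (not rsonpath code); it contrasts the two views for AVX2 using cfg!(target_feature = "avx2") and std's is_x86_feature_detected! macro.

```rust
/// True when this binary was compiled with AVX2 enabled everywhere, as happens when
/// building with RUSTFLAGS="-C target-cpu=native" on an AVX2-capable machine.
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// True when the CPU actually executing this binary supports AVX2 (checked at runtime).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

/// AVX2 is an x86 extension, so other architectures never have it.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx2() -> bool {
    false
}

fn main() {
    println!("compiled with AVX2: {}", compiled_with_avx2());
    println!("running CPU has AVX2: {}", cpu_has_avx2());
    if compiled_with_avx2() && !cpu_has_avx2() {
        // A native build copied to an older machine could land here, if it runs at all.
        eprintln!("warning: binary expects AVX2 but this CPU lacks it");
    }
}
```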
The project is actively developed and currently supports only a subset of the JSONPath query language. A query is a sequence of segments, each containing one or more selectors.
| Segment | Syntax | Supported | Since | Tracking Issue |
|---|---|---|---|---|
| Child segment (single) | `[<selector>]` | ✔️ | v0.1.0 | |
| Child segment (multiple) | `[<selector1>,...,<selectorN>]` | ❌ | | |
| Descendant segment (single) | `..[<selector>]` | ✔️ | v0.1.0 | |
| Descendant segment (multiple) | `..[<selector1>,...,<selectorN>]` | ❌ | | |
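The segment/selector structure can be made concrete with a toy evaluator. The sketch below uses hypothetical Value, Selector and Segment types, models only single-selector child and descendant segments with name, wildcard and index selectors, and evaluates $..a.b on the inline example from the start of this README. It builds an in-memory tree purely for illustration and loosely follows general JSONPath semantics; rsonpath itself works over the input stream and does not build such a tree.

```rust
/// Toy JSON value, just enough for this sketch (hypothetical, not rsonpath's types).
#[derive(Debug, PartialEq)]
enum Value {
    Number(i64),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// A few of the selectors from the support tables.
enum Selector {
    Name(String),
    Wildcard,
    Index(usize),
}

/// Child segments look at direct children; descendant segments at every nesting level.
enum Segment {
    Child(Selector),
    Descendant(Selector),
}

/// Applies one selector to the direct children of `node`, appending hits to `out`.
fn select<'a>(node: &'a Value, selector: &Selector, out: &mut Vec<&'a Value>) {
    match (node, selector) {
        (Value::Object(members), Selector::Name(name)) => {
            // A name selects at most one member; the first match wins here.
            if let Some((_, v)) = members.iter().find(|(k, _)| k == name) {
                out.push(v);
            }
        }
        (Value::Object(members), Selector::Wildcard) => out.extend(members.iter().map(|(_, v)| v)),
        (Value::Array(items), Selector::Wildcard) => out.extend(items.iter()),
        (Value::Array(items), Selector::Index(i)) => out.extend(items.get(*i)),
        _ => {}
    }
}

/// Collects `node` and everything nested inside it, in document order.
fn descendants<'a>(node: &'a Value, out: &mut Vec<&'a Value>) {
    out.push(node);
    match node {
        Value::Object(members) => {
            for (_, v) in members {
                descendants(v, out);
            }
        }
        Value::Array(items) => {
            for v in items {
                descendants(v, out);
            }
        }
        Value::Number(_) => {}
    }
}

/// Evaluates the segments that follow `$`, starting from the root.
fn eval<'a>(root: &'a Value, query: &[Segment]) -> Vec<&'a Value> {
    let mut current = vec![root];
    for segment in query {
        let mut next = Vec::new();
        for node in current {
            match segment {
                Segment::Child(sel) => select(node, sel, &mut next),
                Segment::Descendant(sel) => {
                    let mut nested = Vec::new();
                    descendants(node, &mut nested);
                    for n in nested {
                        select(n, sel, &mut next);
                    }
                }
            }
        }
        current = next;
    }
    current
}

fn obj(members: Vec<(&str, Value)>) -> Value {
    Value::Object(members.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

fn main() {
    // {"c":{"a":{"b":42}}}
    let doc = obj(vec![("c", obj(vec![("a", obj(vec![("b", Value::Number(42))]))]))]);
    // $..a.b is a descendant segment [name "a"] followed by a child segment [name "b"].
    let query = [
        Segment::Descendant(Selector::Name("a".to_string())),
        Segment::Child(Selector::Name("b".to_string())),
    ];
    println!("{:?}", eval(&doc, &query)); // [Number(42)]
}
```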

| Selector | Syntax | Supported | Since | Tracking Issue |
|---|---|---|---|---|
| Root | `$` | ✔️ | v0.1.0 | |
| Name | `.<member>`, `[<member>]` | ✔️ | v0.1.0 | |
| Wildcard | `.*`, `..*`, `[*]` | ✔️ | v0.4.0 | |
| Index (array index) | `[<index>]` | ✔️ | v0.5.0 | |
| Index (array index from end) | `[-<index>]` | ❌ | | |
| Array slice (forward, positive bounds) | `[<start>:<end>:<step>]` | ✔️ | v0.9.0 | #152 |
| Array slice (forward, arbitrary bounds) | `[<start>:<end>:<step>]` | ❌ | | |
| Array slice (backward, arbitrary bounds) | `[<start>:<end>:-<step>]` | ❌ | | |
| Filters: existential tests | `[?<path>]` | ❌ | | #154 |
| Filters: const atom comparisons | `[?<path> <binop> <atom>]` | ❌ | | #156 |
| Filters: logical expressions | `&&`, `\|\|`, `!` | ❌ | | |
| Filters: nesting | `[?<expr>[?<expr>]...]` | ❌ | | |
| Filters: arbitrary comparisons | `[?<path> <binop> <path>]` | ❌ | | |
| Filters: function extensions | `[?func(<path>)]` | ❌ | | |
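For the supported slice form (non-negative bounds, positive step), the selected indices follow the usual JSONPath slice rules: start defaults to 0, end to the array length, step to 1, bounds are clamped to the length, and a zero step selects nothing. A plausible reason this is the easy case for a streaming engine (our reading, not a statement from the rsonpath docs) is that membership of index i can be decided as the element is reached, without knowing the array length. The function names below are hypothetical.

```rust
/// Indices selected by `[start:end:step]` with non-negative bounds and a positive step,
/// computed the "whole array" way: clamp the bounds to the length, then step.
fn forward_slice(len: usize, start: Option<usize>, end: Option<usize>, step: Option<usize>) -> Vec<usize> {
    let step = step.unwrap_or(1);
    if step == 0 {
        return Vec::new(); // a zero step selects nothing
    }
    let lower = start.unwrap_or(0).min(len);
    let upper = end.unwrap_or(len).min(len);
    (lower..upper).step_by(step).collect()
}

/// The same decision made one element at a time, with no knowledge of the array length.
fn selects(i: usize, start: Option<usize>, end: Option<usize>, step: Option<usize>) -> bool {
    let (start, step) = (start.unwrap_or(0), step.unwrap_or(1));
    step > 0 && i >= start && end.map_or(true, |e| i < e) && (i - start) % step == 0
}

fn main() {
    // [1:8:3] over a 10-element array selects indices 1, 4 and 7.
    println!("{:?}", forward_slice(10, Some(1), Some(8), Some(3)));
    // Streaming view: test each index as the element is reached.
    let streamed: Vec<usize> = (0..10).filter(|&i| selects(i, Some(1), Some(8), Some(3))).collect();
    println!("{streamed:?}");
}
```

Negative bounds such as `[-<index>]` are relative to the array length, which a single forward pass only learns at the end of the array.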

The crate is continuously built for all Tier 1 Rust targets, and tests are continuously run for targets that can be run on GitHub Actions images. SIMD is supported only on x86/x86_64 platforms.
| Target triple | nosimd build | SIMD support | Continuous testing | Tracking issues |
|---|---|---|---|---|
| aarch64-unknown-linux-gnu | ✔️ | ❌ | ✔️ | #21, #115 |
| i686-unknown-linux-gnu | ✔️ | ✔️ | ✔️ | |
| x86_64-unknown-linux-gnu | ✔️ | ✔️ | ✔️ | |
| x86_64-apple-darwin | ✔️ | ✔️ | ✔️ | |
| i686-pc-windows-gnu | ✔️ | ✔️ | ✔️ | |
| i686-pc-windows-msvc | ✔️ | ✔️ | ✔️ | |
| x86_64-pc-windows-gnu | ✔️ | ✔️ | ✔️ | |
| x86_64-pc-windows-msvc | ✔️ | ✔️ | ✔️ | |
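Since SIMD is only available on x86/x86_64, code of this kind is typically gated by target_arch at compile time and then checks individual CPU features at runtime. The sketch below shows that pattern with std's is_x86_feature_detected! macro and renders the result using the three labels that appear in the SIMD support field of rq --version (avx2, fast_quotes, fast_popcnt). SimdCaps, detect and describe are hypothetical names, and labels rq may print on other CPUs are not modeled.

```rust
/// Hypothetical capability set; field names follow the x86 feature names.
#[derive(Debug, Default)]
struct SimdCaps {
    avx2: bool,
    pclmulqdq: bool,
    popcnt: bool,
}

/// Runtime detection, compiled only on x86/x86_64 where SIMD paths exist.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect() -> SimdCaps {
    SimdCaps {
        avx2: is_x86_feature_detected!("avx2"),
        pclmulqdq: is_x86_feature_detected!("pclmulqdq"),
        popcnt: is_x86_feature_detected!("popcnt"),
    }
}

/// Every other target reports no SIMD capabilities and uses portable code.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect() -> SimdCaps {
    SimdCaps::default()
}

/// Joins the labels in the `avx2;fast_quotes;fast_popcnt` style of the sample output.
fn describe(caps: &SimdCaps) -> String {
    let mut labels = Vec::new();
    if caps.avx2 {
        labels.push("avx2");
    }
    if caps.pclmulqdq {
        labels.push("fast_quotes");
    }
    if caps.popcnt {
        labels.push("fast_popcnt");
    }
    labels.join(";")
}

fn main() {
    let caps = detect();
    println!("{caps:?}");
    println!("SIMD support: {}", describe(&caps));
}
```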

SIMD support is enabled on a module-by-module basis. Most x86 CPUs released in the past decade support AVX2, which enables all available optimizations.
Older CPUs with SSE2 or higher get partial support. You can check what exactly is enabled
with rq --version by looking at the SIMD support field:
$ rq --version
rq 0.9.1
Commit SHA: c024e1bab89610455537b77aed249d2a05a81ed6
Features: default,simd
Opt level: 3
Target triple: x86_64-unknown-linux-gnu
Codegen flags: link-arg=-fuse-ld=lld
SIMD support: avx2;fast_quotes;fast_popcnt
The fast_quotes capability depends on the pclmulqdq instruction,
and fast_popcnt on the popcnt instruction.
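pclmulqdq performs carry-less multiplication. A well-known use of it in SIMD JSON processing, popularized by simdjson, turns a bitmask of quote positions in a 64-byte block into a mask of bytes that lie inside strings: multiplying by an all-ones operand computes a prefix XOR in a single instruction. We assume, without having checked rsonpath's source, that fast_quotes refers to a technique of this kind. The sketch below is illustrative only; it ignores escaped quotes, and its function names are hypothetical. It shows the portable shift-based fallback next to the pclmulqdq version, selected at runtime.

```rust
/// Bitmask of `"` bytes in a block of up to 64 bytes (bit i = byte i).
/// Escaped quotes are ignored to keep the sketch short.
fn quote_bits(block: &[u8]) -> u64 {
    assert!(block.len() <= 64, "one mask covers at most 64 bytes");
    block
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b == b'"')
        .fold(0u64, |acc, (i, _)| acc | (1u64 << i))
}

/// Portable fallback: inclusive prefix XOR. Bit i is set when an odd number of quotes
/// occur at or before byte i, i.e. byte i is inside a string
/// (opening quote included, closing quote excluded).
fn in_string_scalar(quotes: u64) -> u64 {
    let mut m = quotes;
    for shift in [1, 2, 4, 8, 16, 32] {
        m ^= m << shift;
    }
    m
}

/// Same result in one instruction: carry-less multiplication by an all-ones operand.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "pclmulqdq")]
unsafe fn in_string_clmul(quotes: u64) -> u64 {
    use std::arch::x86_64::{_mm_clmulepi64_si128, _mm_cvtsi128_si64, _mm_set_epi64x};
    let a = _mm_set_epi64x(0, quotes as i64);
    let ones = _mm_set_epi64x(0, -1);
    // Immediate 0 multiplies the low 64-bit halves of both operands.
    _mm_cvtsi128_si64(_mm_clmulepi64_si128(a, ones, 0)) as u64
}

/// Runtime dispatch: use pclmulqdq when the CPU has it, otherwise the scalar path.
fn in_string(quotes: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("pclmulqdq") {
            // SAFETY: pclmulqdq support was verified just above.
            return unsafe { in_string_clmul(quotes) };
        }
    }
    in_string_scalar(quotes)
}

fn main() {
    let block = br#"{"a":"x y"}"#;
    let mask = in_string(quote_bits(block));
    println!("{mask:#b}"); // 0b111100110
    // Mark the bytes that lie inside strings under the input.
    let marks: String = (0..block.len())
        .map(|i| if (mask >> i) & 1 == 1 { '^' } else { ' ' })
        .collect();
    println!("{}", std::str::from_utf8(block).expect("ASCII input"));
    println!("{marks}");
}
```

A similar split applies to fast_popcnt: Rust's u64::count_ones is typically compiled to the popcnt instruction when that feature is enabled, and to a slower bit-twiddling sequence otherwise.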
Not all selectors are supported; see the support tables above.
The engine assumes that every object in the input JSON has no duplicate keys. Behavior on duplicate keys is not guaranteed to be stable, but currently the engine will simply match the first such key.
$ rq '$.key' --json '{"key":"value","key":"other value"}'
"value"
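The documented first-match behavior can be sketched over a hypothetical in-order list of an object's members, with duplicates kept as they appear in the input. For contrast, a parser that collects members into a std HashMap would end up with the last value, because HashMap insertion overwrites an existing key. Members, first_match and last_match_via_map are illustrative names, not rsonpath APIs.

```rust
use std::collections::HashMap;

/// One object's members in document order, duplicates kept (hypothetical model:
/// keys and raw value text as string slices).
type Members<'a> = [(&'a str, &'a str)];

/// First-match lookup, mirroring the current behavior described above.
fn first_match<'a>(members: &Members<'a>, key: &str) -> Option<&'a str> {
    members.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// For contrast: collecting into a HashMap keeps the last value for a duplicated key.
fn last_match_via_map<'a>(members: &Members<'a>, key: &str) -> Option<&'a str> {
    let map: HashMap<&'a str, &'a str> = members.iter().copied().collect();
    map.get(key).copied()
}

fn main() {
    // {"key":"value","key":"other value"}
    let members = [("key", "\"value\""), ("key", "\"other value\"")];
    println!("{:?}", first_match(&members, "key"));
    println!("{:?}", last_match_via_map(&members, "key"));
}
```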
The engine does not parse unicode escape sequences in member names.
This means that a key "a" is different from a key "\u0061", even though semantically they represent the same string.
This is actually as-designed with respect to the current JSONPath spec.
Parsing unicode sequences is costly, so the support for this was postponed
in favour of high performance. This is tracked as #117.
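To make the distinction concrete: comparing member names as raw text treats \u0061 and a as different, while decoding the escape first makes them equal. The decode_unicode_escapes helper below is a hypothetical sketch of the extra work involved. It handles only \uXXXX escapes, keeps other escapes verbatim (so an escaped backslash followed by u is not misread), and does not combine surrogate pairs.

```rust
/// Decodes `\uXXXX` escapes in a raw (still-escaped) JSON member name.
/// Hypothetical helper: other escapes are kept verbatim and surrogate pairs are not combined.
fn decode_unicode_escapes(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('u') => {
                let hex: String = chars.by_ref().take(4).collect();
                let decoded = if hex.len() == 4 && hex.chars().all(|h| h.is_ascii_hexdigit()) {
                    u32::from_str_radix(&hex, 16).ok().and_then(char::from_u32)
                } else {
                    None
                };
                match decoded {
                    Some(ch) => out.push(ch),
                    None => {
                        // Malformed escape or lone surrogate: keep the text as written.
                        out.push_str("\\u");
                        out.push_str(&hex);
                    }
                }
            }
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    out
}

fn main() {
    let raw_key = r"\u0061"; // the six characters \ u 0 0 6 1, as they appear in the JSON text
    // Raw comparison (the current engine behavior) treats the keys as different...
    println!("{}", raw_key == "a"); // false
    // ...while decoding first makes them equal.
    println!("{}", decode_unicode_escapes(raw_key) == "a"); // true
}
```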
The gist is: fork, implement, make a PR back here. More details are in the CONTRIBUTING doc.
The dev workflow utilizes just.
Use the included Justfile. It will automatically install Rust for you using the rustup tool if it detects there is no Cargo in your environment.
$ just build
...
$ just test
...
Benchmarks for rsonpath are located in a separate repository,
included as a git submodule in this main repository.
The easiest way to run all the benchmarks is the just bench recipe. For details, look at the README in the submodule.
We have a paper on rsonpath to be published at ASPLOS '24! You can read it
here.
This project was conceived as my thesis. You can read it for details on the theoretical background of the engine and on its implementation.
Showing direct dependencies; for the full graph, see below.
cargo tree --package rsonpath --edges normal --depth 1
rsonpath v0.9.1 (/home/mat/src/rsonpath/crates/rsonpath)
├── clap v4.5.4
├── color-eyre v0.6.3
├── eyre v0.6.12
├── log v0.4.21
├── rsonpath-lib v0.9.1 (/home/mat/src/rsonpath/crates/rsonpath-lib)
├── rsonpath-syntax v0.3.1 (/home/mat/src/rsonpath/crates/rsonpath-syntax)
└── simple_logger v4.3.3
[build-dependencies]
├── rustflags v0.1.5
└── vergen v8.3.1
    [build-dependencies]
cargo tree --package rsonpath-lib --edges normal --depth 1
rsonpath-lib v0.9.1 (/home/mat/src/rsonpath/crates/rsonpath-lib)
├── arbitrary v1.3.2
├── cfg-if v1.0.0
├── log v0.4.21
├── memmap2 v0.9.4
├── nom v7.1.3
├── rsonpath-syntax v0.3.1 (/home/mat/src/rsonpath/crates/rsonpath-syntax)
├── smallvec v1.13.2
├── static_assertions v1.1.0
├── thiserror v1.0.58
└── vector-map v1.0.1
- clap: standard crate to provide the CLI.
- color-eyre, eyre: more accessible error messages for the parser.
- log, simple_logger: diagnostic logs during compilation and execution.
- cfg-if: used to support SIMD and no-SIMD versions.
- memmap2: fast reading of source files via a memory map instead of buffered copies.
- nom: used to implement the parser.
- smallvec: crucial for small-stack performance.
- static_assertions: additional reliability from constant assumptions validated at compile time.
- thiserror: idiomatic Error implementations.
- vector-map: used in the query compiler for measurably better performance.
cargo tree --package rsonpath --edges normal
rsonpath v0.9.1 (/home/mat/src/rsonpath/crates/rsonpath)
├── clap v4.5.4
│   ├── clap_builder v4.5.2
│   │   ├── anstream v0.6.13
│   │   │   ├── anstyle v1.0.6
│   │   │   ├── anstyle-parse v0.2.3
│   │   │   │   └── utf8parse v0.2.1
│   │   │   ├── anstyle-query v1.0.2
│   │   │   │   └── windows-sys v0.52.0
│   │   │   │       └── windows-targets v0.52.4
│   │   │   │           ├── windows_aarch64_gnullvm v0.52.4
│   │   │   │           ├── windows_aarch64_msvc v0.52.4
│   │   │   │           ├── windows_i686_gnu v0.52.4
│   │   │   │           ├── windows_i686_msvc v0.52.4
│   │   │   │           ├── windows_x86_64_gnu v0.52.4
│   │   │   │           ├── windows_x86_64_gnullvm v0.52.4
│   │   │   │           └── windows_x86_64_msvc v0.52.4
│   │   │   ├── anstyle-wincon v3.0.2
│   │   │   │   ├── anstyle v1.0.6
│   │   │   │   └── windows-sys v0.52.0 (*)
│   │   │   ├── colorchoice v1.0.0
│   │   │   └── utf8parse v0.2.1
│   │   ├── anstyle v1.0.6
│   │   ├── clap_lex v0.7.0
│   │   ├── strsim v0.11.1
│   │   └── terminal_size v0.3.0
│   │       ├── rustix v0.38.32
│   │       │   ├── bitflags v2.5.0
│   │       │   ├── errno v0.3.8
│   │       │   │   ├── libc v0.2.153
│   │       │   │   └── windows-sys v0.52.0 (*)
│   │       │   ├── libc v0.2.153
│   │       │   ├── linux-raw-sys v0.4.13
│   │       │   └── windows-sys v0.52.0 (*)
│   │       └── windows-sys v0.48.0
│   │           └── windows-targets v0.48.5
│   │               ├── windows_aarch64_gnullvm v0.48.5
│   │               ├── windows_aarch64_msvc v0.48.5
│   │               ├── windows_i686_gnu v0.48.5
│   │               ├── windows_i686_msvc v0.48.5
│   │               ├── windows_x86_64_gnu v0.48.5
│   │               ├── windows_x86_64_gnullvm v0.48.5
│   │               └── windows_x86_64_msvc v0.48.5
│   └── clap_derive v4.5.4 (proc-macro)
│       ├── heck v0.5.0
│       ├── proc-macro2 v1.0.79
│       │   └── unicode-ident v1.0.12
│       ├── quote v1.0.35
│       │   └── proc-macro2 v1.0.79 (*)
│       └── syn v2.0.58
│           ├── proc-macro2 v1.0.79 (*)
│           ├── quote v1.0.35 (*)
│           └── unicode-ident v1.0.12
├── color-eyre v0.6.3
│   ├── backtrace v0.3.71
│   │   ├── addr2line v0.21.0
│   │   │   └── gimli v0.28.1
│   │   ├── cfg-if v1.0.0
│   │   ├── libc v0.2.153
│   │   ├── miniz_oxide v0.7.2
│   │   │   └── adler v1.0.2
│   │   ├── object v0.32.2
│   │   │   └── memchr v2.7.2
│   │   └── rustc-demangle v0.1.23
│   │   [build-dependencies]
│   │   └── cc v1.0.90
│   ├── eyre v0.6.12
│   │   ├── indenter v0.3.3
│   │   └── once_cell v1.19.0
│   ├── indenter v0.3.3
│   ├── once_cell v1.19.0
│   └── owo-colors v3.5.0
├── eyre v0.6.12 (*)
├── log v0.4.21
├── rsonpath-lib v0.9.1 (/home/mat/src/rsonpath/crates/rsonpath-lib)
│   ├── cfg-if v1.0.0
│   ├── log v0.4.21
│   ├── memmap2 v0.9.4
│   │   └── libc v0.2.153
│   ├── nom v7.1.3
│   │   ├── memchr v2.7.2
│   │   └── minimal-lexical v0.2.1
│   ├── rsonpath-syntax v0.3.1 (/home/mat/src/rsonpath/crates/rsonpath-syntax)
│   │   ├── nom v7.1.3 (*)
│   │   ├── owo-colors v4.0.0
│   │   ├── thiserror v1.0.58
│   │   │   └── thiserror-impl v1.0.58 (proc-macro)
│   │   │       ├── proc-macro2 v1.0.79 (*)
│   │   │       ├── quote v1.0.35 (*)
│   │   │       └── syn v2.0.58 (*)
│   │   └── unicode-width v0.1.11
│   ├── smallvec v1.13.2
│   ├── static_assertions v1.1.0
│   ├── thiserror v1.0.58 (*)
│   └── vector-map v1.0.1
│       ├── contracts v0.4.0 (proc-macro)
│       │   ├── proc-macro2 v1.0.79 (*)
│       │   ├── quote v1.0.35 (*)
│       │   └── syn v1.0.109
│       │       ├── proc-macro2 v1.0.79 (*)
│       │       ├── quote v1.0.35 (*)
│       │       └── unicode-ident v1.0.12
│       └── rand v0.7.3
│           ├── getrandom v0.1.16
│           │   ├── cfg-if v1.0.0
│           │   ├── libc v0.2.153
│           │   └── wasi v0.9.0+wasi-snapshot-preview1
│           ├── libc v0.2.153
│           ├── rand_chacha v0.2.2
│           │   ├── ppv-lite86 v0.2.17
│           │   └── rand_core v0.5.1
│           │       └── getrandom v0.1.16 (*)
│           ├── rand_core v0.5.1 (*)
│           └── rand_hc v0.2.0
│               └── rand_core v0.5.1 (*)
├── rsonpath-syntax v0.3.1 (/home/mat/src/rsonpath/crates/rsonpath-syntax) (*)
└── simple_logger v4.3.3
    ├── colored v2.1.0
    │   ├── lazy_static v1.4.0
    │   └── windows-sys v0.48.0 (*)
    ├── log v0.4.21
    ├── time v0.3.34
    │   ├── deranged v0.3.11
    │   │   └── powerfmt v0.2.0
    │   ├── itoa v1.0.11
    │   ├── libc v0.2.153
    │   ├── num-conv v0.1.0
    │   ├── num_threads v0.1.7
    │   │   └── libc v0.2.153
    │   ├── powerfmt v0.2.0
    │   ├── time-core v0.1.2
    │   └── time-macros v0.2.17 (proc-macro)
    │       ├── num-conv v0.1.0
    │       └── time-core v0.1.2
    └── windows-sys v0.48.0 (*)
[build-dependencies]
├── rustflags v0.1.5
└── vergen v8.3.1
    ├── anyhow v1.0.81
    ├── cargo_metadata v0.18.1
    │   ├── camino v1.1.6
    │   │   └── serde v1.0.197
    │   │       └── serde_derive v1.0.197 (proc-macro)
    │   │           ├── proc-macro2 v1.0.79 (*)
    │   │           ├── quote v1.0.35 (*)
    │   │           └── syn v2.0.58 (*)
    │   ├── cargo-platform v0.1.8
    │   │   └── serde v1.0.197 (*)
    │   ├── semver v1.0.22
    │   │   └── serde v1.0.197 (*)
    │   ├── serde v1.0.197 (*)
    │   ├── serde_json v1.0.115
    │   │   ├── itoa v1.0.11
    │   │   ├── ryu v1.0.17
    │   │   └── serde v1.0.197 (*)
    │   └── thiserror v1.0.58 (*)
    ├── cfg-if v1.0.0
    ├── regex v1.10.4
    │   ├── aho-corasick v1.1.3
    │   │   └── memchr v2.7.2
    │   ├── memchr v2.7.2
    │   ├── regex-automata v0.4.6
    │   │   ├── aho-corasick v1.1.3 (*)
    │   │   ├── memchr v2.7.2
    │   │   └── regex-syntax v0.8.3
    │   └── regex-syntax v0.8.3
    ├── rustc_version v0.4.0
    │   └── semver v1.0.22 (*)
    └── time v0.3.34
        ├── deranged v0.3.11 (*)
        ├── itoa v1.0.11
        ├── libc v0.2.153
        ├── num-conv v0.1.0
        ├── num_threads v0.1.7 (*)
        ├── powerfmt v0.2.0
        └── time-core v0.1.2
    [build-dependencies]
    └── rustversion v1.0.14 (proc-macro)