
Conversation

@Weijun-H (Member) commented Jan 2, 2026

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

Optimize JSON struct decoding on wide objects by reducing per-row allocations and repeated field lookups.

What changes are included in this PR?

Reuse a flat child-position buffer in StructArrayDecoder and add an optional field-name index for object mode.
Skip building the field-name index for list mode; add overflow/allocation checks.
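The core lookup change, as a minimal sketch with simplified, hypothetical types (not the actual `StructArrayDecoder` code): object mode builds a name-to-index map once per decoder and resolves keys in O(1) instead of scanning the field list for every key, while list mode skips the index entirely and resolves fields by position.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the decoder's lookup state; not the actual
// arrow-json StructArrayDecoder types.
enum FieldLookup {
    // Object mode: keys can appear in any order per row, so build a
    // name -> child-index map once instead of scanning all fields per key.
    ByName(HashMap<String, usize>),
    // List mode: values arrive positionally, so no index is needed.
    Positional,
}

impl FieldLookup {
    fn new(field_names: &[&str], object_mode: bool) -> Self {
        if object_mode {
            FieldLookup::ByName(
                field_names
                    .iter()
                    .enumerate()
                    .map(|(i, name)| (name.to_string(), i))
                    .collect(),
            )
        } else {
            FieldLookup::Positional
        }
    }

    // Resolve a key (or position) seen in the input to a child index.
    fn resolve(&self, key: &str, position: usize) -> Option<usize> {
        match self {
            FieldLookup::ByName(map) => map.get(key).copied(),
            FieldLookup::Positional => Some(position),
        }
    }
}

fn main() {
    let lookup = FieldLookup::new(&["id", "name", "score"], true);
    assert_eq!(lookup.resolve("score", 0), Some(2));
    assert_eq!(lookup.resolve("unknown", 0), None);
}
```

Criterion results for the wide-object benchmarks: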

decode_wide_object_i64_json
                        time:   [11.828 ms 11.865 ms 11.905 ms]
                        change: [−67.828% −67.378% −67.008%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

decode_wide_object_i64_serialize
                        time:   [7.6923 ms 7.7402 ms 7.7906 ms]
                        change: [−75.652% −75.483% −75.331%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions bot added the arrow (Changes to the arrow crate) label Jan 2, 2026
@Weijun-H marked this pull request as ready for review January 2, 2026 13:57
@Weijun-H changed the title from "perf: improve field indexing in StructArrayDecoder" to "perf: improve field indexing in StructArrayDecoder (1.5x speed up)" Jan 2, 2026
@Weijun-H changed the title from "perf: improve field indexing in StructArrayDecoder (1.5x speed up)" to "perf: improve field indexing in StructArrayDecoder (2x speed up)" Jan 2, 2026
@Weijun-H changed the title from "perf: improve field indexing in StructArrayDecoder (2x speed up)" to "perf: improve field indexing in StructArrayDecoder (1.7x speed up)" Jan 2, 2026
@scovich (Contributor) left a comment

Not sure I understand the indexing code well enough to say whether that part is correct, but the idea of using an optional index for field name lookups makes a lot of sense to me.

}
}

fn build_field_index(fields: &Fields) -> Option<HashMap<String, usize>> {
Contributor:

qq: Do lifetimes coincide so that we could return Option<HashMap<&str, usize>> instead?

Member Author (@Weijun-H):

Yes, the lifetimes do coincide. We can use HashMap<&'a str, usize> by taking fields: &'a Fields as a parameter, which avoids the self-referential struct problem. However, this would require threading the lifetime parameter <'a> through the entire decoder system across many files. Since the lookup performance is identical, I don’t think it’s worth the added complexity.
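To make the trade-off concrete, a minimal standalone sketch of the two shapes being discussed (hypothetical names, not the arrow-json types): the owned map keeps the decoder lifetime-free, while the borrowed map would push 'a into every type that stores it.

```rust
use std::collections::HashMap;

// Owned index, as in this PR: no lifetime parameter on the decoder.
struct OwnedIndex {
    by_name: Option<HashMap<String, usize>>,
}

// Borrowed index: avoids per-field String allocations, but the lifetime 'a
// would have to be threaded through every type that stores the decoder.
struct BorrowedIndex<'a> {
    by_name: Option<HashMap<&'a str, usize>>,
}

fn build_owned(fields: &[&str]) -> Option<HashMap<String, usize>> {
    Some(
        fields
            .iter()
            .enumerate()
            .map(|(i, f)| (f.to_string(), i))
            .collect(),
    )
}

fn build_borrowed<'a>(fields: &[&'a str]) -> Option<HashMap<&'a str, usize>> {
    Some(fields.iter().enumerate().map(|(i, f)| (*f, i)).collect())
}

fn main() {
    let fields = ["a", "b", "c"];
    let owned = OwnedIndex { by_name: build_owned(&fields) };
    let borrowed = BorrowedIndex { by_name: build_borrowed(&fields) };
    assert_eq!(owned.by_name.as_ref().unwrap()["b"], 1);
    assert_eq!(borrowed.by_name.as_ref().unwrap()["b"], 1);
}
```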

Contributor:

Maybe it would be a good follow-on PR.

@alamb (Contributor) left a comment

Thanks @Weijun-H and @scovich

use std::fmt::Write;
use std::sync::Arc;

fn build_schema(field_count: usize) -> Arc<Schema> {
Contributor:

Can you please add some comments here with an example of what this code does / what patterns of input it creates?

Also, it would help me to reproduce your results if you could make a separate PR with the benchmarks (so I can compare main to the PR).

Member Author (@Weijun-H):

Separate benchmark here:

#9107

@alamb changed the title from "perf: improve field indexing in StructArrayDecoder (1.7x speed up)" to "perf: improve field indexing in JSON StructArrayDecoder (1.7x speed up)" Jan 7, 2026
@alamb (Contributor) commented Jan 10, 2026

run benchmark json-reader

@apache deleted a comment from alamb-ghbot Jan 10, 2026
@alamb-ghbot commented

🤖 Hi @alamb, thanks for the request (#9086 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: (none)
  • Criterion: array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, json-reader, metadata, row_format, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

@Weijun-H (Member Author) commented

run benchmark json-reader

@alamb-ghbot commented

🤖 Hi @Weijun-H, thanks for the request (#9086 (comment)). scrape_comments.py only responds to whitelisted users. Allowed users: Dandandan, Omega359, adriangb, alamb, comphead, geoffreyclaude, klion26, rluvaton, xudong963, zhuqi-lucas.

@alamb-ghbot commented

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-json-scan (06ded8b) to b2aeab1 diff
BENCH_NAME=json-reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench json-reader
BENCH_FILTER=
BENCH_BRANCH_NAME=optimize-json-scan
Results will be posted here when complete

@alamb-ghbot commented

🤖: Benchmark completed

Details

group                                        main                                   optimize-json-scan
-----                                        ----                                   ------------------
decode_binary_hex_json                       1.05     93.1±0.89ms        ? ?/sec    1.00     88.5±1.03ms        ? ?/sec
decode_binary_view_hex_json                  1.05     94.2±0.61ms        ? ?/sec    1.00     89.6±1.39ms        ? ?/sec
decode_fixed_binary_hex_json                 1.05     92.9±1.20ms        ? ?/sec    1.00     88.3±1.40ms        ? ?/sec
decode_wide_object_i64_json                  1.38  1468.8±33.63ms        ? ?/sec    1.00  1065.8±27.55ms        ? ?/sec
decode_wide_object_i64_serialize             1.46  1268.0±13.45ms        ? ?/sec    1.00   866.5±14.04ms        ? ?/sec
decode_wide_projection_full_json/131072      1.64       3.0±0.03s    57.4 MB/sec    1.00  1845.3±18.20ms    94.3 MB/sec
decode_wide_projection_narrow_json/131072    1.00   780.7±12.09ms   222.9 MB/sec    1.01   791.4±10.94ms   219.9 MB/sec

@alamb (Contributor) left a comment

Thanks @Weijun-H -- I think this PR is a nice improvement. I have some suggestions on how to make it faster and improve the comments, but overall very nice 👍

is_nullable: bool,
struct_mode: StructMode,
field_name_to_index: Option<HashMap<String, usize>>,
child_pos: Vec<u32>,
Contributor:

Could you add a comment that explains what child_pos is? It isn't clear here (the idea of caching rather than recreating it looks good though)

Specifically, I think it is important to document what is stored at each index (e.g. index field_idx * row_count + row holds the tape position for that field and row).

Member Author (@Weijun-H):

Renamed and commented in df9e710.
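For readers following the thread, a minimal standalone sketch of the flat, field-major layout and buffer reuse being described (hypothetical function names, not the actual arrow-json code):

```rust
// positions[field_idx * row_count + row] holds the tape position of that
// field's value in that row; 0 marks an absent field.
fn fill_child_positions(
    rows: &[Vec<(usize, u32)>], // per row: (field_idx, tape_position) pairs
    field_count: usize,
    positions: &mut Vec<u32>,
) {
    let row_count = rows.len();
    // Reset and size the reused buffer for this batch; once capacity is
    // large enough, no further allocation happens across batches.
    positions.clear();
    positions.resize(field_count * row_count, 0);
    for (row, fields) in rows.iter().enumerate() {
        for &(field_idx, tape_pos) in fields {
            positions[field_idx * row_count + row] = tape_pos;
        }
    }
}

fn main() {
    // Two rows over three fields; row 0 sets fields 0 and 2, row 1 sets field 1.
    let rows = vec![vec![(0, 10), (2, 12)], vec![(1, 21)]];
    let mut positions = Vec::new();
    fill_child_positions(&rows, 3, &mut positions);
    assert_eq!(positions, vec![10, 0, 0, 21, 12, 0]);
}
```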

))
})?;
}
self.child_pos.resize(total_len, 0);
Contributor:

This seems like it would set some elements to zero twice -- I think you can get the same result without the extra setting via

self.child_pos.clear();
self.child_pos.resize(total_len, 0);

Also, I think resize calls reserve internally (it internally calls extend_with, which calls reserve), so there is no need to also call child_pos.reserve above.

(Also, the rest of this crate just calls reserve, so I think using try_reserve just here seems unnecessary.)

Member Author (@Weijun-H):

Addressed in df9e710.
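A tiny standalone illustration of the Vec behavior the suggestion above relies on (generic Rust, not the decoder code):

```rust
fn main() {
    let mut buf: Vec<u32> = Vec::new();

    // resize grows capacity on its own (it reserves internally), so a
    // separate reserve call before it is redundant.
    buf.resize(8, 0);
    assert!(buf.capacity() >= 8);
    assert_eq!(buf, vec![0u32; 8]);

    // clear + resize resets the buffer for the next batch; once capacity is
    // sufficient, this does not reallocate.
    let cap_before = buf.capacity();
    buf.clear();
    buf.resize(4, 0);
    assert_eq!(buf.len(), 4);
    assert_eq!(buf.capacity(), cap_before);
}
```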

fields.len()
)));
}
child_pos[entry_idx * row_count + row] = cur_idx;
Contributor:

👍 this is a nice way to avoid allocations

Labels: arrow (Changes to the arrow crate), performance
