GH-135: Store trace_id, span_id, and parent_span_id as FixedSizeBinary#136
Conversation
|
Warning Rate limit exceeded
To continue reviewing without waiting, purchase usage credits in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughMigrates trace/span IDs from hex strings to fixed-length binary across ingest, WAL, Parquet, and query; adds per-column Parquet encoding overrides and shared writer-properties builder; exposes Parquet data-page-size config; and adds a cAdvisor DaemonSet + ServiceMonitor. ChangesBinary ID & Parquet encoding migration
Container metrics (cAdvisor) deployment
Sequence DiagramEstimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
trace_id, span_id, and parent_span_id as `FixedSi…trace_id, span_id, and parent_span_id as FixedSizeBinary
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/icegate-ingest/src/wal/boundary.rs (1)
136-145:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject mixed-width
FixedBytescomponents during structure validation.Two
FixedBytesvalues with different byte lengths are currently treated as structurally compatible and then compared lexicographically. For fixed-width IDs that accepts impossible boundary metadata and can validate a corrupt range after deserialization. Please fail fast when both sides areFixedBytesbut their lengths differ.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-ingest/src/wal/boundary.rs` around lines 136 - 145, The structural compatibility check in the match over (&self.value, &other.value) must reject cases where both values are FixedBytes with different lengths: inside the branch that currently checks boundary_value_discriminant(left) != boundary_value_discriminant(right) add an additional early error for the case when both variants are FixedBytes but left.len() != right.len() (or the appropriate length accessor on the FixedBytes variant) and return Err(IngestError::Shift(...)) describing mismatched fixed byte widths; keep using the same error type/format as the surrounding code and reference the FixedBytes variant, boundary_value_discriminant, and the same component index context in the message.crates/icegate-query/src/logql/datafusion/planner.rs (2)
1240-1247:⚠️ Potential issue | 🟠 MajorRegex matchers still target the raw binary column without type conversion.
For
trace_idandspan_id(which areFixedSizeBinarycolumns), the code at lines 1240–1247 callsregexp_like(col_expr, ...)directly on the raw binary column without converting it to a hex string first. While equality matchers properly use the typedvalvariable, regex matchers bypass it entirely. This causes a type mismatch:regexp_like()expects string arguments, not binary. Either hex-encode/cast the binary columns before passing them toregexp_like(), or reject regex matchers (=~,!~) for these fixed-binary labels explicitly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/logql/datafusion/planner.rs` around lines 1240 - 1247, Regex matchers (MatchOp::Re / MatchOp::Nre) call datafusion::functions::regex::regexp_like() directly on col_expr (the raw FixedSizeBinary columns like trace_id/span_id) which causes a type mismatch; convert those binary columns to a string representation (hex-encode or cast to UTF-8) before calling regexp_like, using the same typed value used by equality matchers (the val variable / matcher.value) or explicitly reject =~ / !~ for fixed-binary labels. Locate the MatchOp::Re and MatchOp::Nre branches, wrap col_expr with the binary->string conversion (or add a guard that returns an error for regex on FixedSizeBinary) and pass the converted expression and matcher.value to regexp_like (and call .not() for Nre).
1225-1239:⚠️ Potential issue | 🟠 MajorFix
!=operator semantics on invalid trace_id/span_id values.When hex decode/length validation fails, both
=and!=currently compare againstFixedSizeBinary(..., None). In DataFusion,col != NULLevaluates to NULL and filters out all rows, makingtrace_id != "badhex"match nothing instead of all valid IDs. Move the invalid-input check beforevalconstruction and apply different fallback expressions: forEquse NULL (match nothing), forNequse a row-matching expression (match all valid rows). Also note that regex matchers on binary columns (trace_id/span_id) require handling separate from thelit(ScalarValue)path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/logql/datafusion/planner.rs` around lines 1225 - 1239, The current logic builds val first then uses it for both Eq and Neq, causing `col != NULL` to filter out all rows; instead, for COL_TRACE_ID and COL_SPAN_ID (identified by mapped_label, COL_TRACE_ID, COL_SPAN_ID) validate hex::decode + length before constructing val and branch on MatchOp: if decode is invalid then for MatchOp::Eq return an expression that matches nothing (i.e. compare against NULL/other "no-match" expression) while for MatchOp::Neq return an expression that matches all valid rows (e.g. a truthy row-matching expression such as an is-not-null or always-true predicate for that binary column); only when decode+length are valid construct lit(ScalarValue::FixedSizeBinary(..., Some(bytes))) and use col_expr.eq(...) / col_expr.not_eq(...). Also handle regex matchers for binary columns separately from the lit(ScalarValue) path.
🧹 Nitpick comments (5)
crates/icegate-query/src/engine/provider/scan.rs (1)
199-203: Confirm that row_selection overhead is justified across all workloads.Enabling row selection parses the Parquet page index to filter row ranges within row groups. Per iceberg-rust docs, this overhead can outweigh the benefits: "parsing the parquet page index…can be slow enough that parsing the page index outweighs any gains from the reduced number of rows that need scanning."
Since
with_row_selection_enabled(true)is unconditional (applies to every scan regardless of predicate selectivity), verify via benchmarks or profiling that the page index parsing overhead justifies enabling it for all queries.Also confirm that iceberg-rust's page index evaluator correctly handles
FixedSizeBinarycolumn predicates. The codebase createsFixedSizeBinarypredicates extensively for trace_id (16 B), span_id (8 B), and parent_span_id (8 B); ensure these benefit from page-level filtering rather than falling back to full row-group inclusion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/engine/provider/scan.rs` around lines 199 - 203, Unconditionally enabling row selection via scan_builder.with_row_selection_enabled(true) may add page-index parsing overhead that isn't always beneficial; run targeted benchmarks/profiling comparing scans with and without row selection (use ICEBERG_SCAN_BATCH_SIZE as in the current build) to determine overall benefit and gate the call if needed, and separately verify iceberg-rust's page index evaluator correctly supports FixedSizeBinary predicates by testing predicate pushdown for FixedSizeBinary columns (trace_id, span_id, parent_span_id) to ensure they produce page-level filtering rather than forcing full row-group reads; if benchmarks show regressions, make row selection conditional (e.g., based on predicate selectivity estimation or a config flag) and add unit/integration tests confirming FixedSizeBinary predicate handling.crates/icegate-query/tests/tempo/harness.rs (1)
219-225: ⚡ Quick winSeed at least one non-null
parent_span_idhere.
parent_span_idis part of the fixed-size-binary migration and is also covered by the shared Parquet encoding overrides, but this fixture still writes three nulls. That means these Tempo tests never serialize a real parent span ID through the new path, so a regression there would slip by unnoticed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/tests/tempo/harness.rs` around lines 219 - 225, The test currently creates parent_span_id as three null FixedSizeBinary entries using FixedSizeBinaryArray::try_from_sparse_iter_with_size and repeat_n(None, 3); change the iterator so at least one element is Some(&[u8;8]) (an 8-byte slice) and the other entries remain None (e.g., iterator over [Some(bytes8), None, None]) so the FixedSizeBinaryArray contains at least one real parent span ID; update the call to FixedSizeBinaryArray::try_from_sparse_iter_with_size accordingly (keeping the size 8) so the Tempo tests serialize a real parent_span_id through the migration path.crates/icegate-ingest/src/shift/config.rs (1)
290-294: ⚡ Quick winAdd a unit test for
data_page_size_limit_bytes == 0.This guard is new and the Helm chart now exposes the field, but the test module does not pin it yet. A tiny regression test here would keep the config validation and chart wiring from drifting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-ingest/src/shift/config.rs` around lines 290 - 294, Add a unit test that exercises validation when write.data_page_size_limit_bytes == 0: create a config instance (or builder) with write.data_page_size_limit_bytes set to 0, call the validation method (the impl that returns Result, e.g., validate or the method that produced Err(IngestError::Config)), and assert it returns Err(IngestError::Config) and that the error message contains "data_page_size_limit_bytes must be greater than zero"; name the test something like test_data_page_size_limit_bytes_zero to live in the existing test module so future changes to the Helm chart/field are caught.crates/icegate-common/src/parquet_encoding.rs (1)
135-165: ⚡ Quick win
DELTA_BYTE_ARRAYonFIXED_LEN_BYTE_ARRAYis spec-valid; test coverage is the remaining gapThe official Apache Parquet encoding table explicitly lists
DELTA_BYTE_ARRAY = 7as supported for bothBYTE_ARRAYandFIXED_LEN_BYTE_ARRAY, so the encoding choices fortrace_id,span_id, andparent_span_idare spec-compliant. The spec also notes that "even forFIXED_LEN_BYTE_ARRAY, all lengths are encoded despite the redundancy" — this is by-spec design and does not affect correctness.The gap is that none of the test suites (neither this module nor
writer.rsnoriceberg_storage.rs) exercise aFixedSizeBinarycolumn with an activeDELTA_BYTE_ARRAYoverride. Consider adding at least one test here (or inwriter.rs) that writes aFixedSizeBinarycolumn with dictionary disabled +DELTA_BYTE_ARRAYand reads it back, verifying parquet-rs's encoder and decoder agree on the binary payload.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-common/src/parquet_encoding.rs` around lines 135 - 165, Add a regression test that exercises a FixedSizeBinary column using DELTA_BYTE_ARRAY encoding so we verify parquet-rs can encode/decode that combination: create a small Parquet write+read roundtrip (either as a new test in this module or in writer.rs) that defines a FixedSizeBinary column (FixedSizeBinary type), disables dictionary encoding for that column, sets the encoding override to Encoding::DELTA_BYTE_ARRAY (matching the SPANS_COLUMN_ENCODINGS use of DELTA_BYTE_ARRAY for trace_id/span_id/parent_span_id), writes several rows with varying binary values, then reads them back and asserts the binary payloads match; ensure the test fails if the encoder/decoder disagree.crates/icegate-queue/src/writer.rs (1)
466-479: ⚡ Quick winAdd a roundtrip test for
DELTA_BYTE_ARRAYon aFixedSizeBinarycolumnThe encoding logic looks correct, but the existing writer tests (
writer_emits_bloom_filter_for_columns_listed_for_topic, etc.) all useStringArray/Int32Arrayand the encoding map is never populated in any test. The central claim of this PR — thatDELTA_BYTE_ARRAYcompressesFIXED_LEN_BYTE_ARRAYtrace_id/span_idcolumns more efficiently than dictionary — is not exercised at all in the test suite.The Parquet spec confirms
DELTA_BYTE_ARRAY(encoding 7) is valid for bothBYTE_ARRAYandFIXED_LEN_BYTE_ARRAY. However, the parquet-rs crate's encoder forFIXED_LEN_BYTE_ARRAYwithDELTA_BYTE_ARRAYshould be verified via a roundtrip test. A minimal test would:
- Build a schema with a
FixedSizeBinary(16)column (ArrowDataType::FixedSizeBinary(16)).- Populate it with representative 16-byte values.
- Write via
QueueWriterwithcolumn_encodingsset to&[("trace_id", Encoding::DELTA_BYTE_ARRAY)].- Read back and assert the values match, confirming parquet-rs encodes/decodes the path without error or silent fallback.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-queue/src/writer.rs` around lines 466 - 479, Add a roundtrip unit test that verifies parquet-rs actually encodes/decodes a FixedSizeBinary column with DELTA_BYTE_ARRAY: create an Arrow schema containing DataType::FixedSizeBinary(16) for a "trace_id" column, populate it with several 16-byte values, construct a QueueWriter with column_encodings mapping the topic to &[("trace_id", Encoding::DELTA_BYTE_ARRAY)], write the record batch to an in-memory sink, read the written Parquet back (using the same reader path the codebase uses) and assert the decoded FixedSizeBinary values equal the originals; put this under a clear test name (e.g., writer_roundtrip_delta_byte_array_fixed_size_binary) so CI covers the new encoding path for QueueWriter and column_encodings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/icegate-common/src/schema.rs`:
- Around line 89-102: Update SCHEMA.md to reflect that trace_id and span_id are
stored as fixed-length binary types (PrimitiveType::Fixed(16) and
PrimitiveType::Fixed(8)) rather than VARCHAR/STRING to match the Rust schema
where NestedField::optional(..., "trace_id",
Type::Primitive(PrimitiveType::Fixed(16))) and NestedField::optional(...,
"span_id", Type::Primitive(PrimitiveType::Fixed(8))) are defined; add a
Migration Guide section that explains the current upgrade_schemas() behavior (it
detects schema diffs and returns an error) and instructs operators to perform
the manual destructive migration by dropping and recreating affected tables
using the documented command (icegate-maintain migrate create), plus a short
checklist of which tables (logs, spans, events, metrics) are affected and a
recommended backup step before dropping tables.
In `@crates/icegate-ingest/src/wal/columns.rs`:
- Around line 121-131: The FixedSizeBinary branch (when matching
PrimitiveType::Fixed) must validate the actual byte width: after downcasting to
FixedSizeBinaryArray (the current
as_any().downcast_ref::<FixedSizeBinaryArray>() call), read its data_type() to
extract the declared width (DataType::FixedSizeBinary(n_actual)) and compare
n_actual to the schema width (the PrimitiveType::Fixed(n_expected) value); if
they differ return an Err(IngestError::Shift(...)) similar to the existing error
message using column_name and context, otherwise proceed to construct
CachedSortColumn::FixedBytes with the cloned array, sort_column.descending and
sort_column.nulls_first. Ensure you reference PrimitiveType::Fixed,
FixedSizeBinaryArray, CachedSortColumn::FixedBytes, IngestError::Shift,
sort_column and column in the fix.
In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 259-275: eval_binary_on_stats currently only inspects
Statistics::ByteArray causing FIXED_LEN_BYTE_ARRAY columns (trace_id/span_id) to
bypass stats pruning; update eval_binary_on_stats to also handle the
fixed-length case by matching Statistics::FixedLenByteArray (or whatever the
parquet Statistics variant for fixed-length bytes is) and decode/compare its
min/max values the same way you handle ByteArray literals (apply the same
hex/string conversion logic used for FIXED_LEN_BYTE_ARRAY decoding such as in
decode_plain_fixed_len_byte_array_values and the
PhysicalType::FIXED_LEN_BYTE_ARRAY branch), so predicates against
trace_id/span_id consult stats correctly and pruning does not regress when no
dictionary page is present.
In `@crates/icegate-query/src/loki/predicate.rs`:
- Around line 21-31: datum_for_column currently accepts any hex string and
converts to Datum::fixed, but doesn't enforce storage widths; update
datum_for_column to decode hex and then validate byte length: for COL_TRACE_ID
require exactly 16 bytes, and for COL_SPAN_ID and COL_PARENT_SPAN_ID require
exactly 8 bytes, returning None when decode fails or length is wrong; keep other
columns returning Datum::string as before and use the existing Datum::fixed(...)
constructor for valid fixed-length byte slices.
In `@crates/icegate-query/src/tempo/formatters.rs`:
- Around line 91-97: The parent_span_id value is being serialized even when it's
the all-zero sentinel for root spans; modify the parent handling so that when
evaluating parent_ids (used to build parent_hex passed to build_span_value) you
treat either null or an all-zero 8-byte value as "no parent". Concretely: in the
code that computes parent_hex (the parent_ids.filter(...).map(...) expression),
retrieve the raw parent byte slice (parent_ids.value(row)), check whether it is
all zeros (compare to an 8-bytes-of-zero array or use .iter().all(|b| *b == 0)),
and only call hex::encode and pass a Some(hex) to build_span_value when it is
non-zero; otherwise pass None (i.e. keep parent_hex as None) so build_span_value
and OTLP output will suppress the zero-filled parent_span_id.
In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs`:
- Around line 25-35: Tests currently use padded ASCII labels via pad_id which
produce non-hex IDs (e.g., "t1-trace-1"), preventing the hex-decode path from
being exercised; change the fixtures to use real hex-encoded trace/span IDs and
update helpers accordingly: replace uses of pad_id(...) for trace/span fixtures
with a new or modified helper (referenced as pad_id or hex_id) that accepts a
hex string and decodes it into a fixed-size [u8; N] byte array (or add a
separate hex_decode helper) so the test selectors like traceID = "..." are valid
hex and the end-to-end hex decode path (trace/span ID matching and tenant
isolation) is exercised. Ensure the new helper validates hex length matches N
and returns the decoded fixed-size array, and update any test vectors that
previously used ASCII labels to use the corresponding hex strings.
---
Outside diff comments:
In `@crates/icegate-ingest/src/wal/boundary.rs`:
- Around line 136-145: The structural compatibility check in the match over
(&self.value, &other.value) must reject cases where both values are FixedBytes
with different lengths: inside the branch that currently checks
boundary_value_discriminant(left) != boundary_value_discriminant(right) add an
additional early error for the case when both variants are FixedBytes but
left.len() != right.len() (or the appropriate length accessor on the FixedBytes
variant) and return Err(IngestError::Shift(...)) describing mismatched fixed
byte widths; keep using the same error type/format as the surrounding code and
reference the FixedBytes variant, boundary_value_discriminant, and the same
component index context in the message.
In `@crates/icegate-query/src/logql/datafusion/planner.rs`:
- Around line 1240-1247: Regex matchers (MatchOp::Re / MatchOp::Nre) call
datafusion::functions::regex::regexp_like() directly on col_expr (the raw
FixedSizeBinary columns like trace_id/span_id) which causes a type mismatch;
convert those binary columns to a string representation (hex-encode or cast to
UTF-8) before calling regexp_like, using the same typed value used by equality
matchers (the val variable / matcher.value) or explicitly reject =~ / !~ for
fixed-binary labels. Locate the MatchOp::Re and MatchOp::Nre branches, wrap
col_expr with the binary->string conversion (or add a guard that returns an
error for regex on FixedSizeBinary) and pass the converted expression and
matcher.value to regexp_like (and call .not() for Nre).
- Around line 1225-1239: The current logic builds val first then uses it for
both Eq and Neq, causing `col != NULL` to filter out all rows; instead, for
COL_TRACE_ID and COL_SPAN_ID (identified by mapped_label, COL_TRACE_ID,
COL_SPAN_ID) validate hex::decode + length before constructing val and branch on
MatchOp: if decode is invalid then for MatchOp::Eq return an expression that
matches nothing (i.e. compare against NULL/other "no-match" expression) while
for MatchOp::Neq return an expression that matches all valid rows (e.g. a truthy
row-matching expression such as an is-not-null or always-true predicate for that
binary column); only when decode+length are valid construct
lit(ScalarValue::FixedSizeBinary(..., Some(bytes))) and use col_expr.eq(...) /
col_expr.not_eq(...). Also handle regex matchers for binary columns separately
from the lit(ScalarValue) path.
---
Nitpick comments:
In `@crates/icegate-common/src/parquet_encoding.rs`:
- Around line 135-165: Add a regression test that exercises a FixedSizeBinary
column using DELTA_BYTE_ARRAY encoding so we verify parquet-rs can encode/decode
that combination: create a small Parquet write+read roundtrip (either as a new
test in this module or in writer.rs) that defines a FixedSizeBinary column
(FixedSizeBinary type), disables dictionary encoding for that column, sets the
encoding override to Encoding::DELTA_BYTE_ARRAY (matching the
SPANS_COLUMN_ENCODINGS use of DELTA_BYTE_ARRAY for
trace_id/span_id/parent_span_id), writes several rows with varying binary
values, then reads them back and asserts the binary payloads match; ensure the
test fails if the encoder/decoder disagree.
In `@crates/icegate-ingest/src/shift/config.rs`:
- Around line 290-294: Add a unit test that exercises validation when
write.data_page_size_limit_bytes == 0: create a config instance (or builder)
with write.data_page_size_limit_bytes set to 0, call the validation method (the
impl that returns Result, e.g., validate or the method that produced
Err(IngestError::Config)), and assert it returns Err(IngestError::Config) and
that the error message contains "data_page_size_limit_bytes must be greater than
zero"; name the test something like test_data_page_size_limit_bytes_zero to live
in the existing test module so future changes to the Helm chart/field are
caught.
In `@crates/icegate-query/src/engine/provider/scan.rs`:
- Around line 199-203: Unconditionally enabling row selection via
scan_builder.with_row_selection_enabled(true) may add page-index parsing
overhead that isn't always beneficial; run targeted benchmarks/profiling
comparing scans with and without row selection (use ICEBERG_SCAN_BATCH_SIZE as
in the current build) to determine overall benefit and gate the call if needed,
and separately verify iceberg-rust's page index evaluator correctly supports
FixedSizeBinary predicates by testing predicate pushdown for FixedSizeBinary
columns (trace_id, span_id, parent_span_id) to ensure they produce page-level
filtering rather than forcing full row-group reads; if benchmarks show
regressions, make row selection conditional (e.g., based on predicate
selectivity estimation or a config flag) and add unit/integration tests
confirming FixedSizeBinary predicate handling.
In `@crates/icegate-query/tests/tempo/harness.rs`:
- Around line 219-225: The test currently creates parent_span_id as three null
FixedSizeBinary entries using
FixedSizeBinaryArray::try_from_sparse_iter_with_size and repeat_n(None, 3);
change the iterator so at least one element is Some(&[u8;8]) (an 8-byte slice)
and the other entries remain None (e.g., iterator over [Some(bytes8), None,
None]) so the FixedSizeBinaryArray contains at least one real parent span ID;
update the call to FixedSizeBinaryArray::try_from_sparse_iter_with_size
accordingly (keeping the size 8) so the Tempo tests serialize a real
parent_span_id through the migration path.
In `@crates/icegate-queue/src/writer.rs`:
- Around line 466-479: Add a roundtrip unit test that verifies parquet-rs
actually encodes/decodes a FixedSizeBinary column with DELTA_BYTE_ARRAY: create
an Arrow schema containing DataType::FixedSizeBinary(16) for a "trace_id"
column, populate it with several 16-byte values, construct a QueueWriter with
column_encodings mapping the topic to &[("trace_id",
Encoding::DELTA_BYTE_ARRAY)], write the record batch to an in-memory sink, read
the written Parquet back (using the same reader path the codebase uses) and
assert the decoded FixedSizeBinary values equal the originals; put this under a
clear test name (e.g., writer_roundtrip_delta_byte_array_fixed_size_binary) so
CI covers the new encoding path for QueueWriter and column_encodings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 16436743-3325-4b81-b4a1-726ae55e7926
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (32)
config/helm/icegate/templates/configmap-ingest.yamlconfig/helm/icegate/values.yamlcrates/icegate-common/src/catalog/config.rscrates/icegate-common/src/lib.rscrates/icegate-common/src/parquet_encoding.rscrates/icegate-common/src/schema.rscrates/icegate-ingest/src/cli/commands/run.rscrates/icegate-ingest/src/shift/config.rscrates/icegate-ingest/src/shift/iceberg_storage.rscrates/icegate-ingest/src/shift/mod.rscrates/icegate-ingest/src/transform.rscrates/icegate-ingest/src/wal/boundary.rscrates/icegate-ingest/src/wal/columns.rscrates/icegate-ingest/src/wal/sorter.rscrates/icegate-query/src/engine/metadata_scan/parquet_reader.rscrates/icegate-query/src/engine/provider/scan.rscrates/icegate-query/src/logql/datafusion/planner.rscrates/icegate-query/src/loki/formatters.rscrates/icegate-query/src/loki/predicate.rscrates/icegate-query/src/tempo/formatters.rscrates/icegate-query/src/tempo/trace_by_id.rscrates/icegate-query/src/traceql/datafusion/planner.rscrates/icegate-query/src/traceql/datafusion/planner_tests.rscrates/icegate-query/src/traceql/datafusion/selectors.rscrates/icegate-query/tests/loki/grouping.rscrates/icegate-query/tests/loki/harness.rscrates/icegate-query/tests/loki/trace.rscrates/icegate-query/tests/loki/unwrap.rscrates/icegate-query/tests/tempo/harness.rscrates/icegate-query/tests/tempo/tags.rscrates/icegate-queue/src/config.rscrates/icegate-queue/src/writer.rs
…with `FixedSizeBinary` support for efficient Parquet storage, query optimization, and validation. Includes schema changes, migration guide, planner updates, and test coverage.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
crates/icegate-common/src/SCHEMA.md (1)
497-499: ⚡ Quick winAdd the Migration Guide to the Table of Contents.
The new section starts at Line 497 but isn’t discoverable from the ToC near Lines 11-20. Please add a
Migration Guideentry there.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-common/src/SCHEMA.md` around lines 497 - 499, The Table of Contents in SCHEMA.md is missing an entry for the new "Migration Guide" section; update the ToC near the top to include a "Migration Guide" link that points to the "Migration Guide" heading (which contains the "Trace and Span ID storage change (GH-135)" subsection) so readers can navigate to that new section from the ToC. Ensure the ToC entry uses the same heading text and markdown link style as other entries in the file.crates/icegate-query/src/traceql/datafusion/planner_tests.rs (1)
566-570: ⚡ Quick winInline
paddedconstruction duplicatespad_id— use the helper instead.Five places across three test functions manually reproduce the
pad_idbyte-copy logic instead of calling it. Ifpad_idsemantics change, these diverge silently.♻️ Replace all five occurrences (example shown for `child_only_filter_returns_full_trace_so_root_is_present`; same pattern applies to `limit_caps_traces_not_span_rows` and `spss_one_keeps_root_over_child`)
- let trace_a = { - let mut padded = [0u8; 16]; - padded[..7].copy_from_slice(b"trace-A"); - padded.to_vec() - }; + let trace_a = pad_id::<16>("trace-A").to_vec(); // ... - let trace_b = { - let mut padded = [0u8; 16]; - padded[..7].copy_from_slice(b"trace-B"); - padded.to_vec() - }; + let trace_b = pad_id::<16>("trace-B").to_vec();Also applies to: 583-587, 625-629, 691-695, 696-700
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs` around lines 566 - 570, Replace the manual 16-byte padding logic used to build test trace IDs with the shared helper pad_id to avoid duplicated semantics: locate the test-local constructions (e.g., the trace_a creation in child_only_filter_returns_full_trace_so_root_is_present and the similar blocks in limit_caps_traces_not_span_rows and spss_one_keeps_root_over_child) and call pad_id(b"trace-A") (or the appropriate raw id bytes) instead of creating a local padded array and copy_from_slice; ensure the return type matches (Vec<u8> or &[u8]) expected by the tests and update the five occurrences accordingly so all tests use pad_id consistently.crates/icegate-query/benches/common/harness.rs (1)
245-259: ⚡ Quick winExtract a shared helper to eliminate the triplicated trace/span builder block.
The identical ~11-line decode → build → append → finish pattern appears in all three writer functions. Since this schema is actively evolving (this PR is one step in a migration series), keeping it in three places means all three need updating on the next change. A small helper removes that risk and is also the right place to switch to
with_capacity, which is consistent with how the production ingest path allocates these builders (FixedSizeBinaryBuilder::with_capacity(total_records, 16)incrates/icegate-ingest/src/transform.rs).♻️ Proposed refactor
+// Helper used by all three writer functions +fn build_trace_span_arrays(count: usize) -> (ArrayRef, ArrayRef) { + let trace_id_bytes = hex::decode("0102030405060708090a0b0c0d0e0f10").expect("trace_id hex"); + let span_id_bytes = hex::decode("0102030405060708").expect("span_id hex"); + let mut trace_id_builder = FixedSizeBinaryBuilder::with_capacity(count, 16); + let mut span_id_builder = FixedSizeBinaryBuilder::with_capacity(count, 8); + for _ in 0..count { + trace_id_builder.append_value(&trace_id_bytes).expect("trace_id length 16"); + span_id_builder.append_value(&span_id_bytes).expect("span_id length 8"); + } + ( + Arc::new(trace_id_builder.finish()) as ArrayRef, + Arc::new(span_id_builder.finish()) as ArrayRef, + ) +}Then in each writer:
- // trace_id / span_id are FIXED_LEN_BYTE_ARRAY (16 / 8) in the logs - // schema; decode the hex fixtures into raw bytes once and reuse them - // for every row. - let trace_id_bytes = hex::decode("0102030405060708090a0b0c0d0e0f10").expect("trace_id hex"); - let span_id_bytes = hex::decode("0102030405060708").expect("span_id hex"); - let mut trace_id_builder = FixedSizeBinaryBuilder::new(16); - let mut span_id_builder = FixedSizeBinaryBuilder::new(8); - - for _ in 0..count { - trace_id_builder.append_value(&trace_id_bytes).expect("trace_id length 16"); - span_id_builder.append_value(&span_id_bytes).expect("span_id length 8"); - } - - let trace_id_arr: ArrayRef = Arc::new(trace_id_builder.finish()); - let span_id_arr: ArrayRef = Arc::new(span_id_builder.finish()); + let (trace_id_arr, span_id_arr) = build_trace_span_arrays(count);Also applies to: 382-396, 509-523
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/benches/common/harness.rs` around lines 245 - 259, Extract a small helper (e.g., fn build_fixed_binary_array(hex: &str, count: usize, byte_width: usize) -> ArrayRef) to replace the repeated decode→build→append→finish logic found around the trace/span builder blocks (uses FixedSizeBinaryBuilder currently constructed via new in those writers); inside the helper decode the hex once, create the builder with FixedSizeBinaryBuilder::with_capacity(count, byte_width), append the decoded bytes count times, finish and return Arc::new(builder.finish()), then call this helper for both trace_id and span_id in each writer to remove duplication and switch to with_capacity.crates/icegate-ingest/src/wal/columns.rs (1)
534-552: ⚡ Quick winAdd a descending-sort test case for the
FixedBytespath.All
fixed_bytes_descriptorcalls in the new tests passdescending=false. Thecompare_option_ordinversion is verified forUtf8/TimestampMicrosecondelsewhere, but a single descending check here would pin the new plumbing end-to-end. The test helper already acceptsdescending: bool, so the addition is trivial.✅ Suggested addition to
sort_column_cache_supports_fixed_bytes_sortor as a new test+ #[test] + fn sort_column_cache_supports_fixed_bytes_sort_descending() { + let trace_a = [b'a', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]; + let trace_b = [b'b', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]; + + // Row 0 = trace_b, row 1 = trace_a; descending → b > a, so row 0 sorts first. + let batch = fixed_bytes_batch(&[Some(trace_b), Some(trace_a)]); + let descriptor = fixed_bytes_descriptor(true, false); // descending=true + let cache = SortColumnCache::try_new(&batch, &descriptor, "fixed-bytes-desc").expect("cache"); + + // In descending order, trace_b (row 0) ranks before trace_a (row 1). + assert_eq!(cache.compare_indices(0, 1), Ordering::Less); + assert_eq!(cache.compare_indices(1, 0), Ordering::Greater); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-ingest/src/wal/columns.rs` around lines 534 - 552, The test sort_column_cache_supports_fixed_bytes_sort only covers ascending order; add a descending-case (either extend the test or a new one) by creating a descriptor with descending=true via fixed_bytes_descriptor(true, true), build a SortColumnCache with SortColumnCache::try_new using the same fixed_bytes_batch input, and assert that compare_indices yields reversed Ordering results (e.g., row 0 vs 1 is Less instead of Greater) and that stable tie-breaking still behaves correctly for equal rows; reference the functions fixed_bytes_batch, fixed_bytes_descriptor, SortColumnCache::try_new, and compare_indices to locate and implement the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/icegate-ingest/src/shift/config.rs`:
- Line 210: ShiftWriteConfig::row_group_size default (20_000) is out of sync
with QueueCommonConfig::max_row_group_size (16_384); update the default for
QueueCommonConfig::max_row_group_size to 20_000 so both defaults match. Locate
the QueueCommonConfig struct and change its max_row_group_size default value to
20_000, and run/build tests to ensure no other code assumes the old limit.
In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs`:
- Around line 852-858: The test matching_tenant_trace_id_filter_returns_rows
uses a weak assertion assert!(total > 0) which should be tightened to assert the
exact expected row count; update the assertion in that test (which calls
fixture_session(), make_query_ctx("t1") and run_and_count(...)) to
assert_eq!(total, 1, "expected exactly one span for trace_id
pad_id::<16>(\"t1-trace-1\") in tenant t1") so it fails if extra rows (e.g.,
from tenant leakage) are returned.
---
Nitpick comments:
In `@crates/icegate-common/src/SCHEMA.md`:
- Around line 497-499: The Table of Contents in SCHEMA.md is missing an entry
for the new "Migration Guide" section; update the ToC near the top to include a
"Migration Guide" link that points to the "Migration Guide" heading (which
contains the "Trace and Span ID storage change (GH-135)" subsection) so readers
can navigate to that new section from the ToC. Ensure the ToC entry uses the
same heading text and markdown link style as other entries in the file.
In `@crates/icegate-ingest/src/wal/columns.rs`:
- Around line 534-552: The test sort_column_cache_supports_fixed_bytes_sort only
covers ascending order; add a descending-case (either extend the test or a new
one) by creating a descriptor with descending=true via
fixed_bytes_descriptor(true, true), build a SortColumnCache with
SortColumnCache::try_new using the same fixed_bytes_batch input, and assert that
compare_indices yields reversed Ordering results (e.g., row 0 vs 1 is Less
instead of Greater) and that stable tie-breaking still behaves correctly for
equal rows; reference the functions fixed_bytes_batch, fixed_bytes_descriptor,
SortColumnCache::try_new, and compare_indices to locate and implement the
change.
In `@crates/icegate-query/benches/common/harness.rs`:
- Around line 245-259: Extract a small helper (e.g., fn
build_fixed_binary_array(hex: &str, count: usize, byte_width: usize) ->
ArrayRef) to replace the repeated decode→build→append→finish logic found around
the trace/span builder blocks (uses FixedSizeBinaryBuilder currently constructed
via new in those writers); inside the helper decode the hex once, create the
builder with FixedSizeBinaryBuilder::with_capacity(count, byte_width), append
the decoded bytes count times, finish and return Arc::new(builder.finish()),
then call this helper for both trace_id and span_id in each writer to remove
duplication and switch to with_capacity.
In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs`:
- Around line 566-570: Replace the manual 16-byte padding logic used to build
test trace IDs with the shared helper pad_id to avoid duplicated semantics:
locate the test-local constructions (e.g., the trace_a creation in
child_only_filter_returns_full_trace_so_root_is_present and the similar blocks
in limit_caps_traces_not_span_rows and spss_one_keeps_root_over_child) and call
pad_id(b"trace-A") (or the appropriate raw id bytes) instead of creating a local
padded array and copy_from_slice; ensure the return type matches (Vec<u8> or
&[u8]) expected by the tests and update the five occurrences accordingly so all
tests use pad_id consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cfe6f8ca-ed2a-4092-be07-922d2a6bfac8
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
crates/icegate-common/Cargo.tomlcrates/icegate-common/src/SCHEMA.mdcrates/icegate-common/src/parquet_encoding.rscrates/icegate-ingest/src/shift/config.rscrates/icegate-ingest/src/wal/boundary.rscrates/icegate-ingest/src/wal/columns.rscrates/icegate-query/benches/common/harness.rscrates/icegate-query/src/engine/metadata_scan/parquet_reader.rscrates/icegate-query/src/logql/datafusion/planner.rscrates/icegate-query/src/loki/predicate.rscrates/icegate-query/src/tempo/formatters.rscrates/icegate-query/src/traceql/datafusion/planner_tests.rscrates/icegate-query/tests/tempo/harness.rs
✅ Files skipped from review due to trivial changes (1)
- crates/icegate-common/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (4)
- crates/icegate-ingest/src/wal/boundary.rs
- crates/icegate-query/tests/tempo/harness.rs
- crates/icegate-common/src/parquet_encoding.rs
- crates/icegate-query/src/tempo/formatters.rs
…eamline `trace_id` handling, enhance tests, and update `max_row_group_size` config.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/kustomize/base/cadvisor/servicemonitor.yaml`:
- Around line 105-119: The fallback relabel rules that map cgroup `id`→`uid` are
unconditional and can overwrite an existing pod UID; change each of the three
rules so they only run when the real pod UID is absent/empty (the label from
container_label_io_kubernetes_pod_uid). Concretely, add
container_label_io_kubernetes_pod_uid as the first sourceLabel in each rule
(e.g. sourceLabels: [container_label_io_kubernetes_pod_uid, id]) and adjust the
regex to require the first capture to be empty before matching the cgroup path
(so the replacement only triggers when container_label_io_kubernetes_pod_uid is
empty), keeping the same targetLabel: uid and replacement for the second
capture.
In `@crates/icegate-common/src/SCHEMA.md`:
- Around line 539-552: Update the root-span sentinel example in SCHEMA.md so the
hex literal is portable across SQL dialects: replace the underscored form
`X'00000000_00000000'` with a plain 16-hex-digit literal `X'0000000000000000'`
in the sentence that suggests checking `parent_span_id IS NULL OR parent_span_id
= ...` (referencing the `parent_span_id` check in the "Root-span sentinel"
section) and ensure the surrounding wording still indicates this represents 8
zero bytes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fc8cf97c-0825-432b-bfd1-bf1e01537cb6
📒 Files selected for processing (12)
config/kustomize/base/cadvisor/daemonset.yamlconfig/kustomize/base/cadvisor/kustomization.yamlconfig/kustomize/base/cadvisor/service.yamlconfig/kustomize/base/cadvisor/serviceaccount.yamlconfig/kustomize/base/cadvisor/servicemonitor.yamlconfig/kustomize/base/kustomization.yamlconfig/kustomize/base/values-prometheus.yamlcrates/icegate-common/src/SCHEMA.mdcrates/icegate-ingest/src/wal/columns.rscrates/icegate-query/benches/common/harness.rscrates/icegate-query/src/traceql/datafusion/planner_tests.rscrates/icegate-queue/src/config.rs
✅ Files skipped from review due to trivial changes (4)
- config/kustomize/base/cadvisor/serviceaccount.yaml
- crates/icegate-queue/src/config.rs
- crates/icegate-query/src/traceql/datafusion/planner_tests.rs
- crates/icegate-ingest/src/wal/columns.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/icegate-query/benches/common/harness.rs
…odule, consolidate `build_writer_properties`, and streamline column encoding definitions.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/icegate-common/src/parquet_encoding.rs (1)
102-112: 💤 Low valueNested
links[].trace_idandlinks[].span_idwon't receive theDELTA_BYTE_ARRAYoverride.
build_writer_propertiescallsColumnPath::from(col_name)which creates a single-segment path (e.g.["trace_id"]). This matches the root-leveltrace_id(Iceberg field 4) correctly. However thelinkslist struct contains a secondtrace_id(field ID 33) andspan_id(field ID 34) whose physical Parquet paths are multi-segment (e.g.links.list.element.trace_id), so the override is silently skipped for those columns and they fall back to plain+RLE_DICTIONARY. Since span links are uncommon and typically low-volume this is a minor optimization gap, not a correctness issue.If closing the gap becomes worthwhile, the nested paths can be added to
SPANS_COLUMN_ENCODINGSusing the appropriate dotted segment form that Iceberg emits for list-of-struct fields.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/icegate-common/src/parquet_encoding.rs` around lines 102 - 112, SPANS_COLUMN_ENCODINGS currently only lists root-level columns so build_writer_properties (which uses ColumnPath::from) won't match nested list-of-struct fields; add entries for the nested links fields using the dotted multi-segment form Iceberg emits (e.g. "links.list.element.trace_id" and "links.list.element.span_id") to SPANS_COLUMN_ENCODINGS and assign them Encoding::DELTA_BYTE_ARRAY so those nested columns receive the intended override; keep the existing root-level entries intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/icegate-common/src/parquet_encoding.rs`:
- Around line 102-112: SPANS_COLUMN_ENCODINGS currently only lists root-level
columns so build_writer_properties (which uses ColumnPath::from) won't match
nested list-of-struct fields; add entries for the nested links fields using the
dotted multi-segment form Iceberg emits (e.g. "links.list.element.trace_id" and
"links.list.element.span_id") to SPANS_COLUMN_ENCODINGS and assign them
Encoding::DELTA_BYTE_ARRAY so those nested columns receive the intended
override; keep the existing root-level entries intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8ca1e05f-fb32-4b46-8449-85fdcafa5ab8
📒 Files selected for processing (11)
crates/icegate-common/src/lib.rscrates/icegate-common/src/parquet_encoding.rscrates/icegate-common/src/parquet_writer.rscrates/icegate-common/src/schema.rscrates/icegate-ingest/src/shift/iceberg_storage.rscrates/icegate-ingest/src/shift/mod.rscrates/icegate-ingest/src/transform.rscrates/icegate-query/src/engine/metadata_scan/parquet_reader.rscrates/icegate-query/tests/tempo/harness.rscrates/icegate-query/tests/tempo/tags.rscrates/icegate-queue/src/writer.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- crates/icegate-queue/src/writer.rs
- crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs
- crates/icegate-query/tests/tempo/tags.rs
- crates/icegate-query/tests/tempo/harness.rs
- crates/icegate-ingest/src/transform.rs
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'IceGate Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.
| Benchmark suite | Current: 2da0853 | Previous: 1972421 | Ratio |
|---|---|---|---|
write_performance/large_batches |
2431235279 ns/iter (± 32966219) |
501499979 ns/iter (± 211080) |
4.85 |
This comment was automatically generated by workflow using github-action-benchmark.
…rking adjustment
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/icegate-queue/benches/queue_s3_bench.rs (1)
95-100:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the stale comment to match the new batch size.
The comment on line 95 still says
10,000 recordsbut the actual argument on line 100 is now21_000.✏️ Proposed fix
- // Benchmark 2: Large batches - 10 batches x 10,000 records + // Benchmark 2: Large batches - 10 batches x 21,000 records🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/icegate-queue/benches/queue_s3_bench.rs` around lines 95 - 100, Update the stale comment for the "large_batches" benchmark to reflect the actual batch size used: change the text that says "10,000 records" to "21,000 records" (or otherwise indicate "10 batches x 21,000 records") so the comment matches the call to generate_batches_for_throughput(10, 21_000) inside group.bench_function("large_batches") and the surrounding benchmark setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/icegate-queue/benches/queue_s3_bench.rs`:
- Around line 95-100: Update the stale comment for the "large_batches" benchmark
to reflect the actual batch size used: change the text that says "10,000
records" to "21,000 records" (or otherwise indicate "10 batches x 21,000
records") so the comment matches the call to generate_batches_for_throughput(10,
21_000) inside group.bench_function("large_batches") and the surrounding
benchmark setup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f9a9c896-e50d-4462-b4c4-26c990f63e4b
📒 Files selected for processing (14)
config/helm/icegate/templates/configmap-ingest.yamlconfig/helm/icegate/values.yamlcrates/icegate-ingest/src/cli/commands/run.rscrates/icegate-ingest/src/shift/config.rscrates/icegate-ingest/src/shift/iceberg_storage.rscrates/icegate-ingest/src/shift/mod.rscrates/icegate-ingest/src/shift/shift_runner.rscrates/icegate-ingest/src/wal/boundary.rscrates/icegate-ingest/src/wal/columns.rscrates/icegate-ingest/src/wal/sorter.rscrates/icegate-ingest/src/wal/test_utils.rscrates/icegate-queue/benches/queue_s3_bench.rscrates/icegate-queue/src/config.rscrates/icegate-queue/src/writer.rs
✅ Files skipped from review due to trivial changes (1)
- crates/icegate-ingest/src/shift/shift_runner.rs
🚧 Files skipped from review as they are similar to previous changes (10)
- crates/icegate-queue/src/config.rs
- config/helm/icegate/templates/configmap-ingest.yaml
- config/helm/icegate/values.yaml
- crates/icegate-ingest/src/wal/sorter.rs
- crates/icegate-ingest/src/shift/mod.rs
- crates/icegate-ingest/src/cli/commands/run.rs
- crates/icegate-ingest/src/wal/test_utils.rs
- crates/icegate-queue/src/writer.rs
- crates/icegate-ingest/src/shift/config.rs
- crates/icegate-ingest/src/shift/iceberg_storage.rs
…e_s3_bench` for performance tuning
…n for consistency
Summary by CodeRabbit
New Features
Configuration
Schema Changes