GH-119: Tempo endpoint with TraceQL query#123
Conversation
Implements /api/search/tags (v1), /api/v2/search/tags, and
/api/search/tag/{name}/values on top of the TraceQL-search-capable
Tempo module from 562ad71.
Generalises the former log_metadata_scan module into a table-agnostic
metadata_scan driven by MetadataScanConfig so logs, spans, and future
metrics share the same parquet-dictionary-direct label discovery path.
- New MetadataScanConfig captures indexed columns, aliases, excluded map
keys, and the map-column name per table; scan_labels / scan_label_values
take this config plus an optional extra iceberg Predicate.
- LogQL selector translation moved from the shared module to loki/predicate.rs.
- tempo/metadata.rs scans the post-split resource_attributes and
span_attributes maps separately, mapping each into its TraceQL scope;
tag-value lookup routes by dotted prefix (resource.X / span.X).
- Tempo TraceQL intrinsic tags (name, status, kind, duration, ...) and
span column constant lists added to icegate-common/schema.rs.
Includes 7 Parquet-backed integration tests for tempo tag endpoints.
All 367 lib tests, 46 loki integration tests, and 7 tempo integration
tests pass. Clippy -D warnings clean, fmt clean.
…lti-root traces, and enforce per-trace span caps (`spss`)
Integrates handling for `events` and `links` in span formatting for both OTLP JSON and protobuf outputs. Includes utilities for decoding, encoding, and clamping invalid values. Adjusts tests to verify proper propagation of `droppedEventsCount` and `droppedLinksCount`.
…les (`logs`, `spans`) and introduce allowlist-based dispatch.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughAdds TraceQL (parser, AST, planner, DataFusion execution), implements Tempo HTTP APIs (trace lookup, search, tag discovery), refactors Parquet metadata label discovery, splits span attributes into resource/span maps across ingest/transform/storage, injects OTEL trace context into logs/tracing output, and enables Parquet column bloom filters for trace/span lookup. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant TempoHandler as Tempo API
participant Parser as TraceQL Parser
participant Planner as DataFusion Planner
participant Engine as Query Engine
participant Iceberg as Iceberg/Parquet
participant Formatter as OTLP Formatter
Client->>TempoHandler: POST /api/search (TraceQL)
TempoHandler->>Parser: parse(query)
Parser-->>TempoHandler: TraceQLExpr
TempoHandler->>Planner: plan(expr, QueryContext)
Planner->>Engine: scan spans (tenant/time + selector)
Engine->>Iceberg: read filtered RecordBatches
Iceberg-->>Engine: RecordBatch[]
Engine-->>Planner: batches
Planner->>Planner: aggregate by trace_id (limit)
Planner->>Engine: fetch traces by trace_id IN (...)
Engine->>Iceberg: read full traces
Iceberg-->>Engine: RecordBatch[]
Planner->>Formatter: spansets_to_search_response(batches)
Formatter-->>TempoHandler: SearchResponse
TempoHandler-->>Client: 200 + JSON/Proto
sequenceDiagram
participant Client
participant TempoHandler as Tempo API
participant MetadataScan as metadata_scan
participant Parquet as Parquet reader
participant Catalog as Iceberg catalog
Client->>TempoHandler: GET /api/search/tags
TempoHandler->>MetadataScan: scan_labels(table, window, extra_pred)
MetadataScan->>Catalog: plan files (tenant/time predicate)
MetadataScan->>Parquet: read row-group stats (null_count)
Parquet-->>MetadataScan: indexed-column presence
MetadataScan->>Parquet: read dictionary pages for map.key
Parquet-->>MetadataScan: distinct map keys
MetadataScan->>TempoHandler: merged tag set
TempoHandler-->>Client: 200 + TagsV2Response
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 19
🧹 Nitpick comments (18)
crates/icegate-query/src/traceql/common.rs (2)
255-259: Optional: cover allKindValuevariants in the OTLP-code test.The current test only asserts
ServerandClient. Since the encoding is hand-maintained, an off-by-one onUnspecified,Internal,Producer, orConsumerwould slip through. Adding the remaining four assertions is a one-line each safety net (or moot if you adopt#[repr(i32)]).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/common.rs` around lines 255 - 259, The test kind_otlp_code_matches_otlp_spec currently only checks Server and Client; extend it to assert all KindValue variants map to their OTLP integer codes: assert KindValue::Unspecified.otlp_code() == 0, KindValue::Internal.otlp_code() == 1, KindValue::Server.otlp_code() == 2, KindValue::Client.otlp_code() == 3, KindValue::Producer.otlp_code() == 4, and KindValue::Consumer.otlp_code() == 5 so any off-by-one or missing mapping is caught (update the test function kind_otlp_code_matches_otlp_spec to include these four additional assertions).
166-220: Optional: derive numeric encodings via#[repr(i32)]to keep schema and enum in lockstep.
StatusValue::otlp_codeandKindValue::otlp_codehand-maintain numeric mappings that are also implicit in the OTLP enum ordering. Annotating each enum with#[repr(i32)]and replacing the match with*self as i32would remove the duplication and make schema drift impossible.♻️ Proposed refactor
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +#[repr(i32)] pub enum StatusValue { - /// `ok` - Ok, - /// `error` - Error, - /// `unset` - Unset, + /// `unset` + Unset = 0, + /// `ok` + Ok = 1, + /// `error` + Error = 2, } impl StatusValue { #[must_use] - pub const fn otlp_code(&self) -> i32 { - match self { - Self::Unset => 0, - Self::Ok => 1, - Self::Error => 2, - } - } + pub const fn otlp_code(self) -> i32 { self as i32 } }(Same shape for
KindValue.)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/common.rs` around lines 166 - 220, StatusValue::otlp_code and KindValue::otlp_code manually duplicate numeric mappings; add #[repr(i32)] to the StatusValue and KindValue enum definitions and replace the match bodies in their otlp_code methods with a simple cast (return *self as i32) so the numeric encoding is derived from the enum layout; update the otlp_code functions (still const fn) to perform the cast and remove the match arms for StatusValue::otlp_code and KindValue::otlp_code.crates/icegate-query/src/traceql/datafusion/mod.rs (1)
11-16: Optional: drop redundant prefixes onpub(crate)re-exports.The
pipeline_/selectors_prefixes duplicate information already implicit in the module path. Since these arepub(crate), callers can disambiguate via the import path (e.g.,use crate::traceql::datafusion::{pipeline::field_ref_to_group_key, selectors::intrinsic_column}) or via local aliases at the import site. The re-export aliases here just spread the naming convention across the crate.If you want to keep a single grouped re-export, consider:
♻️ Suggested simplification
-// Re-exports consumed by the metrics planner (and any downstream callers -// that need a uniform field-ref → group-key conversion). -pub(crate) use pipeline::field_ref_to_group_key as pipeline_field_ref_to_group_key; pub use planner::DataFusionPlanner; -pub(crate) use selectors::{ - intrinsic_column as selectors_intrinsic_column, resource_attribute_lhs as selectors_resource_attribute_lhs, - span_attribute_lhs as selectors_span_attribute_lhs, -}; +pub(crate) use pipeline::field_ref_to_group_key; +pub(crate) use selectors::{intrinsic_column, resource_attribute_lhs, span_attribute_lhs};Non-blocking — the current naming is consistent and readable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/datafusion/mod.rs` around lines 11 - 16, The current pub(crate) re-exports add redundant prefixes (pipeline_ and selectors_) that duplicate module path info; update the re-exports in mod.rs to use the original symbol names instead of aliased ones so callers can rely on the module path for disambiguation — e.g., stop exporting pipeline::field_ref_to_group_key as pipeline_field_ref_to_group_key and selectors::intrinsic_column/resource_attribute_lhs/span_attribute_lhs as selectors_intrinsic_column/selectors_resource_attribute_lhs/selectors_span_attribute_lhs, and instead re-export them directly (keep DataFusionPlanner export as-is).crates/icegate-query/src/loki/mod.rs (1)
43-48: Optional: avoid driftingexcluded_map_keysbetween configs.
LOGS_VALUES_METADATA_CONFIGreusesLOGS_METADATA_CONFIG.label_aliases(good) but inlinesexcluded_map_keys: &[]and re-statesmap_column. If the design intent is "values config is the labels config with no exclusions and a wider indexed-column set", consider deriving more of it fromLOGS_METADATA_CONFIGso future edits to aliases/map_columnonly need to change one site:♻️ Optional consolidation
const LOGS_VALUES_METADATA_CONFIG: MetadataScanConfig = MetadataScanConfig { indexed_columns: icegate_common::schema::LOG_INDEXED_ATTRIBUTE_COLUMNS, label_aliases: LOGS_METADATA_CONFIG.label_aliases, excluded_map_keys: &[], - map_column: icegate_common::schema::COL_ATTRIBUTES, + map_column: LOGS_METADATA_CONFIG.map_column, };Non-blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/loki/mod.rs` around lines 43 - 48, LOGS_VALUES_METADATA_CONFIG duplicates parts of LOGS_METADATA_CONFIG (it inlines excluded_map_keys: &[] and re-states map_column), which can drift; change LOGS_VALUES_METADATA_CONFIG to build from LOGS_METADATA_CONFIG by copying its label_aliases and map_column (or cloning/using a struct update pattern) and only override indexed_columns and excluded_map_keys to &[] so future updates to LOGS_METADATA_CONFIG.label_aliases or map_column are reflected automatically; specifically adjust the MetadataScanConfig construction for LOGS_VALUES_METADATA_CONFIG to derive fields from LOGS_METADATA_CONFIG instead of repeating them.crates/icegate-query/src/traceql/Makefile (1)
3-16: Mark phony targets and userm -ffor idiomatic cleanup.
install,gen, andcleanare all phony — they don't produce a file with their own name. Without a.PHONYdeclaration, they break the day someone happens to create aclean(orgen) file/dir in this directory. Static analysis already flagged this forclean. The|| truepattern incleanis also better expressed withrm -f(idempotent on missing files, and letsmakesee real failures).♻️ Proposed diff
ANTLR4_JAR = ../../../../target/antlr4.jar +.PHONY: install gen clean + install: wget -O $(ANTLR4_JAR) https://github.com/antlr4rust/antlr4/releases/download/v0.5.0/antlr4-4.13.3-SNAPSHOT-complete.jar chmod +x $(ANTLR4_JAR) gen: antlr/TraceQLLexer.g4 antlr/TraceQLParser.g4 java -jar $(ANTLR4_JAR) -Dlanguage=Rust -visitor antlr/TraceQLLexer.g4 antlr/TraceQLParser.g4 sed -i.bak 's/TraceQLParserParserContext/TraceQLParserContext/g' antlr/traceqlparser.rs && rm antlr/traceqlparser.rs.bak # https://github.com/rrevenantt/antlr4rust/pull/53 for f in antlr/traceql*.rs; do sed -i.bak '1s/^/#![allow(clippy::all, clippy::pedantic, clippy::nursery, clippy::unwrap_used)]\n/' $$f && rm $$f.bak; done clean: - rm antlr/traceql*.rs || true - rm antlr/*.tokens || true - rm antlr/*.interp || true - rm -rf antlr/gen || true + rm -f antlr/traceql*.rs + rm -f antlr/*.tokens + rm -f antlr/*.interp + rm -rf antlr/gen🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/Makefile` around lines 3 - 16, Make the Makefile targets robust by declaring the phony targets and using idiomatic rm flags: add a .PHONY line listing install, gen, and clean, and replace the "rm ... || true" patterns in the clean target with proper rm flags (e.g., rm -f for files and rm -rf for directories) so failures are surfaced and removals are idempotent; update the clean rule operating on patterns like antlr/traceql*.rs, antlr/*.tokens, antlr/*.interp, and antlr/gen accordingly and keep the existing behavior for non-fatal removals.crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs (1)
326-341: Inconsistent handling of malformed dictionary offsets — stats fallback is skipped.When
dictionary_page_offset()returnsSomebutu64::try_from(...)fails on the offset, the code incrementsno_dict_no_statsandcontinues the outer loop, skipping the stats fallback at lines 343–353. In contrast, whendict_end <= dict_start, control falls through to the stats fallback. This asymmetry is subtle: for a corrupted/negative offset we miss a viable enumeration via row-group statistics.Negative offsets in a valid Parquet file are unlikely, but if you want consistency consider falling through to stats in all "dict path failed" branches:
♻️ Proposed restructure
- // Try dictionary page first. - if let Some(dict_offset) = col_chunk.dictionary_page_offset() { - let data_offset = col_chunk.data_page_offset(); - let Ok(dict_start) = u64::try_from(dict_offset) else { - no_dict_no_stats += 1; - continue; - }; - let Ok(dict_end) = u64::try_from(data_offset) else { - no_dict_no_stats += 1; - continue; - }; - if dict_end > dict_start { - ranges.push(dict_start..dict_end); - selected_rgs.push(rg_idx); - continue; - } - } + // Try dictionary page first; fall through to stats if any + // offset is malformed or the dict region is empty. + let dict_range = col_chunk.dictionary_page_offset().and_then(|d| { + let dict_start = u64::try_from(d).ok()?; + let dict_end = u64::try_from(col_chunk.data_page_offset()).ok()?; + (dict_end > dict_start).then_some(dict_start..dict_end) + }); + if let Some(range) = dict_range { + ranges.push(range); + selected_rgs.push(rg_idx); + continue; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs` around lines 326 - 341, The code in parquet_reader.rs treats failures converting dictionary offsets (calls to u64::try_from on dict_offset or data_offset) by incrementing no_dict_no_stats and continuing, which skips the later row-group stats fallback; instead, when conversion fails for dict_start or dict_end, do not early-continue—increment no_dict_no_stats but allow control to fall through to the existing stats-path (the same path taken when dict_end <= dict_start). Update the block that handles col_chunk.dictionary_page_offset() so failures converting to u64 for dict_start or dict_end do not execute continue; keep updates to no_dict_no_stats and then let the function proceed to the stats fallback (affecting symbols: col_chunk.dictionary_page_offset(), dict_start, dict_end, no_dict_no_stats, ranges, selected_rgs, rg_idx).crates/icegate-query/src/traceql/datafusion/planner_tests.rs (2)
196-210: Window lower bound is hard-coded to2026-01-01.
make_query_ctxwill silently filter out the entire fixture if these tests ever run on a system clock earlier than 2026 (e.g., during a date-shifted CI replay or a developer's misconfigured machine). Consider basing the window onUtc::now() - Duration::days(1) ..= Utc::now() + Duration::days(1)so the tests track wall-clock rather than depending on calendar drift.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs` around lines 196 - 210, The test QueryContext in make_query_ctx currently hard-codes start/end to 2026/2099 which can drop fixtures on systems with different clocks; update make_query_ctx to compute start as Utc::now() - Duration::days(1) and end as Utc::now() + Duration::days(1) (using chrono::Duration) so the QueryContext (tenant_id, start, end, limit, spans_per_spanset, etc.) uses a relative wall-clock window instead of fixed calendar dates; ensure you import or qualify Duration and keep other fields (spans_per_spanset, max_grid_points) unchanged.
270-353: Add regex-operator coverage.The attribute-routing block covers
=thoroughly (resource/span/Any-scope, indexed-column short-circuit, missing-key NULL semantics) but not=~/!~. Adding even one case like{ name =~ "GET.*" }and one attribute regex like{ resource.service.name =~ "fro.*" }would have caught thecast_to(empty DFSchema)issue flagged inselectors.rs::regex_match.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs` around lines 270 - 353, Add tests exercising the regex operators =~ and !~ to cover attribute routing and catch the cast_to(empty DFSchema) bug in selectors.rs::regex_match: create new async tests in planner_tests.rs similar to resource_service_name_quoted_eq_returns_matching_rows that call fixture_session(), make_query_ctx("t1") and run_and_count(ctx, qctx, "{ name =~ \"GET.*\" }") and "{ resource.service.name =~ \"fro.*\" }" (also consider unquoted RHS like "{ name =~ GET.* }"), asserting the expected row counts; ensure you include at least one attribute-scope, one resource-scope, and a negative/!~ case so regex routing/NULL semantics are exercised and will trigger the selector code path that previously hit cast_to(empty DFSchema).crates/icegate-query/tests/tempo/tags.rs (1)
1-3: Optional: extend the module doc comment.The doc lists
/api/search/tags,/api/v2/search/tags,/api/search/tag/{name}/values, but the file also covers/api/v2/search/tag/{name}/values(the typed v2 value endpoint withinspectedBytesas string and keyword-typed status enum). Worth mentioning so a future reader doesn't think v2-values is untested.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/tests/tempo/tags.rs` around lines 1 - 3, Update the module doc comment at the top of the tags.rs test module to also mention the `/api/v2/search/tag/{name}/values` endpoint; explicitly note that the file covers the typed v2 values endpoint (with `inspectedBytes` represented as a string and the `status` enum using keyword-typed variants) so readers know v2-values behavior is tested alongside `/api/search/tags`, `/api/v2/search/tags`, and `/api/search/tag/{name}/values`.crates/icegate-query/src/traceql/antlr/visitors.rs (1)
419-426: Silent truncation if grammar ever yields >2 operands.
pop_two_filtersonly consumesfilters[0]andfilters[1]. ANTLR binary rules typically emit exactly 2 children, so this is safe today, but if the grammar evolves to a left-recursive list (e.g.filter ('AND' filter)+), additional operands would be silently dropped. Considerassert_eq!(filters.len(), 2)in debug builds, or fold over the slice. Not blocking.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/antlr/visitors.rs` around lines 419 - 426, pop_two_filters currently only reads filters[0] and filters[1], silently dropping any extra operands if the grammar ever produces >2 children; add a debug-only check to catch this regression by inserting something like debug_assert_eq!(filters.len(), 2) near the start of pop_two_filters (after the existing len < 2 error check), and keep using visit_span_filter for filters[0] and filters[1]; alternatively, if you want to support n-ary operands, change the logic to visit and fold the entire filters slice into a left-associative combined SpanFilter instead of indexing only [0] and [1].crates/icegate-query/src/engine/metadata_scan/values.rs (1)
105-118: Optional: apply the system-reserved guard symmetrically.
collect_indexed_values_via_dict(line 76) gates onis_system_reserved_value_column, butcollect_indexed_int_values_via_dictdoesn't. In practice the only reserved column istenant_idwhich is a string, so the int path can't currently leak — but the asymmetry weakens the "metadata-scan layer is the last line of defense" property in the doc on lines 30-33. Cheap to add for consistency.♻️ Proposed change
pub async fn collect_indexed_int_values_via_dict( reader: &mut ArrowFileReader, metadata: &ParquetMetaData, predicate: &Predicate, column_name: &str, out: &mut BTreeSet<i32>, ) -> Result<(), MetadataScanError> { + if is_system_reserved_value_column(column_name) { + return Ok(()); + } let schema = metadata.file_metadata().schema_descr();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/engine/metadata_scan/values.rs` around lines 105 - 118, collect_indexed_int_values_via_dict is missing the same system-reserved column guard used by collect_indexed_values_via_dict; add a call to is_system_reserved_value_column(column_name) at the start of collect_indexed_int_values_via_dict and return Ok(()) when it returns true so reserved columns are skipped consistently, then proceed to find leaf_idx and call parquet_reader::read_column_int_dictionaries as before (keep function signature and reader/metadata/predicate/out usage unchanged).crates/icegate-query/src/engine/metadata_scan/mod.rs (1)
283-293: LGTM — small operational note onplan_files.
stream.try_collect()materializes the fullVec<FileScanTask>before any per-file work begins. Combined withMAX_QUERY_WINDOW = 30 daysfromtempo::validation, a worst-case tenant could plan tens-of-thousands of tasks before the firstscan_labels_for_fileruns. For today's deployments this is fine; if metadata scans ever start hitting memory ceilings, switching to a streamedtry_buffered_unordered-style pipeline (no intermediateVec) is the natural next step.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/engine/metadata_scan/mod.rs` around lines 283 - 293, The current plan_files function collects the entire stream into a Vec via stream.try_collect(), which can materialize tens-of-thousands of FileScanTask items and spike memory; change plan_files to return a streaming/iterator-based pipeline instead of a Vec by processing the stream from scan.plan_files() as it yields items (e.g., replace try_collect() usage with a streamed approach using try_buffered_unordered or similar combinators) so callers can handle FileScanTask items incrementally; locate the function plan_files (takes &Table and &Predicate), the call to table.scan().with_filter(...).build() and the variable stream = scan.plan_files(), and modify the downstream usage to avoid creating an intermediate Vec<FileScanTask>.crates/icegate-query/src/tempo/validation.rs (1)
169-173: Redundanttrimmed.eq_ignore_ascii_case("tenant_id")clause.
last_segmentis computed astrimmed.rsplit('.').next().unwrap_or(trimmed). Whentrimmed == "tenant_id"(no dots),rsplit('.').next()returns the entire string"tenant_id", solast_segment.eq_ignore_ascii_case("tenant_id")already matches that case. The trailing|| trimmed.eq_ignore_ascii_case("tenant_id")clause never adds new coverage — safe to drop.♻️ Proposed simplification
- let trimmed = name.trim_start_matches('.'); - let last_segment = trimmed.rsplit('.').next().unwrap_or(trimmed); - if last_segment.eq_ignore_ascii_case("tenant_id") || trimmed.eq_ignore_ascii_case("tenant_id") { + let trimmed = name.trim_start_matches('.'); + let last_segment = trimmed.rsplit('.').next().unwrap_or(trimmed); + if last_segment.eq_ignore_ascii_case("tenant_id") { return Err(QueryError::Validation("tag name 'tenant_id' is reserved".into())); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/tempo/validation.rs` around lines 169 - 173, The check redundantly tests trimmed.eq_ignore_ascii_case("tenant_id") after computing last_segment from trimmed; remove the redundant clause so the reserved-name validation only checks last_segment.eq_ignore_ascii_case("tenant_id"), i.e., update the condition that returns QueryError::Validation("tag name 'tenant_id' is reserved") to rely solely on last_segment (remove the || trimmed.eq_ignore_ascii_case("tenant_id") part) in the function that performs Tempo tag validation (uses trimmed and last_segment).crates/icegate-query/src/tempo/formatters.rs (2)
172-175: Inconsistent overflow handling for timestamp → nanos conversion.These three call sites compute nanos via raw
* 1_000multiplication oni64:
- Line 172:
(start_micros * 1_000).to_string()- Line 174:
(e * 1_000).to_string()- Line 519:
(start_micros * 1_000).to_string()But the matched-span builder (lines 562-563) uses
saturating_mul, and the proto path (line 779) goes throughmicros_to_nanos_u64which at least clamps negatives. With realistic timestamps this never triggers, but a pathological/test fixture value would panic in debug builds. Worth normalizing on a single helper.♻️ Suggested approach
Add a small
micros_to_nanos_str(i64) -> Stringhelper that usessaturating_muland route all three sites (plus the matched-span builder) through it. The proto-sidemicros_to_nanos_u64could similarly usesaturating_mulon the final* 1_000to harden againsti64::MAXmicros wrappingu64.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/tempo/formatters.rs` around lines 172 - 175, Introduce a small helper micros_to_nanos_str(i64) -> String that converts microseconds to nanoseconds using saturating_mul(1_000) and returns the stringified result, then replace the direct `(start_micros * 1_000).to_string()` and `(e * 1_000).to_string()` call sites in the span insertion code and the matched-span builder with this helper; also harden the proto-side conversion function micros_to_nanos_u64 to use saturating_mul for the final multiplication/path that converts i64 micros to u64 nanos so all conversion sites share the same overflow-safe behavior.
235-247: Inconsistent error handling in*_optcolumn helpers:string_col_optandi32_col_optpanic on type mismatch, whilets_col_optandlist_col_optgracefully returnNone.
string_col_opt(line 235) callsas_string::<i32>()andi32_col_opt(line 242) callsas_primitive::<Int32Type>(), both of which panic if the column exists but has the wrong Arrow type. By contrast,ts_col_opt(line 256) andlist_col_opt(line 277) useas_any().downcast_ref()and returnNoneon type mismatch.For consistency across the
*_optfamily, refactorstring_col_optandi32_col_optto useas_any().downcast_ref::<StringArray>()andas_any().downcast_ref::<Int32Array>()respectively, mirroring the graceful behavior of their peers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/tempo/formatters.rs` around lines 235 - 247, The helpers string_col_opt and i32_col_opt currently call as_string::<i32>() and as_primitive::<Int32Type>() which panic on type mismatch; change both to the safe downcast pattern used by ts_col_opt/list_col_opt: after finding the column index via batch.schema().column_with_name(name) call batch.column(idx).as_any().downcast_ref::<StringArray>() for string_col_opt and .downcast_ref::<Int32Array>() for i32_col_opt, and return that Option so a wrong Arrow type yields None instead of a panic.crates/icegate-query/src/tempo/models.rs (1)
88-92:#[allow(dead_code)]here is likely unnecessary.
max_stale_valuesis read by Serde during deserialization, so the compiler should not flag it as dead code, even with the repo-widedead_code = "deny"lint. Removing the suppression keeps the deny lint honest — and if the compiler ever does start flagging the field, that's a real signal something has changed. If you want to make the "intentionally ignored" intent explicit, prefer#[serde(rename = "maxStaleValues")] _max_stale_values: ...or drop the field altogether and use#[serde(deny_unknown_fields = false)]semantics.As per coding guidelines: "Configure
dead_code = "deny"lint to disallow unused code in Cargo.toml".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/tempo/models.rs` around lines 88 - 92, Remove the unnecessary #[allow(dead_code)] on the struct field max_stale_values: this field is deserialized by Serde so the dead_code suppression should be dropped to respect the crate-wide deny lint; locate the field definition named max_stale_values (with #[serde(rename = "maxStaleValues", default)]) and delete the #[allow(dead_code)] attribute, or alternatively rename the field to a private unused-prefixed name like _max_stale_values if you intend to signal it is intentionally ignored, or remove the field entirely if not needed.crates/icegate-query/src/tempo/metadata.rs (1)
168-230: PreferOption::is_none_orovermap_or(true, …).The five
scope_filter.map_or(true, |s| s == Scope::X)checks are clearer (and silenceclippy::unnecessary_map_orunder pedantic) asscope_filter.is_none_or(|s| s == Scope::X).♻️ Proposed change
- if scope_filter.map_or(true, |s| s == Scope::Resource) { + if scope_filter.is_none_or(|s| s == Scope::Resource) {(Apply to all five
map_or(true, …)sites inlist_tags_v2.)As per coding guidelines: "Enable clippy pedantic and nursery lints in Cargo.toml for strict code quality checks".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/tempo/metadata.rs` around lines 168 - 230, Replace the five occurrences of scope_filter.map_or(true, |s| s == Scope::X) in list_tags_v2 with the clearer Option::is_none_or form: call scope_filter.is_none_or(|s| s == Scope::Resource/Span/Intrinsic/Event/Link) respectively (affects the checks around Scope::Resource, Scope::Span, Scope::Intrinsic, Scope::Event, Scope::Link in metadata.rs); ensure you import or use the Option::is_none_or method available on Option and preserve existing logic and error handling in the surrounding code (e.g., the blocks that construct ScopeGroup and call metadata_scan::scan_labels).crates/icegate-query/src/traceql/spanset.rs (1)
62-91: Takeselfby value forCopyenum methods.
SpansetOpderivesCopy, so&selfonas_strandis_hierarchymay tripclippy::trivially_copy_pass_by_refunder pedantic lints. NoteScope::as_strintempo/models.rsalready usesselfby value — making this consistent also matches pedantic guidance.♻️ Proposed change
- pub const fn as_str(&self) -> &'static str { + pub const fn as_str(self) -> &'static str { match self { Self::And => "&&", @@ - pub const fn is_hierarchy(&self) -> bool { + pub const fn is_hierarchy(self) -> bool { !matches!(self, Self::And | Self::Or) }As per coding guidelines: "Enable clippy pedantic and nursery lints in Cargo.toml for strict code quality checks".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/spanset.rs` around lines 62 - 91, The methods as_str and is_hierarchy take &self but SpansetOp is a Copy enum, so change their signatures to take self by value (pub const fn as_str(self) -> &'static str and pub const fn is_hierarchy(self) -> bool) to satisfy clippy::trivially_copy_pass_by_ref; update any internal pattern matches to match on self and keep the existing match arms and logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/kustomize/base/values-grafana.yaml`:
- Around line 87-97: Remove the redundant flags filterByTraceID and
filterBySpanID under the tracesToLogsV2 block since customQuery: true makes them
no-ops; update the tracesToLogsV2 section (the keys customQuery, query,
spanStartTimeShift, spanEndTimeShift, datasourceUid) by deleting filterByTraceID
and filterBySpanID so the configuration relies solely on the provided custom
query '{trace_id="$${__span.traceId}"}'.
In `@crates/icegate-ingest/src/transform.rs`:
- Around line 526-538: The code currently appends scope and span attributes
directly to span_attrs_builder using add_flattened_attributes_dotted
(referencing scope_spans.scope and span.attributes), which can produce duplicate
keys and relies on reader-specific last-write-wins semantics; change this to
dedupe per-span by building a small HashMap<String,String> (insert scope
attributes first, then span attributes so span wins) for each span, then iterate
that map to feed span_attrs_builder (and keep the existing append calls to
resource_attrs_builder/span_attrs_builder). Update
add_flattened_attributes_dotted usage accordingly or replace it with a helper
that merges into the per-span HashMap before writing to span_attrs_builder to
guarantee one entry per key and preserve the documented "span shadows scope"
intent.
In `@crates/icegate-query/src/loki/mod.rs`:
- Around line 21-36: The doc comment on LOGS_METADATA_CONFIG is misleading: it
says `/label_values` can resolve excluded_map_keys like `trace_id`/`span_id`
"explicitly via the alias logic", but those keys are actually resolved via the
indexed_columns set used in LOGS_VALUES_METADATA_CONFIG
(LOG_INDEXED_ATTRIBUTE_COLUMNS), not via label_aliases; update the comment to
state that label_aliases only contain the three mappings (level→severity_text,
detected_level→severity_text, service→service_name) and clarify that
`/label_values` resolution for trace_id/span_id is handled by the
indexed_columns configuration referenced in LOGS_VALUES_METADATA_CONFIG rather
than by label_aliases or excluded_map_keys.
In `@crates/icegate-query/src/tempo/executor.rs`:
- Around line 123-142: The parsed TraceQL expression (expr from
AntlrParser::parse) can be a metrics-mode query which DataFusionPlanner::plan
will accept but which spansets_to_search_response cannot format; reject
non-search expressions immediately after parsing and before planning by matching
on TraceQLExpr (allow TraceQLExpr::Spanset and PipelineExpr::Search variants
only) and return a TempoError::new(QueryError::Validation(...)) with a clear
message like "metrics-mode queries are not supported on /api/search; use the
metrics endpoint" so metric RecordBatches never reach DataFusionPlanner::plan or
spansets_to_search_response.
- Around line 62-74: The eager duration validation is dead because
params.min_duration/max_duration are checked for NotImplemented before parsing;
move the validation so you call
params.min_duration.as_deref().and_then(parse_duration_opt) and
params.max_duration.as_deref().and_then(parse_duration_opt) before the
TempoError::new(QueryError::NotImplemented(...)) short-circuit, and if
parse_duration_opt indicates a malformed value return a BadRequest-style
TempoError (400) with a clear message about invalid duration format instead of
proceeding to the 501 path (ensure you reference params.min_duration,
params.max_duration, parse_duration_opt, and the NotImplemented short-circuit
when making the change).
In `@crates/icegate-query/src/tempo/handlers.rs`:
- Around line 52-57: The helper parse_epoch_seconds currently returns
DateTime::UNIX_EPOCH on invalid/ambiguous inputs which lets malformed start/end
values silently become 1970; change parse_epoch_seconds to return a
TempoResult<DateTime<Utc>> (or Result<DateTime<Utc>, TempoError>) that errors on
out-of-range or non-single timestamp (use
Utc.timestamp_opt(...).single().ok_or(...)), update get_trace to call the new
fallible parse and propagate the error to produce an HTTP 400 (similar to
parse_q_to_predicate handling), and ensure validate_query_window only runs after
successful parsing so bad epoch inputs surface as client errors instead of being
clamped.
In `@crates/icegate-query/src/tempo/mod.rs`:
- Around line 6-15: The new pub mod declarations expose many undocumented public
items and violate missing_docs; either add docs to each public symbol or make
the modules private again—specifically either (A) add /// doc comments for all
public types/functions in error.rs (e.g., TempoError), formatters.rs (e.g.,
spansets_to_search_response), models.rs (all ~24 public items), trace_by_id.rs
(e.g., FetchResult, default_window), and confirm executor.rs's execute remains
documented, or (B) revert the module declarations back to private (change pub
mod error/executor/formatters/models/trace_by_id to mod ...) so their public
surfaces are not exported from the crate.
In `@crates/icegate-query/src/traceql/antlr/mod.rs`:
- Line 83: The tracing span currently declares an empty field because of
fields(query); update the instrument attribute on the function (the
#[tracing::instrument(...)] above the method in mod.rs) to capture the parameter
value, e.g. change fields(query) to fields(query = %query) to record the query
string via Display (or use fields(query = ?query) for Debug), keeping skip(self)
as-is.
In `@crates/icegate-query/src/traceql/antlr/TraceQLLexer.g4`:
- Around line 121-122: Lexer rules FLOAT and INT currently include an optional
leading '-' which causes the lexer to consume minus signs (e.g., "5-3" becomes
INT(5) INT(-3)) and will break arithmetic parsing; remove the optional '-' from
the FLOAT and INT lexer rules so minus becomes a separate MINUS token that the
parser can handle (i.e., change FLOAT to DIGIT+ '.' DIGIT+ and INT to DIGIT+),
and ensure the grammar uses the existing MINUS terminal for unary negation where
needed or document the intended behavior if you intentionally want signed
numeric tokens.
- Line 19: The lexer rule NEQ_RE ('!~') currently shadows NOT_SIBLING because
ANTLR picks the first matching token, so update the grammar (TraceQLLexer.g4 /
parser spansetOp) to consistently accept the '!~' operator in spanset contexts
by either renaming the lexeme used for NOT_SIBLING (e.g., to a distinct symbol)
or by adding NEQ_RE into the parser's spansetOp alternatives and disambiguating
at parser level; remove the string-based fallback in visitors.rs that checks
ctx.get_text() and instead rely on the proper token from the parser; finally,
add a parser/integration test exercising spanset not-sibling (for example "{
kind = server } !~ { kind = client }") to prevent regressions.
In `@crates/icegate-query/src/traceql/antlr/TraceQLParser.g4`:
- Around line 57-66: The current spansetExpr rule treats all operators equally;
split it into precedence-tiered rules so relationship operators bind tighter
than boolean ones: introduce (for example) spansetExpr -> spansetOr ->
spansetAnd -> spansetRel -> spansetPrimary, move boolean tokens (AND, OR) into
the lower-precedence rules (spansetOr/spansetAnd) and move relationship tokens
(DESC, GT, ANC, LT, SIBLING and their NOT_* variants) into a new spansetRelOp
used by spansetRel; update/remove the combined spansetOp rule and ensure
spansetParen and spanSelector remain as the primary/leaf cases (mirroring the
spanFilter precedence pattern).
In `@crates/icegate-query/src/traceql/datafusion/metrics.rs`:
- Around line 112-120: time_bucket_expr currently computes step_micros = step_ns
/ 1_000 which can be zero for sub-microsecond steps and produce a
division-by-zero expression; change time_bucket_expr to validate or clamp the
computed step_micros to a minimum of 1 (or return a clear Err) before
constructing the bucket expression (referencing function time_bucket_expr,
COL_TIMESTAMP and BUCKET_ALIAS), and add a runtime assertion or schema check
that the incoming timestamp column is Timestamp(Microsecond, _) so the
cast_to(&DataType::Int64) produces correct bucket boundaries; ensure the error
message clearly references the invalid step value if you choose to return an
error.
In `@crates/icegate-query/src/traceql/datafusion/pipeline.rs`:
- Around line 135-138: In field_arg_expr, replace the
not_implemented("aggregation requires argument") error with the crate's
validation/invalid-input error variant so missing aggregation args return a 400
client error; specifically, change the Err returned in field_arg_expr (the
function handling Option<&FieldRef>) to something like
QueryError::Validation(...) or the appropriate Validation/InvalidInput
constructor used in this crate, preserving the existing message text
("aggregation requires argument") so the HTTP layer maps it to a client error
instead of 501.
- Around line 84-94: The grouping function group_key_for_attribute currently
returns lit(NULL) for Scope::Event and Scope::Link which collapses all rows;
change group_key_for_attribute to return Result<Expr, QueryError> and on
Scope::Event or Scope::Link return Err(QueryError::NotImplemented("grouping by
event/link attributes not implemented")); update all callers (notably
field_ref_to_group_key and field_arg_expr) to propagate the Result (use ? or
map/and_then) so plan-time errors are surfaced instead of emitting a NULL
literal, and adjust any downstream code to handle the propagated QueryError.
In `@crates/icegate-query/src/traceql/datafusion/selectors.rs`:
- Around line 222-227: The match arm handling
(FieldRef::Intrinsic(IntrinsicField::Duration), LiteralValue::Duration(ns))
currently does integer-truncate with *ns / 1_000 and silently turns
sub-microsecond thresholds to 0; change it to preserve intent by rounding up
instead (e.g., compute micro = (ns + 999) / 1_000) or alternatively return a
parse/validation error for sub-microsecond literals — update the arm in
selectors.rs that matches FieldRef::Intrinsic(IntrinsicField::Duration) /
LiteralValue::Duration to use the chosen behavior and adjust the surrounding
comment to explain rounding-up or the rejection policy.
- Around line 255-266: The regex_match helper currently casts both operands
using df_dummy_schema which triggers Expr::get_type against an empty DFSchema
and breaks plan-time column refs; remove the unnecessary cast_to calls in
regex_match (stop using df_dummy_schema), instead pass the raw lhs and rhs into
regexp_like().call(vec![lhs, rhs]) and keep the existing negation handling
(m.not() if negated), and delete or stop referencing df_dummy_schema; also add
planner-level integration tests exercising expressions like `{ name =~ "GET.*"
}` and `{ resource.service.name =~ ".*end" }` to ensure end-to-end
planning/coercion works.
In `@crates/icegate-query/src/traceql/iceberg_predicate.rs`:
- Around line 290-307: The catch-all string arm in datum_for lets string
literals fall through onto INT columns (Status/Kind); add an explicit match arm
for (FieldRef::Intrinsic(IntrinsicField::Status) |
FieldRef::Intrinsic(IntrinsicField::Kind), LiteralValue::String(_)) that returns
None (i.e., drops the predicate) so string RHS against those INT enum columns
does not produce a string Datum; update datum_for accordingly and add the
suggested regression test (e.g., using translate(...) and asserting
Predicate::AlwaysTrue) to lock the behavior.
In `@crates/icegate-query/src/traceql/metric.rs`:
- Around line 91-114: AggregationOp::Quantile currently maps to
"quantile_over_time" in as_str() and the docs expect two args (a field and
percentile), but PipelineStage::Aggregate / AggregateFilter only provide arg:
Option<FieldRef> so the percentile is lost; fix by either removing the Quantile
variant from AggregationOp (and from as_str()) if you want to defer support, or
extend the aggregate AST to carry a second optional literal argument (e.g.,
change AggregateFilter / the Aggregate payload to include arg: Option<FieldRef>
plus percentile: Option<Literal> or similar) and update
AggregationOp::requires_arg/as_str handling and any planner/visitor code that
constructs/reads Aggregate to consume the new percentile field.
---
Nitpick comments:
In `@crates/icegate-query/src/engine/metadata_scan/mod.rs`:
- Around line 283-293: The current plan_files function collects the entire
stream into a Vec via stream.try_collect(), which can materialize
tens-of-thousands of FileScanTask items and spike memory; change plan_files to
return a streaming/iterator-based pipeline instead of a Vec by processing the
stream from scan.plan_files() as it yields items (e.g., replace try_collect()
usage with a streamed approach using try_buffered_unordered or similar
combinators) so callers can handle FileScanTask items incrementally; locate the
function plan_files (takes &Table and &Predicate), the call to
table.scan().with_filter(...).build() and the variable stream =
scan.plan_files(), and modify the downstream usage to avoid creating an
intermediate Vec<FileScanTask>.
In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 326-341: The code in parquet_reader.rs treats failures converting
dictionary offsets (calls to u64::try_from on dict_offset or data_offset) by
incrementing no_dict_no_stats and continuing, which skips the later row-group
stats fallback; instead, when conversion fails for dict_start or dict_end, do
not early-continue—increment no_dict_no_stats but allow control to fall through
to the existing stats-path (the same path taken when dict_end <= dict_start).
Update the block that handles col_chunk.dictionary_page_offset() so failures
converting to u64 for dict_start or dict_end do not execute continue; keep
updates to no_dict_no_stats and then let the function proceed to the stats
fallback (affecting symbols: col_chunk.dictionary_page_offset(), dict_start,
dict_end, no_dict_no_stats, ranges, selected_rgs, rg_idx).
In `@crates/icegate-query/src/engine/metadata_scan/values.rs`:
- Around line 105-118: collect_indexed_int_values_via_dict is missing the same
system-reserved column guard used by collect_indexed_values_via_dict; add a call
to is_system_reserved_value_column(column_name) at the start of
collect_indexed_int_values_via_dict and return Ok(()) when it returns true so
reserved columns are skipped consistently, then proceed to find leaf_idx and
call parquet_reader::read_column_int_dictionaries as before (keep function
signature and reader/metadata/predicate/out usage unchanged).
In `@crates/icegate-query/src/loki/mod.rs`:
- Around line 43-48: LOGS_VALUES_METADATA_CONFIG duplicates parts of
LOGS_METADATA_CONFIG (it inlines excluded_map_keys: &[] and re-states
map_column), which can drift; change LOGS_VALUES_METADATA_CONFIG to build from
LOGS_METADATA_CONFIG by copying its label_aliases and map_column (or
cloning/using a struct update pattern) and only override indexed_columns and
excluded_map_keys to &[] so future updates to LOGS_METADATA_CONFIG.label_aliases
or map_column are reflected automatically; specifically adjust the
MetadataScanConfig construction for LOGS_VALUES_METADATA_CONFIG to derive fields
from LOGS_METADATA_CONFIG instead of repeating them.
In `@crates/icegate-query/src/tempo/formatters.rs`:
- Around line 172-175: Introduce a small helper micros_to_nanos_str(i64) ->
String that converts microseconds to nanoseconds using saturating_mul(1_000) and
returns the stringified result, then replace the direct `(start_micros *
1_000).to_string()` and `(e * 1_000).to_string()` call sites in the span
insertion code and the matched-span builder with this helper; also harden the
proto-side conversion function micros_to_nanos_u64 to use saturating_mul for the
final multiplication/path that converts i64 micros to u64 nanos so all
conversion sites share the same overflow-safe behavior.
- Around line 235-247: The helpers string_col_opt and i32_col_opt currently call
as_string::<i32>() and as_primitive::<Int32Type>() which panic on type mismatch;
change both to the safe downcast pattern used by ts_col_opt/list_col_opt: after
finding the column index via batch.schema().column_with_name(name) call
batch.column(idx).as_any().downcast_ref::<StringArray>() for string_col_opt and
.downcast_ref::<Int32Array>() for i32_col_opt, and return that Option so a wrong
Arrow type yields None instead of a panic.
In `@crates/icegate-query/src/tempo/metadata.rs`:
- Around line 168-230: Replace the five occurrences of scope_filter.map_or(true,
|s| s == Scope::X) in list_tags_v2 with the clearer Option::is_none_or form:
call scope_filter.is_none_or(|s| s == Scope::Resource/Span/Intrinsic/Event/Link)
respectively (affects the checks around Scope::Resource, Scope::Span,
Scope::Intrinsic, Scope::Event, Scope::Link in metadata.rs); ensure you import
or use the Option::is_none_or method available on Option and preserve existing
logic and error handling in the surrounding code (e.g., the blocks that
construct ScopeGroup and call metadata_scan::scan_labels).
In `@crates/icegate-query/src/tempo/models.rs`:
- Around line 88-92: Remove the unnecessary #[allow(dead_code)] on the struct
field max_stale_values: this field is deserialized by Serde so the dead_code
suppression should be dropped to respect the crate-wide deny lint; locate the
field definition named max_stale_values (with #[serde(rename = "maxStaleValues",
default)]) and delete the #[allow(dead_code)] attribute, or alternatively rename
the field to a private unused-prefixed name like _max_stale_values if you intend
to signal it is intentionally ignored, or remove the field entirely if not
needed.
In `@crates/icegate-query/src/tempo/validation.rs`:
- Around line 169-173: The check redundantly tests
trimmed.eq_ignore_ascii_case("tenant_id") after computing last_segment from
trimmed; remove the redundant clause so the reserved-name validation only checks
last_segment.eq_ignore_ascii_case("tenant_id"), i.e., update the condition that
returns QueryError::Validation("tag name 'tenant_id' is reserved") to rely
solely on last_segment (remove the || trimmed.eq_ignore_ascii_case("tenant_id")
part) in the function that performs Tempo tag validation (uses trimmed and
last_segment).
In `@crates/icegate-query/src/traceql/antlr/visitors.rs`:
- Around line 419-426: pop_two_filters currently only reads filters[0] and
filters[1], silently dropping any extra operands if the grammar ever produces >2
children; add a debug-only check to catch this regression by inserting something
like debug_assert_eq!(filters.len(), 2) near the start of pop_two_filters (after
the existing len < 2 error check), and keep using visit_span_filter for
filters[0] and filters[1]; alternatively, if you want to support n-ary operands,
change the logic to visit and fold the entire filters slice into a
left-associative combined SpanFilter instead of indexing only [0] and [1].
In `@crates/icegate-query/src/traceql/common.rs`:
- Around line 255-259: The test kind_otlp_code_matches_otlp_spec currently only
checks Server and Client; extend it to assert all KindValue variants map to
their OTLP integer codes: assert KindValue::Unspecified.otlp_code() == 0,
KindValue::Internal.otlp_code() == 1, KindValue::Server.otlp_code() == 2,
KindValue::Client.otlp_code() == 3, KindValue::Producer.otlp_code() == 4, and
KindValue::Consumer.otlp_code() == 5 so any off-by-one or missing mapping is
caught (update the test function kind_otlp_code_matches_otlp_spec to include
these four additional assertions).
- Around line 166-220: StatusValue::otlp_code and KindValue::otlp_code manually
duplicate numeric mappings; add #[repr(i32)] to the StatusValue and KindValue
enum definitions and replace the match bodies in their otlp_code methods with a
simple cast (return *self as i32) so the numeric encoding is derived from the
enum layout; update the otlp_code functions (still const fn) to perform the cast
and remove the match arms for StatusValue::otlp_code and KindValue::otlp_code.
In `@crates/icegate-query/src/traceql/datafusion/mod.rs`:
- Around line 11-16: The current pub(crate) re-exports add redundant prefixes
(pipeline_ and selectors_) that duplicate module path info; update the
re-exports in mod.rs to use the original symbol names instead of aliased ones so
callers can rely on the module path for disambiguation — e.g., stop exporting
pipeline::field_ref_to_group_key as pipeline_field_ref_to_group_key and
selectors::intrinsic_column/resource_attribute_lhs/span_attribute_lhs as
selectors_intrinsic_column/selectors_resource_attribute_lhs/selectors_span_attribute_lhs,
and instead re-export them directly (keep DataFusionPlanner export as-is).
In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs`:
- Around line 196-210: The test QueryContext in make_query_ctx currently
hard-codes start/end to 2026/2099 which can drop fixtures on systems with
different clocks; update make_query_ctx to compute start as Utc::now() -
Duration::days(1) and end as Utc::now() + Duration::days(1) (using
chrono::Duration) so the QueryContext (tenant_id, start, end, limit,
spans_per_spanset, etc.) uses a relative wall-clock window instead of fixed
calendar dates; ensure you import or qualify Duration and keep other fields
(spans_per_spanset, max_grid_points) unchanged.
- Around line 270-353: Add tests exercising the regex operators =~ and !~ to
cover attribute routing and catch the cast_to(empty DFSchema) bug in
selectors.rs::regex_match: create new async tests in planner_tests.rs similar to
resource_service_name_quoted_eq_returns_matching_rows that call
fixture_session(), make_query_ctx("t1") and run_and_count(ctx, qctx, "{ name =~
\"GET.*\" }") and "{ resource.service.name =~ \"fro.*\" }" (also consider
unquoted RHS like "{ name =~ GET.* }"), asserting the expected row counts;
ensure you include at least one attribute-scope, one resource-scope, and a
negative/!~ case so regex routing/NULL semantics are exercised and will trigger
the selector code path that previously hit cast_to(empty DFSchema).
In `@crates/icegate-query/src/traceql/Makefile`:
- Around line 3-16: Make the Makefile targets robust by declaring the phony
targets and using idiomatic rm flags: add a .PHONY line listing install, gen,
and clean, and replace the "rm ... || true" patterns in the clean target with
proper rm flags (e.g., rm -f for files and rm -rf for directories) so failures
are surfaced and removals are idempotent; update the clean rule operating on
patterns like antlr/traceql*.rs, antlr/*.tokens, antlr/*.interp, and antlr/gen
accordingly and keep the existing behavior for non-fatal removals.
In `@crates/icegate-query/src/traceql/spanset.rs`:
- Around line 62-91: The methods as_str and is_hierarchy take &self but
SpansetOp is a Copy enum, so change their signatures to take self by value (pub
const fn as_str(self) -> &'static str and pub const fn is_hierarchy(self) ->
bool) to satisfy clippy::trivially_copy_pass_by_ref; update any internal pattern
matches to match on self and keep the existing match arms and logic unchanged.
In `@crates/icegate-query/tests/tempo/tags.rs`:
- Around line 1-3: Update the module doc comment at the top of the tags.rs test
module to also mention the `/api/v2/search/tag/{name}/values` endpoint;
explicitly note that the file covers the typed v2 values endpoint (with
`inspectedBytes` represented as a string and the `status` enum using
keyword-typed variants) so readers know v2-values behavior is tested alongside
`/api/search/tags`, `/api/v2/search/tags`, and `/api/search/tag/{name}/values`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f6e91aa1-8181-4a8d-802f-a2923fcd90df
⛔ Files ignored due to path filters (4)
Cargo.lockis excluded by!**/*.lockcrates/icegate-query/src/traceql/antlr/gen/TraceQLLexer.interpis excluded by!**/gen/**crates/icegate-query/src/traceql/antlr/gen/TraceQLLexer.javais excluded by!**/gen/**crates/icegate-query/src/traceql/antlr/gen/TraceQLLexer.tokensis excluded by!**/gen/**
📒 Files selected for processing (82)
Cargo.tomlconfig/docker/grafana/provisioning/datasources/datasources.yamlconfig/docker/otel-collector/config.yamlconfig/kustomize/base/otel-collector/configmap.yamlconfig/kustomize/base/values-grafana.yamlconfig/kustomize/overlays/skaffold/values-icegate.yamlcrates/icegate-common/Cargo.tomlcrates/icegate-common/src/schema.rscrates/icegate-common/src/tracing.rscrates/icegate-ingest/src/cli/commands/run.rscrates/icegate-ingest/src/shift/iceberg_storage.rscrates/icegate-ingest/src/shift/mod.rscrates/icegate-ingest/src/transform.rscrates/icegate-query/Cargo.tomlcrates/icegate-query/src/engine/log_metadata_scan/labels.rscrates/icegate-query/src/engine/log_metadata_scan/mod.rscrates/icegate-query/src/engine/log_metadata_scan/predicate.rscrates/icegate-query/src/engine/log_metadata_scan/values.rscrates/icegate-query/src/engine/metadata_scan/error.rscrates/icegate-query/src/engine/metadata_scan/labels.rscrates/icegate-query/src/engine/metadata_scan/mod.rscrates/icegate-query/src/engine/metadata_scan/parquet_reader.rscrates/icegate-query/src/engine/metadata_scan/predicate.rscrates/icegate-query/src/engine/metadata_scan/values.rscrates/icegate-query/src/engine/mod.rscrates/icegate-query/src/engine/provider/mod.rscrates/icegate-query/src/engine/provider/schema.rscrates/icegate-query/src/engine/provider/table.rscrates/icegate-query/src/lib.rscrates/icegate-query/src/loki/executor.rscrates/icegate-query/src/loki/mod.rscrates/icegate-query/src/loki/predicate.rscrates/icegate-query/src/tempo/error.rscrates/icegate-query/src/tempo/executor.rscrates/icegate-query/src/tempo/formatters.rscrates/icegate-query/src/tempo/handlers.rscrates/icegate-query/src/tempo/metadata.rscrates/icegate-query/src/tempo/mod.rscrates/icegate-query/src/tempo/models.rscrates/icegate-query/src/tempo/routes.rscrates/icegate-query/src/tempo/server.rscrates/icegate-query/src/tempo/trace_by_id.rscrates/icegate-query/src/tempo/validation.rscrates/icegate-query/src/traceql/Makefilecrates/icegate-query/src/traceql/antlr/TraceQLLexer.g4crates/icegate-query/src/traceql/antlr/TraceQLLexer.interpcrates/icegate-query/src/traceql/antlr/TraceQLLexer.tokenscrates/icegate-query/src/traceql/antlr/TraceQLParser.g4crates/icegate-query/src/traceql/antlr/TraceQLParser.interpcrates/icegate-query/src/traceql/antlr/TraceQLParser.tokenscrates/icegate-query/src/traceql/antlr/mod.rscrates/icegate-query/src/traceql/antlr/tests.rscrates/icegate-query/src/traceql/antlr/traceqllexer.rscrates/icegate-query/src/traceql/antlr/traceqlparser.rscrates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rscrates/icegate-query/src/traceql/antlr/traceqlparserbasevisitor.rscrates/icegate-query/src/traceql/antlr/traceqlparserlistener.rscrates/icegate-query/src/traceql/antlr/traceqlparservisitor.rscrates/icegate-query/src/traceql/antlr/visitors.rscrates/icegate-query/src/traceql/common.rscrates/icegate-query/src/traceql/datafusion/metrics.rscrates/icegate-query/src/traceql/datafusion/mod.rscrates/icegate-query/src/traceql/datafusion/pipeline.rscrates/icegate-query/src/traceql/datafusion/planner.rscrates/icegate-query/src/traceql/datafusion/planner_tests.rscrates/icegate-query/src/traceql/datafusion/registry.rscrates/icegate-query/src/traceql/datafusion/selectors.rscrates/icegate-query/src/traceql/duration.rscrates/icegate-query/src/traceql/expr.rscrates/icegate-query/src/traceql/iceberg_predicate.rscrates/icegate-query/src/traceql/metric.rscrates/icegate-query/src/traceql/mod.rscrates/icegate-query/src/traceql/parser.rscrates/icegate-query/src/traceql/planner.rscrates/icegate-query/src/traceql/spanset.rscrates/icegate-query/tests/tempo/echo.rscrates/icegate-query/tests/tempo/harness.rscrates/icegate-query/tests/tempo/mod.rscrates/icegate-query/tests/tempo/tags.rscrates/icegate-query/tests/tempo_tests.rscrates/icegate-queue/src/writer.rsrustfmt.toml
💤 Files with no reviewable changes (4)
- crates/icegate-query/src/engine/log_metadata_scan/labels.rs
- crates/icegate-query/src/engine/log_metadata_scan/values.rs
- crates/icegate-query/src/engine/log_metadata_scan/predicate.rs
- crates/icegate-query/src/engine/log_metadata_scan/mod.rs
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/icegate-ingest/src/transform.rs (1)
249-253:⚠️ Potential issue | 🟡 MinorStale doc references the pre-split merged
attributesmap.The function-level doc still says it produces "the merged attributes map (resource + scope + span attributes plus indexed-column duplicates)", but the implementation now emits two separate columns (
resource_attributesandspan_attributes) and the indexed-column mirror keys have been removed. Please update so future readers don't expect the old shape.📝 Suggested doc update
-/// Produces every column in the spans schema: top-level fields, the merged -/// attributes map (resource + scope + span attributes plus indexed-column -/// duplicates), and the nested `events` / `links` `List<Struct>` arrays. +/// Produces every column in the spans schema: top-level fields, the two +/// attribute maps (`resource_attributes` for resource-level OTLP attrs, and +/// `span_attributes` for the deduplicated merge of scope + span attrs with +/// span winning on collision), and the nested `events` / `links` +/// `List<Struct>` arrays. /// Links with invalid `trace_id` / `span_id` are dropped from the list and /// counted into the parent span's `dropped_links_count`.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-ingest/src/transform.rs` around lines 249 - 253, Update the stale function-level docstring that currently claims a single "merged attributes map" to reflect the current output shape: mention that the implementation emits separate columns named resource_attributes and span_attributes (instead of a merged attributes map) and that the indexed-column mirror keys have been removed; keep the rest of the description about top-level fields, nested events/links arrays, and dropped links being counted in dropped_links_count intact so readers of the function (the doc attached to the transform function in transform.rs) see the accurate schema shape.
🧹 Nitpick comments (10)
crates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rs (1)
12-21: Generated doc-comment templates reference a placeholder rule (#s) for every method.All
enter_*/exit_*doc comments in this file reference\{@linkTraceQLParserBaseParser#s}regardless of the actual grammar rule (e.g.,enter_pipelineexprshould referencepipelineExpr,enter_filterparenshould referenceFilterParen, etc.). This appears to be an artifact of the ANTLR Rust target's template — purely cosmetic, no functional impact, but the docs are not informative as-is.Since this is generated code, the practical fix is either (a) leave as-is and rely on the rule name in the method signature, or (b) post-process the generated file in
make gento substitute rule names. No action required if regeneration is manual and rare.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rs` around lines 12 - 21, The generated doc-comments all reference a placeholder rule `\{`@link` TraceQLParserBaseParser#s}`; fix by post-processing the generated file to substitute the correct rule name for each enter_/exit_ method: for each method like enter_root/exit_root (or enter_pipelineexpr/enter_filterparen), parse the method signature's context type (e.g., RootContext<'input>, PipelineExprContext, FilterParenContext), strip the trailing "Context" and lifetimes, derive the rule identifier, and replace the `#s` placeholder in that method's doc-comment with the derived rule name (formatted to match the parser link style); implement this substitution step in the `make gen` post-processing step so generated files (e.g., traceqlparserbaselistener.rs and methods enter_root, exit_root, enter_pipelineexpr, enter_filterparen) get correct links.crates/icegate-common/src/testing/minio.rs (1)
16-19: Consider sharing the credential constants withcreate_s3_bucket.The doc comments here state these constants are "used for both the container and downstream S3 SDK calls", but per the relevant snippet from
crates/icegate-common/src/testing/s3.rs,create_s3_bucketconstructs itsCredentialswith hardcoded"minioadmin"literals rather than importing these constants. They happen to agree today, but ifMINIO_ROOT_USER/MINIO_ROOT_PASSWORDare ever rotated, bucket creation will silently fail authentication. Either expose these aspub(crate)(orpub) and use them froms3.rs, or tone down the docstring.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-common/src/testing/minio.rs` around lines 16 - 19, The constants MINIO_ROOT_USER and MINIO_ROOT_PASSWORD are declared for MinIO credentials but create_s3_bucket constructs Credentials with hardcoded "minioadmin" literals, which will break if the constants change; modify the code so create_s3_bucket uses the shared constants instead of string literals (export the constants as pub(crate) or pub as appropriate and replace the literal usages in create_s3_bucket with MINIO_ROOT_USER and MINIO_ROOT_PASSWORD to ensure a single source of truth for credentials).crates/icegate-query/src/traceql/metric.rs (1)
50-76: Quantile/percentile invariant is enforced by convention only.The shape
op: AggregationOp, arg: Option<FieldRef>, percentile: Option<f64>is flexible but doesn't prevent two illegal AST states by construction:
op = Quantile, percentile = None(planner must produce a 400)op = Avg, percentile = Some(0.95)(silently ignored per the docstring)A small constructor (or a fallible
try_new) onPipelineStagethat asserts these invariants would let the parser/visitor surface validation errors at AST-construction time and let the planner assume a well-formed AST. Not blocking — just noting since this was the source of the prior round-trip concern.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/metric.rs` around lines 50 - 76, The Aggregate and AggregateFilter variants allow invalid combinations (e.g., AggregationOp::Quantile with percentile None, or non-Quantile ops with Some(percentile)); add a fallible constructor (e.g., PipelineStage::Aggregate::try_new or PipelineStage::try_new that validates Aggregate/AggregateFilter) or implement TryFrom<(AggregationOp, Option<FieldRef>, Option<f64>, ...)> for the enum so the AST cannot be constructed with illegal states; inside the constructor/checker assert that if op == AggregationOp::Quantile then percentile.is_some() and for other ops percentile.is_none(), returning a clear error type on violation so callers (parser/visitor) surface validation errors early and the planner can assume a well-formed AST.crates/icegate-ingest/src/transform.rs (1)
522-544: Resource-attribute path still admits duplicate keys.The new
merge_dotted_attributescleanly deduplicates scope+span intospan_attributes, butresource_attrsis still pushed viaadd_flattened_attributes_dotted, which appends every entry verbatim. OTLP proto does not forbid duplicateResource.attributeskeys; if a producer sends them, the persisted MAP will contain duplicates and downstream readers will again disagree on which one "wins" (the same problem the merge fixes for span attrs).For symmetry and to avoid surprising MAP-reader behavior, consider running resource attrs through a single-list
BTreeMapdedup as well.♻️ Suggested approach
- add_flattened_attributes_dotted(resource_attrs, &mut resource_attrs_builder); + let merged_resource_attrs = merge_dotted_attributes(&[], resource_attrs); + for (key, value) in &merged_resource_attrs { + resource_attrs_builder.keys().append_value(key); + resource_attrs_builder.values().append_value(value); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-ingest/src/transform.rs` around lines 522 - 544, The resource attributes are still appended verbatim via add_flattened_attributes_dotted which allows duplicate keys; instead, deduplicate merged dotted resource attributes before appending to resource_attrs_builder similar to how scope+span were handled: collect resource_attrs (the same variable passed to add_flattened_attributes_dotted) into a single-list ordered map (e.g., BTreeMap or the same merge_dotted_attributes helper), then iterate that deduped map and append keys/values into resource_attrs_builder (like the loop used for merged_span_attrs) rather than calling add_flattened_attributes_dotted directly.crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs (1)
419-443:MAX_STATS_SPREAD = 1024will silently drop wider INT columns from discovery.
extend_from_int_statsreturnsfalsewhen the row-group[min, max]exceeds 1024, which then incrementsno_dict_no_statsand emits a generic warning. For the documented use (status_code,kindenums) this is fine, but if this reader is reused for any wider INT column in the future, callers will see no values for those row groups even thoughmin/maxare available.Consider either: (a) narrowing the function to a name that signals "small-enum only", or (b) when the spread is exceeded, fall back to inserting just
minandmax(a 2-point over-approximation) so the result is never empty. Not blocking for v1.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs` around lines 419 - 443, The helper extend_from_int_stats currently returns false when (max-min) > MAX_STATS_SPREAD which causes callers (and counters like no_dict_no_stats) to treat the row-group as having no discoverable values; change extend_from_int_stats so that when spread > MAX_STATS_SPREAD it falls back to inserting just min and max into out (a 2-point over-approximation) and returns true instead of false; keep the MAX_STATS_SPREAD constant for the small-enum fast-path but ensure callers see at least min/max for wide ranges so values aren’t silently dropped.crates/icegate-query/src/traceql/datafusion/metrics.rs (1)
60-79: Duplicate group keys when preludeby(...)overlaps with terminalgroup_by.If the user writes
{} | by(.svc) | rate() by (.svc)(or the parser allows both),.svcwill be appended twice togroup_keys. DataFusion will likely tolerate it but the optimizer's behavior on duplicate group exprs is implementation-defined and the result schema gets two columns with the same logical name.Consider deduplicating
group_keys(e.g., with aHashSet<String>keyed on the rendered expression) before passing todf.aggregate(...).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/datafusion/metrics.rs` around lines 60 - 79, The code currently appends grouping expressions into group_keys (built from time_bucket_expr, prelude stages via PipelineStage::By and terminal group_by) which can produce duplicates (e.g., same field from prelude and terminal group_by); change the construction so you deduplicate group_keys before using them in df.aggregate: after pushing time_bucket_expr and all keys from PipelineStage::By and group_by, iterate the vector and build a new Vec<Expr> preserving order while filtering duplicates using a HashSet<String> keyed by a stable rendering/identifier of each expression (for example the string produced by the same renderer used elsewhere, or Expr::to_string if appropriate); keep references to the existing functions/values (group_keys, super::field_ref_to_group_key, PipelineStage::By, group_by, time_bucket_expr) so the change is limited to de-duplicating the group_keys vector prior to aggregation.crates/icegate-query/src/traceql/datafusion/selectors.rs (1)
267-275: Strengthen the duration-normalization assertion.
format!("{e:?}").contains("1000000")is a substring match — it would also pass if the encoded scalar were10_000_000or100_000_000. Pinning the exact decoded scalar (or at minimum asserting absence of the original1000000000-ns value) makes the rounding contract harder to silently regress.♻️ Sketch
- // 1s = 1e9 ns = 1e6 micros — encoded as Int64(1_000_000). - assert!(format!("{e:?}").contains("1000000")); + // 1s = 1e9 ns = 1e6 micros — encoded as Int64(1_000_000). + let dbg = format!("{e:?}"); + assert!(dbg.contains("1000000") && !dbg.contains("1000000000"), "unexpected scalar: {dbg}");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/traceql/datafusion/selectors.rs` around lines 267 - 275, The test duration_literal_is_normalized_to_micros currently uses a substring match which can false-positive; update the assertion on the result `e` (produced by literal_to_scalar called with FieldRef::Intrinsic(IntrinsicField::Duration)) to be stricter: either pattern-match/inspect the produced scalar to assert it equals the expected micro value (e.g., Int64(1_000_000)) or at minimum assert format!("{e:?}") both contains "1000000" and does NOT contain the original nanosecond string "1000000000" so the normalization cannot silently regress.crates/icegate-query/src/tempo/metadata.rs (1)
366-368: Usetake(limit)for consistency with v2 and to skip allocating the full vec.
list_tag_values_v2(line 403) already usesinto_iter().take(limit).collect(). Doing the same here avoids materializing the entireVec<String>when the BTreeSet is much larger thanlimit. SinceBTreeSet's iterator is sorted, the truncation result is identical.♻️ Proposed change
- let mut out: Vec<String> = values.into_iter().collect(); - out.truncate(limit); - Ok(out) + Ok(values.into_iter().take(limit).collect())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/tempo/metadata.rs` around lines 366 - 368, The code currently collects the entire BTreeSet into a Vec then truncates it; change the collection to streamline and avoid allocating the full vector by using values.into_iter().take(limit).collect() (mirror the approach used in list_tag_values_v2) inside the same function (list_tag_values or the block shown) so the iterator yields only up to limit elements before collecting and returning Ok(out).crates/icegate-query/src/tempo/handlers.rs (1)
95-115: Consider sharing theq→ AST parsing helper withexecutor::execute.The
spawn_blocking+make_query_error_send+join_errorpattern here mirrorsexecutor::execute(per the relevant snippet fromtempo/executor.rs:46-150). If a future change modifies how non-SendANTLRErroris scrubbed or how the blocking parse is instrumented, both call sites need to stay in sync. Extracting aparse_query_blocking(q: String) -> TempoResult<Expr>(or similar) into a shared module would keep them aligned.Not blocking — flag if you'd rather defer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/tempo/handlers.rs` around lines 95 - 115, The parsing logic in parse_q_to_predicate duplicates the spawn_blocking + make_query_error_send + join_error pattern used in executor::execute; extract that into a shared helper like parse_query_blocking(q: String) -> TempoResult<Expr> (placed in a common module) that performs the spawn_blocking call, uses span.in_scope, applies AntlrParser::new().parse(&q_owned).map_err(super::executor::make_query_error_send), awaits and maps join errors with super::executor::join_error, and returns the parsed Expr; then replace the inline spawn_blocking flow in parse_q_to_predicate and in executor::execute to call this new helper so both sites share the same error-scrubbing and instrumentation logic.crates/icegate-query/src/engine/metadata_scan/mod.rs (1)
263-279: Optional: extract a per-file helper to mirror the other two scan functions.
scan_labelsandscan_label_valuesboth delegate per-file work to dedicated helpers (scan_labels_for_file,scan_label_values_for_file).scan_label_int_valuesinlines the equivalent work in the closure. Extracting ascan_label_int_values_for_filehelper would keep the three entry points symmetrical and make the int-values path independently testable /#[tracing::instrument]-able like its siblings on line 317 and line 339.♻️ Proposed refactor
- .map_ok(|task| { - let file_io = file_io.clone(); - let predicate = predicate.clone(); - let column = column.clone(); - async move { - let (mut reader, metadata) = parquet_reader::open_file_direct(&file_io, &task).await?; - let mut out: BTreeSet<i32> = BTreeSet::new(); - values::collect_indexed_int_values_via_dict(&mut reader, &metadata, &predicate, &column, &mut out) - .await?; - Ok::<BTreeSet<i32>, MetadataScanError>(out) - } - }) + .map_ok(|task| { + let file_io = file_io.clone(); + let predicate = predicate.clone(); + let column = column.clone(); + async move { scan_label_int_values_for_file(&file_io, task, &predicate, &column).await } + })Add the helper alongside the existing per-file functions:
#[tracing::instrument(skip_all, fields(file = %task.data_file_path, column = %column))] async fn scan_label_int_values_for_file( file_io: &iceberg::io::FileIO, task: FileScanTask, predicate: &Predicate, column: &str, ) -> Result<BTreeSet<i32>, MetadataScanError> { let (mut reader, metadata) = parquet_reader::open_file_direct(file_io, &task).await?; let mut out: BTreeSet<i32> = BTreeSet::new(); values::collect_indexed_int_values_via_dict(&mut reader, &metadata, predicate, column, &mut out).await?; Ok(out) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/engine/metadata_scan/mod.rs` around lines 263 - 279, Extract the per-file logic inside the closure in scan_label_int_values into a new async helper function scan_label_int_values_for_file(file_io: &iceberg::io::FileIO, task: FileScanTask, predicate: &Predicate, column: &str) -> Result<BTreeSet<i32>, MetadataScanError> that mirrors scan_labels_for_file/scan_label_values_for_file: move the parquet_reader::open_file_direct call, the BTreeSet creation, and the values::collect_indexed_int_values_via_dict call into that helper, annotate it with #[tracing::instrument(skip_all, fields(file = %task.data_file_path, column = %column))], and then change the plan_stream.map_ok closure to call the helper (e.g. let file_io = file_io.clone(); let predicate = predicate.clone(); let column = column.clone(); async move { scan_label_int_values_for_file(&file_io, task, &predicate, &column).await }). Ensure signature and error type match MetadataScanError and tests/tracing behavior remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/icegate-common/src/testing/minio.rs`:
- Around line 67-76: Update the doc comment to correct the type and clarify the
behavior: change the incorrect `RequestedContainer<GenericImage>` to
`ContainerRequest<GenericImage>`, remove the misleading claim about keeping the
chain "uniform with the no-options path", and state that since earlier calls
like `with_env_var()` and `with_cmd()` (from `ImageExt`) already make `image` a
`ContainerRequest<GenericImage>`, the conditional `if let Some(net)` / `if let
Some(name)` blocks simply continue mutating the same `image` via `with_network`
and `with_container_name`.
In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 392-408: The docstring promises a conservative true when
null_count is unknown and the chunk has rows, but has_nulls currently returns
false on None; update has_nulls to return true when stats.null_count_opt() is
None and the chunk contains at least one row (i.e., treat unknown null_count as
"may contain nulls"), otherwise keep the existing Some(n) if n > 0 logic; locate
the function has_nulls and use the Statistics API (e.g.,
Statistics::null_count_opt() and the stats' row-count accessor available in this
crate) to check row count and return true when null_count_opt() is None &&
row_count > 0.
In `@crates/icegate-query/src/tempo/formatters.rs`:
- Around line 420-489: The timestamp/null-handling is wrong: compute ts_micros
and end_micros only when the respective column is non-null for that row (use
starts/ends is_null(row) guards) so null timestamps don't become 0 and bias
BestSpan; when updating acc.start_micros and acc.end_micros only call
s.value(row)/e.value(row) if s.is_null(row)/e.is_null(row) is false (skip
otherwise); update the construction of BestSpan (ts_micros, duration_micros) to
treat missing timestamps as "absent" (e.g., fallback to i64::MAX for ordering
but do not read underlying 0) and ensure the span accumulation logic that uses
starts, ends, acc.start_micros, acc.end_micros, ts_micros, and end_micros is
guarded by per-row null checks.
In `@crates/icegate-query/src/tempo/metadata.rs`:
- Around line 274-277: The metadata scan for "rootServiceName" currently uses
scan_values_either(..., COL_SERVICE_NAME) and therefore enumerates service_name
values from all spans, not only root spans; update the call site in metadata.rs
(the "rootServiceName" | "trace:rootService" match arm) to add a predicate
requiring parent_span_id IS NULL (i.e., combine extra_predicate with a root-span
predicate) so scan_values_either only returns root service names, or if that
pushdown is too expensive add a clear comment and create a follow-up issue
noting the limitation; reference scan_values_either and COL_SERVICE_NAME when
making the change.
In `@crates/icegate-query/src/traceql/datafusion/metrics.rs`:
- Around line 52-104: Compute the number of time buckets after you determine
step_ns in apply_metrics and enforce QueryContext::max_grid_points: obtain the
query start/end from the QueryContext (e.g., ctx.range.start and ctx.range.end
or the equivalent fields), compute duration_ns = end - start, compute buckets =
duration_ns / step_ns (ceil or integer division as appropriate), and if buckets
> ctx.max_grid_points return a 400 error (use the project’s
bad_request/invalid_input helper) before calling time_bucket_expr or performing
aggregation; reference apply_metrics, step_ns, and QueryContext::max_grid_points
when locating where to add this check.
In `@crates/icegate-query/src/traceql/datafusion/pipeline.rs`:
- Around line 74-81: The current field_ref_to_group_key silently maps unmodelled
intrinsics to a constant NULL group key; update it so FieldRef::Intrinsic
handles only TraceID (col(COL_TRACE_ID)) and any other intrinsic that is not
explicitly modelled should return an Err(NotImplemented) instead of calling
super::intrinsic_column and producing a NULL literal. Concretely: in
field_ref_to_group_key, keep the TraceID short-circuit, explicitly accept any
other intrinsics that you know are modelled by intrinsic_column, but for the
remaining/unmodelled Intrinsic variants return a not-implemented error (include
a clear message like "grouping by intrinsic X not supported") rather than
letting intrinsic_column produce lit(NULL); reference the symbols
FieldRef::Intrinsic, field_ref_to_group_key, super::intrinsic_column and
group_key_for_attribute when making this change.
- Around line 185-197: The duration literal conversion in literal_to_filter_expr
currently truncates nanoseconds with `*ns / 1_000`, causing sub-microsecond
values to become 0; change the Duration arm to round up like the selector path
by using saturating addition before division, e.g. replace `lit(*ns / 1_000)`
with `lit((*ns.saturating_add(999)) / 1_000)` (use the same integer type as
LiteralValue::Duration) so `literal_to_filter_expr` and the selector logic
produce identical semantics.
In `@crates/icegate-query/src/traceql/Makefile`:
- Line 12: The Makefile loop that prepends a header to antlr/traceql*.rs uses
sed with an escaped '\n' (the line "for f in antlr/traceql*.rs; do sed -i.bak
'1s/^/...\\n/' $$f ..."), which is non-portable on macOS; replace that sed-based
prepend with a portable approach (for both the traceql and logql Makefiles) such
as piping a printf of the header into cat with the target file (e.g., "printf
'...\\n' | cat - $$f > $$f.tmp && mv $$f.tmp $$f") or using sed's '1i\' form
with a literal newline, ensuring the header string includes the same clippy
allowances and that .bak cleanup is handled similarly.
---
Outside diff comments:
In `@crates/icegate-ingest/src/transform.rs`:
- Around line 249-253: Update the stale function-level docstring that currently
claims a single "merged attributes map" to reflect the current output shape:
mention that the implementation emits separate columns named resource_attributes
and span_attributes (instead of a merged attributes map) and that the
indexed-column mirror keys have been removed; keep the rest of the description
about top-level fields, nested events/links arrays, and dropped links being
counted in dropped_links_count intact so readers of the function (the doc
attached to the transform function in transform.rs) see the accurate schema
shape.
---
Nitpick comments:
In `@crates/icegate-common/src/testing/minio.rs`:
- Around line 16-19: The constants MINIO_ROOT_USER and MINIO_ROOT_PASSWORD are
declared for MinIO credentials but create_s3_bucket constructs Credentials with
hardcoded "minioadmin" literals, which will break if the constants change;
modify the code so create_s3_bucket uses the shared constants instead of string
literals (export the constants as pub(crate) or pub as appropriate and replace
the literal usages in create_s3_bucket with MINIO_ROOT_USER and
MINIO_ROOT_PASSWORD to ensure a single source of truth for credentials).
In `@crates/icegate-ingest/src/transform.rs`:
- Around line 522-544: The resource attributes are still appended verbatim via
add_flattened_attributes_dotted which allows duplicate keys; instead,
deduplicate merged dotted resource attributes before appending to
resource_attrs_builder similar to how scope+span were handled: collect
resource_attrs (the same variable passed to add_flattened_attributes_dotted)
into a single-list ordered map (e.g., BTreeMap or the same
merge_dotted_attributes helper), then iterate that deduped map and append
keys/values into resource_attrs_builder (like the loop used for
merged_span_attrs) rather than calling add_flattened_attributes_dotted directly.
In `@crates/icegate-query/src/engine/metadata_scan/mod.rs`:
- Around line 263-279: Extract the per-file logic inside the closure in
scan_label_int_values into a new async helper function
scan_label_int_values_for_file(file_io: &iceberg::io::FileIO, task:
FileScanTask, predicate: &Predicate, column: &str) -> Result<BTreeSet<i32>,
MetadataScanError> that mirrors scan_labels_for_file/scan_label_values_for_file:
move the parquet_reader::open_file_direct call, the BTreeSet creation, and the
values::collect_indexed_int_values_via_dict call into that helper, annotate it
with #[tracing::instrument(skip_all, fields(file = %task.data_file_path, column
= %column))], and then change the plan_stream.map_ok closure to call the helper
(e.g. let file_io = file_io.clone(); let predicate = predicate.clone(); let
column = column.clone(); async move { scan_label_int_values_for_file(&file_io,
task, &predicate, &column).await }). Ensure signature and error type match
MetadataScanError and tests/tracing behavior remain consistent.
In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 419-443: The helper extend_from_int_stats currently returns false
when (max-min) > MAX_STATS_SPREAD which causes callers (and counters like
no_dict_no_stats) to treat the row-group as having no discoverable values;
change extend_from_int_stats so that when spread > MAX_STATS_SPREAD it falls
back to inserting just min and max into out (a 2-point over-approximation) and
returns true instead of false; keep the MAX_STATS_SPREAD constant for the
small-enum fast-path but ensure callers see at least min/max for wide ranges so
values aren’t silently dropped.
In `@crates/icegate-query/src/tempo/handlers.rs`:
- Around line 95-115: The parsing logic in parse_q_to_predicate duplicates the
spawn_blocking + make_query_error_send + join_error pattern used in
executor::execute; extract that into a shared helper like
parse_query_blocking(q: String) -> TempoResult<Expr> (placed in a common module)
that performs the spawn_blocking call, uses span.in_scope, applies
AntlrParser::new().parse(&q_owned).map_err(super::executor::make_query_error_send),
awaits and maps join errors with super::executor::join_error, and returns the
parsed Expr; then replace the inline spawn_blocking flow in parse_q_to_predicate
and in executor::execute to call this new helper so both sites share the same
error-scrubbing and instrumentation logic.
In `@crates/icegate-query/src/tempo/metadata.rs`:
- Around line 366-368: The code currently collects the entire BTreeSet into a
Vec then truncates it; change the collection to streamline and avoid allocating
the full vector by using values.into_iter().take(limit).collect() (mirror the
approach used in list_tag_values_v2) inside the same function (list_tag_values
or the block shown) so the iterator yields only up to limit elements before
collecting and returning Ok(out).
In `@crates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rs`:
- Around line 12-21: The generated doc-comments all reference a placeholder rule
`\{`@link` TraceQLParserBaseParser#s}`; fix by post-processing the generated file
to substitute the correct rule name for each enter_/exit_ method: for each
method like enter_root/exit_root (or enter_pipelineexpr/enter_filterparen),
parse the method signature's context type (e.g., RootContext<'input>,
PipelineExprContext, FilterParenContext), strip the trailing "Context" and
lifetimes, derive the rule identifier, and replace the `#s` placeholder in that
method's doc-comment with the derived rule name (formatted to match the parser
link style); implement this substitution step in the `make gen` post-processing
step so generated files (e.g., traceqlparserbaselistener.rs and methods
enter_root, exit_root, enter_pipelineexpr, enter_filterparen) get correct links.
In `@crates/icegate-query/src/traceql/datafusion/metrics.rs`:
- Around line 60-79: The code currently appends grouping expressions into
group_keys (built from time_bucket_expr, prelude stages via PipelineStage::By
and terminal group_by) which can produce duplicates (e.g., same field from
prelude and terminal group_by); change the construction so you deduplicate
group_keys before using them in df.aggregate: after pushing time_bucket_expr and
all keys from PipelineStage::By and group_by, iterate the vector and build a new
Vec<Expr> preserving order while filtering duplicates using a HashSet<String>
keyed by a stable rendering/identifier of each expression (for example the
string produced by the same renderer used elsewhere, or Expr::to_string if
appropriate); keep references to the existing functions/values (group_keys,
super::field_ref_to_group_key, PipelineStage::By, group_by, time_bucket_expr) so
the change is limited to de-duplicating the group_keys vector prior to
aggregation.
In `@crates/icegate-query/src/traceql/datafusion/selectors.rs`:
- Around line 267-275: The test duration_literal_is_normalized_to_micros
currently uses a substring match which can false-positive; update the assertion
on the result `e` (produced by literal_to_scalar called with
FieldRef::Intrinsic(IntrinsicField::Duration)) to be stricter: either
pattern-match/inspect the produced scalar to assert it equals the expected micro
value (e.g., Int64(1_000_000)) or at minimum assert format!("{e:?}") both
contains "1000000" and does NOT contain the original nanosecond string
"1000000000" so the normalization cannot silently regress.
In `@crates/icegate-query/src/traceql/metric.rs`:
- Around line 50-76: The Aggregate and AggregateFilter variants allow invalid
combinations (e.g., AggregationOp::Quantile with percentile None, or
non-Quantile ops with Some(percentile)); add a fallible constructor (e.g.,
PipelineStage::Aggregate::try_new or PipelineStage::try_new that validates
Aggregate/AggregateFilter) or implement TryFrom<(AggregationOp,
Option<FieldRef>, Option<f64>, ...)> for the enum so the AST cannot be
constructed with illegal states; inside the constructor/checker assert that if
op == AggregationOp::Quantile then percentile.is_some() and for other ops
percentile.is_none(), returning a clear error type on violation so callers
(parser/visitor) surface validation errors early and the planner can assume a
well-formed AST.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 522543eb-bb50-42ff-b158-007518fda3f6
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (48)
.cargo/audit.tomlclippy.tomlconfig/kustomize/base/values-grafana.yamlconfig/kustomize/overlays/skaffold/values-icegate.yamlcrates/icegate-common/src/testing/minio.rscrates/icegate-ingest/src/transform.rscrates/icegate-jobmanager/src/core/job.rscrates/icegate-maintain/Cargo.tomlcrates/icegate-maintain/tests/migrate_integration_test.rscrates/icegate-query/src/engine/metadata_scan/labels.rscrates/icegate-query/src/engine/metadata_scan/mod.rscrates/icegate-query/src/engine/metadata_scan/parquet_reader.rscrates/icegate-query/src/engine/metadata_scan/values.rscrates/icegate-query/src/loki/executor.rscrates/icegate-query/src/loki/mod.rscrates/icegate-query/src/tempo/executor.rscrates/icegate-query/src/tempo/formatters.rscrates/icegate-query/src/tempo/handlers.rscrates/icegate-query/src/tempo/metadata.rscrates/icegate-query/src/tempo/models.rscrates/icegate-query/src/tempo/validation.rscrates/icegate-query/src/traceql/Makefilecrates/icegate-query/src/traceql/antlr/TraceQLLexer.g4crates/icegate-query/src/traceql/antlr/TraceQLLexer.interpcrates/icegate-query/src/traceql/antlr/TraceQLLexer.tokenscrates/icegate-query/src/traceql/antlr/TraceQLParser.g4crates/icegate-query/src/traceql/antlr/TraceQLParser.interpcrates/icegate-query/src/traceql/antlr/TraceQLParser.tokenscrates/icegate-query/src/traceql/antlr/mod.rscrates/icegate-query/src/traceql/antlr/tests.rscrates/icegate-query/src/traceql/antlr/traceqllexer.rscrates/icegate-query/src/traceql/antlr/traceqlparser.rscrates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rscrates/icegate-query/src/traceql/antlr/traceqlparserbasevisitor.rscrates/icegate-query/src/traceql/antlr/traceqlparserlistener.rscrates/icegate-query/src/traceql/antlr/traceqlparservisitor.rscrates/icegate-query/src/traceql/antlr/visitors.rscrates/icegate-query/src/traceql/common.rscrates/icegate-query/src/traceql/datafusion/metrics.rscrates/icegate-query/src/traceql/datafusion/mod.rscrates/icegate-query/src/traceql/datafusion/pipeline.rscrates/icegate-query/src/traceql/datafusion/planner.rscrates/icegate-query/src/traceql/datafusion/planner_tests.rscrates/icegate-query/src/traceql/datafusion/selectors.rscrates/icegate-query/src/traceql/iceberg_predicate.rscrates/icegate-query/src/traceql/metric.rscrates/icegate-query/src/traceql/spanset.rscrates/icegate-query/tests/tempo/tags.rs
✅ Files skipped from review due to trivial changes (9)
- clippy.toml
- crates/icegate-query/src/traceql/antlr/TraceQLLexer.interp
- crates/icegate-query/src/traceql/antlr/TraceQLParser.interp
- .cargo/audit.toml
- crates/icegate-query/src/traceql/antlr/TraceQLParser.tokens
- crates/icegate-query/src/traceql/antlr/TraceQLLexer.tokens
- crates/icegate-jobmanager/src/core/job.rs
- crates/icegate-query/src/traceql/antlr/TraceQLParser.g4
- crates/icegate-query/tests/tempo/tags.rs
🚧 Files skipped from review as they are similar to previous changes (12)
- config/kustomize/overlays/skaffold/values-icegate.yaml
- crates/icegate-query/src/traceql/datafusion/mod.rs
- crates/icegate-query/src/traceql/antlr/TraceQLLexer.g4
- crates/icegate-query/src/traceql/antlr/traceqlparserbasevisitor.rs
- crates/icegate-query/src/traceql/antlr/traceqlparserlistener.rs
- crates/icegate-query/src/traceql/antlr/tests.rs
- crates/icegate-query/src/traceql/common.rs
- config/kustomize/base/values-grafana.yaml
- crates/icegate-query/src/traceql/antlr/mod.rs
- crates/icegate-query/src/tempo/executor.rs
- crates/icegate-query/src/traceql/antlr/traceqllexer.rs
- crates/icegate-query/src/traceql/antlr/visitors.rs
…kustomize namespace patches following the deprecation of all-in-one Jaeger services. Refactor trace duration, processing, and validation logic in Tempo. Adjust TraceQL and Tempo handlers for consistency and performance improvements. Set OTLP HTTP server body size limits for large batches.
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (3)
crates/icegate-ingest/src/transform.rs (2)
529-545: Hoist invariant attribute flattening out of the per-span hot path.Line 529 recomputes
dedupe_dotted_attributes(resource_attrs)for every span even thoughresource_attrsare constant within aresource_spansblock. On large batches this adds avoidable CPU/alloc overhead.♻️ Suggested refactor
for resource_spans in &request.resource_spans { let resource_attrs = resource_spans.resource.as_ref().map_or(&empty_attrs, |r| &r.attributes); + let merged_resource_attrs = dedupe_dotted_attributes(resource_attrs); // ... for scope_spans in &resource_spans.scope_spans { + let scope_attrs = scope_spans.scope.as_ref().map_or(&empty_attrs, |s| &s.attributes); for span in &scope_spans.spans { - let merged_resource_attrs = dedupe_dotted_attributes(resource_attrs); for (key, value) in &merged_resource_attrs { resource_attrs_builder.keys().append_value(key); resource_attrs_builder.values().append_value(value); } - let scope_attrs = scope_spans.scope.as_ref().map_or(&empty_attrs, |s| &s.attributes); let merged_span_attrs = merge_dotted_attributes(scope_attrs, &span.attributes); for (key, value) in &merged_span_attrs { span_attrs_builder.keys().append_value(key); span_attrs_builder.values().append_value(value); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-ingest/src/transform.rs` around lines 529 - 545, The dedupe_dotted_attributes(resource_attrs) call is inside the per-span loop causing repeated work; hoist it so merged_resource_attrs is computed once per resource_spans and reused for every span. Specifically, compute let merged_resource_attrs = dedupe_dotted_attributes(resource_attrs); immediately after obtaining resource_attrs (before looping over spans), then inside the span loop just iterate over &merged_resource_attrs to append to resource_attrs_builder; ensure you remove the duplicate call in the span-processing block and keep the existing merge_dotted_attributes(scope_attrs, &span.attributes) per-span behavior.
1949-2053: Add one collision test to lock in “span wins over scope”.Current assertions validate routing and mirror-key absence, but they don’t explicitly cover same-key collision precedence. Add one key present in both scope and span and assert span value is emitted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-ingest/src/transform.rs` around lines 1949 - 2053, In the test spans_split_attributes_into_resource_and_span_columns add a collision case so "span wins over scope": modify the ScopeSpans.InstrumentationScope.attributes to include a key (e.g., "dup.key") with a scope value and the Span.attributes to include the same key with a different span value, then after building batch assert that span_attributes contains ("dup.key", spanValue) (and not the scopeValue) — reference the test function spans_split_attributes_into_resource_and_span_columns, the kv helper, and the resource_attrs_col/span_attrs_col + map_row_as_pairs checks to locate where to add the extra kv entries and assertion.crates/icegate-query/src/engine/metadata_scan/mod.rs (1)
318-322: Apply the row-group predicate to indexed-label discovery as well.
collect_indexed_from_metadatasees the whole file, while MAP-key discovery is already filtered withpredicate. If a planned file contains both matching and non-matching row groups,/labelsand Tempo tag discovery can surface indexed labels that only exist outside the requested selector/time slice. Thread the predicate into the indexed metadata path so both discovery modes use the same pruning granularity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/icegate-query/src/engine/metadata_scan/mod.rs` around lines 318 - 322, The indexed-label discovery currently calls collect_indexed_from_metadata(&metadata, config, &mut out) which ignores the row-group predicate used by collect_map_keys_via_dict; update collect_indexed_from_metadata to accept the same predicate parameter (e.g., predicate: &RowGroupPredicate or the existing predicate type) and apply that predicate when scanning file-level metadata so indexed labels are only collected from matching row-groups, then update this call site in metadata_scan::mod.rs to pass predicate into collect_indexed_from_metadata; ensure any other callers are updated accordingly and tests/usage reflect the new signature.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/icegate-ingest/src/transform.rs`:
- Around line 540-541: Update the misleading comment that says "HashMap" to
reference "BTreeMap" (and note the ordering semantics) so it matches the
implementation and the merge helper behavior in transform.rs; specifically edit
the comment around the merge logic that currently reads "merging into a HashMap"
to state "merging into a BTreeMap" and mention that scope-first then span
ensures span attrs win due to BTreeMap ordering.
In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 278-285: In read_column_int_dictionaries, add a defensive
physical-type check that verifies the targeted leaf column is actually INT32
before attempting any dictionary decoding or stats-based fallback; locate the
column descriptor for the given leaf_idx from the Parquet metadata (the same
structure used to access dictionary bytes and statistics) and if its physical
type is not INT32 return a MetadataScanError with a clear message (including
column name/leaf_idx and actual physical type) so the function fails fast on
schema mismatches rather than decoding garbage; apply this single check once
up-front so both the dictionary-decoding path and the Statistics::Int32 fallback
are protected.
In `@crates/icegate-query/src/tempo/executor.rs`:
- Line 6: The import for RecordBatch is incorrect; replace the use of
datafusion::arrow::array::RecordBatch with the correct path
datafusion::arrow::record_batch::RecordBatch in executor.rs so the type resolves
(update the use statement referencing RecordBatch accordingly).
In `@crates/icegate-query/src/tempo/handlers.rs`:
- Around line 113-125: wants_protobuf currently returns true if the Accept
header contains protobuf anywhere, which ignores the caller's JSON preference;
update wants_protobuf to properly parse the Accept header into media-type
entries with q values (default q=1) and determine precedence: extract media
types from headers.get(header::ACCEPT), split on commas, parse each token's mime
and optional q parameter, then compute the highest-q match for JSON (e.g.,
"application/json" or "application/*" or "*/*") and for protobuf
("application/protobuf" or "application/x-protobuf"); return false if JSON has
higher or equal q than protobuf, return true if protobuf has higher q, and
default to true only when neither is present. Ensure changes are made inside the
wants_protobuf function and handle invalid/missing Accept gracefully.
In `@crates/icegate-query/src/traceql/datafusion/metrics.rs`:
- Around line 67-71: The bucket count calculation undercounts when duration is
an exact multiple of step: replace the manual ceil logic with inclusive-endpoint
logic by computing buckets as (duration_ns / step_ns) + 1 (using integer
division) so exact-multiple cases get the extra endpoint; update the calculation
in the block that uses ctx.start/ctx.end and step_ns and still respect the
existing guards (if duration_ns > 0 && step_ns > 0) and the subsequent
comparison against ctx.max_grid_points using the variable buckets.
In `@crates/icegate-query/src/traceql/datafusion/pipeline.rs`:
- Around line 142-143: The code only checks that `percentile` is present when
building the quantile aggregate; you must also validate that the extracted `p`
is finite and within [0.0, 1.0] before continuing. After obtaining `p` from
`percentile.ok_or_else(...)` (the variable `p` used in the
`quantile_over_time(.field, p)` handling in pipeline.rs), add a guard that
returns a validation error if p.is_nan(), !p.is_finite(), p < 0.0, or p > 1.0 so
invalid percentiles are rejected with a clear validation message instead of
failing later in the planner/runtime.
- Around line 185-190: The error message in field_arg_expr is specific to
sum/avg/min/max and misleads callers of AggregationOp::Quantile (e.g.,
quantile_over_time()); update the validation error text in field_arg_expr to
mention quantile or use a generic "aggregation requires a field argument"
message so quantile errors are correctly described; locate field_arg_expr and
change the string passed to validation(...) accordingly.
In `@crates/icegate-query/src/traceql/Makefile`:
- Around line 9-10: The gen Make target runs java -jar $(ANTLR4_JAR) but doesn't
declare a prerequisite for $(ANTLR4_JAR) or the install target, causing failures
on clean checkouts; update the gen target declaration to depend on either
$(ANTLR4_JAR) or the install target (e.g., change the target line to "gen:
antlr/TraceQLLexer.g4 antlr/TraceQLParser.g4 $(ANTLR4_JAR)" or add "install" as
a prerequisite) so the JAR is guaranteed present before invoking java -jar, and
ensure the existing install target builds or fetches $(ANTLR4_JAR).
- Around line 6-7: The Makefile's install rule currently wget's ANTLR4_JAR
without ensuring the destination directory exists or verifying file integrity;
update the install rule to first create the target directory (run mkdir -p $(dir
$(ANTLR4_JAR))) before wget, add an ANTLR4_SHA256 variable containing the
expected SHA256 checksum, download the jar to $(ANTLR4_JAR) (use a robust
curl/wget invocation) and then verify it with sha256sum -c or by comparing
`sha256sum` output to $(ANTLR4_SHA256), failing the build if the check fails,
and only after successful verification run chmod +x $(ANTLR4_JAR).
---
Nitpick comments:
In `@crates/icegate-ingest/src/transform.rs`:
- Around line 529-545: The dedupe_dotted_attributes(resource_attrs) call is
inside the per-span loop causing repeated work; hoist it so
merged_resource_attrs is computed once per resource_spans and reused for every
span. Specifically, compute let merged_resource_attrs =
dedupe_dotted_attributes(resource_attrs); immediately after obtaining
resource_attrs (before looping over spans), then inside the span loop just
iterate over &merged_resource_attrs to append to resource_attrs_builder; ensure
you remove the duplicate call in the span-processing block and keep the existing
merge_dotted_attributes(scope_attrs, &span.attributes) per-span behavior.
- Around line 1949-2053: In the test
spans_split_attributes_into_resource_and_span_columns add a collision case so
"span wins over scope": modify the ScopeSpans.InstrumentationScope.attributes to
include a key (e.g., "dup.key") with a scope value and the Span.attributes to
include the same key with a different span value, then after building batch
assert that span_attributes contains ("dup.key", spanValue) (and not the
scopeValue) — reference the test function
spans_split_attributes_into_resource_and_span_columns, the kv helper, and the
resource_attrs_col/span_attrs_col + map_row_as_pairs checks to locate where to
add the extra kv entries and assertion.
In `@crates/icegate-query/src/engine/metadata_scan/mod.rs`:
- Around line 318-322: The indexed-label discovery currently calls
collect_indexed_from_metadata(&metadata, config, &mut out) which ignores the
row-group predicate used by collect_map_keys_via_dict; update
collect_indexed_from_metadata to accept the same predicate parameter (e.g.,
predicate: &RowGroupPredicate or the existing predicate type) and apply that
predicate when scanning file-level metadata so indexed labels are only collected
from matching row-groups, then update this call site in metadata_scan::mod.rs to
pass predicate into collect_indexed_from_metadata; ensure any other callers are
updated accordingly and tests/usage reflect the new signature.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 083dea42-43f3-4f1f-8593-d2ef7ac0fd10
📒 Files selected for processing (22)
config/kustomize/README.mdconfig/kustomize/base/kustomization.yamlconfig/kustomize/base/otel-collector/configmap.yamlconfig/kustomize/base/values-grafana.yamlconfig/kustomize/base/values-jaeger.yamlcrates/icegate-common/src/testing/minio.rscrates/icegate-common/src/testing/s3.rscrates/icegate-ingest/src/otlp_http/config.rscrates/icegate-ingest/src/otlp_http/routes.rscrates/icegate-ingest/src/otlp_http/server.rscrates/icegate-ingest/src/transform.rscrates/icegate-query/src/engine/metadata_scan/mod.rscrates/icegate-query/src/engine/metadata_scan/parquet_reader.rscrates/icegate-query/src/logql/Makefilecrates/icegate-query/src/tempo/executor.rscrates/icegate-query/src/tempo/formatters.rscrates/icegate-query/src/tempo/handlers.rscrates/icegate-query/src/tempo/metadata.rscrates/icegate-query/src/traceql/Makefilecrates/icegate-query/src/traceql/datafusion/metrics.rscrates/icegate-query/src/traceql/datafusion/pipeline.rscrates/icegate-query/src/traceql/datafusion/selectors.rs
💤 Files with no reviewable changes (3)
- config/kustomize/README.md
- config/kustomize/base/values-jaeger.yaml
- config/kustomize/base/kustomization.yaml
✅ Files skipped from review due to trivial changes (1)
- crates/icegate-query/src/tempo/formatters.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/icegate-query/src/traceql/datafusion/selectors.rs
- crates/icegate-query/src/tempo/metadata.rs
…lumn type validation, and standardize attribute merging logic with deterministic ordering.
| validate_duration_param("minDuration", params.min_duration.as_deref())?; | ||
| validate_duration_param("maxDuration", params.max_duration.as_deref())?; | ||
| if params.min_duration.is_some() || params.max_duration.is_some() { | ||
| return Err(TempoError::new(QueryError::NotImplemented( |
There was a problem hiding this comment.
Maybe we should TODO mark here?
Summary by CodeRabbit
New Features
Storage & Query Improvements
Chores