GH-119: Tempo endpoint with TraceQL query by frisbeeman · Pull Request #123 · icegatetech/icegate

frisbeeman · 2026-04-27T15:58:07Z

Summary by CodeRabbit

New Features
- Full Tempo API: trace lookup, search, tag discovery/tag-values (v1 & v2) with JSON/proto responses and validation.
- TraceQL: new query language for trace search and metrics (parser, planner, execution).
- Grafana: demo Loki/Tempo datasources with trace↔log cross-links and “View trace” links.
Storage & Query Improvements
- Split span attributes into resource vs span maps; improved tag discovery via Parquet metadata.
- Parquet bloom-filter support for faster lookups.
Chores
- OTLP HTTP: configurable max body size enforced (over-limit requests return 413).

Implements /api/search/tags (v1), /api/v2/search/tags, and /api/search/tag/{name}/values on top of the TraceQL-search-capable Tempo module from 562ad71. Generalises the former log_metadata_scan module into a table-agnostic metadata_scan driven by MetadataScanConfig so logs, spans, and future metrics share the same parquet-dictionary-direct label discovery path. - New MetadataScanConfig captures indexed columns, aliases, excluded map keys, and the map-column name per table; scan_labels / scan_label_values take this config plus an optional extra iceberg Predicate. - LogQL selector translation moved from the shared module to loki/predicate.rs. - tempo/metadata.rs scans the post-split resource_attributes and span_attributes maps separately, mapping each into its TraceQL scope; tag-value lookup routes by dotted prefix (resource.X / span.X). - Tempo TraceQL intrinsic tags (name, status, kind, duration, ...) and span column constant lists added to icegate-common/schema.rs. Includes 7 Parquet-backed integration tests for tempo tag endpoints. All 367 lib tests, 46 loki integration tests, and 7 tempo integration tests pass. Clippy -D warnings clean, fmt clean.

…lti-root traces, and enforce per-trace span caps (`spss`)

Integrates handling for `events` and `links` in span formatting for both OTLP JSON and protobuf outputs. Includes utilities for decoding, encoding, and clamping invalid values. Adjusts tests to verify proper propagation of `droppedEventsCount` and `droppedLinksCount`.

…les (`logs`, `spans`) and introduce allowlist-based dispatch.

coderabbitai · 2026-04-27T15:58:21Z

Warning

Rate limit exceeded

@frisbeeman has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 26 minutes and 33 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 37270a64-a6fa-499c-a0f1-c086cc84e743

📥 Commits

Reviewing files that changed from the base of the PR and between d84dbe9 and 97ecb61.

📒 Files selected for processing (3)

crates/icegate-ingest/src/transform.rs
crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs
crates/icegate-query/src/traceql/datafusion/metrics.rs

Walkthrough

Adds TraceQL (parser, AST, planner, DataFusion execution), implements Tempo HTTP APIs (trace lookup, search, tag discovery), refactors Parquet metadata label discovery, splits span attributes into resource/span maps across ingest/transform/storage, injects OTEL trace context into logs/tracing output, and enables Parquet column bloom filters for trace/span lookup.

Changes

Cohort / File(s)	Summary
Workspace & Tooling `Cargo.toml`, `rustfmt.toml`, `.cargo/audit.toml`, `clippy.toml`	Dependency/tooling updates: added deps (prost/hex/serde_json/chrono/opentelemetry-proto), enabled `tower-http` timeout feature, rustfmt ignore rules for generated ANTLR files, audit & clippy config tweaks.
Grafana & OTEL collector `config/docker/grafana/.../datasources.yaml`, `config/kustomize/base/values-grafana.yaml`, `config/docker/otel-collector/config.yaml`, `config/kustomize/base/otel-collector/configmap.yaml`	Added demo Loki/Tempo datasources with trace↔logs linking; OTEL collector/log transform propagates optional JSON `trace_id`/`span_id` into log record trace context; demo headers set.
Schema & common `crates/icegate-common/src/schema.rs`, `crates/icegate-common/Cargo.toml`	Spans schema split: `resource_attributes` and `span_attributes` MAP columns; added new span column constants and tag-discovery lists; added serde_json and chrono deps.
Tracing JSON formatter `crates/icegate-common/src/tracing.rs`	Custom JSON formatter injects optional `trace_id`/`span_id` from current span OtelData extension into emitted JSON events; tests updated.
Ingest / Transform / Storage `crates/icegate-ingest/src/transform.rs`, `crates/icegate-ingest/src/cli/commands/run.rs`, `crates/icegate-ingest/src/shift/...`, `crates/icegate-queue/src/writer.rs`	Transforms now emit separate resource/span MAP columns; WAL/shift and QueueWriter accept per-topic `bloom_filter_columns` and enable Parquet column bloom filters via writer properties; signatures wired through ShiftJobSpec → IcebergStorage.
Query engine metadata-scan refactor `crates/icegate-query/src/engine/log_metadata_scan/` (deleted), `crates/icegate-query/src/engine/metadata_scan/` (new), `crates/icegate-query/src/engine/mod.rs`	Removed old log_metadata_scan module; replaced with unified `metadata_scan` implementing footer/dictionary-driven indexed-column and MAP-key discovery plus INT32 dictionary helpers; integrated into engine.
Provider / Table routing `crates/icegate-query/src/engine/provider/schema.rs`, `crates/icegate-query/src/engine/provider/table.rs`, `crates/icegate-query/src/engine/provider/mod.rs`	Schema provider now allowlists WAL-merged tables (logs, spans) and passes WAL topic into IcegateTableProvider; provider/table updated to accept and use topic for WAL segment listing.
Loki label endpoints & predicates `crates/icegate-query/src/loki/*`	Label endpoints now call new metadata_scan APIs with explicit tenant/time predicates; added LogQL→metadata predicate translation module.
TraceQL language & ANTLR artifacts `crates/icegate-query/src/traceql/*/`	Full TraceQL implementation: ANTLR grammars, generated lexer/parser/listener/visitor, visitor→AST, parser trait, common types, duration helpers, Makefile and generated artifacts.
TraceQL DataFusion planner & selectors `crates/icegate-query/src/traceql/datafusion/*`	DataFusion planner for TraceQL: selector→DataFrame translation, two-stage search limiting, pipeline aggregations, metrics-mode support, and tests.
TraceQL → Iceberg predicate `crates/icegate-query/src/traceql/iceberg_predicate.rs`	Translator producing conservative Iceberg predicates for metadata-scan pushdown with an excluding variant.
Tempo service & APIs `crates/icegate-query/src/tempo/*`	Implemented Tempo: trace lookup (proto/JSON), search (TraceQL executor), tag discovery v1/v2, tag-values (typed/untyped), formatters, validation, routes, and server wiring.
Tempo tests & harness `crates/icegate-query/tests/tempo/*`	Integration tests and TestServer harness for Tempo, span-writing helpers, echo and tag-discovery tests.
TraceQL tests `crates/icegate-query/src/traceql/antlr/tests.rs`, traceql planner tests	Comprehensive parser/planner unit tests validating AST shapes, planner behavior, and DataFusion-backed execution.
Test helpers / MinIO `crates/icegate-common/src/testing/minio.rs`, `crates/icegate-maintain/*`, `crates/icegate-common/src/testing/s3.rs`	Refactored MinIO testcontainer into builder API and centralized testing helpers; maintain tests switched to these helpers and credentials constants.
OTLP HTTP limits `crates/icegate-ingest/src/otlp_http/*`	Added `max_body_bytes` config, enforce decompressed request body size via DefaultBodyLimit in routes, and validation.
Queue/WAL wiring `crates/icegate-queue/src/writer.rs`, `crates/icegate-ingest/src/cli/commands/run.rs`, `crates/icegate-ingest/src/shift/iceberg_storage.rs`, `crates/icegate-ingest/src/shift/mod.rs`	Wired bloom_filter_columns through QueueWriter, IcebergStorage signature changed to accept columns, build_writer_properties helper emits Parquet bloom filters; shift specs updated.
Misc / small `crates/icegate-jobmanager`, `rustfmt.toml`, `.cargo/audit.toml`, `clippy.toml`, others	Minor refactors (Option::is_none_or), formatting/ignore tweaks, audit/clippy msrv update, dev-dependency changes.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TempoHandler as Tempo API
    participant Parser as TraceQL Parser
    participant Planner as DataFusion Planner
    participant Engine as Query Engine
    participant Iceberg as Iceberg/Parquet
    participant Formatter as OTLP Formatter

    Client->>TempoHandler: POST /api/search (TraceQL)
    TempoHandler->>Parser: parse(query)
    Parser-->>TempoHandler: TraceQLExpr
    TempoHandler->>Planner: plan(expr, QueryContext)
    Planner->>Engine: scan spans (tenant/time + selector)
    Engine->>Iceberg: read filtered RecordBatches
    Iceberg-->>Engine: RecordBatch[]
    Engine-->>Planner: batches
    Planner->>Planner: aggregate by trace_id (limit)
    Planner->>Engine: fetch traces by trace_id IN (...)
    Engine->>Iceberg: read full traces
    Iceberg-->>Engine: RecordBatch[]
    Planner->>Formatter: spansets_to_search_response(batches)
    Formatter-->>TempoHandler: SearchResponse
    TempoHandler-->>Client: 200 + JSON/Proto

sequenceDiagram
    participant Client
    participant TempoHandler as Tempo API
    participant MetadataScan as metadata_scan
    participant Parquet as Parquet reader
    participant Catalog as Iceberg catalog

    Client->>TempoHandler: GET /api/search/tags
    TempoHandler->>MetadataScan: scan_labels(table, window, extra_pred)
    MetadataScan->>Catalog: plan files (tenant/time predicate)
    MetadataScan->>Parquet: read row-group stats (null_count)
    Parquet-->>MetadataScan: indexed-column presence
    MetadataScan->>Parquet: read dictionary pages for map.key
    Parquet-->>MetadataScan: distinct map keys
    MetadataScan->>TempoHandler: merged tag set
    TempoHandler-->>Client: 200 + TagsV2Response

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

GH-71: Integrate WAL support into QueryEngine #72 — overlaps on QueryEngine/provider and WAL-backed table support, directly related to provider/schema and table/topic wiring.
GH-108: Add demo logging to all services #112 — related Grafana/OTEL demo datasource and collector changes.
GH-52: Add tracing and metrics to query service. #70 — related tracing and instrumentation changes affecting tracing_subscriber and trace context propagation.

Suggested reviewers

s-prosvirnin

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch GH-119/query

coderabbitai

Actionable comments posted: 19

🧹 Nitpick comments (18)

crates/icegate-query/src/traceql/common.rs (2)
255-259: Optional: cover all KindValue variants in the OTLP-code test.

The current test only asserts Server and Client. Since the encoding is hand-maintained, an off-by-one on Unspecified, Internal, Producer, or Consumer would slip through. Adding the remaining four assertions is a one-line each safety net (or moot if you adopt #[repr(i32)]).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/common.rs` around lines 255 - 259, The test
kind_otlp_code_matches_otlp_spec currently only checks Server and Client; extend
it to assert all KindValue variants map to their OTLP integer codes: assert
KindValue::Unspecified.otlp_code() == 0, KindValue::Internal.otlp_code() == 1,
KindValue::Server.otlp_code() == 2, KindValue::Client.otlp_code() == 3,
KindValue::Producer.otlp_code() == 4, and KindValue::Consumer.otlp_code() == 5
so any off-by-one or missing mapping is caught (update the test function
kind_otlp_code_matches_otlp_spec to include these four additional assertions).
166-220: Optional: derive numeric encodings via #[repr(i32)] to keep schema and enum in lockstep.

StatusValue::otlp_code and KindValue::otlp_code hand-maintain numeric mappings that are also implicit in the OTLP enum ordering. Annotating each enum with #[repr(i32)] and replacing the match with *self as i32 would remove the duplication and make schema drift impossible.
♻️ Proposed refactor
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
+#[repr(i32)]
 pub enum StatusValue {
-    /// `ok`
-    Ok,
-    /// `error`
-    Error,
-    /// `unset`
-    Unset,
+    /// `unset`
+    Unset = 0,
+    /// `ok`
+    Ok = 1,
+    /// `error`
+    Error = 2,
 }

 impl StatusValue {
     #[must_use]
-    pub const fn otlp_code(&self) -> i32 {
-        match self {
-            Self::Unset => 0,
-            Self::Ok => 1,
-            Self::Error => 2,
-        }
-    }
+    pub const fn otlp_code(self) -> i32 { self as i32 }
 }
(Same shape for KindValue.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/common.rs` around lines 166 - 220,
StatusValue::otlp_code and KindValue::otlp_code manually duplicate numeric
mappings; add #[repr(i32)] to the StatusValue and KindValue enum definitions and
replace the match bodies in their otlp_code methods with a simple cast (return
*self as i32) so the numeric encoding is derived from the enum layout; update
the otlp_code functions (still const fn) to perform the cast and remove the
match arms for StatusValue::otlp_code and KindValue::otlp_code.
crates/icegate-query/src/traceql/datafusion/mod.rs (1)
11-16: Optional: drop redundant prefixes on pub(crate) re-exports.

The pipeline_ / selectors_ prefixes duplicate information already implicit in the module path. Since these are pub(crate), callers can disambiguate via the import path (e.g., use crate::traceql::datafusion::{pipeline::field_ref_to_group_key, selectors::intrinsic_column}) or via local aliases at the import site. The re-export aliases here just spread the naming convention across the crate.

If you want to keep a single grouped re-export, consider:
♻️ Suggested simplification
-// Re-exports consumed by the metrics planner (and any downstream callers
-// that need a uniform field-ref → group-key conversion).
-pub(crate) use pipeline::field_ref_to_group_key as pipeline_field_ref_to_group_key;
 pub use planner::DataFusionPlanner;
-pub(crate) use selectors::{
-    intrinsic_column as selectors_intrinsic_column, resource_attribute_lhs as selectors_resource_attribute_lhs,
-    span_attribute_lhs as selectors_span_attribute_lhs,
-};
+pub(crate) use pipeline::field_ref_to_group_key;
+pub(crate) use selectors::{intrinsic_column, resource_attribute_lhs, span_attribute_lhs};
Non-blocking — the current naming is consistent and readable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/datafusion/mod.rs` around lines 11 - 16, The
current pub(crate) re-exports add redundant prefixes (pipeline_ and selectors_)
that duplicate module path info; update the re-exports in mod.rs to use the
original symbol names instead of aliased ones so callers can rely on the module
path for disambiguation — e.g., stop exporting pipeline::field_ref_to_group_key
as pipeline_field_ref_to_group_key and
selectors::intrinsic_column/resource_attribute_lhs/span_attribute_lhs as
selectors_intrinsic_column/selectors_resource_attribute_lhs/selectors_span_attribute_lhs,
and instead re-export them directly (keep DataFusionPlanner export as-is).
crates/icegate-query/src/loki/mod.rs (1)
43-48: Optional: avoid drifting excluded_map_keys between configs.

LOGS_VALUES_METADATA_CONFIG reuses LOGS_METADATA_CONFIG.label_aliases (good) but inlines excluded_map_keys: &[] and re-states map_column. If the design intent is "values config is the labels config with no exclusions and a wider indexed-column set", consider deriving more of it from LOGS_METADATA_CONFIG so future edits to aliases/map_column only need to change one site:
♻️ Optional consolidation
 const LOGS_VALUES_METADATA_CONFIG: MetadataScanConfig = MetadataScanConfig {
     indexed_columns: icegate_common::schema::LOG_INDEXED_ATTRIBUTE_COLUMNS,
     label_aliases: LOGS_METADATA_CONFIG.label_aliases,
     excluded_map_keys: &[],
-    map_column: icegate_common::schema::COL_ATTRIBUTES,
+    map_column: LOGS_METADATA_CONFIG.map_column,
 };
Non-blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/loki/mod.rs` around lines 43 - 48,
LOGS_VALUES_METADATA_CONFIG duplicates parts of LOGS_METADATA_CONFIG (it inlines
excluded_map_keys: &[] and re-states map_column), which can drift; change
LOGS_VALUES_METADATA_CONFIG to build from LOGS_METADATA_CONFIG by copying its
label_aliases and map_column (or cloning/using a struct update pattern) and only
override indexed_columns and excluded_map_keys to &[] so future updates to
LOGS_METADATA_CONFIG.label_aliases or map_column are reflected automatically;
specifically adjust the MetadataScanConfig construction for
LOGS_VALUES_METADATA_CONFIG to derive fields from LOGS_METADATA_CONFIG instead
of repeating them.
crates/icegate-query/src/traceql/Makefile (1)
3-16: Mark phony targets and use rm -f for idiomatic cleanup.

install, gen, and clean are all phony — they don't produce a file with their own name. Without a .PHONY declaration, they break the day someone happens to create a clean (or gen) file/dir in this directory. Static analysis already flagged this for clean. The || true pattern in clean is also better expressed with rm -f (idempotent on missing files, and lets make see real failures).
♻️ Proposed diff
 ANTLR4_JAR = ../../../../target/antlr4.jar

+.PHONY: install gen clean
+
 install:
 	wget -O $(ANTLR4_JAR) https://github.com/antlr4rust/antlr4/releases/download/v0.5.0/antlr4-4.13.3-SNAPSHOT-complete.jar
 	chmod +x $(ANTLR4_JAR)

 gen: antlr/TraceQLLexer.g4 antlr/TraceQLParser.g4
 	java -jar $(ANTLR4_JAR) -Dlanguage=Rust -visitor antlr/TraceQLLexer.g4 antlr/TraceQLParser.g4
 	sed -i.bak 's/TraceQLParserParserContext/TraceQLParserContext/g' antlr/traceqlparser.rs && rm antlr/traceqlparser.rs.bak # https://github.com/rrevenantt/antlr4rust/pull/53
 	for f in antlr/traceql*.rs; do sed -i.bak '1s/^/#![allow(clippy::all, clippy::pedantic, clippy::nursery, clippy::unwrap_used)]\n/' $$f && rm $$f.bak; done

 clean:
-	rm antlr/traceql*.rs || true
-	rm antlr/*.tokens || true
-	rm antlr/*.interp || true
-	rm -rf antlr/gen || true
+	rm -f antlr/traceql*.rs
+	rm -f antlr/*.tokens
+	rm -f antlr/*.interp
+	rm -rf antlr/gen
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/Makefile` around lines 3 - 16, Make the
Makefile targets robust by declaring the phony targets and using idiomatic rm
flags: add a .PHONY line listing install, gen, and clean, and replace the "rm
... || true" patterns in the clean target with proper rm flags (e.g., rm -f for
files and rm -rf for directories) so failures are surfaced and removals are
idempotent; update the clean rule operating on patterns like antlr/traceql*.rs,
antlr/*.tokens, antlr/*.interp, and antlr/gen accordingly and keep the existing
behavior for non-fatal removals.
crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs (1)
326-341: Inconsistent handling of malformed dictionary offsets — stats fallback is skipped.

When dictionary_page_offset() returns Some but u64::try_from(...) fails on the offset, the code increments no_dict_no_stats and continues the outer loop, skipping the stats fallback at lines 343–353. In contrast, when dict_end <= dict_start, control falls through to the stats fallback. This asymmetry is subtle: for a corrupted/negative offset we miss a viable enumeration via row-group statistics.

Negative offsets in a valid Parquet file are unlikely, but if you want consistency consider falling through to stats in all "dict path failed" branches:
♻️ Proposed restructure
-        // Try dictionary page first.
-        if let Some(dict_offset) = col_chunk.dictionary_page_offset() {
-            let data_offset = col_chunk.data_page_offset();
-            let Ok(dict_start) = u64::try_from(dict_offset) else {
-                no_dict_no_stats += 1;
-                continue;
-            };
-            let Ok(dict_end) = u64::try_from(data_offset) else {
-                no_dict_no_stats += 1;
-                continue;
-            };
-            if dict_end > dict_start {
-                ranges.push(dict_start..dict_end);
-                selected_rgs.push(rg_idx);
-                continue;
-            }
-        }
+        // Try dictionary page first; fall through to stats if any
+        // offset is malformed or the dict region is empty.
+        let dict_range = col_chunk.dictionary_page_offset().and_then(|d| {
+            let dict_start = u64::try_from(d).ok()?;
+            let dict_end = u64::try_from(col_chunk.data_page_offset()).ok()?;
+            (dict_end > dict_start).then_some(dict_start..dict_end)
+        });
+        if let Some(range) = dict_range {
+            ranges.push(range);
+            selected_rgs.push(rg_idx);
+            continue;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs` around lines
326 - 341, The code in parquet_reader.rs treats failures converting dictionary
offsets (calls to u64::try_from on dict_offset or data_offset) by incrementing
no_dict_no_stats and continuing, which skips the later row-group stats fallback;
instead, when conversion fails for dict_start or dict_end, do not
early-continue—increment no_dict_no_stats but allow control to fall through to
the existing stats-path (the same path taken when dict_end <= dict_start).
Update the block that handles col_chunk.dictionary_page_offset() so failures
converting to u64 for dict_start or dict_end do not execute continue; keep
updates to no_dict_no_stats and then let the function proceed to the stats
fallback (affecting symbols: col_chunk.dictionary_page_offset(), dict_start,
dict_end, no_dict_no_stats, ranges, selected_rgs, rg_idx).
crates/icegate-query/src/traceql/datafusion/planner_tests.rs (2)
196-210: Window lower bound is hard-coded to 2026-01-01.

make_query_ctx will silently filter out the entire fixture if these tests ever run on a system clock earlier than 2026 (e.g., during a date-shifted CI replay or a developer's misconfigured machine). Consider basing the window on Utc::now() - Duration::days(1) ..= Utc::now() + Duration::days(1) so the tests track wall-clock rather than depending on calendar drift.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs` around lines
196 - 210, The test QueryContext in make_query_ctx currently hard-codes
start/end to 2026/2099 which can drop fixtures on systems with different clocks;
update make_query_ctx to compute start as Utc::now() - Duration::days(1) and end
as Utc::now() + Duration::days(1) (using chrono::Duration) so the QueryContext
(tenant_id, start, end, limit, spans_per_spanset, etc.) uses a relative
wall-clock window instead of fixed calendar dates; ensure you import or qualify
Duration and keep other fields (spans_per_spanset, max_grid_points) unchanged.
270-353: Add regex-operator coverage.

The attribute-routing block covers = thoroughly (resource/span/Any-scope, indexed-column short-circuit, missing-key NULL semantics) but not =~/!~. Adding even one case like { name =~ "GET.*" } and one attribute regex like { resource.service.name =~ "fro.*" } would have caught the cast_to(empty DFSchema) issue flagged in selectors.rs::regex_match.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs` around lines
270 - 353, Add tests exercising the regex operators =~ and !~ to cover attribute
routing and catch the cast_to(empty DFSchema) bug in selectors.rs::regex_match:
create new async tests in planner_tests.rs similar to
resource_service_name_quoted_eq_returns_matching_rows that call
fixture_session(), make_query_ctx("t1") and run_and_count(ctx, qctx, "{ name =~
\"GET.*\" }") and "{ resource.service.name =~ \"fro.*\" }" (also consider
unquoted RHS like "{ name =~ GET.* }"), asserting the expected row counts;
ensure you include at least one attribute-scope, one resource-scope, and a
negative/!~ case so regex routing/NULL semantics are exercised and will trigger
the selector code path that previously hit cast_to(empty DFSchema).
crates/icegate-query/tests/tempo/tags.rs (1)
1-3: Optional: extend the module doc comment.

The doc lists /api/search/tags, /api/v2/search/tags, /api/search/tag/{name}/values, but the file also covers /api/v2/search/tag/{name}/values (the typed v2 value endpoint with inspectedBytes as string and keyword-typed status enum). Worth mentioning so a future reader doesn't think v2-values is untested.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/tests/tempo/tags.rs` around lines 1 - 3, Update the
module doc comment at the top of the tags.rs test module to also mention the
`/api/v2/search/tag/{name}/values` endpoint; explicitly note that the file
covers the typed v2 values endpoint (with `inspectedBytes` represented as a
string and the `status` enum using keyword-typed variants) so readers know
v2-values behavior is tested alongside `/api/search/tags`,
`/api/v2/search/tags`, and `/api/search/tag/{name}/values`.
crates/icegate-query/src/traceql/antlr/visitors.rs (1)
419-426: Silent truncation if grammar ever yields >2 operands.

pop_two_filters only consumes filters[0] and filters[1]. ANTLR binary rules typically emit exactly 2 children, so this is safe today, but if the grammar evolves to a left-recursive list (e.g. filter ('AND' filter)+), additional operands would be silently dropped. Consider assert_eq!(filters.len(), 2) in debug builds, or fold over the slice. Not blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/antlr/visitors.rs` around lines 419 - 426,
pop_two_filters currently only reads filters[0] and filters[1], silently
dropping any extra operands if the grammar ever produces >2 children; add a
debug-only check to catch this regression by inserting something like
debug_assert_eq!(filters.len(), 2) near the start of pop_two_filters (after the
existing len < 2 error check), and keep using visit_span_filter for filters[0]
and filters[1]; alternatively, if you want to support n-ary operands, change the
logic to visit and fold the entire filters slice into a left-associative
combined SpanFilter instead of indexing only [0] and [1].
crates/icegate-query/src/engine/metadata_scan/values.rs (1)
105-118: Optional: apply the system-reserved guard symmetrically.

collect_indexed_values_via_dict (line 76) gates on is_system_reserved_value_column, but collect_indexed_int_values_via_dict doesn't. In practice the only reserved column is tenant_id which is a string, so the int path can't currently leak — but the asymmetry weakens the "metadata-scan layer is the last line of defense" property in the doc on lines 30-33. Cheap to add for consistency.
♻️ Proposed change
 pub async fn collect_indexed_int_values_via_dict(
     reader: &mut ArrowFileReader,
     metadata: &ParquetMetaData,
     predicate: &Predicate,
     column_name: &str,
     out: &mut BTreeSet<i32>,
 ) -> Result<(), MetadataScanError> {
+    if is_system_reserved_value_column(column_name) {
+        return Ok(());
+    }
     let schema = metadata.file_metadata().schema_descr();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/engine/metadata_scan/values.rs` around lines 105 -
118, collect_indexed_int_values_via_dict is missing the same system-reserved
column guard used by collect_indexed_values_via_dict; add a call to
is_system_reserved_value_column(column_name) at the start of
collect_indexed_int_values_via_dict and return Ok(()) when it returns true so
reserved columns are skipped consistently, then proceed to find leaf_idx and
call parquet_reader::read_column_int_dictionaries as before (keep function
signature and reader/metadata/predicate/out usage unchanged).
crates/icegate-query/src/engine/metadata_scan/mod.rs (1)
283-293: LGTM — small operational note on plan_files.

stream.try_collect() materializes the full Vec<FileScanTask> before any per-file work begins. Combined with MAX_QUERY_WINDOW = 30 days from tempo::validation, a worst-case tenant could plan tens-of-thousands of tasks before the first scan_labels_for_file runs. For today's deployments this is fine; if metadata scans ever start hitting memory ceilings, switching to a streamed try_buffered_unordered-style pipeline (no intermediate Vec) is the natural next step.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/engine/metadata_scan/mod.rs` around lines 283 - 293,
The current plan_files function collects the entire stream into a Vec via
stream.try_collect(), which can materialize tens-of-thousands of FileScanTask
items and spike memory; change plan_files to return a streaming/iterator-based
pipeline instead of a Vec by processing the stream from scan.plan_files() as it
yields items (e.g., replace try_collect() usage with a streamed approach using
try_buffered_unordered or similar combinators) so callers can handle
FileScanTask items incrementally; locate the function plan_files (takes &Table
and &Predicate), the call to table.scan().with_filter(...).build() and the
variable stream = scan.plan_files(), and modify the downstream usage to avoid
creating an intermediate Vec<FileScanTask>.
crates/icegate-query/src/tempo/validation.rs (1)
169-173: Redundant trimmed.eq_ignore_ascii_case("tenant_id") clause.

last_segment is computed as trimmed.rsplit('.').next().unwrap_or(trimmed). When trimmed == "tenant_id" (no dots), rsplit('.').next() returns the entire string "tenant_id", so last_segment.eq_ignore_ascii_case("tenant_id") already matches that case. The trailing || trimmed.eq_ignore_ascii_case("tenant_id") clause never adds new coverage — safe to drop.
♻️ Proposed simplification
-    let trimmed = name.trim_start_matches('.');
-    let last_segment = trimmed.rsplit('.').next().unwrap_or(trimmed);
-    if last_segment.eq_ignore_ascii_case("tenant_id") || trimmed.eq_ignore_ascii_case("tenant_id") {
+    let trimmed = name.trim_start_matches('.');
+    let last_segment = trimmed.rsplit('.').next().unwrap_or(trimmed);
+    if last_segment.eq_ignore_ascii_case("tenant_id") {
         return Err(QueryError::Validation("tag name 'tenant_id' is reserved".into()));
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/tempo/validation.rs` around lines 169 - 173, The
check redundantly tests trimmed.eq_ignore_ascii_case("tenant_id") after
computing last_segment from trimmed; remove the redundant clause so the
reserved-name validation only checks
last_segment.eq_ignore_ascii_case("tenant_id"), i.e., update the condition that
returns QueryError::Validation("tag name 'tenant_id' is reserved") to rely
solely on last_segment (remove the || trimmed.eq_ignore_ascii_case("tenant_id")
part) in the function that performs Tempo tag validation (uses trimmed and
last_segment).
crates/icegate-query/src/tempo/formatters.rs (2)
172-175: Inconsistent overflow handling for timestamp → nanos conversion.

These three call sites compute nanos via raw * 1_000 multiplication on i64:

Line 172: (start_micros * 1_000).to_string()

Line 174: (e * 1_000).to_string()

Line 519: (start_micros * 1_000).to_string()

But the matched-span builder (lines 562-563) uses saturating_mul, and the proto path (line 779) goes through micros_to_nanos_u64 which at least clamps negatives. With realistic timestamps this never triggers, but a pathological/test fixture value would panic in debug builds. Worth normalizing on a single helper.

♻️ Suggested approach

Add a small micros_to_nanos_str(i64) -> String helper that uses saturating_mul and route all three sites (plus the matched-span builder) through it. The proto-side micros_to_nanos_u64 could similarly use saturating_mul on the final * 1_000 to harden against i64::MAX micros wrapping u64.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/tempo/formatters.rs` around lines 172 - 175,
Introduce a small helper micros_to_nanos_str(i64) -> String that converts
microseconds to nanoseconds using saturating_mul(1_000) and returns the
stringified result, then replace the direct `(start_micros * 1_000).to_string()`
and `(e * 1_000).to_string()` call sites in the span insertion code and the
matched-span builder with this helper; also harden the proto-side conversion
function micros_to_nanos_u64 to use saturating_mul for the final
multiplication/path that converts i64 micros to u64 nanos so all conversion
sites share the same overflow-safe behavior.
235-247: Inconsistent error handling in *_opt column helpers: string_col_opt and i32_col_opt panic on type mismatch, while ts_col_opt and list_col_opt gracefully return None.

string_col_opt (line 235) calls as_string::<i32>() and i32_col_opt (line 242) calls as_primitive::<Int32Type>(), both of which panic if the column exists but has the wrong Arrow type. By contrast, ts_col_opt (line 256) and list_col_opt (line 277) use as_any().downcast_ref() and return None on type mismatch.

For consistency across the *_opt family, refactor string_col_opt and i32_col_opt to use as_any().downcast_ref::<StringArray>() and as_any().downcast_ref::<Int32Array>() respectively, mirroring the graceful behavior of their peers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/tempo/formatters.rs` around lines 235 - 247, The
helpers string_col_opt and i32_col_opt currently call as_string::<i32>() and
as_primitive::<Int32Type>() which panic on type mismatch; change both to the
safe downcast pattern used by ts_col_opt/list_col_opt: after finding the column
index via batch.schema().column_with_name(name) call
batch.column(idx).as_any().downcast_ref::<StringArray>() for string_col_opt and
.downcast_ref::<Int32Array>() for i32_col_opt, and return that Option so a wrong
Arrow type yields None instead of a panic.
crates/icegate-query/src/tempo/models.rs (1)
88-92: #[allow(dead_code)] here is likely unnecessary.

max_stale_values is read by Serde during deserialization, so the compiler should not flag it as dead code, even with the repo-wide dead_code = "deny" lint. Removing the suppression keeps the deny lint honest — and if the compiler ever does start flagging the field, that's a real signal something has changed. If you want to make the "intentionally ignored" intent explicit, prefer #[serde(rename = "maxStaleValues")] _max_stale_values: ... or drop the field altogether and use #[serde(deny_unknown_fields = false)] semantics.

As per coding guidelines: "Configure dead_code = "deny" lint to disallow unused code in Cargo.toml".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/tempo/models.rs` around lines 88 - 92, Remove the
unnecessary #[allow(dead_code)] on the struct field max_stale_values: this field
is deserialized by Serde so the dead_code suppression should be dropped to
respect the crate-wide deny lint; locate the field definition named
max_stale_values (with #[serde(rename = "maxStaleValues", default)]) and delete
the #[allow(dead_code)] attribute, or alternatively rename the field to a
private unused-prefixed name like _max_stale_values if you intend to signal it
is intentionally ignored, or remove the field entirely if not needed.
crates/icegate-query/src/tempo/metadata.rs (1)
168-230: Prefer Option::is_none_or over map_or(true, …).

The five scope_filter.map_or(true, |s| s == Scope::X) checks are clearer (and silence clippy::unnecessary_map_or under pedantic) as scope_filter.is_none_or(|s| s == Scope::X).
♻️ Proposed change
-    if scope_filter.map_or(true, |s| s == Scope::Resource) {
+    if scope_filter.is_none_or(|s| s == Scope::Resource) {
(Apply to all five map_or(true, …) sites in list_tags_v2.)
As per coding guidelines: "Enable clippy pedantic and nursery lints in Cargo.toml for strict code quality checks".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/tempo/metadata.rs` around lines 168 - 230, Replace
the five occurrences of scope_filter.map_or(true, |s| s == Scope::X) in
list_tags_v2 with the clearer Option::is_none_or form: call
scope_filter.is_none_or(|s| s == Scope::Resource/Span/Intrinsic/Event/Link)
respectively (affects the checks around Scope::Resource, Scope::Span,
Scope::Intrinsic, Scope::Event, Scope::Link in metadata.rs); ensure you import
or use the Option::is_none_or method available on Option and preserve existing
logic and error handling in the surrounding code (e.g., the blocks that
construct ScopeGroup and call metadata_scan::scan_labels).
crates/icegate-query/src/traceql/spanset.rs (1)
62-91: Take self by value for Copy enum methods.

SpansetOp derives Copy, so &self on as_str and is_hierarchy may trip clippy::trivially_copy_pass_by_ref under pedantic lints. Note Scope::as_str in tempo/models.rs already uses self by value — making this consistent also matches pedantic guidance.
♻️ Proposed change
-    pub const fn as_str(&self) -> &'static str {
+    pub const fn as_str(self) -> &'static str {
         match self {
             Self::And => "&&",
@@
-    pub const fn is_hierarchy(&self) -> bool {
+    pub const fn is_hierarchy(self) -> bool {
         !matches!(self, Self::And | Self::Or)
     }
As per coding guidelines: "Enable clippy pedantic and nursery lints in Cargo.toml for strict code quality checks".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/spanset.rs` around lines 62 - 91, The
methods as_str and is_hierarchy take &self but SpansetOp is a Copy enum, so
change their signatures to take self by value (pub const fn as_str(self) ->
&'static str and pub const fn is_hierarchy(self) -> bool) to satisfy
clippy::trivially_copy_pass_by_ref; update any internal pattern matches to match
on self and keep the existing match arms and logic unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/kustomize/base/values-grafana.yaml`:
- Around line 87-97: Remove the redundant flags filterByTraceID and
filterBySpanID under the tracesToLogsV2 block since customQuery: true makes them
no-ops; update the tracesToLogsV2 section (the keys customQuery, query,
spanStartTimeShift, spanEndTimeShift, datasourceUid) by deleting filterByTraceID
and filterBySpanID so the configuration relies solely on the provided custom
query '{trace_id="$${__span.traceId}"}'.

In `@crates/icegate-ingest/src/transform.rs`:
- Around line 526-538: The code currently appends scope and span attributes
directly to span_attrs_builder using add_flattened_attributes_dotted
(referencing scope_spans.scope and span.attributes), which can produce duplicate
keys and relies on reader-specific last-write-wins semantics; change this to
dedupe per-span by building a small HashMap<String,String> (insert scope
attributes first, then span attributes so span wins) for each span, then iterate
that map to feed span_attrs_builder (and keep the existing append calls to
resource_attrs_builder/span_attrs_builder). Update
add_flattened_attributes_dotted usage accordingly or replace it with a helper
that merges into the per-span HashMap before writing to span_attrs_builder to
guarantee one entry per key and preserve the documented "span shadows scope"
intent.

In `@crates/icegate-query/src/loki/mod.rs`:
- Around line 21-36: The doc comment on LOGS_METADATA_CONFIG is misleading: it
says `/label_values` can resolve excluded_map_keys like `trace_id`/`span_id`
"explicitly via the alias logic", but those keys are actually resolved via the
indexed_columns set used in LOGS_VALUES_METADATA_CONFIG
(LOG_INDEXED_ATTRIBUTE_COLUMNS), not via label_aliases; update the comment to
state that label_aliases only contain the three mappings (level→severity_text,
detected_level→severity_text, service→service_name) and clarify that
`/label_values` resolution for trace_id/span_id is handled by the
indexed_columns configuration referenced in LOGS_VALUES_METADATA_CONFIG rather
than by label_aliases or excluded_map_keys.

In `@crates/icegate-query/src/tempo/executor.rs`:
- Around line 123-142: The parsed TraceQL expression (expr from
AntlrParser::parse) can be a metrics-mode query which DataFusionPlanner::plan
will accept but which spansets_to_search_response cannot format; reject
non-search expressions immediately after parsing and before planning by matching
on TraceQLExpr (allow TraceQLExpr::Spanset and PipelineExpr::Search variants
only) and return a TempoError::new(QueryError::Validation(...)) with a clear
message like "metrics-mode queries are not supported on /api/search; use the
metrics endpoint" so metric RecordBatches never reach DataFusionPlanner::plan or
spansets_to_search_response.
- Around line 62-74: The eager duration validation is dead because
params.min_duration/max_duration are checked for NotImplemented before parsing;
move the validation so you call
params.min_duration.as_deref().and_then(parse_duration_opt) and
params.max_duration.as_deref().and_then(parse_duration_opt) before the
TempoError::new(QueryError::NotImplemented(...)) short-circuit, and if
parse_duration_opt indicates a malformed value return a BadRequest-style
TempoError (400) with a clear message about invalid duration format instead of
proceeding to the 501 path (ensure you reference params.min_duration,
params.max_duration, parse_duration_opt, and the NotImplemented short-circuit
when making the change).

In `@crates/icegate-query/src/tempo/handlers.rs`:
- Around line 52-57: The helper parse_epoch_seconds currently returns
DateTime::UNIX_EPOCH on invalid/ambiguous inputs which lets malformed start/end
values silently become 1970; change parse_epoch_seconds to return a
TempoResult<DateTime<Utc>> (or Result<DateTime<Utc>, TempoError>) that errors on
out-of-range or non-single timestamp (use
Utc.timestamp_opt(...).single().ok_or(...)), update get_trace to call the new
fallible parse and propagate the error to produce an HTTP 400 (similar to
parse_q_to_predicate handling), and ensure validate_query_window only runs after
successful parsing so bad epoch inputs surface as client errors instead of being
clamped.

In `@crates/icegate-query/src/tempo/mod.rs`:
- Around line 6-15: The new pub mod declarations expose many undocumented public
items and violate missing_docs; either add docs to each public symbol or make
the modules private again—specifically either (A) add /// doc comments for all
public types/functions in error.rs (e.g., TempoError), formatters.rs (e.g.,
spansets_to_search_response), models.rs (all ~24 public items), trace_by_id.rs
(e.g., FetchResult, default_window), and confirm executor.rs's execute remains
documented, or (B) revert the module declarations back to private (change pub
mod error/executor/formatters/models/trace_by_id to mod ...) so their public
surfaces are not exported from the crate.

In `@crates/icegate-query/src/traceql/antlr/mod.rs`:
- Line 83: The tracing span currently declares an empty field because of
fields(query); update the instrument attribute on the function (the
#[tracing::instrument(...)] above the method in mod.rs) to capture the parameter
value, e.g. change fields(query) to fields(query = %query) to record the query
string via Display (or use fields(query = ?query) for Debug), keeping skip(self)
as-is.

In `@crates/icegate-query/src/traceql/antlr/TraceQLLexer.g4`:
- Around line 121-122: Lexer rules FLOAT and INT currently include an optional
leading '-' which causes the lexer to consume minus signs (e.g., "5-3" becomes
INT(5) INT(-3)) and will break arithmetic parsing; remove the optional '-' from
the FLOAT and INT lexer rules so minus becomes a separate MINUS token that the
parser can handle (i.e., change FLOAT to DIGIT+ '.' DIGIT+ and INT to DIGIT+),
and ensure the grammar uses the existing MINUS terminal for unary negation where
needed or document the intended behavior if you intentionally want signed
numeric tokens.
- Line 19: The lexer rule NEQ_RE ('!~') currently shadows NOT_SIBLING because
ANTLR picks the first matching token, so update the grammar (TraceQLLexer.g4 /
parser spansetOp) to consistently accept the '!~' operator in spanset contexts
by either renaming the lexeme used for NOT_SIBLING (e.g., to a distinct symbol)
or by adding NEQ_RE into the parser's spansetOp alternatives and disambiguating
at parser level; remove the string-based fallback in visitors.rs that checks
ctx.get_text() and instead rely on the proper token from the parser; finally,
add a parser/integration test exercising spanset not-sibling (for example "{
kind = server } !~ { kind = client }") to prevent regressions.

In `@crates/icegate-query/src/traceql/antlr/TraceQLParser.g4`:
- Around line 57-66: The current spansetExpr rule treats all operators equally;
split it into precedence-tiered rules so relationship operators bind tighter
than boolean ones: introduce (for example) spansetExpr -> spansetOr ->
spansetAnd -> spansetRel -> spansetPrimary, move boolean tokens (AND, OR) into
the lower-precedence rules (spansetOr/spansetAnd) and move relationship tokens
(DESC, GT, ANC, LT, SIBLING and their NOT_* variants) into a new spansetRelOp
used by spansetRel; update/remove the combined spansetOp rule and ensure
spansetParen and spanSelector remain as the primary/leaf cases (mirroring the
spanFilter precedence pattern).

In `@crates/icegate-query/src/traceql/datafusion/metrics.rs`:
- Around line 112-120: time_bucket_expr currently computes step_micros = step_ns
/ 1_000 which can be zero for sub-microsecond steps and produce a
division-by-zero expression; change time_bucket_expr to validate or clamp the
computed step_micros to a minimum of 1 (or return a clear Err) before
constructing the bucket expression (referencing function time_bucket_expr,
COL_TIMESTAMP and BUCKET_ALIAS), and add a runtime assertion or schema check
that the incoming timestamp column is Timestamp(Microsecond, _) so the
cast_to(&DataType::Int64) produces correct bucket boundaries; ensure the error
message clearly references the invalid step value if you choose to return an
error.

In `@crates/icegate-query/src/traceql/datafusion/pipeline.rs`:
- Around line 135-138: In field_arg_expr, replace the
not_implemented("aggregation requires argument") error with the crate's
validation/invalid-input error variant so missing aggregation args return a 400
client error; specifically, change the Err returned in field_arg_expr (the
function handling Option<&FieldRef>) to something like
QueryError::Validation(...) or the appropriate Validation/InvalidInput
constructor used in this crate, preserving the existing message text
("aggregation requires argument") so the HTTP layer maps it to a client error
instead of 501.
- Around line 84-94: The grouping function group_key_for_attribute currently
returns lit(NULL) for Scope::Event and Scope::Link which collapses all rows;
change group_key_for_attribute to return Result<Expr, QueryError> and on
Scope::Event or Scope::Link return Err(QueryError::NotImplemented("grouping by
event/link attributes not implemented")); update all callers (notably
field_ref_to_group_key and field_arg_expr) to propagate the Result (use ? or
map/and_then) so plan-time errors are surfaced instead of emitting a NULL
literal, and adjust any downstream code to handle the propagated QueryError.

In `@crates/icegate-query/src/traceql/datafusion/selectors.rs`:
- Around line 222-227: The match arm handling
(FieldRef::Intrinsic(IntrinsicField::Duration), LiteralValue::Duration(ns))
currently does integer-truncate with *ns / 1_000 and silently turns
sub-microsecond thresholds to 0; change it to preserve intent by rounding up
instead (e.g., compute micro = (ns + 999) / 1_000) or alternatively return a
parse/validation error for sub-microsecond literals — update the arm in
selectors.rs that matches FieldRef::Intrinsic(IntrinsicField::Duration) /
LiteralValue::Duration to use the chosen behavior and adjust the surrounding
comment to explain rounding-up or the rejection policy.
- Around line 255-266: The regex_match helper currently casts both operands
using df_dummy_schema which triggers Expr::get_type against an empty DFSchema
and breaks plan-time column refs; remove the unnecessary cast_to calls in
regex_match (stop using df_dummy_schema), instead pass the raw lhs and rhs into
regexp_like().call(vec![lhs, rhs]) and keep the existing negation handling
(m.not() if negated), and delete or stop referencing df_dummy_schema; also add
planner-level integration tests exercising expressions like `{ name =~ "GET.*"
}` and `{ resource.service.name =~ ".*end" }` to ensure end-to-end
planning/coercion works.

In `@crates/icegate-query/src/traceql/iceberg_predicate.rs`:
- Around line 290-307: The catch-all string arm in datum_for lets string
literals fall through onto INT columns (Status/Kind); add an explicit match arm
for (FieldRef::Intrinsic(IntrinsicField::Status) |
FieldRef::Intrinsic(IntrinsicField::Kind), LiteralValue::String(_)) that returns
None (i.e., drops the predicate) so string RHS against those INT enum columns
does not produce a string Datum; update datum_for accordingly and add the
suggested regression test (e.g., using translate(...) and asserting
Predicate::AlwaysTrue) to lock the behavior.

In `@crates/icegate-query/src/traceql/metric.rs`:
- Around line 91-114: AggregationOp::Quantile currently maps to
"quantile_over_time" in as_str() and the docs expect two args (a field and
percentile), but PipelineStage::Aggregate / AggregateFilter only provide arg:
Option<FieldRef> so the percentile is lost; fix by either removing the Quantile
variant from AggregationOp (and from as_str()) if you want to defer support, or
extend the aggregate AST to carry a second optional literal argument (e.g.,
change AggregateFilter / the Aggregate payload to include arg: Option<FieldRef>
plus percentile: Option<Literal> or similar) and update
AggregationOp::requires_arg/as_str handling and any planner/visitor code that
constructs/reads Aggregate to consume the new percentile field.

---

Nitpick comments:
In `@crates/icegate-query/src/engine/metadata_scan/mod.rs`:
- Around line 283-293: The current plan_files function collects the entire
stream into a Vec via stream.try_collect(), which can materialize
tens-of-thousands of FileScanTask items and spike memory; change plan_files to
return a streaming/iterator-based pipeline instead of a Vec by processing the
stream from scan.plan_files() as it yields items (e.g., replace try_collect()
usage with a streamed approach using try_buffered_unordered or similar
combinators) so callers can handle FileScanTask items incrementally; locate the
function plan_files (takes &Table and &Predicate), the call to
table.scan().with_filter(...).build() and the variable stream =
scan.plan_files(), and modify the downstream usage to avoid creating an
intermediate Vec<FileScanTask>.

In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 326-341: The code in parquet_reader.rs treats failures converting
dictionary offsets (calls to u64::try_from on dict_offset or data_offset) by
incrementing no_dict_no_stats and continuing, which skips the later row-group
stats fallback; instead, when conversion fails for dict_start or dict_end, do
not early-continue—increment no_dict_no_stats but allow control to fall through
to the existing stats-path (the same path taken when dict_end <= dict_start).
Update the block that handles col_chunk.dictionary_page_offset() so failures
converting to u64 for dict_start or dict_end do not execute continue; keep
updates to no_dict_no_stats and then let the function proceed to the stats
fallback (affecting symbols: col_chunk.dictionary_page_offset(), dict_start,
dict_end, no_dict_no_stats, ranges, selected_rgs, rg_idx).

In `@crates/icegate-query/src/engine/metadata_scan/values.rs`:
- Around line 105-118: collect_indexed_int_values_via_dict is missing the same
system-reserved column guard used by collect_indexed_values_via_dict; add a call
to is_system_reserved_value_column(column_name) at the start of
collect_indexed_int_values_via_dict and return Ok(()) when it returns true so
reserved columns are skipped consistently, then proceed to find leaf_idx and
call parquet_reader::read_column_int_dictionaries as before (keep function
signature and reader/metadata/predicate/out usage unchanged).

In `@crates/icegate-query/src/loki/mod.rs`:
- Around line 43-48: LOGS_VALUES_METADATA_CONFIG duplicates parts of
LOGS_METADATA_CONFIG (it inlines excluded_map_keys: &[] and re-states
map_column), which can drift; change LOGS_VALUES_METADATA_CONFIG to build from
LOGS_METADATA_CONFIG by copying its label_aliases and map_column (or
cloning/using a struct update pattern) and only override indexed_columns and
excluded_map_keys to &[] so future updates to LOGS_METADATA_CONFIG.label_aliases
or map_column are reflected automatically; specifically adjust the
MetadataScanConfig construction for LOGS_VALUES_METADATA_CONFIG to derive fields
from LOGS_METADATA_CONFIG instead of repeating them.

In `@crates/icegate-query/src/tempo/formatters.rs`:
- Around line 172-175: Introduce a small helper micros_to_nanos_str(i64) ->
String that converts microseconds to nanoseconds using saturating_mul(1_000) and
returns the stringified result, then replace the direct `(start_micros *
1_000).to_string()` and `(e * 1_000).to_string()` call sites in the span
insertion code and the matched-span builder with this helper; also harden the
proto-side conversion function micros_to_nanos_u64 to use saturating_mul for the
final multiplication/path that converts i64 micros to u64 nanos so all
conversion sites share the same overflow-safe behavior.
- Around line 235-247: The helpers string_col_opt and i32_col_opt currently call
as_string::<i32>() and as_primitive::<Int32Type>() which panic on type mismatch;
change both to the safe downcast pattern used by ts_col_opt/list_col_opt: after
finding the column index via batch.schema().column_with_name(name) call
batch.column(idx).as_any().downcast_ref::<StringArray>() for string_col_opt and
.downcast_ref::<Int32Array>() for i32_col_opt, and return that Option so a wrong
Arrow type yields None instead of a panic.

In `@crates/icegate-query/src/tempo/metadata.rs`:
- Around line 168-230: Replace the five occurrences of scope_filter.map_or(true,
|s| s == Scope::X) in list_tags_v2 with the clearer Option::is_none_or form:
call scope_filter.is_none_or(|s| s == Scope::Resource/Span/Intrinsic/Event/Link)
respectively (affects the checks around Scope::Resource, Scope::Span,
Scope::Intrinsic, Scope::Event, Scope::Link in metadata.rs); ensure you import
or use the Option::is_none_or method available on Option and preserve existing
logic and error handling in the surrounding code (e.g., the blocks that
construct ScopeGroup and call metadata_scan::scan_labels).

In `@crates/icegate-query/src/tempo/models.rs`:
- Around line 88-92: Remove the unnecessary #[allow(dead_code)] on the struct
field max_stale_values: this field is deserialized by Serde so the dead_code
suppression should be dropped to respect the crate-wide deny lint; locate the
field definition named max_stale_values (with #[serde(rename = "maxStaleValues",
default)]) and delete the #[allow(dead_code)] attribute, or alternatively rename
the field to a private unused-prefixed name like _max_stale_values if you intend
to signal it is intentionally ignored, or remove the field entirely if not
needed.

In `@crates/icegate-query/src/tempo/validation.rs`:
- Around line 169-173: The check redundantly tests
trimmed.eq_ignore_ascii_case("tenant_id") after computing last_segment from
trimmed; remove the redundant clause so the reserved-name validation only checks
last_segment.eq_ignore_ascii_case("tenant_id"), i.e., update the condition that
returns QueryError::Validation("tag name 'tenant_id' is reserved") to rely
solely on last_segment (remove the || trimmed.eq_ignore_ascii_case("tenant_id")
part) in the function that performs Tempo tag validation (uses trimmed and
last_segment).

In `@crates/icegate-query/src/traceql/antlr/visitors.rs`:
- Around line 419-426: pop_two_filters currently only reads filters[0] and
filters[1], silently dropping any extra operands if the grammar ever produces >2
children; add a debug-only check to catch this regression by inserting something
like debug_assert_eq!(filters.len(), 2) near the start of pop_two_filters (after
the existing len < 2 error check), and keep using visit_span_filter for
filters[0] and filters[1]; alternatively, if you want to support n-ary operands,
change the logic to visit and fold the entire filters slice into a
left-associative combined SpanFilter instead of indexing only [0] and [1].

In `@crates/icegate-query/src/traceql/common.rs`:
- Around line 255-259: The test kind_otlp_code_matches_otlp_spec currently only
checks Server and Client; extend it to assert all KindValue variants map to
their OTLP integer codes: assert KindValue::Unspecified.otlp_code() == 0,
KindValue::Internal.otlp_code() == 1, KindValue::Server.otlp_code() == 2,
KindValue::Client.otlp_code() == 3, KindValue::Producer.otlp_code() == 4, and
KindValue::Consumer.otlp_code() == 5 so any off-by-one or missing mapping is
caught (update the test function kind_otlp_code_matches_otlp_spec to include
these four additional assertions).
- Around line 166-220: StatusValue::otlp_code and KindValue::otlp_code manually
duplicate numeric mappings; add #[repr(i32)] to the StatusValue and KindValue
enum definitions and replace the match bodies in their otlp_code methods with a
simple cast (return *self as i32) so the numeric encoding is derived from the
enum layout; update the otlp_code functions (still const fn) to perform the cast
and remove the match arms for StatusValue::otlp_code and KindValue::otlp_code.

In `@crates/icegate-query/src/traceql/datafusion/mod.rs`:
- Around line 11-16: The current pub(crate) re-exports add redundant prefixes
(pipeline_ and selectors_) that duplicate module path info; update the
re-exports in mod.rs to use the original symbol names instead of aliased ones so
callers can rely on the module path for disambiguation — e.g., stop exporting
pipeline::field_ref_to_group_key as pipeline_field_ref_to_group_key and
selectors::intrinsic_column/resource_attribute_lhs/span_attribute_lhs as
selectors_intrinsic_column/selectors_resource_attribute_lhs/selectors_span_attribute_lhs,
and instead re-export them directly (keep DataFusionPlanner export as-is).

In `@crates/icegate-query/src/traceql/datafusion/planner_tests.rs`:
- Around line 196-210: The test QueryContext in make_query_ctx currently
hard-codes start/end to 2026/2099 which can drop fixtures on systems with
different clocks; update make_query_ctx to compute start as Utc::now() -
Duration::days(1) and end as Utc::now() + Duration::days(1) (using
chrono::Duration) so the QueryContext (tenant_id, start, end, limit,
spans_per_spanset, etc.) uses a relative wall-clock window instead of fixed
calendar dates; ensure you import or qualify Duration and keep other fields
(spans_per_spanset, max_grid_points) unchanged.
- Around line 270-353: Add tests exercising the regex operators =~ and !~ to
cover attribute routing and catch the cast_to(empty DFSchema) bug in
selectors.rs::regex_match: create new async tests in planner_tests.rs similar to
resource_service_name_quoted_eq_returns_matching_rows that call
fixture_session(), make_query_ctx("t1") and run_and_count(ctx, qctx, "{ name =~
\"GET.*\" }") and "{ resource.service.name =~ \"fro.*\" }" (also consider
unquoted RHS like "{ name =~ GET.* }"), asserting the expected row counts;
ensure you include at least one attribute-scope, one resource-scope, and a
negative/!~ case so regex routing/NULL semantics are exercised and will trigger
the selector code path that previously hit cast_to(empty DFSchema).

In `@crates/icegate-query/src/traceql/Makefile`:
- Around line 3-16: Make the Makefile targets robust by declaring the phony
targets and using idiomatic rm flags: add a .PHONY line listing install, gen,
and clean, and replace the "rm ... || true" patterns in the clean target with
proper rm flags (e.g., rm -f for files and rm -rf for directories) so failures
are surfaced and removals are idempotent; update the clean rule operating on
patterns like antlr/traceql*.rs, antlr/*.tokens, antlr/*.interp, and antlr/gen
accordingly and keep the existing behavior for non-fatal removals.

In `@crates/icegate-query/src/traceql/spanset.rs`:
- Around line 62-91: The methods as_str and is_hierarchy take &self but
SpansetOp is a Copy enum, so change their signatures to take self by value (pub
const fn as_str(self) -> &'static str and pub const fn is_hierarchy(self) ->
bool) to satisfy clippy::trivially_copy_pass_by_ref; update any internal pattern
matches to match on self and keep the existing match arms and logic unchanged.

In `@crates/icegate-query/tests/tempo/tags.rs`:
- Around line 1-3: Update the module doc comment at the top of the tags.rs test
module to also mention the `/api/v2/search/tag/{name}/values` endpoint;
explicitly note that the file covers the typed v2 values endpoint (with
`inspectedBytes` represented as a string and the `status` enum using
keyword-typed variants) so readers know v2-values behavior is tested alongside
`/api/search/tags`, `/api/v2/search/tags`, and `/api/search/tag/{name}/values`.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f6e91aa1-8181-4a8d-802f-a2923fcd90df

📥 Commits

Reviewing files that changed from the base of the PR and between b63b818 and fc2c1fe.

⛔ Files ignored due to path filters (4)

Cargo.lock is excluded by !**/*.lock
crates/icegate-query/src/traceql/antlr/gen/TraceQLLexer.interp is excluded by !**/gen/**
crates/icegate-query/src/traceql/antlr/gen/TraceQLLexer.java is excluded by !**/gen/**
crates/icegate-query/src/traceql/antlr/gen/TraceQLLexer.tokens is excluded by !**/gen/**

📒 Files selected for processing (82)

Cargo.toml
config/docker/grafana/provisioning/datasources/datasources.yaml
config/docker/otel-collector/config.yaml
config/kustomize/base/otel-collector/configmap.yaml
config/kustomize/base/values-grafana.yaml
config/kustomize/overlays/skaffold/values-icegate.yaml
crates/icegate-common/Cargo.toml
crates/icegate-common/src/schema.rs
crates/icegate-common/src/tracing.rs
crates/icegate-ingest/src/cli/commands/run.rs
crates/icegate-ingest/src/shift/iceberg_storage.rs
crates/icegate-ingest/src/shift/mod.rs
crates/icegate-ingest/src/transform.rs
crates/icegate-query/Cargo.toml
crates/icegate-query/src/engine/log_metadata_scan/labels.rs
crates/icegate-query/src/engine/log_metadata_scan/mod.rs
crates/icegate-query/src/engine/log_metadata_scan/predicate.rs
crates/icegate-query/src/engine/log_metadata_scan/values.rs
crates/icegate-query/src/engine/metadata_scan/error.rs
crates/icegate-query/src/engine/metadata_scan/labels.rs
crates/icegate-query/src/engine/metadata_scan/mod.rs
crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs
crates/icegate-query/src/engine/metadata_scan/predicate.rs
crates/icegate-query/src/engine/metadata_scan/values.rs
crates/icegate-query/src/engine/mod.rs
crates/icegate-query/src/engine/provider/mod.rs
crates/icegate-query/src/engine/provider/schema.rs
crates/icegate-query/src/engine/provider/table.rs
crates/icegate-query/src/lib.rs
crates/icegate-query/src/loki/executor.rs
crates/icegate-query/src/loki/mod.rs
crates/icegate-query/src/loki/predicate.rs
crates/icegate-query/src/tempo/error.rs
crates/icegate-query/src/tempo/executor.rs
crates/icegate-query/src/tempo/formatters.rs
crates/icegate-query/src/tempo/handlers.rs
crates/icegate-query/src/tempo/metadata.rs
crates/icegate-query/src/tempo/mod.rs
crates/icegate-query/src/tempo/models.rs
crates/icegate-query/src/tempo/routes.rs
crates/icegate-query/src/tempo/server.rs
crates/icegate-query/src/tempo/trace_by_id.rs
crates/icegate-query/src/tempo/validation.rs
crates/icegate-query/src/traceql/Makefile
crates/icegate-query/src/traceql/antlr/TraceQLLexer.g4
crates/icegate-query/src/traceql/antlr/TraceQLLexer.interp
crates/icegate-query/src/traceql/antlr/TraceQLLexer.tokens
crates/icegate-query/src/traceql/antlr/TraceQLParser.g4
crates/icegate-query/src/traceql/antlr/TraceQLParser.interp
crates/icegate-query/src/traceql/antlr/TraceQLParser.tokens
crates/icegate-query/src/traceql/antlr/mod.rs
crates/icegate-query/src/traceql/antlr/tests.rs
crates/icegate-query/src/traceql/antlr/traceqllexer.rs
crates/icegate-query/src/traceql/antlr/traceqlparser.rs
crates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rs
crates/icegate-query/src/traceql/antlr/traceqlparserbasevisitor.rs
crates/icegate-query/src/traceql/antlr/traceqlparserlistener.rs
crates/icegate-query/src/traceql/antlr/traceqlparservisitor.rs
crates/icegate-query/src/traceql/antlr/visitors.rs
crates/icegate-query/src/traceql/common.rs
crates/icegate-query/src/traceql/datafusion/metrics.rs
crates/icegate-query/src/traceql/datafusion/mod.rs
crates/icegate-query/src/traceql/datafusion/pipeline.rs
crates/icegate-query/src/traceql/datafusion/planner.rs
crates/icegate-query/src/traceql/datafusion/planner_tests.rs
crates/icegate-query/src/traceql/datafusion/registry.rs
crates/icegate-query/src/traceql/datafusion/selectors.rs
crates/icegate-query/src/traceql/duration.rs
crates/icegate-query/src/traceql/expr.rs
crates/icegate-query/src/traceql/iceberg_predicate.rs
crates/icegate-query/src/traceql/metric.rs
crates/icegate-query/src/traceql/mod.rs
crates/icegate-query/src/traceql/parser.rs
crates/icegate-query/src/traceql/planner.rs
crates/icegate-query/src/traceql/spanset.rs
crates/icegate-query/tests/tempo/echo.rs
crates/icegate-query/tests/tempo/harness.rs
crates/icegate-query/tests/tempo/mod.rs
crates/icegate-query/tests/tempo/tags.rs
crates/icegate-query/tests/tempo_tests.rs
crates/icegate-queue/src/writer.rs
rustfmt.toml

💤 Files with no reviewable changes (4)

crates/icegate-query/src/engine/log_metadata_scan/labels.rs
crates/icegate-query/src/engine/log_metadata_scan/values.rs
crates/icegate-query/src/engine/log_metadata_scan/predicate.rs
crates/icegate-query/src/engine/log_metadata_scan/mod.rs

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/icegate-ingest/src/transform.rs (1)

249-253: ⚠️ Potential issue | 🟡 Minor

Stale doc references the pre-split merged attributes map.

The function-level doc still says it produces "the merged attributes map (resource + scope + span attributes plus indexed-column duplicates)", but the implementation now emits two separate columns (resource_attributes and span_attributes) and the indexed-column mirror keys have been removed. Please update so future readers don't expect the old shape.

📝 Suggested doc update

-/// Produces every column in the spans schema: top-level fields, the merged
-/// attributes map (resource + scope + span attributes plus indexed-column
-/// duplicates), and the nested `events` / `links` `List<Struct>` arrays.
+/// Produces every column in the spans schema: top-level fields, the two
+/// attribute maps (`resource_attributes` for resource-level OTLP attrs, and
+/// `span_attributes` for the deduplicated merge of scope + span attrs with
+/// span winning on collision), and the nested `events` / `links`
+/// `List<Struct>` arrays.
 /// Links with invalid `trace_id` / `span_id` are dropped from the list and
 /// counted into the parent span's `dropped_links_count`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-ingest/src/transform.rs` around lines 249 - 253, Update the
stale function-level docstring that currently claims a single "merged attributes
map" to reflect the current output shape: mention that the implementation emits
separate columns named resource_attributes and span_attributes (instead of a
merged attributes map) and that the indexed-column mirror keys have been
removed; keep the rest of the description about top-level fields, nested
events/links arrays, and dropped links being counted in dropped_links_count
intact so readers of the function (the doc attached to the transform function in
transform.rs) see the accurate schema shape.

🧹 Nitpick comments (10)

crates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rs (1)
12-21: Generated doc-comment templates reference a placeholder rule (#s) for every method.

All enter_* / exit_* doc comments in this file reference \{@link TraceQLParserBaseParser#s} regardless of the actual grammar rule (e.g., enter_pipelineexpr should reference pipelineExpr, enter_filterparen should reference FilterParen, etc.). This appears to be an artifact of the ANTLR Rust target's template — purely cosmetic, no functional impact, but the docs are not informative as-is.

Since this is generated code, the practical fix is either (a) leave as-is and rely on the rule name in the method signature, or (b) post-process the generated file in make gen to substitute rule names. No action required if regeneration is manual and rare.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rs` around
lines 12 - 21, The generated doc-comments all reference a placeholder rule
`\{`@link` TraceQLParserBaseParser#s}`; fix by post-processing the generated file
to substitute the correct rule name for each enter_/exit_ method: for each
method like enter_root/exit_root (or enter_pipelineexpr/enter_filterparen),
parse the method signature's context type (e.g., RootContext<'input>,
PipelineExprContext, FilterParenContext), strip the trailing "Context" and
lifetimes, derive the rule identifier, and replace the `#s` placeholder in that
method's doc-comment with the derived rule name (formatted to match the parser
link style); implement this substitution step in the `make gen` post-processing
step so generated files (e.g., traceqlparserbaselistener.rs and methods
enter_root, exit_root, enter_pipelineexpr, enter_filterparen) get correct links.
crates/icegate-common/src/testing/minio.rs (1)
16-19: Consider sharing the credential constants with create_s3_bucket.

The doc comments here state these constants are "used for both the container and downstream S3 SDK calls", but per the relevant snippet from crates/icegate-common/src/testing/s3.rs, create_s3_bucket constructs its Credentials with hardcoded "minioadmin" literals rather than importing these constants. They happen to agree today, but if MINIO_ROOT_USER/MINIO_ROOT_PASSWORD are ever rotated, bucket creation will silently fail authentication. Either expose these as pub(crate) (or pub) and use them from s3.rs, or tone down the docstring.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-common/src/testing/minio.rs` around lines 16 - 19, The
constants MINIO_ROOT_USER and MINIO_ROOT_PASSWORD are declared for MinIO
credentials but create_s3_bucket constructs Credentials with hardcoded
"minioadmin" literals, which will break if the constants change; modify the code
so create_s3_bucket uses the shared constants instead of string literals (export
the constants as pub(crate) or pub as appropriate and replace the literal usages
in create_s3_bucket with MINIO_ROOT_USER and MINIO_ROOT_PASSWORD to ensure a
single source of truth for credentials).
crates/icegate-query/src/traceql/metric.rs (1)
50-76: Quantile/percentile invariant is enforced by convention only.

The shape op: AggregationOp, arg: Option<FieldRef>, percentile: Option<f64> is flexible but doesn't prevent two illegal AST states by construction:

op = Quantile, percentile = None (planner must produce a 400)

op = Avg, percentile = Some(0.95) (silently ignored per the docstring)

A small constructor (or a fallible try_new) on PipelineStage that asserts these invariants would let the parser/visitor surface validation errors at AST-construction time and let the planner assume a well-formed AST. Not blocking — just noting since this was the source of the prior round-trip concern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/metric.rs` around lines 50 - 76, The
Aggregate and AggregateFilter variants allow invalid combinations (e.g.,
AggregationOp::Quantile with percentile None, or non-Quantile ops with
Some(percentile)); add a fallible constructor (e.g.,
PipelineStage::Aggregate::try_new or PipelineStage::try_new that validates
Aggregate/AggregateFilter) or implement TryFrom<(AggregationOp,
Option<FieldRef>, Option<f64>, ...)> for the enum so the AST cannot be
constructed with illegal states; inside the constructor/checker assert that if
op == AggregationOp::Quantile then percentile.is_some() and for other ops
percentile.is_none(), returning a clear error type on violation so callers
(parser/visitor) surface validation errors early and the planner can assume a
well-formed AST.
crates/icegate-ingest/src/transform.rs (1)
522-544: Resource-attribute path still admits duplicate keys.

The new merge_dotted_attributes cleanly deduplicates scope+span into span_attributes, but resource_attrs is still pushed via add_flattened_attributes_dotted, which appends every entry verbatim. OTLP proto does not forbid duplicate Resource.attributes keys; if a producer sends them, the persisted MAP will contain duplicates and downstream readers will again disagree on which one "wins" (the same problem the merge fixes for span attrs).

For symmetry and to avoid surprising MAP-reader behavior, consider running resource attrs through a single-list BTreeMap dedup as well.
♻️ Suggested approach
-                add_flattened_attributes_dotted(resource_attrs, &mut resource_attrs_builder);
+                let merged_resource_attrs = merge_dotted_attributes(&[], resource_attrs);
+                for (key, value) in &merged_resource_attrs {
+                    resource_attrs_builder.keys().append_value(key);
+                    resource_attrs_builder.values().append_value(value);
+                }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-ingest/src/transform.rs` around lines 522 - 544, The resource
attributes are still appended verbatim via add_flattened_attributes_dotted which
allows duplicate keys; instead, deduplicate merged dotted resource attributes
before appending to resource_attrs_builder similar to how scope+span were
handled: collect resource_attrs (the same variable passed to
add_flattened_attributes_dotted) into a single-list ordered map (e.g., BTreeMap
or the same merge_dotted_attributes helper), then iterate that deduped map and
append keys/values into resource_attrs_builder (like the loop used for
merged_span_attrs) rather than calling add_flattened_attributes_dotted directly.
crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs (1)
419-443: MAX_STATS_SPREAD = 1024 will silently drop wider INT columns from discovery.

extend_from_int_stats returns false when the row-group [min, max] exceeds 1024, which then increments no_dict_no_stats and emits a generic warning. For the documented use (status_code, kind enums) this is fine, but if this reader is reused for any wider INT column in the future, callers will see no values for those row groups even though min/max are available.

Consider either: (a) narrowing the function to a name that signals "small-enum only", or (b) when the spread is exceeded, fall back to inserting just min and max (a 2-point over-approximation) so the result is never empty. Not blocking for v1.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs` around lines
419 - 443, The helper extend_from_int_stats currently returns false when
(max-min) > MAX_STATS_SPREAD which causes callers (and counters like
no_dict_no_stats) to treat the row-group as having no discoverable values;
change extend_from_int_stats so that when spread > MAX_STATS_SPREAD it falls
back to inserting just min and max into out (a 2-point over-approximation) and
returns true instead of false; keep the MAX_STATS_SPREAD constant for the
small-enum fast-path but ensure callers see at least min/max for wide ranges so
values aren’t silently dropped.
crates/icegate-query/src/traceql/datafusion/metrics.rs (1)
60-79: Duplicate group keys when prelude by(...) overlaps with terminal group_by.

If the user writes {} | by(.svc) | rate() by (.svc) (or the parser allows both), .svc will be appended twice to group_keys. DataFusion will likely tolerate it but the optimizer's behavior on duplicate group exprs is implementation-defined and the result schema gets two columns with the same logical name.

Consider deduplicating group_keys (e.g., with a HashSet<String> keyed on the rendered expression) before passing to df.aggregate(...).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/datafusion/metrics.rs` around lines 60 - 79,
The code currently appends grouping expressions into group_keys (built from
time_bucket_expr, prelude stages via PipelineStage::By and terminal group_by)
which can produce duplicates (e.g., same field from prelude and terminal
group_by); change the construction so you deduplicate group_keys before using
them in df.aggregate: after pushing time_bucket_expr and all keys from
PipelineStage::By and group_by, iterate the vector and build a new Vec<Expr>
preserving order while filtering duplicates using a HashSet<String> keyed by a
stable rendering/identifier of each expression (for example the string produced
by the same renderer used elsewhere, or Expr::to_string if appropriate); keep
references to the existing functions/values (group_keys,
super::field_ref_to_group_key, PipelineStage::By, group_by, time_bucket_expr) so
the change is limited to de-duplicating the group_keys vector prior to
aggregation.
crates/icegate-query/src/traceql/datafusion/selectors.rs (1)
267-275: Strengthen the duration-normalization assertion.

format!("{e:?}").contains("1000000") is a substring match — it would also pass if the encoded scalar were 10_000_000 or 100_000_000. Pinning the exact decoded scalar (or at minimum asserting absence of the original 1000000000-ns value) makes the rounding contract harder to silently regress.
♻️ Sketch
-        // 1s = 1e9 ns = 1e6 micros — encoded as Int64(1_000_000).
-        assert!(format!("{e:?}").contains("1000000"));
+        // 1s = 1e9 ns = 1e6 micros — encoded as Int64(1_000_000).
+        let dbg = format!("{e:?}");
+        assert!(dbg.contains("1000000") && !dbg.contains("1000000000"), "unexpected scalar: {dbg}");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/traceql/datafusion/selectors.rs` around lines 267 -
275, The test duration_literal_is_normalized_to_micros currently uses a
substring match which can false-positive; update the assertion on the result `e`
(produced by literal_to_scalar called with
FieldRef::Intrinsic(IntrinsicField::Duration)) to be stricter: either
pattern-match/inspect the produced scalar to assert it equals the expected micro
value (e.g., Int64(1_000_000)) or at minimum assert format!("{e:?}") both
contains "1000000" and does NOT contain the original nanosecond string
"1000000000" so the normalization cannot silently regress.
crates/icegate-query/src/tempo/metadata.rs (1)
366-368: Use take(limit) for consistency with v2 and to skip allocating the full vec.

list_tag_values_v2 (line 403) already uses into_iter().take(limit).collect(). Doing the same here avoids materializing the entire Vec<String> when the BTreeSet is much larger than limit. Since BTreeSet's iterator is sorted, the truncation result is identical.
♻️ Proposed change
-    let mut out: Vec<String> = values.into_iter().collect();
-    out.truncate(limit);
-    Ok(out)
+    Ok(values.into_iter().take(limit).collect())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/tempo/metadata.rs` around lines 366 - 368, The code
currently collects the entire BTreeSet into a Vec then truncates it; change the
collection to streamline and avoid allocating the full vector by using
values.into_iter().take(limit).collect() (mirror the approach used in
list_tag_values_v2) inside the same function (list_tag_values or the block
shown) so the iterator yields only up to limit elements before collecting and
returning Ok(out).
crates/icegate-query/src/tempo/handlers.rs (1)
95-115: Consider sharing the q → AST parsing helper with executor::execute.

The spawn_blocking + make_query_error_send + join_error pattern here mirrors executor::execute (per the relevant snippet from tempo/executor.rs:46-150). If a future change modifies how non-Send ANTLRError is scrubbed or how the blocking parse is instrumented, both call sites need to stay in sync. Extracting a parse_query_blocking(q: String) -> TempoResult<Expr> (or similar) into a shared module would keep them aligned.

Not blocking — flag if you'd rather defer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/tempo/handlers.rs` around lines 95 - 115, The
parsing logic in parse_q_to_predicate duplicates the spawn_blocking +
make_query_error_send + join_error pattern used in executor::execute; extract
that into a shared helper like parse_query_blocking(q: String) ->
TempoResult<Expr> (placed in a common module) that performs the spawn_blocking
call, uses span.in_scope, applies
AntlrParser::new().parse(&q_owned).map_err(super::executor::make_query_error_send),
awaits and maps join errors with super::executor::join_error, and returns the
parsed Expr; then replace the inline spawn_blocking flow in parse_q_to_predicate
and in executor::execute to call this new helper so both sites share the same
error-scrubbing and instrumentation logic.
crates/icegate-query/src/engine/metadata_scan/mod.rs (1)
263-279: Optional: extract a per-file helper to mirror the other two scan functions.

scan_labels and scan_label_values both delegate per-file work to dedicated helpers (scan_labels_for_file, scan_label_values_for_file). scan_label_int_values inlines the equivalent work in the closure. Extracting a scan_label_int_values_for_file helper would keep the three entry points symmetrical and make the int-values path independently testable / #[tracing::instrument]-able like its siblings on line 317 and line 339.
♻️ Proposed refactor
-        .map_ok(|task| {
-            let file_io = file_io.clone();
-            let predicate = predicate.clone();
-            let column = column.clone();
-            async move {
-                let (mut reader, metadata) = parquet_reader::open_file_direct(&file_io, &task).await?;
-                let mut out: BTreeSet<i32> = BTreeSet::new();
-                values::collect_indexed_int_values_via_dict(&mut reader, &metadata, &predicate, &column, &mut out)
-                    .await?;
-                Ok::<BTreeSet<i32>, MetadataScanError>(out)
-            }
-        })
+        .map_ok(|task| {
+            let file_io = file_io.clone();
+            let predicate = predicate.clone();
+            let column = column.clone();
+            async move { scan_label_int_values_for_file(&file_io, task, &predicate, &column).await }
+        })
Add the helper alongside the existing per-file functions:
#[tracing::instrument(skip_all, fields(file = %task.data_file_path, column = %column))]
async fn scan_label_int_values_for_file(
    file_io: &iceberg::io::FileIO,
    task: FileScanTask,
    predicate: &Predicate,
    column: &str,
) -> Result<BTreeSet<i32>, MetadataScanError> {
    let (mut reader, metadata) = parquet_reader::open_file_direct(file_io, &task).await?;
    let mut out: BTreeSet<i32> = BTreeSet::new();
    values::collect_indexed_int_values_via_dict(&mut reader, &metadata, predicate, column, &mut out).await?;
    Ok(out)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/engine/metadata_scan/mod.rs` around lines 263 - 279,
Extract the per-file logic inside the closure in scan_label_int_values into a
new async helper function scan_label_int_values_for_file(file_io:
&iceberg::io::FileIO, task: FileScanTask, predicate: &Predicate, column: &str)
-> Result<BTreeSet<i32>, MetadataScanError> that mirrors
scan_labels_for_file/scan_label_values_for_file: move the
parquet_reader::open_file_direct call, the BTreeSet creation, and the
values::collect_indexed_int_values_via_dict call into that helper, annotate it
with #[tracing::instrument(skip_all, fields(file = %task.data_file_path, column
= %column))], and then change the plan_stream.map_ok closure to call the helper
(e.g. let file_io = file_io.clone(); let predicate = predicate.clone(); let
column = column.clone(); async move { scan_label_int_values_for_file(&file_io,
task, &predicate, &column).await }). Ensure signature and error type match
MetadataScanError and tests/tracing behavior remain consistent.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/icegate-common/src/testing/minio.rs`:
- Around line 67-76: Update the doc comment to correct the type and clarify the
behavior: change the incorrect `RequestedContainer<GenericImage>` to
`ContainerRequest<GenericImage>`, remove the misleading claim about keeping the
chain "uniform with the no-options path", and state that since earlier calls
like `with_env_var()` and `with_cmd()` (from `ImageExt`) already make `image` a
`ContainerRequest<GenericImage>`, the conditional `if let Some(net)` / `if let
Some(name)` blocks simply continue mutating the same `image` via `with_network`
and `with_container_name`.

In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 392-408: The docstring promises a conservative true when
null_count is unknown and the chunk has rows, but has_nulls currently returns
false on None; update has_nulls to return true when stats.null_count_opt() is
None and the chunk contains at least one row (i.e., treat unknown null_count as
"may contain nulls"), otherwise keep the existing Some(n) if n > 0 logic; locate
the function has_nulls and use the Statistics API (e.g.,
Statistics::null_count_opt() and the stats' row-count accessor available in this
crate) to check row count and return true when null_count_opt() is None &&
row_count > 0.

In `@crates/icegate-query/src/tempo/formatters.rs`:
- Around line 420-489: The timestamp/null-handling is wrong: compute ts_micros
and end_micros only when the respective column is non-null for that row (use
starts/ends is_null(row) guards) so null timestamps don't become 0 and bias
BestSpan; when updating acc.start_micros and acc.end_micros only call
s.value(row)/e.value(row) if s.is_null(row)/e.is_null(row) is false (skip
otherwise); update the construction of BestSpan (ts_micros, duration_micros) to
treat missing timestamps as "absent" (e.g., fallback to i64::MAX for ordering
but do not read underlying 0) and ensure the span accumulation logic that uses
starts, ends, acc.start_micros, acc.end_micros, ts_micros, and end_micros is
guarded by per-row null checks.

In `@crates/icegate-query/src/tempo/metadata.rs`:
- Around line 274-277: The metadata scan for "rootServiceName" currently uses
scan_values_either(..., COL_SERVICE_NAME) and therefore enumerates service_name
values from all spans, not only root spans; update the call site in metadata.rs
(the "rootServiceName" | "trace:rootService" match arm) to add a predicate
requiring parent_span_id IS NULL (i.e., combine extra_predicate with a root-span
predicate) so scan_values_either only returns root service names, or if that
pushdown is too expensive add a clear comment and create a follow-up issue
noting the limitation; reference scan_values_either and COL_SERVICE_NAME when
making the change.

In `@crates/icegate-query/src/traceql/datafusion/metrics.rs`:
- Around line 52-104: Compute the number of time buckets after you determine
step_ns in apply_metrics and enforce QueryContext::max_grid_points: obtain the
query start/end from the QueryContext (e.g., ctx.range.start and ctx.range.end
or the equivalent fields), compute duration_ns = end - start, compute buckets =
duration_ns / step_ns (ceil or integer division as appropriate), and if buckets
> ctx.max_grid_points return a 400 error (use the project’s
bad_request/invalid_input helper) before calling time_bucket_expr or performing
aggregation; reference apply_metrics, step_ns, and QueryContext::max_grid_points
when locating where to add this check.

In `@crates/icegate-query/src/traceql/datafusion/pipeline.rs`:
- Around line 74-81: The current field_ref_to_group_key silently maps unmodelled
intrinsics to a constant NULL group key; update it so FieldRef::Intrinsic
handles only TraceID (col(COL_TRACE_ID)) and any other intrinsic that is not
explicitly modelled should return an Err(NotImplemented) instead of calling
super::intrinsic_column and producing a NULL literal. Concretely: in
field_ref_to_group_key, keep the TraceID short-circuit, explicitly accept any
other intrinsics that you know are modelled by intrinsic_column, but for the
remaining/unmodelled Intrinsic variants return a not-implemented error (include
a clear message like "grouping by intrinsic X not supported") rather than
letting intrinsic_column produce lit(NULL); reference the symbols
FieldRef::Intrinsic, field_ref_to_group_key, super::intrinsic_column and
group_key_for_attribute when making this change.
- Around line 185-197: The duration literal conversion in literal_to_filter_expr
currently truncates nanoseconds with `*ns / 1_000`, causing sub-microsecond
values to become 0; change the Duration arm to round up like the selector path
by using saturating addition before division, e.g. replace `lit(*ns / 1_000)`
with `lit((*ns.saturating_add(999)) / 1_000)` (use the same integer type as
LiteralValue::Duration) so `literal_to_filter_expr` and the selector logic
produce identical semantics.

In `@crates/icegate-query/src/traceql/Makefile`:
- Line 12: The Makefile loop that prepends a header to antlr/traceql*.rs uses
sed with an escaped '\n' (the line "for f in antlr/traceql*.rs; do sed -i.bak
'1s/^/...\\n/' $$f ..."), which is non-portable on macOS; replace that sed-based
prepend with a portable approach (for both the traceql and logql Makefiles) such
as piping a printf of the header into cat with the target file (e.g., "printf
'...\\n' | cat - $$f > $$f.tmp && mv $$f.tmp $$f") or using sed's '1i\' form
with a literal newline, ensuring the header string includes the same clippy
allowances and that .bak cleanup is handled similarly.

---

Outside diff comments:
In `@crates/icegate-ingest/src/transform.rs`:
- Around line 249-253: Update the stale function-level docstring that currently
claims a single "merged attributes map" to reflect the current output shape:
mention that the implementation emits separate columns named resource_attributes
and span_attributes (instead of a merged attributes map) and that the
indexed-column mirror keys have been removed; keep the rest of the description
about top-level fields, nested events/links arrays, and dropped links being
counted in dropped_links_count intact so readers of the function (the doc
attached to the transform function in transform.rs) see the accurate schema
shape.

---

Nitpick comments:
In `@crates/icegate-common/src/testing/minio.rs`:
- Around line 16-19: The constants MINIO_ROOT_USER and MINIO_ROOT_PASSWORD are
declared for MinIO credentials but create_s3_bucket constructs Credentials with
hardcoded "minioadmin" literals, which will break if the constants change;
modify the code so create_s3_bucket uses the shared constants instead of string
literals (export the constants as pub(crate) or pub as appropriate and replace
the literal usages in create_s3_bucket with MINIO_ROOT_USER and
MINIO_ROOT_PASSWORD to ensure a single source of truth for credentials).

In `@crates/icegate-ingest/src/transform.rs`:
- Around line 522-544: The resource attributes are still appended verbatim via
add_flattened_attributes_dotted which allows duplicate keys; instead,
deduplicate merged dotted resource attributes before appending to
resource_attrs_builder similar to how scope+span were handled: collect
resource_attrs (the same variable passed to add_flattened_attributes_dotted)
into a single-list ordered map (e.g., BTreeMap or the same
merge_dotted_attributes helper), then iterate that deduped map and append
keys/values into resource_attrs_builder (like the loop used for
merged_span_attrs) rather than calling add_flattened_attributes_dotted directly.

In `@crates/icegate-query/src/engine/metadata_scan/mod.rs`:
- Around line 263-279: Extract the per-file logic inside the closure in
scan_label_int_values into a new async helper function
scan_label_int_values_for_file(file_io: &iceberg::io::FileIO, task:
FileScanTask, predicate: &Predicate, column: &str) -> Result<BTreeSet<i32>,
MetadataScanError> that mirrors scan_labels_for_file/scan_label_values_for_file:
move the parquet_reader::open_file_direct call, the BTreeSet creation, and the
values::collect_indexed_int_values_via_dict call into that helper, annotate it
with #[tracing::instrument(skip_all, fields(file = %task.data_file_path, column
= %column))], and then change the plan_stream.map_ok closure to call the helper
(e.g. let file_io = file_io.clone(); let predicate = predicate.clone(); let
column = column.clone(); async move { scan_label_int_values_for_file(&file_io,
task, &predicate, &column).await }). Ensure signature and error type match
MetadataScanError and tests/tracing behavior remain consistent.

In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 419-443: The helper extend_from_int_stats currently returns false
when (max-min) > MAX_STATS_SPREAD which causes callers (and counters like
no_dict_no_stats) to treat the row-group as having no discoverable values;
change extend_from_int_stats so that when spread > MAX_STATS_SPREAD it falls
back to inserting just min and max into out (a 2-point over-approximation) and
returns true instead of false; keep the MAX_STATS_SPREAD constant for the
small-enum fast-path but ensure callers see at least min/max for wide ranges so
values aren’t silently dropped.

In `@crates/icegate-query/src/tempo/handlers.rs`:
- Around line 95-115: The parsing logic in parse_q_to_predicate duplicates the
spawn_blocking + make_query_error_send + join_error pattern used in
executor::execute; extract that into a shared helper like
parse_query_blocking(q: String) -> TempoResult<Expr> (placed in a common module)
that performs the spawn_blocking call, uses span.in_scope, applies
AntlrParser::new().parse(&q_owned).map_err(super::executor::make_query_error_send),
awaits and maps join errors with super::executor::join_error, and returns the
parsed Expr; then replace the inline spawn_blocking flow in parse_q_to_predicate
and in executor::execute to call this new helper so both sites share the same
error-scrubbing and instrumentation logic.

In `@crates/icegate-query/src/tempo/metadata.rs`:
- Around line 366-368: The code currently collects the entire BTreeSet into a
Vec then truncates it; change the collection to streamline and avoid allocating
the full vector by using values.into_iter().take(limit).collect() (mirror the
approach used in list_tag_values_v2) inside the same function (list_tag_values
or the block shown) so the iterator yields only up to limit elements before
collecting and returning Ok(out).

In `@crates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rs`:
- Around line 12-21: The generated doc-comments all reference a placeholder rule
`\{`@link` TraceQLParserBaseParser#s}`; fix by post-processing the generated file
to substitute the correct rule name for each enter_/exit_ method: for each
method like enter_root/exit_root (or enter_pipelineexpr/enter_filterparen),
parse the method signature's context type (e.g., RootContext<'input>,
PipelineExprContext, FilterParenContext), strip the trailing "Context" and
lifetimes, derive the rule identifier, and replace the `#s` placeholder in that
method's doc-comment with the derived rule name (formatted to match the parser
link style); implement this substitution step in the `make gen` post-processing
step so generated files (e.g., traceqlparserbaselistener.rs and methods
enter_root, exit_root, enter_pipelineexpr, enter_filterparen) get correct links.

In `@crates/icegate-query/src/traceql/datafusion/metrics.rs`:
- Around line 60-79: The code currently appends grouping expressions into
group_keys (built from time_bucket_expr, prelude stages via PipelineStage::By
and terminal group_by) which can produce duplicates (e.g., same field from
prelude and terminal group_by); change the construction so you deduplicate
group_keys before using them in df.aggregate: after pushing time_bucket_expr and
all keys from PipelineStage::By and group_by, iterate the vector and build a new
Vec<Expr> preserving order while filtering duplicates using a HashSet<String>
keyed by a stable rendering/identifier of each expression (for example the
string produced by the same renderer used elsewhere, or Expr::to_string if
appropriate); keep references to the existing functions/values (group_keys,
super::field_ref_to_group_key, PipelineStage::By, group_by, time_bucket_expr) so
the change is limited to de-duplicating the group_keys vector prior to
aggregation.

In `@crates/icegate-query/src/traceql/datafusion/selectors.rs`:
- Around line 267-275: The test duration_literal_is_normalized_to_micros
currently uses a substring match which can false-positive; update the assertion
on the result `e` (produced by literal_to_scalar called with
FieldRef::Intrinsic(IntrinsicField::Duration)) to be stricter: either
pattern-match/inspect the produced scalar to assert it equals the expected micro
value (e.g., Int64(1_000_000)) or at minimum assert format!("{e:?}") both
contains "1000000" and does NOT contain the original nanosecond string
"1000000000" so the normalization cannot silently regress.

In `@crates/icegate-query/src/traceql/metric.rs`:
- Around line 50-76: The Aggregate and AggregateFilter variants allow invalid
combinations (e.g., AggregationOp::Quantile with percentile None, or
non-Quantile ops with Some(percentile)); add a fallible constructor (e.g.,
PipelineStage::Aggregate::try_new or PipelineStage::try_new that validates
Aggregate/AggregateFilter) or implement TryFrom<(AggregationOp,
Option<FieldRef>, Option<f64>, ...)> for the enum so the AST cannot be
constructed with illegal states; inside the constructor/checker assert that if
op == AggregationOp::Quantile then percentile.is_some() and for other ops
percentile.is_none(), returning a clear error type on violation so callers
(parser/visitor) surface validation errors early and the planner can assume a
well-formed AST.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 522543eb-bb50-42ff-b158-007518fda3f6

📥 Commits

Reviewing files that changed from the base of the PR and between fc2c1fe and a024f2f.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (48)

.cargo/audit.toml
clippy.toml
config/kustomize/base/values-grafana.yaml
config/kustomize/overlays/skaffold/values-icegate.yaml
crates/icegate-common/src/testing/minio.rs
crates/icegate-ingest/src/transform.rs
crates/icegate-jobmanager/src/core/job.rs
crates/icegate-maintain/Cargo.toml
crates/icegate-maintain/tests/migrate_integration_test.rs
crates/icegate-query/src/engine/metadata_scan/labels.rs
crates/icegate-query/src/engine/metadata_scan/mod.rs
crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs
crates/icegate-query/src/engine/metadata_scan/values.rs
crates/icegate-query/src/loki/executor.rs
crates/icegate-query/src/loki/mod.rs
crates/icegate-query/src/tempo/executor.rs
crates/icegate-query/src/tempo/formatters.rs
crates/icegate-query/src/tempo/handlers.rs
crates/icegate-query/src/tempo/metadata.rs
crates/icegate-query/src/tempo/models.rs
crates/icegate-query/src/tempo/validation.rs
crates/icegate-query/src/traceql/Makefile
crates/icegate-query/src/traceql/antlr/TraceQLLexer.g4
crates/icegate-query/src/traceql/antlr/TraceQLLexer.interp
crates/icegate-query/src/traceql/antlr/TraceQLLexer.tokens
crates/icegate-query/src/traceql/antlr/TraceQLParser.g4
crates/icegate-query/src/traceql/antlr/TraceQLParser.interp
crates/icegate-query/src/traceql/antlr/TraceQLParser.tokens
crates/icegate-query/src/traceql/antlr/mod.rs
crates/icegate-query/src/traceql/antlr/tests.rs
crates/icegate-query/src/traceql/antlr/traceqllexer.rs
crates/icegate-query/src/traceql/antlr/traceqlparser.rs
crates/icegate-query/src/traceql/antlr/traceqlparserbaselistener.rs
crates/icegate-query/src/traceql/antlr/traceqlparserbasevisitor.rs
crates/icegate-query/src/traceql/antlr/traceqlparserlistener.rs
crates/icegate-query/src/traceql/antlr/traceqlparservisitor.rs
crates/icegate-query/src/traceql/antlr/visitors.rs
crates/icegate-query/src/traceql/common.rs
crates/icegate-query/src/traceql/datafusion/metrics.rs
crates/icegate-query/src/traceql/datafusion/mod.rs
crates/icegate-query/src/traceql/datafusion/pipeline.rs
crates/icegate-query/src/traceql/datafusion/planner.rs
crates/icegate-query/src/traceql/datafusion/planner_tests.rs
crates/icegate-query/src/traceql/datafusion/selectors.rs
crates/icegate-query/src/traceql/iceberg_predicate.rs
crates/icegate-query/src/traceql/metric.rs
crates/icegate-query/src/traceql/spanset.rs
crates/icegate-query/tests/tempo/tags.rs

✅ Files skipped from review due to trivial changes (9)

clippy.toml
crates/icegate-query/src/traceql/antlr/TraceQLLexer.interp
crates/icegate-query/src/traceql/antlr/TraceQLParser.interp
.cargo/audit.toml
crates/icegate-query/src/traceql/antlr/TraceQLParser.tokens
crates/icegate-query/src/traceql/antlr/TraceQLLexer.tokens
crates/icegate-jobmanager/src/core/job.rs
crates/icegate-query/src/traceql/antlr/TraceQLParser.g4
crates/icegate-query/tests/tempo/tags.rs

🚧 Files skipped from review as they are similar to previous changes (12)

config/kustomize/overlays/skaffold/values-icegate.yaml
crates/icegate-query/src/traceql/datafusion/mod.rs
crates/icegate-query/src/traceql/antlr/TraceQLLexer.g4
crates/icegate-query/src/traceql/antlr/traceqlparserbasevisitor.rs
crates/icegate-query/src/traceql/antlr/traceqlparserlistener.rs
crates/icegate-query/src/traceql/antlr/tests.rs
crates/icegate-query/src/traceql/common.rs
config/kustomize/base/values-grafana.yaml
crates/icegate-query/src/traceql/antlr/mod.rs
crates/icegate-query/src/tempo/executor.rs
crates/icegate-query/src/traceql/antlr/traceqllexer.rs
crates/icegate-query/src/traceql/antlr/visitors.rs

…kustomize namespace patches following the deprecation of all-in-one Jaeger services. Refactor trace duration, processing, and validation logic in Tempo. Adjust TraceQL and Tempo handlers for consistency and performance improvements. Set OTLP HTTP server body size limits for large batches.

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (3)

crates/icegate-ingest/src/transform.rs (2)

529-545: Hoist invariant attribute flattening out of the per-span hot path.

Line 529 recomputes dedupe_dotted_attributes(resource_attrs) for every span even though resource_attrs are constant within a resource_spans block. On large batches this adds avoidable CPU/alloc overhead.

♻️ Suggested refactor

 for resource_spans in &request.resource_spans {
     let resource_attrs = resource_spans.resource.as_ref().map_or(&empty_attrs, |r| &r.attributes);
+    let merged_resource_attrs = dedupe_dotted_attributes(resource_attrs);

     // ...
     for scope_spans in &resource_spans.scope_spans {
+        let scope_attrs = scope_spans.scope.as_ref().map_or(&empty_attrs, |s| &s.attributes);
         for span in &scope_spans.spans {
-            let merged_resource_attrs = dedupe_dotted_attributes(resource_attrs);
             for (key, value) in &merged_resource_attrs {
                 resource_attrs_builder.keys().append_value(key);
                 resource_attrs_builder.values().append_value(value);
             }

-            let scope_attrs = scope_spans.scope.as_ref().map_or(&empty_attrs, |s| &s.attributes);
             let merged_span_attrs = merge_dotted_attributes(scope_attrs, &span.attributes);
             for (key, value) in &merged_span_attrs {
                 span_attrs_builder.keys().append_value(key);
                 span_attrs_builder.values().append_value(value);
             }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-ingest/src/transform.rs` around lines 529 - 545, The
dedupe_dotted_attributes(resource_attrs) call is inside the per-span loop
causing repeated work; hoist it so merged_resource_attrs is computed once per
resource_spans and reused for every span. Specifically, compute let
merged_resource_attrs = dedupe_dotted_attributes(resource_attrs); immediately
after obtaining resource_attrs (before looping over spans), then inside the span
loop just iterate over &merged_resource_attrs to append to
resource_attrs_builder; ensure you remove the duplicate call in the
span-processing block and keep the existing merge_dotted_attributes(scope_attrs,
&span.attributes) per-span behavior.

1949-2053: Add one collision test to lock in “span wins over scope”.

Current assertions validate routing and mirror-key absence, but they don’t explicitly cover same-key collision precedence. Add one key present in both scope and span and assert span value is emitted.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-ingest/src/transform.rs` around lines 1949 - 2053, In the test
spans_split_attributes_into_resource_and_span_columns add a collision case so
"span wins over scope": modify the ScopeSpans.InstrumentationScope.attributes to
include a key (e.g., "dup.key") with a scope value and the Span.attributes to
include the same key with a different span value, then after building batch
assert that span_attributes contains ("dup.key", spanValue) (and not the
scopeValue) — reference the test function
spans_split_attributes_into_resource_and_span_columns, the kv helper, and the
resource_attrs_col/span_attrs_col + map_row_as_pairs checks to locate where to
add the extra kv entries and assertion.

crates/icegate-query/src/engine/metadata_scan/mod.rs (1)

318-322: Apply the row-group predicate to indexed-label discovery as well.

collect_indexed_from_metadata sees the whole file, while MAP-key discovery is already filtered with predicate. If a planned file contains both matching and non-matching row groups, /labels and Tempo tag discovery can surface indexed labels that only exist outside the requested selector/time slice. Thread the predicate into the indexed metadata path so both discovery modes use the same pruning granularity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/icegate-query/src/engine/metadata_scan/mod.rs` around lines 318 - 322,
The indexed-label discovery currently calls
collect_indexed_from_metadata(&metadata, config, &mut out) which ignores the
row-group predicate used by collect_map_keys_via_dict; update
collect_indexed_from_metadata to accept the same predicate parameter (e.g.,
predicate: &RowGroupPredicate or the existing predicate type) and apply that
predicate when scanning file-level metadata so indexed labels are only collected
from matching row-groups, then update this call site in metadata_scan::mod.rs to
pass predicate into collect_indexed_from_metadata; ensure any other callers are
updated accordingly and tests/usage reflect the new signature.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/icegate-ingest/src/transform.rs`:
- Around line 540-541: Update the misleading comment that says "HashMap" to
reference "BTreeMap" (and note the ordering semantics) so it matches the
implementation and the merge helper behavior in transform.rs; specifically edit
the comment around the merge logic that currently reads "merging into a HashMap"
to state "merging into a BTreeMap" and mention that scope-first then span
ensures span attrs win due to BTreeMap ordering.

In `@crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs`:
- Around line 278-285: In read_column_int_dictionaries, add a defensive
physical-type check that verifies the targeted leaf column is actually INT32
before attempting any dictionary decoding or stats-based fallback; locate the
column descriptor for the given leaf_idx from the Parquet metadata (the same
structure used to access dictionary bytes and statistics) and if its physical
type is not INT32 return a MetadataScanError with a clear message (including
column name/leaf_idx and actual physical type) so the function fails fast on
schema mismatches rather than decoding garbage; apply this single check once
up-front so both the dictionary-decoding path and the Statistics::Int32 fallback
are protected.

In `@crates/icegate-query/src/tempo/executor.rs`:
- Line 6: The import for RecordBatch is incorrect; replace the use of
datafusion::arrow::array::RecordBatch with the correct path
datafusion::arrow::record_batch::RecordBatch in executor.rs so the type resolves
(update the use statement referencing RecordBatch accordingly).

In `@crates/icegate-query/src/tempo/handlers.rs`:
- Around line 113-125: wants_protobuf currently returns true if the Accept
header contains protobuf anywhere, which ignores the caller's JSON preference;
update wants_protobuf to properly parse the Accept header into media-type
entries with q values (default q=1) and determine precedence: extract media
types from headers.get(header::ACCEPT), split on commas, parse each token's mime
and optional q parameter, then compute the highest-q match for JSON (e.g.,
"application/json" or "application/*" or "*/*") and for protobuf
("application/protobuf" or "application/x-protobuf"); return false if JSON has
higher or equal q than protobuf, return true if protobuf has higher q, and
default to true only when neither is present. Ensure changes are made inside the
wants_protobuf function and handle invalid/missing Accept gracefully.

In `@crates/icegate-query/src/traceql/datafusion/metrics.rs`:
- Around line 67-71: The bucket count calculation undercounts when duration is
an exact multiple of step: replace the manual ceil logic with inclusive-endpoint
logic by computing buckets as (duration_ns / step_ns) + 1 (using integer
division) so exact-multiple cases get the extra endpoint; update the calculation
in the block that uses ctx.start/ctx.end and step_ns and still respect the
existing guards (if duration_ns > 0 && step_ns > 0) and the subsequent
comparison against ctx.max_grid_points using the variable buckets.

In `@crates/icegate-query/src/traceql/datafusion/pipeline.rs`:
- Around line 142-143: The code only checks that `percentile` is present when
building the quantile aggregate; you must also validate that the extracted `p`
is finite and within [0.0, 1.0] before continuing. After obtaining `p` from
`percentile.ok_or_else(...)` (the variable `p` used in the
`quantile_over_time(.field, p)` handling in pipeline.rs), add a guard that
returns a validation error if p.is_nan(), !p.is_finite(), p < 0.0, or p > 1.0 so
invalid percentiles are rejected with a clear validation message instead of
failing later in the planner/runtime.
- Around line 185-190: The error message in field_arg_expr is specific to
sum/avg/min/max and misleads callers of AggregationOp::Quantile (e.g.,
quantile_over_time()); update the validation error text in field_arg_expr to
mention quantile or use a generic "aggregation requires a field argument"
message so quantile errors are correctly described; locate field_arg_expr and
change the string passed to validation(...) accordingly.

In `@crates/icegate-query/src/traceql/Makefile`:
- Around line 9-10: The gen Make target runs java -jar $(ANTLR4_JAR) but doesn't
declare a prerequisite for $(ANTLR4_JAR) or the install target, causing failures
on clean checkouts; update the gen target declaration to depend on either
$(ANTLR4_JAR) or the install target (e.g., change the target line to "gen:
antlr/TraceQLLexer.g4 antlr/TraceQLParser.g4 $(ANTLR4_JAR)" or add "install" as
a prerequisite) so the JAR is guaranteed present before invoking java -jar, and
ensure the existing install target builds or fetches $(ANTLR4_JAR).
- Around line 6-7: The Makefile's install rule currently wget's ANTLR4_JAR
without ensuring the destination directory exists or verifying file integrity;
update the install rule to first create the target directory (run mkdir -p $(dir
$(ANTLR4_JAR))) before wget, add an ANTLR4_SHA256 variable containing the
expected SHA256 checksum, download the jar to $(ANTLR4_JAR) (use a robust
curl/wget invocation) and then verify it with sha256sum -c or by comparing
`sha256sum` output to $(ANTLR4_SHA256), failing the build if the check fails,
and only after successful verification run chmod +x $(ANTLR4_JAR).

---

Nitpick comments:
In `@crates/icegate-ingest/src/transform.rs`:
- Around line 529-545: The dedupe_dotted_attributes(resource_attrs) call is
inside the per-span loop causing repeated work; hoist it so
merged_resource_attrs is computed once per resource_spans and reused for every
span. Specifically, compute let merged_resource_attrs =
dedupe_dotted_attributes(resource_attrs); immediately after obtaining
resource_attrs (before looping over spans), then inside the span loop just
iterate over &merged_resource_attrs to append to resource_attrs_builder; ensure
you remove the duplicate call in the span-processing block and keep the existing
merge_dotted_attributes(scope_attrs, &span.attributes) per-span behavior.
- Around line 1949-2053: In the test
spans_split_attributes_into_resource_and_span_columns add a collision case so
"span wins over scope": modify the ScopeSpans.InstrumentationScope.attributes to
include a key (e.g., "dup.key") with a scope value and the Span.attributes to
include the same key with a different span value, then after building batch
assert that span_attributes contains ("dup.key", spanValue) (and not the
scopeValue) — reference the test function
spans_split_attributes_into_resource_and_span_columns, the kv helper, and the
resource_attrs_col/span_attrs_col + map_row_as_pairs checks to locate where to
add the extra kv entries and assertion.

In `@crates/icegate-query/src/engine/metadata_scan/mod.rs`:
- Around line 318-322: The indexed-label discovery currently calls
collect_indexed_from_metadata(&metadata, config, &mut out) which ignores the
row-group predicate used by collect_map_keys_via_dict; update
collect_indexed_from_metadata to accept the same predicate parameter (e.g.,
predicate: &RowGroupPredicate or the existing predicate type) and apply that
predicate when scanning file-level metadata so indexed labels are only collected
from matching row-groups, then update this call site in metadata_scan::mod.rs to
pass predicate into collect_indexed_from_metadata; ensure any other callers are
updated accordingly and tests/usage reflect the new signature.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 083dea42-43f3-4f1f-8593-d2ef7ac0fd10

📥 Commits

Reviewing files that changed from the base of the PR and between a024f2f and d84dbe9.

📒 Files selected for processing (22)

config/kustomize/README.md
config/kustomize/base/kustomization.yaml
config/kustomize/base/otel-collector/configmap.yaml
config/kustomize/base/values-grafana.yaml
config/kustomize/base/values-jaeger.yaml
crates/icegate-common/src/testing/minio.rs
crates/icegate-common/src/testing/s3.rs
crates/icegate-ingest/src/otlp_http/config.rs
crates/icegate-ingest/src/otlp_http/routes.rs
crates/icegate-ingest/src/otlp_http/server.rs
crates/icegate-ingest/src/transform.rs
crates/icegate-query/src/engine/metadata_scan/mod.rs
crates/icegate-query/src/engine/metadata_scan/parquet_reader.rs
crates/icegate-query/src/logql/Makefile
crates/icegate-query/src/tempo/executor.rs
crates/icegate-query/src/tempo/formatters.rs
crates/icegate-query/src/tempo/handlers.rs
crates/icegate-query/src/tempo/metadata.rs
crates/icegate-query/src/traceql/Makefile
crates/icegate-query/src/traceql/datafusion/metrics.rs
crates/icegate-query/src/traceql/datafusion/pipeline.rs
crates/icegate-query/src/traceql/datafusion/selectors.rs

💤 Files with no reviewable changes (3)

config/kustomize/README.md
config/kustomize/base/values-jaeger.yaml
config/kustomize/base/kustomization.yaml

✅ Files skipped from review due to trivial changes (1)

crates/icegate-query/src/tempo/formatters.rs

🚧 Files skipped from review as they are similar to previous changes (2)

crates/icegate-query/src/traceql/datafusion/selectors.rs
crates/icegate-query/src/tempo/metadata.rs

…lumn type validation, and standardize attribute merging logic with deterministic ordering.

s-prosvirnin · 2026-05-01T11:47:59Z

+    validate_duration_param("minDuration", params.min_duration.as_deref())?;
+    validate_duration_param("maxDuration", params.max_duration.as_deref())?;
+    if params.min_duration.is_some() || params.max_duration.is_some() {
+        return Err(TempoError::new(QueryError::NotImplemented(


Maybe we should TODO mark here?

frisbeeman added 5 commits April 19, 2026 23:05

GH-119: Tempo endpoint with TraceQL query

562ad71

Refactor Tempo trace search to improve root span selection, handle mu…

69f0110

…lti-root traces, and enforce per-trace span caps (`spss`)

Refactor Icegate schema and table providers to support WAL-backed tab…

fc2c1fe

…les (`logs`, `spans`) and introduce allowlist-based dispatch.

frisbeeman requested review from a team and s-prosvirnin April 27, 2026 15:58

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

Review fixes

a024f2f

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

Refactor bucket computation in metrics queries, improve dictionary co…

97ecb61

…lumn type validation, and standardize attribute merging logic with deterministic ordering.

s-prosvirnin approved these changes May 2, 2026

View reviewed changes

frisbeeman merged commit b671fdd into main May 2, 2026
8 checks passed

This was referenced May 3, 2026

GH-135: Store trace_id, span_id, and parent_span_id as FixedSizeBinary #136

Merged

GH-120: optimise created parquet files #138

Merged

coderabbitai Bot mentioned this pull request May 16, 2026

GH-140: Remove cloud account id #141

Merged

Conversation

frisbeeman commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

s-prosvirnin May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

frisbeeman commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading