Protobuf ingestion #1391

Conversation
""" WalkthroughSupport for ingesting OpenTelemetry logs, metrics, and traces in both JSON and Protobuf formats was added. The ingestion handlers now branch on the "Content-Type" header, using new flattening functions for Protobuf payloads. Dependencies were updated to enable Protobuf support, and relevant flattening utilities were implemented for each OTEL data type. Common validation and stream setup logic were centralized in helper functions for consistency. Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant FlatteningUtil

    Client->>Server: POST /ingest/{logs,metrics,traces} (Content-Type: application/json or application/x-protobuf)
    Server->>Server: Extract stream name and validate log source
    Server->>Server: Check Content-Type header
    alt Content-Type is JSON
        Server->>FlatteningUtil: flatten_and_push_JSON()
    else Content-Type is Protobuf
        Server->>FlatteningUtil: decode_and_flatten_protobuf()
        FlatteningUtil->>FlatteningUtil: push_logs(flattened_records)
    else Invalid Content-Type
        Server->>Client: Return error response
    end
    Server->>Client: Return HTTP response
```
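As orientation for the review below, here is a minimal sketch of the branching the diagram describes, assuming an actix-web handler; the flattening calls are hypothetical placeholders mirroring the diagram's labels, not the actual Parseable functions:

```rust
use actix_web::{web, HttpRequest, HttpResponse};

// Sketch only: branch on the Content-Type header of the raw request body.
async fn ingest_otel(req: HttpRequest, body: web::Bytes) -> HttpResponse {
    match req
        .headers()
        .get("Content-Type")
        .and_then(|h| h.to_str().ok())
    {
        // JSON path: parse the raw bytes, then flatten and push
        Some(ct) if ct.starts_with("application/json") => {
            match serde_json::from_slice::<serde_json::Value>(&body) {
                Ok(_json) => HttpResponse::Ok().finish(), // flatten_and_push_JSON(_json) would go here
                Err(e) => HttpResponse::BadRequest().body(e.to_string()),
            }
        }
        // Protobuf path: decode with prost, then flatten and push each record
        Some(ct) if ct.starts_with("application/x-protobuf") => {
            // decode_and_flatten_protobuf(&body) would go here (hypothetical)
            HttpResponse::Ok().finish()
        }
        // Anything else hits the diagram's error branch
        _ => HttpResponse::UnsupportedMediaType().body("expected JSON or protobuf"),
    }
}
```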
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (6)
- Cargo.toml (3 hunks)
- src/handlers/http/ingest.rs (8 hunks)
- src/handlers/http/modal/utils/ingest_utils.rs (1 hunk)
- src/otel/logs.rs (2 hunks)
- src/otel/metrics.rs (2 hunks)
- src/otel/traces.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
src/handlers/http/modal/utils/ingest_utils.rs (1)
Learnt from: nikhilsinhaparseable
PR: #1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.
src/handlers/http/ingest.rs (2)
Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (16)
Cargo.toml (3)
36-36: LGTM! Adding the "prost" feature to tonic is necessary for protobuf support and aligns with the PR objective.

79-85: LGTM! Moving from a git dependency to the versioned crate (0.30.0) improves stability. The enabled features ("gen-tonic", "with-serde", "logs", "metrics", "trace") are appropriate for the protobuf ingestion functionality.

142-142: LGTM! Adding prost as a direct dependency is necessary for protobuf message decoding in the application code.
src/handlers/http/modal/utils/ingest_utils.rs (1)
95-95: LGTM! Making `push_logs` public is appropriate to support the new protobuf ingestion handlers that need to process individual flattened records after protobuf decoding.

src/otel/logs.rs (2)

21-21: LGTM! Import is necessary for the new protobuf flattening functionality.

146-176: LGTM! The `flatten_otel_protobuf` function correctly implements protobuf support for OTEL logs. The implementation properly:

- Iterates over resource logs in the protobuf message
- Extracts resource attributes and metadata
- Reuses the existing `flatten_scope_log` for consistency
- Merges resource-level data into individual log records
- Follows the same pattern as the existing `flatten_otel_logs` function

src/otel/metrics.rs (2)
18-18: LGTM! Import is necessary for the new protobuf metrics flattening functionality.

607-661: LGTM! The `flatten_otel_metrics_protobuf` function correctly implements protobuf support for OTEL metrics. The implementation properly:

- Processes resource metrics from the protobuf message
- Extracts resource and scope-level metadata
- Reuses the existing `flatten_metrics_record` for consistency
- Merges hierarchical metadata into individual metric records
- Follows the same pattern as `flatten_otel_metrics` but for the protobuf format

src/otel/traces.rs (4)
18-18: LGTM! Import is necessary for the new protobuf traces flattening functionality.

343-343: LGTM! Adding the EntityRef import is necessary for the updated test data construction.

784-791: LGTM! Adding `entity_refs` to the test Resource provides more comprehensive test coverage for the complete Resource structure.

938-997: LGTM! The `flatten_otel_traces_protobuf` function correctly implements protobuf support for OTEL traces. The implementation properly:

- Processes resource spans from the protobuf message
- Extracts resource and scope-level metadata
- Reuses the existing `flatten_span_record` for consistency
- Merges hierarchical metadata (scope and resource) into individual span records
- Follows the same pattern as `flatten_otel_traces` but for the protobuf format

src/handlers/http/ingest.rs (4)
21-21: LGTM! Import changes support the new Protobuf functionality. The import changes are well aligned with the new functionality:

- Removing `Json` from the web import, since handlers now accept raw bytes
- Adding `push_logs` for individual record processing in the Protobuf path
- Adding the necessary OTEL flattening functions and Protobuf message types

Also applies to: 32-32, 36-46

165-165: LGTM! Parameter type change supports both JSON and Protobuf. Changing from `Json<StrictValue>` to `web::Bytes` allows the handler to inspect the raw payload and determine the format based on the Content-Type header.

264-264: LGTM! Parameter type change supports both JSON and Protobuf. Consistent with the logs handler, changing to `web::Bytes` enables content-type-based format detection.

351-351: LGTM! Parameter type change supports both JSON and Protobuf. Consistent with the other OTEL handlers, enabling content-type-based format detection.
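As context for these handler changes, a minimal sketch of the Protobuf decode step, assuming the opentelemetry-proto crate's gen-tonic types (which the Cargo.toml changes above enable); illustrative, not the PR's exact code:

```rust
use opentelemetry_proto::tonic::collector::logs::v1::ExportLogsServiceRequest;
use prost::Message;

// prost::Message::decode accepts any bytes::Buf implementation, including
// &[u8], so the handler can pass the raw request body straight through.
fn decode_otel_logs(body: &[u8]) -> Result<ExportLogsServiceRequest, prost::DecodeError> {
    ExportLogsServiceRequest::decode(body)
}
```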
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/handlers/http/ingest.rs (1)
249-290: Improve Content-Type handling for robustness. The Content-Type handling has several issues that were previously identified and remain unaddressed:

- Direct string equality won't handle parameters like `application/json; charset=utf-8`
- No explicit error for unsupported Content-Type values
- Size limit only enforced for Protobuf, not JSON
- Generic error for missing Content-Type header
Apply this improved implementation:
```diff
 match req
     .headers()
     .get("Content-Type")
     .and_then(|h| h.to_str().ok())
 {
     Some(content_type) => {
-        if content_type == CONTENT_TYPE_JSON {
+        if content_type.starts_with(CONTENT_TYPE_JSON) {
+            // Apply size limit to JSON as well
+            if body.len() > MAX_EVENT_PAYLOAD_SIZE {
+                return Err(PostError::Invalid(anyhow::anyhow!(
+                    "JSON payload size {} exceeds maximum allowed size of {} bytes",
+                    body.len(),
+                    MAX_EVENT_PAYLOAD_SIZE
+                )));
+            }
             flatten_and_push_logs(
                 serde_json::from_slice(&body)?,
                 stream_name,
                 log_source,
                 &p_custom_fields,
             )
             .await?;
-        } else if content_type == CONTENT_TYPE_PROTOBUF {
+        } else if content_type.starts_with(CONTENT_TYPE_PROTOBUF) {
             // 10MB limit
             if body.len() > MAX_EVENT_PAYLOAD_SIZE {
                 return Err(PostError::Invalid(anyhow::anyhow!(
                     "Protobuf message size {} exceeds maximum allowed size of {} bytes",
                     body.len(),
                     MAX_EVENT_PAYLOAD_SIZE
                 )));
             }
             match decode_protobuf(body) {
                 Ok(decoded) => {
                     for record in flatten_protobuf(&decoded) {
                         push_logs(stream_name, record, log_source, &p_custom_fields).await?;
                     }
                 }
                 Err(e) => {
                     return Err(PostError::Invalid(anyhow::anyhow!(
                         "Failed to decode protobuf message: {}",
                         e
                     )));
                 }
             }
+        } else {
+            return Err(PostError::Invalid(anyhow::anyhow!(
+                "Unsupported Content-Type: {}. Expected {} or {}",
+                content_type,
+                CONTENT_TYPE_JSON,
+                CONTENT_TYPE_PROTOBUF
+            )));
         }
     }
     None => {
-        return Err(PostError::Header(ParseHeaderError::InvalidValue));
+        return Err(PostError::Invalid(anyhow::anyhow!(
+            "Missing Content-Type header. Expected {} or {}",
+            CONTENT_TYPE_JSON,
+            CONTENT_TYPE_PROTOBUF
+        )));
     }
 }
```
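A small self-contained check of the prefix-matching behavior the diff relies on; the constant's value here is an assumption mirroring the diff, not the project's actual definition:

```rust
const CONTENT_TYPE_JSON: &str = "application/json"; // assumed value

fn main() {
    // Exact equality would reject a Content-Type carrying parameters;
    // prefix matching accepts it.
    assert!("application/json; charset=utf-8".starts_with(CONTENT_TYPE_JSON));
    assert!(!"application/x-protobuf".starts_with(CONTENT_TYPE_JSON));
}
```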
🧹 Nitpick comments (3)
src/otel/metrics.rs (2)
503-514: Consider reducing function parameters for better maintainability. While the generic approach is good, having 8 function parameters makes the function signature complex and harder to maintain. Consider grouping related functions into a trait or struct.
Consider defining a trait to encapsulate the accessor functions:
```diff
-#[allow(clippy::too_many_arguments)]
-fn process_resource_metrics<T, S, M>(
-    resource_metrics: &[T],
-    get_resource: fn(&T) -> Option<&opentelemetry_proto::tonic::resource::v1::Resource>,
-    get_scope_metrics: fn(&T) -> &[S],
-    get_schema_url: fn(&T) -> &str,
-    get_scope: fn(&S) -> Option<&opentelemetry_proto::tonic::common::v1::InstrumentationScope>,
-    get_scope_schema_url: fn(&S) -> &str,
-    get_metrics: fn(&S) -> &[M],
-    get_metric: fn(&M) -> &Metric,
-) -> Vec<Value> {
+trait ResourceMetricsAccessor<S, M> {
+    fn get_resource(&self) -> Option<&opentelemetry_proto::tonic::resource::v1::Resource>;
+    fn get_scope_metrics(&self) -> &[S];
+    fn get_schema_url(&self) -> &str;
+}
+
+trait ScopeMetricsAccessor<M> {
+    fn get_scope(&self) -> Option<&opentelemetry_proto::tonic::common::v1::InstrumentationScope>;
+    fn get_scope_schema_url(&self) -> &str;
+    fn get_metrics(&self) -> &[M];
+}
+
+trait MetricAccessor {
+    fn get_metric(&self) -> &Metric;
+}
+
+fn process_resource_metrics<T, S, M>(resource_metrics: &[T]) -> Vec<Value>
+where
+    T: ResourceMetricsAccessor<S, M>,
+    S: ScopeMetricsAccessor<M>,
+    M: MetricAccessor,
+{
```

This would make the function calls cleaner and more type-safe.
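To make the suggestion concrete, a sketch of how the first accessor trait could be implemented for the protobuf type, assuming the OTLP-generated field names (`resource`, `scope_metrics`, `schema_url`); illustrative only, not part of the PR:

```rust
use opentelemetry_proto::tonic::metrics::v1::{Metric, ResourceMetrics, ScopeMetrics};
use opentelemetry_proto::tonic::resource::v1::Resource;

trait ResourceMetricsAccessor<S, M> {
    fn get_resource(&self) -> Option<&Resource>;
    fn get_scope_metrics(&self) -> &[S];
    fn get_schema_url(&self) -> &str;
}

// Hypothetical impl for the protobuf type; field names follow the OTLP proto.
impl ResourceMetricsAccessor<ScopeMetrics, Metric> for ResourceMetrics {
    fn get_resource(&self) -> Option<&Resource> {
        self.resource.as_ref()
    }
    fn get_scope_metrics(&self) -> &[ScopeMetrics] {
        &self.scope_metrics
    }
    fn get_schema_url(&self) -> &str {
        &self.schema_url
    }
}
```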
571-577: Optimize cloning of resource metadata. The `clone()` operation on line 576 is performed for each metric, which could be inefficient when processing large numbers of metrics. Since resource metadata is the same for all metrics within a resource, consider a more efficient approach that pre-clones the resource metadata once per resource:
```diff
 for resource_metric_json in &mut vec_scope_metrics_json {
-    for (key, value) in &resource_metrics_json {
-        resource_metric_json.insert(key.clone(), value.clone());
-    }
-
-    vec_otel_json.push(Value::Object(resource_metric_json.clone()));
+    resource_metric_json.extend(resource_metrics_json.clone());
+    vec_otel_json.push(Value::Object(std::mem::take(resource_metric_json)));
 }
```

Alternatively, consider using `Rc` or `Arc` for shared metadata to avoid repeated cloning.
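To make the suggested `extend` + `mem::take` pattern concrete, a runnable toy example with made-up field names:

```rust
use serde_json::{json, Map, Value};

fn main() {
    // Shared resource metadata, bulk-cloned once per record
    let resource_meta = Map::from_iter([
        ("service.name".to_string(), json!("frontend")),
        ("host.name".to_string(), json!("node-1")),
    ]);

    let mut records: Vec<Map<String, Value>> = vec![
        Map::from_iter([("metric".to_string(), json!("latency_ms"))]),
        Map::from_iter([("metric".to_string(), json!("rps"))]),
    ];

    let mut out = Vec::with_capacity(records.len());
    for rec in &mut records {
        // Merge the shared metadata in one call, then move the map out
        // instead of cloning the whole record a second time
        rec.extend(resource_meta.clone());
        out.push(Value::Object(std::mem::take(rec)));
    }

    assert_eq!(out.len(), 2);
    assert_eq!(out[0]["service.name"], json!("frontend"));
}
```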
src/handlers/http/ingest.rs (1)

199-225: Stream compatibility validation logic is correct. The implementation correctly enforces the learned restrictions, where OTEL logs can coexist with other log types (except metrics/traces) while OTEL metrics/traces require exact type matching. Consider adding a comment to document these rules for future maintainers.

Add a clarifying comment:

```diff
 // Validate stream compatibility
 if let Ok(stream) = PARSEABLE.get_stream(&stream_name) {
     match log_source {
         LogSource::OtelLogs => {
-            // For logs, reject if stream is metrics or traces
+            // OTEL logs can coexist with other log types but not with OTEL metrics/traces
             stream
                 .get_log_source()
                 .iter()
                 .find(|&stream_log_source_entry| {
                     stream_log_source_entry.log_source_format != LogSource::OtelTraces
                         && stream_log_source_entry.log_source_format != LogSource::OtelMetrics
                 })
                 .ok_or(PostError::IncorrectLogFormat(stream_name.clone()))?;
         }
         LogSource::OtelMetrics | LogSource::OtelTraces => {
-            // For metrics/traces, only allow same type
+            // OTEL metrics/traces require strict type matching - no mixing allowed
             stream
                 .get_log_source()
                 .iter()
                 .find(|&stream_log_source_entry| {
                     stream_log_source_entry.log_source_format == log_source
                 })
                 .ok_or(PostError::IncorrectLogFormat(stream_name.clone()))?;
         }
         _ => {}
     }
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/handlers/http/ingest.rs (6 hunks)
- src/handlers/mod.rs (1 hunk)
- src/otel/logs.rs (3 hunks)
- src/otel/metrics.rs (6 hunks)
- src/otel/traces.rs (4 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/handlers/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/otel/logs.rs
- src/otel/traces.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
src/handlers/http/ingest.rs (5)
Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Learnt from: de-sh
PR: parseablehq/parseable#0
File: :0-0
Timestamp: 2025-03-20T15:50:45.435Z
Learning: Pay close attention to code comments for typos and semantic clarity during reviews for the Parseable project.
Learnt from: nikhilsinhaparseable
PR: #1370
File: resources/ingest_demo_data.sh:12-14
Timestamp: 2025-07-14T11:29:22.552Z
Learning: In the Parseable codebase, the base64 encoding for Basic Auth headers in resources/ingest_demo_data.sh works correctly as implemented without needing to strip trailing newlines. The user has validated this works with their server environment.
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: #1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.395Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (2)
src/otel/metrics.rs (1)
583-610: LGTM! Clean refactoring that promotes code reuse. The refactoring successfully extracts common logic while maintaining type safety through closures. Both `flatten_otel_metrics` and `flatten_otel_metrics_protobuf` cleanly delegate to the generic helper with the appropriate accessors.

src/handlers/http/ingest.rs (1)
295-362: Excellent refactoring of the OTEL handlers. The refactoring successfully eliminates code duplication by extracting common logic into the `setup_otel_stream` and `process_otel_content` helpers. All three handlers now follow a consistent pattern and properly support both JSON and Protobuf formats.