Protobuf ingestion #1391


Merged

merged 4 commits into parseablehq:main from protobuf-ingestion on Jul 29, 2025
Conversation

nikhilsinhaparseable
Contributor

@nikhilsinhaparseable nikhilsinhaparseable commented Jul 28, 2025

Summary by CodeRabbit

  • New Features
    • Added support for ingesting OpenTelemetry logs, metrics, and traces in both JSON and Protobuf formats, automatically detecting the payload type based on the "Content-Type" header.
    • Improved error handling for unsupported or missing content types and oversized Protobuf payloads.
  • Enhancements
    • OpenTelemetry Protobuf payloads are now flattened into JSON for easier downstream processing.
    • Unified and modularized ingestion handlers with consistent stream validation and content processing.
    • Refactored flattening logic for logs, metrics, and traces to support both internal and Protobuf message types with shared processing functions.
    • Exposed log pushing utility function for broader use.
    • Added standardized constants for JSON and Protobuf content types.
  • Dependencies
    • Updated and added dependencies to support Protobuf ingestion and processing.

Contributor

coderabbitai bot commented Jul 28, 2025

"""

Walkthrough

Support for ingesting OpenTelemetry logs, metrics, and traces in both JSON and Protobuf formats was added. The ingestion handlers now branch on the "Content-Type" header, using new flattening functions for Protobuf payloads. Dependencies were updated to enable Protobuf support, and relevant flattening utilities were implemented for each OTEL data type. Common validation and stream setup logic were centralized in helper functions for consistency.
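The Content-Type branching described above can be reduced to a small dispatch step. The sketch below is a hypothetical, stripped-down illustration (the `detect_format` function and `PayloadFormat` enum are not the PR's actual code; only the constant names `CONTENT_TYPE_JSON` / `CONTENT_TYPE_PROTOBUF` come from the PR), showing one way to tolerate media-type parameters such as `charset=utf-8`:

```rust
// Hypothetical sketch of the Content-Type dispatch; constant names mirror
// the PR's CONTENT_TYPE_JSON / CONTENT_TYPE_PROTOBUF additions.
const CONTENT_TYPE_JSON: &str = "application/json";
const CONTENT_TYPE_PROTOBUF: &str = "application/x-protobuf";

#[derive(Debug, PartialEq)]
enum PayloadFormat {
    Json,
    Protobuf,
}

/// Map a raw Content-Type header value to a payload format.
/// Splitting on ';' first keeps values like
/// "application/json; charset=utf-8" working.
fn detect_format(content_type: &str) -> Option<PayloadFormat> {
    let media_type = content_type.split(';').next().unwrap_or("").trim();
    match media_type {
        CONTENT_TYPE_JSON => Some(PayloadFormat::Json),
        CONTENT_TYPE_PROTOBUF => Some(PayloadFormat::Protobuf),
        _ => None,
    }
}

fn main() {
    assert_eq!(
        detect_format("application/json; charset=utf-8"),
        Some(PayloadFormat::Json)
    );
    assert_eq!(
        detect_format("application/x-protobuf"),
        Some(PayloadFormat::Protobuf)
    );
    assert_eq!(detect_format("text/plain"), None);
}
```

A `None` result corresponds to the error path for missing or unsupported content types mentioned in the summary.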

Changes

Cohort / File(s) Change Summary
Dependency Updates
Cargo.toml
Updated tonic features to include "prost"; switched opentelemetry-proto from a Git ref to a versioned crate with explicit features; added direct prost dependency.
Ingestion Handler Refactor
src/handlers/http/ingest.rs
Modified OTEL ingestion endpoints to accept raw bytes and support JSON and Protobuf payloads by branching on "Content-Type". Added helper functions setup_otel_stream for stream validation and process_otel_content for content processing including Protobuf decoding and flattening. Updated function signatures and error handling accordingly.
Ingestion Utility Visibility
src/handlers/http/modal/utils/ingest_utils.rs
Made push_logs function public.
Handler Constants Addition
src/handlers/mod.rs
Added public constants CONTENT_TYPE_JSON and CONTENT_TYPE_PROTOBUF for standardized MIME type strings.
OTEL Logs Protobuf Flattening
src/otel/logs.rs
Refactored flatten_otel_logs to use a generic helper process_resource_logs. Added flatten_otel_protobuf to flatten Protobuf OTEL logs by reusing the helper with protobuf-specific accessors.
OTEL Metrics Protobuf Flattening
src/otel/metrics.rs
Refactored flatten_otel_metrics to use a generic helper process_resource_metrics. Added flatten_otel_metrics_protobuf to flatten Protobuf OTEL metrics using the same helper with protobuf-specific accessors.
OTEL Traces Protobuf Flattening & Test Update
src/otel/traces.rs
Added flatten_otel_traces_protobuf and refactored flatten_otel_traces to use a generic helper process_resource_spans. Updated tests to include new entity references in resource metadata.
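The `process_resource_*` helpers take accessor function pointers so that one traversal serves both the internal and the Protobuf representations. A toy illustration of that pattern (types and names here are invented for the example, not the PR's actual signatures):

```rust
// Toy illustration of the accessor-fn-pointer pattern used by the
// process_resource_* helpers; these types are hypothetical stand-ins
// for the internal and protobuf record types.
struct InternalRecord { name: String }
struct ProtoRecord { label: String }

fn internal_name(r: &InternalRecord) -> &str { &r.name }
fn proto_name(r: &ProtoRecord) -> &str { &r.label }

/// Generic walker: one traversal, parameterized over how to read a
/// field out of whatever concrete record type the caller has.
fn collect_names<T>(records: &[T], get_name: fn(&T) -> &str) -> Vec<String> {
    records.iter().map(|r| get_name(r).to_string()).collect()
}

fn main() {
    let internal = vec![InternalRecord { name: "cpu".into() }];
    let proto = vec![ProtoRecord { label: "cpu".into() }];
    // Same helper, different accessors — mirroring how flatten_otel_metrics
    // and flatten_otel_metrics_protobuf both delegate to one function.
    let a = collect_names(&internal, internal_name);
    let b = collect_names(&proto, proto_name);
    assert_eq!(a, b);
}
```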

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant FlatteningUtil

    Client->>Server: POST /ingest/{logs,metrics,traces} (Content-Type: application/json or application/x-protobuf)
    Server->>Server: Extract stream name and validate log source
    Server->>Server: Check Content-Type header
    alt Content-Type is JSON
        Server->>FlatteningUtil: flatten_and_push_JSON()
    else Content-Type is Protobuf
        Server->>FlatteningUtil: decode_and_flatten_protobuf()
        FlatteningUtil->>FlatteningUtil: push_logs(flattened_records)
    else Invalid Content-Type
        Server->>Client: Return error response
    end
    Server->>Client: Return HTTP response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

for next release

Poem

🐇
A hop, a leap, a bounding byte—
Now Protobuf and JSON both delight!
Logs, metrics, traces, all can flow,
In flattened form, they swiftly go.
With headers checked and bytes in tow,
This code’s a field where carrots grow!
🌱🥕
"""



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94e5022 and b582d9d.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • Cargo.toml (3 hunks)
  • src/handlers/http/ingest.rs (5 hunks)
  • src/handlers/http/modal/utils/ingest_utils.rs (1 hunks)
  • src/handlers/mod.rs (1 hunks)
  • src/otel/logs.rs (3 hunks)
  • src/otel/metrics.rs (6 hunks)
  • src/otel/traces.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/handlers/http/modal/utils/ingest_utils.rs
  • src/handlers/mod.rs
  • src/otel/logs.rs
  • Cargo.toml
  • src/otel/traces.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
src/handlers/http/ingest.rs (5)

Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Learnt from: nikhilsinhaparseable
PR: #1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.

Learnt from: de-sh
PR: parseablehq/parseable#0
File: :0-0
Timestamp: 2025-03-20T15:50:45.435Z
Learning: Pay close attention to code comments for typos and semantic clarity during reviews for the Parseable project.

Learnt from: nikhilsinhaparseable
PR: #1370
File: resources/ingest_demo_data.sh:12-14
Timestamp: 2025-07-14T11:29:22.552Z
Learning: In the Parseable codebase, the base64 encoding for Basic Auth headers in resources/ingest_demo_data.sh works correctly as implemented without needing to strip trailing newlines. The user has validated this works with their server environment.

Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?) ensures proper error handling in both modes.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
🔇 Additional comments (9)
src/otel/metrics.rs (4)

18-18: LGTM!

The import of ExportMetricsServiceRequest is necessary for protobuf support and follows the established pattern.


503-514: Well-designed generic abstraction.

The generic helper function effectively abstracts the common traversal and flattening logic for resource metrics. While the function has many parameters, this is justified to support different input types (internal vs protobuf) through accessor functions.


583-596: Excellent refactoring for code reuse.

The refactored function successfully eliminates duplication by delegating to the generic helper with appropriate accessor closures. This maintains the same functionality while enabling code reuse.


598-610: Perfect protobuf integration.

The new function correctly implements protobuf support by reusing the generic helper with appropriate accessor closures for ExportMetricsServiceRequest. This ensures consistent flattening behavior across different input formats.

src/handlers/http/ingest.rs (5)

21-50: LGTM!

The import additions are well-organized and necessary for the protobuf functionality. The inclusion of protobuf message types, flattening functions, and content type constants supports the new unified ingestion approach.


171-241: Excellent consolidation of OTEL stream validation.

The function correctly centralizes stream setup and validation logic. The stream compatibility checks properly implement the learned constraints:

  • For logs: reject streams with metrics/traces log sources
  • For metrics/traces: only allow same type (strict restrictions)

This eliminates code duplication and ensures consistent validation across all OTEL handlers.
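The two rules above can be condensed into a single predicate. The sketch below is a hypothetical simplification (the `compatible` function and the `Json` placeholder variant are invented for illustration; the PR's actual check iterates the stream's `get_log_source()` entries):

```rust
// Hypothetical condensation of the stream-compatibility rules; `LogSource`
// mirrors the variants the review discusses, with `Json` standing in for
// any regular (non-OTEL) log format.
#[derive(Clone, Copy, PartialEq)]
enum LogSource {
    OtelLogs,
    OtelMetrics,
    OtelTraces,
    Json,
}

/// Can `incoming` be ingested into a stream already tagged with `existing`?
fn compatible(existing: &[LogSource], incoming: LogSource) -> bool {
    match incoming {
        // Logs: reject only streams that carry metrics or traces sources.
        LogSource::OtelLogs => existing
            .iter()
            .any(|s| *s != LogSource::OtelMetrics && *s != LogSource::OtelTraces),
        // Metrics/traces: the stream must already carry the same source type.
        LogSource::OtelMetrics | LogSource::OtelTraces => {
            existing.iter().any(|s| *s == incoming)
        }
        // Regular log sources can coexist freely.
        _ => true,
    }
}

fn main() {
    assert!(compatible(&[LogSource::Json], LogSource::OtelLogs));
    assert!(!compatible(&[LogSource::OtelMetrics], LogSource::OtelLogs));
    assert!(compatible(&[LogSource::OtelTraces], LogSource::OtelTraces));
    assert!(!compatible(&[LogSource::OtelLogs], LogSource::OtelMetrics));
}
```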


311-337: Excellent refactoring for logs ingestion.

The function is now much cleaner and focused, delegating common functionality to the helper functions. The use of ExportLogsServiceRequest::decode and flatten_otel_protobuf is appropriate for logs processing.


339-365: Consistent metrics ingestion refactor.

The function follows the same clean pattern as the logs handler, using ExportMetricsServiceRequest::decode and flatten_otel_metrics_protobuf appropriately for metrics processing.


367-393: Completes consistent OTEL handler pattern.

The function maintains the same clean, consistent pattern as the other OTEL handlers, using ExportTraceServiceRequest::decode and flatten_otel_traces_protobuf for traces processing.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc29387 and a22a12c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • Cargo.toml (3 hunks)
  • src/handlers/http/ingest.rs (8 hunks)
  • src/handlers/http/modal/utils/ingest_utils.rs (1 hunks)
  • src/otel/logs.rs (2 hunks)
  • src/otel/metrics.rs (2 hunks)
  • src/otel/traces.rs (4 hunks)
🧰 Additional context used
🧠 Learnings: same entries as listed in the walkthrough comment above (PR #1263 log_source restrictions, #1346 shutdown error handling, #1185 stream existence checks).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (16)
Cargo.toml (3)

36-36: LGTM!

Adding the "prost" feature to tonic is necessary for protobuf support and aligns with the PR objective.


79-85: LGTM!

Moving from git dependency to versioned crate (0.30.0) improves stability. The enabled features ("gen-tonic", "with-serde", "logs", "metrics", "trace") are appropriate for the protobuf ingestion functionality.


142-142: LGTM!

Adding prost as a direct dependency is necessary for protobuf message decoding in the application code.

src/handlers/http/modal/utils/ingest_utils.rs (1)

95-95: LGTM!

Making push_logs public is appropriate to support the new protobuf ingestion handlers that need to process individual flattened records after protobuf decoding.

src/otel/logs.rs (2)

21-21: LGTM!

Import is necessary for the new protobuf flattening functionality.


146-176: LGTM!

The flatten_otel_protobuf function correctly implements protobuf support for OTEL logs. The implementation properly:

  • Iterates over resource logs in the protobuf message
  • Extracts resource attributes and metadata
  • Reuses existing flatten_scope_log for consistency
  • Merges resource-level data into individual log records
  • Follows the same pattern as the existing flatten_otel_logs function
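The merge step in the last two bullets — folding resource-level data into every flattened record — can be sketched with plain maps (`serde_json::Map` behaves analogously in the actual code). This is an illustrative reduction with invented names, not the PR's implementation:

```rust
use std::collections::BTreeMap;

// Illustrative reduction of the merge step: resource-level attributes are
// copied into every flattened log record, as flatten_otel_protobuf does
// with serde_json maps.
fn merge_resource_into_records(
    resource: &BTreeMap<String, String>,
    mut records: Vec<BTreeMap<String, String>>,
) -> Vec<BTreeMap<String, String>> {
    for record in &mut records {
        for (k, v) in resource {
            record.insert(k.clone(), v.clone());
        }
    }
    records
}

fn main() {
    let resource =
        BTreeMap::from([("service.name".to_string(), "api".to_string())]);
    let records =
        vec![BTreeMap::from([("body".to_string(), "hello".to_string())])];
    let merged = merge_resource_into_records(&resource, records);
    // Each record now carries both its own fields and the resource metadata.
    assert_eq!(merged[0]["service.name"], "api");
    assert_eq!(merged[0]["body"], "hello");
}
```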
src/otel/metrics.rs (2)

18-18: LGTM!

Import is necessary for the new protobuf metrics flattening functionality.


607-661: LGTM!

The flatten_otel_metrics_protobuf function correctly implements protobuf support for OTEL metrics. The implementation properly:

  • Processes resource metrics from the protobuf message
  • Extracts resource and scope-level metadata
  • Reuses existing flatten_metrics_record for consistency
  • Merges hierarchical metadata into individual metric records
  • Follows the same pattern as flatten_otel_metrics but for protobuf format
src/otel/traces.rs (4)

18-18: LGTM!

Import is necessary for the new protobuf traces flattening functionality.


343-343: LGTM!

Adding EntityRef import is necessary for the updated test data construction.


784-791: LGTM!

Adding entity_refs to the test Resource provides more comprehensive test coverage for the complete Resource structure.


938-997: LGTM!

The flatten_otel_traces_protobuf function correctly implements protobuf support for OTEL traces. The implementation properly:

  • Processes resource spans from the protobuf message
  • Extracts resource and scope-level metadata
  • Reuses existing flatten_span_record for consistency
  • Merges hierarchical metadata (scope and resource) into individual span records
  • Follows the same pattern as flatten_otel_traces but for protobuf format
src/handlers/http/ingest.rs (4)

21-21: LGTM! Import changes support the new Protobuf functionality.

The import changes are well-aligned with the new functionality:

  • Removing Json from the web import since handlers now accept raw bytes
  • Adding push_logs for individual record processing in the Protobuf path
  • Adding the necessary OTEL flattening functions and Protobuf message types

Also applies to: 32-32, 36-46


165-165: LGTM! Parameter type change supports both JSON and Protobuf.

Changing from Json<StrictValue> to web::Bytes allows the handler to inspect the raw payload and determine the format based on Content-Type header.


264-264: LGTM! Parameter type change supports both JSON and Protobuf.

Consistent with the logs handler, changing to web::Bytes enables content-type based format detection.


351-351: LGTM! Parameter type change supports both JSON and Protobuf.

Consistent with other OTEL handlers, enabling content-type based format detection.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/handlers/http/ingest.rs (1)

249-290: Improve Content-Type handling for robustness.

The Content-Type handling has several issues that were previously identified and remain unaddressed:

  1. Direct string equality won't handle parameters like application/json; charset=utf-8
  2. No explicit error for unsupported Content-Type values
  3. Size limit only enforced for Protobuf, not JSON
  4. Generic error for missing Content-Type header

Apply this improved implementation:

     match req
         .headers()
         .get("Content-Type")
         .and_then(|h| h.to_str().ok())
     {
         Some(content_type) => {
-            if content_type == CONTENT_TYPE_JSON {
+            if content_type.starts_with(CONTENT_TYPE_JSON) {
+                // Apply size limit to JSON as well
+                if body.len() > MAX_EVENT_PAYLOAD_SIZE {
+                    return Err(PostError::Invalid(anyhow::anyhow!(
+                        "JSON payload size {} exceeds maximum allowed size of {} bytes",
+                        body.len(),
+                        MAX_EVENT_PAYLOAD_SIZE
+                    )));
+                }
                 flatten_and_push_logs(
                     serde_json::from_slice(&body)?,
                     stream_name,
                     log_source,
                     &p_custom_fields,
                 )
                 .await?;
-            } else if content_type == CONTENT_TYPE_PROTOBUF {
+            } else if content_type.starts_with(CONTENT_TYPE_PROTOBUF) {
                 // 10MB limit
                 if body.len() > MAX_EVENT_PAYLOAD_SIZE {
                     return Err(PostError::Invalid(anyhow::anyhow!(
                         "Protobuf message size {} exceeds maximum allowed size of {} bytes",
                         body.len(),
                         MAX_EVENT_PAYLOAD_SIZE
                     )));
                 }
                 match decode_protobuf(body) {
                     Ok(decoded) => {
                         for record in flatten_protobuf(&decoded) {
                             push_logs(stream_name, record, log_source, &p_custom_fields).await?;
                         }
                     }
                     Err(e) => {
                         return Err(PostError::Invalid(anyhow::anyhow!(
                             "Failed to decode protobuf message: {}",
                             e
                         )));
                     }
                 }
+            } else {
+                return Err(PostError::Invalid(anyhow::anyhow!(
+                    "Unsupported Content-Type: {}. Expected {} or {}",
+                    content_type,
+                    CONTENT_TYPE_JSON,
+                    CONTENT_TYPE_PROTOBUF
+                )));
             }
         }
         None => {
-            return Err(PostError::Header(ParseHeaderError::InvalidValue));
+            return Err(PostError::Invalid(anyhow::anyhow!(
+                "Missing Content-Type header. Expected {} or {}",
+                CONTENT_TYPE_JSON,
+                CONTENT_TYPE_PROTOBUF
+            )));
         }
     }
🧹 Nitpick comments (3)
src/otel/metrics.rs (2)

503-514: Consider reducing function parameters for better maintainability.

While the generic approach is good, having 8 function parameters makes the function signature complex and harder to maintain. Consider grouping related functions into a trait or struct.

Consider defining a trait to encapsulate the accessor functions:

-#[allow(clippy::too_many_arguments)]
-fn process_resource_metrics<T, S, M>(
-    resource_metrics: &[T],
-    get_resource: fn(&T) -> Option<&opentelemetry_proto::tonic::resource::v1::Resource>,
-    get_scope_metrics: fn(&T) -> &[S],
-    get_schema_url: fn(&T) -> &str,
-    get_scope: fn(&S) -> Option<&opentelemetry_proto::tonic::common::v1::InstrumentationScope>,
-    get_scope_schema_url: fn(&S) -> &str,
-    get_metrics: fn(&S) -> &[M],
-    get_metric: fn(&M) -> &Metric,
-) -> Vec<Value> {
+trait ResourceMetricsAccessor<S, M> {
+    fn get_resource(&self) -> Option<&opentelemetry_proto::tonic::resource::v1::Resource>;
+    fn get_scope_metrics(&self) -> &[S];
+    fn get_schema_url(&self) -> &str;
+}
+
+trait ScopeMetricsAccessor<M> {
+    fn get_scope(&self) -> Option<&opentelemetry_proto::tonic::common::v1::InstrumentationScope>;
+    fn get_scope_schema_url(&self) -> &str;
+    fn get_metrics(&self) -> &[M];
+}
+
+trait MetricAccessor {
+    fn get_metric(&self) -> &Metric;
+}
+
+fn process_resource_metrics<T, S, M>(
+    resource_metrics: &[T],
+) -> Vec<Value>
+where
+    T: ResourceMetricsAccessor<S, M>,
+    S: ScopeMetricsAccessor<M>,
+    M: MetricAccessor,
+{

This would make the function calls cleaner and more type-safe.


571-577: Optimize cloning of resource metadata.

The clone() operation on line 576 is performed for each metric, which could be inefficient when processing large numbers of metrics. Since resource metadata is the same for all metrics within a resource, consider a more efficient approach.

Consider pre-cloning the resource metadata once per resource:

         for resource_metric_json in &mut vec_scope_metrics_json {
-            for (key, value) in &resource_metrics_json {
-                resource_metric_json.insert(key.clone(), value.clone());
-            }
-
-            vec_otel_json.push(Value::Object(resource_metric_json.clone()));
+            resource_metric_json.extend(resource_metrics_json.clone());
+            vec_otel_json.push(Value::Object(std::mem::take(resource_metric_json)));
         }

Alternatively, consider using Rc or Arc for shared metadata to avoid repeated cloning.
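A toy, self-contained demonstration of the suggested `extend` + `mem::take` pattern (the names here are invented; the real code operates on `serde_json` maps):

```rust
use std::collections::BTreeMap;
use std::mem;

// Toy demonstration of `extend` + `mem::take`: each per-metric map gets the
// shared metadata via `extend`, then is moved (not cloned) into the output,
// leaving an empty map behind.
fn merge_and_take(
    shared: &BTreeMap<String, i64>,
    per_metric: &mut Vec<BTreeMap<String, i64>>,
) -> Vec<BTreeMap<String, i64>> {
    let mut out = Vec::new();
    for m in per_metric.iter_mut() {
        m.extend(shared.clone()); // one clone of the shared metadata per metric
        out.push(mem::take(m));   // move the merged map instead of cloning it
    }
    out
}

fn main() {
    let shared = BTreeMap::from([("resource.attr".to_string(), 1)]);
    let mut per_metric = vec![BTreeMap::from([("value".to_string(), 42)])];
    let out = merge_and_take(&shared, &mut per_metric);
    assert_eq!(out[0]["resource.attr"], 1);
    assert_eq!(out[0]["value"], 42);
    assert!(per_metric[0].is_empty()); // taken, not copied
}
```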

src/handlers/http/ingest.rs (1)

199-225: Stream compatibility validation logic is correct.

The implementation correctly enforces the learned restrictions where OTEL logs can coexist with other log types (except metrics/traces), while OTEL metrics/traces require exact type matching. Consider adding a comment to document these rules for future maintainers.

Add a clarifying comment:

     // Validate stream compatibility
     if let Ok(stream) = PARSEABLE.get_stream(&stream_name) {
         match log_source {
             LogSource::OtelLogs => {
-                // For logs, reject if stream is metrics or traces
+                // OTEL logs can coexist with other log types but not with OTEL metrics/traces
                 stream
                     .get_log_source()
                     .iter()
                     .find(|&stream_log_source_entry| {
                         stream_log_source_entry.log_source_format != LogSource::OtelTraces
                             && stream_log_source_entry.log_source_format != LogSource::OtelMetrics
                     })
                     .ok_or(PostError::IncorrectLogFormat(stream_name.clone()))?;
             }
             LogSource::OtelMetrics | LogSource::OtelTraces => {
-                // For metrics/traces, only allow same type
+                // OTEL metrics/traces require strict type matching - no mixing allowed
                 stream
                     .get_log_source()
                     .iter()
                     .find(|&stream_log_source_entry| {
                         stream_log_source_entry.log_source_format == log_source
                     })
                     .ok_or(PostError::IncorrectLogFormat(stream_name.clone()))?;
             }
             _ => {}
         }
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a22a12c and bf2b2cf.

📒 Files selected for processing (5)
  • src/handlers/http/ingest.rs (6 hunks)
  • src/handlers/mod.rs (1 hunks)
  • src/otel/logs.rs (3 hunks)
  • src/otel/metrics.rs (6 hunks)
  • src/otel/traces.rs (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/handlers/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/otel/logs.rs
  • src/otel/traces.rs
🧰 Additional context used
🧠 Learnings: same entries as listed in the walkthrough comment above, plus one new entry:
src/handlers/http/ingest.rs (1)
Learnt from: nikhilsinhaparseable
PR: #1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.395Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
🔇 Additional comments (2)
src/otel/metrics.rs (1)

583-610: LGTM! Clean refactoring that promotes code reuse.

The refactoring successfully extracts common logic while maintaining type safety through closures. Both flatten_otel_metrics and flatten_otel_metrics_protobuf cleanly delegate to the generic helper with appropriate accessors.

src/handlers/http/ingest.rs (1)

295-362: Excellent refactoring of OTEL handlers.

The refactoring successfully eliminates code duplication by extracting common logic into setup_otel_stream and process_otel_content helpers. All three handlers now follow a consistent pattern and properly support both JSON and Protobuf formats.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 29, 2025
coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 29, 2025
@nitisht nitisht merged commit 8c8e86b into parseablehq:main Jul 29, 2025
13 checks passed
@nikhilsinhaparseable nikhilsinhaparseable deleted the protobuf-ingestion branch July 29, 2025 07:24