Feature Request: Custom Metrics Collection and Registration #20

@laplaque

Overview

Extend the Instana plugins architecture to support custom metrics collection beyond the default process monitoring metrics, with TOML-based configuration and multiple OpenTelemetry metric types.

Current Limitations

Fixed Metric Set: All services currently collect the same predefined process metrics:

  • CPU/memory usage, process count, disk I/O, file descriptors, thread counts, context switches

Single Metric Type: All metrics are implemented as OpenTelemetry Observable Gauges, regardless of their semantic meaning

No Extensibility: Services cannot register application-specific or business metrics

No Configuration: The metric_types field in plugin.toml is not actively used for customization

Proposed Solution

1. TOML-Based Custom Metrics Configuration

Extend plugin.toml to support custom metric definitions:

[custom_metrics]
enabled = true

[[custom_metrics.sources]]
name = "jmx_collector"
type = "jmx"
endpoint = "localhost:9999"
metrics = [
    {name = "heap_memory", otel_type = "gauge", unit = "bytes"},
    {name = "thread_pool_size", otel_type = "updowncounter", unit = "1"}
]

[[custom_metrics.sources]]
name = "log_parser"
type = "log_file"
path = "/var/log/app.log"
metrics = [
    {name = "error_count", otel_type = "counter", pattern = "ERROR"},
    {name = "response_time", otel_type = "histogram", pattern = "Transaction.*completed in (\\d+)ms", buckets = [10, 50, 100, 500, 1000]}
]

[[custom_metrics.sources]]
name = "http_endpoint"
type = "http"
url = "http://localhost:8080/metrics"
format = "json"
interval = 30
metrics = [
    {name = "active_sessions", otel_type = "gauge", json_path = "$.sessions.active"},
    {name = "requests_total", otel_type = "counter", json_path = "$.requests.total"}
]

2. Multiple OpenTelemetry Metric Types

Support all OpenTelemetry metric instruments:

Counter (monotonically increasing):

counter = self.meter.create_counter("requests_total")
counter.add(1, {"method": "GET"})

UpDownCounter (can increase/decrease):

updown = self.meter.create_up_down_counter("active_connections")
updown.add(1)   # connection opened
updown.add(-1)  # connection closed

Histogram (distributions/latencies):

histogram = self.meter.create_histogram("request_duration_ms")
histogram.record(125.3, {"endpoint": "/api/users"})

Observable Gauge (current implementation):

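# Observable gauges report values through callbacks; self._observe_cpu is a hypothetical callback name.
gauge = self.meter.create_observable_gauge("cpu_usage", callbacks=[self._observe_cpu])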
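
One way the `otel_type` strings from the TOML configuration could be mapped onto these instruments is a small factory function. The sketch below is an assumption about how this might look, not existing code: `create_instrument` is a hypothetical name, and `meter` is expected to be an OpenTelemetry `Meter` (only its `create_*` methods are used, so the function itself needs no imports from the SDK).

```python
from typing import Any, Callable, Iterable, Optional


def create_instrument(
    meter: Any,
    name: str,
    otel_type: str,
    unit: str = "",
    callbacks: Optional[Iterable[Callable]] = None,
) -> Any:
    """Create the instrument matching a configured otel_type string."""
    if otel_type == "counter":
        return meter.create_counter(name, unit=unit)
    if otel_type == "updowncounter":
        return meter.create_up_down_counter(name, unit=unit)
    if otel_type == "histogram":
        return meter.create_histogram(name, unit=unit)
    if otel_type == "gauge":
        # Observable gauges are read through callbacks, not updated directly.
        return meter.create_observable_gauge(
            name, callbacks=list(callbacks or []), unit=unit
        )
    raise ValueError(f"unsupported otel_type: {otel_type!r}")
```

Keeping the mapping in one place means a new instrument kind only has to be added here and in the configuration validator.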

3. Pluggable Metric Source System

Built-in Source Types:

  • jmx: Java Management Extensions
  • log_file: Log file parsing with regex patterns
  • http: HTTP/REST endpoint polling
  • database: SQL query execution
  • file: File-based metrics (JSON, CSV, etc.)
  • command: Execute shell commands

Custom Source Plugins: Allow third-party metric collectors
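
To make the `log_file` source type more concrete, here is a minimal sketch of applying configured regex patterns to log lines. It assumes that counters count matching lines and that histograms record the first capture group as a number; the function name `parse_log_metrics` and the sample log lines are made up for illustration.

```python
import re
from typing import Iterable


def parse_log_metrics(lines: Iterable[str], metrics: list[dict]) -> dict:
    """Apply each metric's regex pattern to the given log lines.

    Counters yield the number of matching lines; histograms yield the list
    of values captured by the first group of the pattern.
    """
    compiled = [(m, re.compile(m["pattern"])) for m in metrics]
    results = {
        m["name"]: ([] if m["otel_type"] == "histogram" else 0) for m in metrics
    }
    for line in lines:
        for metric, regex in compiled:
            match = regex.search(line)
            if match is None:
                continue
            if metric["otel_type"] == "histogram":
                results[metric["name"]].append(float(match.group(1)))
            else:
                results[metric["name"]] += 1
    return results


METRICS = [
    {"name": "error_count", "otel_type": "counter", "pattern": "ERROR"},
    {
        "name": "response_time",
        "otel_type": "histogram",
        "pattern": r"Transaction.*completed in (\d+)ms",
    },
]
SAMPLE_LINES = [
    "INFO Transaction 42 completed in 125ms",
    "ERROR database timeout",
    "INFO Transaction 43 completed in 80ms",
]

parsed = parse_log_metrics(SAMPLE_LINES, METRICS)
```

For the sample lines, `parsed` is `{"error_count": 1, "response_time": [125.0, 80.0]}`; a real source would then feed these values into the matching counter and histogram.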

4. Architecture Changes Required

New Components:

  • MetricSourceRegistry: Manage metric source plugins
  • CustomMetricCollector: Orchestrate custom metric collection
  • MetricSourceBase: Abstract base class for metric sources
  • ConfigValidator: Validate TOML metric configurations
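
A possible shape for `MetricSourceBase` and `MetricSourceRegistry` is sketched below. The method names (`collect`, `register`, `create`) and the toy `StaticSource` class are assumptions for illustration only; the proposal does not fix an interface yet.

```python
from abc import ABC, abstractmethod


class MetricSourceBase(ABC):
    """Abstract base class for metric sources."""

    def __init__(self, config: dict) -> None:
        self.config = config

    @abstractmethod
    def collect(self) -> dict:
        """Return a mapping of metric name to its current value."""


class MetricSourceRegistry:
    """Maps the TOML `type` string of a source to a metric source class."""

    def __init__(self) -> None:
        self._sources: dict[str, type[MetricSourceBase]] = {}

    def register(self, type_name: str, source_cls: type[MetricSourceBase]) -> None:
        if not issubclass(source_cls, MetricSourceBase):
            raise TypeError(f"{source_cls!r} must subclass MetricSourceBase")
        self._sources[type_name] = source_cls

    def create(self, config: dict) -> MetricSourceBase:
        """Instantiate the source class registered for config['type']."""
        try:
            source_cls = self._sources[config["type"]]
        except KeyError:
            raise ValueError(
                f"unknown metric source type: {config.get('type')!r}"
            ) from None
        return source_cls(config)


class StaticSource(MetricSourceBase):
    """Toy third-party source used only to exercise the registry."""

    def collect(self) -> dict:
        return dict(self.config.get("values", {}))


registry = MetricSourceRegistry()
registry.register("static", StaticSource)
source = registry.create({"type": "static", "values": {"active_sessions": 3}})
```

With this shape, built-in and third-party sources would be registered the same way, keyed by the `type` value used in plugin.toml.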

Modified Components:

  • base_sensor.py: Integrate custom metrics into monitoring loop
  • otel_connector.py: Support multiple OpenTelemetry metric types
  • toml_utils.py: Parse and validate custom metrics configuration
  • metadata_store.py: Store custom metric metadata
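
How the monitoring loop in base_sensor.py would call into custom collection is still open. The sketch below shows one option under stated assumptions: a `CustomMetricCollector` that honours a per-source `interval` (as in the `http` example) and catches exceptions so that a failing source cannot break the existing loop. The constructor arguments, the 60-second default interval, and the `collect_due` method are hypothetical.

```python
import logging
import time

logger = logging.getLogger(__name__)


class CustomMetricCollector:
    """Polls sources on their own interval and isolates their failures."""

    def __init__(self, sources, default_interval=60.0, clock=time.monotonic):
        # Each source is a tuple: (name, collect_callable, interval_seconds or None).
        self._clock = clock
        self._entries = [
            {
                "name": name,
                "collect": collect,
                "interval": interval or default_interval,
                "next_due": float("-inf"),  # every source is due on the first pass
            }
            for name, collect, interval in sources
        ]

    def collect_due(self) -> dict:
        """Collect from every source whose interval has elapsed."""
        now = self._clock()
        results = {}
        for entry in self._entries:
            if now < entry["next_due"]:
                continue
            entry["next_due"] = now + entry["interval"]
            try:
                results[entry["name"]] = entry["collect"]()
            except Exception:
                # A failing custom source must not break the monitoring loop.
                logger.exception("custom metric source %s failed", entry["name"])
        return results
```

The existing loop could call `collect_due()` once per iteration; sources that are not yet due or that raise are skipped for that pass.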

5. Implementation Phases

Phase 1: Core Infrastructure

  • Metric source plugin system
  • TOML configuration parsing
  • Multiple OTel metric type support

Phase 2: Basic Source Types

  • HTTP endpoint polling
  • Log file parsing
  • Command execution

Phase 3: Advanced Sources

  • JMX integration
  • Database connectivity
  • File-based metrics

Phase 4: Advanced Features

  • Custom aggregation functions
  • Metric transformations
  • Conditional collection

Impact Assessment

Complexity: Major architectural change requiring significant development and testing effort

Backward Compatibility: Must maintain existing functionality without breaking changes

Performance: Custom metric collection could impact monitoring overhead

Dependencies: May require additional Python packages (JMX, database drivers, etc.)

Success Criteria

  • Services can define custom metrics via TOML configuration
  • Support for all OpenTelemetry metric types
  • Pluggable architecture for metric sources
  • No behavior change for existing services that do not enable custom metrics
  • Comprehensive test coverage for all source types
  • Performance benchmarks showing acceptable overhead

Target Release

Proposed for v1.0.0 as a major feature release after psutil migration stabilization.

Dependencies

  • psutil migration (recently completed)
  • Stable OpenTelemetry integration
  • TOML configuration framework enhancements

Related Work

This feature request emerged from discussions about extending the current process monitoring capabilities to support application-specific metrics and business KPIs that go beyond standard system metrics.

Labels

  • enhancement: New feature or request
  • major: Related to a major fix or architectural change