
[Feature Request] TSDB on OpenSearch for large-scale metrics #19461

@yupeng9

Description


Is your feature request related to a problem? Please describe

OpenSearch is widely adopted for logging and tracing analytics in observability, and there is a growing industry trend toward a unified observability solution covering logs, traces, and metrics. However, OpenSearch is not efficient at storing large-scale metrics. We ran a benchmark that ingested metrics directly into OpenSearch, with one sample per document. Compared to a purpose-built TSDB like M3, OpenSearch (1) used more than 3x the storage and (2) delivered 66.5% lower ingestion throughput.

The root cause lies in OpenSearch's underlying storage engine, Lucene. While Lucene is a powerful general-purpose search library, it is not optimized for metrics workloads.
Consider the example shown in the figure below.

[Figure: metric samples stored one per Lucene document]

There are several inefficiencies with this one-sample-per-Lucene-document design:

  • Label Duplication
    Although Lucene is a column store, each sample document repeats the full set of labels, even though all samples from a time series share identical labels.

  • Lack of Sample Encoding
    Metrics data is highly compressible using delta encoding. However, the timestamp and value columns are shared across all time series, so samples ingested at the same time by different series are stored adjacently; consecutive entries in a column therefore belong to different series, which prevents series-level delta encoding (a minimal sketch of this layout follows the list).
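For concreteness, here is a minimal sketch of that baseline layout using Lucene's document API. The metric name, label names, and field names are hypothetical; the sketch only illustrates how every sample carries its full label set while timestamps and values go into columns shared by all series.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;

// Sketch of the one-sample-per-Lucene-document layout benchmarked above.
// All metric, label, and field names are hypothetical.
final class SamplePerDocSketch {
    static Document sampleDoc(long timestampMillis, double value) {
        Document doc = new Document();
        // The full label set is repeated on every single sample of the series.
        doc.add(new StringField("metric", "http_requests_total", Field.Store.NO));
        doc.add(new StringField("service", "checkout", Field.Store.NO));
        doc.add(new StringField("host", "host-42", Field.Store.NO));
        // Timestamp and value columns are shared by all series, so adjacent
        // entries in these doc-values columns usually belong to different series.
        doc.add(new LongPoint("@timestamp", timestampMillis));
        doc.add(new NumericDocValuesField("@timestamp", timestampMillis));
        doc.add(new NumericDocValuesField("value", Double.doubleToLongBits(value)));
        return doc;
    }
}
```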

Describe the solution you'd like

A native time-series index in OpenSearch

To overcome these inefficiencies, we propose a time-series index concept in OpenSearch. Inspired by Prometheus, which groups time series into 2-hour blocks (further split into 20-minute chunks), this design stores each chunk of samples from a series as a single Lucene document.

Key benefits:

  • Shared labels: Labels are stored once per chunk, not per sample.
  • Efficient encoding: Consecutive samples can be compressed using techniques like XOR or delta encoding.

As shown in the figure below, samples from the same series are grouped into a chunk, enabling compression and efficient retrieval. We'll elaborate further on the chunk and block details in the design section below.

[Figure: samples from the same series grouped into chunk documents]
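As a rough sketch of the proposed layout (not the actual on-disk format), a chunk document could store the labels once, the chunk's time range for filtering, and the encoded samples as a single binary doc value. All field names below are hypothetical.

```java
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.util.BytesRef;

// Sketch of a chunk-per-document layout: one Lucene document holds a whole
// chunk of encoded samples for a single series. Field names are hypothetical.
final class ChunkPerDocSketch {
    static Document chunkDoc(String[] labelKeys, String[] labelValues,
                             long minTimestamp, long maxTimestamp,
                             byte[] encodedSamples) {
        Document doc = new Document();
        // Labels are stored once per chunk instead of once per sample.
        for (int i = 0; i < labelKeys.length; i++) {
            doc.add(new StringField(labelKeys[i], labelValues[i], Field.Store.NO));
        }
        // Chunk time range, indexed so that relevant chunks can be filtered
        // during the query phase before any samples are decoded.
        doc.add(new LongPoint("chunk_min_ts", minTimestamp));
        doc.add(new LongPoint("chunk_max_ts", maxTimestamp));
        // Delta/XOR-encoded samples kept as one opaque binary doc value.
        doc.add(new BinaryDocValuesField("chunk_samples", new BytesRef(encodedSamples)));
        return doc;
    }
}
```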

Real-Time Queryability

A critical requirement is that metrics must be queryable in real time. Lucene supports only near-real-time (NRT) search, which requires indexed data to be flushed into a segment before it becomes visible. Waiting for a 20-minute chunk to complete and be flushed is unacceptable: metrics systems must make the latest sample queryable within seconds for monitoring use cases.

To address this, we introduce a LiveSeriesIndex concept: an in-memory data structure that consumes new data and buffers it in memory. The live index allows quick lookups and updates through a map structure, which also helps handle temporarily late arrivals by supporting some out-of-order insertion. The live index can be queried through an IndexReader API that integrates with Lucene.
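A minimal sketch of such a live index is shown below, assuming one sorted map per series; the class and method names are hypothetical, and the real structure would additionally expose a Lucene-compatible IndexReader view as described above.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical in-memory live index: seriesKey -> (timestamp -> value).
final class LiveSeriesIndexSketch {
    private final Map<String, NavigableMap<Long, Double>> series = new ConcurrentHashMap<>();

    // Appends a sample; the sorted map tolerates modestly late arrivals,
    // since insertion order does not have to match timestamp order.
    void append(String seriesKey, long timestamp, double value) {
        series.computeIfAbsent(seriesKey, k -> new ConcurrentSkipListMap<>())
              .put(timestamp, value);
    }

    // Returns the buffered samples of one series within [from, to].
    NavigableMap<Long, Double> read(String seriesKey, long from, long to) {
        NavigableMap<Long, Double> samples = series.get(seriesKey);
        if (samples == null) {
            return new ConcurrentSkipListMap<>(); // nothing buffered for this series
        }
        return samples.subMap(from, true, to, true);
    }
}
```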

When we close a live index, we can optimize the values in the chunk, for example by applying delta-of-delta encoding, so that the closed chunk has an efficient compression format and a smaller storage footprint. Once a chunk is closed, its files become immutable, and it can be lazily loaded into memory via mmap when the chunk is queried.
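To illustrate why closed chunks compress well, here is a sketch of delta-of-delta encoding applied to timestamps only. A real implementation would bit-pack the resulting values (as in Gorilla or Prometheus) and pair this with XOR encoding for the sample values; the sketch just shows that regular scrape intervals collapse into long runs of zeros.

```java
import java.util.ArrayList;
import java.util.List;

// Delta-of-delta over timestamps: the first timestamp is stored raw, the
// second as a delta, and every later one as the change between deltas.
final class DeltaOfDeltaSketch {
    static List<Long> encodeTimestamps(long[] timestamps) {
        List<Long> out = new ArrayList<>();
        long prev = 0;
        long prevDelta = 0;
        for (int i = 0; i < timestamps.length; i++) {
            if (i == 0) {
                out.add(timestamps[0]);          // raw first timestamp
            } else {
                long delta = timestamps[i] - prev;
                out.add(delta - prevDelta);      // 0 for perfectly regular intervals
                prevDelta = delta;
            }
            prev = timestamps[i];
        }
        return out;
    }
}
```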

[Figure: live index alongside closed, immutable chunks]

Ingestion via Metrics Engine

We will also introduce a new MetricsEngine as a plugin that supports the time-series index data structure. For indexing, the MetricsEngine appends data points to the LiveSeriesIndex. When the flush method is invoked on the engine, it checks whether the live index is full and closes it if so. We will reuse the translog for recovery, so the samples in the LiveSeriesIndex can be reconstructed during recovery.
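A minimal sketch of how such an engine could be wired in, assuming the existing EnginePlugin extension point, is shown below. The MetricsEngine constructor and the index.metrics.enabled setting are assumptions made for illustration, not the final design.

```java
import java.util.Optional;

import org.opensearch.index.IndexSettings;
import org.opensearch.index.engine.EngineFactory;
import org.opensearch.plugins.EnginePlugin;
import org.opensearch.plugins.Plugin;

// Sketch of plugin wiring: indices flagged as metrics indices get the
// MetricsEngine, everything else keeps the default engine.
public class TsdbPlugin extends Plugin implements EnginePlugin {

    @Override
    public Optional<EngineFactory> getEngineFactory(IndexSettings indexSettings) {
        // "index.metrics.enabled" is a hypothetical per-index setting.
        if (indexSettings.getSettings().getAsBoolean("index.metrics.enabled", false)) {
            // MetricsEngine is the engine described above; it appends incoming
            // data points to the LiveSeriesIndex and closes full chunks on flush.
            return Optional.of(config -> new MetricsEngine(config));
        }
        return Optional.empty(); // fall back to the default InternalEngine
    }
}
```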

Extending OpenSearch query framework

To retrieve the encoded samples, we first filter the relevant chunks by applying the query conditions to the metric labels during the OpenSearch query phase; we then decode the actual samples during the aggregation phase and run the pipeline of metric transformations. To support this metrics pipeline, we extended the OpenSearch aggregators with two additions:

  • UnfoldAggregator: unfolds the chunks and the LiveSeriesIndex of a time series into samples during the collection phase, then applies a series of Stage functions such as floor or sum (see the sketch after this list). The aggregated results are sent to the coordinator for global aggregation.

    [Figure: UnfoldAggregator]

  • CoordinatorPipelineAggregator: an extension of SiblingPipelineAggregator for coordinator-only Stage transformations. The coordinator pipeline is crucial for supporting multiple bucket paths. It also supports macro definitions, so that named macros can be referenced from the main pipeline. For example, in M3: macros: {e = a | asPercent(b)}, main pipeline: c | asPercent(e).
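The following sketch illustrates the Stage-function idea only; it is not the actual aggregator code, and Stage, FLOOR, SUM, and apply are hypothetical names showing how decoded samples could flow through a chain of transformations before shard-level results are sent to the coordinator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative Stage pipeline over the decoded samples of a time series.
final class StagePipelineSketch {

    interface Stage extends UnaryOperator<double[]> {}

    static final Stage FLOOR = samples ->
            Arrays.stream(samples).map(Math::floor).toArray();

    static final Stage SUM = samples ->
            new double[] { Arrays.stream(samples).sum() };

    // Unfolded samples pass through each Stage in order, e.g. floor then sum.
    static double[] apply(double[] decodedSamples, List<Stage> stages) {
        double[] result = decodedSamples;
        for (Stage stage : stages) {
            result = stage.apply(result);
        }
        return result;
    }
}
```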

Metrics Language Support

Since we extended the OpenSearch DSL for metric query execution, we also need to build adapters for existing metrics languages such as PromQL and M3QL. For each language, we provide a parser and a planner that translate queries into the OpenSearch DSL using the extended operators described above.
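Conceptually, each language adapter could expose a small planning interface such as the hypothetical one below; SearchSourceBuilder is the standard OpenSearch search request body builder, while the interface and method names here are assumptions for illustration.

```java
import org.opensearch.search.builder.SearchSourceBuilder;

// Hypothetical adapter contract: parse a PromQL/M3QL query and plan it into
// the extended OpenSearch DSL (unfold + coordinator pipeline aggregations).
interface MetricsQueryPlanner {
    SearchSourceBuilder plan(String metricsQuery, long startMillis, long endMillis);
}
```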

Related component

Extensions

Describe alternatives you've considered

No response

Additional context

TSDB Plugin

We plan to implement this TSDB as a plugin to OpenSearch and open-source it once the core is complete. We will also share more detailed technical specifications as this effort progresses.

Labels: RFC, enhancement, untriaged
