Skip to content

SOLR-17797: add prometheus formatter support for segment merge metrics #3407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

liangkaiwen
Copy link

@liangkaiwen liangkaiwen commented Jun 26, 2025

https://issues.apache.org/jira/browse/SOLR-17797

Description

Add prometheus formatted metrics for:

INDEX.merge.errors
INDEX.merge.minor
INDEX.merge.minor.running
INDEX.merge.minor.running.docs
INDEX.merge.minor.running.segments
INDEX.merge.major
INDEX.merge.major.docs
INDEX.merge.major.deletedDocs
INDEX.merge.major.running
INDEX.merge.major.running.docs
INDEX.merge.major.running.segments

Solution

Add parsing and appropriate type based export for above listed metrics

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Copy link
Contributor

@mlbiscoc mlbiscoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't check every individual metric in Solr but we do test output structure. This confirms exposition format follows prometheus and our convention of metrics starting with solr_metrics_. Since this is enabled only metrics from solrconfig.xml, the structure is never tested by default. Not saying you need to write a test to the exact output of this but I think it would be at least beneficial that the structure is asserted from this test which means it needs to be enabled in that class.

Also don't forget ./gradlew tidy!

Comment on lines +47 to +57
* INDEX.merge.errors
* INDEX.merge.minor
* INDEX.merge.minor.running
* INDEX.merge.minor.running.docs
* INDEX.merge.minor.running.segments
* INDEX.merge.major
* INDEX.merge.major.docs
* INDEX.merge.major.deletedDocs
* INDEX.merge.major.running
* INDEX.merge.major.running.docs
* INDEX.merge.major.running.segments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to comment every metric here you export. Just a few should be fine. Might also be worth mentioning in the comment that these must be enabled

public static final String CORE_INDEX_FLUSH_METRICS = "solr_metrics_core_index_flush";
public static final String CORE_INDEX_MERGE_ERRORS_METRICS = "solr_metrics_core_index_merge_errors";

public static final String CORE_INDEX_MERGE_METRICS = "solr_metrics_core_index_merges";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This metric name seems wrong. It's a timer but nothing in this name explains its a timer. Name should be something like solr_core_index_merge_time_ms.

I actually think the times are ns but this PR fixes up some of the timer implementation to get the quantiles out from this metric and gives a better understanding of the unit being used by converting to ms. I'd like to get that merged in as well so this can pick up the quantiles.

Comment on lines +80 to +87
if (metricName.endsWith("sizeInBytes")) {
formatter.exportGauge(CORE_INDEX_METRICS, (Gauge<?>) dropwizardMetric, getLabels());
} else if (metricName.endsWith("running")) {
formatter.exportGauge(CORE_INDEX_MERGING_METRICS, (Gauge<?>) dropwizardMetric, getLabels());
} else if (metricName.endsWith("segments")) {
formatter.exportGauge(CORE_INDEX_MERGING_SEGMENTS_METRICS, (Gauge<?>) dropwizardMetric, getLabels());
} else if (metricName.endsWith("docs")) {
formatter.exportGauge(CORE_INDEX_MERGING_DOCS_METRICS, (Gauge<?>) dropwizardMetric, getLabels());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at these metrics myself, these are very confusing:

# TYPE solr_metrics_core_index_merged_deleted_docs_total counter
solr_metrics_core_index_merged_deleted_docs_total{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
# TYPE solr_metrics_core_index_merged_docs_total counter
solr_metrics_core_index_merged_docs_total{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
# TYPE solr_metrics_core_index_merges gauge
solr_metrics_core_index_merges{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="false",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merges{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="false",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merges{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="false",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merges{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="false",shard="shard1",type="minor"} 0.0
# TYPE solr_metrics_core_index_merges_running gauge
solr_metrics_core_index_merges_running{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merges_running{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merges_running{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merges_running{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
# TYPE solr_metrics_core_index_merging_docs gauge
solr_metrics_core_index_merging_docs{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_docs{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merging_docs{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_docs{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
# TYPE solr_metrics_core_index_merging_segments gauge
solr_metrics_core_index_merging_segments{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_segments{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merging_segments{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",shard="shard1"} 1.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",shard="shard1"} 0.0

They have running=true or false but at the same time one of the metrics has the name running in it with a label running=true. Is it possible that running=false but the metric name says running? Can we aggregate these better? Like one metric name saying these are the running merges then the labels to differentiate by are type=minor/major and item=docs/segments. I'm also fine with keeping them separate metric names of docs vs segments but the running label is confusing as all of them is running=true and it doesn't seem to ever be running=false But then also what are these last 2 metrics?

solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merging_segments{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",shard="shard1"} 1.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",shard="shard1"} 0.0

It has no label for minor/major

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the last two are INDEX.segments metric (number of segments) that got inadvertently pulled in by the .endsWith convention

Comment on lines +98 to +103
if (metricName.endsWith("flush")) {
formatter.exportMeter(CORE_INDEX_FLUSH_METRICS, (Meter) dropwizardMetric, getLabels());
} else if (metricName.endsWith("docs")) {
formatter.exportMeter(CORE_INDEX_MERGE_DOCS_METRICS, (Meter) dropwizardMetric, getLabels());
} else if (metricName.endsWith("deletedDocs")) {
formatter.exportMeter(CORE_INDEX_MERGE_DELETED_DOCS_METRICS, (Meter) dropwizardMetric, getLabels());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be merged together? One single metric name with labels operation minor or major which you are doing then the type like type=merged or type=errors. Even though they are Counter and Meter I believe you can still group them because both Dropwizard meter and counters are just exported to prometheus counters.

@mlbiscoc mlbiscoc changed the title add prometheus formatter support for segment merge metrics SOLR-17797: add prometheus formatter support for segment merge metrics Jun 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants