-
Notifications
You must be signed in to change notification settings - Fork 738
SOLR-17797: add prometheus formatter support for segment merge metrics #3407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't check every individual metric in Solr but we do test output structure. This confirms exposition format follows prometheus and our convention of metrics starting with solr_metrics_
. Since this is enabled only metrics from solrconfig.xml, the structure is never tested by default. Not saying you need to write a test to the exact output of this but I think it would be at least beneficial that the structure is asserted from this test which means it needs to be enabled in that class.
Also don't forget ./gradlew tidy
!
* INDEX.merge.errors | ||
* INDEX.merge.minor | ||
* INDEX.merge.minor.running | ||
* INDEX.merge.minor.running.docs | ||
* INDEX.merge.minor.running.segments | ||
* INDEX.merge.major | ||
* INDEX.merge.major.docs | ||
* INDEX.merge.major.deletedDocs | ||
* INDEX.merge.major.running | ||
* INDEX.merge.major.running.docs | ||
* INDEX.merge.major.running.segments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to comment every metric here you export. Just a few should be fine. Might also be worth mentioning in the comment that these must be enabled
public static final String CORE_INDEX_FLUSH_METRICS = "solr_metrics_core_index_flush"; | ||
public static final String CORE_INDEX_MERGE_ERRORS_METRICS = "solr_metrics_core_index_merge_errors"; | ||
|
||
public static final String CORE_INDEX_MERGE_METRICS = "solr_metrics_core_index_merges"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metric name seems wrong. It's a timer but nothing in this name explains its a timer. Name should be something like solr_core_index_merge_time_ms
.
I actually think the times are ns but this PR fixes up some of the timer implementation to get the quantiles out from this metric and gives a better understanding of the unit being used by converting to ms. I'd like to get that merged in as well so this can pick up the quantiles.
if (metricName.endsWith("sizeInBytes")) { | ||
formatter.exportGauge(CORE_INDEX_METRICS, (Gauge<?>) dropwizardMetric, getLabels()); | ||
} else if (metricName.endsWith("running")) { | ||
formatter.exportGauge(CORE_INDEX_MERGING_METRICS, (Gauge<?>) dropwizardMetric, getLabels()); | ||
} else if (metricName.endsWith("segments")) { | ||
formatter.exportGauge(CORE_INDEX_MERGING_SEGMENTS_METRICS, (Gauge<?>) dropwizardMetric, getLabels()); | ||
} else if (metricName.endsWith("docs")) { | ||
formatter.exportGauge(CORE_INDEX_MERGING_DOCS_METRICS, (Gauge<?>) dropwizardMetric, getLabels()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at these metrics myself, these are very confusing:
# TYPE solr_metrics_core_index_merged_deleted_docs_total counter
solr_metrics_core_index_merged_deleted_docs_total{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
# TYPE solr_metrics_core_index_merged_docs_total counter
solr_metrics_core_index_merged_docs_total{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
# TYPE solr_metrics_core_index_merges gauge
solr_metrics_core_index_merges{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="false",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merges{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="false",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merges{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="false",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merges{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="false",shard="shard1",type="minor"} 0.0
# TYPE solr_metrics_core_index_merges_running gauge
solr_metrics_core_index_merges_running{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merges_running{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merges_running{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merges_running{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
# TYPE solr_metrics_core_index_merging_docs gauge
solr_metrics_core_index_merging_docs{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_docs{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merging_docs{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_docs{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
# TYPE solr_metrics_core_index_merging_segments gauge
solr_metrics_core_index_merging_segments{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_segments{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merging_segments{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",shard="shard1"} 1.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",shard="shard1"} 0.0
They have running=true or false
but at the same time one of the metrics has the name running
in it with a label running=true
. Is it possible that running=false but the metric name says running? Can we aggregate these better? Like one metric name saying these are the running merges then the labels to differentiate by are type=minor/major
and item=docs/segments
. I'm also fine with keeping them separate metric names of docs vs segments
but the running label is confusing as all of them is running=true
and it doesn't seem to ever be running=false
But then also what are these last 2 metrics?
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="major"} 0.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",running="true",shard="shard1",type="minor"} 0.0
solr_metrics_core_index_merging_segments{collection="demo",core="core_demo_shard1_replica_n1",replica="replica_n1",shard="shard1"} 1.0
solr_metrics_core_index_merging_segments{collection="foobar",core="core_foobar_shard1_replica_n1",replica="replica_n1",shard="shard1"} 0.0
It has no label for minor/major
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the last two are INDEX.segments
metric (number of segments) that got inadvertently pulled in by the .endsWith convention
if (metricName.endsWith("flush")) { | ||
formatter.exportMeter(CORE_INDEX_FLUSH_METRICS, (Meter) dropwizardMetric, getLabels()); | ||
} else if (metricName.endsWith("docs")) { | ||
formatter.exportMeter(CORE_INDEX_MERGE_DOCS_METRICS, (Meter) dropwizardMetric, getLabels()); | ||
} else if (metricName.endsWith("deletedDocs")) { | ||
formatter.exportMeter(CORE_INDEX_MERGE_DELETED_DOCS_METRICS, (Meter) dropwizardMetric, getLabels()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be merged together? One single metric name with labels operation minor or major
which you are doing then the type like type=merged
or type=errors
. Even though they are Counter and Meter
I believe you can still group them because both Dropwizard meter and counters are just exported to prometheus counters.
https://issues.apache.org/jira/browse/SOLR-17797
Description
Add prometheus formatted metrics for:
INDEX.merge.errors
INDEX.merge.minor
INDEX.merge.minor.running
INDEX.merge.minor.running.docs
INDEX.merge.minor.running.segments
INDEX.merge.major
INDEX.merge.major.docs
INDEX.merge.major.deletedDocs
INDEX.merge.major.running
INDEX.merge.major.running.docs
INDEX.merge.major.running.segments
Solution
Add parsing and appropriate type based export for above listed metrics
Tests
Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.