[metrics] Add per-instance revisit interval histogram for capacity planning #206
lpdink wants to merge 19 commits into
…tracking
- Add RevisitIntervalHistogram class with configurable bucket boundaries
- Integrate histogram into MetaLocalBackend::GetForOneKey() hot path
- Track revisit intervals per instance using the instance_id label
- Support Prometheus histogram format export
- Add revisit_interval_buckets config in server_config.conf
- Default buckets: 1,5,30,60,120,180,300,600,900,1800,3600,21600,86400 (seconds)
The histogram records the time interval between consecutive cache hits for each block, enabling capacity planning and cache efficiency analysis.
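A minimal sketch of the recording logic this commit describes, assuming a standalone class with std::atomic counters in place of KVCM's own Counter/MetricsRegistry types (RevisitHistogramSketch and its accessors are hypothetical names). Each observation bumps every bucket whose upper bound covers the interval (cumulative le semantics), the count, and a sum kept in integer microseconds. It keeps zero-length intervals and drops only negative ones, following a later review suggestion rather than the PR's code:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for RevisitIntervalHistogram; boundaries are upper bounds in seconds.
class RevisitHistogramSketch {
public:
    explicit RevisitHistogramSketch(std::vector<double> boundaries_s)
        : boundaries_s_(std::move(boundaries_s)), buckets_(boundaries_s_.size()) {}

    // Records one revisit interval given in microseconds.
    void Observe(int64_t interval_us) {
        if (interval_us < 0) return;  // clock moved backwards: drop the sample
        const double interval_s = static_cast<double>(interval_us) / 1e6;
        for (std::size_t i = 0; i < boundaries_s_.size(); ++i) {
            // Cumulative "le" semantics: every bucket whose bound covers the interval.
            if (boundaries_s_[i] >= interval_s) {
                buckets_[i].fetch_add(1, std::memory_order_relaxed);
            }
        }
        count_.fetch_add(1, std::memory_order_relaxed);
        sum_us_.fetch_add(static_cast<uint64_t>(interval_us), std::memory_order_relaxed);
    }

    uint64_t Bucket(std::size_t i) const { return buckets_[i].load(std::memory_order_relaxed); }
    uint64_t Count() const { return count_.load(std::memory_order_relaxed); }
    uint64_t SumUs() const { return sum_us_.load(std::memory_order_relaxed); }

private:
    std::vector<double> boundaries_s_;
    std::vector<std::atomic<uint64_t>> buckets_;  // value-initialized to zero
    std::atomic<uint64_t> count_{0};
    std::atomic<uint64_t> sum_us_{0};
};
```

Relaxed atomics keep each counter individually consistent; a concurrent scrape may see one Observe() call partially applied across bucket, count, and sum.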
Critical bugs fixed:
- _sum stored in microseconds (not milliseconds); PrometheusExporter converts to seconds on output to avoid a 1000x error in quantile calcs
- Histogram TYPE header no longer duplicates when _bucket/_sum/_count are different metric names in the registry
- Default bucket boundaries hardcoded as fallback when config is empty
Quality improvements:
- Eliminated duplicate ParseRevisitIntervalBuckets; consolidated into ServerConfig::ParseRevisitIntervalBuckets static method
- Replaced dynamic_cast with virtual SetRevisitHistogram on MetaCacheBaseBackend (default no-op)
- Removed redundant forward declaration in meta_storage_backend_manager.h
- Added missing newline at end of default_server_config.conf
Test updates:
- revisit_interval_histogram_test: sum expectations updated to microseconds
- prometheus_exporter_test: _sum output verified as seconds (15.0 not 15000)
- Use only the _bucket suffix to identify histogram families (avoids false positives on metrics like meta_indexer.total_key_count)
- Leverage alphabetical ordering: _bucket < _count < _sum, so _bucket is always encountered first in the sorted metric iteration
- Fix missed ObserveSingleValue test expectation (3000 -> 3000000 us)
- Add RevisitIntervalHistogramTest target to metrics/test/BUILD
- Fix le label to use clean integer format (1, 60, 3600) instead of std::to_string output (1.000000, 60.000000, 3600.000000)
- Update Observe() method comment: sum stored in microseconds, not ms
- Add explicit constraint note about histogram family detection relying on alphabetical ordering of metric names
- Add LeLabelFormatting test to verify integer boundaries are clean
…ction with explicit family mapping
- Add histogram_families_ set and metric_to_family_ map to MetricsRegistry
- Add RegisterHistogramFamily, MapMetricToFamily, GetMetricFamily, GetHistogramFamilies APIs
- RevisitIntervalHistogram::Init() explicitly registers family and metric mappings
- PrometheusExporter uses GetMetricFamily() query instead of suffix stripping
- Remove ExtractHistogramFamilyFromBucket and IsHistogramSubMetric helpers
- Exporter no longer contains histogram naming convention logic
- Update tests with family registration and add 3 new test cases
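A simplified sketch of explicit family mapping, assuming string-keyed std::set/std::map members; FamilyRegistrySketch and EmitTypeHeaders are hypothetical, and the real MetricsRegistry signatures may differ. Because membership is looked up rather than inferred from suffixes, a metric such as meta_indexer.total_key_count is never mistaken for a histogram, and the TYPE header is emitted once per family regardless of iteration order:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical, simplified registry; the real MetricsRegistry signatures may differ.
class FamilyRegistrySketch {
public:
    void RegisterHistogramFamily(const std::string& family) { families_.insert(family); }
    void MapMetricToFamily(const std::string& metric, const std::string& family) {
        metric_to_family_[metric] = family;
    }
    // Returns an empty string when the metric belongs to no registered histogram family.
    std::string GetMetricFamily(const std::string& metric) const {
        auto it = metric_to_family_.find(metric);
        if (it == metric_to_family_.end() || families_.count(it->second) == 0) return "";
        return it->second;
    }

private:
    std::set<std::string> families_;
    std::map<std::string, std::string> metric_to_family_;
};

// Emits "# TYPE <family> histogram" once per family; non-histogram metrics are skipped here.
std::string EmitTypeHeaders(const FamilyRegistrySketch& reg, const std::set<std::string>& metrics) {
    std::ostringstream out;
    std::set<std::string> emitted;
    for (const auto& metric : metrics) {
        const std::string family = reg.GetMetricFamily(metric);
        if (!family.empty() && emitted.insert(family).second) {
            out << "# TYPE " << family << " histogram\n";
        }
    }
    return out.str();
}
```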
…ndle single-backend mode
- Move SetRevisitHistogram from MetaCacheBaseBackend to MetaStorageBackend base class
- Update MetaStorageBackendManager to forward histogram to both persistent_backend_ and cache_backend_
- This fixes the issue where histogram was not being injected in single-backend mode
- Simplifies the interface by having a single virtual method at the storage backend level
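A sketch of the injection pattern, assuming simplified class names (StorageBackendSketch, LocalBackendSketch, BackendManagerSketch, and the empty HistogramStub placeholder are all hypothetical). The base class supplies a no-op virtual, only the backend that records revisits overrides it, and the manager forwards to whichever backends exist, which is what lets single-backend mode work without dynamic_cast:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct HistogramStub {};  // placeholder for the real histogram type

class StorageBackendSketch {
public:
    virtual ~StorageBackendSketch() = default;
    // Default no-op: backends that do not track revisits simply ignore the histogram.
    virtual void SetRevisitHistogram(std::shared_ptr<HistogramStub> /*histogram*/) {}
};

class LocalBackendSketch : public StorageBackendSketch {
public:
    void SetRevisitHistogram(std::shared_ptr<HistogramStub> histogram) override {
        histogram_ = std::move(histogram);
    }
    bool HasHistogram() const { return histogram_ != nullptr; }

private:
    std::shared_ptr<HistogramStub> histogram_;
};

class BackendManagerSketch {
public:
    BackendManagerSketch(std::unique_ptr<StorageBackendSketch> persistent,
                         std::unique_ptr<StorageBackendSketch> cache)
        : persistent_(std::move(persistent)), cache_(std::move(cache)) {}

    // Forwards to every backend that exists, so single-backend mode (no cache) is covered.
    void SetRevisitHistogram(const std::shared_ptr<HistogramStub>& histogram) {
        if (persistent_) persistent_->SetRevisitHistogram(histogram);
        if (cache_) cache_->SetRevisitHistogram(histogram);
    }

private:
    std::unique_ptr<StorageBackendSketch> persistent_;
    std::unique_ptr<StorageBackendSketch> cache_;
};
```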
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f22b9aed08
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Review Summary
The PR adds a well-structured per-instance revisit interval histogram with good test coverage (363 lines of tests), clean propagation through the backend chain, and correct Prometheus cumulative semantics. The overall approach is sound.
Issues found (not covered by the existing Codex review):
- std::move on a shared registry member (meta_indexer_manager.cc:12): std::move on the caller's shared_ptr is unnecessary and a readability trap; prefer a plain copy.
- Header comment unit mismatch (revisit_interval_histogram.h:63): sum_counter_ is documented as "milliseconds" but stores microseconds.
- Double-parse of bucket config (server.cc:58-63): ParseRevisitIntervalBuckets runs in both Check() and Init(), with the Init() version silently falling back to defaults even after a successful Check().
- _sum us→s conversion is family-agnostic (prometheus_exporter.cc:167-170): the division by 1e6 applies to any future histogram family's _sum metric, making the "reusable infrastructure" claim fragile.
The Codex bot already flagged the concurrent stored_time/TouchAccessTime() race, missing coverage of the GetLocations(location_ids) path, invalid bucket config silently ignored at parse time, and empty bucket filtering. Those are all still valid and worth addressing.
🤖 Generated by Qoder
…ation_ids) path and fix comment
- Add Observe() call in GetLocations(location_ids) before TouchAccessTime(); this path bypasses GetForOneKey() and was missing histogram recording
- Fix sum_counter_ comment: milliseconds -> microseconds
oldsharp left a comment
Added some minor feedback.
Revert clang-format/buildifier side-effects on unrelated code:
- metrics_registry.h: DEFINE_METRICS_NAME_ macro reformatting
- metrics/BUILD: deps alphabetical reordering
- meta_indexer.cc: KVCM_METRICS_COLLECTOR_SET_METRICS line breaks (3 places)
- meta_indexer.h: RmwStats comment alignment
- meta_local_backend.cc: DeleteForOneKey collapsed to single line
- meta_local_backend.h: CreateAndInsertIfAbsent/UpdateInPlace declaration wraps
- meta_storage_backend_manager.cc: KVCM_METRICS_COLLECTOR_SET_METRICS line breaks (4 places)
- meta/BUILD: trailing empty line removed
- Replace postfix with prefix increment operators (bucket_counters_[i]++, count_counter_++) since the return value is discarded
- Remove unused #include <algorithm>
- Add explanation for 'le' (less-or-equal) Prometheus histogram label
- Fix ParseRevisitIntervalBuckets doc: returns empty vector on error, not default boundaries (caller applies defaults)
… ServerConfig
Move hardcoded default bucket boundaries from server.cc to ServerConfig::GetDefaultRevisitIntervalBuckets() to centralize configuration and avoid duplication with default_server_config.conf.
💡 Codex Review
Reviewed commit: b85c5cd461
auto boundaries = ServerConfig::ParseRevisitIntervalBuckets(config_.GetRevisitIntervalBuckets());
if (boundaries.empty()) {
  // Default bucket boundaries (seconds)
  boundaries = {1, 5, 30, 60, 120, 180, 300, 600, 900, 1800, 3600, 21600, 86400};
Fail startup instead of defaulting invalid buckets
Fresh evidence for re-raising the invalid-config case is that the production startup path in CommandLine::Run calls config.Parse(...) and then server_->Init(config) without invoking ServerConfig::Check() anywhere (confirmed with repo-wide rg "\.Check\("). Therefore an invalid explicit value such as kvcm.metrics.revisit_interval_buckets=abc makes ParseRevisitIntervalBuckets() return empty here and is silently replaced with the defaults, so operators get misleading capacity-planning metrics instead of a startup error.
Fixed in 41864d9. Added ServerConfig::Check() call in Server::Init() so invalid config fails startup instead of silently falling back to defaults.
… parsing
- Call ServerConfig::Check() in Server::Init() to catch invalid config in production (previously only validated in tests)
- ParseRevisitIntervalBuckets: verify full token consumption via the std::stod pos parameter (reject '1s', '5abc', etc.)
- Reject non-finite values (NaN, Inf)
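A sketch of a strict parser along these lines, assuming the comma-separated format used by kvcm.metrics.revisit_interval_buckets; ParseBucketsStrict is a hypothetical name. Beyond the checks in this commit, it also rejects empty tokens and leading/trailing commas, which a later review comment raises. It returns an empty vector on any error so the caller can fail startup:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical strict parser. Returns an empty vector on any error; an empty input also
// yields an empty vector, and the caller decides whether that means "use defaults".
std::vector<double> ParseBucketsStrict(const std::string& input) {
    std::vector<double> out;
    if (input.empty()) return out;
    std::stringstream ss(input);
    std::string token;
    while (std::getline(ss, token, ',')) {
        if (token.empty()) return {};  // "1,,5" or ",1,5"
        std::size_t pos = 0;
        double value = 0;
        try {
            value = std::stod(token, &pos);
        } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
            return {};
        }
        if (pos != token.size()) return {};                  // trailing junk: "1s", "5abc"
        if (!std::isfinite(value) || value <= 0) return {};  // NaN, Inf, non-positive
        if (!out.empty() && value <= out.back()) return {};  // must be strictly ascending
        out.push_back(value);
    }
    if (input.back() == ',') return {};  // getline drops the final empty token, so check here
    return out;
}
```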
…ward declaration
Move forward declaration from meta_cache_base_backend.h to meta_storage_backend.h (where it is actually used), removing the inline forward declaration in the virtual member signature.
💡 Codex Review
Reviewed commit: 1bcbe9079a
// Format "le" (less-or-equal) label: standard Prometheus histogram bucket tag.
// Integers without decimals, non-integers as-is.
double b = boundaries_[i];
bucket_tags["le"] = (b == std::floor(b)) ? std::to_string(static_cast<int64_t>(b)) : std::to_string(b);
Preserve bucket boundary precision in le labels
When a custom bucket list contains two boundaries that differ only beyond std::to_string's six fractional digits, for example 0.1000001,0.1000002, both labels become the same string ("0.100000"). Because the registry keys counters by tags, those two buckets then share one Counter, and Observe() can increment the same exported bucket multiple times for a single hit, corrupting the histogram for sub-second/high-precision bucket configurations. Format le with enough precision (for example max_digits10) or preserve a validated canonical token so distinct boundaries produce distinct labels.
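One way to follow this suggestion, sketched under the assumption that le labels may use %g-style text: keep the integer branch for whole numbers, and otherwise search for the shortest precision whose text parses back to the identical double. FormatLe is a hypothetical helper, not the PR's code:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical le formatter for finite boundaries: whole numbers print as integers ("60");
// other values use the shortest %g-style text that parses back to the same double.
std::string FormatLe(double b) {
    if (b == std::floor(b) && std::fabs(b) < 1e15) {
        return std::to_string(static_cast<int64_t>(b));
    }
    const int max_precision = std::numeric_limits<double>::max_digits10;
    for (int precision = 1; precision < max_precision; ++precision) {
        std::ostringstream os;
        os << std::setprecision(precision) << b;
        if (std::stod(os.str()) == b) return os.str();
    }
    std::ostringstream os;  // max_digits10 significant digits always round-trip a finite double
    os << std::setprecision(max_precision) << b;
    return os.str();
}
```

Unlike formatting with max_digits10 directly, this yields 0.1 rather than 0.10000000000000001, while still keeping 0.1000001 and 0.1000002 distinct.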
if (metrics_registry_ && !revisit_boundaries_.empty()) {
  auto histogram = std::make_shared<RevisitIntervalHistogram>();
  if (histogram->Init(metrics_registry_, revisit_boundaries_, instance_id)) {
    indexer->SetRevisitHistogram(histogram);
Install the histogram before starting backend recovery
In cached-backend configurations whose persistent backend is local, MetaIndexer::Init() has already opened the backend manager before this setter runs, and Open() starts AsyncRecoverTask, which calls persistent_backend_->Get() during backfill. That means the recovery thread can be reading MetaLocalBackend::revisit_histogram_ in Get() while this call writes the same shared_ptr with no synchronization, producing a data race (and also counting internal recovery scans as revisits). Attach the histogram before Open() starts recovery, or otherwise synchronize/avoid setting it on the persistent backend while recovery is active.
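If the histogram cannot be installed before Open() starts recovery, one race-free alternative is to publish the shared_ptr atomically. This sketch assumes the C++11 atomic free functions for std::shared_ptr (std::atomic_load/std::atomic_store, deprecated in C++20 in favor of std::atomic<std::shared_ptr<T>>); LocalBackendSketch and HistogramStub are hypothetical. It addresses only the data race, not the separate concern that recovery reads get counted as revisits:

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

struct HistogramStub {
    std::atomic<int> observations{0};
};

class LocalBackendSketch {
public:
    // Safe to call while another thread (e.g. recovery) is inside Get().
    void SetRevisitHistogram(std::shared_ptr<HistogramStub> histogram) {
        std::atomic_store(&histogram_, std::move(histogram));
    }

    void Get() {
        // Atomic load into a local copy keeps the histogram alive for the whole call.
        std::shared_ptr<HistogramStub> h = std::atomic_load(&histogram_);
        if (h) h->observations.fetch_add(1);
    }

private:
    std::shared_ptr<HistogramStub> histogram_;  // accessed only via atomic_load/atomic_store
};
```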
if (revisit_histogram_ && stored_time > 0) {
  int64_t now = TimestampUtil::GetCurrentTimeUs();
  int64_t interval_us = now - stored_time;
  revisit_histogram_->Observe(interval_us);
Exclude reclaimer probes from revisit observations
When the background reclaimer samples LRU candidates, it calls MetaIndexer::GetProperties(... {PROPERTY_LRU_TIME}, ...) (checked kv_cache_manager/manager/cache_reclaimer.cc:791), which reaches this helper even though no client cache hit occurred. With reclaim enabled, those internal sampling reads now record revisit observations for cold keys, so the histogram is biased by the reclaim cadence rather than actual cache reuse; gate observations to external read paths or skip the reclaimer/PROPERTY_LRU_TIME probe path.
Thank you for the detailed review, @oldsharp! To prevent this from happening in future PRs, could we consider one of the proposals?
💡 Codex Review
Reviewed commit: d48532498a
if (token.empty()) {
  continue;
Reject empty revisit bucket tokens
Fresh evidence after the claimed 41864d9 fix is that the current parser still skips empty tokens here, so a configured value like kvcm.metrics.revisit_interval_buckets=1,,5 or ,1,5 produces a non-empty {1,5} vector and therefore passes ServerConfig::Check(). In those typo cases the service starts with bucket boundaries other than the operator requested, which silently mis-buckets the capacity-planning histogram instead of failing startup.
static const std::string kSumSuffix = "_sum";
if (!current_family.empty() && name.size() >= kSumSuffix.size() &&
    name.compare(name.size() - kSumSuffix.size(), kSumSuffix.size(), kSumSuffix) == 0) {
  ss << ' ' << static_cast<double>(raw) / 1e6;
Preserve precision when exporting histogram sums
When _sum grows beyond a few significant digits, the default ostringstream precision rounds the converted seconds value; for example 15,000,001 us is emitted as 15, and large cumulative sums can hide per-scrape increases entirely. That makes rate(kvcm_revisit_interval_seconds_sum[...]) and average revisit-interval calculations inaccurate even though the counter stores microseconds precisely; format this value with sufficient double precision, such as max_digits10.
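Because the counter already holds integer microseconds, a sketch that sidesteps double rounding entirely, as an alternative to max_digits10, is to print whole seconds and a six-digit fraction from integer division. MicrosToSecondsString is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders an integer microsecond total as decimal seconds without a
// double round-trip, so every microsecond increment remains visible in the exported text.
std::string MicrosToSecondsString(uint64_t micros) {
    std::ostringstream os;
    os << micros / 1000000 << '.' << std::setw(6) << std::setfill('0') << micros % 1000000;
    std::string s = os.str();
    // Trim trailing zeros and a dangling dot: "15.000000" -> "15", "0.500000" -> "0.5".
    s.erase(s.find_last_not_of('0') + 1);
    if (s.back() == '.') s.pop_back();
    return s;
}
```

With this, 15,000,001 us is exported as 15.000001 rather than 15, so rate() over _sum sees each increase.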
return INT64_MAX;
}

void MetaStorageBackendManager::SetRevisitHistogram(std::shared_ptr<RevisitIntervalHistogram> histogram) {
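@codex The MetaStorageBackendManager class doesn't seem to have existed before. Was it introduced in a recent refactor? How does it differ from the earlier MetaCacheBackend, and has the logic changed?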
It looks like it has been around for a while: it was introduced in commit 63563bd on 2026-04-28. CC @YoungRX
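Good question; this class is indeed easy to confuse. Here is a detailed breakdown:
Origin of MetaStorageBackendManager
Before the end of April, MetaIndexer directly held a std::unique_ptr<MetaStorageBackend> storage_ and supported only a single backend. In commit 63563bd (2026-04-28), YoungRX refactored this and introduced MetaStorageBackendManager as an orchestration layer to support a dual-backend architecture.
Differences between the two classes
MetaStorageBackendManager (orchestration layer / orchestrator):
- Manages the lifecycle of two backends: a persistent backend (e.g. Redis) and a cache backend (e.g. local LRU)
- Write path: write to persistent (source of truth) first, then to cache
- Read path: read from cache first; on a miss, fall back to persistent
- At startup, launches an asynchronous recovery thread that scans persistent and backfills the cache
- Supports single-backend mode (calls pass straight through when there is no cache backend)
MetaCacheBaseBackend (interface layer / abstract interface):
- Introduced by changshaozhe.csz in commit dfb4690 (2026-05-08)
- An intermediate base class between MetaStorageBackend and concrete cache implementations
- Adds conditional writes on top of the basic read/write API (overloads taking a previous_error_codes parameter), used to skip keys that already failed during dual-backend writes
- Its only implementation at present is MetaLocalBackend (an in-memory cache backend built on an LRU cache)
Class hierarchy
MetaStorageBackend ← abstract base class (defines the standard read/write API for all backends)
├── MetaCacheBaseBackend ← intermediate base class for cache backends (adds conditional writes)
│ └── MetaLocalBackend ← in-memory LRU cache implementation
├── MetaRedisBackend ← Redis persistent backend
├── MetaAsyncRedisBackend ← asynchronous Redis backend
└── MetaDummyBackend ← dummy backend for tests
MetaStorageBackendManager ← orchestrator (manages the persistent + cache backend combination above)
Changes in this PR
We only add a SetRevisitHistogram propagation chain on top of this existing architecture:
MetaIndexerManager (creates the histogram and injects it)
→ MetaIndexer::SetRevisitHistogram()
→ MetaStorageBackendManager::SetRevisitHistogram()
→ MetaLocalBackend::SetRevisitHistogram() (actually stores the histogram pointer)
On the cache-hit paths of MetaLocalBackend::GetForOneKey() and GetLocations(), now - last_access_time is computed as the revisit interval and recorded into the histogram via Observe().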
💡 Codex Review
Reviewed commit: d48532498a
if (boundaries_[i] >= interval_s) {
  ++bucket_counters_[i];
Expose zero-valued histogram buckets
When an instance only observes intervals above one or more configured boundaries (for example, the first revisit is >1s), those lower bucket counters are never incremented here and therefore keep touched == false; since PrometheusExporter::Expose skips untouched series, /metrics omits the corresponding le buckets instead of exposing them as cumulative zeroes, which makes the Prometheus histogram incomplete and can break histogram_quantile/bucket-based alerts. Touch all bucket series during initialization or teach the exporter to emit zero-valued mapped histogram buckets.
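A toy model of this fix, assuming each exported series carries a touched flag and the exporter skips untouched ones (SeriesSketch, InitBuckets, and Expose are hypothetical). Touching every bucket series at initialization makes /metrics show le buckets with cumulative zero counts instead of omitting them:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical series model: the exporter prints only series whose touched flag is set.
struct SeriesSketch {
    std::string le;
    uint64_t value = 0;
    bool touched = false;
};

// touch_all = true models "touch every bucket series during initialization".
std::vector<SeriesSketch> InitBuckets(const std::vector<std::string>& les, bool touch_all) {
    std::vector<SeriesSketch> series;
    for (const auto& le : les) series.push_back(SeriesSketch{le, 0, touch_all});
    return series;
}

std::string Expose(const std::vector<SeriesSketch>& series) {
    std::string out;
    for (const auto& s : series) {
        if (!s.touched) continue;  // mirrors the exporter skipping untouched series
        out += "kvcm_revisit_interval_seconds_bucket{le=\"" + s.le + "\"} " +
               std::to_string(s.value) + "\n";
    }
    return out;
}
```

This gives up some of the cardinality savings the touched mechanism provides, since every configured bucket is exported for every instance from the start.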
if (revisit_histogram_ && stored_time > 0) {
  int64_t now = TimestampUtil::GetCurrentTimeUs();
  int64_t interval_us = now - stored_time;
  revisit_histogram_->Observe(interval_us);
Update access time atomically when recording revisits
Under concurrent reads of the same key, multiple threads can load the same stored_time and each record a long revisit interval before either TouchAccessTime() store becomes visible; the real sequence should record one long interval followed by a near-zero interval. This overcounts cold revisits for hot concurrent keys and skews the capacity-planning histogram, so use an atomic exchange-style update (and the same pattern in the location-id overload) so each observation advances last_access_time exactly once.
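A sketch of the exchange-style update, assuming the last-access time can be held in a std::atomic<int64_t> (KeyMetaSketch and TouchAndGetInterval are hypothetical). std::atomic::exchange returns the previous value and stores the new one in a single step, so two concurrent readers cannot both compute an interval from the same stale timestamp:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-key metadata holding the last access time in microseconds (0 = never).
struct KeyMetaSketch {
    std::atomic<int64_t> last_access_us{0};
};

// Stores now_us and returns the interval since the previous access, or -1 on first access.
// exchange() hands each concurrent caller a distinct previous value, so two racing readers
// record one long interval and one short one instead of two long ones. If their clock reads
// land out of order, one result can be slightly negative; callers should drop those.
int64_t TouchAndGetInterval(KeyMetaSketch& meta, int64_t now_us) {
    const int64_t prev = meta.last_access_us.exchange(now_us, std::memory_order_acq_rel);
    return prev > 0 ? now_us - prev : -1;
}
```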
if (revisit_histogram_ && stored_time > 0) {
  int64_t now = TimestampUtil::GetCurrentTimeUs();
  int64_t interval_us = now - stored_time;
  revisit_histogram_->Observe(interval_us);
Skip reclaimer reads when observing revisits
When the cache reclaimer samples keys, CacheReclaimer::DoKeySampling calls MetaIndexer::GetProperties(..., {PROPERTY_LRU_TIME}, ...) only to choose eviction candidates, but that path now reaches this Observe call. With reclaim enabled, each sampling cycle records artificial revisit intervals for sampled keys even if no user request hit them, so the capacity-planning histogram is biased by background maintenance traffic; pass context/options through the read path and avoid recording these internal LRU probes.
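A minimal sketch of gating by read intent, assuming an enum is threaded from the API layer down to the backend read path (ReadIntent, HistogramCountSketch, and MaybeObserve are hypothetical). The same gate would also cover the write-filter lookups flagged later in the review:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical read-intent tag passed from the API layer down to the backend read path.
enum class ReadIntent { kClientRead, kReclaimerProbe, kWriteFilter, kInternal };

struct HistogramCountSketch {
    uint64_t observations = 0;
};

// Only user-facing cache reads count as revisits; maintenance and write-filter probes still
// read the metadata but leave the histogram untouched.
void MaybeObserve(HistogramCountSketch& histogram, ReadIntent intent, int64_t interval_us) {
    if (intent != ReadIntent::kClientRead || interval_us < 0) return;
    ++histogram.observations;
}
```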
RegistryStorageBackendFactory falls back to local backend when registry_storage_uri is empty, so Check() should not reject it. This fixes a regression where calling Check() in Server::Init() would break default single-node deployments. Updated tests to reflect that empty registry_storage_uri is valid.
…) path
The GetLocations(location_ids) path is used by internal operations like FinishWriteCache -> ReadModifyWriteLocation, not user-facing cache reads. Recording revisit intervals here produces spurious sub-millisecond observations that corrupt the histogram distribution. Only GetForOneKey (used by user read paths like GetCacheLocation) should record revisit intervals.
Yes, this is a known issue (dating back to the initial commit of this repo); the convention so far has been to either fix it when the code is touched, or leave it as is.
Practical suggestion: I think we can do both 1 and 2 (as follow-up work) to resolve this kind of issue thoroughly, once and for all. BTW: might also consider fixing the missing
💡 Codex Review
Reviewed commit: 9add15678c
// Increment sum (stored in microseconds to preserve precision as uint64)
// PrometheusExporter will convert to seconds on output
sum_counter_ += static_cast<uint64_t>(interval_us);
Expose histogram sums in seconds everywhere
When metrics are consumed through AdminServiceImpl::GetMetrics or LoggingMetricsReporter rather than PrometheusExporter, this counter is read directly from MetricsRegistry while named revisit_interval_seconds_sum; a 15s revisit is stored as 15000000, so _sum/_count averages are 1e6 too high outside /metrics. Store seconds in the registry or centralize the conversion so every metrics output honors the seconds contract.
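Could this bucket setting also be made configurable on the instance group? The server-level setting would become the default bucket configuration, and if an instance group sets it, the instance group's value is used. Changing this setting on an instance group midway should not affect existing instances already in memory, but newly created instances would use the new configuration. @lpdink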
@wangxiyu191 Sure, the change is manageable. I'll add support for it and rerun the integration test process.
…onfiguration
Support per-instance-group revisit_interval_buckets configuration:
- InstanceGroup gains optional revisit_interval_buckets field (string)
- ParseRevisitIntervalBuckets validates: positive, ascending, finite, full token consumption, rejects empty tokens
- CreateMetaIndexer accepts optional boundaries override; resolves per-instance > global default at creation time
- TryCreateMetaSearcher looks up InstanceGroup boundaries and passes them through to CreateMetaIndexer
- Existing instances unaffected by group config changes (boundaries resolved once at creation, immutable thereafter)
- Server-level kvcm.metrics.revisit_interval_buckets remains as default
Tests: 14 instance_group tests + 3 meta_indexer_manager tests.
Full suite: 85 pass, 0 failures.
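A sketch of the creation-time resolution this commit describes, with hypothetical names (ResolveBoundaries, InstanceSketch). Copying the resolved vector into a const member is one way to make the "existing instances unaffected" guarantee structural rather than a convention:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical resolver: a non-empty group override wins, otherwise the server default applies.
std::vector<double> ResolveBoundaries(const std::vector<double>& group_override,
                                      const std::vector<double>& server_default) {
    return group_override.empty() ? server_default : group_override;
}

// The resolved boundaries are copied in at creation and cannot change afterwards.
struct InstanceSketch {
    std::string id;
    const std::vector<double> boundaries;
};
```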
…visit buckets
Review fixes:
- Extract ParseBucketBoundaries to StringUtil (shared by ServerConfig and InstanceGroup), eliminating ~40 lines of duplication
- InstanceGroup now stores parsed vector<double> (not raw string), parsed once at write time via set_revisit_interval_buckets()
- TryCreateMetaSearcher: single InstanceGroup lookup (was double), uses pre-parsed boundaries directly (zero overhead at creation)
- ServerConfig::ParseRevisitIntervalBuckets delegates to StringUtil
- Trailing/leading comma rejection in ParseBucketBoundaries
- meta_indexer_manager tests verify actual histogram boundaries via GetBoundaries() (were previously shallow nullptr checks)
- instance_group_test: added trailing comma, invalid JSON, and set_revisit_interval_buckets validation tests
Full suite: 85 pass, 0 failures.
P0: Add revisit_interval_buckets to admin_service.proto (field 10),
update InstanceGroupToProto/FromProto conversion functions.
API path (gRPC + HTTP) now fully functional.
P1: Update spec to match warn+fallback behavior for invalid boundaries
(per user instruction — don't block production path for observability
config, but do validate and warn at write time).
P2: Remove dead RegistryManager::GetCacheConfig() — zero production
callers after TryCreateMetaSearcher refactor. Updated test.
P3: Fix missing EOF newline in instance_group.h.
Full suite: 85 pass, 0 failures.
…egression tests
- FromRapidValue now calls set_revisit_interval_buckets() instead of duplicating parse+warn logic inline (DRY, single point of change)
- ServerConfig::Check() no longer prints duplicate error message (ParseRevisitIntervalBuckets already prints on failure)
- Added immutability regression test: existing instance's histogram boundaries unchanged after SetRevisitHistogramConfig called with new defaults (guards against future hot-reload mistakes)
- Added proto round-trip tests: InstanceGroupToProto/FromProto preserves revisit_interval_buckets through gRPC/HTTP API path
Full suite: 85 pass, 0 failures.
💡 Codex Review
Reviewed commit: 3f6fe78307
parsed_revisit_interval_buckets_ = StringUtil::ParseBucketBoundaries(buckets_str);
if (parsed_revisit_interval_buckets_.empty()) {
  KVCM_LOG_WARN("InstanceGroup [%s]: invalid revisit_interval_buckets '%s', will use server default",
                name_.c_str(),
                buckets_str.c_str());
Reject invalid per-group revisit buckets
When an admin CreateInstanceGroup/UpdateInstanceGroup request includes a malformed revisit_interval_buckets value such as 1,,5 or 5,1, this setter preserves the raw string but collapses the parsed value to an empty vector; ValidateRequiredFields() does not reject it, so the invalid override is stored and MetaSearcherManager later treats the empty vector as “use server default.” That makes the API-visible group config disagree with the buckets actually used for capacity-planning metrics, so non-empty invalid bucket strings should fail validation instead of falling back silently.
This behavior is intentional: observability configuration should not block the production path.
if (interval_us <= 0) {
  return;
Count same-microsecond revisits
When two successful reads of a hot key land in the same microsecond, TimestampUtil::GetCurrentTimeUs() produces the same timestamp and the revisit interval is exactly 0, which is a valid observation that should fall into every positive bucket and increment _count. Returning here drops those hottest revisits entirely, biasing the histogram toward longer intervals; only negative intervals from clock movement should be discarded.
if (revisit_histogram_ && stored_time > 0) {
  int64_t now = TimestampUtil::GetCurrentTimeUs();
  int64_t interval_us = now - stored_time;
  revisit_histogram_->Observe(interval_us);
Skip write-filter lookups when observing revisits
When clients follow the normal read-then-write flow, CacheManager::FilterWriteCache and FilterWriteCacheWithMinReplica call BatchGetLocation only to decide which blocks already exist before StartWriteCache writes the missing ones, but that metadata probe reaches this unconditional Observe() call. Existing prefix blocks were already counted by the preceding cache read, so the write filter records an extra artificial revisit for each existing block and biases the histogram toward shorter intervals; pass read intent through the stack or restrict observation to actual cache-read APIs.
@wangxiyu191 I also added an integration test at https://github.com/lpdink/tair-kvcache/tree/feature_revisit_interval_stats_on_instance_test/feature_test
Summary
Extend the revisit interval histogram (PR #206) with per-instance-group bucket boundary configuration. Each instance group can now specify its own
New API field:
Resolution priority:
Motivation
PR #206 introduced a single server-level bucket configuration applied uniformly to all instances. In production, different model deployments exhibit distinct access patterns: a serving model may have sub-second revisits while a batch model revisits on the order of minutes. A one-size-fits-all bucket scheme wastes resolution where it matters most. Per-group configuration allows operators to:
Usage
1. Create instance group with custom buckets
curl -X POST http://localhost:6492/api/createInstanceGroup \
-H 'Content-Type: application/json' \
-d '{
"trace_id": "setup",
"instance_group": {
"name": "group-serving",
"storage_candidates": ["local"],
"global_quota_group_name": "default",
"max_instance_count": 100,
"quota": {"capacity": 1073741824},
"cache_config": { ... },
"version": 1,
"revisit_interval_buckets": "0.1,0.5,1,2,5,10"
}
}'
2. Register instance under the group
curl -X POST http://localhost:6382/api/registerInstance \
-d '{
"trace_id": "reg",
"instance_group": "group-serving",
"instance_id": "model-a-tp0",
...
}'
The instance's histogram will use boundaries
3. Update group config (existing instances unaffected)
curl -X POST http://localhost:6492/api/updateInstanceGroup \
-d '{
"trace_id": "update",
"current_version": 1,
"instance_group": {
"name": "group-serving",
...
"version": 2,
"revisit_interval_buckets": "0.5,1,5,10,30"
}
}'
4. Invalid config handling
Invalid boundaries (non-ascending, negative, trailing chars) are accepted by the API with a WARN log. The group falls back to the server-level default; no instance creation is blocked.
Changes
Integration Test
Full-stack verification with 3 instance groups, each with a distinct bucket configuration:
Additional verification:
14/14 tests passed. Test report and reproduction:
Known Limitations
Testing
Unit tests (85 pass, 0 failures):
Integration test: 6-phase end-to-end test with dummy storage backend, admin API group creation, multi-worker traffic generation, and
Summary
Add online revisit interval histogram tracking to help operators understand cache access patterns and make informed capacity decisions. The histogram records the time interval between consecutive cache hits for each block and is exposed as Prometheus-compatible metrics with histogram_quantile() support.
New metrics (per instance):
- kvcm_revisit_interval_seconds_bucket{le="...",instance_id="..."}: cumulative bucket counts
- kvcm_revisit_interval_seconds_sum{instance_id="..."}: sum of all intervals (seconds)
- kvcm_revisit_interval_seconds_count{instance_id="..."}: total observation count
Grafana quantile queries:
Motivation
Operators currently rely on aggregate hit rate metrics for capacity planning, which masks the temporal access patterns critical for right-sizing. Without knowing how often blocks are revisited, decisions about cache capacity reduction are guesses. This histogram provides the missing distribution data.
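To show how bucket data turns into the quantiles operators read, here is a simplified re-implementation of the linear interpolation that PromQL's histogram_quantile() performs within a bucket; it omits the real function's edge-case handling, and EstimateQuantile is a hypothetical name. It assumes finite upper bounds in ascending order, with the +Inf count passed separately as total:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimates quantile q in [0, 1]. `upper` holds the finite le bounds (ascending),
// `cumulative[i]` is the count for le=upper[i], and `total` is the le="+Inf" count.
double EstimateQuantile(double q, const std::vector<double>& upper,
                        const std::vector<uint64_t>& cumulative, uint64_t total) {
    if (total == 0 || upper.empty()) return NAN;
    const double rank = q * static_cast<double>(total);
    double lower_bound = 0.0;
    uint64_t count_below = 0;
    for (std::size_t i = 0; i < upper.size(); ++i) {
        if (static_cast<double>(cumulative[i]) >= rank) {
            const uint64_t in_bucket = cumulative[i] - count_below;
            if (in_bucket == 0) return upper[i];
            // Assume observations are spread uniformly inside the bucket.
            return lower_bound + (upper[i] - lower_bound) *
                   (rank - static_cast<double>(count_below)) / static_cast<double>(in_bucket);
        }
        lower_bound = upper[i];
        count_below = cumulative[i];
    }
    return upper.back();  // rank falls in the +Inf bucket: report the largest finite bound
}
```

The estimate is only as precise as the bucket layout: a quantile that lands inside the 1 to 5 second bucket is interpolated as if observations were spread evenly across it, which is why bucket placement matters for capacity planning.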
Framework Capabilities Introduced
Beyond the specific feature, this PR introduces reusable infrastructure:
- RevisitIntervalHistogram class: A generic Prometheus-compatible histogram built on the existing Counter infrastructure. Can be reused for any interval/distribution tracking need. Thread-safe Observe() uses only atomic operations (~10ns per call).
- Histogram family metadata in MetricsRegistry: Explicit registration pattern via RegisterHistogramFamily()/MapMetricToFamily()/GetMetricFamily(). Eliminates implicit naming-convention guessing.
- Histogram TYPE support in PrometheusExporter: Correctly emits # TYPE histogram for registered families, with automatic us→s conversion for _sum metrics.
- Virtual SetRevisitHistogram() on MetaStorageBackend: Enables metric injection for any backend type (single-backend or cached mode) without dynamic_cast.
Changes
- revisit_interval_histogram.h/cc
- metrics_registry.h/cc
- prometheus_exporter.cc
- meta_local_backend.cc/h: Observe() in GetForOneKey()
- meta_storage_backend.h, meta_cache_base_backend.h, meta_storage_backend_manager.cc/h, meta_indexer.cc/h, meta_indexer_manager.cc/h
- server.cc, server_config.cc/h, default_server_config.conf
- revisit_interval_histogram_test.cc, prometheus_exporter_test.cc, BUILD files
26 files, +820 lines
Integration Test Results
Full-stack verification with 3 simulated model instances, each with a distinct access pattern:
Qwen3.7-Max (uniform distribution):

GLM-5.1 (skewed distribution):

DeepSeek-V4-Pro (uniform random 1-70s):

Test report and reproduction guide:
https://github.com/lpdink/tair-kvcache/tree/feature_revisit_interval_stats_test/feature_test
Known Limitations
- Cardinality: Each instance adds (N buckets + 2) time series. Acceptable for <1000 instances; evaluate for larger deployments.
- No cleanup: Histogram counters persist after instance deletion. The touched mechanism filters zero-valued series, but metadata remains.
- Empty bucket filtering: Buckets with zero observations are not exported (respects KVCM's touched design for cardinality control).
- Bucket boundaries are fixed at startup: Requires restart to change. Runtime adjustment would require significant refactoring.
Testing
Unit tests: RevisitIntervalHistogramTest (251 lines) + PrometheusExporterTest extensions (112 lines) — all passed.
Integration test: Multi-worker traffic generator with strategy pattern, full Prometheus + Grafana stack verification.
Could you please review this pr? @wangxiyu191 @oldsharp