Skip to content

[metrics] Add per-instance revisit interval histogram for capacity planning#206

Open
lpdink wants to merge 19 commits into
alibaba:mainfrom
lpdink:feature_revisit_interval_stats
Open

[metrics] Add per-instance revisit interval histogram for capacity planning#206
lpdink wants to merge 19 commits into
alibaba:mainfrom
lpdink:feature_revisit_interval_stats

Conversation

@lpdink

@lpdink lpdink commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

Add online revisit interval histogram tracking to help operators understand cache access patterns and make informed capacity decisions. The histogram records the time interval between consecutive cache hits for each block, exposed as Prometheus-compatible metrics with histogram_quantile() support.

New metrics (per instance):

  • kvcm_revisit_interval_seconds_bucket{le="...",instance_id="..."} — cumulative bucket counts
  • kvcm_revisit_interval_seconds_sum{instance_id="..."} — sum of all intervals (seconds)
  • kvcm_revisit_interval_seconds_count{instance_id="..."} — total observation count

Grafana quantile queries:

histogram_quantile(0.50, rate(kvcm_revisit_interval_seconds_bucket[5m]))
histogram_quantile(0.99, rate(kvcm_revisit_interval_seconds_bucket[5m]))

Motivation

Operators currently rely on aggregate hit rate metrics for capacity planning, which masks the temporal access patterns critical for right-sizing. Without knowing how often blocks are revisited, decisions about cache capacity reduction are guesses. This histogram provides the missing distribution data.

Framework Capabilities Introduced

Beyond the specific feature, this PR introduces reusable infrastructure:

  1. RevisitIntervalHistogram class — A generic Prometheus-compatible histogram built on the existing Counter infrastructure. Can be reused for any interval/distribution tracking need. Thread-safe Observe() uses only atomic operations (~10ns per call).

  2. Histogram family metadata in MetricsRegistry — Explicit registration pattern via RegisterHistogramFamily() / MapMetricToFamily() / GetMetricFamily(). Eliminates implicit naming convention guessing.

  3. Histogram TYPE support in PrometheusExporter — Correctly emits # TYPE histogram for registered families with automatic us→s conversion for _sum metrics.

  4. Virtual SetRevisitHistogram() on MetaStorageBackend — Enables metric injection for any backend type (single-backend or cached mode) without dynamic_cast.

Changes

Category Files Description
Histogram revisit_interval_histogram.h/cc New reusable histogram class
Registry metrics_registry.h/cc Add histogram family metadata APIs
Exporter prometheus_exporter.cc Histogram TYPE output + family query
Hot path meta_local_backend.cc/h Inline Observe() in GetForOneKey()
Injection meta_storage_backend.h, meta_cache_base_backend.h, meta_storage_backend_manager.cc/h, meta_indexer.cc/h, meta_indexer_manager.cc/h Histogram propagation chain
Config server.cc, server_config.cc/h, default_server_config.conf Bucket boundary configuration
Tests revisit_interval_histogram_test.cc, prometheus_exporter_test.cc, BUILD files 251 + 112 lines of new tests

26 files, +820 lines

Integration Test Results

Full-stack verification with 3 simulated model instances, each with a distinct access pattern:

Model Distribution Result
Qwen3.7-Max Uniform 25% × 4 buckets ✅ ≤1s: 25.4%, 1-5s: 24.5%, 5-30s: 24.0%, 30-60s: 25.2%
GLM-5.1 Skewed 40/30/20/10% ✅ ≤1s: 39.5%, 1-5s: 29.3%, 5-30s: 19.8%, 30-60s: 9.6%
DeepSeek-V4-Pro Uniform random 1-70s ✅ 5-30s: 35.6%, 30-60s: 43.7% (matches theoretical)

Qwen3.7-Max (uniform distribution):
Qwen3.7-Max

GLM-5.1 (skewed distribution):
GLM-5.1

DeepSeek-V4-Pro (uniform random 1-70s):
DeepSeek-V4-Pro

Test report and reproduction guide:
https://github.com/lpdink/tair-kvcache/tree/feature_revisit_interval_stats_test/feature_test

Known Limitations

  1. Cardinality: Each instance adds (N buckets + 2) time series. Acceptable for <1000 instances; evaluate for larger deployments.

  2. No cleanup: Histogram counters persist after instance deletion. The touched mechanism filters zero-valued series, but metadata remains.

  3. Empty bucket filtering: Buckets with zero observations are not exported (respects KVCM's touched design for cardinality control).

  4. Bucket boundaries are fixed at startup: Requires restart to change. Runtime adjustment would require significant refactoring.

Testing

Unit tests: RevisitIntervalHistogramTest (251 lines) + PrometheusExporterTest extensions (112 lines) — all passed.

Integration test: Multi-worker traffic generator with strategy pattern, full Prometheus + Grafana stack verification.

Could you please review this pr? @wangxiyu191 @oldsharp

lpdink added 6 commits June 12, 2026 11:43
…tracking

- Add RevisitIntervalHistogram class with configurable bucket boundaries
- Integrate histogram into MetaLocalBackend::GetForOneKey() hot path
- Track revisit intervals per-instance using instance_id label
- Support Prometheus histogram format export
- Add revisit_interval_buckets config in server_config.conf
- Default buckets: 1,5,30,60,120,180,300,600,900,1800,3600,21600,86400 (seconds)

The histogram records the time interval between consecutive cache hits
for each block, enabling capacity planning and cache efficiency analysis.
Critical bugs fixed:
- _sum stored in microseconds (not milliseconds), PrometheusExporter
  converts to seconds on output to avoid 1000x error in quantile calcs
- Histogram TYPE header no longer duplicates when _bucket/_sum/_count
  are different metric names in registry
- Default bucket boundaries hardcoded as fallback when config is empty

Quality improvements:
- Eliminated duplicate ParseRevisitIntervalBuckets; consolidated into
  ServerConfig::ParseRevisitIntervalBuckets static method
- Replaced dynamic_cast with virtual SetRevisitHistogram on
  MetaCacheBaseBackend (default no-op)
- Removed redundant forward declaration in meta_storage_backend_manager.h
- Added missing newline at end of default_server_config.conf

Test updates:
- revisit_interval_histogram_test: sum expectations updated to microseconds
- prometheus_exporter_test: _sum output verified as seconds (15.0 not 15000)
- Use only _bucket suffix to identify histogram families (avoids false
  positives on metrics like meta_indexer.total_key_count)
- Leverage alphabetical ordering: _bucket < _count < _sum, so _bucket
  is always encountered first in the sorted metric iteration
- Fix missed ObserveSingleValue test expectation (3000 -> 3000000 us)
- Add RevisitIntervalHistogramTest target to metrics/test/BUILD
- Fix le label to use clean integer format (1, 60, 3600) instead of
  std::to_string output (1.000000, 60.000000, 3600.000000)
- Update Observe() method comment: sum stored in microseconds, not ms
- Add explicit constraint note about histogram family detection relying
  on alphabetical ordering of metric names
- Add LeLabelFormatting test to verify integer boundaries are clean
…ction with explicit family mapping

- Add histogram_families_ set and metric_to_family_ map to MetricsRegistry
- Add RegisterHistogramFamily, MapMetricToFamily, GetMetricFamily, GetHistogramFamilies APIs
- RevisitIntervalHistogram::Init() explicitly registers family and metric mappings
- PrometheusExporter uses GetMetricFamily() query instead of suffix stripping
- Remove ExtractHistogramFamilyFromBucket and IsHistogramSubMetric helpers
- Exporter no longer contains histogram naming convention logic
- Update tests with family registration and add 3 new test cases
…ndle single-backend mode

- Move SetRevisitHistogram from MetaCacheBaseBackend to MetaStorageBackend base class
- Update MetaStorageBackendManager to forward histogram to both persistent_backend_ and cache_backend_
- This fixes the issue where histogram was not being injected in single-backend mode
- Simplifies the interface by having a single virtual method at the storage backend level

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f22b9aed08

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread kv_cache_manager/meta/meta_local_backend.cc
Comment thread kv_cache_manager/meta/meta_local_backend.cc
Comment thread kv_cache_manager/service/server_config.cc
Comment thread kv_cache_manager/metrics/revisit_interval_histogram.cc

@qoderai qoderai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

The PR adds a well-structured per-instance revisit interval histogram with good test coverage (363 lines of tests), clean propagation through the backend chain, and correct Prometheus cumulative semantics. The overall approach is sound.

Issues found (not covered by the existing Codex review):

  1. std::move on a shared registry member (meta_indexer_manager.cc:12) — std::move on the caller's shared_ptr is unnecessary and a readability trap; prefer a plain copy.
  2. Header comment unit mismatch (revisit_interval_histogram.h:63) — sum_counter_ is documented as "milliseconds" but stores microseconds.
  3. Double-parse of bucket config (server.cc:58-63) — ParseRevisitIntervalBuckets runs in both Check() and Init(), with the Init() version silently falling back to defaults even after a successful Check().
  4. _sum us→s conversion is family-agnostic (prometheus_exporter.cc:167-170) — the division by 1e6 applies to any future histogram family's _sum metric, making the "reusable infrastructure" claim fragile.

The Codex bot already flagged the concurrent stored_time/TouchAccessTime() race, missing coverage of the GetLocations(location_ids) path, invalid bucket config silently ignored at parse time, and empty bucket filtering. Those are all still valid and worth addressing.


🤖 Generated by Qoder

Comment thread kv_cache_manager/metrics/revisit_interval_histogram.cc
Comment thread kv_cache_manager/meta/meta_indexer_manager.cc
Comment thread kv_cache_manager/metrics/revisit_interval_histogram.h Outdated
Comment thread kv_cache_manager/service/server.cc
Comment thread kv_cache_manager/metrics/prometheus_exporter.cc
@github-actions github-actions Bot added the ai reviewed AI has reviewed this PR label Jun 12, 2026
…ation_ids) path and fix comment

- Add Observe() call in GetLocations(location_ids) before TouchAccessTime()
  This path bypasses GetForOneKey() and was missing histogram recording
- Fix sum_counter_ comment: milliseconds -> microseconds

@oldsharp oldsharp left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor feedback.

Comment thread kv_cache_manager/metrics/metrics_registry.h Outdated
Comment thread kv_cache_manager/metrics/BUILD
Comment thread kv_cache_manager/metrics/revisit_interval_histogram.cc Outdated
Comment thread kv_cache_manager/metrics/revisit_interval_histogram.cc Outdated
Comment thread kv_cache_manager/metrics/revisit_interval_histogram.cc Outdated
Comment thread kv_cache_manager/service/server.cc
Comment thread kv_cache_manager/service/server_config.h Outdated
Comment thread kv_cache_manager/service/server.cc Outdated
Comment thread kv_cache_manager/meta/meta_cache_base_backend.h Outdated
Comment thread kv_cache_manager/meta/meta_storage_backend.h Outdated
lpdink added 4 commits June 16, 2026 10:26
Revert clang-format/buildifier side-effects on unrelated code:
- metrics_registry.h: DEFINE_METRICS_NAME_ macro reformatting
- metrics/BUILD: deps alphabetical reordering
- meta_indexer.cc: KVCM_METRICS_COLLECTOR_SET_METRICS line breaks (3 places)
- meta_indexer.h: RmwStats comment alignment
- meta_local_backend.cc: DeleteForOneKey collapsed to single line
- meta_local_backend.h: CreateAndInsertIfAbsent/UpdateInPlace declaration wraps
- meta_storage_backend_manager.cc: KVCM_METRICS_COLLECTOR_SET_METRICS line breaks (4 places)
- meta/BUILD: trailing empty line removed
- Replace postfix with prefix increment operators (bucket_counters_[i]++,
  count_counter_++) since the return value is discarded
- Remove unused #include <algorithm>
- Add explanation for 'le' (less-or-equal) Prometheus histogram label
- Fix ParseRevisitIntervalBuckets doc: returns empty vector on error,
  not default boundaries (caller applies defaults)
… ServerConfig

Move hardcoded default bucket boundaries from server.cc to
ServerConfig::GetDefaultRevisitIntervalBuckets() to centralize
configuration and avoid duplication with default_server_config.conf.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b85c5cd461

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread kv_cache_manager/service/server_config.cc Outdated
Comment thread kv_cache_manager/service/server.cc Outdated
Comment on lines +58 to +61
auto boundaries = ServerConfig::ParseRevisitIntervalBuckets(config_.GetRevisitIntervalBuckets());
if (boundaries.empty()) {
// Default bucket boundaries (seconds)
boundaries = {1, 5, 30, 60, 120, 180, 300, 600, 900, 1800, 3600, 21600, 86400};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail startup instead of defaulting invalid buckets

Fresh evidence for re-raising the invalid-config case is that the production startup path in CommandLine::Run calls config.Parse(...) and then server_->Init(config) without invoking ServerConfig::Check() anywhere (confirmed with repo-wide rg "\.Check\("). Therefore an invalid explicit value such as kvcm.metrics.revisit_interval_buckets=abc makes ParseRevisitIntervalBuckets() return empty here and is silently replaced with the defaults, so operators get misleading capacity-planning metrics instead of a startup error.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 41864d9. Added ServerConfig::Check() call in Server::Init() so invalid config fails startup instead of silently falling back to defaults.

lpdink added 2 commits June 16, 2026 11:00
… parsing

- Call ServerConfig::Check() in Server::Init() to catch invalid config
  in production (previously only validated in tests)
- ParseRevisitIntervalBuckets: verify full token consumption via
  std::stod pos parameter (reject '1s', '5abc' etc.)
- Reject non-finite values (NaN, Inf)
…ward declaration

Move forward declaration from meta_cache_base_backend.h to
meta_storage_backend.h (where it is actually used), removing
the inline forward declaration in the virtual member signature.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1bcbe9079a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// Format "le" (less-or-equal) label — standard Prometheus histogram bucket tag.
// Integers without decimals, non-integers as-is.
double b = boundaries_[i];
bucket_tags["le"] = (b == std::floor(b)) ? std::to_string(static_cast<int64_t>(b)) : std::to_string(b);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve bucket boundary precision in le labels

When a custom bucket list contains two boundaries that differ only beyond std::to_string's six fractional digits, for example 0.1000001,0.1000002, both labels become the same string ("0.100000"). Because the registry keys counters by tags, those two buckets then share one Counter, and Observe() can increment the same exported bucket multiple times for a single hit, corrupting the histogram for sub-second/high-precision bucket configurations. Format le with enough precision (for example max_digits10) or preserve a validated canonical token so distinct boundaries produce distinct labels.

Useful? React with 👍 / 👎.

if (metrics_registry_ && !revisit_boundaries_.empty()) {
auto histogram = std::make_shared<RevisitIntervalHistogram>();
if (histogram->Init(metrics_registry_, revisit_boundaries_, instance_id)) {
indexer->SetRevisitHistogram(histogram);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Install the histogram before starting backend recovery

In cached-backend configurations whose persistent backend is local, MetaIndexer::Init() has already opened the backend manager before this setter runs, and Open() starts AsyncRecoverTask, which calls persistent_backend_->Get() during backfill. That means the recovery thread can be reading MetaLocalBackend::revisit_histogram_ in Get() while this call writes the same shared_ptr with no synchronization, producing a data race (and also counting internal recovery scans as revisits). Attach the histogram before Open() starts recovery, or otherwise synchronize/avoid setting it on the persistent backend while recovery is active.

Useful? React with 👍 / 👎.

Comment on lines +399 to +402
if (revisit_histogram_ && stored_time > 0) {
int64_t now = TimestampUtil::GetCurrentTimeUs();
int64_t interval_us = now - stored_time;
revisit_histogram_->Observe(interval_us);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exclude reclaimer probes from revisit observations

When the background reclaimer samples LRU candidates, it calls MetaIndexer::GetProperties(... {PROPERTY_LRU_TIME}, ...) (checked kv_cache_manager/manager/cache_reclaimer.cc:791), which reaches this helper even though no client cache hit occurred. With reclaim enabled, those internal sampling reads now record revisit observations for cold keys, so the histogram is biased by the reclaim cadence rather than actual cache reuse; gate observations to external read paths or skip the reclaimer/PROPERTY_LRU_TIME probe path.

Useful? React with 👍 / 👎.

@lpdink

lpdink commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks you for the detailed review! @oldsharp
During the fix process, I discovered some existing code in the repo does not match the output of the project's own .clang-format style. This caused unintended formatting diffs when our editor auto-formatted on save, and I had to revert them manually and add -n for latter commit, because pre-commit hook will exec the fmt, make the workflow upset.

To prevent this from happening in future PRs, Could we consider one of the proposal?

  1. A dedicated format PR: run clang-format --style=file -i across the entire codebase as a one-time cleanup
  2. Format CI check: add clang-format --dry-run --Werror to CI to enforce consistency going forward

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d48532498a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +23 to +24
if (token.empty()) {
continue;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject empty revisit bucket tokens

Fresh evidence after the claimed 41864d9 fix is that the current parser still skips empty tokens here, so a configured value like kvcm.metrics.revisit_interval_buckets=1,,5 or ,1,5 produces a non-empty {1,5} vector and therefore passes ServerConfig::Check(). In those typo cases the service starts with bucket boundaries other than the operator requested, which silently mis-buckets the capacity-planning histogram instead of failing startup.

Useful? React with 👍 / 👎.

static const std::string kSumSuffix = "_sum";
if (!current_family.empty() && name.size() >= kSumSuffix.size() &&
name.compare(name.size() - kSumSuffix.size(), kSumSuffix.size(), kSumSuffix) == 0) {
ss << ' ' << static_cast<double>(raw) / 1e6;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve precision when exporting histogram sums

When _sum grows beyond a few significant digits, the default ostringstream precision rounds the converted seconds value; for example 15,000,001 us is emitted as 15, and large cumulative sums can hide per-scrape increases entirely. That makes rate(kvcm_revisit_interval_seconds_sum[...]) and average revisit-interval calculations inaccurate even though the counter stores microseconds precisely; format this value with sufficient double precision, such as max_digits10.

Useful? React with 👍 / 👎.

return INT64_MAX;
}

void MetaStorageBackendManager::SetRevisitHistogram(std::shared_ptr<RevisitIntervalHistogram> histogram) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex MetaStorageBackendManager 这个类之前好像没有,是最近做的什么重构吗?和之前的MetaCacheBackend的区别是什么?逻辑上有什么变化吗?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

看起来有段时间了,2026-04-28在commit 63563bd引入的。 CC @YoungRX

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好问题,这个类的确容易混淆。详细梳理一下:

MetaStorageBackendManager 的由来

在 4 月底之前,MetaIndexer 直接持有一个 std::unique_ptr<MetaStorageBackend> storage_,只支持单后端。YoungRX 在 commit 63563bd(2026-04-28)做了重构,引入了 MetaStorageBackendManager 作为编排层,支持双后端架构。

两个类的区别

MetaStorageBackendManager(编排层 / orchestrator):

  • 管理 persistent backend(如 Redis)+ cache backend(如 Local LRU)两个后端的生命周期
  • 写入路径:先写 persistent(source of truth),再写 cache
  • 读取路径:先读 cache,miss 时 fallback 到 persistent
  • 启动时开异步 recovery 线程,从 persistent 扫描回填到 cache
  • 支持单后端模式(无 cache backend 时直接透传)

MetaCacheBaseBackend(接口层 / abstract interface):

  • 由 changshaozhe.csz 在 commit dfb4690(2026-05-08)引入
  • 是 MetaStorageBackend 和具体缓存实现之间的中间基类
  • 在基础读写 API 之上增加了条件写入(带 previous_error_codes 参数的重载),用于双后端写入时跳过已失败的 key
  • 目前唯一实现是 MetaLocalBackend(基于 LRU Cache 的内存缓存后端)

类层次结构

MetaStorageBackend                 ← 抽象基类(定义所有后端的标准读写 API)
├── MetaCacheBaseBackend           ← 缓存类后端中间基类(增加条件写入)
│   └── MetaLocalBackend           ← LRU 内存缓存实现
├── MetaRedisBackend               ← Redis 持久化后端
├── MetaAsyncRedisBackend          ← 异步 Redis 后端
└── MetaDummyBackend               ← 测试用 dummy

MetaStorageBackendManager          ← 编排器(管理上面的 persistent + cache 后端组合)

本次 PR 的改动

我们只是在这个已有架构上加了一条 SetRevisitHistogram 的传递链:

MetaIndexerManager(创建 histogram 并注入)
  → MetaIndexer::SetRevisitHistogram()
    → MetaStorageBackendManager::SetRevisitHistogram()
      → MetaLocalBackend::SetRevisitHistogram()(实际存储 histogram 指针)

MetaLocalBackend::GetForOneKey()GetLocations() 的 cache 命中路径上,计算 now - last_access_time 作为重访间隔并 Observe 到 histogram 中。

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d48532498a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +68 to +69
if (boundaries_[i] >= interval_s) {
++bucket_counters_[i];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expose zero-valued histogram buckets

When an instance only observes intervals above one or more configured boundaries (for example, the first revisit is >1s), those lower bucket counters are never incremented here and therefore keep touched == false; since PrometheusExporter::Expose skips untouched series, /metrics omits the corresponding le buckets instead of exposing them as cumulative zeroes, which makes the Prometheus histogram incomplete and can break histogram_quantile/bucket-based alerts. Touch all bucket series during initialization or teach the exporter to emit zero-valued mapped histogram buckets.

Useful? React with 👍 / 👎.

Comment thread kv_cache_manager/service/server.cc
Comment on lines +399 to +402
if (revisit_histogram_ && stored_time > 0) {
int64_t now = TimestampUtil::GetCurrentTimeUs();
int64_t interval_us = now - stored_time;
revisit_histogram_->Observe(interval_us);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Update access time atomically when recording revisits

Under concurrent reads of the same key, multiple threads can load the same stored_time and each record a long revisit interval before either TouchAccessTime() store becomes visible; the real sequence should record one long interval followed by a near-zero interval. This overcounts cold revisits for hot concurrent keys and skews the capacity-planning histogram, so use an atomic exchange-style update (and the same pattern in the location-id overload) so each observation advances last_access_time exactly once.

Useful? React with 👍 / 👎.

Comment on lines +399 to +402
if (revisit_histogram_ && stored_time > 0) {
int64_t now = TimestampUtil::GetCurrentTimeUs();
int64_t interval_us = now - stored_time;
revisit_histogram_->Observe(interval_us);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip reclaimer reads when observing revisits

When the cache reclaimer samples keys, CacheReclaimer::DoKeySampling calls MetaIndexer::GetProperties(..., {PROPERTY_LRU_TIME}, ...) only to choose eviction candidates, but that path now reaches this Observe call. With reclaim enabled, each sampling cycle records artificial revisit intervals for sampled keys even if no user request hit them, so the capacity-planning histogram is biased by background maintenance traffic; pass context/options through the read path and avoid recording these internal LRU probes.

Useful? React with 👍 / 👎.

lpdink added 2 commits June 16, 2026 13:29
RegistryStorageBackendFactory falls back to local backend when
registry_storage_uri is empty, so Check() should not reject it.
This fixes a regression where calling Check() in Server::Init()
would break default single-node deployments.

Updated tests to reflect that empty registry_storage_uri is valid.
…) path

The GetLocations(location_ids) path is used by internal operations
like FinishWriteCache -> ReadModifyWriteLocation, not user-facing
cache reads. Recording revisit intervals here produces spurious
sub-millisecond observations that corrupt the histogram distribution.

Only GetForOneKey (used by user read paths like GetCacheLocation)
should record revisit intervals.
@oldsharp

Copy link
Copy Markdown
Collaborator

During the fix process, I discovered some existing code in the repo does not match the output of the project's own .clang-format style. This caused unintended formatting diffs when our editor auto-formatted on save, and I had to revert them manually and add -n for latter commit, because pre-commit hook will exec the fmt, make the workflow upset.

Yes, this is a known issue (dating back to the initial commit of this repo); the conventional solution before was: fixing it when the code was touched by, or leaving it as is.

To prevent this from happening in future PRs, Could we consider one of the proposal?

  1. A dedicated format PR: run clang-format --style=file -i across the entire codebase as a one-time cleanup
  2. Format CI check: add clang-format --dry-run --Werror to CI to enforce consistency going forward

Practical suggestion, think we can do both 1 and 2 (in a follow-up work) to resolve this kind of issues thoroughly and in a one-time manner.

BTW: might also consider fixing the missing <EOL> at the end of files and enforcing it as a rule.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9add15678c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


// Increment sum (stored in microseconds to preserve precision as uint64)
// PrometheusExporter will convert to seconds on output
sum_counter_ += static_cast<uint64_t>(interval_us);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expose histogram sums in seconds everywhere

When metrics are consumed through AdminServiceImpl::GetMetrics or LoggingMetricsReporter rather than PrometheusExporter, this counter is read directly from MetricsRegistry while named revisit_interval_seconds_sum; a 15s revisit is stored as 15000000, so _sum/_count averages are 1e6 too high outside /metrics. Store seconds in the registry or centralize the conversion so every metrics output honors the seconds contract.

Useful? React with 👍 / 👎.

@wangxiyu191

Copy link
Copy Markdown
Collaborator

能不能把这个分桶配置项也做到instance group的配置上?server层配置改成default的分桶配置,如果instance group配了,就走instance group的。如果中途修改instance group上的这个配置,不影响内存里已经有的存量instance,但是新创建的instance会用新的配置。 @lpdink

@lpdink

lpdink commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

能不能把这个分桶配置项也做到instance group的配置上?server层配置改成default的分桶配置,如果instance group配了,就走instance group的。如果中途修改instance group上的这个配置,不影响内存里已经有的存量instance,但是新创建的instance会用新的配置。 @lpdink

@wangxiyu191 好的,改动量可控,我来支持下,重新走下集成测试流程。

lpdink added 4 commits June 16, 2026 17:29
…onfiguration

Support per-instance-group revisit_interval_buckets configuration:
- InstanceGroup gains optional revisit_interval_buckets field (string)
- ParseRevisitIntervalBuckets validates: positive, ascending, finite,
  full token consumption, rejects empty tokens
- CreateMetaIndexer accepts optional boundaries override; resolves
  per-instance > global default at creation time
- TryCreateMetaSearcher looks up InstanceGroup boundaries and passes
  them through to CreateMetaIndexer
- Existing instances unaffected by group config changes (boundaries
  resolved once at creation, immutable thereafter)
- Server-level kvcm.metrics.revisit_interval_buckets remains as default

Tests: 14 instance_group tests + 3 meta_indexer_manager tests.
Full suite: 85 pass, 0 failures.
…visit buckets

Review fixes:
- Extract ParseBucketBoundaries to StringUtil (shared by ServerConfig
  and InstanceGroup), eliminating ~40 lines of duplication
- InstanceGroup now stores parsed vector<double> (not raw string),
  parsed once at write time via set_revisit_interval_buckets()
- TryCreateMetaSearcher: single InstanceGroup lookup (was double),
  uses pre-parsed boundaries directly (zero overhead at creation)
- ServerConfig::ParseRevisitIntervalBuckets delegates to StringUtil
- Trailing/leading comma rejection in ParseBucketBoundaries
- meta_indexer_manager tests verify actual histogram boundaries
  via GetBoundaries() (were previously shallow nullptr checks)
- instance_group_test: added trailing comma, invalid JSON, and
  set_revisit_interval_buckets validation tests

Full suite: 85 pass, 0 failures.
P0: Add revisit_interval_buckets to admin_service.proto (field 10),
    update InstanceGroupToProto/FromProto conversion functions.
    API path (gRPC + HTTP) now fully functional.

P1: Update spec to match warn+fallback behavior for invalid boundaries
    (per user instruction — don't block production path for observability
    config, but do validate and warn at write time).

P2: Remove dead RegistryManager::GetCacheConfig() — zero production
    callers after TryCreateMetaSearcher refactor. Updated test.

P3: Fix missing EOF newline in instance_group.h.

Full suite: 85 pass, 0 failures.
…egression tests

- FromRapidValue now calls set_revisit_interval_buckets() instead of
  duplicating parse+warn logic inline (DRY, single point of change)
- ServerConfig::Check() no longer prints duplicate error message
  (ParseRevisitIntervalBuckets already prints on failure)
- Added immutability regression test: existing instance's histogram
  boundaries unchanged after SetRevisitHistogramConfig called with
  new defaults (guards against future hot-reload mistakes)
- Added proto round-trip tests: InstanceGroupToProto/FromProto
  preserves revisit_interval_buckets through gRPC/HTTP API path

Full suite: 85 pass, 0 failures.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f6fe78307

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +76 to +80
parsed_revisit_interval_buckets_ = StringUtil::ParseBucketBoundaries(buckets_str);
if (parsed_revisit_interval_buckets_.empty()) {
KVCM_LOG_WARN("InstanceGroup [%s]: invalid revisit_interval_buckets '%s', will use server default",
name_.c_str(),
buckets_str.c_str());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject invalid per-group revisit buckets

When an admin CreateInstanceGroup/UpdateInstanceGroup request includes a malformed revisit_interval_buckets value such as 1,,5 or 5,1, this setter preserves the raw string but collapses the parsed value to an empty vector; ValidateRequiredFields() does not reject it, so the invalid override is stored and MetaSearcherManager later treats the empty vector as “use server default.” That makes the API-visible group config disagree with the buckets actually used for capacity-planning metrics, so non-empty invalid bucket strings should fail validation instead of falling back silently.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

刻意设计的行为,可观测性不阻碍生产路径可行。

Comment on lines +58 to +59
if (interval_us <= 0) {
return;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count same-microsecond revisits

When two successful reads of a hot key land in the same microsecond, TimestampUtil::GetCurrentTimeUs() produces the same timestamp and the revisit interval is exactly 0, which is a valid observation that should fall into every positive bucket and increment _count. Returning here drops those hottest revisits entirely, biasing the histogram toward longer intervals; only negative intervals from clock movement should be discarded.

Useful? React with 👍 / 👎.

Comment on lines +399 to +402
if (revisit_histogram_ && stored_time > 0) {
int64_t now = TimestampUtil::GetCurrentTimeUs();
int64_t interval_us = now - stored_time;
revisit_histogram_->Observe(interval_us);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip write-filter lookups when observing revisits

When clients follow the normal read-then-write flow, CacheManager::FilterWriteCache and FilterWriteCacheWithMinReplica call BatchGetLocation only to decide which blocks already exist before StartWriteCache writes the missing ones, but that metadata probe reaches this unconditional Observe() call. Existing prefix blocks were already counted by the preceding cache read, so the write filter records an extra artificial revisit for each existing block and biases the histogram toward shorter intervals; pass read intent through the stack or restrict observation to actual cache-read APIs.

Useful? React with 👍 / 👎.

@lpdink

lpdink commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator Author

能不能把这个分桶配置项也做到instance group的配置上?server层配置改成default的分桶配置,如果instance group配了,就走instance group的。如果中途修改instance group上的这个配置,不影响内存里已经有的存量instance,但是新创建的instance会用新的配置。

@wangxiyu191
In short, implement:
[metrics] Per-instance-group revisit interval bucket configuration
in c965323...3f6fe78

and add a Integration test at https://github.com/lpdink/tair-kvcache/tree/feature_revisit_interval_stats_on_instance_test/feature_test

Summary

Extend the revisit interval histogram (PR #206) with per-instance-group bucket boundary configuration. Each instance group can now specify its own revisit_interval_buckets via the admin API; groups without custom config fall back to the server-level default. Boundaries are resolved at instance creation time and remain immutable for the lifetime of the instance — updating a group's config does not affect existing instances.

New API field:

  • InstanceGroup.revisit_interval_buckets (string, e.g. "0.5,1,2,5") — configurable via CreateInstanceGroup / UpdateInstanceGroup (gRPC + HTTP)

Resolution priority:

InstanceGroup.revisit_interval_buckets  →  server-level kvcm.metrics.revisit_interval_buckets  →  hardcoded default

Motivation

PR #206 introduced a single server-level bucket configuration applied uniformly to all instances. In production, different model deployments exhibit distinct access patterns — a serving model may have sub-second revisits while a batch model revisits on the order of minutes. A one-size-fits-all bucket scheme wastes resolution where it matters most.

Per-group configuration allows operators to:

  • Tune histogram resolution per workload type without restarting the server
  • Use the same KVCM cluster for heterogeneous workloads with appropriate metrics granularity
  • Adjust bucket boundaries dynamically via the admin API as access patterns evolve

Usage

1. Create instance group with custom buckets

curl -X POST http://localhost:6492/api/createInstanceGroup \
  -H 'Content-Type: application/json' \
  -d '{
    "trace_id": "setup",
    "instance_group": {
      "name": "group-serving",
      "storage_candidates": ["local"],
      "global_quota_group_name": "default",
      "max_instance_count": 100,
      "quota": {"capacity": 1073741824},
      "cache_config": { ... },
      "version": 1,
      "revisit_interval_buckets": "0.1,0.5,1,2,5,10"
    }
  }'

2. Register instance under the group

curl -X POST http://localhost:6382/api/registerInstance \
  -d '{
    "trace_id": "reg",
    "instance_group": "group-serving",
    "instance_id": "model-a-tp0",
    ...
  }'

The instance's histogram will use boundaries [0.1, 0.5, 1, 2, 5, 10].

3. Update group config (existing instances unaffected)

curl -X POST http://localhost:6492/api/updateInstanceGroup \
  -d '{
    "trace_id": "update",
    "current_version": 1,
    "instance_group": {
      "name": "group-serving",
      ...
      "version": 2,
      "revisit_interval_buckets": "0.5,1,5,10,30"
    }
  }'
  • model-a-tp0 (existing): still uses [0.1, 0.5, 1, 2, 5, 10]
  • New instances registered after update: use [0.5, 1, 5, 10, 30]

4. Invalid config handling

Invalid boundaries (non-ascending, negative, trailing chars) are accepted by the API with a WARN log. The group falls back to server-level default — no instance creation is blocked.

Changes

Category Files Description
Config model instance_group.h/cc Add revisit_interval_buckets field (raw string + parsed vector<double>), set_revisit_interval_buckets() validates at write time
Common util string_util.h/cc, BUILD Extract ParseBucketBoundaries() — shared by ServerConfig and InstanceGroup, eliminates ~40 lines of duplication
Resolution meta_indexer_manager.h/cc CreateMetaIndexer accepts optional boundaries param; resolves per-instance > global default at creation time
Lookup meta_searcher_manager.cc Single GetInstanceGroup() call provides both cache_config and revisit_interval_buckets (eliminates double lookup + dead GetCacheConfig())
Proto admin_service.proto Add revisit_interval_buckets as field 10 on InstanceGroup message
Proto util manager_message_proto_util.cc InstanceGroupToProto / InstanceGroupFromProto handle the new field
Server config server_config.cc ParseRevisitIntervalBuckets delegates to StringUtil::ParseBucketBoundaries; Check() no longer prints duplicate error
Tests instance_group_test.cc, meta_indexer_manager_test.cc, BUILD files 17 instance_group tests (parse + JSON + proto round-trip) + 4 meta_indexer_manager tests (default/override/fallback/immutability)

Integration Test

Full-stack verification with 3 instance groups, each with distinct bucket configuration:

Group revisit_interval_buckets Instance Verified Boundaries
group-fast "0.5,1,2,5" inst-fast-1 [0.5, 1.0, 2.0, 5.0]
group-slow "30,60,300,3600" inst-slow-1 [30.0, 60.0, 300.0, 3600.0]
group-default (not set) inst-default-1 ✅ server default [1, 5, 30, 60, 120, 300, 600, 1800, 3600]

Additional verification:

  • Immutability: Update group-fast to "10,20,30"inst-fast-1 boundaries unchanged; new inst-fast-2 uses [10, 20, 30]
  • Invalid config: Set "5,1,30" (non-ascending) → WARN logged, new instance falls back to server default
  • Per-group isolation: Different groups produce different le label sets in /metrics

14/14 tests passed.

Test report and reproduction:
https://github.com/lpdink/tair-kvcache/tree/feature_revisit_interval_stats_on_instance_test/feature_test

Known Limitations

  1. Boundaries immutable after creation: Changing a group's revisit_interval_buckets does not retroactively update existing instances' histograms. This is by design — recreating histograms would lose all historical observation data.

  2. Validation is warn-only: Invalid bucket config does not block API calls. The group silently falls back to server default. This prioritizes availability over strict validation for an observability config.

  3. No per-instance override: Configuration granularity is at the instance group level. Per-instance bucket configuration is not supported.

Testing

Unit tests (85 pass, 0 failures):

  • InstanceGroupTest: 17 tests — parse validation (ascending, positive, finite, full consumption, trailing/leading comma), JSON round-trip, proto round-trip, invalid config handling
  • MetaIndexerManagerTest: 4 tests — default boundaries, per-instance override, empty fallback, immutability after config change

Integration test: 6-phase end-to-end test with dummy storage backend, admin API group creation, multi-worker traffic generation, and /metrics boundary verification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai reviewed AI has reviewed this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants