Skip to content

refactor: rehome prometheus operational gauges#401

Open
Chaitu-Tatipamula wants to merge 10 commits intoFilOzone:mainfrom
Chaitu-Tatipamula:rehome-operational-gauges
Open

refactor: rehome prometheus operational gauges#401
Chaitu-Tatipamula wants to merge 10 commits intoFilOzone:mainfrom
Chaitu-Tatipamula:rehome-operational-gauges

Conversation

@Chaitu-Tatipamula
Copy link
Copy Markdown
Collaborator

@Chaitu-Tatipamula Chaitu-Tatipamula commented Mar 26, 2026

Summary

Moves the operational Prometheus gauges (wallet_balance, storage_providers_active, storage_providers_tested) out of MetricsSchedulerService to ensure they survive the upcoming deletion of the legacy metrics module.

What Changed

  • wallet_balance → now uses a scrape-time collect() callback in metrics-prometheus/wallet-balance.collector.ts with a 1-hour TTL cache to avoid hammering the chain RPC.
  • storage_providers_* → Now updated locally as a side-effect at the end of the existing providers.refresh pg-boss job in jobs.service.ts.
  • removed original gauge logic, constructor injections, and unneeded ConfigService from MetricsSchedulerService.

Why

part of the metrics removal epic (#275). MetricsSchedulerService is marked for deletion, but these 3 gauges report live telemetry to Grafana and must be preserved. The new locations follow the strategy agreed upon in the issue thread.

Testing

  • pnpm typecheck
  • pnpm test (all 280 tests pass) ✅
  • pnpm check:ci (Biome) ✅
  • Local tests confirm cache TTL works and chain reads are skipped when DEALBOT_DISABLE_CHAIN=true.

Part of #275

…metheus

move check-metrics.service.ts and check-metric-labels.ts to
metrics-prometheus/ and update all imports across deal/, retrieval/,
data-retention/, deal-addons/, and metrics-prometheus.module.ts.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Rehomes three operational Prometheus gauges (wallet_balance, storage_providers_active, storage_providers_tested) away from the legacy MetricsSchedulerService so they remain available as the legacy metrics module is removed (part of #275).

Changes:

  • Adds a scrape-time wallet_balance collector with a 1-hour TTL cache.
  • Updates storage_providers_* gauges at the end of the existing providers.refresh pg-boss job.
  • Moves check-metric label/metrics helpers to metrics-prometheus/ and updates imports across services/tests; removes the old gauge update logic from MetricsSchedulerService.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
apps/backend/src/retrieval/retrieval.service.ts Switches check metrics imports to metrics-prometheus/.
apps/backend/src/retrieval/retrieval.service.spec.ts Updates test imports to metrics-prometheus/.
apps/backend/src/providers/providers.service.ts New providers API service (DB queries + Curio version fetch).
apps/backend/src/providers/providers.module.ts Registers Providers controller/service + TypeORM repo.
apps/backend/src/providers/providers.controller.ts New /api/v1/providers endpoints (list + Curio version proxy).
apps/backend/src/providers/providers.controller.spec.ts Adds regression coverage for BigInt serialization in provider list.
apps/backend/src/metrics/services/metrics-scheduler.service.ts Removes legacy operational gauge update logic/injections.
apps/backend/src/metrics/controllers/providers.controller.ts Removes list/Curio-version endpoints from legacy metrics controller.
apps/backend/src/metrics-prometheus/wallet-balance.collector.ts New scrape-time collector implementing TTL caching.
apps/backend/src/metrics-prometheus/wallet-balance.collector.spec.ts Unit tests for TTL behavior and chain-disable behavior.
apps/backend/src/metrics-prometheus/metrics-prometheus.module.ts Registers WalletBalanceCollector and updates check-metrics imports.
apps/backend/src/metrics-prometheus/check-metrics.service.ts Fixes relative imports after rehome.
apps/backend/src/metrics-prometheus/check-metric-labels.ts New home for check label builder + failure classification.
apps/backend/src/jobs/jobs.service.ts Injects provider gauges and updates them after providers.refresh.
apps/backend/src/jobs/jobs.service.spec.ts Adds coverage for provider gauge update logic and approval filtering.
apps/backend/src/deal/deal.service.ts Switches check metrics imports to metrics-prometheus/.
apps/backend/src/deal/deal.service.spec.ts Updates test imports to metrics-prometheus/.
apps/backend/src/deal-addons/strategies/ipni.strategy.ts Switches check metrics imports to metrics-prometheus/.
apps/backend/src/deal-addons/strategies/ipni.strategy.spec.ts Updates test imports to metrics-prometheus/.
apps/backend/src/data-retention/data-retention.service.ts Switches check label imports to metrics-prometheus/.
apps/backend/src/data-retention/data-retention.service.spec.ts Updates test imports to metrics-prometheus/.
apps/backend/src/app.module.ts Adds ProvidersModule to the backend app module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Chaitu-Tatipamula Chaitu-Tatipamula force-pushed the rehome-operational-gauges branch 3 times, most recently from 2b2f59d to 4c50515 Compare March 27, 2026 11:06
@BigLep BigLep requested a review from silent-cipher March 27, 2026 15:27
@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FOC Mar 27, 2026
Copy link
Copy Markdown
Collaborator

@silent-cipher silent-cipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetricsPrometheusModule does not import WalletSdkModule, even though WalletSdkService required by WalletBalanceCollector. This leads to a startup failure.

We should add the missing module import to resolve this.

@github-project-automation github-project-automation bot moved this from 🔎 Awaiting review to ⌨️ In Progress in FOC Mar 30, 2026
- add concurrency lock (refreshPromise) to WalletBalanceCollector to prevent concurrent RPC calls on TTL expiry
- add 1-minute error cooldown (errorCooldownMs) to prevent hammering the RPC during outages
- fix early return in JobsService so DB-backed provider gauges update even when chain integration is disabled
- update unit tests to cover new collector behaviors"
…oldown

- add missing WalletSdkModule import to MetricsPrometheusModule
- extract prometheus wallet balance TTL and error cooldown times into app.config.ts
- update WalletBalanceCollector to read cache timers dynamically from ConfigService"
@Chaitu-Tatipamula Chaitu-Tatipamula force-pushed the rehome-operational-gauges branch from da29217 to c98b701 Compare March 30, 2026 20:08
Copy link
Copy Markdown
Collaborator

@silent-cipher silent-cipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update docs/environment-variables.md with the new variables?
Also, should we accept seconds instead of milliseconds for these values?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ⌨️ In Progress

Development

Successfully merging this pull request may close these issues.

5 participants