
feat: add overdue periods gauge #400

Open
silent-cipher wants to merge 6 commits into main from fix/stalled-data-retention-counters

Collaborator

@silent-cipher silent-cipher commented Mar 26, 2026

Summary

Adds a new Prometheus gauge metric, pdp_provider_overdue_periods, that tracks the estimated number of unrecorded overdue proving periods per provider in real time. This gauge complements the existing cumulative counters by giving immediate visibility into providers that are behind on submitting proofs, even before the subgraph confirms the faults.

Changes

New Metric

  • pdp_provider_overdue_periods (Gauge): Estimates overdue proving periods by computing (currentBlock - (nextDeadline + 1)) / maxProvingPeriod for each proof set whose deadline has passed, summed per provider
  • Naturally resets to 0 when providers submit proofs and the subgraph catches up
  • Independent of the cumulative counter baselines; emitted on every poll
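For reviewers, a minimal TypeScript sketch of the estimate above, assuming the subgraph's BigInt fields have already been parsed into native bigint values. The OverdueProofSet type and estimateOverduePeriods function are hypothetical names for illustration, not the PR's actual code.

```typescript
// Hypothetical input shape: one proof set with its subgraph BigInt fields
// already parsed into native bigint values.
interface OverdueProofSet {
  nextDeadline: bigint;
  maxProvingPeriod: bigint;
}

/**
 * Sketch of the per-provider estimate: for every proof set whose deadline has
 * passed (nextDeadline < currentBlock), add
 * (currentBlock - (nextDeadline + 1)) / maxProvingPeriod and sum the results.
 */
function estimateOverduePeriods(
  proofSets: readonly OverdueProofSet[],
  currentBlock: bigint,
): bigint {
  let total = 0n;
  for (const { nextDeadline, maxProvingPeriod } of proofSets) {
    // Not overdue yet, or a malformed period length: contributes nothing.
    if (nextDeadline >= currentBlock || maxProvingPeriod <= 0n) continue;
    // bigint division truncates; the numerator is >= 0 here, so this floors.
    total += (currentBlock - (nextDeadline + 1n)) / maxProvingPeriod;
  }
  return total;
}

// Usage: two overdue proof sets and one that is not yet due, at block 1500.
const overdue = estimateOverduePeriods(
  [
    { nextDeadline: 1000n, maxProvingPeriod: 240n }, // 499 / 240 -> 2
    { nextDeadline: 500n, maxProvingPeriod: 240n }, // 999 / 240 -> 4
    { nextDeadline: 2000n, maxProvingPeriod: 240n }, // deadline not passed
  ],
  1500n,
);
console.log(overdue); // 6n
```

A provider with no overdue proof sets yields 0n, which matches the gauge returning to 0 once the subgraph stops reporting overdue deadlines for that provider.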

Subgraph Query Enhancement

  • Extended GET_PROVIDERS_WITH_DATASETS query to fetch proofSets with overdue deadlines
  • Added blockNumber parameter to filter proof sets where nextDeadline < currentBlock
  • Fetches nextDeadline and maxProvingPeriod per proof set
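A minimal sketch of how the blockNumber variable and the proofSets response could be handled. The query text, the ProofSet type, and the buildVariables, toBigInt, and parseProofSets helpers are hypothetical and may not match queries.ts or types.ts; the only external behavior relied on is that The Graph serializes BigInt fields as decimal strings in JSON.

```typescript
// Hypothetical query text: field names follow the PR description, but the
// real GET_PROVIDERS_WITH_DATASETS document may select and filter differently.
const PROVIDERS_WITH_OVERDUE_PROOF_SETS = `
  query ProvidersWithOverdueProofSets($blockNumber: BigInt!) {
    providers {
      id
      proofSets(where: { nextDeadline_lt: $blockNumber }) {
        nextDeadline
        maxProvingPeriod
      }
    }
  }
`;

interface ProofSet {
  nextDeadline: bigint;
  maxProvingPeriod: bigint;
}

// BigInt scalars travel as decimal strings, so the variable is sent as one.
function buildVariables(currentBlock: bigint): { blockNumber: string } {
  return { blockNumber: currentBlock.toString() };
}

// Accept only non-negative decimal strings, as a BigInt field arrives in JSON.
function toBigInt(value: unknown, path: string): bigint {
  if (typeof value !== "string" || !/^\d+$/.test(value)) {
    throw new Error(`${path}: expected a decimal string, got ${String(value)}`);
  }
  return BigInt(value);
}

// Validate one provider's proofSets array and convert its fields to bigint.
function parseProofSets(raw: unknown): ProofSet[] {
  if (!Array.isArray(raw)) {
    throw new Error("proofSets: expected an array");
  }
  return raw.map((item: unknown, i: number): ProofSet => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`proofSets[${i}]: expected an object`);
    }
    const fields = item as Record<string, unknown>;
    const maxProvingPeriod = toBigInt(fields.maxProvingPeriod, `proofSets[${i}].maxProvingPeriod`);
    // A zero-length period would make the overdue estimate divide by zero.
    if (maxProvingPeriod === 0n) {
      throw new Error(`proofSets[${i}].maxProvingPeriod: must be positive`);
    }
    return {
      nextDeadline: toBigInt(fields.nextDeadline, `proofSets[${i}].nextDeadline`),
      maxProvingPeriod,
    };
  });
}
```

Rejecting non-string and zero-period values at validation time keeps the later per-proof-set division safe without extra checks at the call site.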

closes #374

Copilot AI review requested due to automatic review settings March 26, 2026 08:33
@FilOzzy FilOzzy added this to FOC Mar 26, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Mar 26, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds a real-time Prometheus gauge to complement existing PDP data-retention counters by estimating overdue proving periods per provider from subgraph deadlines.

Changes:

  • Introduces pdp_provider_overdue_periods gauge and emits it on every data-retention poll.
  • Extends PDP subgraph providers query to include overdue proofSets filtered by blockNumber.
  • Updates validation/types and adds/extends unit tests for the new query fields and gauge behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Summary per file:

  • docs/checks/events-and-metrics.md: Documents the new pdp_provider_overdue_periods metric.
  • docs/checks/data-retention.md: Describes the overdue estimation logic and how the new gauge differs from the counters.
  • apps/backend/src/pdp-subgraph/types.ts: Adds the blockNumber option and proofSets typing/validation.
  • apps/backend/src/pdp-subgraph/types.spec.ts: Extends validation tests for proofSets.
  • apps/backend/src/pdp-subgraph/queries.ts: Adds the blockNumber variable and proofSets selection/filtering.
  • apps/backend/src/pdp-subgraph/pdp-subgraph.service.ts: Threads blockNumber through provider fetch requests and retries.
  • apps/backend/src/pdp-subgraph/pdp-subgraph.service.spec.ts: Updates service tests for the new query variable and response shape.
  • apps/backend/src/metrics-prometheus/metrics-prometheus.module.ts: Registers the new gauge metric provider.
  • apps/backend/src/data-retention/data-retention.service.ts: Computes the overdue estimate and emits the gauge; adds a safe BigInt gauge setter.
  • apps/backend/src/data-retention/data-retention.service.spec.ts: Adds tests for gauge emission, cleanup removal, and large-value handling.


@silent-cipher silent-cipher requested a review from SgtPooki March 26, 2026 15:15
@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FOC Mar 26, 2026
Collaborator

@SgtPooki SgtPooki left a comment


Very quick look before vacation, but mostly LGTM. @juliangruber, can you take a peek as well?

@BigLep BigLep requested a review from juliangruber March 31, 2026 00:16
@github-project-automation github-project-automation bot moved this from 🔎 Awaiting review to ⌨️ In Progress in FOC Mar 31, 2026
```ts
// naturally resets to 0 when NextProvingPeriod fires and the subgraph catches up.
this.safeSetGauge(this.overduePeriodsGauge, providerLabels, estimatedOverduePeriods);
// Note: Safe to cast: 1 period = 240 blocks. Even summing millions of datasets
// across decades stays well under the JS safe integer limit.
```
Member


Would be nice to keep the warning in case the value ever overflows for some reason (e.g., if proving periods change, fast finality, etc.).
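One way to keep an overflow warning next to the "safe to cast" note is to warn and clamp whenever the bigint falls outside the range a JS number represents exactly. For scale, assuming roughly 30-second epochs as on Filecoin mainnet, a 240-block period is 2 hours, so one dataset accrues about 12 × 365 × 10 ≈ 43,800 periods per decade; a million such datasets sum to roughly 4.4 × 10^10, far below Number.MAX_SAFE_INTEGER (2^53 − 1 ≈ 9.0 × 10^15). The sketch below is hypothetical: GaugeLike is a dependency-free stand-in that mirrors the (labels, value) overload of prom-client's Gauge#set, the warn callback stands in for whatever logger the service uses, and the PR's actual safeSetGauge in data-retention.service.ts may handle overflow differently (for example, by skipping the update instead of clamping).

```typescript
type Labels = Record<string, string>;

// Minimal stand-in for a Prometheus gauge; mirrors prom-client's
// Gauge#set(labels, value) overload so the sketch runs without dependencies.
interface GaugeLike {
  set(labels: Labels, value: number): void;
}

const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER);

/**
 * Hypothetical safeSetGauge: converts a bigint to a number for the gauge, but
 * warns and clamps instead of silently losing precision when the value lies
 * outside the exactly representable integer range of a JS number.
 */
function safeSetGauge(
  gauge: GaugeLike,
  labels: Labels,
  value: bigint,
  warn: (message: string) => void,
): void {
  if (value > MAX_SAFE || value < -MAX_SAFE) {
    warn(`gauge value ${value} for ${JSON.stringify(labels)} exceeds Number.MAX_SAFE_INTEGER; clamping`);
    gauge.set(labels, value > 0n ? Number.MAX_SAFE_INTEGER : -Number.MAX_SAFE_INTEGER);
    return;
  }
  gauge.set(labels, Number(value));
}

// Usage with an in-memory gauge and a warning collector.
const recorded: number[] = [];
const warnings: string[] = [];
const gauge: GaugeLike = { set: (_labels, v) => void recorded.push(v) };
safeSetGauge(gauge, { provider: "p1" }, 42n, (m) => warnings.push(m));
safeSetGauge(gauge, { provider: "p1" }, MAX_SAFE + 1n, (m) => warnings.push(m));
console.log(recorded, warnings.length); // [ 42, 9007199254740991 ] 1
```

Clamping keeps the series continuous on dashboards while the warning makes the overflow visible in logs; skipping the update would instead leave the last exact value in place.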


Labels

None yet

Projects

Status: ⌨️ In Progress

Development

Successfully merging this pull request may close these issues.

Detect stalled data-retention counters (no NextProvingPeriod fired)

6 participants