Skip to content

feat(scheduler): Extract short job penalty service#4969

Open
tgucks wants to merge 10 commits into
masterfrom
extract-short-job-penalty-svc
Open

feat(scheduler): Extract short job penalty service#4969
tgucks wants to merge 10 commits into
masterfrom
extract-short-job-penalty-svc

Conversation

@tgucks

@tgucks tgucks commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

What type of PR is this?

Feature/refactor

What this PR does / why we need it

Extracts short-job-penalty tracking out of the scheduling hot path and the jobDb retention logic into a dedicated, self-contained ShortJobPenalty service.

Previously the penalty was recomputed every scheduling cycle by scanning all terminal jobs held in the jobDb. To make those jobs available for the scan, terminal short jobs were deliberately kept in the jobDb while their penalty was active. This required an occasional full GC to clean them up periodically. This PR makes ShortJobPenalty own its own state:

  • Terminal jobs are reported to the service once each via ReportFinishedJob at each point where a job can go terminal. This records the job's resources keyed by (pool, queue).
  • Penalties are snapshotted once per scheduling cycle via Snapshot() and read back per-pool from that immutable ShortJobPenaltySnapshot via GetPenaltiesForPool, replacing the inline per-job ShouldApplyPenalty accumulation in calculateJobSchedulingInfo.
  • Entries expire automatically via a deadline-ordered min-heap (runStart + cutoff[pool]), with a derived per-(pool, queue) running total cache kept in sync as entries are added and expired. Access is guarded by a mutex.

This removes the need for the periodic full GC of terminal jobs from the jobDb. They're deleted as they go terminal now.

Special notes for your reviewer

  • ShortJobPenalty is now stateful and should be concurrency-safe (sync.Mutex); the entries are the source of truth and sums is a derived cache. A Snapshot() taken once per cycle gives every pool a consistent point-in-time view.
  • syncState's signature changed (dropped the fullJobGc bool); call sites in cycle and initialize were updated accordingly.
  • Tests in short_job_penalty_test.go were substantially expanded to cover reporting, dedup, expiry, and per-pool reads.

@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR extracts short-job-penalty tracking from the inline scheduling hot path into a dedicated, self-contained ShortJobPenalty service with its own state management. Terminal jobs are now reported once at each termination site, penalties expire automatically via a min-heap, and a Snapshot() taken once per scheduling cycle provides all pools with a consistent, immutable view — removing the need to retain terminal jobs in jobDb across cycles.

  • ShortJobPenalty service (short_job_penalty.go / short_job_penalty_types.go): adds ReportFinishedJob, Snapshot, and a deadline-ordered min-heap for automatic expiry; all state is guarded by a sync.Mutex.
  • jobdb.go: removes the terminalJobs index, GetAllTerminalJobs, and the associated goroutine in Upsert; the periodic full-GC path in syncState is also removed.
  • scheduler.go: inserts three ReportFinishedJob call sites (reconciliation, expiry, submit-check) and drops the fullJobGc parameter from syncState.

Confidence Score: 4/5

Safe to merge with awareness that penalty registration has a few lifecycle edge cases that were flagged in earlier review rounds and remain open.

The core implementation — the heap-based expiry, mutex discipline, snapshot pattern, and test coverage — is solid and well-structured. The main open questions are around completeness of ReportFinishedJob call sites across all terminal paths in scheduler.go (discussed in prior review rounds), which could lead to missed penalty registrations in specific DB-driven completion scenarios.

internal/scheduler/scheduler.go — the three ReportFinishedJob call sites and their relationship to the write-transaction lifecycle deserve a careful second look before merging.

Important Files Changed

Filename Overview
internal/scheduler/scheduling/short_job_penalty.go Core implementation of the new stateful service; heap-based expiry, dedup, and mutex are all correct; one minor API gap: GetPenaltiesForPool returns the snapshot's inner map by reference.
internal/scheduler/scheduling/short_job_penalty_types.go Defines ShortJobPenalty, ShortJobPenaltySnapshot, and the entryHeap min-heap; entryHeap correctly implements heap.Interface with a compile-time assertion.
internal/scheduler/scheduler.go Three ReportFinishedJob call sites added (reconciliation, expiry, submit-check); all are called before PublishMessages/txn.Commit, so a publish failure leaves stale penalty entries until they expire.
internal/scheduler/scheduling/scheduling_algo.go Snapshot() taken once per cycle; passed into runPoolSchedulingRound and calculateJobSchedulingInfo; terminal jobs no longer included in allJobs; clean removal of the inline ShouldApplyPenalty accumulation loop.
internal/scheduler/jobdb/jobdb.go terminalJobs index, GetAllTerminalJobs, and the associated Upsert goroutine cleanly removed; wg.Add count updated correctly from 7 to 6.
internal/scheduler/scheduling/short_job_penalty_test.go Substantially expanded test suite covering heap ordering, accumulation, dedup, partial/full expiry, per-pool isolation, and an 8-worker concurrency correctness test.
internal/scheduler/scheduler_test.go Three new cycle test cases verify that penalties are registered for succeeded, expired, and rejected short jobs; shortJobPenalty injected into Scheduler for assertion.
internal/scheduler/jobdb/jobdb_test.go Removed three terminal-job lifecycle tests that were made obsolete by the deletion of the terminalJobs index; no test coverage gaps introduced.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Scheduler
    participant ShortJobPenalty
    participant FairSchedulingAlgo
    participant ShortJobPenaltySnapshot

    note over Scheduler: Start of cycle — SetNow(cycleStart)
    Scheduler->>ShortJobPenalty: SetNow(now)

    note over Scheduler: syncState — DB reconciliation
    Scheduler->>Scheduler: syncState() — terminal jobs deleted from jobDb

    note over Scheduler: generateUpdateMessages
    Scheduler->>ShortJobPenalty: ReportFinishedJob(job) [newly terminal via reconciliation]

    note over Scheduler: submitCheck
    Scheduler->>ShortJobPenalty: ReportFinishedJob(job) [rejected at submit-check]

    note over Scheduler: expireJobsIfNecessary
    Scheduler->>ShortJobPenalty: ReportFinishedJob(expiredJob) [stale executor]

    note over FairSchedulingAlgo: Schedule — once per cycle
    FairSchedulingAlgo->>ShortJobPenalty: Snapshot() [expireUpTo(now) + deep copy sums]
    ShortJobPenalty-->>FairSchedulingAlgo: ShortJobPenaltySnapshot

    loop For each pool
        FairSchedulingAlgo->>ShortJobPenaltySnapshot: GetPenaltiesForPool(pool)
        ShortJobPenaltySnapshot-->>FairSchedulingAlgo: map[queue]ResourceList
        FairSchedulingAlgo->>FairSchedulingAlgo: calculateJobSchedulingInfo(shortJobPenaltyByQueue)
    end

    note over Scheduler: PublishMessages → txn.Commit
    Scheduler->>Scheduler: PublishMessages / txn.Commit
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Scheduler
    participant ShortJobPenalty
    participant FairSchedulingAlgo
    participant ShortJobPenaltySnapshot

    note over Scheduler: Start of cycle — SetNow(cycleStart)
    Scheduler->>ShortJobPenalty: SetNow(now)

    note over Scheduler: syncState — DB reconciliation
    Scheduler->>Scheduler: syncState() — terminal jobs deleted from jobDb

    note over Scheduler: generateUpdateMessages
    Scheduler->>ShortJobPenalty: ReportFinishedJob(job) [newly terminal via reconciliation]

    note over Scheduler: submitCheck
    Scheduler->>ShortJobPenalty: ReportFinishedJob(job) [rejected at submit-check]

    note over Scheduler: expireJobsIfNecessary
    Scheduler->>ShortJobPenalty: ReportFinishedJob(expiredJob) [stale executor]

    note over FairSchedulingAlgo: Schedule — once per cycle
    FairSchedulingAlgo->>ShortJobPenalty: Snapshot() [expireUpTo(now) + deep copy sums]
    ShortJobPenalty-->>FairSchedulingAlgo: ShortJobPenaltySnapshot

    loop For each pool
        FairSchedulingAlgo->>ShortJobPenaltySnapshot: GetPenaltiesForPool(pool)
        ShortJobPenaltySnapshot-->>FairSchedulingAlgo: map[queue]ResourceList
        FairSchedulingAlgo->>FairSchedulingAlgo: calculateJobSchedulingInfo(shortJobPenaltyByQueue)
    end

    note over Scheduler: PublishMessages → txn.Commit
    Scheduler->>Scheduler: PublishMessages / txn.Commit
Loading

Reviews (7): Last reviewed commit: "Merge branch 'master' into extract-short..." | Re-trigger Greptile

Comment thread internal/scheduler/scheduler.go Outdated
Comment on lines 57 to +81
@@ -49,5 +77,117 @@ func (sjp *ShortJobPenalty) ShouldApplyPenalty(job *jobdb.Job) bool {
return false
}

return sjp.now.Sub(*jobStart) < sjp.cutoffDurationByPool[jobRun.Pool()]
return sjp.now.Sub(*jobStart) < sjp.cutoffs[jobRun.Pool()]
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Undocumented mutex precondition on shouldApplyPenalty

shouldApplyPenalty reads sjp.now and sjp.cutoffs directly without acquiring sjp.mu. This is currently safe because every call site (ReportFinishedJob) already holds the lock before dispatching here. However, the function carries an implicit contract — it must be called with the mutex held — which isn't documented. A future internal caller that invokes it without the lock would silently introduce a data race that only surfaces under -race and only when SetNow is called concurrently. Adding a brief comment (e.g. // caller must hold sjp.mu) makes the contract explicit and avoids a subtle footgun.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +182 to 193
func (sjp *ShortJobPenalty) subtractFromSums(pool, queue string, resources internaltypes.ResourceList) {
queueSums := sjp.sums[pool]
remaining := queueSums[queue].Subtract(resources)
if remaining.AllZero() {
delete(queueSums, queue)
if len(queueSums) == 0 {
delete(sjp.sums, pool)
}
return
}
queueSums[queue] = remaining
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Nil map write in subtractFromSums is unreachable but unguarded

queueSums := sjp.sums[pool] returns nil if the pool key was previously deleted from sums (which subtractFromSums itself does on line 188). If a heap entry were ever present without a matching sums entry — an invariant violation — the code would reach queueSums[queue] = remaining on a nil map and panic. The invariant is maintained correctly today (every heap.Push is paired with an addToSums call), but adding a nil guard or an assertion would make the invariant self-documenting and prevent a silent panic if the invariant is ever accidentally broken.

@tgucks tgucks force-pushed the extract-short-job-penalty-svc branch from f39438e to 640e870 Compare June 17, 2026 13:00
tgucks added 4 commits June 17, 2026 15:12
Signed-off-by: Trey Guckian <24757349+tgucks@users.noreply.github.com>
Signed-off-by: Trey Guckian <24757349+tgucks@users.noreply.github.com>
Signed-off-by: Trey Guckian <24757349+tgucks@users.noreply.github.com>
Signed-off-by: Trey Guckian <24757349+tgucks@users.noreply.github.com>
@tgucks tgucks force-pushed the extract-short-job-penalty-svc branch from 640e870 to cb5940b Compare June 17, 2026 20:18
tgucks added 6 commits June 18, 2026 09:45
Signed-off-by: Trey Guckian <24757349+tgucks@users.noreply.github.com>
Signed-off-by: Trey Guckian <24757349+tgucks@users.noreply.github.com>
Signed-off-by: Trey Guckian <24757349+tgucks@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant