bug: getNextCronRun blocks the event loop for 15s+ on rare/yearly cron expressions — new Intl.DateTimeFormat() created per iteration #116

@p11n-com

Summary

getNextCronRun iterates minute by minute from "now" to the next matching slot, up to 60 × 24 × 400 = 576,000 iterations (a 400-day search horizon). For rare cron expressions (yearly, monthly, or any schedule whose next occurrence is far in the future), the loop easily reaches hundreds of thousands of iterations. Every iteration calls getDatePartsInTz, which constructs a brand-new Intl.DateTimeFormat object on each call. On a typical V8 host, constructing an Intl.DateTimeFormat with a non-UTC timezone costs ~25–40 µs. Multiplied by 525,600 iterations (next occurrence ~1 year away), that is 13–20 seconds of synchronous, event-loop-blocking work.

Because the Express server in the ClawMax container handles requests on a single thread, no other request can be served while getNextCronRun is running. Every API call queued behind the slow workflow-detail request times out or hangs, making the entire dashboard appear broken.

Steps to Reproduce

  1. Create or edit any workflow with a rare/yearly cron expression — e.g. 0 16 02 5 * (May 2 at 4 PM, once a year).
  2. Open the workflow detail panel, or hit GET /api/workflows/<id> directly.
  3. The request does not return for 13–20 seconds on first access.
  4. All other concurrent API requests also stall for the same window.

Concrete reproduction:

  • Cron: 0 16 02 5 * (annual, May 2 16:00 UTC)
  • Server time: 2026-05-03 00:xx UTC (just past the annual window — next match is ~525,600 minutes away)
  • Measured from host (curl -w '%{time_total}' http://localhost:3001/api/workflows/agent-research-): 15.3 s
  • After the fix: 0.023 s
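
The iteration count and time estimate for this reproduction can be checked with a few lines of arithmetic. This is a rough sketch: it assumes the server time is exactly 2026-05-03 00:00 UTC (the report only gives 00:xx) and reuses the ~25–40 µs per-call cost quoted in the Summary. The helper name estimateSeconds is illustrative and not part of the codebase.

```typescript
// Assumed start of the search: 2026-05-03 00:00 UTC (Date.UTC months are 0-based).
const from = Date.UTC(2026, 4, 3, 0, 0);
// Next match for "0 16 02 5 *": 2027-05-02 16:00 UTC.
const next = Date.UTC(2027, 4, 2, 16, 0);

// One loop iteration per minute between the two instants.
const iterations = (next - from) / 60_000;

// Estimated blocking time for a given per-iteration cost in microseconds.
const estimateSeconds = (microsPerCall: number): number =>
  (iterations * microsPerCall) / 1e6;

console.log(iterations); // 525120
console.log(estimateSeconds(25).toFixed(1), estimateSeconds(40).toFixed(1)); // 13.1 21.0
```

Dividing the measured 15.3 s by 525,120 iterations gives about 29 µs per iteration, which sits inside the quoted range.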

Root Cause

File: SYSTEM/dashboard/server/lib/cron-next-run.ts

function getDatePartsInTz(date: Date, tz: string): DateParts {
  const fmt = new Intl.DateTimeFormat('en-US', {   // ← NEW OBJECT EVERY CALL
    timeZone: tz,
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', weekday: 'short', hour12: false
  })
  const parts = Object.fromEntries(fmt.formatToParts(date).map(p => [p.type, p.value]))
  // ...
}

export function getNextCronRun(cronExpression: string, fromDate = new Date(), timezone = 'UTC'): Date | null {
  // ...
  const maxIterations = 60 * 24 * 400   // up to 576,000 iterations
  for (let i = 0; i < maxIterations; i++) {
    const dp = getDatePartsInTz(candidate, tz)   // ← 576k × new Intl.DateTimeFormat = 15s+
    // ...
    candidate.setMinutes(candidate.getMinutes() + 1)
  }
}

Two independent problems combine to cause the hang:

Problem 1 — Intl.DateTimeFormat is never cached

A new formatter is allocated on every call to getDatePartsInTz. Intl.DateTimeFormat initialization is expensive (timezone data lookup, ICU table loads). Reusing a formatter for the same timezone reduces the cost by ~10–20×.
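
A quick way to check the construction cost on a given host is a micro-benchmark along these lines. It is only a sketch: the timezone (America/New_York), the call count N, and the helper timeIt are arbitrary choices for illustration, and the absolute numbers will vary by machine and Node version.

```typescript
// Same formatter options as getDatePartsInTz, with an arbitrary non-UTC zone.
const OPTIONS: Intl.DateTimeFormatOptions = {
  timeZone: 'America/New_York',
  year: 'numeric', month: 'numeric', day: 'numeric',
  hour: 'numeric', minute: 'numeric', weekday: 'short', hour12: false,
};

// Call `fn` `iterations` times, advancing the date one minute per call,
// and return the elapsed wall-clock time in milliseconds.
function timeIt(iterations: number, fn: (d: Date) => unknown): number {
  const d = new Date(Date.UTC(2026, 4, 3));
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn(d);
    d.setUTCMinutes(d.getUTCMinutes() + 1);
  }
  return Date.now() - start;
}

const N = 2_000;

// Variant 1: construct a new formatter on every call (current behaviour).
const perCallMs = timeIt(N, d => new Intl.DateTimeFormat('en-US', OPTIONS).formatToParts(d));

// Variant 2: construct once, reuse for every call.
const shared = new Intl.DateTimeFormat('en-US', OPTIONS);
const cachedMs = timeIt(N, d => shared.formatToParts(d));

console.log(`per-call: ${perCallMs} ms, cached: ${cachedMs} ms (${N} calls each)`);
```

Both variants produce identical parts for the same date, so the only difference being measured is the construction overhead.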

Problem 2 — No skip optimization for coarser fields

When month doesn't match the cron expression, the loop still advances one minute at a time — wasting up to 31 days × 24 h × 60 min = 44,640 iterations per skipped month. A yearly cron with a specific month can waste 11 × 44,640 ≈ 491k iterations just skipping the wrong months, before even evaluating days or hours.

Proposed Fix

Two changes, independent but complementary:

Fix 1 — Cache the formatter per timezone

const fmtCache = new Map<string, Intl.DateTimeFormat>()

function getDatePartsInTz(date: Date, tz: string): DateParts {
  let fmt = fmtCache.get(tz)
  if (!fmt) {
    fmt = new Intl.DateTimeFormat('en-US', {
      timeZone: tz,
      year: 'numeric', month: 'numeric', day: 'numeric',
      hour: 'numeric', minute: 'numeric', weekday: 'short', hour12: false
    })
    fmtCache.set(tz, fmt)
  }
  const parts = Object.fromEntries(fmt.formatToParts(date).map(p => [p.type, p.value]))
  // ... rest unchanged
}

This eliminates the repeated construction: a single formatter is reused for all iterations within one getNextCronRun call, and across future calls for the same timezone.
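
The caching behaviour can be checked in isolation. The sketch below uses a hypothetical helper, getFormatter, with a reduced option set. It shows that repeated lookups return the same instance, and that an invalid timezone throws a RangeError from the constructor before anything is stored, so bad input does not pollute the cache.

```typescript
const fmtCache = new Map<string, Intl.DateTimeFormat>();

// Hypothetical helper: return the cached formatter for `tz`, creating it on first use.
function getFormatter(tz: string): Intl.DateTimeFormat {
  let fmt = fmtCache.get(tz);
  if (!fmt) {
    // Throws RangeError for an unknown timezone, so nothing is cached in that case.
    fmt = new Intl.DateTimeFormat('en-US', { timeZone: tz, hour: 'numeric', hour12: false });
    fmtCache.set(tz, fmt);
  }
  return fmt;
}

const first = getFormatter('Europe/Berlin');
const second = getFormatter('Europe/Berlin');
console.log(first === second, fmtCache.size); // true 1

let rejected = false;
try {
  getFormatter('Not/AZone');
} catch (e) {
  rejected = e instanceof RangeError;
}
console.log(rejected, fmtCache.size); // true 1
```

The cache is keyed only by timezone, so its size is bounded by the number of distinct timezones the server ever sees.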

Fix 2 — Smart skip for month / day / hour mismatches

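// NOTE: new Date(y, m, d, ...), setDate() and setHours() operate in the server's
// local time zone, while dp is computed in tz. The skips below land on tz
// boundaries only when the two zones agree (e.g. both are UTC).
for (let i = 0; i < maxIterations; i++) {
  const dp = getDatePartsInTz(candidate, tz)

  // Month mismatch: jump to the first day of the next calendar month.
  // If that month does not match either, the next iteration skips again.
  if (!month.any && !month.values.has(dp.month)) {
    const nextMonth = dp.month >= 12 ? 1 : dp.month + 1
    const nextYear  = dp.month >= 12 ? candidate.getFullYear() + 1 : candidate.getFullYear()
    candidate.setTime(new Date(nextYear, nextMonth - 1, 1, 0, 0, 0, 0).getTime())
    i += 30   // charge the skip against the maxIterations budget
    continue
  }

  // Day mismatch: jump to midnight of the next day.
  if (!matchesDay(dp, dayOfMonth, dayOfWeek)) {
    candidate.setDate(candidate.getDate() + 1)
    candidate.setHours(0, 0, 0, 0)
    i += 23
    continue
  }

  // Hour mismatch: jump to the start of the next hour.
  if (!hour.any && !hour.values.has(dp.hour)) {
    candidate.setHours(candidate.getHours() + 1, 0, 0, 0)
    i += 59
    continue
  }

  // Month, day and hour all match: check the minute.
  if (minute.any || minute.values.has(dp.minute)) {
    return new Date(candidate)
  }

  candidate.setMinutes(candidate.getMinutes() + 1)
}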

For a yearly cron with a specific month, the loop now runs at most ~400 iterations (one per day in the worst case) instead of 525,600.
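
To get a feel for the iteration count, here is a self-contained sketch of the same skip strategy restricted to UTC. It is not the project's implementation: the Field type, the field and ok helpers, and nextRunUtc are hypothetical, only "*" and comma-separated numbers are parsed, the day-of-week field is ignored, and the search is assumed to start at the next whole minute after the given date. Working purely with Date.UTC sidesteps the question of how local-time Date setters interact with a non-local tz.

```typescript
// Simplified cron field: `any` means "*", otherwise an explicit set of values.
type Field = { any: boolean; values: Set<number> };

const field = (spec: string): Field =>
  spec === '*'
    ? { any: true, values: new Set<number>() }
    : { any: false, values: new Set(spec.split(',').map(Number)) };

const ok = (f: Field, v: number): boolean => f.any || f.values.has(v);

// UTC-only search with month/day/hour skips. Returns the match (or null)
// together with the number of loop iterations actually executed.
function nextRunUtc(expr: string, from: Date): { run: Date | null; iterations: number } {
  const [minute, hour, dayOfMonth, month] = expr.trim().split(/\s+/).map(field);
  let t = Math.floor(from.getTime() / 60_000) * 60_000 + 60_000; // next whole minute
  const limit = t + 400 * 24 * 60 * 60_000;                      // 400-day horizon
  let iterations = 0;
  while (t < limit) {
    iterations++;
    const d = new Date(t);
    const y = d.getUTCFullYear(), mo = d.getUTCMonth();
    const day = d.getUTCDate(), h = d.getUTCHours();
    // Date.UTC normalises overflow (month 12, day 32, hour 24) into the next unit.
    if (!ok(month, mo + 1)) { t = Date.UTC(y, mo + 1, 1); continue; }     // next month
    if (!ok(dayOfMonth, day)) { t = Date.UTC(y, mo, day + 1); continue; } // next midnight
    if (!ok(hour, h)) { t = Date.UTC(y, mo, day, h + 1); continue; }      // next hour
    if (ok(minute, d.getUTCMinutes())) return { run: d, iterations };
    t += 60_000;
  }
  return { run: null, iterations };
}

const result = nextRunUtc('0 16 02 5 *', new Date('2026-05-03T00:30:00Z'));
console.log(result.run?.toISOString(), result.iterations); // 2027-05-02T16:00:00.000Z 58
```

For the reproduction case the search finishes in a few dozen iterations: one per remaining day in May, one per skipped month, then one per hour on the matching day.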

Impact

| Scenario | Before | After |
| --- | --- | --- |
| Daily cron (0 17 * * *) | < 1 ms | < 1 ms |
| Monthly cron (0 9 1 * *) | ~200 ms | < 1 ms |
| Yearly cron (0 16 02 5 *, next match ~1 yr away) | 15–20 s | < 1 ms |

While the yearly cron is the worst case, any cron whose next slot is > 30 days away (bi-annual, quarterly, specific date in a future month) will also hang the event loop noticeably.

The hang blocks all concurrent API requests because the Node.js event loop is single-threaded. In practice this makes the workflow detail endpoint non-functional for affected schedules, and causes background polling requests (execution status checks, workflow list refreshes) to also time out while the loop runs.

Affected Files

| File | Change |
| --- | --- |
| SYSTEM/dashboard/server/lib/cron-next-run.ts | Add module-level fmtCache Map; reuse formatter in getDatePartsInTz; add month/day/hour skip in getNextCronRun loop |

Environment

  • Reproducible on all ClawMax Docker installs where the next cron occurrence is > ~1 month in the future
  • Severity is proportional to how far away the next occurrence is
  • Node.js 22 (container default); also reproducible on Node 20 and 24
  • No external dependency changes required — pure algorithmic fix

Labels: bug (Something isn't working), high-priority (Critical, fix this sprint)
