
Task: Re-examine the event locks implementation #233

@WPprodigy

1) Prevent OOM requests from perpetually consuming a lock slot.

Problem example: There are 10 lock slots for running events concurrently. Event A runs out of memory and its process is killed higher up the chain than the request level, so the plugin never gets the chance to free the lock. The site-level lock is now stuck with just 9 usable slots, and it gets worse each time another event OOMs. Eventually, maybe, all 10 slots would be deadlocked, and the cache key would then expire because it is no longer being updated. But that can take a while, and cron becomes very backed up in the meantime.

Possible solution: As mentioned here, instead of the lock being a single cache key with incr/decr, each lock slot could be its own key with its own timestamp. This way we can free up locks individually when problems happen.
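To make the per-slot idea concrete, here is a minimal sketch in Python. It is not the plugin's code: the `SlotLocks` class, the `event_lock_slot_N` key names, the `max_age` threshold, and the plain dict standing in for the object cache are all assumptions made for illustration. A real cache backend would also need an atomic add or compare-and-swap when claiming a slot, which this sketch skips.

```python
import time


class SlotLocks:
    """Site-level lock where each slot is its own key holding a timestamp.

    Hypothetical sketch: `cache` is a plain dict standing in for an object cache.
    """

    def __init__(self, cache, slots=10, max_age=600, now=time.time):
        self.cache = cache
        self.slots = slots
        self.max_age = max_age  # seconds after which a held slot counts as abandoned
        self.now = now          # injectable clock, so the behaviour is easy to test

    def _key(self, index):
        return f"event_lock_slot_{index}"

    def acquire(self):
        """Return the index of a claimed slot, or None if all slots are held and fresh."""
        current = self.now()
        for index in range(self.slots):
            held_at = self.cache.get(self._key(index))
            if held_at is None or current - held_at > self.max_age:
                # Either a free slot, or a stale one left behind by a killed process.
                self.cache[self._key(index)] = current
                return index
        return None

    def release(self, index):
        """Free one slot; releasing an already-free slot is a no-op."""
        self.cache.pop(self._key(index), None)
```

With this layout, a slot orphaned by an OOM-killed event is reclaimed on its own once its timestamp is older than `max_age`, without waiting for every other slot to go idle.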

Another possible solution: Use a DB table for locks instead - #84

Another possible solution: Have the Go runner catch the kill signal and, in those cases, send a follow-up request asking cron-control to clean up the event lock. The downsides are that it's an extra request and that it only solves the problem for the Go runner, but it is a way to at least be notified when this happens (vs. relying on a cache timeout that may or may not accurately represent how long an event can run successfully).
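The runner-side idea can be sketched as follows. This is Python rather than Go, and it is only an illustration of the detection step: `run_event` and the `on_killed` callback are hypothetical names, and the actual cleanup request to cron-control is left to the callback because no such endpoint is defined here. It relies on the POSIX behaviour that a child terminated by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def run_event(cmd, on_killed):
    """Run one event in a child process and report it if a signal ended it.

    Hypothetical sketch: `on_killed` is where a runner could send the
    follow-up lock cleanup request.
    """
    result = subprocess.run(cmd)
    if result.returncode < 0:
        # On POSIX, a negative return code -N means the child was killed by signal N.
        on_killed(signal.Signals(-result.returncode).name)
    return result.returncode


if __name__ == "__main__":
    # Simulate an event whose process is killed outright, as the OOM killer would do.
    child = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    run_event(child, lambda name: print(f"event killed by {name}, requesting lock cleanup"))
```

The parent process is the one that notices the kill, since the killed child gets no chance to run any cleanup of its own.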

2) Increase lock defaults

  • A default site-level concurrency of 10 feels a bit conservative. It should be up to the runner implementation to decide on scale; we don't need to do it at the application level as well, IMO.
  • Perhaps even default to unlimited event-level concurrency. Instead of a whitelist, events that cannot run concurrently would add themselves to a disallow filter.

3) Drop locks completely?

Following the chain of thought from the point above, maybe we just drop the site-level concurrency locks completely. Event-level concurrency would then default to "unlimited", with a simplified implementation kept only for events where concurrency is specifically disallowed.
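A rough sketch of what that simplified event-level implementation could look like, in Python for illustration only. The `EventConcurrency` class, its method names, and the example action names are hypothetical; in the plugin, the disallow list would presumably come from a filter rather than a constructor argument.

```python
class EventConcurrency:
    """Unlimited concurrency by default; only disallowed actions are tracked.

    Hypothetical sketch: `disallowed` stands in for a disallow filter.
    """

    def __init__(self, disallowed=()):
        self.disallowed = set(disallowed)
        self.running = set()  # disallowed actions that are currently running

    def try_start(self, action):
        """Return True if the event may run now."""
        if action not in self.disallowed:
            return True  # no lock bookkeeping at all for ordinary events
        if action in self.running:
            return False  # another instance of this action is already running
        self.running.add(action)
        return True

    def finish(self, action):
        """Mark a disallowed action as done; a no-op for every other action."""
        self.running.discard(action)
```

Only events that opted out of concurrency ever hold a lock here, so an event killed mid-run can strand at most its own action rather than a shared site-level slot.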
