Description
1) Prevent OOM requests from perpetually consuming a lock slot.
Problem example: There are 10 lock slots for running events concurrently. Event A OOMs and the process is killed higher up the chain than the request level. The plugin never gets to free the lock, so the site-level lock is now stuck at 9 usable slots. This slowly gets worse if OOMing events keep running every once in a while. Eventually, maybe, all 10 slots would be deadlocked, and then the cache key would expire since it is no longer being updated. But this can take a while, and cron becomes very backed up in the process.
Possible solution: Mentioned here, instead of the lock being 1 cache key with incr/decr, each lock slot could be its own key with its own timestamp. This way we can free up locks individually when problems happen.
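A rough sketch of the per-slot idea, in Python for brevity rather than the plugin's own code. Everything here is hypothetical: the `SlotLock` class, the `lock_slot_N` key names, and the `max_age` cutoff are made up for illustration, and a plain dict stands in for the object cache. A real version would need the cache's atomic add operation so two requests can't grab the same slot.

```python
import time


class SlotLock:
    """Site-level lock where each slot is its own cache key holding a timestamp."""

    def __init__(self, cache, slots=10, max_age=600, clock=time.time):
        self.cache = cache      # dict-like store; stand-in for the object cache
        self.slots = slots      # number of concurrent events allowed
        self.max_age = max_age  # seconds after which a slot counts as abandoned
        self.clock = clock      # injectable so the behaviour can be tested

    def _key(self, i):
        return f"lock_slot_{i}"  # hypothetical key naming

    def acquire(self):
        """Return a free (or stale) slot index, or None if all slots are busy."""
        now = self.clock()
        for i in range(self.slots):
            taken_at = self.cache.get(self._key(i))
            # A slot is usable if it is empty or its holder went away
            # (e.g. OOM-killed) without releasing it.
            if taken_at is None or now - taken_at > self.max_age:
                # NOTE: not atomic with a plain dict; a real cache needs add/CAS.
                self.cache[self._key(i)] = now
                return i
        return None

    def release(self, slot):
        """Free one slot without touching the others."""
        self.cache.pop(self._key(slot), None)
```

The point is that a leaked slot only stays stuck until its own timestamp goes stale, instead of shrinking the shared counter until the whole key expires.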
Another possible solution: Use a DB table for locks instead - #84
Another possible solution: Have the Go runner catch the kill signal and, in those cases, send a follow-up request to cron-control to clean up the event's lock. The downsides are that it's an extra request and it only solves the problem for the Go runner, but it is at least a way to be notified when this happens (vs. relying on a cache timeout that may or may not accurately represent how long an event can run successfully).
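To illustrate the runner-side idea (sketched in Python, not the actual Go runner): on POSIX, a parent process can tell that its child was killed by a signal and trigger cleanup. `run_event` and the `on_killed` callback are hypothetical names; in practice the callback would be whatever request frees the event's lock.

```python
import subprocess
import sys


def run_event(cmd, on_killed):
    """Run an event as a child process; call on_killed(signum) if it was killed."""
    result = subprocess.run(cmd)
    # On POSIX, a negative return code -N means the child died from signal N
    # (e.g. -9 for SIGKILL from the OOM killer).
    if result.returncode < 0:
        on_killed(-result.returncode)  # hypothetical hook: free the event's lock
    return result.returncode


if __name__ == "__main__":
    # Example: a child that exits normally, so no cleanup is triggered.
    run_event([sys.executable, "-c", "pass"], lambda sig: print("killed by", sig))
```

This only covers kills the runner can observe, which matches the downside noted above.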
2) Increase lock defaults
- A default site-level concurrency of 10 feels a bit conservative. It should just be up to the runner implementation to decide on scale; we don't need to do it at the application level as well IMO.
- Perhaps even default to unlimited event-level concurrency. Instead of a whitelist, events that cannot run concurrently would instead add themselves to a disallow filter.
3) Drop locks completely?
Following the chain of thought from the point above, maybe we just drop the site-level concurrency locks completely. Event-level concurrency would default to "unlimited", but we'd keep a simplified implementation for events where concurrency is specifically disallowed.
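A minimal sketch of what that simplified version might look like, again in Python for illustration only. `EventLocks` and the `no_concurrency` list are hypothetical, and an in-memory set stands in for whatever store would actually hold the locks.

```python
class EventLocks:
    """Event-level locking only for actions that opt out of concurrency."""

    def __init__(self, no_concurrency=()):
        # Hypothetical disallow list; events would add themselves via a filter.
        self.no_concurrency = set(no_concurrency)
        self.running = set()  # opted-out actions that are currently running

    def try_start(self, action):
        """Return True if the event may run now."""
        if action not in self.no_concurrency:
            return True  # default: unlimited concurrency, no lock taken
        if action in self.running:
            return False  # another instance of this action is already running
        self.running.add(action)
        return True

    def finish(self, action):
        """Release the lock for an opted-out action; no-op for everything else."""
        self.running.discard(action)
```

With this shape, only the handful of events on the disallow list can ever be affected by a leaked lock.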