Fix: Prevent OperationalError from dropped connections during worker sleep#22
Open
Ju5t1nL3 wants to merge 3 commits intoRealOrangeOne:masterfrom
Open
Fix: Prevent OperationalError from dropped connections during worker sleep#22Ju5t1nL3 wants to merge 3 commits intoRealOrangeOne:masterfrom
Ju5t1nL3 wants to merge 3 commits intoRealOrangeOne:masterfrom
Conversation
Author
ProblemAs y'all probably noticed before, mysql seems to count checking for expired connections as a query. This is why I assume y'all started the connection.vendor if statement inside of the tests. Since I added an extra "close_old_connections()", it broke the tests, as the tests didn't account for another extra query, at the beginning nonetheless. SolutionSimply add 1 to all asserted query numbers. I will admit changing the tests is a bit sketch though, so feel free to be critical about it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The Problem
When running the django_tasks_db worker on cloud platforms with aggressive TCP idle timeouts (like Railway, AWS, or Heroku), the worker occasionally crashes with a psycopg.OperationalError: the connection is closed.
Currently, close_old_connections() is called at the bottom of the while self.running: loop. If the worker finds no tasks, it sleeps. During this sleep period, cloud load balancers can silently drop the network connection. When the worker wakes up and loops back to the top, it immediately attempts tasks.get_locked() using the dead connection, causing a fatal crash before it can reach the connection cleanup at the bottom of the loop.
The Solution
This PR duplicates the close_old_connections() call at the very top of the worker loop.
Because close_old_connections() is highly optimized and practically a no-op for healthy connections, adding it to the top of the loop introduces no measurable performance penalty while significantly increasing the worker's resilience to network realities.