fix(scheduler): correctly handle torrent eviction from cache #635
Merged
thijmv
approved these changes
Jun 18, 2026
sambhav-jain-16
approved these changes
Jun 19, 2026
TLDR
This PR fixes a bug in the agent where blob downloads fail if 1) a blob is being seeded to other agents, 2) it has been evicted from the disk cache, and 3) fewer than 5 minutes have passed since the eviction.
Details
The agent keeps track of whether a blob is in cache based on its presence on disk. However, that is not the single source of truth. The P2P scheduler also briefly keeps track of which torrents (aka blobs) are available for seeding. This is only done temporarily, when another peer requests a blob. The bug happens when the two states are out of sync, i.e. when the scheduler's in-memory state thinks a blob exists, but the disk cache's eviction mechanism has deleted it to make space for other blobs.
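To illustrate how the two states can diverge, here is a minimal sketch. The diskCache and schedulerState types and the stale helper are hypothetical stand-ins, not the agent's actual types; the only point is that evicting from disk does not touch the scheduler's memory.

```go
package main

import "fmt"

// diskCache is a hypothetical stand-in for the agent's on-disk blob cache.
type diskCache struct{ blobs map[string]bool }

// Evict removes a blob from disk. Note that it knows nothing about the scheduler.
func (c *diskCache) Evict(name string) { delete(c.blobs, name) }

// Has reports whether the blob is present on disk.
func (c *diskCache) Has(name string) bool { return c.blobs[name] }

// schedulerState is a hypothetical stand-in for the scheduler's in-memory
// record of torrents that are available for seeding.
type schedulerState struct{ torrents map[string]bool }

// stale reports whether the scheduler still believes a blob is available
// even though the disk cache no longer holds it.
func stale(c *diskCache, s *schedulerState, name string) bool {
	return s.torrents[name] && !c.Has(name)
}

func main() {
	cache := &diskCache{blobs: map[string]bool{"blob-a": true}}
	sched := &schedulerState{torrents: map[string]bool{"blob-a": true}}

	fmt.Println(stale(cache, sched, "blob-a")) // false: both states agree

	cache.Evict("blob-a")
	fmt.Println(stale(cache, sched, "blob-a")) // true: memory says yes, disk says no
}
```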
The specific order to trigger the bug is the following:
Torrent and store in memory that the torrent is available.
This actually happened in production and can be seen in the agent's logs - a blob gets downloaded and then ~12h later the bug happens, leading to download errors (12h because the blob gets evicted due to the 12h TTL).
5 minutes after the blob is evicted, the agent cleans the scheduler's in-memory state, as it has a 5-minute TTI (time-to-idle) on in-memory torrent entries that are not being seeded. So the bug only happens if the blob is requested within that 5-minute window.
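The timing condition can be written down as a small predicate. This is only a sketch: idleTTI and inBugWindow are hypothetical names, and it assumes the stale entry's idle clock starts at the moment of eviction, as the description above implies.

```go
package main

import (
	"fmt"
	"time"
)

// idleTTI mirrors the 5-minute TTI on in-memory torrent entries described
// above. The constant name is hypothetical.
const idleTTI = 5 * time.Minute

// inBugWindow reports whether a request arriving sinceEviction after the
// blob was evicted would still find the stale in-memory entry.
// Assumption: the entry is cleaned up exactly idleTTI after eviction.
func inBugWindow(sinceEviction time.Duration) bool {
	return sinceEviction >= 0 && sinceEviction < idleTTI
}

func main() {
	fmt.Println(inBugWindow(2 * time.Minute)) // true: stale entry still in memory
	fmt.Println(inBugWindow(6 * time.Minute)) // false: entry already cleaned up
}
```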
The fix
To fix this, I change the behavior in step 7) - the scheduler now does not use its memory state to decide whether a Download call should be a no-op, but instead checks the Torrent struct's fields, which are not out-of-sync.
Test Plan
I added a unit test that covers the scenario. It fails without this change and passes with it.
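The scenario the test exercises can be sketched roughly as follows. Everything here is hypothetical and heavily simplified: store, torrent, Complete, downloadOld and downloadNew are illustrative names, not the scheduler's real API. The sketch only contrasts "skip if the scheduler remembers the torrent" with "skip only if the torrent itself reports the blob is present".

```go
package main

import "fmt"

// store is a hypothetical stand-in for the disk cache.
type store map[string]bool

// torrent is a hypothetical stand-in for the Torrent struct. Its Complete
// method reads the backing store, so it cannot drift from what is on disk.
type torrent struct {
	name  string
	store store
}

// Complete reports whether the blob backing this torrent is still present.
func (t *torrent) Complete() bool { return t.store[t.name] }

// scheduler remembers torrents that were recently made available for seeding.
type scheduler struct{ torrents map[string]*torrent }

// downloadOld models the buggy behavior: any remembered torrent makes the
// download a no-op, even if the blob has since been evicted.
func (s *scheduler) downloadOld(name string) (noop bool) {
	_, ok := s.torrents[name]
	return ok
}

// downloadNew models the fixed behavior: the download is a no-op only if
// the torrent itself reports that the blob is present.
func (s *scheduler) downloadNew(name string) (noop bool) {
	t, ok := s.torrents[name]
	return ok && t.Complete()
}

func main() {
	disk := store{"blob-a": true}
	s := &scheduler{torrents: map[string]*torrent{
		"blob-a": {name: "blob-a", store: disk},
	}}

	delete(disk, "blob-a") // the disk cache evicts the blob

	fmt.Println(s.downloadOld("blob-a")) // true: wrongly skipped, blob is missing
	fmt.Println(s.downloadNew("blob-a")) // false: download proceeds
}
```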