Open
Description
In the classification and video_classification references, we cache here:
vision/references/classification/train.py
Line 108 in 6e203b4
cache_path = os.path.join("~", ".torch", "vision", "datasets", "imagefolder", h[:10] + ".pt")
cache_path = os.path.join("~", ".torch", "vision", "datasets", "kinetics", h[:10] + ".pt")
However, this directory is not used by PyTorch core. Instead, ~/.cache/torch is used. For example, torch.hub caches in ~/.cache/torch/hub. The datasets v2 used the same root folder and will store the datasets by default in
_HOME = os.path.join(_get_torch_home(), "datasets", "vision")
which expands to ~/.cache/torch/datasets/vision.
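To make the path expansion above concrete, here is a minimal sketch of how the root folder is resolved. It assumes the lookup order used by torch.hub as I understand it ($TORCH_HOME, then $XDG_CACHE_HOME/torch, then ~/.cache/torch). The function names get_torch_home and datasets_v2_home are hypothetical stand-ins for illustration, not the actual private helpers.

```python
import os


def get_torch_home(environ=None):
    """Resolve the torch cache root (hypothetical helper, stdlib only).

    Assumed lookup order: $TORCH_HOME, else $XDG_CACHE_HOME/torch,
    else ~/.cache/torch.
    """
    env = os.environ if environ is None else environ
    xdg_cache = env.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
    torch_home = env.get("TORCH_HOME", os.path.join(xdg_cache, "torch"))
    # os.path.join does not expand "~", so expand it explicitly.
    return os.path.expanduser(torch_home)


def datasets_v2_home(environ=None):
    """Default datasets v2 folder as described in the issue text."""
    return os.path.join(get_torch_home(environ), "datasets", "vision")


if __name__ == "__main__":
    print(get_torch_home())
    print(datasets_v2_home())
```

With neither environment variable set, the second call should print a path ending in .cache/torch/datasets/vision under the user's home directory.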
Maybe we can use ~/.cache/torch/cached_datasets or something similar as the cache path in the references?
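As a rough illustration of the proposal, the sketch below builds a per-dataset cache file under ~/.cache/torch/cached_datasets. The helper name reference_cache_path, its parameters, and the use of a SHA-1 digest of the dataset directory for the h[:10] prefix are assumptions made for this example, not the references' actual code.

```python
import hashlib
import os


def reference_cache_path(dataset_dir, kind, cache_root=None):
    """Return a cache file path for a dataset directory (hypothetical helper).

    kind is a subfolder name such as "imagefolder" or "kinetics".
    cache_root defaults to the location proposed in this issue.
    """
    if cache_root is None:
        cache_root = os.path.join("~", ".cache", "torch", "cached_datasets")
    # Assumption: the file name is the first 10 hex chars of a SHA-1 digest
    # of the dataset directory string.
    digest = hashlib.sha1(dataset_dir.encode()).hexdigest()
    path = os.path.join(cache_root, kind, digest[:10] + ".pt")
    # os.path.join leaves "~" untouched, so expand it before use.
    return os.path.expanduser(path)


if __name__ == "__main__":
    print(reference_cache_path("/data/imagenet/train", "imagefolder"))
    print(reference_cache_path("/data/kinetics400/train", "kinetics"))
```

Passing cache_root explicitly keeps the location easy to change if a different folder name is agreed on later.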
Activity
datumbox commented on Oct 10, 2022
Thanks for reporting @pmeier. Ideally we would like to move away from needing to pre-read the dataset and cache it. This is currently necessary due to the way that the Video Clipping class works but this causes issues with streamed datasets. @YosuaMichael is looking to fix this.
pmeier commented on Oct 10, 2022
@YosuaMichael if we won't support caching in the future, feel free to close this issue.
YosuaMichael commented on Oct 10, 2022
@datumbox In the case of VideoClipping, we indeed cache the dataset because we pre-compute the start and end of all the non-sampled clips. However, it seems this cache concept is not just for video datasets but for datasets in general (classification too).
Also, I am not sure yet whether we will get rid of the cache (for performance reasons) even if we change the clip sampler design, so I think this issue should stay open for now.
NicolasHug commented on Oct 10, 2022
This will more likely be ~/.cache/torch/vision/datasets to keep domains properly separated. FYI @mthrok @parmeet and I had agreed on the following API for setting / getting assets folders, as well as their default paths (at the time we didn't consider "dataset cache" but it's just another asset type):
So perhaps we'll want to go with ~/.cache/torch/vision/cached_datasets. The difference between "cached_datasets" and "datasets" isn't obvious, but I don't have a much better suggestion.