Fix offline cache clean #42308
Conversation
Hi @Aaraviitkgp, maintainer of the underlying `huggingface_hub` library here. We can think about adding helpers like `enable_offline_mode`/`disable_offline_mode` if toggling offline mode at runtime turns out to be needed (unless I missed something here...).
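For illustration only, here is a minimal sketch of what such helpers could look like. `enable_offline_mode`/`disable_offline_mode` do not exist in `huggingface_hub` today; the helper names and the idea of updating `huggingface_hub.constants.HF_HUB_OFFLINE` at runtime are assumptions, not an existing API.

```python
# Hypothetical helpers (not part of huggingface_hub today): a minimal sketch,
# assuming the offline flag lives in huggingface_hub.constants.HF_HUB_OFFLINE
# and that no other module has cached its value at import time.
import os

import huggingface_hub.constants as hf_constants


def enable_offline_mode() -> None:
    """Switch to offline mode for the rest of the process (sketch only)."""
    os.environ["HF_HUB_OFFLINE"] = "1"   # seen by child processes
    hf_constants.HF_HUB_OFFLINE = True   # seen by the current process


def disable_offline_mode() -> None:
    """Switch back to online mode (sketch only)."""
    os.environ["HF_HUB_OFFLINE"] = "0"
    hf_constants.HF_HUB_OFFLINE = False
```

The caveat discussed later in this thread still applies: code that reads the flag once at import time would not notice the change, which is why such helpers would have to live upstream rather than in user code.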
I've just tested it (with the latest `huggingface_hub` version + `transformers`) using the two scripts below:
```python
# 1.py
import os
os.environ["HF_HOME"] = "./tmp_cache_dr"  # to adapt
from transformers import AutoConfig, AutoModel, AutoTokenizer

config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-bert")
model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-bert")
print("CACHE_WARMED")
```

```python
# 2.py
import os
os.environ["HF_HOME"] = "./tmp_cache_dr"  # to adapt
os.environ["HF_HUB_OFFLINE"] = "1"

# Import transformers first
# Then block sockets to ensure no network access
import socket
from transformers import AutoConfig, AutoModel, AutoTokenizer

original_socket = socket.socket

def guarded_socket(*args, **kwargs):
    raise RuntimeError("Network access attempted in offline mode!")

socket.socket = guarded_socket

try:
    config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-bert")
    model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")
    tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-bert")
    print("OFFLINE_SUCCESS")
except RuntimeError as e:
    if "Network access" in str(e):
        print(f"NETWORK_ATTEMPTED: {e}")
        exit(1)
    raise
except Exception as e:
    print(f"FAILED: {e}")
    import traceback
    traceback.print_exc()
    exit(1)
```

And both succeeded for me, without the need for this PR. Could you try it on your side as well and let me know if you spot any issue?
@Wauplin You are right about that. I just went through the code and realised there was a bit of a misunderstanding 😁, but we can pivot this PR to add helpers like `enable_offline_mode`/`disable_offline_mode` to properly manage that if needed.
It works for me as well.
@Aaraviitkgp I'm open to reviewing a PR for that, yes, but first I'd like to understand the need: in which cases do you want a script to download files and only then switch to offline mode? In general it's more robust not to change env variables / environment behavior in the middle of a process.
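To make that recommendation concrete, one pattern that avoids flipping the variable mid-process is to warm the cache in one process and do the offline load in a fresh process whose environment is set before anything is imported. A minimal sketch, assuming the two scripts from the comment above are saved as `1.py` and `2.py` next to it:

```python
# run_offline_test.py (sketch): warm the cache online, then verify offline
# loading in a separate process, so the env vars are set before any import.
import os
import subprocess
import sys

env = os.environ.copy()
env["HF_HOME"] = "./tmp_cache_dr"  # to adapt; same cache dir for both steps

# Step 1: online, downloads the model files into the cache (1.py above)
subprocess.run([sys.executable, "1.py"], env=env, check=True)

# Step 2: offline, must resolve everything from the local cache (2.py above)
env["HF_HUB_OFFLINE"] = "1"
subprocess.run([sys.executable, "2.py"], env=env, check=True)
```

Setting `HF_HUB_OFFLINE` in the child's environment is redundant with `2.py` setting it itself, but it illustrates the point: each process starts with a fixed environment instead of mutating it mid-run.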
@Wauplin I think you are right, it is not ideal to change an env variable in the middle of a process. I think I will close this PR.
@Wauplin thanks for explaining this, @Aaraviitkgp thanks for looking into this.
tl;dr: when possible, set the environment variable before importing transformers. I will update / close my issue #42197. (I honestly thought I had set the variable before the import; not sure how I missed testing this.) Also, there is a related issue #42269 and PR #42318 that can affect these kinds of "load while online, then access while offline" use cases. The …
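For anyone landing here later, the tl;dr as a minimal sketch (the model name is just the tiny test checkpoint used in the scripts above):

```python
import os

# Set offline mode BEFORE importing transformers, so the flag is picked up
# when the library (and huggingface_hub underneath it) is initialized.
os.environ["HF_HUB_OFFLINE"] = "1"

from transformers import AutoModel

# Resolves from the local cache only; raises if the files were never downloaded.
model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")
```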
Thanks for flagging @fr1ll, the pipeline not respecting …
Enhanced cache resolution in `cached_files()` to properly locate models downloaded in a subprocess when loading offline.
Changes:
Tests:
@Wauplin @LysandreJik