
Conversation

@Aaraviitkgp commented Nov 20, 2025

Enhanced cache resolution in cached_files() to properly locate models downloaded in a subprocess when loading offline.

Changes:

  • Check multiple cache directories (HF_HOME, TRANSFORMERS_CACHE, HF_HUB_CACHE)
  • Search snapshot directories when refs are missing
  • Return cached files early to avoid network access
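
Roughly, the lookup described above would fall back through cache locations like this (a hypothetical sketch; find_cached_file and the exact resolution logic in cached_files() are illustrative, not the PR's actual code):

# Hypothetical sketch of the fallback described above; the real helper
# names and resolution logic in cached_files() differ.
import os
from pathlib import Path

def find_cached_file(repo_id, filename):
    # Candidate hub caches: HF_HUB_CACHE and TRANSFORMERS_CACHE point at the
    # hub cache directly; HF_HOME points at its parent (default layout assumed).
    candidates = [os.environ.get("HF_HUB_CACHE"), os.environ.get("TRANSFORMERS_CACHE")]
    if os.environ.get("HF_HOME"):
        candidates.append(os.path.join(os.environ["HF_HOME"], "hub"))
    repo_dir = "models--" + repo_id.replace("/", "--")
    for cache in filter(None, candidates):
        # When refs/ is missing, scan the snapshot directories directly.
        for snapshot in Path(cache, repo_dir, "snapshots").glob("*"):
            candidate = snapshot / filename
            if candidate.is_file():
                return str(candidate)  # return early: no network access needed
    return None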

Tests:

  • test_subprocess_warm_cache_then_offline_load
  • test_pipeline_offline_after_subprocess_warm
  • Both tests verify no network access with socket blocking

@Wauplin @LysandreJik

@Wauplin (Contributor) commented Nov 21, 2025

Hi @Aaraviitkgp, maintainer of the underlying huggingface_hub library here. IMO this issue is more of a misunderstanding over how to enable offline mode than a bug in transformers. If a socket connection is attempted, it means that transformers/huggingface_hub did not know about HF_HUB_OFFLINE being switched to 1. This is because the HF_HUB_OFFLINE environment variable is evaluated once at import time. If you really want to update it at runtime, you need to patch huggingface_hub.constants.HF_HUB_OFFLINE.
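
For illustration, a minimal sketch of such a runtime patch, assuming that patching the huggingface_hub constant is sufficient (transformers may keep its own copy of the flag depending on version, so setting the env var before import remains the robust option):

# Minimal sketch (not an existing API): toggle offline mode at runtime by
# patching the already-evaluated constant.
import huggingface_hub.constants

def enable_offline_mode():
    huggingface_hub.constants.HF_HUB_OFFLINE = True

def disable_offline_mode():
    huggingface_hub.constants.HF_HUB_OFFLINE = False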

We can think about adding helpers like enable_offline_mode/disable_offline_mode to properly manage that if needed. But IMO there shouldn't be any transformers-side code changes.

(unless I missed something here...)

@Wauplin (Contributor) commented Nov 21, 2025

I've just tried running the scripts from test_subprocess_warm_cache_then_offline_load locally (with the latest huggingface_hub version + transformers from the main branch):

1.py

# 1.py
import os

os.environ["HF_HOME"] = "./tmp_cache_dr"  # to adapt

from transformers import AutoConfig, AutoModel, AutoTokenizer

config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-bert")
model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-bert")
print("CACHE_WARMED")
2.py
# 2.py
import os
os.environ["HF_HOME"] = "./tmp_cache_dr" # to adapt
os.environ["HF_HUB_OFFLINE"] = "1"

# Import transformers first
# Then block sockets to ensure no network access
import socket
from transformers import AutoConfig, AutoModel, AutoTokenizer

original_socket = socket.socket

def guarded_socket(*args, **kwargs):
    raise RuntimeError("Network access attempted in offline mode!")

socket.socket = guarded_socket

try:
    config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-bert")
    model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")
    tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-bert")
    print("OFFLINE_SUCCESS")
except RuntimeError as e:
    if "Network access" in str(e):
        print(f"NETWORK_ATTEMPTED: {e}")
        exit(1)
    raise
except Exception as e:
    print(f"FAILED: {e}")
    import traceback

    traceback.print_exc()
    exit(1)

And both succeeded for me, without the need for this PR:

➜ python 1.py         
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 520k/520k [00:00<00:00, 7.33MB/s]
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████| 87/87 [00:00<00:00, 6066.68it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: (...)
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 321/321 [00:00<00:00, 2.87MB/s]
vocab.txt: 4.68kB [00:00, 649kB/s]
tokenizer.json: 12.9kB [00:00, 34.3MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 112/112 [00:00<00:00, 799kB/s]
CACHE_WARMED
➜ python 2.py 
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████| 87/87 [00:00<00:00, 7663.97it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: (...)
OFFLINE_SUCCESS

Could you try it on your side as well and let me know if you spot any issue?

@Aaraviitkgp (Author) commented

@Wauplin You are right about that, I just went through the code and realized it was a bit of a misunderstanding 😁, but we could pivot this PR to add helpers like enable_offline_mode/disable_offline_mode to properly manage that if needed.

@Aaraviitkgp (Author) commented

> Could you try it on your side as well and let me know if you spot any issue?

It works for me as well.

@Wauplin (Contributor) commented Nov 21, 2025

@Aaraviitkgp I'm open to reviewing a PR for that, yes, but first I'd like to understand the need. In which case would you want a script to download files and only then switch on offline mode? In general it's more robust not to change env variables / environment behavior in the middle of a process.

@Aaraviitkgp (Author) commented

@Wauplin I think you are right; it is not ideal to change an env variable in the middle of a process. I think I will close this PR.

@fr1ll commented Nov 22, 2025

@Wauplin thanks for explaining this, and @Aaraviitkgp thanks for looking into this.

> the HF_HUB_OFFLINE environment variable is evaluated once at import time. If you really want to update it at runtime, you need to patch huggingface_hub.constants.HF_HUB_OFFLINE

tl;dr: When possible, set the environment variable before importing transformers.
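
For example, reusing the tiny test model from the scripts above:

import os
os.environ["HF_HUB_OFFLINE"] = "1"  # must be set BEFORE the import below

# The flag is read once when transformers/huggingface_hub are imported.
from transformers import AutoModel

model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")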

I will update / close my issue #42197. (I honestly thought I had set the variable before the import; not sure how I missed testing this.)

Also, there's a related issue #42269 and PR #42318 that can affect these kinds of "load while online, then access while offline" use cases. The local_files_only argument already gives a way in some settings to prevent Hub access at runtime; it just appears from that issue that it needs to be fixed for pipeline. A usage example follows below.
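
For reference, the per-call form looks like this (same tiny test model as above):

from transformers import AutoModel

# local_files_only=True resolves everything from the local cache and raises
# an error instead of contacting the Hub when a file is missing.
model = AutoModel.from_pretrained(
    "hf-internal-testing/tiny-random-bert", local_files_only=True
)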

@Aaraviitkgp deleted the fix-offline-cache-clean branch November 23, 2025 19:04
@Wauplin (Contributor) commented Nov 24, 2025

Thanks for flagging, @fr1ll. The pipeline not respecting local_files_only is indeed a bug.
