Describe the bug
Users reported that reads from Hugging Face buckets through OpenDAL are slow and can hit rate limits.
For bucket repos, OpenDAL currently still calls HfCore::maybe_xet_file() before creating the reader. This sends a HEAD request to the resolve URL on every read to discover X-Xet-Hash and size metadata, even though buckets are expected to use XET.
Relevant code:
core/services/hf/src/reader.rs: HfReader::try_new() calls core.maybe_xet_file(path) before reading.
core/services/hf/src/core.rs: maybe_xet_file() issues a HEAD request and parses X-Xet-Hash.
core/services/hf/src/backend.rs: bucket stat() uses the same probe.
Steps to Reproduce
Read many objects or ranges from an HF bucket-backed OpenDAL operator.
The issue is easier to observe with workloads that repeatedly call read or range-read, such as parquet/object_store-style readers.
Expected Behavior
HF bucket reads should avoid an extra per-read HEAD probe when the required XET metadata can be resolved through a bucket-native path, cached metadata, list/stat metadata, or another lower-request-count mechanism.
A normal read path should not double the number of HF-facing requests before data transfer starts.
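One of the options above, cached metadata, can be sketched as follows. This is only an illustration under stated assumptions, not OpenDAL's implementation: `XetMeta`, `XetMetaCache`, and `get_or_probe` are hypothetical names, and the `probe` closure stands in for whatever currently issues the HEAD request. The point is that repeated reads of the same path pay for at most one probe, and paths already seen through list/stat pay for none.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the metadata the HEAD probe discovers
/// (the X-Xet-Hash value and the object size).
#[derive(Clone, Debug, PartialEq)]
pub struct XetMeta {
    pub hash: String,
    pub size: u64,
}

/// Hypothetical per-operator cache keyed by object path.
pub struct XetMetaCache {
    entries: HashMap<String, XetMeta>,
    probes: u64, // number of times the fallback probe actually ran
}

impl XetMetaCache {
    pub fn new() -> Self {
        Self {
            entries: HashMap::new(),
            probes: 0,
        }
    }

    /// Seed the cache from metadata obtained elsewhere (for example list or stat).
    pub fn insert(&mut self, path: &str, meta: XetMeta) {
        self.entries.insert(path.to_string(), meta);
    }

    /// Return cached metadata, running `probe` only on a cache miss.
    pub fn get_or_probe<F>(&mut self, path: &str, probe: F) -> XetMeta
    where
        F: FnOnce(&str) -> XetMeta,
    {
        if let Some(meta) = self.entries.get(path) {
            return meta.clone();
        }
        self.probes += 1;
        let meta = probe(path);
        self.entries.insert(path.to_string(), meta.clone());
        meta
    }

    /// How many probes were issued so far.
    pub fn probes(&self) -> u64 {
        self.probes
    }
}
```

With a cache like this, 100 range reads of the same object would trigger one probe instead of 100. A real implementation would also need an invalidation or expiry policy, since cached hash and size go stale if the object is overwritten; that policy is left out of this sketch.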
Additional Context
This is separate from #7577, which tracks HF dataset behavior-test timeouts on CI. This issue is specifically about HF bucket read request amplification and rate limiting.
Related broader API work: #5872 tracks returning metadata from read operations, which can help integrations avoid extra stat calls.