bug(services/hf): avoid extra HEAD probe for bucket reads #7622

@Xuanwo

Describe the bug

Users reported that reads from Hugging Face buckets through OpenDAL are slow and can hit rate limits.

For bucket repos, OpenDAL currently still calls HfCore::maybe_xet_file() before creating the reader. This sends a HEAD request to the resolve URL on every read to discover X-Xet-Hash and size metadata, even though buckets are expected to use XET.

Relevant code:

  • core/services/hf/src/reader.rs: HfReader::try_new() calls core.maybe_xet_file(path) before reading.
  • core/services/hf/src/core.rs: maybe_xet_file() issues a HEAD request and parses X-Xet-Hash.
  • core/services/hf/src/backend.rs: bucket stat() uses the same probe.
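To make the request amplification concrete, here is a minimal model of the read path described above. It is a sketch, not the actual OpenDAL code: `RequestCounter`, `XetFile`, `probe`, `fetch`, and `read` are hypothetical names, and the only behavior it assumes is the one stated in this issue (one HEAD probe before every data transfer).

```rust
use std::cell::Cell;

/// Hypothetical counter for requests that would reach Hugging Face.
#[derive(Default)]
pub struct RequestCounter {
    head: Cell<u64>,
    get: Cell<u64>,
}

/// Hypothetical stand-in for the metadata the probe discovers.
pub struct XetFile {
    pub hash: String,
    pub size: u64,
}

impl RequestCounter {
    /// Stand-in for the HEAD probe: counts one request per call.
    fn probe(&self, path: &str) -> XetFile {
        self.head.set(self.head.get() + 1);
        XetFile {
            hash: format!("hash-of-{path}"),
            size: 1024,
        }
    }

    /// Stand-in for the data transfer: counts one request per call.
    fn fetch(&self, _file: &XetFile) {
        self.get.set(self.get.get() + 1);
    }

    /// Models a read that probes before every transfer.
    pub fn read(&self, path: &str) {
        let file = self.probe(path);
        self.fetch(&file);
    }

    pub fn head_requests(&self) -> u64 {
        self.head.get()
    }

    pub fn total_requests(&self) -> u64 {
        self.head.get() + self.get.get()
    }
}
```

Under this model, calling `read` 100 times records 100 HEAD probes and 200 requests in total, which is the doubling this issue is about.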

Steps to Reproduce

Read many objects or ranges from an HF bucket-backed OpenDAL operator.

The issue is easier to observe with workloads that repeatedly call read or range-read, such as parquet/object_store-style readers.

Expected Behavior

HF bucket reads should avoid an extra per-read HEAD probe when the required XET metadata can be resolved through a bucket-native path, cached metadata, list/stat metadata, or another lower-request-count mechanism.
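One of the options above, cached metadata, could look roughly like the following. This is only a sketch under assumptions: `XetMeta`, `XetMetaCache`, and `get_or_probe` are hypothetical names that do not exist in OpenDAL, and the cache has no eviction or invalidation, which a real implementation would need because bucket objects can be overwritten.

```rust
use std::collections::HashMap;

/// Hypothetical metadata returned by the probe.
#[derive(Clone, Debug, PartialEq)]
pub struct XetMeta {
    pub hash: String,
    pub size: u64,
}

/// Hypothetical per-operator cache keyed by object path.
pub struct XetMetaCache {
    entries: HashMap<String, XetMeta>,
    probes: u64,
}

impl XetMetaCache {
    pub fn new() -> Self {
        Self {
            entries: HashMap::new(),
            probes: 0,
        }
    }

    /// Returns cached metadata if present; otherwise runs `probe` once
    /// (standing in for the HEAD request) and stores the result.
    pub fn get_or_probe<F>(&mut self, path: &str, probe: F) -> XetMeta
    where
        F: FnOnce(&str) -> XetMeta,
    {
        if let Some(meta) = self.entries.get(path) {
            return meta.clone();
        }
        self.probes += 1;
        let meta = probe(path);
        self.entries.insert(path.to_string(), meta.clone());
        meta
    }

    /// Number of probes actually issued so far.
    pub fn probes(&self) -> u64 {
        self.probes
    }
}
```

With this shape, repeated range reads of the same object would pay for the probe once instead of on every read. Whether a cache, list/stat metadata, or a bucket-native path is the right fix is left open here.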

A normal read path should not double the number of HF-facing requests before data transfer starts.

Additional Context

This is separate from #7577, which tracks HF dataset behavior-test timeouts on CI. This issue is specifically about HF bucket read request amplification and rate limiting.

Related broader API work: #5872 tracks returning metadata from read operations, which can help integrations avoid extra stat calls.

Labels: bug, releases-note/fix
