@jkamalu jkamalu commented Nov 7, 2025

This PR makes a few changes to address potential sources of instability/error in the data unit tests.

  1. The unit tests touched in this PR only need to run on a single process, so in the multi-rank setting we do the work on one rank and return early on all others.
  2. We go back to looking for the test tokenizer vocab and merge files locally before downloading them from the web.
  3. We fix the local path for the BERT vocab file.
  4. We partially standardize the object storage client spoofing.
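
For illustration, items 1 and 2 could look roughly like the sketch below. This is not the code from the PR: `get_rank`, `resolve_test_file`, and `run_single_process_test` are hypothetical names, reading the rank from the `RANK` environment variable is an assumption (the real tests may query `torch.distributed` instead), and the download step is passed in as a callable rather than tied to any real URL.

```python
import os
from pathlib import Path
from typing import Callable


def get_rank() -> int:
    """Return this process's rank, defaulting to 0 outside a distributed launch."""
    # Assumption: the launcher exports RANK (torchrun does); hypothetical helper.
    return int(os.environ.get("RANK", "0"))


def resolve_test_file(
    filename: str,
    local_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Prefer a local copy of `filename`; call `download` only if it is missing."""
    path = local_dir / filename
    if not path.is_file():
        # Nothing found locally, so fall back to fetching the file.
        local_dir.mkdir(parents=True, exist_ok=True)
        download(filename, path)
    return path


def run_single_process_test(body: Callable[[], None]) -> bool:
    """Run `body` on rank 0 only; return True if it ran, False if skipped."""
    if get_rank() != 0:
        # All other ranks return early without doing any work.
        return False
    body()
    return True
```

Passing the downloader in as an argument keeps the local-first lookup testable without network access, which matches the stated goal of removing sources of instability from the tests.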

@jkamalu jkamalu self-assigned this Nov 7, 2025
@jkamalu jkamalu requested a review from a team as a code owner November 7, 2025 04:47

copy-pr-bot bot commented Nov 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jkamalu jkamalu requested a review from ko3n1g November 7, 2025 04:48
@jkamalu jkamalu force-pushed the data-unit-tests-update branch 3 times, most recently from 1e7cb45 to e485838 Compare November 7, 2025 05:13
@jkamalu jkamalu added this to the Core 0.15 milestone Nov 7, 2025

jkamalu commented Nov 7, 2025

/ok to test e485838

@jkamalu jkamalu force-pushed the data-unit-tests-update branch from e485838 to ea42e5f Compare November 7, 2025 17:06