
Conversation

@sshlyapn

Motivation

This PR introduces support for persistent Hugging Face model caching within Docker containers. It prevents models from being re-downloaded on every script run, which is especially useful during manual testing on private machines from Conductor. Reusing the cache reduces network overhead and speeds up repeated test cycles.

Technical Details

  • Added logic to check for persistent_hf_cache_dir in the context.
  • Created a dedicated Docker volume (hf_models_cache_volume) when persistent_hf_cache_dir is provided.
  • Mounted the HF cache volume to the specified directory inside the container.
  • Modified model execution command to append --hf_cache_dir argument when persistent cache is enabled.
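The bullet points above can be sketched as follows. This is a hedged, illustrative sketch, not the PR's actual code: the function names `build_docker_options` and `build_model_args` are assumptions, while `hf_models_cache_volume` and `--hf_cache_dir` come from the PR description.

```python
# Illustrative sketch of the caching logic described above; helper names
# are hypothetical, the volume name and CLI flag follow the PR description.
HF_CACHE_VOLUME = "hf_models_cache_volume"

def build_docker_options(persistent_hf_cache_dir=None):
    """Return extra `docker run` options; mounts the named HF cache
    volume at the configured directory when persistence is enabled."""
    options = ""
    if persistent_hf_cache_dir:
        options += f" -v {HF_CACHE_VOLUME}:{persistent_hf_cache_dir} "
    return options

def build_model_args(base_args, persistent_hf_cache_dir=None):
    """Append --hf_cache_dir to the model arguments when the
    persistent cache is configured."""
    args = base_args
    if persistent_hf_cache_dir:
        args += f" --hf_cache_dir {persistent_hf_cache_dir} "
    return args
```

When `persistent_hf_cache_dir` is absent from the context, both helpers leave their inputs unchanged, so the feature stays fully opt-in.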

Should be merged after:

Test Plan

Validated with manual job: https://mlseqa-ci.amd.com:9090/view/Experimental/job/Test-MAD-akochin/118/

Test Result

Submission Checklist

@andrei-kochin andrei-kochin requested review from GeneDer, Copilot and gargrahul and removed request for GeneDer, Copilot and gargrahul November 21, 2025 12:22

Copilot AI left a comment


Pull Request Overview

This PR adds optional persistent Hugging Face model caching support to prevent models from being re-downloaded on every script execution, particularly useful for manual testing scenarios. The implementation creates a Docker volume for cache persistence when configured and passes the cache directory to model execution scripts.

Key Changes:

  • Added logic to check for and utilize persistent_hf_cache_dir configuration from context
  • Implemented Docker volume creation and mounting for persistent HF cache storage
  • Refactored model execution command building to append --hf_cache_dir argument when persistent cache is enabled


docker_options += self.get_mount_arg(mount_datapaths)

if persistent_hf_cache_dir:
    docker_options += f" -v {hf_models_cache_volume_name}:{persistent_hf_cache_dir} "

Copilot AI Nov 21, 2025


The persistent_hf_cache_dir value from context is used directly in the Docker mount without validation, which could allow arbitrary directory mounting if the context contains malicious input. Add validation to ensure the path is absolute, normalized, and does not resolve to sensitive directories such as / or /etc.
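One way to address this comment is an allow-list check before the mount string is built. This is a hedged sketch: `ALLOWED_ROOT` and the validation rules are assumptions, not part of the PR.

```python
import os

# Assumed allow-listed root for cache mounts; not taken from the PR.
ALLOWED_ROOT = "/models"

def validate_cache_dir(path):
    """Normalize the mount target and reject unsafe paths.

    Raises ValueError for relative paths and for anything that
    resolves outside ALLOWED_ROOT (including traversal via '..').
    """
    norm = os.path.normpath(path)
    if not os.path.isabs(norm):
        raise ValueError(f"cache dir must be absolute: {path!r}")
    # normpath collapses '..' components, so '/models/../etc'
    # normalizes to '/etc' and fails the prefix check below.
    if norm != ALLOWED_ROOT and not norm.startswith(ALLOWED_ROOT + os.sep):
        raise ValueError(f"cache dir outside {ALLOWED_ROOT}: {path!r}")
    return norm
```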

model_args = self.context.ctx.get("model_args", info["args"])

if persistent_hf_cache_dir:
    model_args += f" --hf_cache_dir {persistent_hf_cache_dir} "

Copilot AI Nov 21, 2025


The persistent_hf_cache_dir is concatenated directly into the command string without shell escaping. If the path contains spaces or special characters, it could break the command or introduce command injection vulnerabilities. Use proper shell escaping (e.g., shlex.quote()) when building the command string.
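A minimal sketch of the suggested fix, assuming the same argument-building shape as the quoted diff (the helper name is illustrative):

```python
import shlex

def build_model_args(base_args, persistent_hf_cache_dir=None):
    """Append --hf_cache_dir with the path shell-quoted, so spaces
    or shell metacharacters in the directory cannot split the
    command or inject additional commands."""
    args = base_args
    if persistent_hf_cache_dir:
        args += f" --hf_cache_dir {shlex.quote(persistent_hf_cache_dir)}"
    return args
```

shlex.quote wraps any string containing whitespace or metacharacters in single quotes, so the path reaches the script as a single safe argument.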

@gowthamcr-amd gowthamcr-amd self-requested a review November 26, 2025 11:25
@gowthamcr-amd
Copy link

LGTM
