
Speed up CI by caching dependencies and virtual environment#606

Open
Sharkyii wants to merge 1 commit into
mllam:mainfrom
Sharkyii:ci-caching-clean

@Sharkyii

@Sharkyii Sharkyii commented Apr 15, 2026

Contributor

Describe your changes

Adds smart two-layer caching to CI by storing both downloaded packages and the full virtual environment, so builds don’t start from scratch every time.
This significantly reduces install time by reusing dependencies when nothing has changed, while still staying reliable with lock-based cache keys.
Overall, it speeds up CI runs dramatically (up to ~90%) without changing any existing functionality or tests.
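A minimal sketch of how the two layers could fit together in a workflow job, not the exact configuration from this PR. The job name, step ids, cache key prefix, action version tags and the `uv pip install -e .` command are assumptions for illustration; it also assumes a `uv.lock` file exists in the workspace by the time the cache step runs.

```yaml
# Hypothetical job; names, ids and version tags are illustrative assumptions.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Layer 1: uv's own package download cache.
      - uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true

      # Layer 2: the complete virtual environment, keyed on the lock file hash.
      - name: Restore .venv
        id: venv-cache
        uses: actions/cache@v4
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ hashFiles('uv.lock') }}

      # Only build the environment when layer 2 missed.
      - name: Install dependencies
        if: steps.venv-cache.outputs.cache-hit != 'true'
        run: |
          uv venv --no-project
          source .venv/bin/activate
          uv pip install -e .
```

On a cache hit the install step is skipped and the restored `.venv` is used as-is. On a miss, uv can still take packages from its download cache instead of the network.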

Issue Link

#605

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug
    • maintenance: when your contribution relates to repo maintenance, e.g. CI/CD or documentation

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • (if the PR is not just maintenance/bugfix) the PR is assigned to the next milestone. If it is not, propose it for a future milestone.
  • author has added an entry to the changelog (and designated the change as added, changed, fixed or maintenance)
  • Once the PR is ready to be merged, squash commits and merge the PR.

closes #605

@Sharkyii Sharkyii changed the title from "add two-layer caching to CI workflows" to "Speed up CI by caching dependencies and virtual environment" Apr 15, 2026
@sadamov

sadamov commented Apr 16, 2026

Collaborator

Thanks @Sharkyii, 90% faster sounds fantastic! There currently seems to be an issue with the clean-up of the .venv.
From the failing action above:

Run uv venv --no-project
Using CPython 3.13.13 interpreter at: /opt/hostedtoolcache/Python/3.13.13/x64/bin/python
Creating virtual environment at: .venv
error: Failed to create virtual environment
  Caused by: A virtual environment already exists at `.venv`. Use `--clear` to replace it
Error: Process completed with exit code 2.

@archit7-beep

Contributor

Hi! I took a look at the current approach and noticed a potential issue around .venv handling on cache hits.

Right now, even though the workflow skips venv creation in one place, there are still cases where uv venv --no-project may be invoked again (directly or indirectly), which can cause failures when .venv is already restored from cache.

A more robust approach would be to ensure that venv creation happens strictly in one place and is fully guarded by the cache-hit condition, so that on cache hits we only reuse the restored environment and avoid any reinitialization.
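One way to make such a guard hold regardless of how the step is reached is to check the filesystem itself rather than relying only on the cache result. A sketch, assuming the restored environment contains the standard `pyvenv.cfg` marker file; the step name is made up for illustration:

```yaml
steps:
  - name: Create .venv only if none was restored
    run: |
      # pyvenv.cfg is written at the root of every virtual environment,
      # so its presence means a usable .venv is already in place.
      if [ -f .venv/pyvenv.cfg ]; then
        echo "Reusing restored .venv"
      else
        uv venv --no-project
      fi
```

With this check, a restored `.venv` never reaches `uv venv`, so the "A virtual environment already exists" error cannot be triggered by a cache hit.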

I experimented with this in a separate PR: #610 — it demonstrates a cache-aware setup where venv creation is skipped entirely on cache hits and the existing environment is reused safely.

Happy to adapt this approach here — if you're okay with it, I can also contribute the changes directly to this branch.
@Sharkyii @sadamov

@Sharkyii

Contributor Author

@archit7-beep thanks :), but I am still working on this.
Once I am done, you can review it again.

@Sharkyii

Contributor Author

@sadamov @archit7-beep
Please have a look!

@archit7-beep

archit7-beep commented Apr 17, 2026

Contributor

This is definitely an improvement over the previous version, but I have some suggestions:

  • Right now, uv pip install still runs even on cache hits, which partially defeats the purpose of caching the .venv, since we lose most of the install-time savings.

  • The use of a fully resolved dependency key is more deterministic, but it also adds noticeable overhead (~30–60s on cache miss) and may reintroduce strict resolution issues similar to what we saw earlier.

  • Also noticed that source .venv/bin/activate is still present — this might not be necessary if we rely on $GITHUB_PATH for environment setup.

@Sharkyii

@kshirajahere kshirajahere left a comment

Contributor

Nice direction overall, especially adding two-layer caching. A few minor changes are needed, plus an update in /CHANGELOG.md.

Comment thread .github/workflows/install-and-test.yml Outdated
Comment thread CHANGELOG.md Outdated
@Sharkyii

Sharkyii commented Apr 18, 2026

Contributor Author

I have now guarded the install steps so that they only run on a cache miss. Regarding the other two points:

  • The use of a fully resolved dependency key is more deterministic, but it also adds noticeable overhead (~30–60s on cache miss) and may reintroduce strict resolution issues similar to what we saw earlier.
  • Also noticed that source .venv/bin/activate is still present — this might not be necessary if we rely on $GITHUB_PATH for environment setup.

The hash is important because, without it, the cache can keep stale packages even after dependencies change. That would mean CI runs against incorrect versions, which defeats the whole point of using a lockfile. The extra 30–60 seconds when dependencies update is a fair trade-off for reliable, deterministic testing.
Also, the source .venv/bin/activate step can’t be removed. The uv pip install commands that follow run in the same shell step and depend on the virtual environment being active. $GITHUB_PATH only impacts later steps in the workflow, not the current one.
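The difference between the two mechanisms can be sketched as follows. The step names and the install and test commands are assumptions for illustration, not taken from the workflow in this PR:

```yaml
steps:
  - name: Install into the venv
    run: |
      # Activation changes PATH for this step's shell only.
      source .venv/bin/activate
      uv pip install -e .
      # Lines appended to the $GITHUB_PATH file are prepended to PATH
      # starting with the next step, not in the current one.
      echo "$PWD/.venv/bin" >> "$GITHUB_PATH"

  - name: Run tests
    # No activation needed here: python now resolves to .venv/bin/python.
    run: python -m pytest
```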

Thanks @archit7-beep @kshirajahere

@Sharkyii

Contributor Author

@sadamov review :)

@sadamov sadamov left a comment

Collaborator

Looks good, and the speedup is impressive @Sharkyii! Thanks also to @archit7-beep for their review!

I added some inline suggestions below. Noting the separate PR for the uv lockfile here:

Ephemeral uv.lock as cache key: uv lock is generated fresh on every run (no committed lock file, by design). If any upstream dep releases between two CI runs, the hash changes and the venv cache is invalidated. The stated 80-95% speedup may be optimistic during active periods. This will be resolved once #604 lands - it introduces uv sync with CPU/GPU extras and [tool.uv.sources], which enables a committed uv.lock and makes the cache key fully stable.
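To illustrate why the key is unstable with an ephemeral lock file, here is a sketch of that variant, assuming the lock file is produced by a plain `uv lock` step before the cache lookup. The step names, step id and key prefix are hypothetical:

```yaml
steps:
  # No uv.lock is committed, so every run resolves dependencies from scratch.
  - name: Generate ephemeral lock file
    run: uv lock

  # hashFiles() is evaluated when this step starts, so it sees the file
  # written above. Any new upstream release that changes the resolution
  # changes the hash and therefore misses the cache.
  - name: Restore .venv
    id: venv-cache
    uses: actions/cache@v4
    with:
      path: .venv
      key: venv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
```

With a committed `uv.lock`, the first step would go away and the same `hashFiles('uv.lock')` expression would only change when the lock file is updated in the repository.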

Comment thread .github/workflows/install-and-test.yml Outdated
Comment thread .github/workflows/install-and-test.yml Outdated
Comment thread .github/workflows/install-and-test.yml Outdated

@sadamov sadamov left a comment

Collaborator

The two-layer approach (uv download cache via enable-cache: true + explicit .venv cache) is sound. Two issues below.

Comment thread .github/workflows/install-and-test.yml Outdated
Comment thread .github/workflows/install-and-test.yml Outdated
@Sharkyii

Sharkyii commented May 8, 2026

Contributor Author

@sadamov I updated the code based on the recommended changes.

@sadamov sadamov self-requested a review May 10, 2026 19:39
@leifdenby leifdenby modified the milestones: v0.8.0 (proposed), v0.8.0 May 11, 2026
sadamov added a commit to Sharkyii/neural-lam that referenced this pull request Jun 4, 2026
The original entry landed inside the released v0.6.0 block. Move it
next to the other unreleased maintenance items so the v0.6.0 section
stays frozen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-layer caching for the install-and-test workflow:

- setup-python's built-in pip cache is enabled only for the pip matrix
  jobs (gated on `package_manager == 'pip'`).
- setup-uv enables its package download cache via `enable-cache: true`.
- A new `actions/cache@v4` step caches the `.venv` directory keyed on
  the resolved `uv.lock` hash; on cache hit, the uv install steps are
  skipped entirely.

Stack with mllam#639 preserved (the wheel-index-based torch version
resolution stays as-is).

Refs mllam#605.

Co-Authored-By: Sneh <Sharkyii@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sadamov sadamov force-pushed the ci-caching-clean branch from 04278ba to 750fc3a Compare June 4, 2026 18:35

@sadamov sadamov left a comment

Collaborator

marking this one as ready. I only rebased on main because of many recent changes there. thanks @Sharkyii! especially with #604 this will be sweet

Labels

cicd ready Review complete - proposed for milestone

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Speed up CI by caching dependencies and virtual environment

5 participants