2 changes: 1 addition & 1 deletion papers/matthew_feickert/cuda.md
@@ -49,6 +49,6 @@ This made the packages less visible and required additional knowledge to use.

 In [2023](https://youtu.be/WgKwlGgVzYE?si=hfyAo6qLma8hnJ-N), NVIDIA began adding the releases of CUDA conda packages from the `nvidia` channel to conda-forge, making it easier to discover and allowing for community support.
 Given the new package structure, NVIDIA added the packages for CUDA `12.0` to indicate the breaking change.
-Also with significant advancements in system driver specification support, CUDA `12` became the first version of CUDA to be released as conda packages through conda-forge and included all CUDA libraries from the [CUDA compiler `nvcc`](https://github.com/conda-forge/cuda-nvcc-feedstock) to the [CUDA development libraries](https://github.com/conda-forge/cuda-libraries-dev-feedstock).
+Also, with significant advancements in system driver specification support, CUDA `12` became the first version of CUDA to be released as conda packages through conda-forge and included all CUDA libraries from the [CUDA compiler `nvcc`](https://github.com/conda-forge/cuda-nvcc-feedstock) to the [CUDA development libraries](https://github.com/conda-forge/cuda-libraries-dev-feedstock).
 [CUDA metapackages](https://github.com/conda-forge/cuda-feedstock/) were also released, which allow users to easily describe the version of CUDA they require (e.g. `cuda-version=12.5`) and the CUDA conda packages they want (e.g. `cuda`).
 This significantly improved the ability for researchers to easily create CUDA accelerated computing environments.
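The metapackage usage described in the hunk above can be sketched as a short shell session. This is an illustrative sketch, not a prescribed workflow: it assumes a conda-compatible installer and a Pixi workspace are available, and `12.5` is simply the example version from the text.

```shell
# Request the full CUDA toolkit metapackage from conda-forge while
# pinning the CUDA release via the cuda-version metapackage (sketch).
conda install --channel conda-forge cuda cuda-version=12.5

# A comparable request inside a Pixi workspace (sketch).
pixi add cuda "cuda-version=12.5"
```

Because `cuda-version` is a separate metapackage, the pin composes with any other CUDA-dependent conda packages in the same environment.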
6 changes: 3 additions & 3 deletions papers/matthew_feickert/linux-containers.md
@@ -1,8 +1,8 @@
 ## Deploying environments to remote compute
 
-Often researchers are running scientific and machine learning workflows on remote computational resources that use batch computing systems (e.g. HTCondor, SLURM).
-For systems with shared filesystems (e.g. SLURM) it is possible to use Pixi workspaces in workflows in a similar manner to local machine (e.g. laptop or workstation).
-Other systems (e.g. HTCondor) do not have a shared filesystem (e.g. HTCondor), requiring that each worker node receive its own copy of the software environment.
+Often researchers are running scientific and machine learning workflows on remote computational resources that use batch computing systems (e.g. HTCondor, Slurm).
+For systems with shared filesystems (e.g. Slurm) it is possible to use Pixi workspaces in workflows in a similar manner to a local machine (e.g. laptop or workstation).
+Other systems (e.g. HTCondor) may not have a shared filesystem, requiring that each worker node receive its own copy of the software environment.
 While locked Pixi environments significantly help with this, it is often advantageous to distribute the environment in the form of a Linux container image to the compute resources.
 These systems are able to mount Linux container images to worker nodes in ways that reduce the disk and memory cost to the user's session, compared to installing Pixi and then downloading all dependencies of the software environment from the package indexes used.
 This also reduces the bandwidth use as the Linux container image can be cached at the compute resource host and efficiently replicated to the worker nodes, paying the bandwidth cost of download once.
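One concrete way to give each worker node its own copy of a locked environment is the `pixi-pack` tool cited in this paper's references. The following is a sketch only; flag names and defaults may differ between `pixi-pack` versions.

```shell
# On the submit machine: pack the locked Pixi environment for the
# worker platform into a single self-contained archive (sketch).
pixi-pack pack --environment default --platform linux-64

# On the worker node, after the archive is shipped by the batch system:
# recreate the environment from the archive without network access (sketch).
pixi-pack unpack environment.tar
```

Because the archive is built from the lock file, every worker reproduces the identical environment regardless of when it unpacks.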
1 change: 1 addition & 0 deletions papers/matthew_feickert/mybib.bib
@@ -66,6 +66,7 @@ @software{Feickert_Reproducible_Machine_Learning
   license = {CC-BY-4.0},
   title = {{Reproducible Machine Learning Workflows for Scientists}},
   url = "https://github.com/carpentries-incubator/reproducible-ml-workflows",
+  doi = {10.5281/zenodo.17537698},
   year = {2025}
 }

1 change: 0 additions & 1 deletion papers/matthew_feickert/myst.yml
@@ -64,7 +64,6 @@ project:
   - ucx-split-feedstock-pr-14
   - pixi-docs
   - pixi-pack
-  - Feickert_Reproducible_Machine_Learning
 exports:
   - id: pdf
     format: typst
8 changes: 4 additions & 4 deletions papers/matthew_feickert/pixi.md
@@ -16,11 +16,11 @@ These features become powerful when combined with robust behaviors

 1. **Automatic lock files**: Any changes to a Pixi workspace that can mutate the environments defined in it will automatically and non-optionally result in the Pixi lock file for the workspace being updated.
 This ensures that any state of a Pixi project is trivially computationally reproducible.
-1. **Solving environments for other platforms**: Pixi allows the user to solve environment for platforms other than the current user machine's.
-This allows for users to solve and share environment to any collaborator with confidence that all environments will work with no additional setup.
+1. **Solving environments for other platforms**: Pixi allows the user to solve environments for platforms other than the current user machine's.
+This allows users to solve and share environments with any collaborator with confidence that all environments will work with no additional setup.
 1. **Parity of conda and Python packages**: Pixi allows for conda packages and Python packages to be used together seamlessly, and is unique in its ability to handle overlap in dependencies between them.
-Pixi will first solve all conda package requirements for the target environment, lock the environment, and then solve all the dependencies of the Python packages for the environment, determine if there are any overlaps with the existing conda environment, and the only install the missing Python dependencies.
-This ensures allows for fully reproducible solves and for the two package ecosystems to compliment each other rather than potentially cause conflicts.
+Pixi will first solve all conda package requirements for the target environment, lock the environment, then solve all the dependencies of the Python packages for the environment, determine if there are any overlaps with the existing conda environment, and then only install the missing Python dependencies.
+This allows for fully reproducible solves and for the two package ecosystems to complement each other, rather than potentially cause conflicts.
 1. **Efficient caching**: Pixi uses an efficient global cache shared between all Pixi projects and globally installed tools on a machine.
 The first time Pixi installs a package it will download the files to the global cache and link the files into the environment.
 When Pixi has to reinstall the same package in a different environment, the package will be linked from the same cache, making sure internet bandwidth for downloads and disk space is used as efficiently as possible.
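The behaviors listed above are all driven by the Pixi workspace manifest. A minimal multi-platform sketch follows; the table names assume a recent Pixi manifest format, and the listed packages are purely illustrative.

```toml
# pixi.toml (illustrative sketch)
[workspace]
channels = ["conda-forge"]
platforms = ["linux-64", "osx-arm64", "win-64"]  # solved and locked for all three

[dependencies]          # conda packages, solved and locked first
python = "3.12.*"
numpy = "*"

[pypi-dependencies]     # Python packages, solved against the locked conda set
rich = "*"
```

Any edit to this file that could change an environment triggers the automatic lock-file update described in the first list item.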