Description
Describe the bug:
I'm trying to deploy 2 deployments. The first deployment needs 3 replicas, each requiring 1 GPU; the second needs 1 replica that takes up 8 GPUs.
From Python, I launch the head:
head = Cli(
    cluster_id=head_id,
    matrix_dir=some_path,
)
and launch 2 workers, which should cover my use case:
head.start_cluster(
    add_workers=2,
    slurm={"account": config.launcher.account, "qos": config.launcher.deployment_qos},
    enable_grafana=True,
)
Then, for each of the two deployments, I call:
head.deploy_applications(action="add", applications=[application])
While waiting for the models to deploy, I can see on the dashboard that the three 1-GPU replicas are spread across the 2 allocated workers, leaving no single worker with enough free capacity for the 8-GPU deployment.
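To make the failure mode concrete, here is a minimal toy model of the placement described above. It is not Ray's scheduler; `place`, the `"spread"`/`"pack"` strategy names, and `GPUS_PER_WORKER = 8` are assumptions for illustration (the report does not state how many GPUs each worker has).

```python
# Toy model of replica placement; this is NOT Ray's actual scheduler.
GPUS_PER_WORKER = 8  # assumption: the report does not state the worker size


def place(requests, num_workers, strategy):
    """Place GPU requests in order.

    Returns (free GPUs per worker, list of requests that could not be placed).
    """
    free = [GPUS_PER_WORKER] * num_workers
    unplaced = []
    for gpus in requests:
        candidates = [i for i, f in enumerate(free) if f >= gpus]
        if not candidates:
            unplaced.append(gpus)
            continue
        if strategy == "spread":
            # Prefer the worker with the most free GPUs.
            target = max(candidates, key=lambda i: free[i])
        else:  # "pack"
            # Prefer the fullest worker that still fits the request.
            target = min(candidates, key=lambda i: free[i])
        free[target] -= gpus
    return free, unplaced


requests = [1, 1, 1, 8]  # three 1-GPU replicas, then one 8-GPU replica
spread_result = place(requests, 2, "spread")
pack_result = place(requests, 2, "pack")
print("spread:", spread_result)  # ([6, 7], [8]) -> the 8-GPU replica does not fit
print("pack:  ", pack_result)    # ([5, 0], [])  -> everything fits
```

In this model the cluster has 16 GPUs for 11 requested, yet spreading the small replicas leaves 6 and 7 GPUs free on the two workers, so the 8-GPU replica has nowhere to go.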
Describe how to reproduce:
See procedure above.
Describe the expected behavior:
When there is enough compute for both allocations, I would expect Ray, or the logic in matrix, to handle this and produce a placement that fits both deployments.
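As a rough sketch of what a "correct allocation" could look like, the snippet below places the largest request first (first-fit decreasing). It illustrates the expected outcome only; it is not how Ray or matrix schedules replicas, and `first_fit_decreasing` and the 8-GPU worker size are hypothetical.

```python
def first_fit_decreasing(requests, capacities):
    """Assign each GPU request to a worker, largest request first.

    Returns a list of (gpus, worker_index) pairs, or None if some
    request does not fit on any single worker.
    """
    free = list(capacities)
    assignment = []
    for gpus in sorted(requests, reverse=True):
        for i, available in enumerate(free):
            if available >= gpus:
                free[i] -= gpus
                assignment.append((gpus, i))
                break
        else:
            # No single worker has enough free GPUs for this request.
            return None
    return assignment


# Assumption: 2 workers with 8 GPUs each (not stated in the report).
plan = first_fit_decreasing([1, 1, 1, 8], [8, 8])
print(plan)  # [(8, 0), (1, 1), (1, 1), (1, 1)]
```

Under these assumptions the 8-GPU replica takes one worker and the three 1-GPU replicas share the other, which is the outcome the report expects.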
Environment:
Package Version Editable project location
absl-py 2.3.1
aiohappyeyeballs 2.6.1
aiohttp 3.12.14
aiohttp-cors 0.8.1
aiosignal 1.4.0
airportsdata 20250706
alembic 1.16.4
altair 5.5.0
annotated-types 0.7.0
anthropic 0.49.0
antlr4-python3-runtime 4.9.3
anyio 4.9.0
arch 7.2.0
argon2-cffi 25.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
astor 0.8.1
asttokens 3.0.0
async-lru 2.0.5
attrs 25.3.0
audioread 3.0.1
babel 2.17.0
beautifulsoup4 4.13.4
bitsandbytes 0.45.5
black 25.1.0
blake3 1.0.5
bleach 6.2.0
blinker 1.9.0
blosc2 3.6.1
boto3 1.37.33
botocore 1.37.38
Brotli 1.1.0
cachetools 5.5.2
certifi 2025.7.14
cffi 1.17.1
cfgv 3.4.0
charset-normalizer 3.4.2
click 8.2.1
cloudpickle 3.1.1
colorama 0.4.6
colorful 0.5.7
colorlog 6.9.0
comm 0.2.2
compressed-tensors 0.9.2
contourpy 1.3.2
coolname 2.2.0
cupy-cuda12x 13.5.1
cycler 0.12.1
dataclasses-json 0.6.7
datasets 4.0.0
datasketch 1.6.5
debugpy 1.8.15
decorator 5.2.1
defusedxml 0.7.1
depyf 0.18.0
dill 0.3.8
diskcache 5.6.3
distlib 0.4.0
distro 1.9.0
dnspython 2.7.0
docker-pycreds 0.4.0
einops 0.8.1
email_validator 2.2.0
executing 2.2.0
fair-matrix 0.2.2
fastapi 0.116.1
fastapi-cli 0.0.8
fastapi-cloud-cli 0.1.4
fastcore 1.8.5
fastjsonschema 2.21.1
fastrlock 0.8.3
filelock 3.18.0
fire 0.7.0
flake8 7.2.0
flake8-bugbear 24.12.12
flake8-comprehensions 3.16.0
flake8-docstrings 1.7.0
Flask 3.1.1
Flask-Compress 1.18
fonttools 4.59.0
fqdn 1.5.1
frozenlist 1.7.0
fsspec 2025.3.0
future 1.0.0
genson 1.3.0
gguf 0.10.0
gitdb 4.0.12
GitPython 3.1.44
google-api-core 2.25.1
google-auth 2.40.3
google-genai 1.26.0
googleapis-common-protos 1.70.0
greenlet 3.2.3
grpcio 1.70.0
grpcio-tools 1.70.0
h11 0.16.0
hf-xet 1.1.5
hiplot 0.1.33
httpcore 1.0.9
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.33.4
humanize 4.12.2
hydra-colorlog 1.2.0
hydra-core 1.3.2
hyperopt 0.2.7
identify 2.6.12
idna 3.10
igraph 0.11.8
imageio 2.37.0
importlib_metadata 8.7.0
iniconfig 2.1.0
interegular 0.3.3
iopath 0.1.10
ipykernel 6.29.5
ipython 9.4.0
ipython_pygments_lexers 1.1.1
isoduration 20.11.0
isort 6.0.1
itsdangerous 2.2.0
jedi 0.19.2
Jinja2 3.1.6
jiter 0.10.0
jmespath 1.0.1
joblib 1.5.1
json5 0.12.0
jsonlines 4.0.0
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2025.4.1
jupyter_client 8.6.3
jupyter-console 6.6.3
jupyter_core 5.8.1
jupyter-events 0.12.0
jupyter-lsp 2.2.6
jupyter_server 2.16.0
jupyter_server_terminals 0.5.3
jupyterlab 4.4.0
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
kaggle 1.6.3
kiwisolver 1.4.8
kornia 0.8.0
kornia_rs 0.1.9
lark 1.2.2
lazy_loader 0.4
libcst 1.5.1
librosa 0.11.0
lightgbm 4.6.0
line_profiler 4.2.0
litellm 1.65.7
llguidance 0.7.30
llvmlite 0.44.0
lm-format-enforcer 0.10.11
loguru 0.7.2
lovely-numpy 0.2.13
lovely-tensors 0.1.18
Mako 1.3.10
markdown-it-py 3.0.0
markovify 0.9.4
MarkupSafe 3.0.2
marshmallow 3.26.1
matplotlib 3.10.1
matplotlib-inline 0.1.7
mccabe 0.7.0
mdurl 0.1.2
mistral_common 1.8.1
mistune 3.1.3
mpmath 1.3.0
msgpack 1.1.1
msgspec 0.19.0
multidict 6.6.3
multiprocess 0.70.16
mypy 1.15.0
mypy-extensions 1.0.0
nanobind 2.8.0
narwhals 1.48.0
nbclient 0.10.2
nbconvert 7.16.6
nbformat 5.10.4
ndindex 1.10.0
nest-asyncio 1.6.0
networkx 3.5
ninja 1.11.1.4
nodeenv 1.9.1
notebook_shim 0.2.4
numba 0.61.0
numexpr 2.11.0
numpy 2.1.3
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-ml-py 12.575.51
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
omegaconf 2.3.0
openai 1.72.0
opencensus 0.11.4
opencensus-context 0.1.3
opencv-python-headless 4.12.0.88
optuna 4.2.1
outlines 0.1.11
outlines_core 0.1.26
overrides 7.7.0
packaging 24.2
pandas 2.2.3
pandas-stubs 2.2.3.250308
pandocfilters 1.5.1
parso 0.8.4
partial-json-parser 0.2.1.1.post6
pathspec 0.12.1
patsy 1.0.1
pexpect 4.9.0
pillow 11.3.0
pip 25.0
platformdirs 4.3.8
plotly 6.1.2
pluggy 1.6.0
pooch 1.8.2
portalocker 3.2.0
pre_commit 4.2.0
prometheus_client 0.22.1
prometheus-fastapi-instrumentator 7.1.0
prompt_toolkit 3.0.51
propcache 0.3.2
proto-plus 1.26.1
protobuf 5.29.5
psutil 7.0.0
ptyprocess 0.7.0
pur 7.3.3
pure_eval 0.2.3
py-cpuinfo 9.0.0
py-spy 0.4.0
py4j 0.10.9.9
pyaml 25.7.0
pyarrow 21.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.2
pycodestyle 2.13.0
pycountry 24.6.1
pycparser 2.22
pydantic 2.11.7
pydantic_core 2.33.2
pydantic-extra-types 2.10.5
pydeck 0.9.1
pydocstyle 6.3.0
pyflakes 3.3.2
Pygments 2.19.2
pynvml 12.0.0
pyparsing 3.2.3
pytest 8.3.5
python-dateutil 2.9.0.post0
python-dotenv 1.1.1
python-json-logger 3.3.0
python-multipart 0.0.20
python-slugify 8.0.4
pytz 2025.2
PyYAML 6.0.2
pyzmq 27.0.0
pyzstd 0.17.0
ray 2.43.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.4
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 14.0.0
rich-toolkit 0.14.8
rignore 0.6.4
rliable 1.2.0
rpds-py 0.26.0
rsa 4.9.1
ruff 0.12.0
s3fs 0.4.2
s3transfer 0.11.5
safetensors 0.5.3
scikit-image 0.25.2
scikit-learn 1.6.1
scikit-optimize 0.10.2
scipy 1.15.2
seaborn 0.13.2
Send2Trash 1.8.3
sentence-transformers 4.1.0
sentencepiece 0.2.0
sentry-sdk 2.33.1
setproctitle 1.3.6
setuptools 80.9.0
shellingham 1.5.4
shutup 0.2.0
six 1.17.0
sklearn-pandas 2.2.0
smart_open 7.3.0.post1
smmap 5.0.2
sniffio 1.3.1
snowballstemmer 3.0.1
soundfile 0.13.1
soupsieve 2.7
soxr 0.5.0.post1
SQLAlchemy 2.0.41
stack-data 0.6.3
starlette 0.47.2
statsmodels 0.14.5
streamlit 1.44.1
submitit 1.5.2
sympy 1.13.1
tables 3.10.2
tabulate 0.9.0
tenacity 8.5.0
termcolor 3.1.0
terminado 0.18.1
text-unidecode 1.3
texttable 1.7.0
threadpoolctl 3.6.0
tifffile 2025.6.11
tiktoken 0.9.0
tinycss2 1.4.0
tokenize_rt 6.2.0
tokenizers 0.21.2
toml 0.10.2
torch 2.6.0
torchaudio 2.6.0
TorchFix 0.7.0
torchvision 0.21.0
tornado 6.5.1
tqdm 4.67.1
traitlets 5.14.3
transformers 4.53.2
triton 3.2.0
typer 0.16.0
types-psutil 7.0.0.20250401
types-python-dateutil 2.9.0.20250708
types-pytz 2025.2.0.20250516
typing_extensions 4.13.2
typing-inspect 0.9.0
typing-inspection 0.4.1
tzdata 2025.2
Unidecode 1.3.8
uri-template 1.3.0
urllib3 2.5.0
uvicorn 0.35.0
uvloop 0.21.0
virtualenv 20.32.0
vllm 0.8.3
wandb 0.19.9
watchdog 6.0.0
watchfiles 1.1.0
wcwidth 0.2.13
webcolors 24.11.1
webencodings 0.5.1
websocket-client 1.8.0
websockets 15.0.1
Werkzeug 3.1.3
wheel 0.45.1
wrapt 1.17.2
xformers 0.0.29.post2
xgrammar 0.1.17
xlrd 2.0.1
xxhash 3.5.0
yamllint 1.37.0
yarl 1.20.1
zipp 3.23.0