Nomad version
Nomad v1.9.13+ent
Issue
I have a node with 4 GPUs that are correctly fingerprinted by Nomad, but certain combinations of tasks and device counts are rejected with "missing devices", even though the total number of requested GPU instances never exceeds the number of fingerprinted GPUs (4).
I can run a job with a single task requesting a device "nvidia/gpu" count of 4.
I cannot run a job with two tasks where the device "nvidia/gpu" count per task is 2; the plan is rejected with "missing devices".
I can run a job with two tasks where the device "nvidia/gpu" count is 2 in the first task and the device "nvidia/gpu/NVIDIA H100 NVL MIG 3g.47gb" count is 2 in the second task.
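(For context: Nomad matches a device block's name at whatever level of specificity it is given: type only, e.g. "gpu"; vendor/type, e.g. "nvidia/gpu"; or vendor/type/model, e.g. "nvidia/gpu/NVIDIA H100 NVL MIG 3g.47gb". The failing job below uses the vendor/type form in both tasks, while the passing jobs either put all devices in one task or mix the two forms.)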
Node info:
$ nomad node status -verbose <node_id>
[ ... truncated ...]
Device Resource Utilization
nvidia/gpu/NVIDIA H100 NVL MIG 3g.47gb[MIG-076513e0-6b48-5e13-ab80-b8be7c9259a6] <none>
nvidia/gpu/NVIDIA H100 NVL MIG 3g.47gb[MIG-3e8b3066-e6de-5b42-b70a-c349e5eed28f] <none>
nvidia/gpu/NVIDIA H100 NVL MIG 3g.47gb[MIG-6715152b-4058-5804-bf60-7652876231e2] <none>
nvidia/gpu/NVIDIA H100 NVL MIG 3g.47gb[MIG-f2187ec1-76f0-5f4d-8caf-e3bc17901c1a] <none>
[ ... truncated ...]
Reproduction steps
This job fails with "missing devices":
job "gpu-test-1" {
namespace = "foo"
node_pool = "foo"
type = "service"
constraint {
attribute = "${attr.unique.hostname}"
value = "bar"
}
group "test" {
count = 1
task "task-one" {
driver = "docker"
config {
image = "ubuntu:latest"
args = ["sleep", "infinity"]
}
resources {
cpu = 500
memory = 1024
device "nvidia/gpu" {
count = 2
}
}
}
task "task-two" {
driver = "docker"
config {
image = "ubuntu:latest"
args = ["sleep", "infinity"]
}
resources {
cpu = 500
memory = 1024
device "nvidia/gpu" {
count = 2
}
}
}
}
}
$ nomad plan 1.hcl
+ Job: "gpu-test-1"
+ Task Group: "test" (1 create)
  + Task: "task-one" (forces create)
  + Task: "task-two" (forces create)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "test" (failed to place 1 allocation):
    * Class "bronze": 2 nodes excluded by filter
    * Constraint "${attr.unique.hostname} = bar": 1 nodes excluded by filter
    * Constraint "missing devices": 1 nodes excluded by filter

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 1.hcl

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
This job, however, succeeds:
job "gpu-test-2" {
namespace = "foo
node_pool = "foo"
type = "service"
constraint {
attribute = "${attr.unique.hostname}"
value = "bar"
}
group "test" {
count = 1
task "task-one" {
driver = "docker"
config {
image = "ubuntu:latest"
args = ["sleep", "infinity"]
}
resources {
cpu = 500
memory = 1024
device "nvidia/gpu" {
count = 4
}
}
}
task "task-two" {
driver = "docker"
config {
image = "ubuntu:latest"
args = ["sleep", "infinity"]
}
resources {
cpu = 500
memory = 1024
}
}
}
}
$ nomad plan 2.hcl
+ Job: "gpu-test-2"
+ Task Group: "test" (1 create)
  + Task: "task-one" (forces create)
  + Task: "task-two" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 2.hcl

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
Now it gets even stranger: if I target two "nvidia/gpu" devices in one task and two "nvidia/gpu/NVIDIA H100 NVL MIG 3g.47gb" devices in the second task, the job also succeeds.
job "gpu-test-3" {
namespace = "foo"
node_pool = "foo"
type = "service"
constraint {
attribute = "${attr.unique.hostname}"
value = "bar"
}
group "test" {
count = 1
task "task-one" {
driver = "docker"
config {
image = "ubuntu:latest"
args = ["sleep", "infinity"]
}
resources {
cpu = 500
memory = 1024
device "nvidia/gpu" {
count = 2
}
}
}
task "task-two" {
driver = "docker"
config {
image = "ubuntu:latest"
args = ["sleep", "infinity"]
}
resources {
cpu = 500
memory = 1024
device "nvidia/gpu/NVIDIA H100 NVL MIG 3g.47gb" {
count = 2
}
}
}
}
}
$ nomad plan 3.hcl
+ Job: "gpu-test-3"
+ Task Group: "test" (1 create)
  + Task: "task-one" (forces create)
  + Task: "task-two" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 3.hcl

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
Expected Result
Nomad should allow the job operator to spread the fingerprinted devices across multiple tasks, as long as the total requested count does not exceed the number of fingerprinted instances.
Actual Result
Nomad rejects certain jobs with a "missing devices" constraint.
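For what it's worth, the arithmetic behind the expected result fits in a few lines. This is a hypothetical Go sketch of the per-task accounting, not Nomad's actual allocator code; the pool size and request counts are taken from gpu-test-1 and the node fingerprint above:

package main

import "fmt"

func main() {
	// Hypothetical pool: the four MIG instances fingerprinted on the node.
	available := 4

	// Per-task device requests from gpu-test-1, in task order.
	tasks := []struct {
		name  string
		count int
	}{
		{"task-one", 2},
		{"task-two", 2},
	}

	for _, t := range tasks {
		if t.count > available {
			// Only this branch would justify a "missing devices" rejection.
			fmt.Printf("%s: missing devices (want %d, have %d)\n", t.name, t.count, available)
			return
		}
		available -= t.count
		fmt.Printf("%s: assigned %d instances, %d remaining\n", t.name, t.count, available)
	}
	// Expected: both tasks fit with 0 instances remaining, so the
	// "missing devices" rejection of gpu-test-1 is surprising.
}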