Describe the bug
Yesterday, when 1.0.71 was released, I grabbed it, downloaded Qwen3.6-27B-8bit, and ran it on my cluster. It worked very well all night with Qwen Code.
This morning I sent a single prompt and it immediately took down the entire cluster with the error `Changing queue pair to RTR failed with errno 96`.
Now the model fails to load at all. I restarted exo on both machines and it still won't load.
```
[ 2026-04-24 09:03:00.825 | INFO | exo.main:main:275 ] ========================================
[ 2026-04-24 09:03:00.826 | INFO | exo.main:main:276 ] Starting EXO | pid=32378
[ 2026-04-24 09:03:00.826 | INFO | exo.main:main:277 ] ========================================
[ 2026-04-24 09:03:00.826 | INFO | exo.main:main:278 ] EXO_LIBP2P_NAMESPACE: 1.0.71
[ 2026-04-24 09:03:00.829 | INFO | exo.main:create:69 ] Starting node 12D3KooWDfKzXmqiWvtw2YuCf8cHJUry1Vi6f7PFP7Hh869ZbCtz
[ 2026-04-24 09:03:00.856 | INFO | exo.shared.election:run:87 ] Starting Election
[ 2026-04-24 09:03:00.856 | INFO | exo.download.coordinator:run:134 ] Starting DownloadCoordinator
[ 2026-04-24 09:03:00.856 | INFO | exo.worker.main:run:101 ] Starting Worker
[ 2026-04-24 09:03:00.856 | INFO | exo.master.main:run:101 ] Starting Master
[ 2026-04-24 09:03:00.856 | INFO | exo.api.main:run:1766 ] Starting API
[ 2026-04-24 09:03:00.876 | INFO | exo.routing.router:_networking_subscribe:182 ] Subscribed to global_events
[ 2026-04-24 09:03:00.876 | INFO | exo.routing.router:_networking_subscribe:182 ] Subscribed to local_events
[ 2026-04-24 09:03:00.876 | INFO | exo.routing.router:_networking_subscribe:182 ] Subscribed to commands
[ 2026-04-24 09:03:00.878 | INFO | exo.main:_elect_loop:200 ] Node elected Master
[ 2026-04-24 09:03:00.878 | INFO | exo.api.main:unpause:293 ] Unpausing API
[ 2026-04-24 09:03:00.878 | INFO | exo.routing.router:_networking_subscribe:182 ] Subscribed to election_messages
[ 2026-04-24 09:03:00.878 | INFO | logging:handle:1681 ] Running on http://0.0.0.0:52415 (CTRL + C to quit)
[ 2026-04-24 09:03:00.878 | INFO | logging:handle:1681 ] Running on http://0.0.0.0:52415 (CTRL + C to quit)
[ 2026-04-24 09:03:00.879 | INFO | exo.routing.router:_networking_subscribe:182 ] Subscribed to connection_messages
[ 2026-04-24 09:03:00.880 | INFO | exo.routing.router:_networking_subscribe:182 ] Subscribed to download_commands
[ 2026-04-24 09:03:06.038 | INFO | exo.shared.election:_campaign:197 ] Waiting for other campaign to finish
[ 2026-04-24 09:03:09.040 | INFO | exo.main:_elect_loop:200 ] Node elected Master
[ 2026-04-24 09:03:09.041 | INFO | exo.api.main:unpause:293 ] Unpausing API
[ 2026-04-24 09:03:09.620 | INFO | exo.master.main:_command_processor:122 ] Executing command: RequestEventLog(command_id='54e9ed3a-7e54-475f-bbf7-6d4fcd92cb0e' since_idx=0)
[ 2026-04-24 09:03:31.265 | INFO | exo.master.main:_command_processor:122 ] Executing command: CreateInstance(command_id='7a94046f-445d-47ef-8513-0ac7db40bd0d' instance=MlxJacclInstance(instance_id='90257cdb-2d84-4bc3-bfc6-e812da010fd0', shard_assignments=ShardAssignments(model_id='mlx-community/Qwen3.6-27B-8bit', runner_to_shard={'62e6b023-e928-43f7-a20a-685deeea1357': TensorShardMetadata(model_card=ModelCard(model_id='mlx-community/Qwen3.6-27B-8bit', storage_size=Memory.from_bytes(29500938720), n_layers=64, hidden_size=5120, supports_tensor=True, num_key_value_heads=4, tasks=[<ModelTask.TextGeneration: 'TextGeneration'>], components=None, family='qwen', quantization='8bit', base_model='Qwen3.6 27B', capabilities=['text', 'thinking', 'thinking_toggle', 'vision'], context_length=262144, uses_cfg=False, trust_remote_code=True, is_custom=False, vision=VisionCardConfig(image_token_id=248056, model_type='qwen3_5', weights_repo='mlx-community/Qwen3.6-27B-8bit', image_token=None, processor_repo=None), sampling_defaults=SamplingDefaults(temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, repetition_penalty=1.0, presence_penalty=1.5, frequency_penalty=None, thinking=None, non_thinking=SamplingValues(temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, repetition_penalty=1.0, presence_penalty=1.5, frequency_penalty=None))), device_rank=0, world_size=2, immediate_exception=False, should_timeout=None, start_layer=0, end_layer=64, n_layers=64), '4027544f-ee09-4146-b59f-bc2883d764df': TensorShardMetadata(model_card=ModelCard(model_id='mlx-community/Qwen3.6-27B-8bit', storage_size=Memory.from_bytes(29500938720), n_layers=64, hidden_size=5120, supports_tensor=True, num_key_value_heads=4, tasks=[<ModelTask.TextGeneration: 'TextGeneration'>], components=None, family='qwen', quantization='8bit', base_model='Qwen3.6 27B', capabilities=['text', 'thinking', 'thinking_toggle', 'vision'], context_length=262144, uses_cfg=False, trust_remote_code=True, is_custom=False, vision=VisionCardConfig(image_token_id=248056, model_type='qwen3_5', weights_repo='mlx-community/Qwen3.6-27B-8bit', image_token=None, processor_repo=None), sampling_defaults=SamplingDefaults(temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, repetition_penalty=1.0, presence_penalty=1.5, frequency_penalty=None, thinking=None, non_thinking=SamplingValues(temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, repetition_penalty=1.0, presence_penalty=1.5, frequency_penalty=None))), device_rank=1, world_size=2, immediate_exception=False, should_timeout=None, start_layer=0, end_layer=64, n_layers=64)}, node_to_runner={'12D3KooWA6ArqMz963AzT974o1U7gBmcLyFhbA3aY2vWMXYBau8M': '62e6b023-e928-43f7-a20a-685deeea1357', '12D3KooWDfKzXmqiWvtw2YuCf8cHJUry1Vi6f7PFP7Hh869ZbCtz': '4027544f-ee09-4146-b59f-bc2883d764df'}), jaccl_devices=[[None, 'rdma_en3'], ['rdma_en3', None]], jaccl_coordinators={'12D3KooWA6ArqMz963AzT974o1U7gBmcLyFhbA3aY2vWMXYBau8M': '0.0.0.0:63657', '12D3KooWDfKzXmqiWvtw2YuCf8cHJUry1Vi6f7PFP7Hh869ZbCtz': '192.168.8.220:63657'}))
[ 2026-04-24 09:03:31.307 | INFO | exo.worker.main:plan_step:214 ] Worker plan: CreateRunner
[ 2026-04-24 09:03:31.719 | INFO | exo.worker.runner.bootstrap:entrypoint:34 ] Fast synch flag: 1
[ 2026-04-24 09:03:32.861 | INFO | exo.worker.runner.llm_inference.runner:init:93 ] hello from the runner
[ 2026-04-24 09:03:32.861 | INFO | exo.worker.runner.llm_inference.runner:init:113 ] runner created
[ 2026-04-24 09:03:32.931 | INFO | exo.worker.main:plan_step:214 ] Worker plan: ConnectToGroup
[ 2026-04-24 09:03:32.932 | INFO | exo.worker.runner.runner_supervisor:start_task:182 ] Starting task ConnectToGroup(task_id='146af62d-d7eb-4733-b4d0-8905c88d73ea' task_status=<TaskStatus.Pending: 'Pending'> instance_id='90257cdb-2d84-4bc3-bfc6-e812da010fd0')
[ 2026-04-24 09:03:32.932 | INFO | exo.worker.runner.llm_inference.runner:handle_first_task:149 ] runner connecting
[ 2026-04-24 09:03:32.933 | INFO | exo.worker.engines.mlx.utils_mlx:mlx_distributed_init:97 ] Starting initialization for rank 1
[ 2026-04-24 09:03:32.933 | INFO | exo.worker.engines.mlx.utils_mlx:mlx_distributed_init:135 ] rank 1 MLX_IBV_DEVICES: /var/folders/41/670wc_gs2y93f8rs0330cc780000gn/T/tmps978zbfw/hosts_90257cdb-2d84-4bc3-bfc6-e812da010fd0_1.json with devices: [[null, "rdma_en3"], ["rdma_en3", null]]
[ 2026-04-24 09:03:32.933 | INFO | exo.worker.engines.mlx.utils_mlx:mlx_distributed_init:138 ] rank 1 MLX_JACCL_COORDINATOR: 192.168.8.220:63657
[ 2026-04-24 09:03:32.966 | WARNING | exo.worker.runner.bootstrap:entrypoint:59 ] Runner 4027544f-ee09-4146-b59f-bc2883d764df crashed with critical exception [jaccl] Changing queue pair to RTR failed with errno 96
Traceback (most recent call last):
  File "main.py", line 38, in
  File "pyi_rth_multiprocessing.py", line 48, in _freeze_support
  File "multiprocessing/spawn.py", line 122, in spawn_main
  File "multiprocessing/spawn.py", line 135, in _main
  File "multiprocessing/process.py", line 313, in _bootstrap
  File "multiprocessing/process.py", line 108, in run
  File "exo/worker/runner/bootstrap.py", line 54, in entrypoint
  File "exo/worker/runner/llm_inference/runner.py", line 139, in main
  File "exo/worker/runner/llm_inference/runner.py", line 153, in handle_first_task
  File "exo/worker/engines/mlx/utils_mlx.py", line 159, in initialize_mlx
  File "exo/worker/engines/mlx/utils_mlx.py", line 142, in mlx_distributed_init
ValueError: [jaccl] Changing queue pair to RTR failed with errno 96
[ 2026-04-24 09:03:32.967 | INFO | exo.worker.runner.bootstrap:entrypoint:75 ] bye from the runner
[ 2026-04-24 09:03:32.968 | INFO | exo.worker.runner.runner_supervisor:_check_runner:255 ] Checking runner's status
[ 2026-04-24 09:03:32.968 | INFO | exo.worker.runner.runner_supervisor:_check_runner:257 ] Runner was found to be alive, attempting to join process
[ 2026-04-24 09:03:33.036 | INFO | exo.worker.main:plan_step:214 ] Worker plan: Shutdown
[ 2026-04-24 09:03:33.036 | INFO | exo.worker.runner.runner_supervisor:start_task:182 ] Starting task Shutdown(task_id='e772911c-f436-4862-a195-99b784fd1fe6' task_status=<TaskStatus.Pending: 'Pending'> instance_id='90257cdb-2d84-4bc3-bfc6-e812da010fd0' runner_id='4027544f-ee09-4146-b59f-bc2883d764df')
[ 2026-04-24 09:03:33.037 | WARNING | exo.worker.runner.runner_supervisor:start_task:190 ] Task Shutdown(task_id='e772911c-f436-4862-a195-99b784fd1fe6' task_status=<TaskStatus.Pending: 'Pending'> instance_id='90257cdb-2d84-4bc3-bfc6-e812da010fd0' runner_id='4027544f-ee09-4146-b59f-bc2883d764df') dropped, runner closed communication.
[ 2026-04-24 09:03:33.138 | INFO | exo.worker.main:plan_step:214 ] Worker plan: CreateRunner
[ 2026-04-24 09:03:33.252 | INFO | exo.worker.main:plan_step:214 ] Worker plan: Shutdown
[ 2026-04-24 09:03:33.252 | INFO | exo.worker.runner.runner_supervisor:start_task:182 ] Starting task Shutdown(task_id='3c898095-920c-46be-8d01-c6bd78446298' task_status=<TaskStatus.Pending: 'Pending'> instance_id='90257cdb-2d84-4bc3-bfc6-e812da010fd0' runner_id='4027544f-ee09-4146-b59f-bc2883d764df')
[ 2026-04-24 09:03:33.300 | INFO | exo.worker.runner.runner_supervisor:_check_runner:260 ] Runner exited with exit code 0
[ 2026-04-24 09:03:33.300 | INFO | exo.worker.runner.runner_supervisor:run:118 ] Runner supervisor shutting down
[ 2026-04-24 09:03:33.300 | INFO | exo.worker.runner.runner_supervisor:run:164 ] Runner process succesfully terminated
[ 2026-04-24 09:03:33.540 | INFO | exo.worker.runner.bootstrap:entrypoint:34 ] Fast synch flag: 1
```
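For triage: the crash happens on rank 1 while its queue pair is being moved to RTR (the ibverbs "ready to receive" state), right after `MLX_IBV_DEVICES` is written. One quick thing to rule out is a malformed device matrix. Below is a minimal sketch of such a check; `check_jaccl_devices` is a hypothetical helper of mine, not part of exo or MLX, and it assumes `devices[i][j]` names the RDMA interface rank i uses to reach rank j, with `None` on the diagonal. That layout is inferred from the log line, not from documentation. A clean result only rules out a malformed matrix, not a link or driver problem.

```python
import json


def check_jaccl_devices(devices):
    """Return a list of problems found in a JACCL device matrix.

    Assumed layout (inferred from the exo log, not documented):
    devices[i][j] is the RDMA interface rank i uses to reach rank j,
    and the diagonal entries are None.
    """
    problems = []
    n = len(devices)
    for i, row in enumerate(devices):
        # Every rank needs one entry per rank in the group.
        if len(row) != n:
            problems.append(f"row {i} has {len(row)} entries, expected {n}")
            continue
        # A rank should not list a device for talking to itself.
        if row[i] is not None:
            problems.append(f"rank {i} has a device on the diagonal: {row[i]!r}")
        # Every off-diagonal peer needs a named interface.
        for j, dev in enumerate(row):
            if i != j and not dev:
                problems.append(f"rank {i} has no device for rank {j}")
    return problems


# Matrix copied from the rank 1 MLX_IBV_DEVICES log line above.
logged = json.loads('[[null, "rdma_en3"], ["rdma_en3", null]]')
print(check_jaccl_devices(logged) or "device matrix looks well-formed")
```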
To Reproduce
Steps to reproduce the behavior:
- Use Qwen Code heavily (a long vibe-coding session) with the model running on RDMA + Tensor sharding
- Send a prompt
- Or reload the model
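Expected behavior
The model loads and responds to prompts, as it did the night before.
Actual behavior
The runner crashes during distributed initialization with `ValueError: [jaccl] Changing queue pair to RTR failed with errno 96`, and the model never finishes loading.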
Environment
- macOS Version: 26.4.1
- EXO Version: 1.0.71
- Hardware:
  - M4 Max, 128 GB
  - M3 Ultra, 96 GB
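Errno values are platform-specific, so the same number can mean different things on Linux and macOS. This small sketch, using only the Python standard library, decodes errno 96 on the machine where it is run; it should be run on the affected Macs. It only reports the generic libc meaning; the RDMA layer may use the code in a more specific sense, so treat the result as a hint rather than a diagnosis.

```python
import errno
import os
import platform


def describe_errno(code):
    """Return (symbolic name, message) for an errno on the current platform."""
    # errno.errorcode maps numbers to names such as "ENOENT";
    # fall back to "unknown" if this platform has no name for the code.
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


# 96 is the errno jaccl reported in the log above.
name, message = describe_errno(96)
print(f"{platform.system()} errno 96 = {name}: {message}")
```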