runpod-workers · TimPietruskyRunPod · Dec 3, 2025 · Dec 1, 2025 · Dec 1, 2025 · Dec 1, 2025
diff --git a/.github/workflows/dev.yml b/.github/workflows/dev.yml
@@ -44,10 +44,11 @@ jobs:
           echo "HUGGINGFACE_ACCESS_TOKEN=${{ secrets.HUGGINGFACE_ACCESS_TOKEN }}" >> $GITHUB_ENV
           echo "RELEASE_VERSION=${GITHUB_REF##refs/heads/}" | sed 's/\//-/g' >> $GITHUB_ENV
 
-      - name: Build and push the images to Docker Hub
+      - name: Build and push the base image to Docker Hub
         uses: docker/bake-action@v2
         with:
           push: true
+          targets: base
           set: |
             *.args.DOCKERHUB_REPO=${{ env.DOCKERHUB_REPO }}
             *.args.DOCKERHUB_IMG=${{ env.DOCKERHUB_IMG }}

diff --git a/Dockerfile b/Dockerfile
@@ -74,7 +74,7 @@ WORKDIR /
 RUN uv pip install runpod requests websocket-client
 
 # Add application code and scripts
-ADD src/start.sh handler.py test_input.json ./
+ADD src/start.sh src/network_volume.py handler.py test_input.json ./
 RUN chmod +x /start.sh
 
 # Add script to install custom nodes

diff --git a/docs/configuration.md b/docs/configuration.md
@@ -12,9 +12,10 @@ This document outlines the environment variables available for configuring the `
 
 ## Logging Configuration
 
-| Environment Variable | Description                                                                                                                                                      | Default |
-| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
-| `COMFY_LOG_LEVEL`    | Controls ComfyUI's internal logging verbosity. Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. Use `DEBUG` for troubleshooting, `INFO` for production. | `DEBUG` |
+| Environment Variable   | Description                                                                                                                                                      | Default |
+| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
+| `COMFY_LOG_LEVEL`      | Controls ComfyUI's internal logging verbosity. Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. Use `DEBUG` for troubleshooting, `INFO` for production. | `DEBUG` |
+| `NETWORK_VOLUME_DEBUG` | Enable detailed network volume diagnostics in worker logs. Useful for debugging model path issues. See [Network Volumes & Model Paths](network-volumes.md).      | `false` |
 
 ## Debugging Configuration
 
@@ -24,8 +25,6 @@ This document outlines the environment variables available for configuring the `
 | `WEBSOCKET_RECONNECT_DELAY_S`  | Delay in seconds between websocket reconnection attempts.                                                              | `3`     |
 | `WEBSOCKET_TRACE`              | Enable low-level websocket frame tracing for protocol debugging. Set to `true` only when diagnosing connection issues. | `false` |
 
-> [!TIP] > **For troubleshooting:** Set `COMFY_LOG_LEVEL=DEBUG` to get detailed logs when ComfyUI crashes or behaves unexpectedly. This helps identify the exact point of failure in your workflows.
-
 ## AWS S3 Upload Configuration
 
 Configure these variables **only** if you want the worker to upload generated images directly to an AWS S3 bucket. If these are not set, images will be returned as base64-encoded strings in the API response.

diff --git a/docs/customization.md b/docs/customization.md
@@ -2,7 +2,9 @@
 
 This guide covers methods for adding your own models, custom nodes, and static input files into a custom `worker-comfyui`.
 
-> [!TIP] > **Looking for the easiest way to deploy custom workflows?**
+> [!TIP]
+>
+> **Looking for the easiest way to deploy custom workflows?**
 >
 > [ComfyUI-to-API](https://comfy.getrunpod.io) automatically generates a custom Dockerfile and GitHub repository from your ComfyUI workflow, eliminating the manual setup described below. See the [ComfyUI-to-API Documentation](https://docs.runpod.io/community-solutions/comfyui-to-api/overview) for details.
 >
@@ -90,20 +92,21 @@ Using a Network Volume is primarily useful if you want to manage **models** sepa
 1.  **Create a Network Volume**:
     - Follow the [RunPod Network Volumes guide](https://docs.runpod.io/pods/storage/create-network-volumes) to create a volume in the same region as your endpoint.
 2.  **Populate the Volume with Models**:
-    - Use one of the methods described in the RunPod guide (e.g., temporary Pod + `wget`, direct upload) to place your model files into the correct ComfyUI directory structure **within the volume**. The root of the volume corresponds to `/workspace` inside the container.
+    - Use one of the methods described in the RunPod guide (e.g., temporary Pod + `wget`, direct upload, or the S3-compatible API) to place your model files into the correct ComfyUI directory structure **within the volume**.
+    - For **serverless endpoints**, the network volume is mounted at `/runpod-volume`, and ComfyUI expects models under `/runpod-volume/models/...`. See [Network Volumes & Model Paths](network-volumes.md) for the exact structure and debugging tips.
       ```bash
-      # Example structure inside the Network Volume:
-      # /models/checkpoints/your_model.safetensors
-      # /models/loras/your_lora.pt
-      # /models/vae/your_vae.safetensors
+      # Example structure inside the Network Volume (serverless worker view):
+      # /runpod-volume/models/checkpoints/your_model.safetensors
+      # /runpod-volume/models/loras/your_lora.pt
+      # /runpod-volume/models/vae/your_vae.safetensors
       ```
-    - **Important:** Ensure models are placed in the correct subdirectories (e.g., checkpoints in `models/checkpoints`, LoRAs in `models/loras`).
+    - **Important:** Ensure models are placed in the correct subdirectories (e.g., checkpoints in `models/checkpoints`, LoRAs in `models/loras`). If models are not detected, enable `NETWORK_VOLUME_DEBUG` as described in [Network Volumes & Model Paths](network-volumes.md).
 3.  **Configure Your Endpoint**:
     - Use the Network Volume in your endpoint configuration:
       - Either create a new endpoint or update an existing one (see [Deployment Guide](deployment.md)).
       - In the endpoint configuration, under `Advanced > Select Network Volume`, select your Network Volume.
 
-**Note:**
-
-- When a Network Volume is correctly attached, ComfyUI running inside the worker container will automatically detect and load models from the standard directories (`/workspace/models/...`) within that volume.
-- This method is **not suitable for installing custom nodes**; use the Custom Dockerfile method for that.
+> [!NOTE]
+>
+> - When a Network Volume is correctly attached to a serverless endpoint, ComfyUI running inside the worker container automatically detects and loads models from the standard directories (`/runpod-volume/models/...`) on that volume. For directory mapping details and troubleshooting, see [Network Volumes & Model Paths](network-volumes.md).
+> - This method is **not suitable for installing custom nodes**; use the Custom Dockerfile method for that.
diff --git a/docs/deployment.md b/docs/deployment.md
@@ -16,7 +16,7 @@ This is the simplest method if the official images meet your needs.
   - Container Registry Credentials: Leave as default (images are public).
   - Container Disk: Adjust based on the chosen image tag, see [GPU Recommendations](#gpu-recommendations).
   - (optional) Environment Variables: Configure S3 or other settings (see [Configuration Guide](configuration.md)).
-    - Note: If you don't configure S3, images are returned as base64. For persistent storage across jobs without S3, consider using a [Network Volume](customization.md#method-2-network-volume-alternative-for-models).
+    - Note: If you don't configure S3, images are returned as base64. For persistent storage across jobs without S3, consider using a [Network Volume](customization.md#method-2-network-volume-alternative-for-models). If models on your network volume are not being detected, see [Network Volumes & Model Paths](network-volumes.md) for troubleshooting steps.
 - Click on `Save Template`
 
 ### Create your endpoint
@@ -32,7 +32,7 @@ This is the simplest method if the official images meet your needs.
   - Idle Timeout: `5` (Default is usually fine, adjust if needed).
   - Flash Boot: `enabled` (Recommended for faster worker startup).
   - Select Template: `worker-comfyui` (or the name you gave your template).
-  - (optional) Advanced: If you are using a Network Volume, select it under `Select Network Volume`. See the [Customization Guide](customization.md#method-2-network-volume-alternative-for-models).
+  - (optional) Advanced: If you are using a Network Volume, select it under `Select Network Volume`. See the [Customization Guide](customization.md#method-2-network-volume-alternative-for-models). For detailed model path layout and debugging tips, see [Network Volumes & Model Paths](network-volumes.md).
 
 - Click `deploy`
 - Your endpoint will be created. You can click on it to view the dashboard and find its ID.

diff --git a/docs/network-volumes.md b/docs/network-volumes.md
@@ -0,0 +1,147 @@
+# Network Volumes & Model Paths
+
+This document explains how to use RunPod **Network Volumes** with `worker-comfyui`, how model paths are resolved inside the container, and how to debug cases where models are not detected.
+
+> **Scope**
+>
+> These instructions apply to **serverless endpoints** using this worker. Pods mount network volumes at `/workspace` by default, while serverless workers see them at `/runpod-volume`.
+
+## Directory Mapping
+
+For **serverless endpoints**:
+
+- Network volume root is mounted at: `/runpod-volume`
+- ComfyUI models are expected under: `/runpod-volume/models/...`
+
+For **Pods**:
+
+- Network volume root is mounted at: `/workspace`
+- Equivalent ComfyUI model path: `/workspace/models/...`
+
+The same file is addressed differently depending on how you access the volume, including through the S3-compatible API:
+
+- Serverless: `/runpod-volume/my-folder/file.txt`
+- Pod: `/workspace/my-folder/file.txt`
+- S3 API: `s3://<NETWORK_VOLUME_ID>/my-folder/file.txt`
+
+## Expected Directory Structure
+
+Models must be placed in the following structure on your network volume:
+
+```text
+/runpod-volume/
+└── models/
+    ├── checkpoints/      # Stable Diffusion checkpoints (.safetensors, .ckpt)
+    ├── loras/            # LoRA files (.safetensors, .pt)
+    ├── vae/              # VAE models (.safetensors, .pt)
+    ├── clip/             # CLIP models (.safetensors, .pt)
+    ├── clip_vision/      # CLIP Vision models
+    ├── controlnet/       # ControlNet models (.safetensors, .pt)
+    ├── embeddings/       # Textual inversion embeddings (.safetensors, .pt)
+    ├── upscale_models/   # Upscaling models (.safetensors, .pt)
+    ├── unet/             # UNet models
+    └── configs/          # Model configs (.yaml, .json)
+```
+
+> [!NOTE]
+>
+> Only create the subdirectories you actually need; empty or missing folders are fine.
+
+## Supported File Extensions
+
+ComfyUI only recognizes files with specific extensions when scanning model directories.
+
+| Model Type     | Supported Extensions                           |
+| -------------- | ---------------------------------------------- |
+| Checkpoints    | `.safetensors`, `.ckpt`, `.pt`, `.pth`, `.bin` |
+| LoRAs          | `.safetensors`, `.pt`                          |
+| VAE            | `.safetensors`, `.pt`, `.bin`                  |
+| CLIP           | `.safetensors`, `.pt`, `.bin`                  |
+| ControlNet     | `.safetensors`, `.pt`, `.pth`, `.bin`          |
+| Embeddings     | `.safetensors`, `.pt`, `.bin`                  |
+| Upscale Models | `.safetensors`, `.pt`, `.pth`                  |
+
+Files with other extensions (for example `.txt`, `.zip`) are **ignored** by ComfyUI’s model discovery.
+
+## Common Issues
+
+- **Wrong root directory**
+  - Models placed directly under `/runpod-volume/checkpoints/...` instead of `/runpod-volume/models/checkpoints/...`.
+- **Incorrect extensions**
+  - Files named without one of the supported extensions are skipped.
+- **Empty directories**
+  - No actual model files present in `models/checkpoints` (or other folders).
+- **Volume not attached**
+  - Endpoint created without selecting a network volume under **Advanced → Select Network Volume**.
+
+If any of the above is true, ComfyUI will silently fail to discover models from the network volume.
+
+## Debugging with `NETWORK_VOLUME_DEBUG`
+
+The worker exposes an opt-in debug mode controlled by the `NETWORK_VOLUME_DEBUG` environment variable.
+
+### When to Use
+
+Enable this when:
+
+- Models on your network volume are not appearing in ComfyUI
+- You suspect the directory structure or file extensions are wrong
+- You want to quickly verify what the worker can actually see on `/runpod-volume`
+
+### How to Enable
+
+1. Go to your serverless **Endpoint → Manage → Edit**.
+2. Under **Environment Variables**, add:
+
+   - `NETWORK_VOLUME_DEBUG=true`
+
+3. Save and wait for workers to restart (or scale to zero and back up).
+4. Send any request to your endpoint (even a minimal one) to trigger the diagnostics.
+
+### Reading the Diagnostics
+
+When enabled, each request prints a detailed report to the worker logs, for example:
+
+```text
+======================================================================
+NETWORK VOLUME DIAGNOSTICS (NETWORK_VOLUME_DEBUG=true)
+======================================================================
+
+[1] Checking extra_model_paths.yaml configuration...
+    ✓ FOUND: /comfyui/extra_model_paths.yaml
+
+[2] Checking network volume mount at /runpod-volume...
+    ✓ MOUNTED: /runpod-volume
+
+[3] Checking directory structure...
+    ✓ FOUND: /runpod-volume/models
+
+[4] Scanning model directories...
+
+    checkpoints/:
+      - my-model.safetensors (6.5 GB)
+
+    loras/:
+      - style-lora.safetensors (144.2 MB)
+
+[5] Summary
+    ✓ Models found on network volume!
+======================================================================
+```
+
+If there is a problem, the diagnostics will instead highlight it, for example:
+
+- Missing `models/` directory
+- No valid model files in any subdirectory
+- Files present but ignored due to wrong extensions
+
+### Disabling Debug Mode
+
+Once you have resolved your issue, disable diagnostics to keep logs clean:
+
+- Remove the `NETWORK_VOLUME_DEBUG` environment variable, **or**
+- Set `NETWORK_VOLUME_DEBUG=false`
+
+This returns the worker to normal behavior without extra log noise.
+
+
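The directory layout and extension table in `docs/network-volumes.md` can be checked by hand with a short script. The following is a minimal sketch, not the worker's own diagnostics: `scan_models` and `SUPPORTED_EXTENSIONS` are hypothetical names, the extension sets are copied from the table above, and the case-insensitive suffix match is an assumption.

```python
from pathlib import Path

# Extensions per model type, copied from the "Supported File Extensions" table.
SUPPORTED_EXTENSIONS = {
    "checkpoints": {".safetensors", ".ckpt", ".pt", ".pth", ".bin"},
    "loras": {".safetensors", ".pt"},
    "vae": {".safetensors", ".pt", ".bin"},
    "clip": {".safetensors", ".pt", ".bin"},
    "controlnet": {".safetensors", ".pt", ".pth", ".bin"},
    "embeddings": {".safetensors", ".pt", ".bin"},
    "upscale_models": {".safetensors", ".pt", ".pth"},
}


def scan_models(volume_root="/runpod-volume"):
    """Return {model_type: {"found": [...], "ignored": [...]}} for one volume."""
    models_dir = Path(volume_root) / "models"
    report = {}
    if not models_dir.is_dir():
        # Missing models/ directory: nothing can be discovered.
        return report
    for model_type, extensions in SUPPORTED_EXTENSIONS.items():
        type_dir = models_dir / model_type
        if not type_dir.is_dir():
            continue  # Missing subdirectories are fine.
        found, ignored = [], []
        for path in sorted(type_dir.iterdir()):
            if not path.is_file():
                continue
            # Assumption: extensions are compared case-insensitively.
            target = found if path.suffix.lower() in extensions else ignored
            target.append(path.name)
        report[model_type] = {"found": found, "ignored": ignored}
    return report


if __name__ == "__main__":
    for model_type, result in scan_models().items():
        print(f"{model_type}/: found={result['found']} ignored={result['ignored']}")
```

Run on a temporary Pod, the same function can be pointed at `/workspace` instead, since Pods mount the volume there.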
diff --git a/handler.py b/handler.py
@@ -13,6 +13,18 @@
 import tempfile
 import socket
 import traceback
+import logging
+
+from network_volume import (
+    is_network_volume_debug_enabled,
+    run_network_volume_diagnostics,
+)
+
+# ---------------------------------------------------------------------------
+# Logging setup
+# ---------------------------------------------------------------------------
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
 
 # Time to wait between API check attempts in milliseconds
 COMFY_API_AVAILABLE_INTERVAL_MS = 50
@@ -502,6 +514,12 @@ def handler(job):
     Returns:
         dict: A dictionary containing either an error message or a success status with generated images.
     """
+    # ---------------------------------------------------------------------------
+    # Network Volume Diagnostics (opt-in via NETWORK_VOLUME_DEBUG=true)
+    # ---------------------------------------------------------------------------
+    if is_network_volume_debug_enabled():
+        run_network_volume_diagnostics()
+
     job_input = job["input"]
     job_id = job["id"]
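The `handler.py` hunks import `is_network_volume_debug_enabled` from `network_volume`, but `src/network_volume.py` itself is not shown in this diff. As a rough illustration of how such an opt-in flag is commonly read, here is a minimal sketch. The function name `is_debug_flag_enabled` and the accepted truthy values other than `true` are assumptions; the docs above only mention `true` and `false`.

```python
import os

# Assumption: values treated as "enabled". Only "true" is documented.
TRUTHY_VALUES = {"1", "true", "yes", "on"}


def is_debug_flag_enabled(env=None):
    """Return True when NETWORK_VOLUME_DEBUG is set to a truthy value."""
    # Accept an explicit mapping so the check can be tested without
    # touching the real process environment.
    env = os.environ if env is None else env
    value = env.get("NETWORK_VOLUME_DEBUG", "false")
    return value.strip().lower() in TRUTHY_VALUES


if __name__ == "__main__":
    print(is_debug_flag_enabled({"NETWORK_VOLUME_DEBUG": "true"}))
    print(is_debug_flag_enabled({}))
```

Defaulting to `"false"` when the variable is unset matches the default listed in `docs/configuration.md`, so removing the variable and setting it to `false` behave the same way.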