3 changes: 2 additions & 1 deletion .github/workflows/dev.yml
@@ -44,10 +44,11 @@ jobs:
echo "HUGGINGFACE_ACCESS_TOKEN=${{ secrets.HUGGINGFACE_ACCESS_TOKEN }}" >> $GITHUB_ENV
echo "RELEASE_VERSION=${GITHUB_REF##refs/heads/}" | sed 's/\//-/g' >> $GITHUB_ENV

- name: Build and push the images to Docker Hub
- name: Build and push the base image to Docker Hub
uses: docker/bake-action@v2
with:
push: true
targets: base
set: |
*.args.DOCKERHUB_REPO=${{ env.DOCKERHUB_REPO }}
*.args.DOCKERHUB_IMG=${{ env.DOCKERHUB_IMG }}
2 changes: 1 addition & 1 deletion Dockerfile
@@ -74,7 +74,7 @@ WORKDIR /
RUN uv pip install runpod requests websocket-client

# Add application code and scripts
ADD src/start.sh handler.py test_input.json ./
ADD src/start.sh src/network_volume.py handler.py test_input.json ./
RUN chmod +x /start.sh

# Add script to install custom nodes
9 changes: 4 additions & 5 deletions docs/configuration.md
@@ -12,9 +12,10 @@ This document outlines the environment variables available for configuring the `

## Logging Configuration

| Environment Variable | Description | Default |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
| `COMFY_LOG_LEVEL` | Controls ComfyUI's internal logging verbosity. Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. Use `DEBUG` for troubleshooting, `INFO` for production. | `DEBUG` |
| Environment Variable | Description | Default |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
| `COMFY_LOG_LEVEL` | Controls ComfyUI's internal logging verbosity. Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. Use `DEBUG` for troubleshooting, `INFO` for production. | `DEBUG` |
| `NETWORK_VOLUME_DEBUG` | Enable detailed network volume diagnostics in worker logs. Useful for debugging model path issues. See [Network Volumes & Model Paths](network-volumes.md). | `false` |

## Debugging Configuration

@@ -24,8 +25,6 @@ This document outlines the environment variables available for configuring the `
| `WEBSOCKET_RECONNECT_DELAY_S` | Delay in seconds between websocket reconnection attempts. | `3` |
| `WEBSOCKET_TRACE` | Enable low-level websocket frame tracing for protocol debugging. Set to `true` only when diagnosing connection issues. | `false` |

> [!TIP] > **For troubleshooting:** Set `COMFY_LOG_LEVEL=DEBUG` to get detailed logs when ComfyUI crashes or behaves unexpectedly. This helps identify the exact point of failure in your workflows.

## AWS S3 Upload Configuration

Configure these variables **only** if you want the worker to upload generated images directly to an AWS S3 bucket. If these are not set, images will be returned as base64-encoded strings in the API response.
25 changes: 14 additions & 11 deletions docs/customization.md
@@ -2,7 +2,9 @@

This guide covers methods for adding your own models, custom nodes, and static input files into a custom `worker-comfyui`.

> [!TIP] > **Looking for the easiest way to deploy custom workflows?**
> [!TIP]
>
> **Looking for the easiest way to deploy custom workflows?**
>
> [ComfyUI-to-API](https://comfy.getrunpod.io) automatically generates a custom Dockerfile and GitHub repository from your ComfyUI workflow, eliminating the manual setup described below. See the [ComfyUI-to-API Documentation](https://docs.runpod.io/community-solutions/comfyui-to-api/overview) for details.
>
@@ -90,20 +92,21 @@ Using a Network Volume is primarily useful if you want to manage **models** sepa
1. **Create a Network Volume**:
- Follow the [RunPod Network Volumes guide](https://docs.runpod.io/pods/storage/create-network-volumes) to create a volume in the same region as your endpoint.
2. **Populate the Volume with Models**:
- Use one of the methods described in the RunPod guide (e.g., temporary Pod + `wget`, direct upload) to place your model files into the correct ComfyUI directory structure **within the volume**. The root of the volume corresponds to `/workspace` inside the container.
- Use one of the methods described in the RunPod guide (e.g., temporary Pod + `wget`, direct upload, or the S3-compatible API) to place your model files into the correct ComfyUI directory structure **within the volume**.
- For **serverless endpoints**, the network volume is mounted at `/runpod-volume`, and ComfyUI expects models under `/runpod-volume/models/...`. See [Network Volumes & Model Paths](network-volumes.md) for the exact structure and debugging tips.
```bash
# Example structure inside the Network Volume:
# /models/checkpoints/your_model.safetensors
# /models/loras/your_lora.pt
# /models/vae/your_vae.safetensors
# Example structure inside the Network Volume (serverless worker view):
# /runpod-volume/models/checkpoints/your_model.safetensors
# /runpod-volume/models/loras/your_lora.pt
# /runpod-volume/models/vae/your_vae.safetensors
```
- **Important:** Ensure models are placed in the correct subdirectories (e.g., checkpoints in `models/checkpoints`, LoRAs in `models/loras`).
- **Important:** Ensure models are placed in the correct subdirectories (e.g., checkpoints in `models/checkpoints`, LoRAs in `models/loras`). If models are not detected, enable `NETWORK_VOLUME_DEBUG` as described in [Network Volumes & Model Paths](network-volumes.md).
3. **Configure Your Endpoint**:
- Use the Network Volume in your endpoint configuration:
- Either create a new endpoint or update an existing one (see [Deployment Guide](deployment.md)).
- In the endpoint configuration, under `Advanced > Select Network Volume`, select your Network Volume.
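Step 2 can be scripted from a temporary Pod. The following is a minimal sketch, assuming the volume is mounted at `/workspace` on that Pod (the same files then appear under `/runpod-volume` to the serverless worker); `install_model` and the `MODEL_TYPE_DIRS` mapping are hypothetical names used only for illustration:

```python
import os
import shutil

# Hypothetical mapping from a model type to its ComfyUI folder under models/.
# Extend it with the other folders your workflow needs.
MODEL_TYPE_DIRS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
}


def install_model(src_file: str, model_type: str, volume_root: str = "/workspace") -> str:
    """Copy src_file into <volume_root>/models/<folder>/ and return the new path.

    Raises KeyError for a model_type that is not in MODEL_TYPE_DIRS.
    """
    dest_dir = os.path.join(volume_root, "models", MODEL_TYPE_DIRS[model_type])
    os.makedirs(dest_dir, exist_ok=True)  # create the folder on first use
    dest = os.path.join(dest_dir, os.path.basename(src_file))
    shutil.copy2(src_file, dest)  # copy contents and file metadata
    return dest
```

For example, `install_model("/tmp/your_lora.pt", "lora")` on the Pod would place the file at `/workspace/models/loras/your_lora.pt`, which the worker reads as `/runpod-volume/models/loras/your_lora.pt`.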

**Note:**

- When a Network Volume is correctly attached, ComfyUI running inside the worker container will automatically detect and load models from the standard directories (`/workspace/models/...`) within that volume.
- This method is **not suitable for installing custom nodes**; use the Custom Dockerfile method for that.
> [!NOTE]
>
> - When a Network Volume is correctly attached, ComfyUI running inside the worker container will automatically detect and load models from the standard directories (`/runpod-volume/models/...`) within that volume (for serverless workers). For directory mapping details and troubleshooting, see [Network Volumes & Model Paths](network-volumes.md).
> - This method is **not suitable for installing custom nodes**; use the Custom Dockerfile method for that.
4 changes: 2 additions & 2 deletions docs/deployment.md
@@ -16,7 +16,7 @@ This is the simplest method if the official images meet your needs.
- Container Registry Credentials: Leave as default (images are public).
- Container Disk: Adjust based on the chosen image tag, see [GPU Recommendations](#gpu-recommendations).
- (optional) Environment Variables: Configure S3 or other settings (see [Configuration Guide](configuration.md)).
- Note: If you don't configure S3, images are returned as base64. For persistent storage across jobs without S3, consider using a [Network Volume](customization.md#method-2-network-volume-alternative-for-models).
- Note: If you don't configure S3, images are returned as base64. For persistent storage across jobs without S3, consider using a [Network Volume](customization.md#method-2-network-volume-alternative-for-models). If models on your network volume are not being detected, see [Network Volumes & Model Paths](network-volumes.md) for troubleshooting steps.
- Click on `Save Template`

### Create your endpoint
@@ -32,7 +32,7 @@ This is the simplest method if the official images meet your needs.
- Idle Timeout: `5` (Default is usually fine, adjust if needed).
- Flash Boot: `enabled` (Recommended for faster worker startup).
- Select Template: `worker-comfyui` (or the name you gave your template).
- (optional) Advanced: If you are using a Network Volume, select it under `Select Network Volume`. See the [Customization Guide](customization.md#method-2-network-volume-alternative-for-models).
- (optional) Advanced: If you are using a Network Volume, select it under `Select Network Volume`. See the [Customization Guide](customization.md#method-2-network-volume-alternative-for-models). For detailed model path layout and debugging tips, see [Network Volumes & Model Paths](network-volumes.md).

- Click `deploy`
- Your endpoint will be created. You can click on it to view the dashboard and find its ID.
147 changes: 147 additions & 0 deletions docs/network-volumes.md
@@ -0,0 +1,147 @@
# Network Volumes & Model Paths

This document explains how to use RunPod **Network Volumes** with `worker-comfyui`, how model paths are resolved inside the container, and how to debug cases where models are not detected.

> **Scope**
>
> These instructions apply to **serverless endpoints** using this worker. Pods mount network volumes at `/workspace` by default, while serverless workers see them at `/runpod-volume`.

## Directory Mapping

For **serverless endpoints**:

- Network volume root is mounted at: `/runpod-volume`
- ComfyUI models are expected under: `/runpod-volume/models/...`

For **Pods**:

- Network volume root is mounted at: `/workspace`
- Equivalent ComfyUI model path: `/workspace/models/...`

If you use the S3-compatible API, the same file is addressed as follows:

- Serverless: `/runpod-volume/my-folder/file.txt`
- Pod: `/workspace/my-folder/file.txt`
- S3 API: `s3://<NETWORK_VOLUME_ID>/my-folder/file.txt`
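Because each view is just a different prefix in front of the same relative path, converting between them is a string operation. A minimal Python sketch, assuming only the mount points listed above; `volume_paths` is a hypothetical helper (not part of the worker) and `<NETWORK_VOLUME_ID>` is a placeholder:

```python
from pathlib import PurePosixPath

# Mount points from the mapping above.
SERVERLESS_ROOT = PurePosixPath("/runpod-volume")
POD_ROOT = PurePosixPath("/workspace")


def volume_paths(relative_path: str, volume_id: str) -> dict[str, str]:
    """Return the serverless, Pod, and S3 forms of one file on a network volume.

    relative_path is relative to the volume root,
    e.g. "models/checkpoints/my-model.safetensors".
    """
    rel = PurePosixPath(relative_path.lstrip("/"))  # tolerate a leading slash
    return {
        "serverless": str(SERVERLESS_ROOT / rel),
        "pod": str(POD_ROOT / rel),
        "s3": f"s3://{volume_id}/{rel}",
    }


for kind, path in volume_paths("my-folder/file.txt", "<NETWORK_VOLUME_ID>").items():
    print(f"{kind}: {path}")
```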

## Expected Directory Structure

Models must be placed in the following structure on your network volume:

```text
/runpod-volume/
└── models/
├── checkpoints/ # Stable Diffusion checkpoints (.safetensors, .ckpt)
├── loras/ # LoRA files (.safetensors, .pt)
├── vae/ # VAE models (.safetensors, .pt)
├── clip/ # CLIP models (.safetensors, .pt)
├── clip_vision/ # CLIP Vision models
├── controlnet/ # ControlNet models (.safetensors, .pt)
├── embeddings/ # Textual inversion embeddings (.safetensors, .pt)
├── upscale_models/ # Upscaling models (.safetensors, .pt)
├── unet/ # UNet models
└── configs/ # Model configs (.yaml, .json)
```

> **Note**
>
> Only create the subdirectories you actually need; empty or missing folders are fine.
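To pre-create this layout, for example from a Pod where the volume root is `/workspace`, a few lines of Python are enough. This is a sketch only: `create_model_dirs` is a hypothetical helper and the folder list simply mirrors the tree above:

```python
import os

# Folder names from the layout above; pass a subset to create only what you need.
MODEL_SUBDIRS = (
    "checkpoints", "loras", "vae", "clip", "clip_vision",
    "controlnet", "embeddings", "upscale_models", "unet", "configs",
)


def create_model_dirs(volume_root: str, subdirs=MODEL_SUBDIRS) -> list[str]:
    """Create <volume_root>/models/<name> for each name; safe to run repeatedly."""
    created = []
    for name in subdirs:
        path = os.path.join(volume_root, "models", name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

On a Pod, `create_model_dirs("/workspace", ["checkpoints", "loras"])` would create only the two folders most workflows need.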

## Supported File Extensions

ComfyUI only recognizes files with specific extensions when scanning model directories.

| Model Type | Supported Extensions |
| -------------- | ------------------------------------------- |
| Checkpoints | `.safetensors`, `.ckpt`, `.pt`, `.pth`, `.bin` |
| LoRAs | `.safetensors`, `.pt` |
| VAE | `.safetensors`, `.pt`, `.bin` |
| CLIP | `.safetensors`, `.pt`, `.bin` |
| ControlNet | `.safetensors`, `.pt`, `.pth`, `.bin` |
| Embeddings | `.safetensors`, `.pt`, `.bin` |
| Upscale Models | `.safetensors`, `.pt`, `.pth` |

Files with other extensions (for example `.txt`, `.zip`) are **ignored** by ComfyUI’s model discovery.
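The table can be applied as a plain suffix check. This sketch uses the per-type extension sets exactly as documented in the table; it illustrates the rule and is not ComfyUI's own discovery code. `split_by_extension` is a hypothetical name, and it raises `KeyError` for folder types the table does not list:

```python
from pathlib import Path

# Extension sets copied from the table above.
SUPPORTED_EXTENSIONS = {
    "checkpoints": {".safetensors", ".ckpt", ".pt", ".pth", ".bin"},
    "loras": {".safetensors", ".pt"},
    "vae": {".safetensors", ".pt", ".bin"},
    "clip": {".safetensors", ".pt", ".bin"},
    "controlnet": {".safetensors", ".pt", ".pth", ".bin"},
    "embeddings": {".safetensors", ".pt", ".bin"},
    "upscale_models": {".safetensors", ".pt", ".pth"},
}


def split_by_extension(model_type: str, filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split filenames into (recognized, ignored) for one model folder."""
    allowed = SUPPORTED_EXTENSIONS[model_type]
    recognized, ignored = [], []
    for name in filenames:
        # This sketch compares the lowercased suffix, e.g. "A.CKPT" -> ".ckpt".
        target = recognized if Path(name).suffix.lower() in allowed else ignored
        target.append(name)
    return recognized, ignored


ok, skipped = split_by_extension("loras", ["style.safetensors", "notes.txt", "old.zip"])
print(ok)       # ['style.safetensors']
print(skipped)  # ['notes.txt', 'old.zip']
```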

## Common Issues

- **Wrong root directory**
- Models placed directly under `/runpod-volume/checkpoints/...` instead of `/runpod-volume/models/checkpoints/...`.
- **Incorrect extensions**
  - Files without one of the supported extensions are skipped.
- **Empty directories**
  - The folders exist, but `models/checkpoints` (or another model folder) contains no model files.
- **Volume not attached**
- Endpoint created without selecting a network volume under **Advanced → Select Network Volume**.

If any of the above applies, ComfyUI silently skips the affected models on the network volume.
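Several of these issues can be checked with a few filesystem calls. A minimal sketch, assuming the serverless mount point `/runpod-volume`; `find_layout_problems` is hypothetical and does not replace the built-in diagnostics described in the next section (extension problems are not checked here):

```python
import os

# Folder names from the expected layout, used to spot folders placed one level too high.
MODEL_FOLDERS = {
    "checkpoints", "loras", "vae", "clip", "clip_vision",
    "controlnet", "embeddings", "upscale_models", "unet", "configs",
}


def find_layout_problems(volume_root: str = "/runpod-volume") -> list[str]:
    """Return human-readable descriptions of common layout mistakes."""
    if not os.path.isdir(volume_root):
        # Volume not attached (or mounted elsewhere).
        return [f"{volume_root} is not mounted (is a network volume attached?)"]

    problems = []
    models_dir = os.path.join(volume_root, "models")
    if not os.path.isdir(models_dir):
        problems.append(f"missing {models_dir}")

    # Wrong root: /runpod-volume/checkpoints instead of /runpod-volume/models/checkpoints.
    for name in sorted(MODEL_FOLDERS):
        if os.path.isdir(os.path.join(volume_root, name)):
            problems.append(f"{name}/ is at the volume root; move it under models/")
    return problems
```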

## Debugging with `NETWORK_VOLUME_DEBUG`

The worker exposes an opt‑in debug mode controlled via the `NETWORK_VOLUME_DEBUG` environment variable.
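The parsing code in `src/network_volume.py` is not shown in this PR page. The following sketch assumes that only the value `true` (case-insensitive, surrounding whitespace ignored) enables diagnostics, which matches the `true`/`false` values used on this page; `debug_flag_from_env` is a hypothetical stand-in for the worker's `is_network_volume_debug_enabled()`:

```python
import os
from typing import Mapping, Optional


def debug_flag_from_env(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when NETWORK_VOLUME_DEBUG is "true" (any case).

    Unset, empty, or "false" all leave diagnostics disabled.
    """
    env = os.environ if env is None else env
    return env.get("NETWORK_VOLUME_DEBUG", "").strip().lower() == "true"


print(debug_flag_from_env({"NETWORK_VOLUME_DEBUG": "true"}))  # True
print(debug_flag_from_env({}))  # False
```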

### When to Use

Enable this when:

- Models on your network volume are not appearing in ComfyUI
- You suspect the directory structure or file extensions are wrong
- You want to quickly verify what the worker can actually see on `/runpod-volume`

### How to Enable

1. Go to your serverless **Endpoint → Manage → Edit**.
2. Under **Environment Variables**, add:

- `NETWORK_VOLUME_DEBUG=true`

3. Save and wait for workers to restart (or scale to zero and back up).
4. Send any request to your endpoint (even a minimal one) to trigger the diagnostics.

### Reading the Diagnostics

When enabled, each request prints a detailed report to the worker logs, for example:

```text
======================================================================
NETWORK VOLUME DIAGNOSTICS (NETWORK_VOLUME_DEBUG=true)
======================================================================

[1] Checking extra_model_paths.yaml configuration...
✓ FOUND: /comfyui/extra_model_paths.yaml

[2] Checking network volume mount at /runpod-volume...
✓ MOUNTED: /runpod-volume

[3] Checking directory structure...
✓ FOUND: /runpod-volume/models

[4] Scanning model directories...

checkpoints/:
- my-model.safetensors (6.5 GB)

loras/:
- style-lora.safetensors (144.2 MB)

[5] Summary
✓ Models found on network volume!
======================================================================
```

If there is a problem, the diagnostics will instead highlight it, for example:

- Missing `models/` directory
- No valid model files in any subdirectory
- Files present but ignored due to wrong extensions
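The model listing in step [4] amounts to walking each subfolder of `models/` and printing file sizes. A rough sketch, assuming 1024-based size units; `scan_models`, `format_size`, and `print_report` are hypothetical, the output only approximates the worker's real report, and unlike the real diagnostics it lists every file without flagging wrong extensions:

```python
import os


def format_size(num_bytes: int) -> str:
    """Format a byte count using 1024-based B/KB/MB/GB units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def scan_models(models_dir: str) -> dict[str, list[tuple[str, int]]]:
    """Map each subfolder of models_dir to its (filename, size in bytes) entries."""
    found = {}
    for sub in sorted(os.listdir(models_dir)):
        sub_path = os.path.join(models_dir, sub)
        if not os.path.isdir(sub_path):
            continue  # ignore stray files directly under models/
        found[sub] = [
            (name, os.path.getsize(os.path.join(sub_path, name)))
            for name in sorted(os.listdir(sub_path))
            if os.path.isfile(os.path.join(sub_path, name))
        ]
    return found


def print_report(models_dir: str = "/runpod-volume/models") -> bool:
    """Print a per-folder listing; return True if any file was found."""
    if not os.path.isdir(models_dir):
        print(f"  ✗ MISSING: {models_dir}")
        return False
    found = scan_models(models_dir)
    for sub, files in found.items():
        print(f"  {sub}/:")
        for name, size in files:
            print(f"    - {name} ({format_size(size)})")
    any_files = any(found.values())
    print("  ✓ Models found on network volume!" if any_files else "  ✗ No model files found")
    return any_files
```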

### Disabling Debug Mode

Once you have resolved your issue, disable diagnostics to keep logs clean:

- Remove the `NETWORK_VOLUME_DEBUG` environment variable, **or**
- Set `NETWORK_VOLUME_DEBUG=false`

This returns the worker to normal behavior without extra log noise.


18 changes: 18 additions & 0 deletions handler.py
@@ -13,6 +13,18 @@
import tempfile
import socket
import traceback
import logging

from network_volume import (
is_network_volume_debug_enabled,
run_network_volume_diagnostics,
)

# ---------------------------------------------------------------------------
# Logging setup
# ---------------------------------------------------------------------------
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Time to wait between API check attempts in milliseconds
COMFY_API_AVAILABLE_INTERVAL_MS = 50
@@ -502,6 +514,12 @@ def handler(job):
Returns:
dict: A dictionary containing either an error message or a success status with generated images.
"""
# ---------------------------------------------------------------------------
# Network Volume Diagnostics (opt-in via NETWORK_VOLUME_DEBUG=true)
# ---------------------------------------------------------------------------
if is_network_volume_debug_enabled():
run_network_volume_diagnostics()

job_input = job["input"]
job_id = job["id"]
