I was trying to load XLabs-AI/flux-controlnet-depth-v3 for inference, using the flux-dev-fp8 checkpoint with the "offload" switch enabled, at an image size of 1024x512.
It still gives CUDA OOM on an RTX 4090 (24 GB VRAM). What is the minimal VRAM requirement to load the ControlNet for inference? Is there an FP8 version of the ControlNets, or is there any caveat to getting it to work? It feels outrageous to have to use an A100 just to run inference.
NB: without loading the ControlNet, inference works within 24 GB of VRAM; the observed peak VRAM usage is only about 14 GB.