Hi, I'm on macOS 15.4 and I built sd.cpp with Vulkan support. I have a 16-inch MacBook Pro with a built-in Radeon Pro 5500M GPU and an external RX 6800 XT, so sd.cpp sees 2 Vulkan devices when it runs. I noticed that sd.cpp picks the 5500M and ignores the 6800 XT. How do I select which Vulkan device the model runs on?

cmake -B build -DGGML_METAL=OFF -DSD_VULKAN=ON \
-DVulkan_INCLUDE_DIR=/usr/local/Cellar/molten-vk/1.2.11/include \
-DVulkan_LIBRARY=/usr/local/Cellar/molten-vk/1.2.11/lib/libMoltenVK.dylib \
-DOpenMP_ROOT=$(brew --prefix)/opt/libomp \
-DVulkan_GLSLC_EXECUTABLE=$(brew --prefix)/opt/shaderc/bin/glslc \
-DVulkan_GLSLANG_VALIDATOR_EXECUTABLE=$(brew --prefix)/opt/glslang/bin/glslangValidator \
-DOpenMP_C_FLAGS=-fopenmp=lomp \
-DOpenMP_CXX_FLAGS=-fopenmp=lomp \
-DOpenMP_C_LIB_NAMES="libomp" \
-DOpenMP_CXX_LIB_NAMES="libomp" \
-DOpenMP_libomp_LIBRARY="$(brew --prefix)/opt/libomp/lib/libomp.dylib" \
-DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
-DOpenMP_CXX_LIB_NAMES="libomp" \
-DOpenMP_C_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include"
cmake --build build --config Release -j 8
15:32:58 ~/Dev/stable-diffusion.cpp master
./build/bin/sd -m ../sd-models/sd-v1-4.ckpt -p "a cat" -v
Option:
n_threads: 8
mode: txt2img
model_path: ../sd-models/sd-v1-4.ckpt
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path:
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
mask_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:false
diffusion flash attention:false
strength(control): 0.90
prompt: a cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
slg_scale: 0.00
guidance: 3.50
eta: 0.00
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:174 - Using Vulkan backend
ggml_vulkan: WARNING: Instance extension VK_KHR_portability_enumeration not found.
[mvk-info] MoltenVK version 1.2.12, supporting Vulkan version 1.2.309.
The following 115 Vulkan extensions are supported:
VK_KHR_16bit_storage v1
VK_KHR_8bit_storage v1
VK_KHR_bind_memory2 v1
VK_KHR_buffer_device_address v1
VK_KHR_calibrated_timestamps v1
VK_KHR_copy_commands2 v1
VK_KHR_create_renderpass2 v1
VK_KHR_dedicated_allocation v3
VK_KHR_deferred_host_operations v4
VK_KHR_depth_stencil_resolve v1
VK_KHR_descriptor_update_template v1
VK_KHR_device_group v4
VK_KHR_device_group_creation v1
VK_KHR_driver_properties v1
VK_KHR_dynamic_rendering v1
VK_KHR_external_fence v1
VK_KHR_external_fence_capabilities v1
VK_KHR_external_memory v1
VK_KHR_external_memory_capabilities v1
VK_KHR_external_semaphore v1
VK_KHR_external_semaphore_capabilities v1
VK_KHR_fragment_shader_barycentric v1
VK_KHR_format_feature_flags2 v2
VK_KHR_get_memory_requirements2 v1
VK_KHR_get_physical_device_properties2 v2
VK_KHR_get_surface_capabilities2 v1
VK_KHR_imageless_framebuffer v1
VK_KHR_image_format_list v1
VK_KHR_incremental_present v2
VK_KHR_maintenance1 v2
VK_KHR_maintenance2 v1
VK_KHR_maintenance3 v1
VK_KHR_map_memory2 v1
VK_KHR_multiview v1
VK_KHR_portability_subset v1
VK_KHR_push_descriptor v2
VK_KHR_relaxed_block_layout v1
VK_KHR_sampler_mirror_clamp_to_edge v3
VK_KHR_sampler_ycbcr_conversion v14
VK_KHR_separate_depth_stencil_layouts v1
VK_KHR_shader_draw_parameters v1
VK_KHR_shader_float_controls v4
VK_KHR_shader_float16_int8 v1
VK_KHR_shader_integer_dot_product v1
VK_KHR_shader_non_semantic_info v1
VK_KHR_shader_subgroup_extended_types v1
VK_KHR_shader_terminate_invocation v1
VK_KHR_spirv_1_4 v1
VK_KHR_storage_buffer_storage_class v1
VK_KHR_surface v25
VK_KHR_swapchain v70
VK_KHR_swapchain_mutable_format v1
VK_KHR_synchronization2 v1
VK_KHR_timeline_semaphore v2
VK_KHR_uniform_buffer_standard_layout v1
VK_KHR_variable_pointers v1
VK_KHR_vertex_attribute_divisor v1
VK_KHR_zero_initialize_workgroup_memory v1
VK_EXT_4444_formats v1
VK_EXT_buffer_device_address v2
VK_EXT_calibrated_timestamps v2
VK_EXT_debug_marker v4
VK_EXT_debug_report v10
VK_EXT_debug_utils v2
VK_EXT_descriptor_indexing v2
VK_EXT_depth_clip_control v1
VK_EXT_extended_dynamic_state v1
VK_EXT_extended_dynamic_state2 v1
VK_EXT_extended_dynamic_state3 v2
VK_EXT_external_memory_host v1
VK_EXT_external_memory_metal v1
VK_EXT_fragment_shader_interlock v1
VK_EXT_hdr_metadata v3
VK_EXT_headless_surface v1
VK_EXT_host_image_copy v1
VK_EXT_host_query_reset v1
VK_EXT_image_2d_view_of_3d v1
VK_EXT_image_robustness v1
VK_EXT_inline_uniform_block v1
VK_EXT_layer_settings v2
VK_EXT_memory_budget v1
VK_EXT_metal_objects v2
VK_EXT_metal_surface v1
VK_EXT_pipeline_creation_cache_control v3
VK_EXT_pipeline_creation_feedback v1
VK_EXT_post_depth_coverage v1
VK_EXT_private_data v1
VK_EXT_robustness2 v1
VK_EXT_sample_locations v1
VK_EXT_scalar_block_layout v1
VK_EXT_separate_stencil_usage v1
VK_EXT_shader_atomic_float v1
VK_EXT_shader_demote_to_helper_invocation v1
VK_EXT_shader_stencil_export v1
VK_EXT_shader_subgroup_ballot v1
VK_EXT_shader_subgroup_vote v1
VK_EXT_shader_viewport_index_layer v1
VK_EXT_subgroup_size_control v2
VK_EXT_surface_maintenance1 v1
VK_EXT_swapchain_colorspace v5
VK_EXT_swapchain_maintenance1 v1
VK_EXT_texel_buffer_alignment v1
VK_EXT_texture_compression_astc_hdr v1
VK_EXT_tooling_info v1
VK_EXT_vertex_attribute_divisor v3
VK_AMD_gpu_shader_half_float v2
VK_AMD_negative_viewport_height v1
VK_AMD_shader_image_load_store_lod v1
VK_AMD_shader_trinary_minmax v1
VK_IMG_format_pvrtc v1
VK_INTEL_shader_integer_functions2 v1
VK_GOOGLE_display_timing v1
VK_MVK_macos_surface v3
VK_MVK_moltenvk v37
VK_NV_fragment_shader_barycentric v1
[mvk-info] GPU device:
model: AMD Radeon RX 6800 XT
type: Discrete
vendorID: 0x1002
deviceID: 0x73bf
pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
GPU memory available: 16368 MB
GPU memory used: 0 MB
Metal Shading Language 3.2
supports the following GPU Features:
GPU Family Metal 3
GPU Family Mac 2
Read-Write Texture Tier 2
[mvk-info] GPU device:
model: AMD Radeon Pro 5500M
type: Discrete
vendorID: 0x1002
deviceID: 0x7340
pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
GPU memory available: 8176 MB
GPU memory used: 0 MB
Metal Shading Language 3.2
supports the following GPU Features:
GPU Family Metal 3
GPU Family Mac 2
Read-Write Texture Tier 2
[mvk-info] GPU device:
model: Intel(R) UHD Graphics 630
type: Integrated
vendorID: 0x8086
deviceID: 0x3e9b
pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
GPU memory available: 1536 MB
GPU memory used: 8 MB
Metal Shading Language 3.2
supports the following GPU Features:
GPU Family Metal 3
GPU Family Mac 2
Read-Write Texture Tier 1
[mvk-info] Created VkInstance for Vulkan version 1.2.309, as requested by app, with the following 0 Vulkan extensions enabled:
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
[mvk-info] Vulkan semaphores using MTLEvent.
[mvk-info] Descriptor sets binding resources using Metal3 argument buffers.
[mvk-info] Created VkDevice to run on GPU AMD Radeon RX 6800 XT with the following 3 Vulkan extensions enabled:
VK_KHR_16bit_storage v1
VK_KHR_shader_float16_int8 v1
VK_EXT_subgroup_size_control v2
[mvk-info] Vulkan semaphores using MTLEvent.
[mvk-info] Descriptor sets binding resources using Metal3 argument buffers.
[mvk-info] Created VkDevice to run on GPU AMD Radeon Pro 5500M with the following 3 Vulkan extensions enabled:
VK_KHR_16bit_storage v1
VK_KHR_shader_float16_int8 v1
VK_EXT_subgroup_size_control v2
[INFO ] stable-diffusion.cpp:197 - loading model from '../sd-models/sd-v1-4.ckpt'
[INFO ] model.cpp:911 - load ../sd-models/sd-v1-4.ckpt using checkpoint format
[DEBUG] model.cpp:1445 - init from '../sd-models/sd-v1-4.ckpt'
ZIP 0, name = archive/data.pkl, dir = archive/
[INFO ] stable-diffusion.cpp:244 - Version: SD 1.x
[INFO ] stable-diffusion.cpp:277 - Weight type: f32
[INFO ] stable-diffusion.cpp:278 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:279 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:280 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:282 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1178 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1178 - unet params backend buffer size = 2155.33 MB(VRAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1178 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:419 - loading weights
[DEBUG] model.cpp:1727 - loading tensors from ../sd-models/sd-v1-4.ckpt
|==================================================| 1131/1131 - 1000.00it/s
[INFO ] stable-diffusion.cpp:518 - total params memory size = 2719.24MB (VRAM 2719.24MB, RAM 0.00MB): clip 469.44MB(VRAM), unet 2155.33MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:522 - loading model from '../sd-models/sd-v1-4.ckpt' completed, taking 11.57s
[INFO ] stable-diffusion.cpp:556 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:600 - finished loaded file
[DEBUG] stable-diffusion.cpp:1548 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1241 - prompt after extract and remove lora: "a cat"
[INFO ] stable-diffusion.cpp:690 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1246 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:357 - parse 'a cat' to [['a cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1129 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] conditioner.hpp:485 - computing condition graph completed, taking 639 ms
[DEBUG] conditioner.hpp:357 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1129 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] conditioner.hpp:485 - computing condition graph completed, taking 41 ms
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 682 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:808 - Sample
[DEBUG] ggml_extend.hpp:1129 - unet compute buffer size: 559.90 MB(VRAM)
|==================================================| 20/20 - 2.04s/it
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 42.03s
[INFO ] stable-diffusion.cpp:1486 - generating 1 latent images completed, taking 42.03s
[INFO ] stable-diffusion.cpp:1489 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1129 - vae compute buffer size: 1664.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1090 - computing vae [mode: DECODE] graph completed, taking 5.21s
[INFO ] stable-diffusion.cpp:1499 - latent 1 decoded, taking 5.21s
[INFO ] stable-diffusion.cpp:1503 - decode_first_stage completed, taking 5.21s
[INFO ] stable-diffusion.cpp:1628 - txt2img completed in 47.92s
save result PNG image to 'output.png'
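
A possible workaround, sketched under assumptions rather than confirmed for this build: the log above shows a VkDevice being created first for the RX 6800 XT (index 0) and then for the Radeon Pro 5500M (index 1), which suggests the backend ends up on the last device enumerated. ggml's Vulkan backend reads a GGML_VK_VISIBLE_DEVICES environment variable (a comma-separated list of device indices), so restricting it to index 0 should leave only the RX 6800 XT visible, assuming the ggml revision bundled with this sd.cpp build honors that variable. The helper below, vk_device_index, is a hypothetical name introduced only for illustration. It reads the "ggml_vulkan: N = ..." lines from a verbose log and prints the index of the first device whose name contains the given text.

```shell
# Hypothetical helper (not part of sd.cpp): read a verbose sd log on stdin and
# print the index that ggml_vulkan assigned to the first device whose line
# contains the text given as $1. Prints nothing if no device matches.
vk_device_index() {
  awk -v name="$1" '
    # Match only the per-device lines, e.g. "ggml_vulkan: 0 = AMD Radeon ...";
    # the "Found 2 Vulkan devices:" header does not match.
    /^ggml_vulkan: [0-9]+ = / {
      if (index($0, name) > 0) { print $2; exit }
    }'
}

# Device lines copied from the log in this report.
log='ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none'

idx=$(printf '%s\n' "$log" | vk_device_index "RX 6800 XT")
echo "$idx"   # prints 0

# Assumption: the bundled ggml honors GGML_VK_VISIBLE_DEVICES. If so, the run
# from this report would become:
#   GGML_VK_VISIBLE_DEVICES="$idx" ./build/bin/sd -m ../sd-models/sd-v1-4.ckpt -p "a cat" -v
```

If the variable takes effect, the "ggml_vulkan: Found ... Vulkan devices" line in the next verbose run should list only the selected GPU, and only one "Created VkDevice" line should follow it.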