3090 OutOfMemoryError #28

@FearL0rd

Is it possible to run this conversion on a 3090? I'm getting a CUDA OOM error on the first tensor (model.embed_tokens.weight).

Command line: convert_to_quant -i gemma_3_12B_it_heretic_fp8_e4m3fn.safetensors --int8 --scaling_mode tensor --block_size 128 --comfy_quant


Target format: INT8 (block-wise quantization)
INT8 Range: [-127, 127]

Comfy quantization mode enabled: Using comfy_quant layer names and settings.
Loading 629 tensors from source file...
Loading tensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 629/629 [00:00<00:00, 54414.18it/s]
[MINIMAL] Scanning model and generating simulated calibration data...
Simulated calibration data generated.

Found 337 weight tensors to potentially process.

(1/337) Processing (INT8): model.embed_tokens.weight
[pinned_transfer] Pinned memory active - faster GPU transfers enabled
- Tensor shape: [262208, 3840], Max rank: 3840. Using k=768 components.
- Trying svd_lowrank
Traceback (most recent call last):
  File "/home/comfy-env/bin/convert_to_quant", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/cli/main.py", line 1204, in main
    convert_to_fp8_scaled(
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/utils/logging.py", line 82, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/formats/fp8_conversion.py", line 490, in convert_to_fp8_scaled
    q_tensor, dequant_s, dequant_w, extra_tensors = converter.convert(original_tensor, key=key, depth=depth)
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/converters/learned_rounding.py", line 759, in convert
    qdata, scale, dequantized = self._convert_int8_tensorwise(W_float32)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/converters/learned_rounding.py", line 854, in _convert_int8_tensorwise
    qdata, scale = self._optimize_int8_tensorwise_learned_rounding(W_float32, qdata, scale)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/converters/learned_rounding.py", line 898, in _optimize_int8_tensorwise_learned_rounding
    final_tensor_scaled = self._optimize_original(
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/converters/learned_rounding.py", line 570, in _optimize_original
    error = current_dq - W_float32
            ~~~~~~~~~~~^~~~~~~~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.75 GiB. GPU 0 has a total capacity of 23.55 GiB of which 1.83 GiB is free. Including non-PyTorch memory, this process has 21.51 GiB memory in use. Of the allocated memory 21.00 GiB is allocated by PyTorch, and 171.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
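
For context, the 3.75 GiB request in the error lines up with one full-size copy of the embedding tensor from the log. The sketch below is only a back-of-the-envelope check. It assumes 4 bytes per element (the `W_float32` name in the traceback suggests float32, but that is an inference), and `tensor_gib` is a hypothetical helper, not part of convert_to_quant.

```python
# Rough size check for the allocation reported in the OOM message.
# Assumption: the temporary created by `current_dq - W_float32` is a dense
# float32 tensor with the shape of model.embed_tokens.weight (from the log).

GIB = 1024 ** 3  # bytes per GiB


def tensor_gib(shape, bytes_per_element):
    """Return the size in GiB of a dense tensor with the given shape."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element / GIB


embed_shape = (262208, 3840)            # "Tensor shape" line in the log
fp32_copy = tensor_gib(embed_shape, 4)  # one float32 copy of the tensor
allocated = 21.00                       # GiB "allocated by PyTorch" in the error

print(f"one float32 copy: {fp32_copy:.2f} GiB")
print(f"float32 copies that fit in {allocated:.2f} GiB: {allocated / fp32_copy:.1f}")
```

If that assumption holds, one copy comes to about 3.75 GiB, so the 21.00 GiB already allocated corresponds to roughly five or six full-size float32 tensors being live at once. Only 171.37 MiB is reserved but unallocated, so the `expandable_segments` hint in the message probably would not help much here.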
