3090 OutOfMemoryError #28

Is it possible to run this conversion on an RTX 3090 (24 GB)? I'm getting a CUDA out-of-memory error on the very first tensor, model.embed_tokens.weight.

Command line: convert_to_quant -i gemma_3_12B_it_heretic_fp8_e4m3fn.safetensors --int8 --scaling_mode tensor --block_size 128 --comfy_quant



------------------------------------------------------------
Target format: INT8 (block-wise quantization)
INT8 Range: [-127, 127]
------------------------------------------------------------
Comfy quantization mode enabled: Using comfy_quant layer names and settings.
Loading 629 tensors from source file...
Loading tensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 629/629 [00:00<00:00, 54414.18it/s]
[MINIMAL] Scanning model and generating simulated calibration data...
Simulated calibration data generated.

Found 337 weight tensors to potentially process.
------------------------------------------------------------
(1/337) Processing (INT8): model.embed_tokens.weight
  [pinned_transfer] Pinned memory active - faster GPU transfers enabled
    - Tensor shape: [262208, 3840], Max rank: 3840. Using k=768 components.
    - Trying svd_lowrank
Traceback (most recent call last):                                                                                                                      
  File "/home/comfy-env/bin/convert_to_quant", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/cli/main.py", line 1204, in main
    convert_to_fp8_scaled(
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/utils/logging.py", line 82, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/formats/fp8_conversion.py", line 490, in convert_to_fp8_scaled
    q_tensor, dequant_s, dequant_w, extra_tensors = converter.convert(original_tensor, key=key, depth=depth)
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/converters/learned_rounding.py", line 759, in convert
    qdata, scale, dequantized = self._convert_int8_tensorwise(W_float32)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/converters/learned_rounding.py", line 854, in _convert_int8_tensorwise
    qdata, scale = self._optimize_int8_tensorwise_learned_rounding(W_float32, qdata, scale)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/converters/learned_rounding.py", line 898, in _optimize_int8_tensorwise_learned_rounding
    final_tensor_scaled = self._optimize_original(
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/comfy-env/lib/python3.12/site-packages/convert_to_quant/converters/learned_rounding.py", line 570, in _optimize_original
    error = current_dq - W_float32
            ~~~~~~~~~~~^~~~~~~~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.75 GiB. GPU 0 has a total capacity of 23.55 GiB of which 1.83 GiB is free. Including non-PyTorch memory, this process has 21.51 GiB memory in use. Of the allocated memory 21.00 GiB is allocated by PyTorch, and 171.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
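
For context, the failed 3.75 GiB allocation appears to match exactly one float32 copy of the embedding tensor reported in the log (shape [262208, 3840]). The sketch below only redoes that arithmetic. The helper name `tensor_gib` is made up for illustration, and the 1.83 GiB free figure is copied from the error message. It makes no claim about how convert_to_quant allocates memory internally.

```python
GIB = 1024 ** 3  # bytes per GiB, the unit PyTorch uses in its OOM message


def tensor_gib(shape, bytes_per_element):
    """Return the size in GiB of a dense tensor with the given shape."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element / GIB


# Shape of model.embed_tokens.weight, taken from the log above.
shape = (262208, 3840)

fp32_gib = tensor_gib(shape, 4)  # one float32 working copy
int8_gib = tensor_gib(shape, 1)  # the quantized int8 result
free_gib = 1.83                  # free GPU memory reported in the error

print(f"float32 copy: {fp32_gib:.2f} GiB")
print(f"int8 copy:    {int8_gib:.2f} GiB")
print(f"one more float32 copy fits in free memory: {fp32_gib <= free_gib}")
```

If that reading is right, the subtraction `current_dq - W_float32` needs a third full-size float32 buffer for its result on top of its two operands, and with 21.00 GiB already allocated there is no room left for it on a 24 GB card.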


