failed to quant the Qwen3-VL-30B-A3B #568

@terryaic

Description

Describe the bug

  • The script examples/vlm_ptq/scripts/huggingface_example.sh runs successfully, but the saved model is not in FP4 format; it still takes about 60 GB of disk space.
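A quick back-of-the-envelope check makes the mismatch concrete. This is a rough sketch assuming ~30B total parameters and ~4.5 effective bits per weight for NVFP4 (4-bit values plus FP8 block scales), ignoring embeddings and any layers kept in higher precision:

```python
# Rough size estimate for a ~30B-parameter checkpoint.
# The 60 GB observed on disk matches 16-bit weights, not 4-bit NVFP4.
params = 30e9

bf16_gb = params * 16 / 8 / 1e9    # 16 bits per weight
nvfp4_gb = params * 4.5 / 8 / 1e9  # ~4.5 bits per weight incl. block scales

print(f"BF16:  ~{bf16_gb:.0f} GB")   # ~60 GB, matching the observed size
print(f"NVFP4: ~{nvfp4_gb:.0f} GB")  # expected if weights were packed
```

So a genuinely NVFP4-quantized export should land well under 20 GB, roughly a quarter of what was saved.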

Steps/Code to reproduce bug

```shell
scripts/huggingface_example.sh --model /mnt/raid/models/Qwen3-VL-30B-A3B-Instruct --quant nvfp4
```

Expected behavior

The saved checkpoint should be in NVFP4 format and correspondingly smaller on disk.

Who can help?

Logs like the following:

```
GPU 0: Peak memory usage = 45.48 GB for all processes on the GPU
GPU 1: Peak memory usage = 46.60 GB for all processes on the GPU
########
+ [[ nvfp4 != \f\p\8 ]]
+ echo 'For quant format nvfp4, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4.'
For quant format nvfp4, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4.
+ exit 0
(vllm_cu128) terry:vlm_ptq$
(vllm_cu128) terry:vlm_ptq$ ls -rtl /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4
total 59429224
-rw-rw-r-- 1 terry terry 782 Nov 17 16:01 preprocessor_config.json
-rw-rw-r-- 1 terry terry 817 Nov 17 16:01 video_preprocessor_config.json
-rw-rw-r-- 1 terry terry 2735 Nov 17 16:01 hf_quant_config.json
-rw-rw-r-- 1 terry terry 213 Nov 17 16:01 generation_config.json
-rw-rw-r-- 1 terry terry 4954308632 Nov 17 16:01 model-00001-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450088 Nov 17 16:01 model-00002-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450120 Nov 17 16:01 model-00003-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00004-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00005-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00006-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00007-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00008-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00009-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00010-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00011-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00012-of-00013.safetensors
-rw-rw-r-- 1 terry terry 2244108488 Nov 17 16:02 model-00013-of-00013.safetensors
-rw-rw-r-- 1 terry terry 150240 Nov 17 16:02 model.safetensors.index.json
-rw-rw-r-- 1 terry terry 5380 Nov 17 16:02 config.json
-rw-rw-r-- 1 terry terry 5401 Nov 17 16:02 tokenizer_config.json
-rw-rw-r-- 1 terry terry 5292 Nov 17 16:02 chat_template.jinja
-rw-rw-r-- 1 terry terry 496 Nov 17 16:02 special_tokens_map.json
-rw-rw-r-- 1 terry terry 707 Nov 17 16:02 added_tokens.json
-rw-rw-r-- 1 terry terry 2776833 Nov 17 16:02 vocab.json
-rw-rw-r-- 1 terry terry 1671853 Nov 17 16:02 merges.txt
-rw-rw-r-- 1 terry terry 11422654 Nov 17 16:02 tokenizer.json
```

System information

  • Container used (if applicable): no
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 22.04
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): RTX 5880
  • GPU memory size: 48 GB
  • Number of GPUs: 2
  • Library versions (if applicable):
    • Python: 3.12
    • ModelOpt version or commit hash: 0.39.0
    • CUDA: 12.8
    • PyTorch: 2.8.0
    • Transformers: 4.57.1
    • TensorRT-LLM: ?
    • ONNXRuntime: ?
    • TensorRT: ?
  • Any other details that may help: ?


    Labels

    bug (Something isn't working)
