failed to quant the Qwen3-VL-30B-A3B #568

@terryaic

Description

Describe the bug

  • The script examples/vlm_ptq/scripts/huggingface_example.sh runs successfully, but the saved model is not in FP4 format; it still takes about 60 GB of disk space.
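A quick back-of-the-envelope check makes the mismatch concrete. This is a rough sketch assuming ~30B total parameters and ~4.5 effective bits per weight for NVFP4 (4-bit values plus FP8 block scales), ignoring embeddings and any layers kept in higher precision:

```python
# Rough size estimate for a ~30B-parameter checkpoint.
# The 60 GB observed on disk matches 16-bit weights, not 4-bit NVFP4.
params = 30e9

bf16_gb = params * 16 / 8 / 1e9    # 16 bits per weight
nvfp4_gb = params * 4.5 / 8 / 1e9  # ~4.5 bits per weight incl. block scales

print(f"BF16:  ~{bf16_gb:.0f} GB")   # ~60 GB, matching the observed size
print(f"NVFP4: ~{nvfp4_gb:.0f} GB")  # expected if weights were packed
```

So a genuinely NVFP4-quantized export should land well under 20 GB, roughly a quarter of what was saved.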

Steps/Code to reproduce bug

```shell
scripts/huggingface_example.sh --model /mnt/raid/models/Qwen3-VL-30B-A3B-Instruct --quant nvfp4
```

Expected behavior

The saved checkpoint should be in NVFP4 format and correspondingly smaller on disk.

Who can help?

Logs like the following:

```
GPU 0: Peak memory usage = 45.48 GB for all processes on the GPU
GPU 1: Peak memory usage = 46.60 GB for all processes on the GPU
########
+ [[ nvfp4 != \f\p\8 ]]
+ echo 'For quant format nvfp4, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4.'
For quant format nvfp4, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4.
+ exit 0
(vllm_cu128) terry:vlm_ptq$
(vllm_cu128) terry:vlm_ptq$ ls -rtl /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4
total 59429224
-rw-rw-r-- 1 terry terry 782 Nov 17 16:01 preprocessor_config.json
-rw-rw-r-- 1 terry terry 817 Nov 17 16:01 video_preprocessor_config.json
-rw-rw-r-- 1 terry terry 2735 Nov 17 16:01 hf_quant_config.json
-rw-rw-r-- 1 terry terry 213 Nov 17 16:01 generation_config.json
-rw-rw-r-- 1 terry terry 4954308632 Nov 17 16:01 model-00001-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450088 Nov 17 16:01 model-00002-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450120 Nov 17 16:01 model-00003-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00004-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00005-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00006-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00007-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00008-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00009-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00010-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00011-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00012-of-00013.safetensors
-rw-rw-r-- 1 terry terry 2244108488 Nov 17 16:02 model-00013-of-00013.safetensors
-rw-rw-r-- 1 terry terry 150240 Nov 17 16:02 model.safetensors.index.json
-rw-rw-r-- 1 terry terry 5380 Nov 17 16:02 config.json
-rw-rw-r-- 1 terry terry 5401 Nov 17 16:02 tokenizer_config.json
-rw-rw-r-- 1 terry terry 5292 Nov 17 16:02 chat_template.jinja
-rw-rw-r-- 1 terry terry 496 Nov 17 16:02 special_tokens_map.json
-rw-rw-r-- 1 terry terry 707 Nov 17 16:02 added_tokens.json
-rw-rw-r-- 1 terry terry 2776833 Nov 17 16:02 vocab.json
-rw-rw-r-- 1 terry terry 1671853 Nov 17 16:02 merges.txt
-rw-rw-r-- 1 terry terry 11422654 Nov 17 16:02 tokenizer.json
```

System information

  • Container used (if applicable): no
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 22.04
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): RTX 5880
  • GPU memory size: 48 GB
  • Number of GPUs: 2
  • Library versions (if applicable):
    • Python: 3.12
    • ModelOpt version or commit hash: 0.39.0
    • CUDA: 12.8
    • PyTorch: 2.8.0
    • Transformers: 4.57.1
    • TensorRT-LLM: ?
    • ONNXRuntime: ?
    • TensorRT: ?
  • Any other details that may help: ?


    Labels

    bug (Something isn't working)
