Labels
bug (Something isn't working)
Description
Before submitting an issue, please make sure it hasn't been already addressed by searching through the existing and past issues.
Describe the bug
- The script example/vlm_ptq/scripts/huggingface_example.sh runs successfully, but the saved model is not in NVFP4 format; it still takes about 60 GB of disk space.
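For reference, the total on disk is almost exactly what an unquantized BF16 checkpoint of this size would occupy. A rough sanity check (assuming ~30B parameters, inferred from the model name; the NVFP4 figure ignores the small per-block scale overhead):

```python
# Back-of-the-envelope weight sizes. Assumption: ~30B parameters (from the
# model name "Qwen3-VL-30B-A3B"). BF16 stores 16 bits/param; NVFP4 stores
# 4 bits/param plus per-block scales (not counted here).

def checkpoint_size_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate on-disk weight size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

bf16_gb = checkpoint_size_gb(30e9, 16)  # matches the ~60 GB observed
nvfp4_gb = checkpoint_size_gb(30e9, 4)  # roughly what NVFP4 should produce
print(f"bf16: ~{bf16_gb:.0f} GB, nvfp4: ~{nvfp4_gb:.0f} GB")
```

So the observed ~60 GB is consistent with the weights having been written back in BF16 rather than NVFP4.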
Steps/Code to reproduce bug
scripts/huggingface_example.sh --model /mnt/raid/models/Qwen3-VL-30B-A3B-Instruct --quant nvfp4
Expected behavior
Who can help?
Logs are as follows:
GPU 0: Peak memory usage = 45.48 GB for all processes on the GPU
GPU 1: Peak memory usage = 46.60 GB for all processes on the GPU
########
+ [[ nvfp4 != \f\p\8 ]]
+ echo 'For quant format nvfp4, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4.'
For quant format nvfp4, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4.
+ exit 0
(vllm_cu128) terry:vlm_ptq$
(vllm_cu128) terry:vlm_ptq$ ls -rtl /mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4
total 59429224
-rw-rw-r-- 1 terry terry 782 Nov 17 16:01 preprocessor_config.json
-rw-rw-r-- 1 terry terry 817 Nov 17 16:01 video_preprocessor_config.json
-rw-rw-r-- 1 terry terry 2735 Nov 17 16:01 hf_quant_config.json
-rw-rw-r-- 1 terry terry 213 Nov 17 16:01 generation_config.json
-rw-rw-r-- 1 terry terry 4954308632 Nov 17 16:01 model-00001-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450088 Nov 17 16:01 model-00002-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450120 Nov 17 16:01 model-00003-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00004-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00005-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00006-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00007-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:01 model-00008-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00009-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00010-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00011-of-00013.safetensors
-rw-rw-r-- 1 terry terry 4876450192 Nov 17 16:02 model-00012-of-00013.safetensors
-rw-rw-r-- 1 terry terry 2244108488 Nov 17 16:02 model-00013-of-00013.safetensors
-rw-rw-r-- 1 terry terry 150240 Nov 17 16:02 model.safetensors.index.json
-rw-rw-r-- 1 terry terry 5380 Nov 17 16:02 config.json
-rw-rw-r-- 1 terry terry 5401 Nov 17 16:02 tokenizer_config.json
-rw-rw-r-- 1 terry terry 5292 Nov 17 16:02 chat_template.jinja
-rw-rw-r-- 1 terry terry 496 Nov 17 16:02 special_tokens_map.json
-rw-rw-r-- 1 terry terry 707 Nov 17 16:02 added_tokens.json
-rw-rw-r-- 1 terry terry 2776833 Nov 17 16:02 vocab.json
-rw-rw-r-- 1 terry terry 1671853 Nov 17 16:02 merges.txt
-rw-rw-r-- 1 terry terry 11422654 Nov 17 16:02 tokenizer.json
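A quick way to confirm whether the exported weights were actually quantized is to read the safetensors header directly: each `.safetensors` file begins with an 8-byte little-endian length followed by a JSON map of tensor name to dtype/shape. If the export were NVFP4, the weight tensors would show packed low-bit dtypes plus scale tensors rather than plain BF16. A pure-stdlib sketch (the path in the usage comment is illustrative):

```python
# Inspect tensor dtypes in a safetensors file without external dependencies.
# Format: first 8 bytes = little-endian uint64 header length, then that many
# bytes of JSON mapping tensor names to {"dtype", "shape", "data_offsets"}.
import json
import struct

def tensor_dtypes(path: str) -> dict[str, str]:
    """Return {tensor_name: dtype} for one .safetensors shard."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return {name: info["dtype"]
            for name, info in header.items() if name != "__metadata__"}

# Usage (path is illustrative):
# from collections import Counter
# shard = "/mnt/raid/models/saved_models_Qwen3-VL-30B-A3B-Instruct_nvfp4/model-00001-of-00013.safetensors"
# print(Counter(tensor_dtypes(shard).values()))
```

If the counter shows only BF16 weight tensors and no quantization scales, the checkpoint was not written in NVFP4.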
System information
- Container used (if applicable): no
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 22.04
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): RTX 5880
- GPU memory size: 48G
- Number of GPUs: 2
- Library versions (if applicable):
- Python: 3.12
- ModelOpt version or commit hash: 0.39.0
- CUDA: 12.8
- PyTorch: 2.8.0
- Transformers: 4.57.1
- TensorRT-LLM: ?
- ONNXRuntime: ?
- TensorRT: ?
- Any other details that may help: ?