
[Bug]: The result of Qwen3-4B quantized with GPTQ INT8 is garbled when running inference with transformers #1784

@sankexin

Description


⚙️ Your current environment

The output of `python collect_env.py`:

- Python 3.10, Ubuntu 22.04.5 LTS
- llmcompressor==0.6.0.1
- transformers==4.51.1
- torch==2.4.1
- torchaudio==2.4.1
- torchdata==0.8.0
- torchvision==0.19.1

🐛 Describe the bug

Thanks for llm-compressor, it is a great library, but I ran into a bug that needs to be solved!

generated_ids: tensor([[151644,    872,    198,   3838,    374,    279,   1379,    315,    279,
           3639,   4180,     30, 151645,    198, 151644,  77091,    198,  17543,
           2199,  26030,  35276,  38325,  73368,  73695,  93372, 126663,  93672,
          51575, 107007,  62127,  86324, 126663,  67678,  13620,  17832, 127775,
         136197,  69965,  93372, 107007,  82904, 128852, 102935, 136197, 148501,
         114267, 136197,    114, 127694,  32940,  97218, 145265,    623, 132312,
          27899,  79164,  73695,  58122, 145427,  82059,  20378,  25685,  93372,
         145265,  56428, 127891, 114370,  58202, 127891,  27899,  41513,  58122,
         114370,  47402,  35409, 145427,  82059,  83626, 145427,  67045,  30369,
          76995, 122199,  14183, 142588,  78361,   2711,  41267, 128852,  63364,
           3062,  89959, 149653, 122343,  32783,  95986,  90868,  34019, 115762,
          35409,  23211,  34019, 132307, 149653, 116487,  35409, 151784,  67064,
         145427,  44366, 145265, 131132,  93396, 145265,  52832,  49560, 134871]],
       device='cuda:0')

the decoded result:

Cittributeacies erg………… sperma Owl UserTypeシャ운串眼里 clr Confidenceシャ{/ Kn teethランドセル Printed UserType眼里ICIAL דבר巢セルȂ灰尘セル�وضح$rescreateUrlեStткиumen kond Owl mpi왓 Boyle@RequestMapping Grey UserTypeեyardsожно規劃currentlyожноumen_stride mpi規劃ipsis navigator왓 Boyle(Transform왓 errmsg efalternative芘Formatter칠yte search.SET דברJustin_get_RESOURCES⍵肷Loremitorio<mainbounds他是一个 navigator_:boundsすぐに⍵五四 navigatorportunity왓ADVERTISEMENTե مجال carnivalե hwnd-fit المشار

Only the Qwen3-4B INT8 GPTQ model produces garbled output under transformers; the 8B, 14B, and 32B models are all normal. Why does this anomaly occur, and how can it be solved?

🛠️ Steps to reproduce

Inference code:

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen3-4B-INT8"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="float16",
    device_map="auto"
)

# prepare the model input
prompt = "What is the size of the United States?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=100, # 32768
)
print("generated_ids:", generated_ids)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# the result will begin with thinking content in <think></think> tags, followed by the actual response
print(tokenizer.decode(output_ids, skip_special_tokens=True))
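Since generation runs with `enable_thinking=True`, the output normally begins with a `<think>…</think>` block, as the comment above notes. A minimal helper to separate the thinking content from the final answer, assuming `151668` is the `</think>` token id in the Qwen3 vocabulary (an assumption taken from the Qwen3 model card, not from this report):

```python
def split_thinking(output_ids, think_end_id=151668):
    """Split generated token ids into (thinking, answer) parts.

    Searches for the last occurrence of the assumed </think> token id;
    if it is absent, everything is treated as the answer.
    """
    try:
        # rindex via reversed list: position just after the last </think>
        idx = len(output_ids) - output_ids[::-1].index(think_end_id)
    except ValueError:
        idx = 0
    return output_ids[:idx], output_ids[idx:]

# Usage with the script above:
# thinking_ids, answer_ids = split_thinking(output_ids)
# print(tokenizer.decode(answer_ids, skip_special_tokens=True))
```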

Quantization recipe (from the llm-compressor demo):

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
]
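For context, a minimal sketch of how such a recipe is typically applied with llm-compressor's `oneshot` API. The base checkpoint path, calibration dataset, sample count, and output directory below are assumptions for illustration, not values taken from this report:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Same recipe as reported: SmoothQuant followed by W8A8 GPTQ on all
# Linear layers except lm_head.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
]

oneshot(
    model="Qwen/Qwen3-4B",            # assumed base checkpoint
    dataset="open_platypus",          # assumed calibration dataset
    recipe=recipe,
    max_seq_length=2048,              # assumed calibration settings
    num_calibration_samples=512,
    output_dir="Qwen3-4B-INT8",       # matches model_name in the infer script
)
```

Running this requires a GPU and downloads the full model, so it is shown here only to make the reported quantization setup reproducible end to end.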
