⚙️ Your current environment
The output of `python collect_env.py`:
```text
Python 3.10, Ubuntu 22.04.5 LTS
llmcompressor==0.6.0.1
transformers==4.51.1
torch==2.4.1
torchaudio==2.4.1
torchdata==0.8.0
torchvision==0.19.1
```
🐛 Describe the bug
Thanks for llm-compressor, it is a great library, but I have run into a bug that needs to be solved!
```text
generated_ids: tensor([[151644,    872,    198,   3838,    374,    279,   1379,    315,    279,
           3639,   4180,     30, 151645,    198, 151644,  77091,    198,  17543,
           2199,  26030,  35276,  38325,  73368,  73695,  93372, 126663,  93672,
          51575, 107007,  62127,  86324, 126663,  67678,  13620,  17832, 127775,
         136197,  69965,  93372, 107007,  82904, 128852, 102935, 136197, 148501,
         114267, 136197,    114, 127694,  32940,  97218, 145265,    623, 132312,
          27899,  79164,  73695,  58122, 145427,  82059,  20378,  25685,  93372,
         145265,  56428, 127891, 114370,  58202, 127891,  27899,  41513,  58122,
         114370,  47402,  35409, 145427,  82059,  83626, 145427,  67045,  30369,
          76995, 122199,  14183, 142588,  78361,   2711,  41267, 128852,  63364,
           3062,  89959, 149653, 122343,  32783,  95986,  90868,  34019, 115762,
          35409,  23211,  34019, 132307, 149653, 116487,  35409, 151784,  67064,
         145427,  44366, 145265, 131132,  93396, 145265,  52832,  49560, 134871]],
       device='cuda:0')
```
The decoded result:
```text
Cittributeacies erg………… sperma Owl UserTypeシャ운串眼里 clr Confidenceシャ{/ Kn teethランドセル Printed UserType眼里ICIAL דבר巢セルȂ灰尘セル�وضح$rescreateUrlեStткиumen kond Owl mpi왓 Boyle@RequestMapping Grey UserTypeեyardsожно規劃currentlyожноumen_stride mpi規劃ipsis navigator왓 Boyle(Transform왓 errmsg efalternative芘Formatter칠yte search.SET דברJustin_get_RESOURCES⍵肷Loremitorio<mainbounds他是一个 navigator_:boundsすぐに⍵五四 navigatorportunity왓ADVERTISEMENTե مجال carnivalե hwnd-fit المشار
```
Only the Qwen3-4B INT8 GPTQ model produces garbled output under transformers; the 8B, 14B, and 32B models are all normal. Why does this anomaly occur, and how can it be solved?
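For reference, a quick way to isolate the quantization step (this comparison is my own suggestion, not part of the original report) is to run the same prompt through the unquantized Qwen/Qwen3-4B checkpoint with identical decoding settings; if its output is clean while the INT8 output is garbled, the corruption was introduced by the quantized checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Baseline: the unquantized checkpoint, loaded with the same dtype and
# decoding settings as the INT8 run in the repro below.
base_name = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(
    base_name, torch_dtype="float16", device_map="auto"
)

text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the size of the United States?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=100)
# If this decodes cleanly, the FP16 base model is fine and the garbling
# comes from the INT8 conversion rather than the inference setup.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```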
🛠️ Steps to reproduce
Inference code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen3-4B-INT8"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="float16",
    device_map="auto"
)

# prepare the model input
prompt = "What is the size of the United States?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # switches between thinking and non-thinking modes; default is True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=100,  # 32768
)
print("generated_ids:", generated_ids)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# the result will begin with thinking content in <think></think> tags,
# followed by the actual response
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
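Not part of the original repro, but since the comment above mentions the `<think></think>` block, the split used in Qwen's own examples can be appended (assuming 151668 is the `</think>` token id for the Qwen3 tokenizer):

```python
# Split the thinking block from the final answer, as in Qwen's examples.
# 151668 is assumed to be the </think> token id for this tokenizer.
try:
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0  # no </think> token found; treat everything as the answer
thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True)
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True)
print("thinking:", thinking_content)
print("answer:", content)
```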
Quantization recipe (from the llm-compressor demo):
```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
]
```
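For context, a recipe like this is typically applied through llm-compressor's `oneshot` entry point. A minimal sketch follows; the dataset, sample count, and sequence length here are illustrative values taken from the library's W8A8 examples, not the settings from the original run:

```python
from llmcompressor import oneshot  # top-level oneshot import, as in recent llm-compressor releases

# Continues from the `recipe` defined above. Calibration settings are
# assumptions based on the llm-compressor examples, not the reporter's run.
oneshot(
    model="Qwen/Qwen3-4B",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="Qwen3-4B-INT8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```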