
[Bug]: When quantizing Qwen3-Next-80B-A3B using w8a8-int8, some parameters end up with random values #2059

@dingjingzhen-bot

⚙️ Your current environment

The output of python collect_env.py
### Environment Information ###
Python Version: `3.10`
llm-compressor Version: `0.8.1`
compressed-tensors Version: `0.12.2`
transformers Version: `4.57.1`
torch Version: `2.8.0`
CUDA Devices: `A100 GPU`
AMD Devices: `None`

🐛 Describe the bug

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "./Qwen3-Next-80B-A3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
from datasets import load_dataset

NUM_CALIBRATION_SAMPLES=256
MAX_SEQUENCE_LENGTH=2048

# Load dataset.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split=f"train_sft[:{NUM_CALIBRATION_SAMPLES}]")
ds = ds.shuffle(seed=42)

# Preprocess the data into the format the model is trained with.
def preprocess(example):
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}
ds = ds.map(preprocess)

# Tokenize the data (be careful with bos tokens - we need add_special_tokens=False since the chat_template already added it).
def tokenize(sample):
    return tokenizer(sample["text"], padding=False, max_length=MAX_SEQUENCE_LENGTH, truncation=True, add_special_tokens=False)
ds = ds.map(tokenize, remove_columns=ds.column_names)
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Configure the quantization algorithms to run.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(
        targets="Linear",
        scheme="W8A8",
        ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$", "re:.*linear_attn.*"],
    ),
]

# Apply quantization.
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

# Save to disk compressed.
# SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-W8A8-Dynamic-Per-Token"
SAVE_DIR = "./Qwen3-Next-80B-A3B-Instruct-w8a8"

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)

After quantization, some parameters, such as input_layernorm, end up with random values.
This happens when SmoothQuantModifier(smoothing_strength=0.8) is part of the recipe.
Removing it seems to stop the norm parameters from being reported, but quantization then becomes extremely slow.
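
One way to narrow this down is to check whether the affected tensors were written to the saved checkpoint at all. The following is a minimal sketch, assuming both directories are sharded safetensors checkpoints that contain a `model.safetensors.index.json` file (the usual Hugging Face layout for large models). The helper names and the `"norm"` filter are hypothetical and not part of llm-compressor.

```python
import json
from pathlib import Path


def load_weight_names(checkpoint_dir):
    """Return the set of tensor names listed in a sharded checkpoint's index file."""
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    # "weight_map" maps each tensor name to the shard file that stores it.
    return set(index["weight_map"])


def missing_names(original_names, saved_names, substring="norm"):
    """Names containing `substring` that are in the original set but not in the saved set."""
    dropped = set(original_names) - set(saved_names)
    return sorted(name for name in dropped if substring in name)


def report_missing(original_dir, saved_dir, substring="norm"):
    """Compare two checkpoint directories and list tensors that were not saved."""
    original = load_weight_names(original_dir)
    saved = load_weight_names(saved_dir)
    return missing_names(original, saved, substring)
```

With the paths from the script above, this could be called as `report_missing("./Qwen3-Next-80B-A3B-Instruct", "./Qwen3-Next-80B-A3B-Instruct-w8a8")`. Any name it returns is absent from the saved checkpoint, so loading would leave that tensor at its default initialization. An empty result would instead suggest that the tensors were saved and that the problem lies in their values.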

🛠️ Steps to reproduce

No response
