
Commit 560bb9c

authored
Adding new MoE e2e tests (#1960)
SUMMARY:
Adds new e2e tests for MoE models. Also includes a small fix so that `scheme: None` no longer errors, and expert gate layers are now ignored by default (whether this is supported is model-dependent).

TEST PLAN:
In progress: https://github.com/neuralmagic/llm-compressor-testing/actions/runs/19368818055

Run locally (after disabling the cadence skip in https://github.com/vllm-project/llm-compressor/blob/main/tests/e2e/vLLM/test_vllm.py):

export TEST_DATA_FILE="${REPOS}/llm-compressor/tests/e2e/vLLM/configs/qwen3_fp8_dynamic_per_tensor.yaml"
pytest tests/e2e/vLLM/test_vllm.py -vs 2>&1 | tee log-fp8.log

export TEST_DATA_FILE="${REPOS}/llm-compressor/tests/e2e/vLLM/configs/qwen3_fp4_nvfp4.yaml"
pytest tests/e2e/vLLM/test_vllm.py -vs 2>&1 | tee log-fp4.log

---------

Signed-off-by: HDCharles <[email protected]>
1 parent 6fea888 commit 560bb9c

File tree

4 files changed

+16
-3
lines changed


tests/e2e/e2e_utils.py

Lines changed: 4 additions & 2 deletions

@@ -84,11 +84,13 @@ def data_collator(batch):
             targets="Linear",
             scheme=scheme,
             actorder=None,  # added for consistency with past testing configs
-            ignore=["lm_head"],
+            ignore=["lm_head", "re:.*mlp.gate[.].*"],
         )
     else:
         oneshot_kwargs["recipe"] = QuantizationModifier(
-            targets="Linear", scheme=scheme, ignore=["lm_head"]
+            targets="Linear",
+            scheme=scheme,
+            ignore=["lm_head", "re:.*mlp.gate[.].*"],
         )

     # Apply quantization.
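The new ignore entry uses a regex (the `re:` prefix) to skip MoE expert-router gates. The `[.]` character class matches a literal dot after `gate`, which is what distinguishes router-gate paths from expert projections like `gate_proj`. A minimal sketch of how the pattern discriminates (the module names here are illustrative Qwen3-MoE-style names, not taken from the diff):

```python
import re

# Same pattern as the diff's ignore entry, without the "re:" prefix.
# "[.]" requires a literal "." after "gate", so "gate_proj" never matches.
pattern = re.compile(r".*mlp.gate[.].*")

# Hypothetical module/parameter names for illustration:
router_gate = "model.layers.0.mlp.gate.weight"            # router gate
expert_proj = "model.layers.0.mlp.experts.0.gate_proj.weight"  # expert MLP

print(bool(pattern.match(router_gate)))  # True  -> ignored by quantization
print(bool(pattern.match(expert_proj)))  # False -> still quantized
```

The exact names matched depend on the model architecture and on how the quantization library resolves ignore patterns; this only shows the regex itself.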
tests/e2e/vLLM/configs/qwen3_fp4_nvfp4.yaml

Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
+cadence: "nightly"
+test_type: "regression"
+model: Qwen/Qwen3-30B-A3B
+scheme: NVFP4
+dataset_id: HuggingFaceH4/ultrachat_200k
+dataset_split: train_sft
+num_calibration_samples: 20
tests/e2e/vLLM/configs/qwen3_fp8_dynamic_per_tensor.yaml

Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+cadence: "nightly"
+test_type: "regression"
+model: Qwen/Qwen3-30B-A3B
+scheme: FP8_DYNAMIC

tests/e2e/vLLM/run_vllm.py

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ def parse_args():
     except json.JSONDecodeError as e:
         raise ValueError(f"Invalid JSON input: {e}")

-    if "W4A16_2of4" in scheme:
+    if scheme is not None and "W4A16_2of4" in scheme:
         # required by the kernel
         llm_kwargs["dtype"] = torch.float16
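The one-line fix above guards against `scheme` being `None`: in Python, a membership test on `None` raises `TypeError`, so the check must short-circuit first. A minimal sketch of the failure mode (the `needs_float16` helper is hypothetical, not from the diff):

```python
# Hypothetical helper mirroring the guard in run_vllm.py.
def needs_float16(scheme):
    # Old form, `"W4A16_2of4" in scheme`, raises
    # TypeError: argument of type 'NoneType' is not iterable
    # when the test config sets scheme: None.
    return scheme is not None and "W4A16_2of4" in scheme

print(needs_float16("W4A16_2of4"))  # True
print(needs_float16(None))          # False, no TypeError
print(needs_float16("FP8_DYNAMIC")) # False
```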
2424
