⚙️ Your current environment
The output of `python collect_env.py`
### Environment Information ###
Operating System: `Linux-6.8.0-64-generic-x86_64-with-glibc2.35`
Python Version: `3.11.14 (main, Oct 31 2025, 23:04:14) [Clang 21.1.4 ]`
llm-compressor Version: `0.8.1`
compressed-tensors Version: `0.12.2`
transformers Version: `4.56.2`
torch Version: `2.8.0`
CUDA Devices: `['NVIDIA RTX PRO 6000 Blackwell Server Edition', 'NVIDIA RTX PRO 6000 Blackwell Server Edition']`
AMD Devices: `None`
🐛 Describe the bug
When quantizing ERNIE-4.5-VL-28B-A3B-Thinking (baidu/ERNIE-4.5-VL-28B-A3B-Thinking) with LLM-Compressor, the SequentialPipeline inferred by default crashes during FX tracing with:
TypeError: to() received an invalid combination of arguments - got (device=MetaDeviceAttribute, )
This happens before calibration even starts, on a very small dataset slice.
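For context, and purely as an assumption on my part, MetaDeviceAttribute looks like the proxy the HF FX tracer substitutes for a .device attribute, so .to(device=...) ends up receiving something that is not a real torch.device. A trivial, hypothetical snippet showing the same class of TypeError (not the actual call site in the pipeline):
import torch

layer = torch.nn.Linear(4, 4)

# A real device (string or torch.device) is accepted
layer.to(device="cpu")

# Any non-device object raises the same kind of error as in the trace above:
# TypeError: to() received an invalid combination of arguments - got (device=object, )
try:
    layer.to(device=object())
except TypeError as e:
    print(e)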
If I instead force pipeline="basic", oneshot runs (until it eventually hits CUDA OOM for larger calibration settings); see the snippet after the repro script below. So the model itself can be quantized, but the sequential/onloading pipeline is currently incompatible with it.
Maybe I'm just doing something wrong here, but this is generally how I've been quantizing other models to NVFP4.
🛠️ Steps to reproduce
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "baidu/ERNIE-4.5-VL-28B-A3B-Thinking"

# Load tokenizer
tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)

# Load model with bf16 dtype to avoid unnecessary FP32 tensors
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Tiny dataset slice to make the repro cheap
raw = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:2]")

def to_text(ex):
    return {"text": tok.apply_chat_template(ex["messages"], tokenize=False)}

raw = raw.map(to_text)
ds = raw.map(
    lambda s: tok(s["text"], truncation=True, max_length=256),
    remove_columns=raw.column_names,
)

# Minimal NVFP4 recipe
recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=["lm_head"],
)

# Run oneshot quantization with the default (inferred) pipeline
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    num_calibration_samples=2,
    max_seq_length=256,
    trust_remote_code_model=True,
)

Running this should produce the same error. The number of samples and seq length are extremely low just for troubleshooting speed.
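For reference, a minimal sketch of the workaround, assuming pipeline is passed as a keyword argument to oneshot (and reusing model, ds, and recipe from the script above); this gets past tracing but eventually hits CUDA OOM with larger calibration settings:
# Workaround: force the basic pipeline instead of the inferred sequential one
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    num_calibration_samples=2,
    max_seq_length=256,
    trust_remote_code_model=True,
    pipeline="basic",
)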