Description
System Info
...
Who can help?
import torch
from transformers import Mistral3ForConditionalGeneration, AutoTokenizer

model_id = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load model
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()

# Chat template input
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Hello",
            }
        ],
    }
]

# Tokenize
tokenized = tokenizer.apply_chat_template(messages, return_dict=True)
input_ids = torch.tensor(tokenized["input_ids"], device="cuda").unsqueeze(0)
attention_mask = torch.tensor(tokenized["attention_mask"], device="cuda").unsqueeze(0)

# Generate
with torch.inference_mode():
    output = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=128,
    )[0]

# Decode
decoded_output = tokenizer.decode(output, skip_special_tokens=True)
print(decoded_output)

The code above is broken on current main.
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run the code above.
Expected behavior
The model should generate coherent, non-gibberish output.
Note: Only happens on v5
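
Since the problem is reported only on v5, it may help to confirm which transformers version is installed before running the reproduction. The following is a minimal sketch using only the standard library. The helper names `major_version` and `is_affected` are hypothetical, and it assumes "v5" means any release whose major version number is 5.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '5.0.0.dev0'."""
    return int(version.split(".")[0])


def is_affected(version: str) -> bool:
    # Assumption: "v5" in this report means any release with major version 5.
    return major_version(version) == 5


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "in the reported range" if is_affected(installed) else "outside the reported range"
        print(f"transformers {installed}: {status}")
```

Running the reproduction once in an environment where this reports a 4.x version and once where it reports 5.x should make the regression easy to compare.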