Replies: 22 comments
-
Oh yep great idea!

```python
chat_template = """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.
### Instruction:
{INPUT}
### Response:
{OUTPUT}"""

from unsloth import apply_chat_template
dataset = apply_chat_template(
    dataset,
    tokenizer = tokenizer,
    chat_template = chat_template,
    # default_system_message = "You are a helpful assistant", << [OPTIONAL]
)
```

Then use:

```python
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    ...
    args = TrainingArguments(
        ...
    ),
)

from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(trainer)
```

But in general, the function accepts an instruction and a response text field:

```python
def train_on_responses_only(
    trainer,
    instruction_part = None,  # <<< eg "Instruction:\n"
    response_part = None,     # <<< eg "Response:\n"
):
```
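For the Alpaca-style template above, passing the markers explicitly might look like this (a sketch; the marker strings are taken from the `chat_template` at the top of this comment):

```python
from unsloth.chat_templates import train_on_responses_only

# Mask everything except what follows each "### Response:" marker,
# using the markers from the chat_template defined above.
trainer = train_on_responses_only(
    trainer,
    instruction_part = "### Instruction:\n",
    response_part = "### Response:\n",
)
```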
-
How does `train_on_responses_only` work when potentially multiple instruction headers could be present, e.g. a response to a function call ("from: ipython")? Would something like passing several instruction markers be necessary?
-
What does `train_on_responses_only` actually do?
-
[High-level idea] By setting up the trainer with `train_on_responses_only`, only the response tokens contribute to the loss; the instruction tokens are masked out. [Code details] The masking works by setting the labels of the non-response tokens to -100, which the cross-entropy loss ignores.
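For illustration, a minimal sketch of the resulting labels (the token IDs below are made up):

```python
# Illustrative only: the token IDs are made up. Labels of -100 are ignored
# by the cross-entropy loss, so only the response tokens drive training.
input_ids = [101, 2023, 2003, 1996, 7899, 999, 3437, 2182, 102]
#           |-------- instruction tokens -----| |--- response ---|
labels    = [-100, -100, -100, -100, -100, -100, 3437, 2182, 102]
```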
-
A quick question out of curiosity: for all instruct models, should `train_on_responses_only` lead to better performance? How much better? Any relevant papers, blogs, or studies on this?
-
Sorry, I'm a beginner. Why doesn't `train_on_responses_only` have a `system_part` option?
-
Hi, have you gotten any answers to these questions? I'm also very curious about them. Could you share some of your findings?
-
@xywen97 Here is the research paper: Instruction Tuning With Loss Over Instructions
-
In the abstract of the paper, they argue that computing the loss over the instructions as well, rather than over the responses only, can improve performance.
Isn't that the exact opposite of what `train_on_responses_only` does?
-
Hi, did you get an answer on this? I am also wondering why no `system_prompt` is passed to the function.
-
@DavyThan I have been helping with a PR related to this and have played around with it a bit. Here is what I think: when you apply `train_on_responses_only`, everything before the first response marker gets masked, which already covers the system prompt, so a separate `system_part` option isn't strictly needed.
-
I have created a tiny HF space for everyone to see how `train_on_responses_only` masks the inputs. Examples:
-
@patel-zeel Thank you for sharing! I get the idea that it masks the inputs in chat format. What I am confused about: when I am fine-tuning a Phi-4 model (https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb), does it mask the system prompt at the very beginning of every batch by default? I have been testing it for a couple of days, and my loss drops to almost 0 within fewer than 100 steps, out of the 6000 steps I have to go. What I am concerned about is that the model can somehow access the answer, leading to the abrupt loss decay.
-
Hi @machlovi! I am glad you found it helpful. Yes, it masks the system prompt by default due to this rule: mask everything before the first occurrence of the `response_part`.
I have never fine-tuned an LLM, but I have learned that not using a triangular (causal) mask in attention can cause such issues.
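For illustration, here is a rough sketch of that rule; `mask_before_first_response` is a hypothetical helper, not Unsloth's actual implementation:

```python
def mask_before_first_response(input_ids, response_ids):
    """Hypothetical sketch of the rule: set labels to -100 for every token
    up to and including the first occurrence of the response marker."""
    labels = list(input_ids)
    n = len(response_ids)
    for i in range(len(input_ids) - n + 1):
        if input_ids[i : i + n] == response_ids:
            # The system prompt and user turn sit before the marker,
            # so they get masked out along with the marker itself.
            labels[: i + n] = [-100] * (i + n)
            break
    return labels
```

Multi-turn conversations additionally mask each later instruction span, but the first-occurrence rule above is what hides the system prompt.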
-
I've recently opened a PR (unslothai/unsloth-zoo#75) that addresses exactly this issue. The updated `train_on_responses_only` accepts a list of instruction markers. So, if this PR is merged, you will be able to provide:

```python
instruction_part = [
    "<|start_header_id|>user<|end_header_id|>\n\n",
    "<|start_header_id|>system<|end_header_id|>\n\n",
    "<|start_header_id|>ipython<|end_header_id|>\n\n",
]
```

This ensures each instruction pattern is matched and handled correctly during training.
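A full call might then look like the following sketch; the assistant header used as `response_part` is an assumption (a typical Llama-3-style marker), not taken from the PR:

```python
from unsloth.chat_templates import train_on_responses_only

# Sketch of post-PR usage: every listed header is masked, so system,
# user, and ipython turns are all excluded from the loss.
trainer = train_on_responses_only(
    trainer,
    instruction_part = [
        "<|start_header_id|>user<|end_header_id|>\n\n",
        "<|start_header_id|>system<|end_header_id|>\n\n",
        "<|start_header_id|>ipython<|end_header_id|>\n\n",
    ],
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
```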
-
When I select the different versions/chat formats, it does not change the train-on-responses part. Or maybe I am missing something here.
-
Hi everyone, is it possible to fine-tune a model with train_on_responses_only using a pre-tokenized training dataset? I am asking because I am fine-tuning a Mistral model on CompletionOnly, but in my case the training dataset is already tokenized (with the corresponding 'labels': [-100, -100, ...] and so on). The point is that when using Unsloth, it ignores my labels and generates its own ('labels': [1, 3, 2569, 72136, 9371, ...]), skipping the masking needed to proceed as on_responses_only.
So, is it possible to fine-tune the model on CompletionOnly while providing a pre-tokenized training dataset?
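For illustration, one possible workaround (an assumption, not a documented Unsloth feature) is to skip SFTTrainer's re-tokenization entirely and hand the pre-masked labels to a plain `transformers` Trainer:

```python
from transformers import Trainer, TrainingArguments, DataCollatorForSeq2Seq

# Assumption: `dataset` already contains input_ids / attention_mask / labels,
# with -100 on every non-response token, so no further masking is needed.
trainer = Trainer(
    model = model,
    args = TrainingArguments(output_dir = "outputs", per_device_train_batch_size = 2),
    train_dataset = dataset,
    # Pads inputs normally and pads labels with -100,
    # so the masking survives batching.
    data_collator = DataCollatorForSeq2Seq(tokenizer, label_pad_token_id = -100),
)
```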
-
Is it good practice to use `train_on_responses_only` when calculating validation loss, in order to obtain a loss measurement more comparable to the training loss?
-
@yorozcogonzalez I saw that recent changes in the codebase might be relevant to this, so it may be worth retrying on the latest version.
@AliBakly Following the general rules of ML, people generally use a single setting across the train, validation, and test datasets, so what you suggest makes sense. Someone can correct me if there is a different practice in the LLM world.
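If you do want matched masking across splits, one approach might be the following sketch; it assumes `train_on_responses_only` also processes the trainer's `eval_dataset` (worth verifying in your Unsloth version), and `val_dataset` is a hypothetical held-out split:

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth.chat_templates import train_on_responses_only

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = val_dataset,  # hypothetical held-out split
    args = TrainingArguments(
        output_dir = "outputs",
        eval_strategy = "steps",  # `evaluation_strategy` on older transformers
        eval_steps = 50,
    ),
)
# Assumption: this call re-labels the eval dataset too, so validation loss
# is computed over response tokens only, matching the training loss.
trainer = train_on_responses_only(trainer)
```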
-
Thanks, @patel-zeel! I’ll give it a try, though I’ve already updated my script to avoid using pre-tokenized data. Regarding your question: “For all instruct models, should train_on_responses_only lead to better performance? How much better? Any relevant papers, blogs, or studies on this?”
-
Hi @patel-zeel, how is `train_on_responses_only` different from TRL's `DataCollatorForCompletionOnlyLM`?
-
Hi @imomayiz, thanks for this question. I was also wondering about it but never got a chance to look into it. Now that I look into it via this colab, `DataCollatorForCompletionOnlyLM` seems to do the job even in conversation mode, where the user and assistant interact with each other more than once. The colab example shows the same behavior as Unsloth (see the same example in my HF space). So, my best guess is that Unsloth's `train_on_responses_only` performs essentially the same masking.
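For comparison, a sketch of the TRL collator route; the marker strings are assumptions for an Alpaca-style template rather than values from this thread:

```python
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# TRL's collator masks at batch-collation time rather than when the
# dataset is preprocessed, but the effect is the same: instruction
# tokens get label -100 and only response tokens train.
collator = DataCollatorForCompletionOnlyLM(
    response_template = "### Response:\n",
    instruction_template = "### Instruction:\n",
    tokenizer = tokenizer,
)
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    data_collator = collator,
)
```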
-
Can you write up some documentation on how to properly use the new train_on_responses_only functionality? It doesn't seem to work out of the box with either chat templates or any of the manual formatting (e.g. Alpaca) examples.