Documentation for train_on_responses_only? #2828

rwl4 · 2024-07-27T18:56:39Z

rwl4
Jul 27, 2024

Can you write up some documentation how properly to use the new train_on_responses_only functionality? It doesn't seem to work out of the box with either chat templates or any of the manual formatting (e.g. Alpaca) examples.

danielhanchen · 2024-07-31T03:24:00Z

danielhanchen
Jul 31, 2024
Maintainer

Oh yep great idea!
https://github.com/unslothai/unsloth/wiki#train-on-completions--responses-only-do-not-train-on-inputs shows approx how to call it, but in the Ollama notebook https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing, you first need to use apply_chat_template which will make it work ie

chat_template = """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.

### Instruction:
{INPUT}

### Response:
{OUTPUT}"""

from unsloth import apply_chat_template
dataset = apply_chat_template(
    dataset,
    tokenizer = tokenizer,
    chat_template = chat_template,
    # default_system_message = "You are a helpful assistant", << [OPTIONAL]
)

Then use

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    ...
    args = TrainingArguments(
        ...
    ),
)
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(trainer)

But in general, the function accepts an instruction and a response text field:

def train_on_responses_only(
    trainer,
    instruction_part = None, <<< eg "Instruction:\n"
    response_part    = None, <<< eg "Response:\n"
):

0 replies

William-Wildridge · 2024-09-04T15:14:09Z

William-Wildridge
Sep 4, 2024

How does

instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n", 
response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",

work when potentially multiple instruction headers could be presented, eg a response to a function call "from: ipython"?

Would something like this be necessary?

instruction_part = "<|start_header_id|>user|ipython<|end_header_id|>\n\n",

0 replies

irisma00 · 2025-02-03T18:37:39Z

irisma00
Feb 3, 2025

what does train_on_responses_only do exactly? Could you explain a bit more? Thanks! @danielhanchen

0 replies

CSJDeveloper · 2025-02-04T05:24:44Z

CSJDeveloper
Feb 4, 2025

what does train_on_responses_only do exactly? Could you explain a bit more? Thanks! @danielhanchen

[High-level idea]
For the decoder-only model, the loss is computed based on the next-token prediction. Therefore, all input tokens will be involved in this loss computation.

By setting the trainer with the train_on_responses_only, only the tokens in the assistant, i.e., target response, part of the input, will be involved in the loss computation.

[Code details]
You can check the source code for details, but perhaps you do not need to. https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_zoo/dataset_utils.py#L174

0 replies

patel-zeel · 2025-03-02T13:01:44Z

patel-zeel
Mar 2, 2025

A quick question out of curiosity: For all instruct models, train_on_responses_only should result in better performance? How much better? Can someone point to some papers/blogs/studies on this?

0 replies

e950280 · 2025-03-07T09:13:37Z

e950280
Mar 7, 2025

what does train_on_responses_only do exactly? Could you explain a bit more? Thanks! @danielhanchen

[High-level idea] For the decoder-only model, the loss is computed based on the next-token prediction. Therefore, all input tokens will be involved in this loss computation.

By setting the trainer with the train_on_responses_only, only the tokens in the assistant, i.e., target response, part of the input, will be involved in the loss computation.

[Code details] You can check the source code for details, but perhaps you do not need to. https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_zoo/dataset_utils.py#L174

Sorry, I'm a beginner.

Why doesn't 'train_on_responses_only' have a 'system_part' option?
Is it because the system_prompt doesn't contribute to the loss in the first place?

0 replies

xywen97 · 2025-03-10T08:19:14Z

xywen97
Mar 10, 2025

A quick question out of curiosity: For all instruct models, train_on_responses_only should result in better performance? How much better? Can someone point to some papers/blogs/studies on this?

Hi, have you gotten some answers on these questions? I'm also very curious about it. Could you share some of your findings?

0 replies

Serzhanov · 2025-03-10T08:58:56Z

Serzhanov
Mar 10, 2025

@xywen97 Here is the research paper : Instruction Tuning With Loss Over Instructions

0 replies

DavyThan · 2025-03-11T10:34:47Z

DavyThan
Mar 11, 2025

@xywen97 Here is the research paper : Instruction Tuning With Loss Over Instructions

In the abstract of the paper they say:

In this work, we propose a simple yet effective method, INSTRUCTION MODELLING (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part.

Isn't that the exact opposite of what train_on_responses_only tries to do?

0 replies

DavyThan · 2025-03-11T10:45:22Z

DavyThan
Mar 11, 2025

what does train_on_responses_only do exactly? Could you explain a bit more? Thanks! @danielhanchen

[High-level idea] For the decoder-only model, the loss is computed based on the next-token prediction. Therefore, all input tokens will be involved in this loss computation.
By setting the trainer with the train_on_responses_only, only the tokens in the assistant, i.e., target response, part of the input, will be involved in the loss computation.
[Code details] You can check the source code for details, but perhaps you do not need to. https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_zoo/dataset_utils.py#L174

Sorry, I'm a beginner.

Why doesn't 'train_on_responses_only' have a 'system_part' option? Is it because the system_prompt doesn't contribute to the loss in the first place?

Hi, did you get an answer on this? I am also wondering why no system_prompt is passed to the function.

0 replies

patel-zeel · 2025-03-11T13:12:53Z

patel-zeel
Mar 11, 2025

@DavyThan I have been helping with a PR related to this and played a bit around this. Here is what I think:

When you apply train_on_responses_only function, it will start searching the input_ids from left to right for the first occurrence of the response_part and mask everything before it. Then, it will look for the next occurrence of instruction_part and mask everything between the instruction_part and the subsequent response_part (both inclusive, i.e. the parts themselves will also get masked) or the end of the text.

Our goal is generally to mask everything except the response_part, and thus, if the prompt has a system part as well, we should provide that in the instruction_part. If we mistakenly pass the user part in those cases, system part labels will not be masked. Edit: But system part generally occurs only at the start of the prompt and is not repeated in a conversation, so in those cases we must pass user part in the instruction_part argument. The system part present at the start of the prompt gets masked anyway because we mask everything before the first occurrence of the response_part.

0 replies

patel-zeel · 2025-03-11T21:33:17Z

patel-zeel
Mar 11, 2025

I have created a tiny HF space for everyone to see how train_on_response_only will work on their models and inputs. You can also copy and share the URL populated in the end of the app to share the snippets with each other. Your suggestions/contributions are welcome!

Examples:

0 replies

machlovi · 2025-03-12T11:44:03Z

machlovi
Mar 12, 2025

I have created a tiny HF space for everyone to see how train_on_response_only will work on their models and inputs. You can also copy and share the URL populated in the end of the app to share the snippets with each other. Your suggestions/contributions are welcome!

Examples:

https://tinyurl.com/Qwen2-VL-7B-Instruct-Multi

https://tinyurl.com/phi-4-unsloth-bnb-4bit

https://tinyurl.com/Llama-32-1B-Instruct

@patel-zeel Thank you for sharing!I got the idea that it masks the inputs in chat format. What I am confused about is when I am finetuning a Phi4 https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb model, does it mask the system promptly at the very beginning of every batch by default?
I am using a chat template {system, user, assistant) so do we need to mask only the user?

I have been testing it for a couple of days, and my loss goes to almost 0 in a few steps<100, given I have 6000 steps to go. What I am concerned about is that some of the models can access the answer, leading to abrupt loss decay.

0 replies

patel-zeel · 2025-03-12T11:58:09Z

patel-zeel
Mar 12, 2025

@patel-zeel Thank you for sharing!I got the idea that it masks the inputs in chat format. What I am confused about is when I am finetuning a Phi4 https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb model, does it mask the system promptly at the very beginning of every batch by default?
I am using a chat template {system, user, assistant) so do we need to mask only the user?

Hi @machlovi! I am glad you found it helpful. Yes, it masks the system prompt by default due to this rule: Mask everything before the first occurrence of the response_part. See here: https://tinyurl.com/unsloth-twro-phi-4. You can play around it by changing the message to ensure it is doing what you expect it to do.

I have been testing it for a couple of days, and my loss goes to almost 0 in a few steps<100, given I have 6000 steps to go. What I am concerned about is that some of the models can access the answer, leading to abrupt loss decay.

I have never fine-tuned an LLM, but I have learned that not using a triangular (causal) mask in attention can cause such issues.

0 replies

marcelodiaz558 · 2025-03-14T23:15:48Z

marcelodiaz558
Mar 14, 2025

How does
instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n", 
response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
work when potentially multiple instruction headers could be presented, eg a response to a function call "from: ipython"?

Would something like this be necessary?
instruction_part = "<|start_header_id|>user|ipython<|end_header_id|>\n\n", 

I've recently opened a PR (unslothai/unsloth-zoo#75) that addresses exactly this issue. The updated train_on_responses_only function now supports providing a list of multiple instruction patterns, allowing it to handle scenarios such as function calls or additional roles like ipython.

So, if this PR is merged, you will be able to provide an instruction_part like this to train_on_responses_only:

instruction_part = [
    "<|start_header_id|>user<|end_header_id|>\n\n",
    "<|start_header_id|>system<|end_header_id|>\n\n",
    "<|start_header_id|>ipython<|end_header_id|>\n\n"
]

This ensures each instruction pattern is matched and handled correctly during training.

0 replies

machlovi · 2025-03-19T14:01:14Z

machlovi
Mar 19, 2025

@patel-zeel Thank you for sharing!I got the idea that it masks the inputs in chat format. What I am confused about is when I am finetuning a Phi4 https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb model, does it mask the system promptly at the very beginning of every batch by default?
I am using a chat template {system, user, assistant) so do we need to mask only the user?

Hi @machlovi! I am glad you found it helpful. Yes, it masks the system prompt by default due to this rule: Mask everything before the first occurrence of the response_part. See here: https://tinyurl.com/unsloth-twro-phi-4. You can play around it by changing the message to ensure it is doing what you expect it to do.

I have been testing it for a couple of days, and my loss goes to almost 0 in a few steps<100, given I have 6000 steps to go. What I am concerned about is that some of the models can access the answer, leading to abrupt loss decay.

I have never fine-tuned an LLM, but I have learned that not using a triangular (causal) mask in attention can cause such issues.

When I select the different versions/chat formats, it does not change the train_on_response_part.
Like in case of chatml, it needs to give
instruction_part = "<|im_start|>user\n",
response_part = "<|im_start|>assistant\n",

or ,may be I am missing something here

0 replies

yorozcogonzalez · 2025-03-19T18:24:12Z

yorozcogonzalez
Mar 19, 2025

Hi everyone,

Is it possible to finetune a model on train_on_responses_only using the pre-tokenized training dataset?

I am asking because I am finetuning a mistral model on CompletionOnly, but in my case I am using the training dataset already tokenized (with the corresponding 'labels': [-100, -100, ...] and so on). The point is that when using unsloth, it ignores my labels and generates its own ('labels': [1, 3, 2569, 72136, 9371, ...]) skipping the masking to proceed as on_responses_only.
I tried using:

formatting_func = None #To disables ChatML conversion
I can not use:
trainer = train_on_responses_only(
trainer,
instruction_part = ,
response_part =
)
because it requires the instruction_part and response_part and my data is already tokenized.

So, is it possible to finetune the model on CompletionOnly providing the pre-tokenized training dataset?

0 replies

AliBakly · 2025-03-27T08:00:41Z

AliBakly
Mar 27, 2025

Is it good practice to use train_on_responses_only when calculating validation loss in order to obtain a loss measurement more comparable to the training loss?

0 replies

patel-zeel · 2025-03-27T10:55:05Z

patel-zeel
Mar 27, 2025

@yorozcogonzalez I saw that recent changes in the train_on_response_only function first check if labels already exist and copy from old_labels instead of input_ids directly. Could you please update the packages and see if your code does what you expect it to do?

@AliBakly Following the general rules of ML, people generally use a single setting over the train, validation, and test datasets, so what you suggest makes sense. Someone can correct me if there is a different practice in the LLM world.

0 replies

yorozcogonzalez · 2025-03-31T14:33:10Z

yorozcogonzalez
Mar 31, 2025

@yorozcogonzalez I saw that recent changes in the train_on_response_only function first check if labels already exist and copy from old_labels instead of input_ids directly. Could you please update the packages and see if your code does what you expect it to do?

@AliBakly Following the general rules of ML, people generally use a single setting over the train, validation, and test datasets, so what you suggest makes sense. Someone can correct me if there is a different practice in the LLM world.

Thanks, @patel-zeel! I’ll give it a try, though I’ve already updated my script to avoid using pre-tokenized data.

Regarding your question: “For all instruct models, should train_on_responses_only lead to better performance? How much better? Any relevant papers, blogs, or studies on this?”
I’ve fine-tuned some Instruct models to generate JSON objects by extracting information from context embedded in the prompt. In these cases, using CompletionOnly (or train_on_responses_only) significantly improves performance—about a 10% boost in Rouge and BLEU scores.

0 replies

imomayiz · 2025-04-09T17:55:29Z

imomayiz
Apr 9, 2025

Hi @patel-zeel, how is train_on_response_only different from DataCollatorForCompletionOnlyLM?

0 replies

patel-zeel · 2025-04-09T19:58:56Z

patel-zeel
Apr 9, 2025

Hi @imomayiz, thanks for this question. I was also wondering about it but never got a chance to look into it. Now that I look into it via this colab, it seems to do the job even in the conversation mode where user and assistant interact with each other more than once. The colab example has the same behavior with Unsloth (see the same example in my hf space).

So, my best guess is that Unsloth's train_on_response_only is custom-patched to ensure compatibility with Unsloth and to enable some features (e.g. VLM support) of Unsloth and its sister libraries like unsloth-zoo. If someone is working with pure HF on simple applications, DataCollatorForCompletionOnlyLM should do the job.

0 replies

Uh oh!

Documentation for train_on_responses_only? #2828

Uh oh!

Replies: 22 comments

Uh oh!

danielhanchen Jul 31, 2024 Maintainer

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danielhanchen
Jul 31, 2024
Maintainer