
Conversation

@sahibpreetsingh12
Contributor

This pull request addresses issue #3056. When using load_best_model_at_end=True in combination with PEFT adapters, the trainer previously attempted to load a pytorch_model.bin file that doesn’t exist, resulting in a FileNotFoundError.

The patch introduces a safeguard: it checks whether the model has any active adapters and, if so, skips reloading the best model and emits a clear user-facing warning. This avoids a crash at the end of training while staying compatible with adapter-based models.

The implementation closely follows the logic suggested in PR #3057.
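
For context, a minimal sketch of that safeguard (the hook point and the exact checks are illustrative, not the merged implementation):

import logging

logger = logging.getLogger(__name__)

def _load_best_model(self) -> None:
    # Illustrative guard: PEFT adapters are saved as adapter weights rather than
    # a full pytorch_model.bin, so reloading the best checkpoint the usual way
    # raises FileNotFoundError. Detect active adapters, warn, and skip instead.
    try:
        has_adapters = bool(self.model[0].auto_model.active_adapters())
    except (AttributeError, ValueError):
        has_adapters = False
    if has_adapters:
        logger.warning(
            "load_best_model_at_end=True is skipped because the model uses PEFT "
            "adapters; the last adapter weights are kept instead."
        )
        return
    super()._load_best_model()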

@tomaarsen
Member

Hello!

My apologies, but about a month back I also looked into PEFT loading again in 3e7d2fa. In short, this _load_from_checkpoint works correctly with PEFT adapters, and I think it'd be wise to reuse it where possible.

With that approach, PEFT also works with loading the best model, so there's no need to return early or emit a warning.
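
Roughly, that means the best-model loading path can simply delegate to that helper (a sketch under assumed attribute names; the actual trainer code may differ):

def _load_best_model(self) -> None:
    # best_model_checkpoint is tracked by the underlying transformers Trainer state.
    checkpoint_path = self.state.best_model_checkpoint
    if checkpoint_path is not None:
        # _load_from_checkpoint (see 3e7d2fa) already handles PEFT adapters,
        # so no early return or warning is needed here.
        self._load_from_checkpoint(checkpoint_path)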

I used this reproduction file:
# -*- coding: utf-8 -*-
"""reproduce bug

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/1FfCUN1zwn0a7jG1WnDadXlmM4EFIL7wW

# Setup
"""

# !git clone https://github.com/GTimothee/transformers.git

# Commented out IPython magic to ensure Python compatibility.
# %cd transformers

# Commented out IPython magic to ensure Python compatibility.
# !git checkout sentencetransformers_load_peft
# !git pull
# %pip install .

# Commented out IPython magic to ensure Python compatibility.
# %pip install datasets sentence-transformers wandb

"""# Prepare data

code mostly copied from https://www.philschmid.de/fine-tune-embedding-model-for-rag
"""

from datasets import load_dataset

# Load dataset from the hub
dataset = load_dataset("philschmid/finanical-rag-embedding-dataset", split="train")

# rename columns
dataset = dataset.rename_column("question", "anchor")
dataset = dataset.rename_column("context", "positive")

# Add an id column to the dataset
dataset = dataset.add_column("id", range(len(dataset)))

# split dataset into a 10% test set
dataset = dataset.train_test_split(test_size=0.1)

# save datasets to disk
dataset["train"].to_json("train_dataset.json", orient="records")
dataset["test"].to_json("test_dataset.json", orient="records")

from datasets import load_dataset, concatenate_datasets
train_dataset = load_dataset("json", data_files="train_dataset.json")['train']#.select(range(50000))
val_dataset = load_dataset("json", data_files="test_dataset.json")['train']
corpus_dataset = concatenate_datasets([train_dataset, val_dataset])

corpus = dict(
    zip(corpus_dataset["id"], corpus_dataset["positive"])
)  # Our corpus (cid => document)
queries = dict(
    zip(val_dataset["id"], val_dataset["anchor"])
)  # Our queries (qid => question)
relevant_docs = {}  # Query ID to relevant documents (qid => set([relevant_cids]))
for q_id in queries:
    relevant_docs[q_id] = [q_id]

from sentence_transformers.evaluation import (
    InformationRetrievalEvaluator
)
from sentence_transformers.util import cos_sim
ir_evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    score_functions={"cosine": cos_sim},
    batch_size=32,
    corpus_chunk_size=100,
    show_progress_bar=True
)

"""# Load model"""

from sentence_transformers import SentenceTransformer
from peft import LoraConfig, TaskType
import torch

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda" if torch.cuda.is_available() else "cpu")
peft_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model.add_adapter(peft_config)

embedding = model.encode("[bla] my name is [blub]")
print(embedding[:10])

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="training_trial",
    num_train_epochs=0.1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    per_device_eval_batch_size=2,  # must be a positive integer
    warmup_ratio=0.1,

    learning_rate=1e-3,
    weight_decay=0.001,
    fp16=True,
    lr_scheduler_type='reduce_lr_on_plateau',
    report_to=None,

    batch_sampler=BatchSamplers.NO_DUPLICATES,
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_steps=.2,
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="eval_cosine_ndcg@10",
)

"""## it fails here:"""

from sentence_transformers import SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset.select_columns(
        ["anchor", "positive"]
    ),
    loss=MultipleNegativesRankingLoss(model),
    evaluator=ir_evaluator
)

from transformers.utils.import_utils import is_peft_available
print(is_peft_available())

from packaging import version
import importlib.metadata
print(version.parse(importlib.metadata.version("peft")))

from transformers.integrations.peft import PeftAdapterMixin
print(isinstance(trainer.model, PeftAdapterMixin))
print(isinstance(trainer.model[0].auto_model, PeftAdapterMixin))

embedding = model.encode("[bla] my name is [blub]")
print("Before training:")
print(embedding[:10])

import os
os.environ["WANDB_MODE"] = "disabled"
trainer.train()

print(trainer.model[0].auto_model.active_adapters())

embedding = model.encode("[bla] my name is [blub]")
print("After training:")
print(embedding[:10])

# Save the adapter itself
model.save_pretrained("all-MiniLM-L6-v2-adapter")
# Load the adapter directly
loaded_model = SentenceTransformer("all-MiniLM-L6-v2-adapter")
embedding = loaded_model.encode("[bla] my name is [blub]")
print("After loading:")
print(embedding[:10])

"""
Before training:
[-0.05346012 -0.02178508 -0.00050329 -0.02896871 -0.09822005  0.00544653
  0.19883962  0.00645554 -0.04425742 -0.09591079]

After training:
[-0.05672589 -0.09013492 -0.0635585  -0.00599355 -0.12770522  0.0531613
  0.11739125 -0.03458377 -0.03921237 -0.06882482]

After loading:
[-0.05675213 -0.09006883 -0.0636415  -0.00605853 -0.12771909  0.05317839
  0.11733141 -0.03461097 -0.03919081 -0.06877207]
"""

I'll push the changes into this PR if you're okay with that.
Also cc @GTimothee, as this should resolve #3056 and #3057.

  • Tom Aarsen

@sahibpreetsingh12
Contributor Author

Perfect, @tomaarsen. Since this was my first PR to Sentence Transformers, thanks for sharing.
I'll keep putting in my best effort.

@tomaarsen merged commit 7e8d2cc into huggingface:master on Jul 29, 2025
7 of 9 checks passed
