Fix: prevent loading best model when PEFT adapters are active (#3056) #3470
Conversation
Hello! My apologies, but about a month back I also looked into PEFT loading again in 3e7d2fa. In short, with that approach it's also possible to use PEFT together with loading the best model, so there's no need to return early or give a warning.
I used this reproduction file:
# -*- coding: utf-8 -*-
"""reproduce bug
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1FfCUN1zwn0a7jG1WnDadXlmM4EFIL7wW
# Setup
"""
# !git clone https://github.com/GTimothee/transformers.git
# Commented out IPython magic to ensure Python compatibility.
# %cd transformers
# Commented out IPython magic to ensure Python compatibility.
# !git checkout sentencetransformers_load_peft
# !git pull
# %pip install .
# Commented out IPython magic to ensure Python compatibility.
# %pip install datasets sentence-transformers wandb
"""# Prepare data
code mostly copied from https://www.philschmid.de/fine-tune-embedding-model-for-rag
"""
from datasets import load_dataset
# Load dataset from the hub
dataset = load_dataset("philschmid/finanical-rag-embedding-dataset", split="train")
# rename columns
dataset = dataset.rename_column("question", "anchor")
dataset = dataset.rename_column("context", "positive")
# Add an id column to the dataset
dataset = dataset.add_column("id", range(len(dataset)))
# split dataset into a 10% test set
dataset = dataset.train_test_split(test_size=0.1)
# save datasets to disk
dataset["train"].to_json("train_dataset.json", orient="records")
dataset["test"].to_json("test_dataset.json", orient="records")
from datasets import load_dataset, concatenate_datasets
train_dataset = load_dataset("json", data_files="train_dataset.json")['train']#.select(range(50000))
val_dataset = load_dataset("json", data_files="test_dataset.json")['train']
corpus_dataset = concatenate_datasets([train_dataset, val_dataset])
corpus = dict(
    zip(corpus_dataset["id"], corpus_dataset["positive"])
)  # Our corpus (cid => document)
queries = dict(
    zip(val_dataset["id"], val_dataset["anchor"])
)  # Our queries (qid => question)
relevant_docs = {}  # Query ID to relevant documents (qid => set([relevant_cids]))
for q_id in queries:
    relevant_docs[q_id] = [q_id]
from sentence_transformers.evaluation import InformationRetrievalEvaluator
from sentence_transformers.util import cos_sim
ir_evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    score_functions={"cosine": cos_sim},
    batch_size=32,
    corpus_chunk_size=100,
    show_progress_bar=True,
)
"""# Load model"""
from sentence_transformers import SentenceTransformer
from peft import LoraConfig, TaskType
import torch
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda" if torch.cuda.is_available() else "cpu")
peft_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model.add_adapter(peft_config)
embedding = model.encode("[bla] my name is [blub]")
print(embedding[:10])
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers
args = SentenceTransformerTrainingArguments(
    output_dir="training_trial",
    num_train_epochs=0.1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    per_device_eval_batch_size=2,
    warmup_ratio=0.1,
    learning_rate=1e-3,
    weight_decay=0.001,
    fp16=True,
    lr_scheduler_type="reduce_lr_on_plateau",
    report_to=None,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_steps=0.2,
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="eval_cosine_ndcg@10",
)
"""## it fails here:"""
from sentence_transformers import SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset.select_columns(["anchor", "positive"]),
    loss=MultipleNegativesRankingLoss(model),
    evaluator=ir_evaluator,
)
from transformers.utils.import_utils import is_peft_available
is_peft_available()
from packaging import version
import importlib
version.parse(importlib.metadata.version("peft"))
from transformers.integrations.peft import PeftAdapterMixin
print(isinstance(trainer.model, PeftAdapterMixin))
isinstance(trainer.model[0].auto_model, PeftAdapterMixin)
embedding = model.encode("[bla] my name is [blub]")
print("Before training:")
print(embedding[:10])
import os
os.environ["WANDB_MODE"] = "disabled"
trainer.train()
print(trainer.model[0].auto_model.active_adapters())
embedding = model.encode("[bla] my name is [blub]")
print("After training:")
print(embedding[:10])
# Save the adapter itself
model.save_pretrained("all-MiniLM-L6-v2-adapter")
# Load the adapter directly
loaded_model = SentenceTransformer("all-MiniLM-L6-v2-adapter")
embedding = loaded_model.encode("[bla] my name is [blub]")
print("After loading:")
print(embedding[:10])
"""
Before training:
[-0.05346012 -0.02178508 -0.00050329 -0.02896871 -0.09822005 0.00544653
0.19883962 0.00645554 -0.04425742 -0.09591079]
After training:
[-0.05672589 -0.09013492 -0.0635585 -0.00599355 -0.12770522 0.0531613
0.11739125 -0.03458377 -0.03921237 -0.06882482]
After loading:
[-0.05675213 -0.09006883 -0.0636415 -0.00605853 -0.12771909 0.05317839
0.11733141 -0.03461097 -0.03919081 -0.06877207]
"""I'll push the changes into this PR if you're okay with that.
Perfect @tomaarsen, please go ahead, since this was my first PR to Sentence Transformers.
This pull request addresses issue #3056. When using load_best_model_at_end=True in combination with PEFT adapters, the trainer previously attempted to load a pytorch_model.bin file that doesn’t exist, resulting in a FileNotFoundError.
The patch introduces a safeguard: it checks whether the model has any active adapters and, if so, skips reloading the best model and issues a clear user-facing warning. This avoids crashes during training while retaining compatibility with adapter-based models.
The implementation closely follows the logic suggested in PR #3057.
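For readers who want a feel for the change, here is a minimal sketch of the kind of safeguard described above. It is not the actual diff in this PR: the subclass, the use of the private `_load_best_model` hook, and the warning text are assumptions; only the adapter check mirrors what the reproduction script already uses (`trainer.model[0].auto_model.active_adapters()`).

```python
import logging

from sentence_transformers import SentenceTransformerTrainer
from transformers.integrations.peft import PeftAdapterMixin

logger = logging.getLogger(__name__)


class AdapterAwareTrainer(SentenceTransformerTrainer):
    """Hypothetical subclass illustrating the safeguard; not the merged implementation."""

    def _load_best_model(self) -> None:
        # With active PEFT adapters the best checkpoint holds adapter weights
        # rather than a full pytorch_model.bin, so skip the default reload and
        # warn the user instead of crashing with a FileNotFoundError.
        auto_model = self.model[0].auto_model
        if isinstance(auto_model, PeftAdapterMixin) and auto_model.active_adapters():
            logger.warning(
                "load_best_model_at_end=True was set, but PEFT adapters are active; "
                "skipping loading the best model and keeping the current adapter weights."
            )
            return
        super()._load_best_model()
```

The merged change may well behave differently, for example by actually restoring the adapter weights from the best checkpoint as described in the comment above, but the shape of the adapter check is the same.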