GitHub - aswathyajith/lora_mem

How Memorization Paves the Path for Generalization during Finetuning

Abstract

Large Language Models (LLMs) pre-trained on generic corpora can be adapted to specific domains and tasks by fine-tuning their weights. Due to compute and memory limitations, parameter-efficient methods like LoRA, which update a small number of parameters instead of all model parameters, have become a popular alternative to full fine-tuning. While full fine-tuning and LoRA fine-tuning can achieve similar validation loss on fine-tuning data distributions, there has been limited work exploring differences between their generative behaviors. In this work, we find that even when full fine-tuning and LoRA fine-tuning exhibit similar performance on the validation set, full fine-tuning "memorizes" more of the data it was fine-tuned on than LoRA does. Previous work has shown that full fine-tuning can generalize better to out-of-distribution data than LoRA. While it is generally believed that memorization of training data leads to poor generalization to out-of-domain distributions, we find empirical evidence that training data memorization during fine-tuning is crucial for generalization to out-of-distribution data.

Output Visualization:

To run the Streamlit app comparing the outputs of LoRA and full fine-tuning, make sure pandas and streamlit are installed:

`pip install streamlit pandas`

Run the Streamlit server from the project root dir:

`streamlit run demo/app.py --server.port 8501`

[OPTIONAL] Set up port forwarding on the local machine (if the server is running on a remote machine):

`ssh -L 8501:localhost:8501 user@remote`

Steps to Reproduce Results

1. Create environment and cd to this directory: `conda env create -f environment.yml`, then `conda activate lora_mem`
2. Fine-tune models (with LoRA or full param finetuning): `./src/slurm/submit_finetuning.sh`
   To check if finetuning is complete, run `python src/utils/check_status.py` with the appropriate arguments.
3. Hyperparameter search (find the best learning rate for the finetuned models): `python src/finetuning/select_optimal_hp.py`
4. Compute next token probabilities, actual token position, top k predictions, and token frequencies: `./src/scripts/tkn_freq_probs.sh`
5. Generate postprocess config: `python src/generate_postprocess_config.py`
6. Postprocess outputs of models: `./src/scripts/postprocess_outputs.sh`
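
The token-level quantities named in the steps above (next token probability, position of the actual token, top k predictions, and token frequencies) can be illustrated with a small standalone sketch. This is not the code in `tkn_freq_probs.sh` or the scripts it calls. The function names `softmax`, `token_stats`, and `token_frequencies` are hypothetical, and it assumes that "actual token position" means the rank of the ground-truth token when the vocabulary is sorted by predicted probability.

```python
import math
from collections import Counter


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_stats(logits, actual_id, k=3):
    """Hypothetical helper: per-position statistics for one prediction step.

    logits    -- model scores over the vocabulary for the next token
    actual_id -- vocabulary id of the token that actually comes next
    k         -- number of top predictions to keep
    """
    probs = softmax(logits)
    # Sort vocabulary ids by descending probability (ties: lower id first).
    ranked = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    return {
        "actual_prob": probs[actual_id],            # next token probability
        "actual_rank": ranked.index(actual_id) + 1,  # 1 = model's top choice
        "top_k": ranked[:k],                        # ids of the k best guesses
    }


def token_frequencies(token_ids):
    """Count how often each token id occurs in a corpus of token ids."""
    return Counter(token_ids)


# Toy example: a 4-token vocabulary where the true next token has id 2.
stats = token_stats([2.0, 0.5, 1.0, -1.0], actual_id=2, k=2)
freqs = token_frequencies([2, 0, 2, 3])
```

In this toy example the model's top choice is token 0, so the actual token (id 2) has rank 2. In practice the logits would come from the fine-tuned model rather than a hand-written list.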

