# Model Training Scripts

## Folder Structure
- `configs` contains the DeepSpeed and Accelerate configurations (modifiable as per the desired system)
- `lora` contains the code for training a model with LoRA
- `selekt` contains the code for training a model with the SeleKT algorithm explained in our paper
- `sft` contains the code for training a model with full supervised fine-tuning (SFT)

## Usage
### Preparing the dataset
### Training with SFT
- Modify or replace the `general_acc.yaml` file as per the desired system configuration
- Set `zero_optimization.stage` to `3` and `overlap_comm` to `false` in `ds_config.json` for better memory optimization
- Add the respective variables (`MODEL_PATH`, `TRAIN_DATA`, `OUTPUT_DIR`, etc.) in the `run.sh` script and run (a reference invocation is sketched just after this block):
```bash
bash ./sft/run.sh
```
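For reference, the earlier standalone layout of this repository launched SFT directly with `deepspeed`; `sft/run.sh` is expected to wrap a similar invocation, but the script path and flags below are assumptions carried over from that layout, so adjust them to your checkout and model:
```bash
# Sketch of a direct launch, adapted from the earlier standalone sft.py command.
# MODEL_PATH, TRAIN_DATA and OUTPUT_DIR correspond to the variables set in run.sh.
deepspeed sft.py \
    --model_name_or_path "$MODEL_PATH" \
    --train_data_path "$TRAIN_DATA" \
    --output_dir "$OUTPUT_DIR" \
    --num_train_epochs 3 \
    --model_max_length 8192 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --save_strategy "epoch" \
    --save_total_limit 25 \
    --learning_rate 1e-5 \
    --warmup_ratio 0.1 \
    --logging_steps 5 \
    --report_to "wandb" \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --run_name "Run name for logs"
```
To apply the loss only on responses when training on conversational data, the earlier `sft.py` carried commented-out lines that build a `DataCollatorForCompletionOnlyLM` with the `#RESPONSE\n` template and pass it to the `SFTTrainer` as `data_collator`; check the script for those lines before enabling them with a conversational dataset path.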

### Training with LoRA
- Modify or replace the `general_acc.yaml` file as per the desired system configuration
- Set `zero_optimization.stage` to `2` and `overlap_comm` to `false` in `ds_config.json`
- Add the respective variables (`MODEL_PATH`, `TRAIN_DATA`, `OUTPUT_DIR`, etc.) in the `run.sh` script and run (a reference invocation is sketched just after this block):
```bash
bash ./lora/run.sh
```
> `lora/lora.py` uses `use_reentrant: True` for gradient checkpointing, which can allow using DeepSpeed ZeRO-3 optimization for large models.
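As with SFT, the earlier layout launched LoRA training directly with `deepspeed`, and `lora/run.sh` presumably wraps something similar; the script paths and flags below are assumptions carried over from that earlier command, so adjust them to your setup. The earlier layout also shipped a `merge_lora.py` script for merging the trained adapters back into the base model for inference, which expects the adapter output path to be set inside it first:
```bash
# Sketch of a direct LoRA launch, adapted from the earlier standalone lora.py command.
deepspeed lora.py \
    --model_name_or_path "$MODEL_PATH" \
    --train_data_path "$TRAIN_DATA" \
    --output_dir "$OUTPUT_DIR" \
    --num_train_epochs 3 \
    --model_max_length 8192 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --save_strategy "epoch" \
    --save_total_limit 25 \
    --learning_rate 1e-5 \
    --warmup_ratio 0.1 \
    --logging_steps 5 \
    --report_to "wandb" \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --run_name "Run name for logs"

# Merge the trained adapters back into the base model for inference
# (set the adapter path inside merge_lora.py before running).
python merge_lora.py
```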

### Training with SeleKT
- Modify or replace the `general_acc.yaml` file as per the desired system configuration
- Set `zero_optimization.stage` to `3` and `overlap_comm` to `false` in `ds_config.json` for better memory optimization
- Add the respective variables (`MODEL_PATH`, `TRAIN_DATA`, `OUTPUT_DIR`, etc.) in the `run.sh` script and run (a reference invocation is sketched just after this block):
```bash
bash ./selekt/run.sh
```
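As with the other recipes, the earlier layout launched SeleKT training directly with `accelerate launch`, and `selekt/run.sh` is expected to wrap a similar command; the script path and flags below are assumptions carried over from that earlier command. `--save_steps` takes the SeleKT periodicity value M and `--alpha` the desired alpha parameter from the paper; `SELEKT_PERIOD_M` and `SELEKT_ALPHA` below are hypothetical placeholder variables for those values:
```bash
# Sketch of a direct SeleKT launch, adapted from the earlier standalone selekt.py command.
# MODEL_PATH, TRAIN_DATA and OUTPUT_DIR correspond to the variables set in run.sh;
# SELEKT_PERIOD_M (periodicity M) and SELEKT_ALPHA (alpha) are placeholders you must define.
accelerate launch \
    --config_file=general_acc.yaml \
    selekt.py \
    --model_name_or_path "$MODEL_PATH" \
    --base_model_path "$MODEL_PATH" \
    --train_data_path "$TRAIN_DATA" \
    --output_dir "$OUTPUT_DIR" \
    --num_train_epochs 3 \
    --model_max_length 8192 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --save_strategy "steps" \
    --save_steps "$SELEKT_PERIOD_M" \
    --save_total_limit 50 \
    --learning_rate 1e-5 \
    --warmup_ratio 0.1 \
    --logging_steps 5 \
    --report_to "wandb" \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --run_name "Name for logs" \
    --alpha "$SELEKT_ALPHA"
```
Like `sft.py`, the earlier `selekt.py` also carried commented-out lines that build a `DataCollatorForCompletionOnlyLM` with the `#RESPONSE\n` template for applying the loss only on the response portion of conversational data; check the script for those lines before enabling them.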