Aidand vlm full finetune #116

Open · wants to merge 13 commits into base: main
6 changes: 6 additions & 0 deletions learn/vlm-finetuning/.gitignore
@@ -0,0 +1,6 @@
donotcommit/
examples/
deepspeed_configs/
outputs/
!sample_data/images/
deploy_vars.sh
77 changes: 77 additions & 0 deletions learn/vlm-finetuning/2p5-7b.yaml
@@ -0,0 +1,77 @@
base_model: Qwen/Qwen2.5-VL-7B-Instruct
processor_type: AutoProcessor

# these 3 lines are needed for now to handle vision chat templates with images
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false

# Qwen 2.5 uses the same chat template as Qwen 2.0;
# the Qwen 2.5 template itself has known bugs with images:
# https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct/discussions/11
chat_template: qwen2_vl
datasets:
  - path: sample_data/train.jsonl
    type: chat_template
    ds_type: json
    split: train
    field_messages: messages
dataset_prepared_path: last_run_prepared
val_set_size: 0.0
output_dir: ./outputs/out

adapter:
lora_model_dir:

sequence_len: 8192
pad_to_sequence_len: false

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

gradient_checkpointing: true
logging_steps: 1
flash_attention: true
eager_attention:

warmup_ratio: 0.1
weight_decay: 0.0

deepspeed: deepspeed_configs/zero1_torch_compile.json # multi-gpu only

# Save a checkpoint every 1000 steps
save_strategy: # Set to `"no"` to skip checkpoint saves, `"epoch"` at end of each epoch, `"best"` when better result is achieved, leave empty to infer from `save_steps`.
save_steps: 1000
evals_per_epoch:
saves_per_epoch:
# Save only the last 5 checkpoints
save_total_limit: 5

lora_r:
lora_alpha:
lora_dropout:
lora_target_modules:

bf16: true
fp16:
tf32: true

plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

max_steps: 10
130 changes: 130 additions & 0 deletions learn/vlm-finetuning/README.md
@@ -0,0 +1,130 @@
# Full Finetuning Qwen 2.5 7B and deploying on Fireworks

### Pre-requisites:
- [miniconda3](https://www.anaconda.com/docs/getting-started/miniconda/install)
  - Or any other way to create a Python 3.10 environment
- git
- One 8xH100 node (one 8xA100 node should work too)

### Setup

```bash
git clone -b aidand-vlm-full-finetune [email protected]:aidando73/cookbook.git
# Or if using https:
git clone -b aidand-vlm-full-finetune https://github.com/aidando73/cookbook.git

cd cookbook/learn/vlm-finetuning

# Create environment
conda create --name axolotl-env python=3.10 -y
conda activate axolotl-env

# Install dependencies
pip install uv
uv pip install -U packaging==23.2 setuptools==75.8.0 wheel==0.45.1 ninja==1.11.1.4 requests==2.32.3 "huggingface-hub[cli]==0.31.0" torch==2.5.1
uv pip install --no-build-isolation "axolotl[flash-attn,deepspeed]==0.9.2"
```
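
Optionally, run a quick sanity check to confirm PyTorch sees the GPUs before kicking off training (a minimal sketch; assumes the `axolotl-env` environment is still active):

```python
# Quick GPU/PyTorch sanity check (hypothetical helper, not part of the repo)
import torch

print(f"torch version: {torch.__version__}")            # expect 2.5.1
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")        # expect 8 on an 8xH100 node
for i in range(torch.cuda.device_count()):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
```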

We'll be using `axolotl` to finetune the model. I've also tried finetuning with `trl`, but found that its liger-kernel support wasn't working and that it introduces a few breaking changes to config files.

To learn more about `axolotl`, see the [docs](https://docs.axolotl.ai/).

```bash
# Fetch deepspeed configs and examples
axolotl fetch deepspeed_configs
axolotl fetch examples
```

### Formatting your dataset

Your dataset should be a `.jsonl` file in a format similar to (but not exactly the same as) the OpenAI chat format, e.g.:

```json
{"messages": [{"role": "user", "content": [{"type": "text", "text": "What's in these two images?"}, {"type": "image", "base64": "data:image/jpeg;base64,..."}, {"type": "image", "path": "path/to/image/relative/to/where/command/is/being/executed.jpg"}]}, {"role": "assistant", "content": [{"type": "text", "text": "There are two images of a cat and a dog."}]}]}
{"messages": [{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]}, {"role": "user", "content": [{"type": "text", "text": "What's in this image?"}, {"type": "image", "url": "https://example.com/cat.jpg"}]}, {"role": "assistant", "content": [{"type": "text", "text": "There is a cat in the image."}]}]}
```

Reference the [axolotl multimodal docs](https://docs.axolotl.ai/docs/multimodal.html#dataset-format) for more details.
You can ask Claude/Cursor/ChatGPT to generate a script to format your dataset if you give it a few samples of your data.
It's recommended to avoid using "url" for images, as network conditions could cause your training run to fail.
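
If you'd rather write the conversion yourself, here's a minimal sketch. It assumes a hypothetical input file `raw_data.jsonl` with `image_path`, `question`, and `answer` fields per line; adapt the field names to your own data and point the `path` under `datasets` in [2p5-7b.yaml](2p5-7b.yaml) at the output file:

```python
import json

# Hypothetical input: one JSON object per line with image_path/question/answer fields.
with open("raw_data.jsonl") as fin, open("train.jsonl", "w") as fout:
    for line in fin:
        row = json.loads(line)
        example = {
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": row["question"]},
                        # Prefer local paths (or base64) over "url" so the run
                        # doesn't depend on network conditions.
                        {"type": "image", "path": row["image_path"]},
                    ],
                },
                {
                    "role": "assistant",
                    "content": [{"type": "text", "text": row["answer"]}],
                },
            ]
        }
        fout.write(json.dumps(example) + "\n")
```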

For this tutorial, we'll be using a sample synthetic dataset, [sample_data/train.jsonl](sample_data/train.jsonl). It contains 50 rows of food images (specified by path), with assistant responses that reason in `<think>...</think>` tags before classifying them. These responses were generated with Qwen 2.5 VL 32B Instruct, and the images were downloaded from https://huggingface.co/datasets/ethz/food101.
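
To eyeball what a row looks like, you can print the roles and content types of the first example (run from the `vlm-finetuning` directory):

```python
import json

with open("sample_data/train.jsonl") as f:
    row = json.loads(f.readline())

for message in row["messages"]:
    # Each content entry is a dict with a "type" of "text" or "image".
    print(message["role"], [part["type"] for part in message["content"]])
```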

#### Common issues:

- Messages with `{"content": "Regular text here"}` are not supported. Use `{"content": [{"type": "text", "text": "Regular text here"}]}` instead. Otherwise you will get an error like:
```
pyarrow.lib.ArrowInvalid: JSON parse error: Column(/messages/[]/content) changed from string to array in row 0
```
- If using relative image paths, they should be relative to the directory axolotl is called from. E.g., if an image lives at `image_dir/image.jpg` inside the `vlm-finetuning` directory but you call axolotl from its parent directory, the path in your dataset should be `vlm-finetuning/image_dir/image.jpg`. A quick pre-flight check for both issues is sketched below.
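
A minimal pre-flight check for both issues (a sketch; it assumes your dataset is `sample_data/train.jsonl` and that you run it from the same directory you'll launch `axolotl` from):

```python
import json
import os
import sys

ok = True
with open("sample_data/train.jsonl") as f:
    for lineno, line in enumerate(f, start=1):
        row = json.loads(line)
        for message in row["messages"]:
            content = message["content"]
            # "content" must be a list of {"type": ...} parts, not a plain string.
            if not isinstance(content, list):
                print(f"line {lineno}: content is {type(content).__name__}, expected a list")
                ok = False
                continue
            for part in content:
                # Relative image paths are resolved from the directory axolotl is called from.
                if part["type"] == "image" and "path" in part and not os.path.exists(part["path"]):
                    print(f"line {lineno}: missing image {part['path']}")
                    ok = False

sys.exit(0 if ok else 1)
```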

### Training

An already prepared axolotl config file is provided in [2p5-7b.yaml](2p5-7b.yaml).

To finetune, run:

```bash
axolotl train 2p5-7b.yaml
```

The final model will be saved in `outputs/out` and checkpoints will be saved in `outputs/out/checkpoint-<step>`.
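
If you'd rather deploy an intermediate checkpoint than the final model, you can list what was saved (a small sketch):

```python
from pathlib import Path

# Checkpoints are written to outputs/out/checkpoint-<step> during training.
for ckpt in sorted(Path("outputs/out").glob("checkpoint-*"),
                   key=lambda p: int(p.name.split("-")[1])):
    print(ckpt)
```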

### Deploying on Fireworks

**Pre-requisite:**
- Install [firectl](https://docs.fireworks.ai/tools-sdks/firectl/firectl)

### Deployment

1. Deployment requires a few variables to be set, so we'll keep them in a file called `deploy_vars.sh` and load them whenever we run commands. First, create the file:

```bash
cp deploy_vars.sh.example deploy_vars.sh
```

2. Next open `deploy_vars.sh` and add your account ID and API key.
3. Validate that `ACCOUNT_ID`, `FIREWORKS_API_KEY`, and `CHECKPOINT` are correct:

```bash
source deploy_vars.sh && firectl -a $ACCOUNT_ID list models --api-key $FIREWORKS_API_KEY # You should see either an empty list or all your current models
ls $CHECKPOINT/config.json $CHECKPOINT/model-*-of-*.safetensors # You should see config.json and *.safetensors files
```

4. Next we create the model:

```bash
# Axolotl checkpoints are missing a few config files for some reason,
# so download them from the base model (we exclude the weights)
huggingface-cli download Qwen/Qwen2.5-VL-7B-Instruct --local-dir $CHECKPOINT --exclude "*.safetensors" "model.safetensors.index.json"

# Load variables before running firectl commands
source deploy_vars.sh && firectl -a $ACCOUNT_ID create model $MODEL_NAME $CHECKPOINT --api-key $FIREWORKS_API_KEY
```

Next we create the deployment:

```bash
source deploy_vars.sh && firectl -a $ACCOUNT_ID create deployment accounts/$ACCOUNT_ID/models/$MODEL_NAME \
--accelerator-type="NVIDIA_H100_80GB" \
--min-replica-count 1 \
--accelerator-count 1 \
--api-key $FIREWORKS_API_KEY \
--deployment-id $MODEL_NAME # We set the deployment ID to the model name
```

Wait until the deployment is ready.

```bash
source deploy_vars.sh && watch -c "firectl -a $ACCOUNT_ID list deployments --order-by='create_time desc' --api-key $FIREWORKS_API_KEY"
```

Then you can test the deployment:

```bash
source deploy_vars.sh && python fw_req.py --model accounts/$ACCOUNT_ID/models/$MODEL_NAME#accounts/$ACCOUNT_ID/deployments/$MODEL_NAME --api-key $FIREWORKS_API_KEY
```
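
`fw_req.py` calls the REST endpoint directly with `requests`. If you'd rather hit the deployment from your own code, the Fireworks inference API is OpenAI-compatible, so a sketch like the one below should also work (assumes `pip install openai` and that you've run `source deploy_vars.sh` so `FIREWORKS_API_KEY`, `ACCOUNT_ID`, and `MODEL_NAME` are set):

```python
import base64
import os

from openai import OpenAI

# The Fireworks inference endpoint speaks the OpenAI chat-completions API.
client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1",
)

account = os.environ["ACCOUNT_ID"]
model = os.environ["MODEL_NAME"]

# Same test image fw_req.py uses, sent as a base64 data URL.
with open("icecream.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    # Same model#deployment identifier passed to fw_req.py above.
    model=f"accounts/{account}/models/{model}#accounts/{account}/deployments/{model}",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": "What's in this image?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```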

### Thanks for trying Fireworks

Please email me at [email protected] if you have any questions/feedback. Or [drop something in my calendar](https://calendar.google.com/calendar/u/0/appointments/schedules/AcZssZ2iKVtCNOXAOLoYRcGh4ppHL_ztUU-osdlrAeR8dyvoZY2V-pMMMu_ozOjvTVeLg65Erkuu0UET).
16 changes: 16 additions & 0 deletions learn/vlm-finetuning/deploy_vars.sh.example
@@ -0,0 +1,16 @@
# Input required 👇
# You can get your API key at https://fireworks.ai/settings/users/api-keys
export FIREWORKS_API_KEY=
# (After you set your API key) You can get your account ID by running
# source deploy_vars.sh && firectl get account --api-key $FIREWORKS_API_KEY
export ACCOUNT_ID=





# This is the directory where the checkpoint is stored
# You can also point this at checkpoints like outputs/out/checkpoint-1000
export CHECKPOINT=outputs/out
# Model name
export MODEL_NAME=sft-qwen2p5-vl-7b-instruct
60 changes: 60 additions & 0 deletions learn/vlm-finetuning/fw_req.py
@@ -0,0 +1,60 @@
#!/usr/bin/env python3
import requests
import base64
import json
import os
import argparse
import sys

def main():
parser = argparse.ArgumentParser(description='Query Fireworks AI vision model with an image')
parser.add_argument('--model',
help='Model to use',
required=True)
parser.add_argument('--api-key',
help='Fireworks API key',
required=True)
args = parser.parse_args()

# Read and encode image
with open("icecream.png", 'rb') as f:
image_data = base64.b64encode(f.read()).decode('utf-8')

payload = {
"model": args.model,
"messages": [
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {
"url": f"data:image/png;base64,{image_data}"
}},
{"type": "text", "text": "What's in this image?"},
]
}
]
}

try:
response = requests.post(
"https://api.fireworks.ai/inference/v1/chat/completions",
headers={
"Authorization": f"Bearer {args.api_key}",
"Content-Type": "application/json"
},
json=payload
)

response.raise_for_status()
result = response.json()
print(result['choices'][0]['message']['content'])
except requests.exceptions.RequestException as e:
print(f"Error making request: {e}", file=sys.stderr)
sys.exit(1)
except json.JSONDecodeError as e:
print(f"Error parsing response: {e}", file=sys.stderr)
print(f"Raw response: {response.text}", file=sys.stderr)
sys.exit(1)

if __name__ == "__main__":
main()
Binary file added learn/vlm-finetuning/sample_data/images/1.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/10.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/11.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/12.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/13.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/14.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/15.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/16.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/17.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/18.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/2.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/20.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/21.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/22.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/24.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/25.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/26.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/27.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/28.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/29.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/3.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/30.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/31.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/32.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/33.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/34.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/35.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/36.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/37.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/38.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/39.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/4.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/40.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/41.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/42.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/43.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/44.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/45.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/46.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/48.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/49.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/5.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/6.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/7.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/8.jpeg
Binary file added learn/vlm-finetuning/sample_data/images/9.jpeg