---
title: "GSoC ’25 Week 02 Update by Mebin J Thattil"
excerpt: "Fine-Tuning, Deploying, Testing & Evaluations"
category: "DEVELOPER NEWS"
date: "2025-06-14"
slug: "2025-06-14-gsoc-25-mebinthattil-week2"
author: "Mebin J Thattil"
description: "GSoC'25 Contributor at SugarLabs - Speak Activity"
tags: "gsoc25,sugarlabs,week02,mebinthattil,speak_activity"
image: "assets/Images/GSOCxSpeak.png"
---

<!-- markdownlint-disable -->

# Week 02 Progress Report by Mebin J Thattil

**Project:** [Speak Activity](https://github.com/sugarlabs/speak)
**Mentors:** [Chihurumnaya Ibiam](https://github.com/chimosky), [Kshitij Shah](https://github.com/kshitijdshah99)
**Assisting Mentors:** [Walter Bender](https://github.com/walterbender), [Devin Ulibarri](https://github.com/pikurasa)
**Reporting Period:** 2025-06-08 - 2025-06-14

---

## Goals for This Week

- **Goal 1:** Set up AWS for fine-tuning.
- **Goal 2:** Fine-tune a small model on a small dataset.
- **Goal 3:** Deploy the model on AWS and create an API endpoint.
- **Goal 4:** Test the endpoint using a Python script.
- **Goal 5:** Evaluate the model's responses and think about next steps.

---

## This Week’s Achievements
1. **Setting up AWS for Fine-Tuning**
   - Set up AWS SageMaker.
   - Provisioned GPUs on AWS SageMaker to fine-tune the Llama3-1B foundation model.

2. **Dataset & Cleaning**
   - Used an open dataset of conversations between a student and a teacher.
   - The dataset was cleaned and converted into the format Llama requires for fine-tuning.
   - Wrote a small script to perform this conversion.
   - The dataset, along with the script, is available [here](https://github.com/mebinthattil/Education-Dialogue-Dataset).
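The conversion step can be sketched as a small, self-contained function. This is only an illustration, not my exact script: the raw-data shape (a list of `(speaker, text)` pairs), the role names, and the `dialog` field are all assumptions about the target chat format.

```python
import json

def to_llama_chat(turns):
    """Map (speaker, text) pairs onto the role/content chat format
    that Llama-style fine-tuning jobs expect. Field names are
    illustrative, not the exact schema used in my script."""
    role_map = {"student": "user", "teacher": "assistant"}
    dialog = [{"role": role_map[speaker], "content": text.strip()}
              for speaker, text in turns]
    return {"dialog": dialog}

# Hypothetical raw record from the student-teacher dataset.
sample = [
    ("student", "What is photosynthesis?"),
    ("teacher", "It is how plants turn sunlight into food."),
]
record = to_llama_chat(sample)
print(json.dumps(record))
```

Each cleaned conversation then becomes one JSON line in the training file, which is what the `chat_dataset = True` hyperparameter below expects.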
3. **Fine-tuning**
   - Fine-tuned the model on a small subset of the dataset, just to see how it performs and to get familiar with AWS SageMaker.
   - The training job ran on a `ml.g5.2xlarge` instance.
   - The hyperparameters were chosen to reduce the memory footprint and mainly to test things end to end. I'll list them here, hoping this serves as documentation for future fine-tuning runs.
   **Hyperparameters**:

   | Name                             | Value                                               |
   |----------------------------------|-----------------------------------------------------|
   | add_input_output_demarcation_key | True                                                |
   | chat_dataset                     | True                                                |
   | chat_template                    | Llama3.1                                            |
   | enable_fsdp                      | False                                               |
   | epoch                            | 5                                                   |
   | instruction_tuned                | False                                               |
   | int8_quantization                | True                                                |
   | learning_rate                    | 0.0001                                              |
   | lora_alpha                       | 8                                                   |
   | lora_dropout                     | 0.08                                                |
   | lora_r                           | 2                                                   |
   | max_input_length                 | -1                                                  |
   | max_train_samples                | -1                                                  |
   | max_val_samples                  | -1                                                  |
   | per_device_eval_batch_size       | 1                                                   |
   | per_device_train_batch_size      | 4                                                   |
   | preprocessing_num_workers        | None                                                |
   | sagemaker_container_log_level    | 20                                                  |
   | sagemaker_job_name               | jumpstart-dft-meta-textgeneration-l-20250607-200133 |
   | sagemaker_program                | transfer_learning.py                                |
   | sagemaker_region                 | ap-south-1                                          |
   | sagemaker_submit_directory       | /opt/ml/input/data/code/sourcedir.tar.gz            |
   | seed                             | 10                                                  |
   | target_modules                   | q_proj,v_proj                                       |
   | train_data_split_seed            | 0                                                   |
   | validation_split_ratio           | 0.2                                                 |
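A SageMaker JumpStart fine-tuning run with these settings can be sketched roughly as below. This is a hedged sketch, not the exact launch code: the `model_id` string is illustrative, only a subset of the table's hyperparameters is shown, and actually calling `launch_finetune` requires live AWS credentials.

```python
def make_hyperparameters():
    # A subset of the table above; JumpStart passes these as strings.
    return {
        "chat_dataset": "True",
        "chat_template": "Llama3.1",
        "epoch": "5",
        "learning_rate": "0.0001",
        "int8_quantization": "True",
        "lora_r": "2",
        "lora_alpha": "8",
        "lora_dropout": "0.08",
        "target_modules": "q_proj,v_proj",
        "per_device_train_batch_size": "4",
        "validation_split_ratio": "0.2",
        "seed": "10",
    }

def launch_finetune(train_s3_uri):
    # Requires AWS credentials and the sagemaker SDK; model_id is a
    # placeholder for whichever Llama 1B JumpStart id was used.
    from sagemaker.jumpstart.estimator import JumpStartEstimator
    estimator = JumpStartEstimator(
        model_id="meta-textgeneration-llama-3-2-1b",  # hypothetical id
        instance_type="ml.g5.2xlarge",
        hyperparameters=make_hyperparameters(),
    )
    estimator.fit({"training": train_s3_uri})
    return estimator
```

Keeping the hyperparameters in one function makes it easy to diff future runs against this week's baseline.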
4. **Saving the model**
   - The safetensors and other model files were saved to an AWS S3 bucket. The URI of the bucket is: `s3://sagemaker-ap-south-1-021891580293/jumpstart-run2/output/model/`

5. **Deploying the model**
   - The model was deployed on AWS SageMaker and an API endpoint was created.

6. **Testing the model**
   - A Python script was written to test the model through the API endpoint.
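The test script's shape was roughly as follows. This is a sketch under assumptions: the request/response schema shown here is illustrative (it must match whatever the deployed container actually expects), and `query_endpoint` needs live AWS credentials to run.

```python
import json

def build_payload(user_message, system_prompt="You are a friendly teacher."):
    # Request body is illustrative; adjust to the deployed container's schema.
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "parameters": {"max_new_tokens": 256, "temperature": 0.7},
    }

def query_endpoint(endpoint_name, user_message):
    # Requires AWS credentials and a live SageMaker endpoint.
    import boto3
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_payload(user_message)),
    )
    return json.loads(response["Body"].read())
```

Keeping payload construction separate from the network call made it easy to reuse the same questions from the earlier benchmark.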
7. **Evaluation**
   - The model's responses were tested using the same questions used in my earlier [benchmark](https://llm-benchmarking-sugar.streamlit.app/).

---
## Unexpected Model Output

- After fine-tuning, I noticed the model producing some unexpected output. I expected it to behave like a general chatbot, but in a friendlier, more teacher-like manner. While the model's responses did sound like a teacher, it would often try to generate an entire chain of conversation: producing the next message from the student's perspective and then proceeding to answer itself.
- This behaviour was caused by the way the dataset was structured. The dataset is essentially a list of back-and-forth conversations between a student and a teacher, so it makes sense that the model would try to continue the chain. But this is not what we need from the model.
- The next step is to restructure the dataset so that the model answers one question at a time, while still remaining conversational and understanding the nuances of a chain of conversation.
- The temporary fix was to add a stop condition while generating responses and to tweak the system prompt. But again, this is not the right way to go about it; the right way is to change the dataset structure.
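The stop-condition workaround amounts to truncating the generation wherever the model starts role-playing the student's side. A minimal sketch, assuming the model prefixes invented student turns with a marker like `Student:` (the markers are illustrative):

```python
def truncate_at_stop(text, stop_markers=("\nStudent:", "\nstudent:")):
    """Cut the generated text at the first marker where the model
    begins inventing the student's next turn. Markers are illustrative
    and depend on how the fine-tuning data labelled speakers."""
    cut = len(text)
    for marker in stop_markers:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()

raw = "Photosynthesis lets plants make food.\nStudent: Why is it green?"
print(truncate_at_stop(raw))  # -> "Photosynthesis lets plants make food."
```

This patches the symptom only; as noted above, the real fix is restructuring the dataset so single-turn answers are what the model learns in the first place.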
---

## Sample model output with stop condition

![sample model output](https://mebin.shop/Mebin-test-FT-model-tesponses.png)

---

## Key Learnings

- The structure of the dataset needs to be changed to make the model more conversational and aware of the nuances of a chain of conversation.

---
## Next Week’s Roadmap

- Re-structure the dataset
- Re-train and fine-tune the model on the new dataset
- Deploy, create an endpoint, and test the model on the new dataset
- Evaluate the model on the new dataset and add the results to the benchmarks

---

## Acknowledgments

Thank you to my mentors, the Sugar Labs community, and fellow GSoC contributors for ongoing support.

---
## Connect with Me

- Website: [mebin.in](https://mebin.in/)
- GitHub: [@mebinthattil](https://github.com/mebinthattil)
- LinkedIn: [Mebin Thattil](https://www.linkedin.com/in/mebin-thattil/)

---
