---
title: "GSoC ’25 Week 12 + Final Report by Mebin J Thattil"
excerpt: "Integrating everything, wrapping up the project & Final Report"
category: "DEVELOPER NEWS"
date: "2025-08-24"
slug: "2025-08-24-gsoc-25-mebinthattil-week12"
author: "@/constants/MarkdownFiles/authors/mebin-thattil.md"
tags: "gsoc25,sugarlabs,week12,mebinthattil,speak_activity"
image: "assets/Images/GSOCxSpeak.png"
---

# Week 12 Progress Report by Mebin J Thattil

**Project:** [Speak Activity](https://github.com/sugarlabs/speak)  
**Mentors:** [Chihurumnaya Ibiam](https://github.com/chimosky), [Kshitij Shah](https://github.com/kshitijdshah99)  
**Assisting Mentors:** [Walter Bender](https://github.com/walterbender), [Devin Ulibarri](https://github.com/pikurasa)  
**Reporting Period:** 2025-08-17 - 2025-08-24

---

## Goals for This Week

- **Goal 1:** Integrate the LLM as the chatbot's brains
- **Goal 2:** Write a script that checks whether the user is connected to the internet and chooses the appropriate chatbot mode (LLM or SLM)
- **Goal 3:** Implement personas
- **Goal 4:** Update the UI to include the new features

---

## This Week’s Progress

### **1. Integration of LLM as brains**

Now that SugarAI is deployed, I was finally able to integrate the LLM into Speak! It was a fairly simple implementation: I just had to call the API and feed the response to Kokoro once it passed the profanity checks. This, of course, happens only if the user is connected to the internet and the SugarAI servers are up.
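
The online path can be sketched roughly like this. The endpoint route, payload fields, and response shape below are assumptions for illustration only, not SugarAI's actual API:

```python
# Hypothetical sketch of the online chatbot path, assuming a simple
# JSON POST endpoint on SugarAI. The route, payload shape, and auth
# are assumptions and may differ from the real deployment.
import json
import urllib.request

SUGARAI_ENDPOINT = "https://ai.sugarlabs.org/chat"  # assumed route


def build_payload(user_text, persona_prompt):
    """Combine the persona's system prompt with the child's message."""
    return {"system": persona_prompt, "prompt": user_text}


def ask_sugarai(user_text, persona_prompt, timeout=10):
    """POST the payload and return the model's text reply."""
    data = json.dumps(build_payload(user_text, persona_prompt)).encode()
    req = urllib.request.Request(
        SUGARAI_ENDPOINT,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("response", "")
```

The reply would then go through the profanity checks before being handed to Kokoro for speech.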

### **2. Script to check internet connectivity**

This is a fairly simple function that checks whether the user can reach the servers that run SugarAI. Only if this is true will Speak send requests to the servers; otherwise it falls back to the on-device SLM.
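
A minimal sketch of such a check, assuming reachability is tested with a plain TCP connection to the SugarAI host on port 443 (the function name and defaults are my own, not necessarily what Speak uses):

```python
# Hedged sketch: report "online" only if a TCP connection to the
# SugarAI host succeeds within the timeout. Host, port, and timeout
# defaults are assumptions for illustration.
import socket


def is_online(host="ai.sugarlabs.org", port=443, timeout=3.0):
    """Return True when the SugarAI server is reachable, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, timeouts, refused connections
        return False
```

Keeping the timeout short matters here, since the check runs before every chatbot exchange and should not stall the UI.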

### **3. Personas implementation**

Personas have been implemented! Each persona has a unique name, voice, and personality. Personas are stored in `personas.json`, and new personas can be created easily by adding entries to this file. Here is the current JSON file showing all the default personas:

```json
{
  "Jane": {
    "voice": "af_bella",
    "prompt": "You are a friendly teacher named Jane who is 28 years old. You teach 10 year old children. Always give helpful, educational responses in simple words that children can understand. Keep your answers between 20-40 words. Be encouraging and enthusiastic but never use emojis(ever). If you notice spelling mistakes, gently correct them. Stay focused on the topic and give relevant answers."
  },

  "Dr.Sam": {
    "voice": "am_adam",
    "prompt": "You are Dr. Sam, a friendly and thoughtful doctor who enjoys talking to 10-year-old children about what it's like to be a doctor. You don’t give medical advice—you just explain how doctors help people, what hospitals are like, and how the human body works in fun, simple ways. Use clear, easy-to-understand language and keep your answers between 20–40 words. You're curious, caring, and always calm. You love when kids ask questions and you're happy to share what it's like to care for others. If there are any spelling mistakes, gently correct them. Stay focused on the topic and give helpful, encouraging answers. Sometimes you share neat facts about the body or how doctors train, always making learning feel safe and interesting."
  },

  "Captain Stella": {
    "voice": "am_santa",
    "prompt": "You are Captain Stella, an adventurous space explorer who loves teaching 10-year-old children about planets, stars, and the mysteries of the universe. Use simple words and keep answers between 20–40 words. Be enthusiastic and encourage curiosity about space. If children make spelling mistakes, gently correct them. Stay focused on space topics and give fun, educational answers."
  },

  "Professor Oakley": {
    "voice": "bm_george",
    "prompt": "You are Professor Oakley, a curious scientist who loves explaining experiments, nature, and how things work to 10-year-old children. Use simple, clear words and keep answers between 20–40 words. Be excited about learning and encourage questions. Gently correct spelling mistakes. Stay on topic and share interesting science facts."
  },

  "Liam the Football Player": {
    "voice": "am_liam",
    "prompt": "You are Liam, a fun and energetic football player who teaches 10-year-old children about teamwork, sportsmanship, fitness, and how practice helps improve skills. Use simple words and keep answers between 20–40 words. Be motivating and friendly. Gently correct spelling mistakes. Stay focused on sports topics and give helpful answers."
  },

  "Ollie the Owl": {
    "voice": "ef_dora",
    "prompt": "You are Ollie, a wise and curious owl who teaches 10-year-old children about nature, nighttime animals, and how to observe the world quietly and carefully. Use simple words and keep answers between 20–40 words. Be calm, patient, and encouraging. Gently correct spelling mistakes. Share interesting facts about animals and nature."
  }
}
```
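
Loading and selecting a persona from a file with this layout can be sketched as follows; the helper names here are mine, not Speak's:

```python
# Minimal sketch of reading personas.json and picking one persona.
# Assumes the file layout shown above: name -> {"voice": ..., "prompt": ...}.
import json


def load_personas(path="personas.json"):
    """Read the personas file into a dict keyed by persona name."""
    with open(path) as f:
        return json.load(f)


def get_persona(personas, name):
    """Return the (voice, prompt) pair for a persona by name."""
    entry = personas[name]
    return entry["voice"], entry["prompt"]
```

Because the file is just a flat mapping, adding a persona is a pure data change: no code needs to be touched.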

### **4. UI changes to include the new features**

The UI was updated to include all the new features, such as persona switching and voice switching. A new icon was made for the button that switches personas.

![Personas Icon](https://raw.githubusercontent.com/mebinthattil/speak-ai/df1291828088316f4684569a56680e38e7e72491/icons/Personas_Icon.svg)

# Everything finally comes together

This marks the completion of a solid MVP. I also made a demo showcasing all the new AI features. Note that this demo was shot before a few finishing touches (such as the new personas icon) landed, but it shows how all the new features work.

*Demo:*

<iframe src="https://drive.google.com/file/d/1CZNywu1THdSUpR0my-UWqnZ4YhZ2BGkW/preview" width="840" height="480" allow="autoplay"></iframe>

---

## Final Report

### **Project Overview**

The objective of this GSoC project was to **modernize and enhance the Speak Activity** using generative AI, transforming it from a simple text-to-speech tool into an intelligent, conversational learning companion. The project aimed to integrate modern TTS models, deploy both local Small Language Models (SLMs) and cloud-hosted Large Language Models (LLMs), and create an engaging persona-based interaction system for children.

### **Key Deliverables**

1. **Modern TTS Integration** - Replaced the traditional espeak with Kokoro TTS for natural-sounding, multi-voice audio generation
2. **Dual Model System For Chatbot Brains** - Implemented both a local SLM and a cloud LLM as part of the chatbot mode
3. **SugarAI** - Deployed cloud infrastructure at [ai.sugarlabs.org](https://ai.sugarlabs.org) for hosting the LLMs; it is used by other activities as well
4. **Interactive Personas** - Created character-based learning experiences with unique voices and personalities
5. **Comprehensive Safety Features** - Built profanity filters and child-safe interaction mechanisms

All features are optimized for educational environments and resource-constrained devices.

---

### **Project Timeline and Achievements**

#### **Phase 1: Research and Benchmarking (Weeks 1-3)**

**Week 1: Model Selection and Benchmarking**
- **LLM/SLM Evaluation:** Created a [Streamlit benchmarking app](https://llm-benchmarking-sugar.streamlit.app/) to compare different models
- **Dual-Model Architecture Discovery:** Experimented with a generation + refinement approach using Gemma3-1B, achieving performance comparable to 30B-parameter models
- **Resource Constraint Analysis:** Identified the need for models under 100MB for packaging with the Speak activity
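
The dual-model idea can be sketched as a tiny two-pass pipeline. The callables below are stand-ins for the actual model calls, which this report does not show:

```python
# Sketch of the "generate then refine" architecture from Week 1:
# a small model drafts an answer, then a second pass rewrites the
# draft with the original question as context. Both passes are
# placeholder callables here, not real Gemma3-1B invocations.
def generate_then_refine(prompt, generate, refine):
    """Run a draft pass, then a refinement pass over that draft."""
    draft = generate(prompt)
    return refine(prompt, draft)
```

The appeal of this pattern is that the same small model is run twice instead of one large model once, trading latency for quality on constrained hardware.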

**Week 2: Fine-tuning Infrastructure and Dataset Development**
- **AWS SageMaker Setup:** Provisioned GPU infrastructure for model training on `ml.g5.2xlarge` instances
- **Educational Dataset Creation:** Developed and cleaned the [Education-Dialogue-Dataset](https://github.com/mebinthattil/Education-Dialogue-Dataset) with teacher-student conversations
- **Model Training:** Fine-tuned Llama3-1B on educational conversation patterns
- **Deployment Infrastructure:** Created model storage and API endpoint systems on AWS

**Week 3: Dataset Restructuring and Optimization**
- **Conversation Format Refinement:** Restructured the dataset to prevent chain-response generation issues
- **Model Behavior Analysis:** Identified and resolved conversational flow problems in fine-tuned models
- **Training Optimization:** Developed improved training approaches for educational use cases

---

#### **Phase 2: TTS Integration and Voice Development (Weeks 4-6)**

**Week 4: Kokoro TTS Integration**
- **Modern TTS Implementation:** Successfully integrated Kokoro TTS with minimal additional dependencies
- **Voice Catalog Access:** Enabled the entire collection of Kokoro voices for different personas
- **Audio Pipeline Development:** Built a temporary-WAV-file approach as the initial implementation
- **Dependency Optimization:** Swapped Kokoro's fallback from espeak-ng to espeak to reduce dependencies
- **Community Testing Platform:** Deployed a [voice mixing web app](https://newstreamlit-frontend.blackpond-9921706d.eastus.azurecontainerapps.io/) for feedback collection

**Week 5: SLM Development and Quantization**
- **Lightweight Model Training:** Fine-tuned [Llama 135M](https://huggingface.co/amd/AMD-Llama-135m) on the educational dataset
- **Size Optimization:** Achieved a ~500MB model size with potential for further quantization
- **Performance Evaluation:** Benchmarked model performance against larger alternatives
- **Dataset Quality Improvement:** Enhanced training data with better conversational patterns

**Week 6: Dataset Enhancement and Performance Optimization**
- **Comprehensive Dataset Revision:** Created higher-quality training data using Gemini for teacher-child conversation patterns
- **Model Re-training:** Conducted multiple fine-tuning iterations with improved datasets
- **Performance Analysis:** Ran formal benchmarking against a 50-question evaluation set
- **Size Constraint Solutions:** Achieved critical component sizes:
  - **TTS:** 0.7MB base + 0.5MB per additional voice
  - **SLM:** 82.6MB
  - **Llama.cpp:** 2MB (if using distributed binaries)

---

#### **Phase 3: Infrastructure and Streaming Optimization (Weeks 7-9)**

**Week 7: Community Feedback and Platform Deployment**
- **Comprehensive Model Benchmarking:** Added all 16 fine-tuned SLM variants to the [benchmark comparison](https://slm-benchmark.streamlit.app/)
- **AWS Infrastructure Success:** Secured G-series GPU instances after multiple service-limit requests
- **Model Repository Organization:** Created a [comprehensive model collection](https://huggingface.co/MebinThattil/models) on Hugging Face
- **Community Evaluation Platform:** Deployed benchmarking tools for community model selection

**Week 8: Audio Streaming and Safety Features**
- **GStreamer Optimization:** Implemented direct audio streaming from Kokoro to GStreamer using the `appsrc` element
- **Platform-Agnostic Inference:** Replaced compiled binaries with `llama-cpp-python` for cross-platform compatibility
- **Safety Implementation:** Built a comprehensive profanity-filtering system with base64-encoded word lists
- **Latency Reduction:** Achieved significant performance improvements through the streaming architecture
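
A hedged sketch of what a base64-backed profanity filter of this kind might look like; the encoded entries below are harmless placeholders, and the function name is mine rather than Speak's:

```python
# Illustrative profanity filter backed by a base64-encoded word list.
# Encoding keeps the plain words out of the source tree; the entries
# here are placeholder stand-ins, not the real list used in Speak.
import base64

ENCODED_WORDS = [
    base64.b64encode(w.encode()).decode()
    for w in ("badword", "meanword")  # placeholder entries
]


def contains_profanity(text, encoded_words=ENCODED_WORDS):
    """Decode the word list, then scan the text token by token."""
    blocked = {base64.b64decode(w).decode().lower() for w in encoded_words}
    tokens = (tok.strip(".,!?;:").lower() for tok in text.split())
    return any(tok in blocked for tok in tokens)
```

In the real activity a check like this would gate every model response before it reaches the TTS stage.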

**Week 9: Critical Bug Fixes and System Integration**
- **Mouth Movement Synchronization:** Resolved timing issues through three iterations of optimization
- **Audio Pipeline Sync:** Achieved synchronization between voice output and mouth movements
- **System Architecture Completion:** Integrated all components into a cohesive Speak activity

---

#### **Phase 4: Cloud Infrastructure and Final Integration (Weeks 10-12)**

**Week 10: SugarAI Deployment**
- **Cloud Infrastructure:** Successfully deployed SugarAI on AWS EC2 with G5 GPUs
- **Containers:** Implemented Docker-based deployment with GPU acceleration
- **Network Security:** Configured secure inbound rules limiting access to HTTPS and SSH only
- **Service Architecture:** Established the foundation for public API accessibility

**Week 11: Production Deployment and Security**
- **SSL Certificate Integration:** Implemented Let's Encrypt certificates for secure HTTPS access
- **Nginx Proxy Configuration:** Created a proxy setup mapping internal services to public endpoints
- **DNS Configuration:** Established an A record for the [ai.sugarlabs.org](https://ai.sugarlabs.org) domain
- **Authentication Systems:** Integrated Google OAuth under the Sugar Labs organization
- **Public API Launch:** Made SugarAI publicly accessible with comprehensive API documentation

**Week 12: Complete System Integration**
- **LLM Integration:** Connected the cloud-hosted LLM to the Speak activity for enhanced conversations
- **Intelligent Mode Switching:** Implemented automatic fallback between the LLM (online) and the SLM (offline) based on connectivity
- **Personas System:** Deployed character-based learning with unique voices and personalities
- **UI Enhancement:** Completed the interface update accommodating all new AI-powered features
- **Production Demo:** Created a comprehensive demonstration showcasing all integrated features
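
The mode-switching behavior can be sketched as a small dispatcher. The backend callables and connectivity check below are stand-ins for illustration, not the actual Speak wiring:

```python
# Sketch of the online/offline fallback described above: try the cloud
# LLM first, drop to the local SLM on any failure. The llm, slm, and
# is_online arguments are placeholder callables; the real code would
# wire in SugarAI and the llama-cpp-python backend.
def answer(prompt, llm, slm, is_online):
    """Prefer the cloud LLM when reachable, else use the local SLM."""
    if is_online():
        try:
            return llm(prompt)
        except Exception:
            pass  # reachable check passed but the call itself failed
    return slm(prompt)
```

Catching failures on the LLM call (not just checking connectivity up front) matters because the server can go down between the check and the request.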

---

### **Repositories and Resources**

#### **Core Repositories**
- [SpeakAI Activity](https://github.com/sugarlabs/speak-ai)
- [PR to SpeakAI](https://github.com/sugarlabs/speak-ai/pull/1)
- [Benchmarking Tools](https://github.com/mebinthattil/LLM-benchmarking)
- [Educational Dataset](https://github.com/mebinthattil/Education-Dialogue-Dataset)
- [Model Archive](https://github.com/mebinthattil/Fine-Tune-Attempts-LlaMA-135)
- [Kokoro Integration](https://github.com/mebinthattil/Kokoro-FastAPI)

#### **Testing Platforms**
- [Model Benchmarking](https://llm-benchmarking-sugar.streamlit.app/)
- [SLM Comparison](https://slm-benchmark.streamlit.app/)
- [Voice Testing](https://newstreamlit-frontend.blackpond-9921706d.eastus.azurecontainerapps.io/)

#### **Model Collection**
[16+ fine-tuned models](https://huggingface.co/MebinThattil/models) on Hugging Face

---

### **Acknowledgments**

Thanks to mentors **Chihurumnaya Ibiam** and **Kshitij Shah**, assisting mentors **Walter Bender** and **Devin Ulibarri**, and the Sugar Labs community.

---

### **Conclusion**

This project transformed the Speak Activity from a basic text-to-speech tool into an intelligent learning companion. The hybrid model architecture ensures accessibility regardless of connectivity, while personas make learning engaging through specialized characters. The SugarAI platform provides scalable infrastructure for future Sugar activities.

The modernized Speak activity demonstrates how AI can enhance education while maintaining offline functionality and resource efficiency for students globally.

---
