Commit 475e7db

DMP 25 week 02 blog by Aman Chadha (#222)
* DMP Week-2 Blog
* DMP '25 Week 02 Update by Aman Chadha
1 parent 6e302b6 commit 475e7db

2 files changed: +123 -0 lines changed

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
---
name: "Aman Chadha"
slug: "aman-chadha"
title: "DMP'25 Contributor"
organization: "SugarLabs"
description: "DMP'25 Contributor at SugarLabs"
avatar: "https://avatars.githubusercontent.com/u/79802170?v=4"
---

<!--markdownlint-disable-->

# About Aman Chadha

I am a DMP 2025 contributor working with Sugar Labs on enhancing Music Blocks' internationalization system using AI-supported translation. I'm passionate about building intelligent systems, developer tools, and creative educational platforms that empower users across languages.

## Experience

- Contributor at Sugar Labs (DMP '25)

## Current Projects

- **JS Internationalization with AI Translation Support**:
  Integrating a modern i18n workflow in Music Blocks and enhancing it with AI-powered fallback translations, context-aware retrieval, and part-of-speech–informed RAG models.

## Connect with Me

- **GitHub**: [@ac-mmi](https://github.com/ac-mmi)
- **Email**: [[email protected]](mailto:[email protected])

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
---
title: "DMP '25 Week 02 Update by Aman Chadha"
excerpt: "Enhanced RAG output format with POS tagging and optimized code chunking for Music Blocks"
category: "DEVELOPER NEWS"
date: "2025-06-16"
slug: "2025-06-16-dmp-25-aman-chadha-week02"
author: "@/constants/MarkdownFiles/authors/aman-chadha.md"
tags: "dmp25,sugarlabs,week02,aman-chadha"
image: "assets/Images/c4gt_DMP.png"
---

<!-- markdownlint-disable -->

# Week 02 Progress Report by Aman Chadha

**Project:** [JS Internationalization with AI Translation Support](https://github.com/sugarlabs/musicblocks/pull/4459)
**Mentors:** [Walter Bender](https://github.com/walterbender)
**Assisting Mentors:** *None this week*
**Reporting Period:** 2025-06-09 - 2025-06-16

---

## Goals for This Week

- **Refactor RAG model output** to a structured dictionary format that includes part-of-speech (POS) tagging.
- **Optimize AST-based chunking** by limiting code context to 5 lines above and below translation usage, per mentor feedback.
- **Begin functional testing** of the updated RAG pipeline on real-world translation queries.

---

## This Week's Achievements

1. **RAG Output Enhancement**
   - Refactored the Retrieval-Augmented Generation (RAG) model to return results as structured dictionaries.
   - Each entry now includes `msgid`, `msgstr`, source metadata, and the dominant part of speech, improving retrieval relevance.

2. **Code Chunking Optimization**
   - Reduced each extracted code chunk to the 5 lines above and below the relevant `msgid` usage.
   - This improves retrieval precision and avoids irrelevant surrounding code.
   - Implemented using Babel’s AST traversal logic.

3. **Initial Model Testing**
   - Started testing the RAG model using sample translation queries.
   - Observed noticeably more relevant retrieved context in these early tests, which I attribute to the cleaner chunks and richer metadata.
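
The fixed-window chunking described above can be sketched in a few lines. This is a minimal illustration in Python, not the project's Babel-based implementation: it assumes the AST traversal has already produced the 1-based line number of a `msgid` usage (Babel nodes expose this as `loc.start.line`), and the function name `context_window` and its parameters are hypothetical.

```python
def context_window(source_lines, hit_line, radius=5):
    """Return (start, end, text) for the lines within `radius` of `hit_line`.

    Line numbers are 1-based and inclusive; the window is clamped at file edges.
    """
    if not 1 <= hit_line <= len(source_lines):
        raise ValueError(f"hit_line {hit_line} is outside the file")
    start = max(1, hit_line - radius)
    end = min(len(source_lines), hit_line + radius)
    return start, end, "\n".join(source_lines[start - 1:end])


# Usage with a made-up 20-line file and a hypothetical hit on line 10.
lines = [f"// line {n}" for n in range(1, 21)]
start, end, chunk = context_window(lines, hit_line=10)
print(start, end, len(chunk.splitlines()))  # prints: 5 15 11
```
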

---

## Challenges & How I Overcame Them

- **Challenge:** Integrating POS tagging meaningfully into the RAG data pipeline.
  **Solution:** Designed a dictionary schema that includes the part of speech alongside translation metadata, and verified correctness using test entries.

- **Challenge:** Tuning chunk granularity without losing contextual utility.
  **Solution:** Followed mentor Walter’s advice to use fixed ±5-line windows, and manually verified the semantic coherence of the resulting chunks.
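
To make the dictionary schema from the first challenge concrete, here is a minimal Python sketch. The post only states that each entry holds `msgid`, `msgstr`, source metadata, and the dominant part of speech; the field names `source_file`, `line`, and `pos`, the helper names, and the sample values below are assumptions for illustration. The POS tags are assumed to arrive as a list of Universal POS strings (for example from spaCy's `token.pos_`), so the sketch itself needs no NLP library.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class RagEntry:
    msgid: str        # source string to translate
    msgstr: str       # existing translation, if any
    source_file: str  # hypothetical source-metadata field
    line: int         # hypothetical source-metadata field
    pos: str          # dominant part of speech of the msgid


def dominant_pos(tags):
    """Return the most frequent tag; "X" (Universal POS 'other') if there are none."""
    if not tags:
        return "X"
    return Counter(tags).most_common(1)[0][0]


def build_entry(msgid, msgstr, source_file, line, tags):
    """Assemble one retrieval result as a plain dictionary."""
    return asdict(RagEntry(msgid, msgstr, source_file, line, dominant_pos(tags)))


# Usage with made-up values; the tags would normally come from a POS tagger.
entry = build_entry("play", "reproducir", "js/example.js", 42, ["VERB"])
print(entry["msgid"], entry["pos"])  # prints: play VERB
```
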

---

## Key Learnings

- Part-of-speech tags help disambiguate short strings, which can noticeably improve how well retrieved translations fit their context.
- Smaller, focused code chunks often yield better retrieval precision in RAG applications.
- Mentor feedback and collaborative iteration are key to refining both code structure and user outcomes.
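
One way POS information could sharpen retrieval is as a small re-ranking bonus. The sketch below is an assumption about how such a step might look, not a description of the project's pipeline: the candidate dictionaries, the `score` field, and the `bonus` value are all hypothetical.

```python
def rerank_by_pos(candidates, query_pos, bonus=0.1):
    """Sort candidates by retrieval score, boosting those whose POS matches the query."""
    def adjusted(candidate):
        match = candidate["pos"] == query_pos
        return candidate["score"] + (bonus if match else 0.0)

    return sorted(candidates, key=adjusted, reverse=True)


# Usage: "note" as a noun (a musical note) versus a verb (to note something).
candidates = [
    {"msgid": "note", "pos": "NOUN", "score": 0.80},
    {"msgid": "note", "pos": "VERB", "score": 0.78},
]
best = rerank_by_pos(candidates, query_pos="VERB")[0]
print(best["pos"])  # prints: VERB
```
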

---

## Next Week's Roadmap

- Integrate POS-tagged RAG responses into the full i18n fallback translation pipeline.
- Expand test coverage to include edge-case translations and reused `msgid`s.
- Prepare an internal demo to show RAG-powered retrieval resolving contextually ambiguous translation strings.
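
For the planned coverage of reused `msgid`s, a first step might be simply listing which strings occur at more than one location. This is a hedged Python sketch under the assumption that each entry carries hypothetical `source_file` and `line` fields; the sample data is made up.

```python
from collections import defaultdict


def reused_msgids(entries):
    """Map each msgid used at more than one location to its sorted locations."""
    locations = defaultdict(set)
    for entry in entries:
        locations[entry["msgid"]].add((entry["source_file"], entry["line"]))
    return {
        msgid: sorted(locs)
        for msgid, locs in locations.items()
        if len(locs) > 1
    }


# Usage with made-up entries: "note" appears in two files, "play" only once.
entries = [
    {"msgid": "note", "source_file": "a.js", "line": 3},
    {"msgid": "play", "source_file": "a.js", "line": 8},
    {"msgid": "note", "source_file": "b.js", "line": 9},
]
print(reused_msgids(entries))  # prints: {'note': [('a.js', 3), ('b.js', 9)]}
```
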

---

## Resources & References

- **Repository:** [github.com/sugarlabs/musicblocks](https://github.com/sugarlabs/musicblocks)
- **RAG Concepts:** [arxiv.org/abs/2005.11401](https://arxiv.org/abs/2005.11401)
- **Babel Parser Docs:** [babeljs.io/docs/en/babel-parser](https://babeljs.io/docs/en/babel-parser)
- **spaCy POS Tagging:** [spacy.io/usage/linguistic-features#pos-tagging](https://spacy.io/usage/linguistic-features#pos-tagging)

---

## Acknowledgments

Thanks to my mentor Walter Bender for his guidance on optimizing chunking strategy and enriching the retrieval logic with linguistic features.

---

## Connect with Me

- GitHub: [@ac-mmi](https://github.com/ac-mmi)

---
