Commit ed445b0

DMP '25 Week 02 Update by Aman Chadha
1 parent eee4146 commit ed445b0

2 files changed: 69 additions & 41 deletions

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+---
+name: "Aman Chadha"
+slug: "aman-chadha"
+title: "DMP'25 Contributor"
+organization: "SugarLabs"
+description: "DMP'25 Contributor at SugarLabs"
+avatar: "https://avatars.githubusercontent.com/u/79802170?v=4"
+---
+
+<!--markdownlint-disable-->
+
+# About Aman Chadha
+
+I am a DMP 2025 contributor working with Sugar Labs on enhancing Music Blocks' internationalization system using AI-supported translation. I'm passionate about building intelligent systems, developer tools, and creative educational platforms that empower users across languages.
+
+## Experience
+
+- Contributor at Sugar Labs (DMP '25)
+
+## Current Projects
+
+- **JS Internationalization with AI Translation Support**:
+Integrating a modern i18n workflow in Music Blocks and enhancing it with AI-powered fallback translations, context-aware retrieval, and part-of-speech–informed RAG models.
+
+## Connect with Me
+
+- **GitHub**: [@ac-mmi](https://github.com/ac-mmi)
+- **Email**: [[email protected]](mailto:[email protected])
+

src/constants/MarkdownFiles/posts/dmp-25-AmanChadha-week02.md

Lines changed: 40 additions & 41 deletions
@@ -1,89 +1,88 @@
 ---
-title: "DMP 25 Week 02 Update by Aman Chadha"
-excerpt: "Enhancing RAG output with part-of-speech tagging and optimizing chunk granularity"
+title: "DMP '25 Week 02 Update by Aman Chadha"
+excerpt: "Enhanced RAG output format with POS tagging and optimized code chunking for Music Blocks"
 category: "DEVELOPER NEWS"
 date: "2025-06-16"
-slug: "dmp-25-aman-week02"
-author: "Aman Chadha"
-description: "DMP '25 Contributor working on retrieval-augmented generation for Music Blocks"
-tags: "dmp25,musicblocks,rag,week02"
+slug: "2025-06-16-dmp-25-aman-chadha-week02"
+author: "@/constants/MarkdownFiles/authors/aman-chadha.md"
+tags: "dmp25,sugarlabs,week02,aman-chadha"
 image: "assets/Images/c4gt_DMP.png"
 ---
 
+<!-- markdownlint-disable -->
+
 # Week 02 Progress Report by Aman Chadha
 
 **Project:** [JS Internationalization with AI Translation Support](https://github.com/sugarlabs/musicblocks/pull/4459)
-
-**Mentors:** [Walter Bender](https://github.com/walterbender)
-
-**Reporting Period:** 2025-06-09 – 2025-06-16
+**Mentors:** [Walter Bender](https://github.com/walterbender)
+**Assisting Mentors:** *None this week*
+**Reporting Period:** 2025-06-09 - 2025-06-16
 
 ---
 
 ## Goals for This Week
 
-- Refine the RAG model output format for improved downstream use.
-- Implement part-of-speech tagging to enrich context awareness in RAG retrieval.
-- Reduce chunk size for more precise retrieval based on mentor feedback.
-- Begin testing the RAG model with real-world queries.
+- **Refactor RAG model output** to a structured dictionary format that includes part-of-speech (POS) tagging.
+- **Optimize AST-based chunking** by limiting code context to 5 lines above and below translation usage, per mentor feedback.
+- **Begin functional testing** of the updated RAG pipeline on real-world translation queries.
 
 ---
 
-## This Weeks Achievements
+## This Week's Achievements
 
-1. **Enhanced RAG Output Format**
-- Updated the RAG model to return results in a dictionary structure.
-- Included part-of-speech information for each translation unit, enabling more nuanced context retrieval.
+1. **RAG Output Enhancement**
+- Refactored the Retrieval-Augmented Generation model to return results as structured dictionaries.
+- Each entry now includes `msgid`, `msgstr`, source metadata, and the dominant part of speech, improving retrieval relevance.
 
-2. **Chunk Optimization**
-- Adjusted AST-based code chunking logic to include only 5 lines above and below the relevant translation call.
-- This change was implemented based on feedback from mentor Walter during a sync-up meeting.
-- The refined chunk size improves focus and reduces noise in context matching.
+2. **Code Chunking Optimization**
+- Reduced each extracted code chunk to include only 5 lines above and below the relevant `msgid` usage.
+- This improves retrieval precision and avoids irrelevant surrounding code.
+- Implemented using Babel’s AST traversal logic.
 
-3. **Initial Testing of RAG Model**
-- Started testing the RAG system with real query samples from Music Blocks.
-- Observed initial improvements in contextual relevance due to enriched metadata and refined chunks.
+3. **Initial Model Testing**
+- Started testing the RAG model using sample translation queries.
+- Observed noticeable improvements in answer context relevance due to cleaner chunks and richer metadata.
 
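Review note: the structured entry described under "RAG Output Enhancement" could look roughly like the sketch below. It is only an illustration under stated assumptions. The keys `msgid`, `msgstr`, and `pos` come from the post, while the layout of the `source` metadata, the function names, and the sample values are hypothetical. The POS tags are passed in as plain strings (for example the coarse tags a tagger such as spaCy produces), so the sketch does not depend on any tagging library.

```python
from collections import Counter


def dominant_pos(tags):
    """Return the most frequent POS tag, or None for an empty list.

    Ties are resolved in favour of the tag seen first, which is how
    Counter.most_common orders equal counts.
    """
    if not tags:
        return None
    return Counter(tags).most_common(1)[0][0]


def build_entry(msgid, msgstr, source_file, line, pos_tags):
    """Assemble one retrieval entry as a plain dictionary.

    The "source" sub-dictionary is an assumed shape for the
    "source metadata" mentioned in the post.
    """
    return {
        "msgid": msgid,
        "msgstr": msgstr,
        "source": {"file": source_file, "line": line},
        "pos": dominant_pos(pos_tags),
    }


# Illustrative values only; not taken from the Music Blocks catalog.
entry = build_entry(
    msgid="note value",
    msgstr="valor de nota",
    source_file="example.js",
    line=42,
    pos_tags=["NOUN", "VERB", "NOUN"],
)
print(entry["pos"])
```

Keeping the dominant tag as a single string keeps the entry easy to filter on at retrieval time; storing the full tag list instead would be an equally reasonable choice.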
 ---
 
 ## Challenges & How I Overcame Them
 
-- **Challenge:** Integrating part-of-speech tagging meaningfully into the RAG pipeline.
-**Solution:** Created a structured dictionary-based output that includes the msgid, msgstr, pos, and source metadata for every entry.
+- **Challenge:** Integrating POS tagging meaningfully into the RAG data pipeline.
+**Solution:** Designed a dictionary schema that includes the part-of-speech alongside translation metadata, and verified correctness using test entries.
 
-- **Challenge:** Deciding optimal chunk boundaries without losing semantic context.
-**Solution:** Followed mentor advice to use 5-line windows above and below relevant code, then verified accuracy by manual testing.
+- **Challenge:** Tuning chunk granularity without losing contextual utility.
+**Solution:** Followed mentor Walter’s advice to use fixed ±5 line windows, and manually verified semantic coherence of resulting chunks.
 
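Review note: the fixed ±5 line window can be sketched as follows. The post locates each usage through Babel's AST traversal; this sketch deliberately swaps that for a plain substring search so it stays self-contained, and it assumes translation calls are written as `_("...")` or `_('...')`. The function names and the shape of the returned dictionary are hypothetical.

```python
def context_window(lines, hit_index, radius=5):
    """Return the lines within `radius` of `hit_index`, clamped to the file."""
    start = max(0, hit_index - radius)
    end = min(len(lines), hit_index + radius + 1)
    return {
        "start_line": start + 1,  # 1-based, inclusive
        "end_line": end,          # 1-based, inclusive
        "text": "\n".join(lines[start:end]),
    }


def chunks_for_msgid(source, msgid, radius=5):
    """Find every line that uses `msgid` and cut a window around it.

    A substring match stands in for the AST lookup described in the post.
    """
    lines = source.splitlines()
    needles = (f'_("{msgid}")', f"_('{msgid}')")
    return [
        context_window(lines, i, radius)
        for i, line in enumerate(lines)
        if any(needle in line for needle in needles)
    ]


# A 20-line dummy file with one translation call on line 10.
dummy = "\n".join(f"x{i}" for i in range(1, 21)).replace("x10", 'say(_("Play"))')
chunks = chunks_for_msgid(dummy, "Play")
print(chunks[0]["start_line"], chunks[0]["end_line"])
```

Clamping at the file boundaries means a usage near the top or bottom of a file simply yields a shorter chunk rather than an error.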
 ---
 
 ## Key Learnings
 
-- Better metadata, such as part-of-speech labels, can significantly improve the performance of retrieval-augmented models.
-- Small refinements in chunk size and structure can lead to clearer, more actionable context.
-- Collaborative iteration with mentor input is crucial in aligning technical decisions with practical outcomes.
+- Part-of-speech tagging can significantly improve the contextual strength of retrieved translations.
+- Smaller, focused code chunks often result in better retrieval precision for RAG applications.
+- Mentor feedback and collaborative iteration are key to refining both code structure and user outcomes.
 
 ---
 
-## Next Weeks Roadmap
+## Next Week's Roadmap
 
-- Integrate the refined RAG model into the full translation flow in Music Blocks.
-- Evaluate RAG accuracy with various translation strings, particularly ambiguous or reused ones.
-- Continue improving the fallback logic for missing translations using AI suggestions.
+- Integrate POS-tagged RAG responses into the full i18n fallback translation pipeline.
+- Expand test coverage to include edge-case translations and re-used `msgid`s.
+- Prepare an internal demo to show RAG-powered retrieval resolving contextually ambiguous translation strings.
 
 ---
 
 ## Resources & References
 
-- **Music Blocks Repository:** [github.com/your-org/musicblocks](https://github.com/your-org/musicblocks)
-- **Babel AST Docs:** https://babeljs.io/docs/en/babel-parser
-- **Part-of-Speech Tagging (spaCy):** https://spacy.io/usage/linguistic-features#pos-tagging
-- **RAG Model Concepts:** https://arxiv.org/abs/2005.11401
+- **Repository:** [github.com/sugarlabs/musicblocks](https://github.com/sugarlabs/musicblocks)
+- **RAG Concepts:** [arxiv.org/abs/2005.11401](https://arxiv.org/abs/2005.11401)
+- **Babel Parser Docs:** [babeljs.io/docs/en/babel-parser](https://babeljs.io/docs/en/babel-parser)
+- **spaCy POS Tagging:** [spacy.io/usage/linguistic-features#pos-tagging](https://spacy.io/usage/linguistic-features#pos-tagging)
 
 ---
 
 ## Acknowledgments
 
-Thanks to my mentor Walter Bender for his continued feedback and suggestions to improve retrieval relevance and model usability.
+Thanks to my mentor Walter Bender for his guidance on optimizing chunking strategy and enriching the retrieval logic with linguistic features.
 
 ---
 