Commit 496be5e

Add DMP '25 Week 12 blog: context-aware Arabic translation pipeline
1 parent a868738 commit 496be5e

1 file changed, with 36 additions and 58 deletions

@@ -1,93 +1,71 @@
11
---
2-
title: "DMP '25 Final Report by Aman Chadha"
3-
excerpt: "Concluding my DMP '25 project: Migrating Music Blocks’ localization from webL10n.js to i18next and building an AI-assisted translation system with contextual support."
2+
title: "DMP '25 Weekly Update: Context-Aware AI Translation Pipeline"
3+
excerpt: "Final week progress: generating the full context data, testing the AI-assisted translation pipeline, and producing Arabic translations using the Google Translate API."
44
category: "DEVELOPER NEWS"
55
date: "2025-09-08"
6-
slug: "2025-09-08-dmp-25-aman-chadha-final"
6+
slug: "2025-09-08-dmp-25-weekly-update-aman-chadha"
77
author: "@/constants/MarkdownFiles/authors/aman-chadha.md"
8-
tags: "dmp25,sugarlabs,finalreport,aman-chadha"
8+
tags: "dmp25,sugarlabs,weeklyupdate,aman-chadha"
99
image: "assets/Images/c4gt_DMP.png"
1010
---
1111

12-
<!-- markdownlint-disable -->
13-
14-
# Final Report by Aman Chadha
12+
# Weekly Update: AI-Assisted Translation Pipeline Progress
1513

1614
**Project:** [JS Internationalization with AI Translation Support](https://github.com/sugarlabs/musicblocks/pull/4731)
1715
**Mentors:** [Walter Bender](https://github.com/walterbender), [Devin Ulibarri](https://github.com/devinulibarri)
18-
**Duration:** July – September 2025
16+
**Week:** September 8 – September 14, 2025
1917

2018
---
2119

22-
## Project Overview
20+
## Full Context Extraction Completed
2321

24-
The aim of this project was to modernize the **internationalization (i18n) system** of Music Blocks by migrating from the legacy `webL10n.js` framework to **i18next**, and to introduce an **AI-assisted translation workflow** to reduce the burden on human translators.
22+
This week, I finalized the **generation of the full context file** for Music Blocks UI strings. Using my **RAG (Retrieval-Augmented Generation) model**, I extracted:
23+
- All `msgid`s from `.po` files.
24+
- The 5 lines of code above and below each occurrence of the string.
25+
- Any developer comments related to the string.
2526

26-
Key goals included:
27-
- Making the i18n workflow **cleaner, modular, and maintainable**.
28-
- Automating missing translations with **AI + context-awareness**.
29-
- Supporting community-driven refinements of translations.
27+
All extracted snippets were consolidated into a single **JSON file**, with each entry recording its source file, line numbers, and code snippet. This file is now ready for semantic indexing in **ChromaDB**, enabling fast, context-aware retrieval during translation.
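As a rough illustration of this extraction step (not the project's actual script), the Python sketch below pulls single-line `msgid`s from `.po` text, finds each one as a quoted literal in the source files, and records a window of surrounding lines. The function names (`extract_msgids`, `collect_context`, `build_context`), the record field names, and the "lines starting with `//` are developer comments" heuristic are all assumptions made for the example.

```python
import json
import re

# Matches single-line entries such as: msgid "duck"
MSGID_RE = re.compile(r'^msgid\s+"(.*)"\s*$')


def extract_msgids(po_text):
    """Return non-empty single-line msgids (multi-line msgids are skipped in this sketch)."""
    ids = []
    for line in po_text.splitlines():
        match = MSGID_RE.match(line.strip())
        if match and match.group(1):  # the empty header msgid "" is ignored
            ids.append(match.group(1))
    return ids


def collect_context(msgid, sources, window=5):
    """Find msgid as a quoted literal in sources (path -> text) and capture nearby lines."""
    needles = (f'"{msgid}"', f"'{msgid}'")
    records = []
    for path, text in sources.items():
        lines = text.splitlines()
        for i, line in enumerate(lines):
            if not any(needle in line for needle in needles):
                continue
            start = max(0, i - window)
            end = min(len(lines), i + window + 1)
            snippet = lines[start:end]
            records.append({
                "msgid": msgid,
                "source_file": path,
                "line": i + 1,            # 1-based line of the match
                "start_line": start + 1,  # first line of the snippet
                "end_line": end,          # last line of the snippet
                "snippet": "\n".join(snippet),
                # Assumption: developer comments are lines starting with //
                "comments": [s.strip() for s in snippet if s.strip().startswith("//")],
            })
    return records


def build_context(po_text, sources, window=5):
    """Build the full list of context records for every msgid in the .po text."""
    return [
        record
        for msgid in extract_msgids(po_text)
        for record in collect_context(msgid, sources, window)
    ]
```

The resulting list can be written out with `json.dump(records, fh, indent=2)`, giving one flat JSON file that a later indexing step can load.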
3028

3129
---
3230

33-
## Achievements
34-
35-
1. **Migration from webL10n.js to i18next**
36-
- Replaced outdated framework with i18next for modern i18n support.
37-
- Added flexible fallback strategies (cleaned text, lowercase, title case, hyphenated).
38-
- Enabled JSON-based translation files for better maintainability.
39-
40-
2. **AI-Assisted Translation System**
41-
- Designed a pipeline to parse `.po` files and extract `msgid`s.
42-
- Generated **context automatically** by analyzing where each string occurs in the codebase.
43-
- Built a **RAG (Retrieval-Augmented Generation) model** to store these contexts.
44-
- Integrated **Google Translate API** to auto-fill missing translations using this context.
31+
## Testing the AI-Assisted Translation Pipeline
4532

46-
3. **Contributor-Friendly Workflow**
47-
- Human translators can review AI-generated suggestions instead of starting from scratch.
48-
- New language files can be created or updated automatically.
49-
- Significantly lowers the barrier for contributors to help Music Blocks reach more learners.
33+
With the context data ready, I started testing **language and translation models**:
34+
- Generated explanations of UI strings using a **language model run through Ollama**.
35+
- Integrated **Google Translate API** to auto-fill missing translations, producing the first **Arabic `.po` file** for Music Blocks.
5036

51-
---
37+
### Why Google Translate?
5238

53-
## Challenges & Solutions
39+
While evaluating translation options, I considered open-source alternatives like **LibreTranslate**:
40+
- **LibreTranslate / Argos Translate**: free and open-source.
41+
- **Drawback:** Translates word-by-word without understanding the surrounding code context, which can cause ambiguous translations for strings with multiple meanings (e.g., “duck” for pitch vs. volume).
5442

55-
- **Challenge:** Extracting meaningful context for each translation key.
56-
**Solution:** Implemented a RAG approach that links `msgid`s to their source code usage.
43+
I chose the **Google Translate API** because it:
44+
- Handles context better in practical UI usage.
45+
- Integrates seamlessly with Python scripts and the RAG pipeline.
46+
- Produces more accurate and human-readable translations for the Arabic `.po` file.
5747

58-
- **Challenge:** Ensuring migration didn’t break existing functionality.
59-
**Solution:** Incremental testing with sample `.json` files and stepwise replacement of webL10n.
48+
The pipeline is designed to be **pluggable**, so other translation APIs like DeepL or OpenAI can be added in the future.
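One way such a pluggable design could look is sketched below in Python. This is an illustrative assumption, not the pipeline's real code: `register_backend`, `fill_missing`, and the `echo` placeholder backend are hypothetical names, and a real backend would wrap the provider's own client (not shown here).

```python
from typing import Callable, Dict, Optional

# A backend takes (text, target_lang, context) and returns the translated text.
Translator = Callable[[str, str, str], str]

_BACKENDS: Dict[str, Translator] = {}


def register_backend(name: str):
    """Decorator that registers a translation backend under a short name."""
    def decorator(fn: Translator) -> Translator:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register_backend("echo")
def echo_backend(text: str, target_lang: str, context: str = "") -> str:
    """Placeholder backend for dry runs: tags the text instead of translating it."""
    return f"[{target_lang}] {text}"


def fill_missing(
    entries: Dict[str, str],
    target_lang: str,
    backend: str,
    contexts: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of msgid -> msgstr with empty msgstrs filled by the chosen backend.

    Existing translations are left untouched; contexts maps msgid -> retrieved context.
    """
    translate = _BACKENDS[backend]
    contexts = contexts or {}
    return {
        msgid: msgstr or translate(msgid, target_lang, contexts.get(msgid, ""))
        for msgid, msgstr in entries.items()
    }
```

With this shape, swapping providers means registering another function under a new name and passing that name to `fill_missing`; the rest of the pipeline does not change. For instance, `fill_missing({"duck": ""}, "ar", "echo")` exercises the flow end to end without calling any external service.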
6049

6150
---
6251

63-
## Key Learnings
52+
## Outcome This Week
6453

65-
- Infrastructure-level improvements (like i18n) may not be flashy, but they **unlock global accessibility**.
66-
- Context is essential for high-quality translations — raw machine translation alone isn’t enough.
67-
- Clean migration strategy + thorough testing makes adoption smoother for the community.
54+
- Successfully generated **full context JSON** for all UI strings.
55+
- Tested retrieval and translation pipeline with Arabic.
56+
- Generated a **working Arabic `.po` file** with contextual translations.
57+
- Verified that translations respect the intended meaning of each UI string, reducing ambiguity for end-users.
6858

6959
---
7060

71-
## Future Work
61+
## Next Steps
7262

73-
- Add support for more AI translation providers (e.g., DeepL, OpenAI).
74-
- Build a simple **web-based review UI** for translators to accept/refine AI suggestions.
75-
- Automate detection of new/changed strings via GitHub Actions and update translation files dynamically.
63+
- Complete remaining `.po` files for additional languages (Japanese, Hindi).
64+
- Integrate translation workflow with the Music Blocks repository via **PRs**.
65+
- Build a simple **web-based review UI** for translators to refine AI suggestions.
7666

7767
---
7868

79-
## Closing Thoughts
80-
81-
This project was a unique opportunity to contribute infrastructure that strengthens Music Blocks for a **global user base**. By combining **modern i18n practices with AI translation support**, I hope this work helps learners worldwide access Music Blocks in their own languages with greater ease.
82-
83-
I’m deeply grateful to my mentors Walter and Devin, and to the Sugar Labs community, for their support and guidance throughout this journey.
69+
## Reflection
8470

85-
---
86-
87-
## Resources & References
88-
89-
- **Music Blocks Repository:** [github.com/sugarlabs/musicblocks](https://github.com/sugarlabs/musicblocks)
90-
- **GitHub PR:** [#4731](https://github.com/sugarlabs/musicblocks/pull/4731)
91-
- **i18next Documentation:** [i18next.com](https://www.i18next.com/)
92-
93-
---
71+
This week was crucial in validating the RAG-based pipeline and demonstrating the **value of context-aware AI translations**. Choosing the right translation model and generating comprehensive context together lay the foundation for scaling Music Blocks’ localization to many more languages, increasing accessibility and engagement for learners worldwide.
