|
1 | 1 | --- |
2 | | -title: "DMP '25 Final Report by Aman Chadha" |
3 | | -excerpt: "Concluding my DMP '25 project: Migrating Music Blocks’ localization from webL10n.js to i18next and building an AI-assisted translation system with contextual support." |
| 2 | +title: "DMP '25 Weekly Update: Context-Aware AI Translation Pipeline" |
| 3 | +excerpt: "Final week progress: Generating full context data, testing the AI-assisted translation pipeline, and producing Arabic translations using the Google Translate API." |
4 | 4 | category: "DEVELOPER NEWS" |
5 | 5 | date: "2025-09-08" |
6 | | -slug: "2025-09-08-dmp-25-aman-chadha-final" |
| 6 | +slug: "2025-09-08-dmp-25-weekly-update-aman-chadha" |
7 | 7 | author: "@/constants/MarkdownFiles/authors/aman-chadha.md" |
8 | | -tags: "dmp25,sugarlabs,finalreport,aman-chadha" |
| 8 | +tags: "dmp25,sugarlabs,weeklyupdate,aman-chadha" |
9 | 9 | image: "assets/Images/c4gt_DMP.png" |
10 | 10 | --- |
11 | 11 |
|
12 | | -<!-- markdownlint-disable --> |
13 | | - |
14 | | -# Final Report by Aman Chadha |
| 12 | +# Weekly Update: AI-Assisted Translation Pipeline Progress |
15 | 13 |
|
16 | 14 | **Project:** [JS Internationalization with AI Translation Support](https://github.com/sugarlabs/musicblocks/pull/4731) |
17 | 15 | **Mentors:** [Walter Bender](https://github.com/walterbender), [Devin Ulibarri](https://github.com/devinulibarri) |
18 | | -**Duration:** July – September 2025 |
| 16 | +**Week:** September 8 – September 14, 2025 |
19 | 17 |
|
20 | 18 | --- |
21 | 19 |
|
22 | | -## Project Overview |
| 20 | +## Full Context Extraction Completed |
23 | 21 |
|
24 | | -The aim of this project was to modernize the **internationalization (i18n) system** of Music Blocks by migrating from the legacy `webL10n.js` framework to **i18next**, and to introduce an **AI-assisted translation workflow** to reduce the burden on human translators. |
| 22 | +This week, I finalized the **generation of the full context file** for Music Blocks UI strings. Using my **RAG (Retrieval-Augmented Generation) model**, I extracted: |
| 23 | +- All `msgid`s from `.po` files. |
| 24 | +- The 5 lines of code above and below each occurrence of the string. |
| 25 | +- Any developer comments related to the string. |
25 | 26 |
|
26 | | -Key goals included: |
27 | | -- Making the i18n workflow **cleaner, modular, and maintainable**. |
28 | | -- Automating missing translations with **AI + context-awareness**. |
29 | | -- Supporting community-driven refinements of translations. |
| 27 | +All extracted snippets were consolidated into a single **JSON file** with metadata including: source file, line numbers, and code snippets. This file is now ready for semantic indexing in **ChromaDB**, enabling fast and context-aware retrieval during translation. |
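The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names (`extract_msgids`, `find_contexts`, `build_context_file`), the `ContextRecord` fields, and the rule that a string counts as "used" when it appears as a quoted literal on a line are all assumptions made for the example. Only single-line `msgid` entries are handled, and only `//` comments are collected.

```python
import json
import re
from dataclasses import dataclass, asdict

# Matches single-line entries such as: msgid "Play"
# (multi-line msgids start with an empty string and are skipped in this sketch)
MSGID_RE = re.compile(r'^msgid[ \t]+"((?:[^"\\]|\\.)*)"[ \t]*$', re.MULTILINE)


def extract_msgids(po_text: str) -> list[str]:
    """Return the non-empty single-line msgids found in a .po file's text."""
    return [m for m in MSGID_RE.findall(po_text) if m]


@dataclass
class ContextRecord:  # hypothetical record layout
    msgid: str
    source_file: str
    line: int             # 1-based line number of the occurrence
    snippet: str          # code window around the occurrence
    comments: list[str]   # "//" comment lines found inside the window


def find_contexts(msgid: str, source_file: str, source_text: str,
                  window: int = 5) -> list[ContextRecord]:
    """Collect `window` lines above and below each quoted use of msgid."""
    lines = source_text.splitlines()
    needles = (f'"{msgid}"', f"'{msgid}'")  # assumption: string literal match
    records = []
    for i, line in enumerate(lines):
        if not any(n in line for n in needles):
            continue
        start = max(0, i - window)
        end = min(len(lines), i + window + 1)
        block = lines[start:end]
        comments = [l.strip() for l in block if l.strip().startswith("//")]
        records.append(
            ContextRecord(msgid, source_file, i + 1, "\n".join(block), comments)
        )
    return records


def build_context_file(po_text: str, sources: dict[str, str],
                       window: int = 5) -> str:
    """Return a JSON document with one record per (msgid, occurrence)."""
    records = []
    for msgid in extract_msgids(po_text):
        for path, text in sources.items():
            records.extend(
                asdict(r) for r in find_contexts(msgid, path, text, window)
            )
    return json.dumps(records, indent=2, ensure_ascii=False)
```

Here `sources` maps a file path to that file's contents, so the same function works on files read from disk or on in-memory strings. The resulting JSON records carry the source file, line number, and snippet, which is the metadata mentioned above.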
30 | 28 |
|
31 | 29 | --- |
32 | 30 |
|
33 | | -## Achievements |
34 | | - |
35 | | -1. **Migration from webL10n.js to i18next** |
36 | | - - Replaced outdated framework with i18next for modern i18n support. |
37 | | - - Added flexible fallback strategies (cleaned text, lowercase, title case, hyphenated). |
38 | | - - Enabled JSON-based translation files for better maintainability. |
39 | | - |
40 | | -2. **AI-Assisted Translation System** |
41 | | - - Designed a pipeline to parse `.po` files and extract `msgid`s. |
42 | | - - Generated **context automatically** by analyzing where each string occurs in the codebase. |
43 | | - - Built a **RAG (Retrieval-Augmented Generation) model** to store these contexts. |
44 | | - - Integrated **Google Translate API** to auto-fill missing translations using this context. |
| 31 | +## Testing the AI-Assisted Translation Pipeline |
45 | 32 |
|
46 | | -3. **Contributor-Friendly Workflow** |
47 | | - - Human translators can review AI-generated suggestions instead of starting from scratch. |
48 | | - - New language files can be created or updated automatically. |
49 | | - - Significantly lowers the barrier for contributors to help Music Blocks reach more learners. |
| 33 | +With the context data ready, I started testing **translation models**: |
| 34 | +- Generated explanations of UI strings using a language model run through **Ollama**. |
| 35 | +- Integrated the **Google Translate API** to auto-fill missing translations, producing the first **Arabic `.po` file** for Music Blocks. |
50 | 36 |
|
51 | | ---- |
| 37 | +### Why Google Translate? |
52 | 38 |
|
53 | | -## Challenges & Solutions |
| 39 | +While evaluating translation options, I considered open-source alternatives like **LibreTranslate**: |
| 40 | +- **LibreTranslate / Argos Translate**: free and open-source. |
| 41 | +- **Drawback:** Translates each UI string in isolation, with no awareness of the surrounding code context, which can produce ambiguous translations for strings with multiple meanings (e.g., “duck” for pitch vs. volume). |
54 | 42 |
|
55 | | -- **Challenge:** Extracting meaningful context for each translation key. |
56 | | - **Solution:** Implemented a RAG approach that links `msgid`s to their source code usage. |
| 43 | +I chose the **Google Translate API** because it: |
| 44 | +- Handles context better in practical UI usage. |
| 45 | +- Integrates seamlessly with Python scripts and the RAG pipeline. |
| 46 | +- Produces more accurate and human-readable translations for the Arabic `.po` file. |
57 | 47 |
|
58 | | -- **Challenge:** Ensuring migration didn’t break existing functionality. |
59 | | - **Solution:** Incremental testing with sample `.json` files and stepwise replacement of webL10n. |
| 48 | +The pipeline is designed to be **pluggable**, so other translation APIs like DeepL or OpenAI can be added in the future. |
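One way such a pluggable design could be structured is a small backend registry behind a shared interface. The sketch below is an assumption for illustration only: `Translator`, `register_backend`, `get_backend`, `fill_missing`, and the offline `EchoTranslator` stand-in are hypothetical names, and no real Google Translate, DeepL, or OpenAI client call is shown.

```python
from typing import Callable, Protocol


class Translator(Protocol):
    """Interface every backend is assumed to implement (hypothetical)."""

    def translate(self, text: str, target_lang: str, context: str = "") -> str:
        ...


# Registry mapping a backend name to a zero-argument factory.
_BACKENDS: dict[str, Callable[[], Translator]] = {}


def register_backend(name: str):
    """Class decorator that adds a backend to the registry under `name`."""
    def decorator(factory):
        _BACKENDS[name] = factory
        return factory
    return decorator


def get_backend(name: str) -> Translator:
    """Instantiate a registered backend, or raise ValueError if unknown."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown translation backend: {name!r}") from None


@register_backend("echo")
class EchoTranslator:
    """Offline stand-in: tags the text instead of calling a real service."""

    def translate(self, text: str, target_lang: str, context: str = "") -> str:
        return f"[{target_lang}] {text}"


def fill_missing(entries: dict[str, str], backend: Translator,
                 target_lang: str, contexts: dict[str, str]) -> dict[str, str]:
    """Return a copy of msgid -> msgstr with empty msgstr values filled in."""
    filled = dict(entries)
    for msgid, msgstr in entries.items():
        if not msgstr:  # only touch entries that have no translation yet
            filled[msgid] = backend.translate(
                msgid, target_lang, contexts.get(msgid, "")
            )
    return filled
```

With this shape, adding another provider would mean writing one class with a `translate` method and registering it under a new name. Existing translations are left untouched, so human-reviewed strings are never overwritten by the backend.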
60 | 49 |
|
61 | 50 | --- |
62 | 51 |
|
63 | | -## Key Learnings |
| 52 | +## Outcome This Week |
64 | 53 |
|
65 | | -- Infrastructure-level improvements (like i18n) may not be flashy, but they **unlock global accessibility**. |
66 | | -- Context is essential for high-quality translations — raw machine translation alone isn’t enough. |
67 | | -- Clean migration strategy + thorough testing makes adoption smoother for the community. |
| 54 | +- Successfully generated the **full context JSON** for all UI strings. |
| 55 | +- Tested the retrieval and translation pipeline with Arabic. |
| 56 | +- Generated a **working Arabic `.po` file** with contextual translations. |
| 57 | +- Verified that translations respect the intended meaning of each UI string, reducing ambiguity for end-users. |
68 | 58 |
|
69 | 59 | --- |
70 | 60 |
|
71 | | -## Future Work |
| 61 | +## Next Steps |
72 | 62 |
|
73 | | -- Add support for more AI translation providers (e.g., DeepL, OpenAI). |
74 | | -- Build a simple **web-based review UI** for translators to accept/refine AI suggestions. |
75 | | -- Automate detection of new/changed strings via GitHub Actions and update translation files dynamically. |
| 63 | +- Complete the remaining `.po` files for additional languages (Japanese, Hindi). |
| 64 | +- Integrate the translation workflow into the Music Blocks repository via **PRs**. |
| 65 | +- Build a simple **web-based review UI** for translators to refine AI suggestions. |
76 | 66 |
|
77 | 67 | --- |
78 | 68 |
|
79 | | -## Closing Thoughts |
80 | | - |
81 | | -This project was a unique opportunity to contribute infrastructure that strengthens Music Blocks for a **global user base**. By combining **modern i18n practices with AI translation support**, I hope this work helps learners worldwide access Music Blocks in their own languages with greater ease. |
82 | | - |
83 | | -I’m deeply grateful to my mentors Walter and Devin, and to the Sugar Labs community, for their support and guidance throughout this journey. |
| 69 | +## Reflection |
84 | 70 |
|
85 | | ---- |
86 | | - |
87 | | -## Resources & References |
88 | | - |
89 | | -- **Music Blocks Repository:** [github.com/sugarlabs/musicblocks](https://github.com/sugarlabs/musicblocks) |
90 | | -- **GitHub PR:** [#4731](https://github.com/sugarlabs/musicblocks/pull/4731) |
91 | | -- **i18next Documentation:** [i18next.com](https://www.i18next.com/) |
92 | | - |
93 | | ---- |
| 71 | +This week was crucial in validating the RAG-based pipeline and demonstrating the **value of context-aware AI translations**. Choosing the right translation model and generating comprehensive context lay the foundation for scaling Music Blocks’ localization to many more languages, increasing accessibility and engagement for learners worldwide. |