This project fine-tunes Meta’s NLLB (No Language Left Behind) model with Parameter-Efficient Fine-Tuning (PEFT) on the Menyo-20k dataset to improve translation quality for African languages. Training ran until early stopping triggered, to prevent overfitting.
- Model: facebook/nllb-200-distilled-600M
- Dataset: Menyo-20k — a multi-domain parallel corpus for English–Yorùbá translation
- Goal: Enhance NLLB performance on low-resource African languages (Yoruba, Igbo, Hausa, etc.)
- Framework: Hugging Face Transformers
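As a minimal sketch of the setup above, the checkpoint's tokenizer can be loaded with FLORES-200 language codes. The `eng_Latn`/`yor_Latn` pair is an assumption based on the Menyo-20k corpus, not a value taken from the project's own scripts:

```python
from transformers import AutoTokenizer

# NLLB tokenizers take FLORES-200 language codes for source/target.
# eng_Latn -> yor_Latn is assumed here for the Menyo-20k language pair.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",
    tgt_lang="yor_Latn",
)

# text_target tokenizes the reference translation into a `labels` field,
# which is what the Seq2Seq Trainer expects for supervised fine-tuning.
batch = tokenizer("Good morning", text_target="Ẹ káàárọ̀", return_tensors="pt")
print(sorted(batch.keys()))
```

The same call pattern applies batch-wise when mapping it over the full Menyo-20k dataset.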
| Setting | Value |
|---|---|
| Batch Size | 8 |
| Learning Rate | 1e-4 |
| Gradient Accumulation Steps | 8 |
| Scheduler | Cosine |
| Epochs | 50 |
| Early Stopping | Patience = 10 |
| Environment | Kaggle (2 × T4 GPUs) |
- Preprocess and tokenize Menyo-20k using NLLB tokenizer
- Fine-tune the model with LoRA via the Hugging Face Trainer
- Apply early stopping based on validation loss
- Evaluate with BLEU and qualitative translation tests