Commit e7028ed

GSoC 25 week 8 update by Elwin Li (sugarlabs#340)
1 parent bf86793 commit e7028ed

1 file changed: +113 -0 lines changed

@@ -0,0 +1,113 @@
---
title: "GSoC '25 Week 8 Update by Elwin Li"
excerpt: "MusicBlocks generation model"
category: "DEVELOPER NEWS"
date: "2025-07-26"
slug: "2025-07-26-gsoc-25-Elwin-Li-week08"
author: "@/constants/MarkdownFiles/authors/elwin-li.md"
tags: "gsoc25,sugarlabs,week8,music generation,RAG"
image: "assets/Images/GSOC.png"
---

<!-- markdownlint-disable -->

# Week 8 Progress Report by Elwin Li

**Project:** MusicBlocks Generation Model

**Mentors:** [Walter Bender](https://github.com/walterbender), [Anindya Kundu](https://github.com/meganindya), [Devin Ulibarri](https://github.com/pikurasa)

**Reporting Period:** 2025-07-19 - 2025-07-26

---

## Goals for This Week

- **Goal:** Generate MIDI from a prompt for the MusicBlocks generation model

---

## This Week’s Achievements

Last week, I pivoted from fine-tuning a model to building a RAG pipeline. This week, I finished building a pipeline that takes a prompt naming a song, artist, or music style and generates a MIDI note sequence in a similar style.

The pipeline has five stages:

1. **Data Collection & Cleaning**: Found and cleaned a large dataset of MIDI files to use as the foundation for the generation model.

2. **Metadata Extraction**: Extracted metadata from each MIDI file, including:
   - Artist name
   - Song title
   - Musical style/genre
   - BPM (beats per minute)
   - Additional musical characteristics

   This step proved crucial for improving the retrieval accuracy of the RAG pipeline.

3. **Vector Embedding**: Used LangChain to:
   - Create embeddings of the MIDI data and metadata
   - Store the embeddings in a vector database

   This forms the "Retrieval" component of the RAG system.

4. **Similarity Search**: When a user enters a prompt (e.g., "hotel california"), the system:
   - Runs a similarity search between the query and the vector database
   - Returns the exact matching song if it is present in the dataset
   - Otherwise returns similar songs based on musical characteristics

5. **Generation Pipeline**: Using the retrieved MIDI representation, the system:
   - Sends it to the Gemini API with carefully engineered prompts
   - Generates new melodies that maintain similar musical characteristics
   - Outputs new MIDI files that capture the style of the requested song
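
The retrieval and prompt-building stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the post uses LangChain, a vector database, and the Gemini API, whereas this sketch swaps in a dependency-free bag-of-words embedding with cosine similarity and only builds the prompt text instead of calling a model. The catalogue entries, field names, and helpers (`to_document`, `embed`, `retrieve`, `build_prompt`) are hypothetical, all values are placeholders, and the query plays the role of a prompt such as "hotel california".

```python
import math
import re
from collections import Counter

# Hypothetical catalogue: fictional songs with placeholder metadata and notes.
SONGS = [
    {"title": "Example Waltz", "artist": "Placeholder Trio", "genre": "classical",
     "bpm": 90, "notes": "C4 E4 G4 E4 C4 G3"},
    {"title": "Sample Groove", "artist": "Demo Band", "genre": "funk",
     "bpm": 110, "notes": "E3 G3 A3 G3 E3 D3"},
    {"title": "Test Ballad", "artist": "Mock Singers", "genre": "pop",
     "bpm": 70, "notes": "A3 C4 E4 C4 A3 E3"},
]


def to_document(song):
    """Flatten metadata and notes into one text string to embed."""
    return (f"title: {song['title']} artist: {song['artist']} "
            f"genre: {song['genre']} bpm: {song['bpm']} notes: {song['notes']}")


def embed(text):
    """Toy embedding: lowercase token counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9#]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=1):
    """Return the k songs whose documents are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [song for song, _ in ranked[:k]]


def build_prompt(query, song):
    """Augment the user's request with the retrieved song (the 'A' in RAG)."""
    return (
        f"The user asked for music like: {query}\n"
        f"Reference song: {song['title']} by {song['artist']} "
        f"({song['genre']}, {song['bpm']} BPM)\n"
        f"Reference notes: {song['notes']}\n"
        "Write a new melody in a similar style as a space-separated note sequence."
    )


# The in-memory list stands in for the vector database.
INDEX = [(song, embed(to_document(song))) for song in SONGS]

best = retrieve("example waltz", INDEX)[0]
prompt = build_prompt("example waltz", best)
print(prompt)
```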

---

## Challenges & How I Overcame Them

- **Challenge:** The available dataset was too small for effective fine-tuning.

  **Solution:** Shifted focus to Retrieval-Augmented Generation (RAG) as an alternative approach.

- **Challenge:** Some MIDI files in the dataset were corrupted or had formatting issues.

  **Solution:** Implemented thorough data cleaning and validation:
  - Checked for proper MIDI file structure
  - Removed corrupted or malformed files
  - Validated tempo and time signature information
  - Ensured consistent formatting across the dataset

- **Challenge:** Initial attempts at embedding raw MIDI data resulted in poor retrieval accuracy.

  **Solution:** Enhanced the embedding process by:
  - Including rich metadata alongside MIDI data
  - Adding musical characteristics like genre, tempo, and key
  - Incorporating artist and song information

  This significantly improved the relevance of retrieved results.
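
The structural check from the data-cleaning step could look like the sketch below. It relies only on the Standard MIDI File layout (an `MThd` header chunk followed by `MTrk` track chunks, each carrying a big-endian length) and does not reproduce the author's actual cleaning code. The function name `is_valid_midi` is hypothetical, and tempo or time-signature validation is not covered here.

```python
import struct


def is_valid_midi(data: bytes) -> bool:
    """Return True if `data` has a well-formed MIDI header and complete track chunks."""
    # Header chunk: "MThd", 4-byte length, then format, track count, division.
    if len(data) < 14 or data[:4] != b"MThd":
        return False
    header_len, fmt, n_tracks, division = struct.unpack(">IHHH", data[4:14])
    if header_len < 6 or fmt not in (0, 1, 2) or n_tracks == 0 or division == 0:
        return False
    if fmt == 0 and n_tracks != 1:  # format 0 files hold exactly one track
        return False

    # Walk the declared track chunks and make sure each one fits in the file.
    pos = 8 + header_len
    for _ in range(n_tracks):
        if pos + 8 > len(data) or data[pos:pos + 4] != b"MTrk":
            return False
        (track_len,) = struct.unpack(">I", data[pos + 4:pos + 8])
        pos += 8 + track_len
        if pos > len(data):
            return False
    return True


# Smallest useful example: one track that contains only an End of Track event.
minimal = (
    b"MThd" + struct.pack(">IHHH", 6, 0, 1, 480)
    + b"MTrk" + struct.pack(">I", 4) + b"\x00\xff\x2f\x00"
)
print(is_valid_midi(minimal))       # True
print(is_valid_midi(minimal[:-2]))  # False: the track chunk is truncated
```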

---

## Key Learnings

- **RAG as an Alternative to Fine-tuning**: RAG can be an effective approach when training data is limited, because it retrieves existing examples at generation time instead of requiring extensive fine-tuning.

- **Data Quality is Critical**: Thorough data preprocessing and validation are essential for building robust ML systems. Poor-quality data can significantly degrade system performance.

- **Embedding Strategy Matters**: The choice of what information to include in embeddings greatly affects retrieval accuracy. Including rich metadata alongside raw data can substantially improve results.

- **MIDI Data Handling**: Gained practical experience in:
  - Working with MIDI file formats
  - Handling corrupted files
  - Extracting musical characteristics
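
Extracting a musical characteristic such as BPM can be illustrated with the MIDI Set Tempo meta event (`FF 51 03` followed by a 24-bit microseconds-per-quarter-note value), where BPM = 60,000,000 / microseconds per quarter note. The sketch below is an assumption about how such extraction might work, not the author's code: `first_tempo_bpm` is a hypothetical helper, and the naive byte search stands in for a real MIDI parser, so it can misfire on data that happens to contain the same bytes.

```python
SET_TEMPO = b"\xff\x51\x03"        # meta event: status FF, type 51, length 03
DEFAULT_US_PER_QUARTER = 500_000   # MIDI default tempo (120 BPM) when no event is present
MICROSECONDS_PER_MINUTE = 60_000_000


def first_tempo_bpm(track_bytes: bytes) -> float:
    """Return the BPM of the first Set Tempo event, or the MIDI default of 120."""
    us_per_quarter = DEFAULT_US_PER_QUARTER
    i = track_bytes.find(SET_TEMPO)
    if i != -1 and i + 6 <= len(track_bytes):
        # The three bytes after the event header are a big-endian tempo value.
        value = int.from_bytes(track_bytes[i + 3:i + 6], "big")
        if value > 0:
            us_per_quarter = value
    return MICROSECONDS_PER_MINUTE / us_per_quarter


# Delta time 0, Set Tempo of 500,000 microseconds per quarter note, End of Track.
example_track = b"\x00" + SET_TEMPO + (500_000).to_bytes(3, "big") + b"\x00\xff\x2f\x00"
print(first_tempo_bpm(example_track))  # 120.0
```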

---

## Next Week’s Roadmap

- Improve output quality
- Documentation and testing
- Use the Gemini embedding model

---

## Acknowledgments

Thank you to my mentors, the Sugar Labs community, and fellow GSoC contributors for their ongoing support.

---
