Commit 4b16068

Update README.md
1 parent 80edcc6 commit 4b16068

1 file changed

+9
-104
lines changed

README.md

Lines changed: 9 additions & 104 deletions
@@ -9,112 +9,17 @@
99

1010

1111
<h2>Overview</h2>
12-
<p> This is a preview of the paper "Generative Urdu Speech Synthesis". All the weights are opensourced <a href="https://huggingface.co/zohann/urdu-tts">here.</a></p>
13-
For any suggestions feel free to email me at: ahanzala[dot]cs[at]gmail[dot]com
14-
12+
This repository provides the official implementation of our paper:
13+
"Generative Urdu Speech Synthesis"
14+
Published in the IEEE Conference Proceedings, 2024.
1515

16+
## 📄 Paper
17+
IEEE Xplore: [https://ieeexplore.ieee.org/document/10795832](https://ieeexplore.ieee.org/document/10795832)
1618

17-
<h2> Audio Samples </h2>
19+
DOI: 10.1109/ICCS62594.2024.10795832
1820

19-
<table border="1">
20-
<thead>
21-
<tr>
22-
<th>Prompt</th>
23-
<th>Audio</th>
24-
</tr>
25-
</thead>
26-
<tbody>
27-
<tr>
28-
<td><pre>[English Prompt on our Urdu Model] we are testing this model for our project.</pre> </td>
29-
<td>
30-
<audio controls>
31-
<source src="audios/english-only.wav" type="audio/wav">
32-
Your browser does not support the audio element.
33-
</audio>
34-
</td>
35-
</tr>
36-
<tr>
37-
<td><pre>[English + Urdu Prompt] I'm doing good میں اچھا ہو آپ سناؤ </pre> </td>
38-
<td>
39-
<audio controls>
40-
<source src="audios/urdu-n-english.wav" type="audio/wav">
41-
Your browser does not support the audio element.
42-
</audio>
43-
</td>
44-
</tr>
45-
<tr>
46-
<td><pre> seecs ایک بہت اچھا ڈیپارٹمنٹ ہے</pre> </td>
47-
<td>
48-
<audio controls>
49-
<source src="audios/urdu-only.mov" type="audio/wav">
50-
Your browser does not support the audio element.
51-
</audio>
52-
</td>
53-
</tr>
54-
<tr>
55-
<td><pre>آپ کا نام کیا ہے؟</pre> </td>
56-
<td>
57-
<audio controls>
58-
<source src="audios/1.wav" type="audio/wav">
59-
Your browser does not support the audio element.
60-
</audio>
61-
</td>
62-
</tr>
63-
<tr>
64-
<td><pre> كيا آپ انگريزی بولتے ہیں؟</pre> </td>
65-
<td>
66-
<audio controls>
67-
<source src="audios/2.wav" type="audio/wav">
68-
Your browser does not support the audio element.
69-
</audio>
70-
</td>
71-
</tr>
72-
<tr>
73-
<td><pre> میں اردو سیکھنے کی کوشش کر رہا ہوں</pre> </td>
74-
<td>
75-
<audio controls>
76-
<source src="audios/3.wav" type="audio/wav">
77-
Your browser does not support the audio element.
78-
</audio>
79-
</td>
80-
</tr>
81-
<tr>
82-
<td><pre> آپ کہاں سے ہیں؟</pre> </td>
83-
<td>
84-
<audio controls>
85-
<source src="audios/4.wav" type="audio/wav">
86-
Your browser does not support the audio element.
87-
</audio>
88-
</td>
89-
</tr>
90-
<tr>
91-
<td><pre> آپ سے مل کر خوشی ہوئی</pre> </td>
92-
<td>
93-
<audio controls>
94-
<source src="audios/5.wav" type="audio/wav">
95-
Your browser does not support the audio element.
96-
</audio>
97-
</td>
98-
</tr>
99-
<tr>
100-
<td><pre>!یہ مجھے بہت پَسند آیا</pre> </td>
101-
<td>
102-
<audio controls>
103-
<source src="audios/7.wav" type="audio/wav">
104-
Your browser does not support the audio element.
105-
</audio>
106-
</td>
107-
</tr>
108-
</tbody>
109-
</table>
110-
111-
Adding more and more soon..
11221

22+
For any suggestions feel free to email me at: ahanzala[dot]cs[at]gmail[dot]com
11323

114-
<h2>Reference</h2>
115-
<ul>
116-
<li> https://github.com/152334H/DL-Art-School</li>
117-
<li> https://github.com/neonbjb/tortoise-tts</li>
118-
</ul>
119-
<h2>License</h2>
120-
This project is licensed under the MIT License. Feel free to use and modify the code according to your needs.
24+
<h2>Abstract</h2>
25+
In recent years, Natural Language Processing (NLP) and speech synthesis have witnessed significant progress, resulting in the development of advanced Text-to-Speech (TTS) systems for various applications. While many TTS models excel in synthesizing English speech, their adaptability to new languages and diverse accents remains a challenging area of exploration. Urdu is a language spoken by millions of people around the globe, especially in South Asia. Existing TTS models focus mainly on English and Chinese, with minimal attention to Urdu and other low-resource languages. In this paper, we propose a generative Urdu TTS system. This research also undertakes a comprehensive investigation into the challenges associated with Urdu speech synthesis and evaluates the capabilities of Tortoise-TTS, a TTS model inspired by the DALL-E architecture, when applied to non-English languages, with a primary focus on Urdu.
