|
9 | 9 |
|
10 | 10 |
|
11 | 11 | <h2>Overview</h2> |
12 | | -<p> This is a preview of the paper "Generative Urdu Speech Synthesis". All the weights are opensourced <a href="https://huggingface.co/zohann/urdu-tts">here.</a></p> |
13 | | -For any suggestions feel free to email me at: ahanzala[dot]cs[at]gmail[dot]com |
14 | | - |
| 12 | +This repository provides the official implementation of our paper: |
| 13 | +"Generative Urdu Speech Synthesis" |
| 14 | +Published in the IEEE Conference Proceedings, 2024. |
15 | 15 |
|
| 16 | +## 📄 Paper |
| 17 | +IEEE Xplore: [https://ieeexplore.ieee.org/document/10795832](https://ieeexplore.ieee.org/document/10795832) |
16 | 18 |
|
17 | | -<h2> Audio Samples </h2> |
| 19 | +DOI: 10.1109/ICCS62594.2024.10795832 |
18 | 20 |
|
19 | | - <table border="1"> |
20 | | - <thead> |
21 | | - <tr> |
22 | | - <th>Prompt</th> |
23 | | - <th>Audio</th> |
24 | | - </tr> |
25 | | - </thead> |
26 | | - <tbody> |
27 | | - <tr> |
28 | | - <td><pre>[English Prompt on our Urdu Model] we are testing this model for our project.</pre> </td> |
29 | | - <td> |
30 | | - <audio controls> |
31 | | - <source src="audios/english-only.wav" type="audio/wav"> |
32 | | - Your browser does not support the audio element. |
33 | | - </audio> |
34 | | - </td> |
35 | | - </tr> |
36 | | - <tr> |
37 | | - <td><pre>[English + Urdu Prompt] I'm doing good میں اچھا ہو آپ سناؤ </pre> </td> |
38 | | - <td> |
39 | | - <audio controls> |
40 | | - <source src="audios/urdu-n-english.wav" type="audio/wav"> |
41 | | - Your browser does not support the audio element. |
42 | | - </audio> |
43 | | - </td> |
44 | | - </tr> |
45 | | - <tr> |
46 | | - <td><pre> seecs ایک بہت اچھا ڈیپارٹمنٹ ہے</pre> </td> |
47 | | - <td> |
48 | | - <audio controls> |
49 | | - <source src="audios/urdu-only.mov" type="audio/wav"> |
50 | | - Your browser does not support the audio element. |
51 | | - </audio> |
52 | | - </td> |
53 | | - </tr> |
54 | | - <tr> |
55 | | - <td><pre>آپ کا نام کیا ہے؟</pre> </td> |
56 | | - <td> |
57 | | - <audio controls> |
58 | | - <source src="audios/1.wav" type="audio/wav"> |
59 | | - Your browser does not support the audio element. |
60 | | - </audio> |
61 | | - </td> |
62 | | - </tr> |
63 | | - <tr> |
64 | | - <td><pre> كيا آپ انگريزی بولتے ہیں؟</pre> </td> |
65 | | - <td> |
66 | | - <audio controls> |
67 | | - <source src="audios/2.wav" type="audio/wav"> |
68 | | - Your browser does not support the audio element. |
69 | | - </audio> |
70 | | - </td> |
71 | | - </tr> |
72 | | - <tr> |
73 | | - <td><pre> میں اردو سیکھنے کی کوشش کر رہا ہوں</pre> </td> |
74 | | - <td> |
75 | | - <audio controls> |
76 | | - <source src="audios/3.wav" type="audio/wav"> |
77 | | - Your browser does not support the audio element. |
78 | | - </audio> |
79 | | - </td> |
80 | | - </tr> |
81 | | - <tr> |
82 | | - <td><pre> آپ کہاں سے ہیں؟</pre> </td> |
83 | | - <td> |
84 | | - <audio controls> |
85 | | - <source src="audios/4.wav" type="audio/wav"> |
86 | | - Your browser does not support the audio element. |
87 | | - </audio> |
88 | | - </td> |
89 | | - </tr> |
90 | | - <tr> |
91 | | - <td><pre> آپ سے مل کر خوشی ہوئی</pre> </td> |
92 | | - <td> |
93 | | - <audio controls> |
94 | | - <source src="audios/5.wav" type="audio/wav"> |
95 | | - Your browser does not support the audio element. |
96 | | - </audio> |
97 | | - </td> |
98 | | - </tr> |
99 | | - <tr> |
100 | | - <td><pre>!یہ مجھے بہت پَسند آیا</pre> </td> |
101 | | - <td> |
102 | | - <audio controls> |
103 | | - <source src="audios/7.wav" type="audio/wav"> |
104 | | - Your browser does not support the audio element. |
105 | | - </audio> |
106 | | - </td> |
107 | | - </tr> |
108 | | - </tbody> |
109 | | - </table> |
110 | | - |
111 | | -Adding more and more soon.. |
112 | 21 |
|
| 22 | +For any suggestions, feel free to email me at: ahanzala[dot]cs[at]gmail[dot]com |
113 | 23 |
|
114 | | -<h2>Reference</h2> |
115 | | -<ul> |
116 | | - <li> https://github.com/152334H/DL-Art-School</li> |
117 | | - <li> https://github.com/neonbjb/tortoise-tts</li> |
118 | | - </ul> |
119 | | -<h2>License</h2> |
120 | | - This project is licensed under the MIT License. Feel free to use and modify the code according to your needs. |
| 24 | +<h2>Abstract</h2> |
| 25 | +In recent years, Natural Language Processing (NLP) and speech synthesis have witnessed significant progress, resulting in the development of advanced Text-to-Speech (TTS) systems for various applications. While many TTS models excel in synthesizing English speech, their adaptability to new languages and diverse accents remains a challenging area of exploration. Urdu is a language spoken by millions of people around the globe, especially in South Asia. Existing TTS models focus mainly on English and Chinese, with minimal focus on Urdu and other low-resource languages. In this paper, we propose a generative Urdu TTS system. This research also undertakes a comprehensive investigation into the challenges associated with Urdu speech synthesis and evaluates the capabilities of Tortoise-TTS, a TTS model inspired by the DALL-E architecture, when applied to non-English languages, with a primary focus on Urdu. |