urdu-tts

Generative Urdu Speech Synthesis

License: MIT | Inference in Colab

Overview

This repository provides the official implementation of our paper, “Generative Urdu Speech Synthesis”, published in IEEE conference proceedings in 2024.

📄 Paper

IEEE Xplore: https://ieeexplore.ieee.org/document/10795832

DOI: 10.1109/ICCS62594.2024.10795832

For suggestions, feel free to email me at: ahanzala[dot]cs[at]gmail[dot]com

Abstract

In recent years, Natural Language Processing (NLP) and speech synthesis have made significant progress, resulting in advanced Text-to-Speech (TTS) systems for a variety of applications. While many TTS models excel at synthesizing English speech, their adaptability to new languages and diverse accents remains a challenging area of exploration. Urdu is spoken by millions of people around the globe, especially in South Asia. Existing TTS models focus mainly on English and Chinese, with minimal attention to Urdu and other low-resource languages. In this paper, we propose a generative Urdu TTS system. We also investigate the challenges associated with Urdu speech synthesis and evaluate the capabilities of Tortoise-TTS, a TTS model inspired by the DALL-E architecture, when applied to non-English languages, with a primary focus on Urdu.