Published in axinc-ai

PytorchDcTts : A Machine Learning Model for Text-to-speech Synthesis

This is an introduction to "PytorchDcTts", a machine learning model that can be used with ailia SDK. You can easily use this model, along with many other ready-to-use ailia MODELS, to create AI applications with ailia SDK.

Overview

PytorchDcTts (PyTorch Deep Convolutional Text-to-Speech) is a machine learning model released in October 2017. Given an input text, it generates an audio file of a voice reading that text.

Architecture

Recurrent Neural Networks (RNNs) are commonly used for speech synthesis, but they take a long time to train. To address this problem, PytorchDcTts builds its speech synthesis model from CNNs, which can be trained in about 15 hours on a typical gaming PC.
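To illustrate why convolutions are easier to train quickly, here is a toy sketch in plain Python. It is not the model's actual layers: the scalar recurrence, the weights, and the kernel values are all made up for the example. It only contrasts a recurrence, where each step must wait for the previous one, with a causal 1-D convolution, where every output position can be computed independently.

```python
import math

def rnn_outputs(xs, w_in=0.5, w_rec=0.9):
    """Toy scalar recurrence: each step needs the previous hidden state."""
    h = 0.0
    hs = []
    for x in xs:  # inherently sequential: step t depends on step t-1
        h = math.tanh(w_in * x + w_rec * h)
        hs.append(h)
    return hs

def causal_conv1d(xs, kernel, dilation=1):
    """Causal 1-D convolution: y[t] uses only x[t], x[t-d], x[t-2d], ..."""
    def output_at(t):
        total = 0.0
        for k, w in enumerate(kernel):
            i = t - k * dilation
            if i >= 0:  # positions before the start are treated as zero
                total += w * xs[i]
        return total
    # Each output_at(t) is independent of the others,
    # so all positions could be computed in parallel.
    return [output_at(t) for t in range(len(xs))]

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rnn_outputs(signal))
print(causal_conv1d(signal, kernel=[0.5, 0.5]))               # [0.5, 1.5, 2.5, 3.5, 4.5]
print(causal_conv1d(signal, kernel=[1.0, -1.0], dilation=2))  # [1.0, 2.0, 2.0, 2.0, 2.0]
```

The dilation parameter lets a small kernel look further back in time without adding weights, which is one common way convolutional models cover long contexts.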

Speech synthesis without deep learning relies on a complex system with multiple components, such as a text analyzer, an F0 generator, a spectrum generator, a pause estimator, and a vocoder.

With deep learning, these components can be combined into a single end-to-end model that computes the output directly from the input.

The model architecture of PytorchDcTts works as follows.

Source: https://arxiv.org/pdf/1710.08969

In the flow diagram above, TextEnc encodes the input text into vectors. Attention assigns weights that pair each part of the text with frames of the mel spectrogram. AudioDec then computes the mel spectrogram, and the SSRN (Spectrogram Super-resolution Network) is used to improve the audio quality.
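The attention step can be sketched in a few lines of plain Python. This is a minimal illustration of scaled dot-product attention in the style described in the paper linked above (a softmax over text positions of the key-query products, followed by a weighted sum of the values). It is not the code of PytorchDcTts, and the function names and toy vectors are assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(keys, values, queries):
    """keys, values: one vector per text position.
    queries: one vector per audio frame.
    Returns (weights, context): weights[t][n] is how strongly audio frame t
    attends to text position n, and context[t] is the weighted sum of values."""
    d = len(keys[0])
    weights, context = [], []
    for q in queries:
        # Scaled dot product between this frame's query and every text key.
        scores = [sum(ki * qi for ki, qi in zip(k, q)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        context.append([
            sum(w[n] * values[n][j] for n in range(len(values)))
            for j in range(len(values[0]))
        ])
    return weights, context

# Toy data: two text positions and two audio frames (made-up numbers).
keys = [[4.0, 0.0], [0.0, 4.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
queries = [[4.0, 0.0], [0.0, 4.0]]

weights, context = attend(keys, values, queries)
for t, row in enumerate(weights):
    print(f"frame {t} attends to text positions with weights {row}")
```

In this toy case the first frame attends almost entirely to the first text position and the second frame to the second, which is the roughly diagonal pattern expected when speech follows the text in order.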

Below is an example of speech synthesis. From top to bottom, the figure shows the attention matrix, the mel spectrogram, and the linear STFT spectrogram.

Source: https://arxiv.org/pdf/1710.08969

The model was trained on the LJ Speech Dataset, which consists of about 13K pairs of text and corresponding speech clips, for a total of about 24 hours of audio.
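As a quick sanity check on these figures, the average clip length follows from simple arithmetic. The snippet below uses the rounded numbers quoted above (13K clips, 24 hours), so the result is only approximate.

```python
NUM_CLIPS = 13_000   # "13K pairs", rounded figure from the text above
TOTAL_HOURS = 24     # approximate total duration from the text above

average_seconds = TOTAL_HOURS * 3600 / NUM_CLIPS
print(f"average clip length: {average_seconds:.1f} s")  # average clip length: 6.6 s
```

Each training example is therefore a short utterance of a few seconds, not a long recording.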

Usage

You can use the following command to generate a WAV file from any English text.

$ python3 pytorch-dc-tts.py -i "Hello world" -s output.wav
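To synthesize several sentences, the same command can be driven from a short Python script. The sketch below uses only the -i and -s options shown above. The helper names and the output_0.wav style file names are hypothetical, and it assumes the script is run from the directory that contains pytorch-dc-tts.py.

```python
import shlex
import subprocess

SCRIPT = "pytorch-dc-tts.py"  # script name taken from the command above

def build_command(text, wav_path):
    """Return the argument list for one run: -i is the text, -s the output path."""
    return ["python3", SCRIPT, "-i", text, "-s", wav_path]

def synthesize_all(sentences, prefix="output"):
    """Run the script once per sentence (hypothetical file naming scheme)."""
    for index, sentence in enumerate(sentences):
        command = build_command(sentence, f"{prefix}_{index}.wav")
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(command, check=True)

# Preview the commands without running them.
for i, sentence in enumerate(["Hello world", "This is a test"]):
    print(shlex.join(build_command(sentence, f"output_{i}.wav")))
```

Passing the arguments as a list avoids shell quoting problems when a sentence contains quotes or other special characters. Calling synthesize_all(["Hello world", "This is a test"]) would then run the script twice.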

Here is an example of speech synthesized from a text introducing ailia SDK.

ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.
