Text to Speech in C++

A simple program to implement text to speech in C++

Ahmad Anis
Published in Analytics Vidhya
Feb 25, 2020

Text to speech is a common application of machine learning, and many great machine learning applications have been built on top of it. It is a lot easier to do text to speech in Python, where you can simply import a ready-made library and use it. Let’s take a look at it.

import pyttsx3
engine = pyttsx3.init()  # object creation
# RATE
rate = engine.getProperty('rate')  # getting details of current speaking rate
print(rate)  # printing current voice rate
engine.setProperty('rate', 125)  # setting up new voice rate
# VOLUME
volume = engine.getProperty('volume')  # getting to know current volume level (min=0 and max=1)
print(volume)  # printing current volume level
engine.setProperty('volume', 1.0)  # setting up volume level between 0 and 1
# VOICE
voices = engine.getProperty('voices')  # getting the list of available voices
# engine.setProperty('voice', voices[0].id)  # changing index changes voices, 0 for male
engine.setProperty('voice', voices[1].id)  # changing index changes voices, 1 for female
engine.say("Hello World!")
engine.say('My current speaking rate is ' + str(rate))
engine.runAndWait()
engine.stop()

You see, about 20 lines of Python give us a basic TTS system. Now, let’s take a look at how we can do it in C++.

Pre-Requisites

  • Windows with Microsoft Visual Studio and its C++ tools (SAPI is a Windows API and comes with the Windows SDK)
  • Familiarity with pointers
  • Familiarity with multi-threaded and COM programming helps but is not necessary

Let’s Start

We will start with basic procedural code and then move on to an object-oriented approach.

Let’s include Headers.

#include <sapi.h>
#include <iostream>
#include <string>
using namespace std;

We are going to declare a pointer of the ISpVoice type. The ISpVoice interface enables an application to perform text synthesis operations.

ISpVoice* pVoice=NULL;

Now we declare a variable of type HRESULT. HRESULT is a data type used in Windows operating systems, and in the earlier IBM/Microsoft OS/2 operating system, to represent error and warning conditions. Its original purpose was to formally lay out ranges of error codes for both public and Microsoft-internal use, to prevent collisions between error codes in different subsystems of OS/2. HRESULTs are numerical error codes: various bits within an HRESULT encode information about the nature of the error code and where it came from. They are most commonly encountered in COM programming, where they form the basis for a standardized COM error-handling convention.

HRESULT hr;
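
To make the bit layout concrete, here is a small portable sketch that takes an HRESULT apart without any Windows headers. The names `HResult`, `hresultFailed`, `hresultFacility`, `hresultCode` and `makeHResult` are made up for this illustration; they play roughly the role of the `FAILED`, `HRESULT_FACILITY`, `HRESULT_CODE` and `MAKE_HRESULT` macros. The field widths assume the documented layout: severity in bit 31, facility in bits 16 to 26, code in bits 0 to 15.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for the Windows HRESULT typedef (a signed 32-bit value).
using HResult = std::int32_t;

// Severity (bit 31): set for failure, clear for success.
constexpr bool hresultFailed(HResult hr) { return hr < 0; }

// Facility (bits 16-26): the subsystem that produced the code.
constexpr std::uint32_t hresultFacility(HResult hr) {
    return (static_cast<std::uint32_t>(hr) >> 16) & 0x7FFu;
}

// Code (bits 0-15): the error number within that facility.
constexpr std::uint32_t hresultCode(HResult hr) {
    return static_cast<std::uint32_t>(hr) & 0xFFFFu;
}

// Packs the three fields back into a single value.
inline HResult makeHResult(bool failure, std::uint32_t facility, std::uint32_t code) {
    assert(facility <= 0x7FFu && code <= 0xFFFFu);
    const std::uint32_t bits =
        (failure ? 0x80000000u : 0u) | (facility << 16) | code;
    return static_cast<HResult>(bits);
}

// E_ACCESSDENIED (0x80070005): failure, facility 7 (Win32), code 5.
const HResult kAccessDenied = makeHResult(true, 7, 5);
```

This is why a plain sign check is enough to tell success from failure: every failure code has the top bit set and is therefore negative.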

And now a wstring variable that will hold the input from the user.

wstring input;

SAPI is a COM-based application, and COM must be initialized both before use and during the time SAPI is active. In most cases, this is for the lifetime of the host application.

hr = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
if (FAILED(hr))
{
    cout << "ERROR: failed to initialize COM\n";
    exit(1);
}
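
Every successful CoInitializeEx call has to be balanced by a CoUninitialize call once the program is done with COM. One way to keep the two paired is a small scope guard. The sketch below is portable and only shows the pattern: `InitGuard` is a hypothetical class, and the two callbacks stand in for wrappers around CoInitializeEx and CoUninitialize.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs `init` on construction and, only if it reported success, runs
// `cleanup` on destruction. In the SAPI program `init` would wrap
// CoInitializeEx and `cleanup` would wrap CoUninitialize.
class InitGuard {
public:
    InitGuard(const std::function<bool()>& init, std::function<void()> cleanup)
        : ok_(init()), cleanup_(std::move(cleanup)) {
        assert(cleanup_ && "cleanup must be callable");
    }

    ~InitGuard() {
        if (ok_) cleanup_();  // a failed init must not be cleaned up
    }

    // Copying would run the cleanup twice, so it is disabled.
    InitGuard(const InitGuard&) = delete;
    InitGuard& operator=(const InitGuard&) = delete;

    bool ok() const { return ok_; }

private:
    bool ok_;
    std::function<void()> cleanup_;
};
```

Declared at the top of main, such a guard keeps COM alive for as long as the voice is in use and shuts it down on every exit path.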

Let’s understand the parameters of CoInitializeEx.

HRESULT CoInitializeEx(LPVOID pvReserved, DWORD dwCoInit);

The first parameter is reserved and must be NULL. The second parameter specifies the threading model that your program will use.

Once COM is up and running, the next step is to create the voice, which is simply a COM object. The default voice settings come from the Speech section in Control Panel and include such information as the voice (if more than one is available on your system) and the language (English, Japanese, etc.).

hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);

Let’s understand its parameters.

rclsid

The CLSID associated with the data and code that will be used to create the object.

pUnkOuter

If NULL, indicates that the object is not being created as part of an aggregate. If non-NULL, a pointer to the aggregate object’s IUnknown interface (the controlling IUnknown).

dwClsContext

The context in which the code that manages the newly created object will run. The values are taken from the enumeration CLSCTX.

riid

A reference to the identifier of the interface to be used to communicate with the object.

ppv

Address of pointer variable that receives the interface pointer requested in riid. Upon successful return, *ppv contains the requested interface pointer. Upon failure, *ppv contains NULL.

Now we have to do our actual task, which is to speak. Speaking takes a single line: we call Speak on our voice object.

if (SUCCEEDED(hr))
{
    getline(wcin, input);                        // read one line of text
    hr = pVoice->Speak(input.c_str(), 0, NULL);  // speak it and wait until done
    pVoice->Release();                           // COM objects are released, not deleted
    pVoice = NULL;
}
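
The Release() call matters: COM objects are reference counted and destroy themselves when the count drops to zero, so they must never be freed with delete. A smart pointer with a custom deleter can make that automatic. The sketch below is portable and uses a hypothetical `FakeVoice` struct in place of ISpVoice; `ComReleaser` and `ReleasePtr` are likewise names invented for this example. In real Windows code, ATL's CComPtr serves the same purpose.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a COM interface such as ISpVoice: reference counted, and it
// destroys itself when the count reaches zero. FakeVoice is hypothetical.
struct FakeVoice {
    unsigned long refs = 1;   // a freshly created object starts at 1
    static int liveObjects;   // lets us observe when objects are destroyed

    FakeVoice() { ++liveObjects; }
    ~FakeVoice() { --liveObjects; }

    unsigned long AddRef() { return ++refs; }
    unsigned long Release() {
        assert(refs > 0);
        const unsigned long left = --refs;
        if (left == 0) delete this;
        return left;
    }
};
int FakeVoice::liveObjects = 0;

// Deleter that gives up our reference with Release() instead of delete.
struct ComReleaser {
    template <typename T>
    void operator()(T* p) const { p->Release(); }
};

template <typename T>
using ReleasePtr = std::unique_ptr<T, ComReleaser>;
```

With `ReleasePtr<FakeVoice> voice(new FakeVoice);` the reference is dropped when `voice` goes out of scope, including on early returns.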

Read more about parameters of Speak here

Combining everything, we get:

#include <sapi.h>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    ISpVoice* pVoice = NULL;
    wstring input;

    // COM must be initialized before any SAPI call.
    HRESULT hr = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
    if (FAILED(hr))
    {
        cout << "ERROR: failed to initialize COM\n";
        return 1;
    }

    // Create the voice object with the default settings.
    hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);
    if (SUCCEEDED(hr))
    {
        getline(wcin, input);                        // read one line of text
        hr = pVoice->Speak(input.c_str(), 0, NULL);  // speak it and wait until done
        pVoice->Release();                           // COM objects are released, not deleted
        pVoice = NULL;
    }

    ::CoUninitialize();  // balance the successful CoInitializeEx
    return 0;
}

Let’s do it with an object-oriented approach

Our File structure is as follows:

Header Files

  • BasicVoice.h
  • FemaleVoice.h
  • MaleVoice.h

Source Files

  • BasicVoice.cpp
  • FemaleVoice.cpp
  • TTS.cpp
  • MaleVoice.cpp

BasicVoice.h holds our base class. Let’s look at it.

#pragma once
#include <sapi.h>
#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;

class BasicVoice
{
protected:
    int choice;
    ISpVoice* pVoice;
    HRESULT hr, a;
    wstring input;

public:
    BasicVoice()
    {
        pVoice = NULL;
        input = L"";
        // HRESULT CoInitializeEx(LPVOID pvReserved, DWORD dwCoInit);
        a = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
        if (FAILED(a))
        {
            cout << "ERROR: failed to initialize COM\n";
            exit(1);
        }
        hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);
    }
    virtual void setSpeech();
    virtual void byeSpeech() = 0;
    virtual void outSpeech();
    virtual ~BasicVoice()
    {
        // COM objects are released, never deleted; release before shutting COM down.
        if (pVoice)
            pVoice->Release();
        ::CoUninitialize();
    }
};

In our base class, we initialize all the variables, start COM and create the voice object. Let’s look at BasicVoice.cpp

#include "BasicVoice.h"

// pure abstract class, so empty functions
void BasicVoice::setSpeech() {}
void BasicVoice::byeSpeech() {}
void BasicVoice::outSpeech() {}

MaleVoice.h

#pragma once
#include "BasicVoice.h"

class MaleVoice : public BasicVoice
{
public:
    void setSpeech();
    void outSpeech();
    void byeSpeech();
};

MaleVoice.cpp

#include "MaleVoice.h"

void MaleVoice::setSpeech()
{
    if (SUCCEEDED(hr))
    {
        cout << "Enter text:\n";
        cin.ignore(1, '\n');  // skip the newline left behind by the menu choice
        getline(wcin, input);
    }
    else
    {
        cout << "Not initialized";
        exit(-1);
    }
    system("cls");
    cout << "At what speed do you want to play your voice?\n1 for normal\n2 for slower\n3 for faster\n";
    cin >> choice;
    if (choice == 2)
        hr = pVoice->Speak((L"<rate absspeed='-2'>" + input + L"</rate>").c_str(), 0, NULL);
    else if (choice == 3)
        hr = pVoice->Speak((L"<rate absspeed='2'>" + input + L"</rate>").c_str(), 0, NULL);
    else
        hr = pVoice->Speak(input.c_str(), 0, NULL);
}

void MaleVoice::outSpeech()
{
    pVoice->Release();
    pVoice = NULL;
    // CoUninitialize is left to the BasicVoice destructor.
}

void MaleVoice::byeSpeech() {}

Notice that we are setting speeds. There is a lot of functionality that can be added via XML tags. Have a complete look here.
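
As a rough sketch of how the speed markup could be built in one place instead of repeating string literals, the helper below wraps any text in a rate element. `withRate` is a hypothetical name, not a SAPI function, and the clamp to -10..10 assumes that this is the range SAPI engines are expected to support.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Wraps `text` in a SAPI <rate> element with an absolute speed.
// The value is clamped to -10..10 (assumed supported range).
std::wstring withRate(const std::wstring& text, int absSpeed) {
    const int clamped = std::max(-10, std::min(10, absSpeed));
    return L"<rate absspeed='" + std::to_wstring(clamped) + L"'>" +
           text + L"</rate>";
}
```

The result can be passed to Speak through `.c_str()`, exactly like the hand-built strings above.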

FemaleVoice.h

#pragma once
#include "BasicVoice.h"

class FemaleVoice : public BasicVoice
{
public:
    void setSpeech();
    void outSpeech();
    void byeSpeech();
};

FemaleVoice.cpp

#include "FemaleVoice.h"

void FemaleVoice::setSpeech()
{
    if (SUCCEEDED(hr))
    {
        cout << "Enter text:\n";
        cin.ignore(1, '\n');  // skip the newline left behind by the menu choice
        getline(wcin, input);
    }
    else
    {
        cout << "Not initialized";
        exit(-1);
    }
    cout << "At what speed do you want to play your voice?\n1 for normal\n2 for slower\n3 for faster\n";
    cin >> choice;
    // Every variant asks SAPI for a female voice through the <voice> tag.
    if (choice == 2)
        hr = pVoice->Speak((L"<rate absspeed='-2'><voice required='Gender=Female'>" + input + L"</voice></rate>").c_str(), 0, NULL);
    else if (choice == 3)
        hr = pVoice->Speak((L"<rate absspeed='2'><voice required='Gender=Female'>" + input + L"</voice></rate>").c_str(), 0, NULL);
    else
        hr = pVoice->Speak((L"<voice required='Gender=Female'>" + input + L"</voice>").c_str(), 0, NULL);
}

void FemaleVoice::outSpeech()
{
    pVoice->Release();
    pVoice = NULL;
    // CoUninitialize is left to the BasicVoice destructor.
}

void FemaleVoice::byeSpeech()
{
    if (SUCCEEDED(hr))
    {
        // The spoken text is Urdu for "Thank you very much, sir".
        hr = pVoice->Speak(L"<voice required='Gender=Female'><rate absspeed='-5'>Bhut Shukria Sir</rate></voice>", 0, NULL);
    }
}

Here we select a female voice by using XML tags.
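
The voice tag can be produced by a small helper as well. This is only a sketch: `withVoice` is a hypothetical name, and the filter string follows the `Gender=Female` form used in the code above. Because the helper just wraps text, markup that is already tagged can be nested inside it.

```cpp
#include <cassert>
#include <string>

// Wraps `text` in a SAPI <voice> element. `required` is an attribute
// filter such as L"Gender=Female".
std::wstring withVoice(const std::wstring& text, const std::wstring& required) {
    assert(!required.empty() && "the filter must name at least one attribute");
    return L"<voice required='" + required + L"'>" + text + L"</voice>";
}
```

For example, `withVoice(L"<rate absspeed='2'>Hi</rate>", L"Gender=Female")` yields a string that asks for a female voice at a faster rate.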

TTS.cpp

#include "BasicVoice.h"
#include "MaleVoice.h"
#include "FemaleVoice.h"

int main()
{
    BasicVoice* b1 = NULL;
    int choice;
    do
    {
        cout << "1 to Output in Male Voice \n2 to Output in Female Voice\n3 to Exit\n";
        cin >> choice;
        switch (choice)
        {
        case 1:
            b1 = new MaleVoice;  // we create a new MaleVoice object
            b1->setSpeech();
            b1->outSpeech();
            delete b1;  // after outputting that voice, we delete the object
            break;
        case 2:
            b1 = new FemaleVoice;  // we create a new FemaleVoice object
            b1->setSpeech();
            b1->outSpeech();
            delete b1;  // after outputting that voice, we delete the object
            break;
        case 3:
            b1 = new FemaleVoice;  // say goodbye in the female voice, then exit
            b1->byeSpeech();
            b1->outSpeech();
            delete b1;
            break;
        default:
            break;
        }
    } while (choice != 3);
    system("pause");
    return 0;
}

Here we used a menu-driven approach to take the user’s input and speak it in the chosen voice.
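
The new/delete pairs in each case can also be handled by a factory that returns a smart pointer, so the object is freed automatically. The sketch below is portable and makes no SAPI calls: `Voice`, `Male`, `Female` and `makeVoice` are simplified stand-ins invented for this example, not the classes above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified stand-ins for the voice classes; no SAPI calls are made.
struct Voice {
    virtual ~Voice() = default;  // virtual so deleting through Voice* is safe
    virtual std::string name() const = 0;
};

struct Male : Voice {
    std::string name() const override { return "male"; }
};

struct Female : Voice {
    std::string name() const override { return "female"; }
};

// Maps a menu choice to a voice object; returns nullptr for other choices.
std::unique_ptr<Voice> makeVoice(int choice) {
    switch (choice) {
        case 1:  return std::unique_ptr<Voice>(new Male);
        case 2:  return std::unique_ptr<Voice>(new Female);
        default: return nullptr;
    }
}
```

A loop can then write `auto v = makeVoice(choice);` and use `v` only if it is non-null; the object is destroyed at the end of each iteration without an explicit delete.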

Look for the complete organized code here.

Let me know in the comments if you succeeded in implementing Text to speech in C++.

Connect with me on Twitter.

Leave your feedback in Comments.

Ahmad Anis
Analytics Vidhya

Deep Learning at Roll.ai, Researcher at Data Providence Initiative, Community Lead at Cohere for AI