Python Project — Convert Speech to Text and Text to Speech

Published in Wiki Flood

3 min read · Mar 19, 2024

In today’s digital landscape, a Text-to-Speech and Speech-to-Text converter is a versatile tool: it turns written text into spoken words and spoken words back into text.

This tool bridges written and spoken communication. It supports accessibility and productivity and enables efficient interaction across platforms. Under the hood, it relies on Google's text-to-speech service (through the gTTS library) and Google's speech recognition API (through the speech_recognition library), wrapped in a simple, user-friendly menu.


Ambient Noise

Ambient noise refers to the background sounds present in an environment, including natural and artificial sounds such as chatter, machinery, and traffic. When capturing audio, adjusting for ambient noise improves speech recognition: the recognizer is calibrated so that background sound is not mistaken for speech, which reduces interference and increases the accuracy of voice-based systems and recordings.
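To make the idea concrete, here is a simplified sketch of energy-based noise calibration. It assumes audio arrives as frames of PCM sample values; `rms`, `calibrate_threshold`, and `is_speech` are hypothetical helpers, and the 1.5 ratio is an illustrative choice. The speech_recognition library's `adjust_for_ambient_noise()` works in the same spirit by updating the recognizer's `energy_threshold` from the sampled noise, although its actual algorithm differs in the details.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of PCM sample values."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def calibrate_threshold(noise_frames, ratio=1.5):
    """Hypothetical helper: place the speech threshold `ratio` times above the mean noise energy."""
    if not noise_frames:
        raise ValueError("need at least one frame of background noise")
    energies = [rms(frame) for frame in noise_frames]
    return ratio * sum(energies) / len(energies)


def is_speech(frame, threshold):
    """Treat a frame as speech when its energy exceeds the calibrated threshold."""
    return rms(frame) > threshold


# Two quiet frames of "room noise" followed by a check on louder audio
noise = [[10, -10, 10, -10], [20, -20, 20, -20]]
threshold = calibrate_threshold(noise)
print(threshold, is_speech([1000, -1000], threshold))
```

A longer calibration window (the implementation below uses `duration=0.2` seconds) generally gives a more representative noise sample, at the cost of a slightly longer wait before recording starts.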

Prerequisites For Python Convert Speech to Text and Text to Speech

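A working knowledge of Python and a compatible system are needed. Speech-to-text also requires a microphone, and both features need an internet connection, because gTTS and recognize_google() send requests to Google's online services.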

Python 3.7 (64-bit) or above
Any Python editor (VS Code, PyCharm)

Installation

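Open the Windows command prompt (cmd) as administrator, then:

1. Install gTTS.

pip install gtts

2. Install SpeechRecognition, the PyPI package that provides the speech_recognition module.

pip install SpeechRecognition

3. Install PyAudio, which sr.Microphone needs to access the microphone.

pip install pyaudio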
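After installing, it can help to confirm that Python is recent enough and that the required modules can be found. This is a minimal sketch; `check_environment` is a hypothetical helper, and it checks import names (`gtts`, `speech_recognition`, and `pyaudio`, which `sr.Microphone` relies on), which differ from some of the pip package names.

```python
import importlib.util
import sys

# Module names as they are imported, not the pip package names
REQUIRED_MODULES = ("gtts", "speech_recognition", "pyaudio")


def check_environment(modules=REQUIRED_MODULES, min_version=(3, 7)):
    """Report whether Python is new enough and which modules cannot be found."""
    python_ok = sys.version_info >= min_version
    # find_spec returns None for a top-level module that is not installed
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    ok, missing = check_environment()
    print("Python version OK" if ok else "Python 3.7 or newer is required")
    print("Missing modules:", ", ".join(missing) if missing else "none")
```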

Python Convert Speech to Text and Text to Speech Implementation

1. Import the necessary packages.

from gtts import gTTS
import os
import speech_recognition as sr

2. Define text_to_speech(), which converts the input text to speech with gTTS, saves the audio as output.mp3, and opens the file in the default media player.

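def text_to_speech():
    text = input("Enter the text:- ")
    if not text.strip():
        # gTTS refuses empty text, so skip the request
        print("No text entered.")
        return
    tts = gTTS(text)  # sends the text to Google's TTS service (needs internet)
    tts.save("output.mp3")  # writes the synthesized speech as an MP3 file
    os.system("start output.mp3")  # Windows only: opens the file in the default player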
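The `os.system("start output.mp3")` call only works on Windows. If the script should also run on macOS or Linux, one option is a small helper such as the hypothetical `play_audio()` below. It assumes the `open` command is available on macOS and `xdg-open` on Linux desktops, and it uses `os.startfile`, which exists only on Windows.

```python
import os
import subprocess
import sys


def player_command(path, platform=sys.platform):
    """Return the command that opens `path` with the default app, or None on Windows."""
    if platform == "win32":
        return None  # Windows: handled by os.startfile instead of a subprocess
    if platform == "darwin":
        return ["open", path]  # macOS
    return ["xdg-open", path]  # assumption: a Linux desktop with xdg-utils installed


def play_audio(path):
    """Open an audio file with the operating system's default player."""
    command = player_command(path)
    if command is None:
        os.startfile(path)  # only defined on Windows
    else:
        subprocess.run(command, check=False)
```

With this helper, the last line of text_to_speech() becomes `play_audio("output.mp3")`.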

3. Define speech_to_text(), which first calibrates the microphone by sampling the ambient noise to improve recognition accuracy, then records speech from the microphone and transcribes it to text with Google's speech recognition API.

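def speech_to_text():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # requires PyAudio
        print("Speak...")
        # Sample up to 0.2 s of background sound to set the energy threshold
        recognizer.adjust_for_ambient_noise(source, duration=0.2)
        audio = recognizer.listen(source)  # record until the speaker pauses

    try:
        print("Recognizing...")
        text = recognizer.recognize_google(audio)  # online request to Google's API
        print(f"You said: {text}")
    except sr.UnknownValueError:
        # The audio was captured, but no speech could be recognized in it
        print("Sorry, could not understand audio.")
    except sr.RequestError as e:
        # The API could not be reached (for example, no internet connection)
        print(f"Error: {e}")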
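`recognizer.listen(source)` keeps recording until the speaker stops talking. The library's `Recognizer` exposes attributes such as `energy_threshold` and `pause_threshold` (0.8 seconds of non-speaking audio by default) to control this. The sketch below is a simplified, hypothetical illustration of energy-based end-of-phrase detection, not the library's actual implementation: it works on a list of per-frame energies and reports where the phrase ends once the energy has stayed at or below the threshold for `pause_seconds`.

```python
def phrase_end(energies, threshold, frame_seconds, pause_seconds=0.8):
    """Return the index where the phrase ends (energies[:index] holds it), or None.

    The phrase starts at the first frame above `threshold` and ends once the
    energy stays at or below `threshold` for `pause_seconds`.
    """
    # Number of consecutive quiet frames that count as a pause
    frames_needed = max(1, round(pause_seconds / frame_seconds))
    started = False
    quiet_run = 0
    for i, energy in enumerate(energies):
        if energy > threshold:
            started = True
            quiet_run = 0
        elif started:
            quiet_run += 1
            if quiet_run == frames_needed:
                # Drop the trailing silence: the phrase ends at the first quiet frame
                return i - quiet_run + 1
    return None  # no speech yet, or the speaker has not paused long enough


# 0.2 s frames: two quiet frames, two loud frames, then a long pause
print(phrase_end([10, 10, 500, 600, 10, 10, 10, 10, 10], threshold=100, frame_seconds=0.2))
```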

4. Finally, a menu-driven loop lets the user choose between text-to-speech and speech-to-text, or exit the program.

while True:
    print("Select an option:")
    print("1. Text to speech")
    print("2. Speech to text")
    print("3. Exit")

    # strip() tolerates stray spaces around the typed choice
    choice = input("Choice (1/2/3): ").strip()

    if choice == '1':
        text_to_speech()
    elif choice == '2':
        speech_to_text()
    elif choice == '3':
        print("Exiting the program...")
        break
    else:
        print("Invalid choice. Please select a valid option.")
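The if/elif chain in the menu can also be written as a dictionary that maps each choice to a function, which keeps the loop unchanged as options are added. This is a minimal sketch; `run_menu` and its `read`/`write` parameters are hypothetical names introduced so the loop can be exercised without a keyboard.

```python
def run_menu(actions, read=input, write=print):
    """Show the menu until the user picks '3'; `actions` maps choices to callables."""
    while True:
        write("Select an option:")
        write("1. Text to speech")
        write("2. Speech to text")
        write("3. Exit")
        choice = read("Choice (1/2/3): ").strip()
        if choice == "3":
            write("Exiting the program...")
            return
        action = actions.get(choice)  # None for anything not in the table
        if action is None:
            write("Invalid choice. Please select a valid option.")
        else:
            action()
```

In the program itself, the call would be `run_menu({"1": text_to_speech, "2": speech_to_text})`.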

Python Convert Speech to Text and Text to Speech Output

[Figure: Convert Speech to Text and Text to Speech in Python, output screenshots]

Python Convert Speech to Text and Text to Speech Video Output

[Video: Convert Speech to Text and Text to Speech in Python, video output]

Conclusion

In conclusion, Text-to-Speech (TTS) and Speech-to-Text (STT) technologies have transformed communication, accessibility, and user interaction. TTS lets machines convert written content into spoken words, which improves accessibility and user experience.

On the other hand, STT converts spoken language into written text, which makes transcription and communication more efficient. Together, these technologies break down barriers and make information more accessible, pointing toward a future where communication has far fewer limitations.