Convert PDF File Text to Audio Speech and vice versa using Python

Rahul Patodi
Published in Wiki Flood
4 min read · Feb 27, 2024

This Python project converts the text of a PDF file into audio speech and, in the other direction, converts spoken audio into text saved as a PDF. It combines text-to-speech and speech-to-text in one small tool aimed at making documents more accessible.

With a few short Python scripts, users can turn written content into spoken words or transcribe spoken words into text, bridging the gap between written and spoken communication.


About Python PDF File Text to Audio Speech and Audio Speech to PDF File Text

The project works in two directions: it reads the text of a PDF file aloud as audio speech, and it transcribes audio speech into text that is saved to a PDF file. Both directions are implemented in Python, using text-to-speech and speech-to-text libraries.

Modules

pdfminer

pdfminer is a Python library for extracting text, images, and metadata from PDF files. It parses the layout of each page, which makes it useful for pulling structured data out of PDF documents programmatically. This project uses pdfminer.six, the maintained fork, which provides the high_level.extract_text function used here.

gtts

gTTS (Google Text-to-Speech) is a Python package that interfaces with Google Translate’s text-to-speech API. It converts text into spoken audio and saves the result as an MP3 file. Because it calls an online service, it needs an internet connection.

SpeechRecognition

SpeechRecognition is a Python package that interfaces with several speech recognition APIs, including the Google Web Speech API. It converts spoken language into text, which supports voice-controlled applications and audio transcription.

Prerequisites for Python Project

A working knowledge of Python and a compatible system are needed to follow this project.

  • Python 3.7 (64-bit) or above
  • Any Python editor (VS Code, PyCharm)
  • A microphone and an internet connection (gTTS and the Google Web Speech API are online services)

Installation

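Open the Windows Command Prompt (cmd) as administrator.

1. Install gTTS.

pip install gtts

2. Install SpeechRecognition. The microphone input used later also requires PyAudio.

pip install SpeechRecognition PyAudio

3. Install pdfminer.six, the maintained fork of pdfminer that provides pdfminer.high_level.

pip install pdfminer.six

4. Install reportlab, which the script imports for PDF output.

pip install reportlab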
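Before moving on, it can help to confirm that the packages are importable. The sketch below is not part of the original project: missing_modules and REQUIRED_MODULES are hypothetical names, and the list assumes the four import names used in the script (pdfminer, gtts, speech_recognition, reportlab), which differ from some of the pip package names.

```python
from importlib.util import find_spec

# Import names used by this project (assumed from the import statements;
# note that they differ from some pip package names).
REQUIRED_MODULES = ["pdfminer", "gtts", "speech_recognition", "reportlab"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

Running the script prints either the modules that still need installing or a confirmation that everything was found.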

Let’s Implement

1. Import the necessary libraries.

from pdfminer.high_level import extract_text
from gtts import gTTS
import speech_recognition as sr
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import os

2. This function uses pdfminer to extract the text of the PDF at the given path. It returns the text, or prints the error and returns None if extraction fails.

def text_from_pdf(pdf_path):
    try:
        # Extract the text of every page as one string
        text = extract_text(pdf_path)
        return text
    except Exception as e:
        print(f"Error: {e}")
        return None
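Text extracted from a PDF usually keeps the line breaks of the printed page, which can make the spoken result sound choppy. As a rough sketch (not part of the original project), a hypothetical helper such as clean_pdf_text below could tidy the text before it is passed to the speech step. It assumes page breaks arrive as form feed characters, and its hyphen rule is a heuristic that will also join genuine compounds split at a line end.

```python
import re


def clean_pdf_text(text):
    """Tidy raw PDF text so it reads more naturally when spoken."""
    # Treat page breaks (form feeds) as paragraph breaks.
    text = text.replace("\f", "\n\n")
    # Rejoin words hyphenated across a line break: "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs, then runs of blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{2,}", "\n\n", text)
    # Remove stray spaces around paragraph breaks.
    text = re.sub(r" ?\n\n ?", "\n\n", text)
    return text.strip()
```

A call such as clean_pdf_text(text_from_pdf(pdf_path)) would then feed the cleaned text into the text-to-speech function.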

3. This function uses the gTTS library to convert the input text into an MP3 file in the specified language. It returns the output file path, or prints the error and returns None on failure.

def text_to_speech(text, output_file='output.mp3', language='en'):
    try:
        # Request the speech audio and save it as an MP3 file
        tts = gTTS(text=text, lang=language, slow=False)
        tts.save(output_file)
        return output_file
    except Exception as e:
        print(f"Error: {e}")
        return None
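For a long PDF, converting the whole text in one call means a single network failure loses everything. One possible variation, sketched here under stated assumptions, splits the text at sentence boundaries and saves one MP3 per chunk. The names split_text and save_chunks_as_mp3, the 3000-character limit, and the part_001.mp3 naming scheme are all illustrative choices, not part of the original project or of gTTS.

```python
import re


def split_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Keep growing the current chunk. A single oversized sentence
            # is kept whole rather than cut mid-word.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def save_chunks_as_mp3(text, prefix="part", language="en", max_chars=3000):
    """Save each chunk to its own MP3 file and return the file names."""
    from gtts import gTTS  # imported here so split_text works without gtts

    files = []
    for index, chunk in enumerate(split_text(text, max_chars), start=1):
        output_file = f"{prefix}_{index:03d}.mp3"
        gTTS(text=chunk, lang=language, slow=False).save(output_file)
        files.append(output_file)
    return files
```

If one chunk fails, the files already written are kept, so only the failed part needs to be converted again.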

4. This function uses the SpeechRecognition library to capture audio from the microphone and convert it to text with Google’s speech recognition API. It returns the recognized text, or None if the audio could not be understood or the request failed.

def speech_to_text():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Speak Now...")
        # Calibrate for background noise, then record one phrase
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio)
        print(f"Text from speech: {text}")
        return text
    except sr.UnknownValueError:
        # The audio was captured but no speech could be recognized
        print("Could not understand audio")
        return None
    except sr.RequestError as e:
        # The recognition service could not be reached
        print(f"Error: {e}")
        return None
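Because the function returns None when the speech is not understood, a caller may want to give the user more than one try. A minimal sketch, assuming a hypothetical helper named listen_with_retries that accepts any zero-argument function such as speech_to_text:

```python
def listen_with_retries(listen, attempts=3):
    """Call listen() until it returns text or the attempts run out."""
    for attempt in range(1, attempts + 1):
        text = listen()
        if text:
            return text
        if attempt < attempts:
            print(f"Attempt {attempt} of {attempts} failed. Please try again.")
    return None
```

In the menu, listen_with_retries(speech_to_text) could then stand in for a direct call to speech_to_text().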

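5. This function draws the input text onto a PDF page with reportlab’s canvas and saves it. It returns the output file path, or None on failure. Long lines are not wrapped.

def text_to_pdf(text, output_file='output.pdf'):
    try:
        # Draw the text with reportlab so the result is a valid PDF
        pdf = canvas.Canvas(output_file, pagesize=letter)
        text_object = pdf.beginText(72, letter[1] - 72)  # start 1 inch from the top left
        for line in text.splitlines():
            text_object.textLine(line)
        pdf.drawText(text_object)
        pdf.save()
        print(f"Text saved to PDF: {output_file}")
        return output_file
    except Exception as e:
        print(f"Error: {e}")
        return None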
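A recognized transcript usually arrives as one long line, so it needs wrapping and page breaks to fit on a PDF page. The sketch below shows one possible approach with reportlab, which the script already imports in step 1. The names paginate and write_pdf are hypothetical, and the width of 75 characters and 45 lines per page are rough guesses for reportlab's default 12-point font on letter paper with 1-inch margins.

```python
import textwrap


def paginate(text, width=75, lines_per_page=45):
    """Wrap text to `width` characters and group the lines into pages."""
    lines = []
    for paragraph in text.splitlines():
        # textwrap.wrap returns [] for a blank line, so keep it explicitly.
        lines.extend(textwrap.wrap(paragraph, width=width) or [""])
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


def write_pdf(text, output_file="output.pdf"):
    """Write wrapped, paginated text to a PDF with reportlab."""
    # Imported here so paginate can be used without reportlab installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(output_file, pagesize=letter)
    for page in paginate(text):
        text_object = pdf.beginText(72, letter[1] - 72)  # 1-inch margins
        for line in page:
            text_object.textLine(line)
        pdf.drawText(text_object)
        pdf.showPage()  # finish this page and start the next one
    pdf.save()
    return output_file
```

Keeping the wrapping logic in its own function makes it easy to adjust the width or page length without touching the drawing code.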

6. Specify the path of the PDF file to read.

pdf_path = 'sample.pdf'
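A hard-coded path fails at extraction time if the file is missing. As a small optional sketch, a hypothetical helper named check_pdf_path could report the problem before the menu starts:

```python
from pathlib import Path


def check_pdf_path(path):
    """Return an error message for an unusable path, or None if it looks fine."""
    pdf_file = Path(path)
    if pdf_file.suffix.lower() != ".pdf":
        return f"Not a PDF file: {pdf_file}"
    if not pdf_file.is_file():
        return f"File not found: {pdf_file}"
    return None
```

Calling check_pdf_path(pdf_path) and printing any returned message gives the user a clear hint before option 1 is chosen.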

7. This loop presents a menu for the PDF and speech operations. Option 1 extracts the text from the PDF and converts it to audio, option 2 converts speech to text and saves it as a PDF, and option 3 exits the loop. Each option calls the functions defined above.

while True:
    print("\nChoose an option at Flood:")
    print("1. PDF Text to Audio")
    print("2. Audio Text to PDF")
    print("3. Exit")
    choice = input("Choice: 1/2/3: ")
    if choice == '1':
        user_extracted_text = text_from_pdf(pdf_path)
        if user_extracted_text:
            user_output_file_audio = text_to_speech(user_extracted_text)
            if user_output_file_audio:
                # Play the MP3 with the default player ("start" is Windows only)
                os.system(f"start {user_output_file_audio}")
    elif choice == '2':
        extracted_text_from_speech = speech_to_text()
        if extracted_text_from_speech:
            output_file_pdf = text_to_pdf(extracted_text_from_speech)
            if output_file_pdf:
                print(f"Text converted from speech to PDF. Output file: {output_file_pdf}")
    elif choice == '3':
        print("Exiting the program. Thank you!")
        break
    else:
        print("Invalid choice. Please enter 1, 2, or 3.")
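The start command used to play the audio exists only on Windows. For readers on other systems, the following sketch shows one way to open a file with the default application. The names opener_command and open_file are hypothetical, and the sketch assumes that open is available on macOS and xdg-open on Linux desktops.

```python
import os
import subprocess
import sys


def opener_command(path, platform=sys.platform):
    """Return the command that opens `path` on macOS or Linux, or None on Windows."""
    if platform.startswith("win"):
        return None  # Windows uses os.startfile instead of a command
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]  # assumed to be present on Linux desktops


def open_file(path):
    """Open a file with the system's default application."""
    command = opener_command(path)
    if command is None:
        os.startfile(path)  # this function only exists on Windows
    else:
        subprocess.run(command, check=False)
```

Passing the command as a list also avoids the quoting problems that a shell string can have when a file name contains spaces.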

Convert PDF File Text To Audio Speech And Vice Versa Using Python Output

[Figures: screenshots and a video of the project output]

Conclusion

In conclusion, Python offers an approachable way to convert PDF text into audio speech and vice versa. With libraries such as pdfminer and gTTS, developers can add these features to their projects to improve accessibility and interactivity.

Python’s adaptability makes it a good choice for text-to-speech and speech-to-text work. The approach simplifies the conversion process and carries over to other applications that need to move between written and spoken content.
