Automated Reader — Reading Printed Text Out Loud

Cyrus Beh
5 min read · Sep 1, 2021


“Daddy, read me a story, please?”

That’s something my sons ask me to do quite often. And most days, I’m happy to oblige. But occasionally, work gets in the way and I’m exhausted by day’s end. One day, I was explaining to my oldest how tired I was, and he said, “If only we had a Luka Reads!” He proceeded to introduce me to this device he had read about in an issue of Young Scientist. I was intrigued, and thought it was a device that could read arbitrary text. After learning more about it, I realized that it is more like a repository of recorded audio: the device scans images, barcodes, or other identifying markers to figure out what the text says.

By this point, the wheels in my head were spinning in overdrive. Can I build a reading device that does what I thought the Luka does? Can I put together a Raspberry Pi OCR + text-to-speech device that can take images of a page and read it back? You betcha.

Getting Started

To begin with, I should mention that I have a whole bunch of Raspberry Pis lying around, and I also bought something like 10–20 microSD cards just for projects like these. Each project gets its own SD card because I often just copy instructions from the internet, making tweaks along the way to get things to work, and I don’t want to risk changing any installed bits that could break other projects. It’s pretty primitive, I know, but this is a hobby; it’s supposed to be fun! Anyway, I picked a Pi 2 for this project (big mistake, as you’ll soon see) and installed Raspbian Stretch (omg I know, it’s so old!) on a 32GB SD card.

I did the usual updates (apt-get update, upgrade), then attempted to follow the instructions from here to install the Tesseract OCR engine onto the Pi. Unfortunately, the instructions are pretty dated, which resulted in a lot of issues installing OpenCV. After some back and forth (and updating pip with sudo pip install --upgrade pip), I got everything installed as written, except that

sudo pip3 install opencv-contrib-python libwebp6

which was changed to

sudo pip3 install opencv-contrib-python==3.4.4.19

libwebp6 threw an error, but I don’t think it’s needed, so I took it off.

I tried to install pytesseract with

sudo pip3 install pytesseract

but it didn’t work. I looked back at the version history and thought maybe it was a version problem. So, I went and installed the release that was current when the instructions above were written (2019), which was 0.3.0. Pillow was a dependency, but installing it resulted in an out-of-memory error. I figured the Pi 2 was a bit too old for the job (which I had suspected, but my Pi 3 was hooked up to another project, so…). Anyway, I swapped out the board, and everything installed with nary a hitch.

Reading Text

Reading text with pytesseract is easy: simply pipe frames from the picamera into pytesseract and you end up with text. The Python script grabs the image when it detects a keypress (in this case, the “S” key), and the text comes out in about 2–3 seconds. Unfortunately, the text tended to be a little garbled, which comes down to a combination of lighting conditions, camera quality, and whether the OCR engine can make sense of the page. Nevertheless, it reproduces the text decently, so I left it at that for now.
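Stripped of the preview window and keypress handling, the core OCR step looks roughly like this minimal sketch (assuming the camera and pytesseract are set up as described above):

import pytesseract
from picamera import PiCamera
from picamera.array import PiRGBArray

camera = PiCamera()
camera.resolution = (640, 480)
rawCapture = PiRGBArray(camera, size=(640, 480))

camera.capture(rawCapture, format="bgr")  # grab a single frame as a numpy array
text = pytesseract.image_to_string(rawCapture.array)  # run OCR on the frame
print(text)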

Image (left) and text that is pulled out (right).

Reading Out Loud

For the text-to-speech portion, I used the handy instructions from here, which utilized the espeak engine. For some reason, the command

espeak "Text" 2>/dev/null

did not work for me. Instead, after the text, I had to append

--stdout|aplay

so the full command becomes espeak "Text" --stdout|aplay. This was a previously reported issue. Again, this is a quick hack, so I didn’t bother digging any further. The Pi can now speak!
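From Python, a quick smoke test of that working invocation looks something like this (the same pattern the final script uses; the -s and -g flags set the speaking speed and the gap between words):

from subprocess import call

# Speak at 100 words per minute with extra gaps between words,
# piping the WAV output from espeak into aplay for playback
call('espeak -s100 -g12 "The Pi can now speak" --stdout|aplay', shell=True)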

Putting it Together… Sort Of…

I took the scripts from those two sites and just blended them together. I pressed “S”, and… nothing. Absolutely nothing. The camera went dark, the program froze. Sometimes, it would throw an “Unterminated quoted string” error. I was quite frustrated, but decided to play around with the settings a bit. After a while, I realized that the garbled text was what was causing my issues: the stray punctuation marks were wreaking havoc on the text-to-speech engine (the command is assembled as a shell string, so stray quotes break it). I took away all the “\n” and “\r” symbols, replacing them with different numbers of underscores (which denote pauses in the engine). This got it working occasionally, but not all the time.

So, I decided to remove all the punctuation marks, using a little replacement script from here. With that, the program works almost all the time now. And since pytesseract works best with black-and-white images, I added a setting to force the camera to take black-and-white images. Lastly, I looked up the documentation for the speech engine and adjusted the reading speed to something my kids would like. Here’s the script:

import cv2
import pytesseract
from picamera.array import PiRGBArray
from picamera import PiCamera
from subprocess import call

camera = PiCamera()
camera.resolution = (640, 480)
camera.color_effects = (128, 128)  # set to black and white
camera.framerate = 30
rawCapture = PiRGBArray(camera, size=(640, 480))

cmd_beg = 'espeak -s100 -g12 '
cmd_end = ' --stdout|aplay'  # pipe the WAV output into aplay for playback

for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    image = frame.array
    cv2.imshow("Frame", image)
    key = cv2.waitKey(1) & 0xFF

    rawCapture.truncate(0)  # clear the stream for the next frame
    if key == ord("s"):
        text = pytesseract.image_to_string(image)
        print(text)
        with open('readme.txt', 'w') as f:
            f.write(text)

        # Replace spaces with '_' to separate words for the engine
        text = text.replace(' ', '_')
        # Replace newlines with '___' (a longer pause)
        text = text.replace('\n', '___')
        # Replace carriage returns with '__'
        text = text.replace('\r', '__')

        # Remove any remaining punctuation, which would otherwise
        # break the shell command assembled below
        punc = '''!()-[]{};:'"\,<>./?@#$%^&*~'''
        for ele in text:
            if ele in punc:
                text = text.replace(ele, "_")

        print(text)
        cv2.imshow("Frame", image)
        # Call the espeak TTS engine to read the text aloud
        call([cmd_beg + text + cmd_end], shell=True)
        cv2.waitKey(0)
        break

cv2.destroyAllWindows()
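In hindsight, an even cleaner fix than stripping punctuation would be to skip the shell entirely and feed the OCR text to espeak on stdin. I haven’t swapped this in yet, but a sketch would look something like:

import subprocess

def speak(text):
    # --stdin makes espeak read the text from standard input, so stray
    # quotes in the OCR output can never break a shell command;
    # --stdout emits WAV, which we pipe straight into aplay
    espeak = subprocess.Popen(
        ["espeak", "-s100", "-g12", "--stdin", "--stdout"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    aplay = subprocess.Popen(["aplay"], stdin=espeak.stdout)
    espeak.stdout.close()  # let aplay receive EOF when espeak finishes
    espeak.stdin.write(text.encode("utf-8"))
    espeak.stdin.close()
    aplay.wait()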

What’s Next

Next up, I will try to make a little frame and lighting setup for the device, and also add a push button on the GPIO so the user can tap it to have the page read out loud. I will upload a ridiculous video of the device (it’s really hokey) very soon! This was a fun little 2-hour build.
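For the button, I’m picturing something like this minimal sketch, assuming a momentary button wired between a GPIO pin (BCM 17 here, just a placeholder) and ground:

import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(17, GPIO.IN, pull_up_down=GPIO.PUD_UP)  # internal pull-up; pressed = LOW

try:
    while True:
        # Block until the button pulls the pin low, debouncing for 300 ms,
        # then trigger a capture-and-read cycle in place of the "S" key
        GPIO.wait_for_edge(17, GPIO.FALLING, bouncetime=300)
        print("Button pressed: capture a frame and read it aloud here")
finally:
    GPIO.cleanup()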

(Update 03–09–2021: here it is!)


Cyrus Beh

I’m a biomedical engineer who dabbles in hobbyist electronics, programming, crafting, woodworking, and photography. I write stuff for absolute noobs like myself.