From phone screen to console — Using Google Cloud Vision to read text from your phone.

Published in Google Cloud - Community, May 27, 2018

“I wonder if I can make something that will do that for me?” This common question is usually the prelude to a new hobby project for me.

A few months ago, I found myself manually copying text from my phone, and it was tedious. This made me wonder if I could build a system to expedite this process.

This guide outlines the steps I took to capture a screenshot of my phone, prepare the image, and set up and use the Google Cloud Vision API client. I will be using Python 3.x on a MacBook.

If you need to get text off of your phone, this is one way to do it.

Mirroring your phone.

First things first: let's get your phone screen mirrored on your computer.

To do this, I use QuickTime. Plug your phone into your computer and launch QuickTime.

In the Dock, right-click the QuickTime icon and select “New Movie Recording”.

Make sure your phone is unlocked; otherwise, the next part will say that it is unable to proceed.

Your phone should now be mirrored. Sometimes this takes a minute.

I always keep the mirror in the top-left corner of my screen so that the cropping we do later stays aligned.

Getting your screenshot.

To take a screenshot, I use the pyscreenshot module.

pip3 install pyscreenshot

The following demonstrates how to take a screenshot and crop the image to pull out the text of interest. In my case, the text always appears in the same location on the screen, so I only needed to figure out where to crop it once.

import pyscreenshot as ImageGrab

# I found these values through lots of trial and error
x1 = ...  # top left corner x
y1 = ...  # top left corner y
x2 = ...  # bottom right corner x
y2 = ...  # bottom right corner y

image = ImageGrab.grab()
cropped_image = image.crop((x1, y1, x2, y2))

# Save the image to a file to be used later
cropped_image.save("current_image.bmp")
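Because the crop box is found by trial and error, it can be easier to think in terms of the mirror window's position and size than in raw corner coordinates. The sketch below shows one way to do that conversion. `crop_box` and its `scale` parameter are hypothetical names, not part of pyscreenshot, and `scale` is an assumption for high-density displays, where a screenshot may contain more pixels than the on-screen coordinates suggest.

```python
def crop_box(left, top, width, height, scale=1):
    """Convert a region's position and size into an (x1, y1, x2, y2) crop box.

    `scale` is an assumed multiplier for high-density displays; leave it
    at 1 if screenshot pixels match on-screen coordinates.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if left < 0 or top < 0:
        raise ValueError("left and top must not be negative")
    x1 = int(left * scale)
    y1 = int(top * scale)
    x2 = int((left + width) * scale)
    y2 = int((top + height) * scale)
    return (x1, y1, x2, y2)


# Illustrative numbers only: a 300x200 region with its top-left corner at (10, 50).
print(crop_box(10, 50, 300, 200))           # (10, 50, 310, 250)
print(crop_box(10, 50, 300, 200, scale=2))  # (20, 100, 620, 500)
```

The resulting tuple can be passed straight to `image.crop(...)` in place of `(x1, y1, x2, y2)`.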

Preparing Google Cloud Platform.

I’m going to assume that you already have a GCP account. If not, you can sign up for a free $300 credit here.

After you’ve signed in, follow this link, select “Create a Project” from the dropdown, and then continue. This will create a new project, named “My Project”, and enable the Google Cloud Vision API on this project.

Next, you need to set up authentication. Follow this link to create a key.

When setting up the key, you do not need to specify a role. When prompted, select continue without a role.

Ensure that JSON is selected at the bottom and click proceed. A JSON file will begin to download. You will need to store this file somewhere safe.

Caution! This JSON file includes sensitive information. If you choose to place it in the same directory as your project, be sure to add it to your .gitignore file if you plan on uploading your project to a public repository.

Performing our API call.

We start by getting the required dependencies.

pip3 install --upgrade google-api-python-client
pip3 install --upgrade google-cloud
pip3 install --upgrade google-cloud-vision

Next, we want to set an environment variable indicating the location of the authentication file we just generated.

import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/json/file"
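A wrong path here tends to surface later as a less obvious authentication error, so it can be worth checking the file before making any API call. The following is a minimal sketch, not part of the Google client library: `check_credentials` is a hypothetical helper, and the list of required fields is an assumption based on the usual layout of a service account key file.

```python
import json
import os


def check_credentials(path):
    """Return the project_id from a service account key file, or raise.

    The required field names below are assumed from the typical layout
    of a downloaded service account key.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No credentials file at {path}")
    with open(path, "r", encoding="utf-8") as key_file:
        key = json.load(key_file)
    required = ("type", "project_id", "private_key", "client_email")
    missing = [field for field in required if field not in key]
    if missing:
        raise ValueError(f"Credentials file is missing fields: {missing}")
    return key["project_id"]


# Example usage, once the environment variable has been set:
# project = check_credentials(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
# print("Using credentials for project:", project)
```

The check only reads the file locally; it does not confirm that the key is still valid on the Google Cloud side.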

Finally, we load the cropped image that we saved from a screenshot earlier, and use it in our API call.

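import io
import os

from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file. This looks next to the script, so run the
# script from its own directory so the saved screenshot is found
file_name = os.path.join(os.path.dirname(__file__), 'current_image.bmp')

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

# With google-cloud-vision 2.0 or later, the types module is gone;
# use vision.Image(content=content) instead
image = types.Image(content=content)

response = client.text_detection(image=image)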

Note: you may receive the following error: “403 This API method requires billing to be enabled”.
If so, follow the link provided, click OK when prompted, and wait a few minutes. Then run the script again.
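If you would rather not rerun the script by hand while billing takes effect, the wait-and-retry step can be sketched as a small helper. `call_with_retry` is a hypothetical function, not part of the Vision client. The commented usage assumes the 403 surfaces as `google.api_core.exceptions.PermissionDenied`, which you should confirm against the traceback you actually see.

```python
import time


def call_with_retry(call, retry_on, attempts=3, delay_seconds=60, sleep=time.sleep):
    """Run call(); if it raises retry_on, wait and try again.

    Re-raises the last error once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay_seconds)


# Stand-in demo: a function that fails twice, then succeeds.
# PermissionError is only a placeholder for the real API exception.
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise PermissionError("billing not enabled yet")
    return "response"


# sleep is replaced with a no-op so the demo does not actually wait.
print(call_with_retry(flaky_request, PermissionError, sleep=lambda seconds: None))

# With the Vision client, usage might look like this (assumed exception type):
# from google.api_core.exceptions import PermissionDenied
# response = call_with_retry(
#     lambda: client.text_detection(image=image), retry_on=PermissionDenied)
```

Retrying only helps once billing has been enabled; if it has not, the final attempt raises the same error.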

What about the response?

Getting the text from our Vision API response is simple.

entire_text = response.text_annotations[0].description

This will store the text, as read left-to-right and top-to-bottom, in the entire_text variable to be used however you please.
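Indexing `text_annotations[0]` raises an `IndexError` when no text is detected, for example if the crop misses the text entirely. A slightly more defensive version might look like the sketch below. `extract_text` is a hypothetical helper, and the fake response objects exist only to show the behavior without calling the API; the check on `response.error.message` assumes the response carries an `error` field with a `message`, as the Vision samples use.

```python
from types import SimpleNamespace


def extract_text(response):
    """Return the full detected text, or an empty string if none was found."""
    # Assumed: the response exposes error.message, empty when the call succeeded.
    error_message = getattr(getattr(response, "error", None), "message", "")
    if error_message:
        raise RuntimeError(f"Vision API error: {error_message}")
    annotations = response.text_annotations
    if not annotations:
        return ""
    # The first annotation holds the entire detected text.
    return annotations[0].description


# Stand-in objects that mimic only the attributes used above.
with_text = SimpleNamespace(
    error=SimpleNamespace(message=""),
    text_annotations=[SimpleNamespace(description="Hello from my phone")],
)
without_text = SimpleNamespace(error=SimpleNamespace(message=""), text_annotations=[])

print(repr(extract_text(with_text)))     # 'Hello from my phone'
print(repr(extract_text(without_text)))  # ''
```

With a real response, `entire_text = extract_text(response)` would replace the direct index.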

Putting it all together.

Now all you need to do is open a terminal window beside your mirrored phone, and run this script whenever you wish to grab the text on your screen.

import io
import os

import pyscreenshot as ImageGrab
from google.cloud import vision
from google.cloud.vision import types

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/json"

x1 = ...  # top left corner x
y1 = ...  # top left corner y
x2 = ...  # bottom right corner x
y2 = ...  # bottom right corner y

# The name of the image file, stored next to this script so that the
# save and the read below always use the same path
file_name = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'current_image.bmp')

# Take the screenshot, crop it, and save it to be read back below
image = ImageGrab.grab()
cropped_image = image.crop((x1, y1, x2, y2))
cropped_image.save(file_name)

# Instantiates a client
client = vision.ImageAnnotatorClient()

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

# With google-cloud-vision 2.0 or later, the types module is gone;
# use vision.Image(content=content) instead
image = types.Image(content=content)

response = client.text_detection(image=image)
entire_text = response.text_annotations[0].description
print(entire_text)

I hope this guide was informative and can save you some time when automating!