Using Google Cloud Vision API with Golang
(Exploring LABEL_DETECTION and TEXT_DETECTION)
Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy-to-use REST API. The API uses HTTP POST operations to perform data analysis on images you send in the request, with JSON for both requests and responses. A typical Vision API JSON request includes the contents of the image(s) on which to perform detection, and a set of operations (called features) to run against each image.
Of the many features that the API gives us, we are going to explore “LABEL_DETECTION” and “TEXT_DETECTION”. A “LABEL_DETECTION” request annotates an image with a label (or “tag”) selected based on the image content. For example, a picture of a barn may produce a label of “barn”, “farm”, or some other similar annotation. A label request is one of the most common use cases for the Vision API. A “TEXT_DETECTION” request finds and reads printed words contained within images.
The Google Cloud Vision API is generally available and has a free tier that allows 1,000 units per feature per month. Beyond that, there is a tiered pricing model based on the number of units you use in a month.
My friend Romin Irani has written an excellent article “How to Build a Monitoring Application With the Google Cloud Vision API” using Python. Please refer to pages 2, 3 and 4 of this article and execute the steps 1 to 4. These are necessary to use the Google Cloud Vision API.
After you have completed steps 1 to 4, let us start writing our Go code.
Get Dependencies
Our Go program depends on the following packages. Before getting started, be sure to get them.
go get -u golang.org/x/net/context golang.org/x/oauth2/google google.golang.org/api/vision/...
label.go
I have created a folder “cvision” which will hold my Go source code “label.go”, “text.go” and an image “dog.jpg” to analyze. The Go code for “label.go” is the same as the sample code for Google Cloud Vision with minor changes.
C:\go_projects\go\src\github.com\SatishTalim\cvision

We shall be running our program at the command prompt in the folder “cvision” as follows:
go run label.go dog.jpg
First draft of our program “label.go”
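The code snippet for the first draft did not survive in this copy of the post; a minimal sketch consistent with the description that follows (the “flag”, “os”, and “filepath” packages, a “Usage” variable, and “os.Exit”) might look like this:

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// Usage holds a function; it is called when flag parsing fails
// or when no image path is supplied on the command line.
var Usage = func() {
	fmt.Fprintf(os.Stderr, "Usage: %s <path-to-image>\n", filepath.Base(os.Args[0]))
}

func main() {
	flag.Usage = Usage
	flag.Parse()

	// flag.Args returns the non-flag command-line arguments.
	if len(flag.Args()) == 0 {
		Usage()
		// Exit immediately with a non-zero (error) status;
		// deferred functions are not run.
		os.Exit(1)
	}
}
```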
Let us understand the program so far.
First “import” the libraries necessary to run the program.
The package “flag” implements command-line flag parsing. Usage is a variable that holds a function. It is called when an error occurs while parsing flags. Let us run the program written so far as:
go run label.go
Usage: label.exe <path-to-image>
exit status 1
I get the above error since I did not give the name of the image after “label.go”.
Package “os” provides a platform-independent interface to operating system functionality. “Stderr” points to the standard error file.
Package “filepath” implements utility routines for manipulating filename paths in a way compatible with the target operating system-defined file paths. “filepath.Base” returns the last element of a path. Trailing path separators are removed before extracting the last element. If the path is empty, “Base” returns “.”.
“flag.Parse” parses the command-line flags from os.Args[1:].
“flag.Args” returns the non-flag command-line arguments.
“os.Exit” causes the current program to exit with the given status code. Conventionally, code zero indicates success, non-zero an error. The program terminates immediately; deferred functions are not run.
Second draft of our program “label.go”
We pass the name of the image file to a function “run()”.
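The second-draft snippet is also missing from this copy; a sketch of its structure, with the body of run() to be filled in over the following sections, could be:

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func main() {
	flag.Usage = func() {
		fmt.Fprintf(os.Stderr, "Usage: %s <path-to-image>\n", filepath.Base(os.Args[0]))
	}
	flag.Parse()
	if flag.NArg() != 1 {
		flag.Usage()
		os.Exit(1)
	}
	// Pass the name of the image file to run(), which does the real work.
	if err := run(flag.Arg(0)); err != nil {
		log.Fatal(err)
	}
}

// run will be extended below to authenticate, build the request,
// and print the annotations.
func run(file string) error {
	return nil
}
```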
Authenticate your Service
Package “context” defines the Context type, which carries deadlines, cancellation signals, and other request-scoped values across API boundaries and between processes. “context.Background()” returns a non-nil, empty Context. It is never canceled, has no values, and has no deadline. It is typically used by the main function, initialization, and tests, and as the top-level Context for incoming requests. Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named “ctx”.
Before communicating with the Vision API service, you will need to authenticate your service using previously acquired credentials. Within an application, the simplest way to obtain credentials is to use Application Default Credentials (ADC). By default, ADC will attempt to obtain credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable, which should be set to point to your service account’s JSON key file (Step 4 of Romin Irani’s article).
Package “google” provides support for making OAuth2 authorized and authenticated HTTP requests to Google APIs. “google.DefaultClient” returns an HTTP Client that uses the “DefaultTokenSource” to obtain authentication credentials. It looks for credentials in a JSON file whose path is specified by the GOOGLE_APPLICATION_CREDENTIALS environment variable.
Package “vision” provides access to the Cloud Vision API. “vision.CloudPlatformScope” is the OAuth scope constant that grants the client permission to view and manage your data across Google Cloud Platform services.
“vision.New” returns a “Service”.
We now have a Vision API service with which we can make API calls.
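Putting the pieces above together, the authentication portion of run() might look like the following sketch (it assumes GOOGLE_APPLICATION_CREDENTIALS is already set, as described earlier):

```go
package main

import (
	"golang.org/x/net/context"
	"golang.org/x/oauth2/google"
	vision "google.golang.org/api/vision/v1"
)

func run(file string) error {
	ctx := context.Background()

	// DefaultClient uses the DefaultTokenSource, which reads the JSON
	// key file named by the GOOGLE_APPLICATION_CREDENTIALS variable.
	client, err := google.DefaultClient(ctx, vision.CloudPlatformScope)
	if err != nil {
		return err
	}

	// vision.New returns a *Service with which we can make API calls.
	service, err := vision.New(client)
	if err != nil {
		return err
	}
	_ = service // used in the following sections

	return nil
}
```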
Read the image and create a request, encoding the image in base64
We first read our image data into a variable. ioutil.ReadFile reads the whole file named by filename and returns the contents. Requests to the Google Cloud Vision API are provided as JSON objects. However, JSON does not support the transmission of binary data, so we will need to escape our binary data into text by encoding it in Base64. Variable “StdEncoding” is the standard base64 encoding. “EncodeToString(b)” returns the base64 encoding of b.
Currently, the Vision API consists of one collection (images) which supports one HTTP Request method (annotate). The annotate request passes a JSON request of type “AnnotateImageRequest”. An example is shown below:
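The JSON example did not survive in this copy of the post; a representative request (with the image content abbreviated) looks like this:

```json
{
  "requests": [
    {
      "image": {
        "content": "...base64-encoded image data..."
      },
      "features": [
        {
          "type": "LABEL_DETECTION",
          "maxResults": 5
        }
      ]
    }
  ]
}
```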
“requests” — An array of requests, one for each image.
“image” — The image data for this request.
“features” — The array of features to detect for this image
“type” — The feature type, for example LABEL_DETECTION
“maxResults” — The maximum number of results to return for this feature type. The API can return fewer results.
“vision.AnnotateImageRequest” is a request for performing Vision tasks over a user-provided image, with user-requested features.
“vision.Image”: Client image to perform Vision tasks over.
“vision.Feature” indicates what type of image detection task to perform. Users describe the type of Vision tasks to perform over images by using Features. Features encode the Vision vertical to operate on and the number of top-scoring results to return. We use LABEL_DETECTION and ask for 5 results, which are returned in decreasing order of score.
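In Go, building this request might look like the following sketch, assuming b holds the image bytes read in the previous step:

```go
req := &vision.AnnotateImageRequest{
	// The base64-encoded image bytes.
	Image: &vision.Image{
		Content: base64.StdEncoding.EncodeToString(b),
	},
	// Ask for label detection, returning at most 5 results.
	Features: []*vision.Feature{
		{
			Type:       "LABEL_DETECTION",
			MaxResults: 5,
		},
	},
}
```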
Submit the Requests as a Batch
“vision.BatchAnnotateImagesRequest” — Multiple image annotation requests are batched into a single service call.
“Annotate”: Run image detection and annotation for a batch of images. “Do” executes the “vision.images.annotate” call.
A POST request has now been made.
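Inside run(), with req and service from the earlier steps, the batched call might be sketched as:

```go
batch := &vision.BatchAnnotateImagesRequest{
	Requests: []*vision.AnnotateImageRequest{req},
}

// Annotate builds the vision.images.annotate call;
// Do executes it, issuing the POST request to the API.
res, err := service.Images.Annotate(batch).Do()
if err != nil {
	return err
}
```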
JSON Response Format
The “Annotate” request receives a JSON response of type “AnnotateImageResponse”. Although the requests are similar for each feature type, the responses for each feature type can be quite different.
“LabelAnnotations”: if present, label detection has completed successfully. “Description” is the entity's textual description.
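A sketch of how run() might walk the response, assuming res is the batch response and file the image name from the earlier steps:

```go
// res.Responses is parallel to the batched requests; each entry
// carries the annotations for one image.
if annotations := res.Responses[0].LabelAnnotations; len(annotations) > 0 {
	for _, a := range annotations {
		fmt.Printf("Found label: %s, Score: %f for %s\n", a.Description, a.Score, file)
	}
} else {
	fmt.Printf("No labels found for %s\n", file)
}
```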
Run the program
go run label.go dog.jpg
The output
Found label: pet, Score: 0.989092 for dog.jpg
Found label: dog, Score: 0.988883 for dog.jpg
Found label: mammal, Score: 0.962768 for dog.jpg
Found label: animal, Score: 0.954685 for dog.jpg
Found label: labrador retriever, Score: 0.941879 for dog.jpg
The results are not perfectly accurate for some of the labels it found, but you can see where things stand today and the possibilities this opens up.
text.go
This program is very similar to “label.go”, with some minor changes as shown below.
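The diff snippet is missing from this copy; the two changes might be sketched as follows, swapping the feature type in the request and the annotation field read from the response:

```go
// In the request, ask for text detection instead of labels:
Features: []*vision.Feature{
	{
		Type:       "TEXT_DETECTION",
		MaxResults: 5,
	},
},

// ...and in the response, read TextAnnotations instead of
// LabelAnnotations. The first entry holds the full detected text.
if annotations := res.Responses[0].TextAnnotations; len(annotations) > 0 {
	fmt.Printf("Found text: %s\n", annotations[0].Description)
}
```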
Run the program
go run text.go dog.jpg
The output
Found text: BENZY
You can now easily write a Go program that can use “LOGO_DETECTION” and “FACE_DETECTION”.
That’s it!