Using Go with an image and video recognition API from Clarifai

“Clarifai specializes in using deep learning algorithms for visual search. In short, it’s building software that will help you find photos — whether they’re on your mobile phone, a dating website, or on a corporate network — and it will sell this software to all sorts of other companies that want to roll it into their own online services.” — Wired magazine.

Create your account with Clarifai

Clarifai offers several plans, including a free one that is ideal for experimenting. All API calls require an account, so create yours first.

The API is built around a simple idea: you send inputs (images) to the service and it returns predictions. The type of prediction depends on which model you run the input through.

API calls are tied to an account and an application. After creating your account with Clarifai, you need to create an application. Head on over to the applications page and press the ‘CREATE NEW APPLICATION’ button. At a minimum, you’ll need to provide an application name and select General as the Base Workflow. You can create as many applications as you want and can edit or delete them as you see fit. Each application has its own unique API key, which is used for authentication.


Authentication to the API is handled through API Keys. Select the application that you want to authorize using this key. An API Key cannot be used across multiple apps. All API access is over HTTPS, and accessed via the domain. The relative path prefix /v2/ indicates that we are currently using version 2 of the API.

To retrieve an access token, send a POST request to the token endpoint with your client_id and client_secret. You must also include grant_type=client_credentials.


I have created a folder “clarifai” which will hold the Go source code “mycf.go”.


We shall be running our program at the command prompt in the folder “clarifai” as follows:

go run mycf.go

The code so far:
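Here is a minimal sketch of what such a program could look like. It is an illustration under assumptions, not a definitive listing: the token URL and credentials are read from the hypothetical environment variables CLARIFAI_TOKEN_URL, CLARIFAI_CLIENT_ID and CLARIFAI_CLIENT_SECRET, the helper name newTokenRequest is made up for this example, and requestAccessToken takes the URL and credentials as parameters so that nothing is hard-coded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// TokenResp holds the fields of the JSON token response.
type TokenResp struct {
	AccessToken string `json:"access_token"`
	ExpiresIn   int    `json:"expires_in"`
	Scope       string `json:"scope"`
	TokenType   string `json:"token_type"`
}

// newTokenRequest builds the form-encoded POST request for an access token.
func newTokenRequest(tokenURL, clientID, clientSecret string) (*http.Request, error) {
	form := url.Values{}
	form.Set("grant_type", "client_credentials")
	form.Set("client_id", clientID)
	form.Set("client_secret", clientSecret)

	req, err := http.NewRequest("POST", tokenURL, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

// requestAccessToken sends the request and returns the access_token field.
func requestAccessToken(tokenURL, clientID, clientSecret string) (string, error) {
	req, err := newTokenRequest(tokenURL, clientID, clientSecret)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("token request failed: %s", resp.Status)
	}
	var tr TokenResp
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
		return "", err
	}
	return tr.AccessToken, nil
}

func main() {
	// Hypothetical environment variables; set them to your own values.
	tokenURL := os.Getenv("CLARIFAI_TOKEN_URL")
	id := os.Getenv("CLARIFAI_CLIENT_ID")
	secret := os.Getenv("CLARIFAI_CLIENT_SECRET")
	if tokenURL == "" || id == "" || secret == "" {
		fmt.Println("set CLARIFAI_TOKEN_URL, CLARIFAI_CLIENT_ID and CLARIFAI_CLIENT_SECRET first")
		return
	}
	token, err := requestAccessToken(tokenURL, id, secret)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("AccessToken =", token)
}
```

Keeping the request construction in its own function makes it easy to inspect the method, header and body without touching the network.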


Let us understand the above code:

type TokenResp struct {
	AccessToken string `json:"access_token"`
	ExpiresIn   int    `json:"expires_in"`
	Scope       string `json:"scope"`
	TokenType   string `json:"token_type"`
}

The JSON response to the token request (the POST carrying your “client_id” and “client_secret”) is captured by the “TokenResp” structure. The struct tags tell the encoding/json package which JSON key fills which field.

func requestAccessToken() (string, error) {

The function “requestAccessToken()” returns two values: a string holding the access token, and an “error”. “error” is a predeclared identifier in Go, defined as:

type error interface {
	Error() string
}

The “error” built-in interface type is the conventional interface for representing an error condition, with the “nil” value representing no error.
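To see the convention in action, here is a small sketch that decodes a token response and reports problems through the returned error. The helper name parseToken and the sample JSON bodies are made up for illustration; only the field names come from the TokenResp structure.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// TokenResp mirrors the fields of the token response.
type TokenResp struct {
	AccessToken string `json:"access_token"`
	ExpiresIn   int    `json:"expires_in"`
	Scope       string `json:"scope"`
	TokenType   string `json:"token_type"`
}

// parseToken decodes a response body and returns the access token.
// A nil error means success; a non-nil error describes what went wrong.
func parseToken(body []byte) (string, error) {
	var tr TokenResp
	if err := json.Unmarshal(body, &tr); err != nil {
		return "", fmt.Errorf("decoding token response: %w", err)
	}
	if tr.AccessToken == "" {
		return "", errors.New("token response has no access_token")
	}
	return tr.AccessToken, nil
}

func main() {
	// Made-up sample bodies, not real API output.
	samples := []string{
		`{"access_token": "example-token", "expires_in": 3600}`,
		`{}`,
	}
	for _, s := range samples {
		token, err := parseToken([]byte(s))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("AccessToken =", token)
	}
}
```

The caller always checks the error first and only uses the token when the error is nil.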

type Values map[string][]string

“Values”, from the net/url package, maps a string key to a list of values. It is typically used for query parameters and form values.

func (v Values) Set(key, value string)

“Set” sets the key to value. It replaces any existing values.

func (v Values) Encode() string

“Encode” encodes the values into “URL encoded” form (“bar=baz&foo=quux”) sorted by key.
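A short sketch of “Set” and “Encode” working together on the token form. The function name encodeTokenForm and the credential values are placeholders chosen for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeTokenForm builds the URL-encoded body for a token request.
func encodeTokenForm(clientID, clientSecret string) string {
	form := url.Values{}
	form.Set("grant_type", "placeholder")
	form.Set("grant_type", "client_credentials") // Set replaces the earlier value
	form.Set("client_id", clientID)
	form.Set("client_secret", clientSecret)
	return form.Encode() // keys are sorted and values are escaped
}

func main() {
	fmt.Println(encodeTokenForm("my-id", "my secret"))
	// prints: client_id=my-id&client_secret=my+secret&grant_type=client_credentials
}
```

Note that the keys come out in sorted order regardless of the order in which they were set, and that the space in the secret is escaped as “+”.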

Package “strings” implements simple functions to manipulate UTF-8 encoded strings.

func NewReader(s string) *Reader

“NewReader” returns a new “Reader” reading from s.
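The reason a “Reader” is useful here is that an HTTP request body must be an io.Reader, and a plain string is not one. The sketch below, with the made-up helper readTwice, shows that the returned reader is consumed as it is read.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readTwice wraps s in a strings.Reader and reads it to the end twice,
// returning what each pass produced.
func readTwice(s string) (first, second string) {
	r := strings.NewReader(s) // *strings.Reader implements io.Reader
	b1, _ := io.ReadAll(r)    // consumes the whole string; cannot fail here
	b2, _ := io.ReadAll(r)    // nothing is left to read
	return string(b1), string(b2)
}

func main() {
	first, second := readTwice("grant_type=client_credentials")
	fmt.Printf("first read:  %q\n", first)
	fmt.Printf("second read: %q\n", second)
}
```

The second read returns an empty string, so a new reader is needed each time the same body is sent again.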

req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

Any HTTP/1.1 message containing an entity-body SHOULD include a “Content-Type” header field defining the media type of that body. If you have binary (non-alphanumeric) data (or a significantly sized payload) to transmit, use multipart/form-data. Otherwise, use “application/x-www-form-urlencoded”.
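To see why the header matters, the sketch below runs Go's own form parser from net/http over the same body with and without the header. This only demonstrates how Go's parser behaves; how any particular remote server reacts is an assumption. The function name parsedGrantType and the example.invalid URL are placeholders, and the request is never sent.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parsedGrantType builds a POST request with a URL-encoded body, optionally
// sets Content-Type, and returns the grant_type value that net/http's form
// parser recovers from the body.
func parsedGrantType(contentType string) string {
	body := strings.NewReader("grant_type=client_credentials")
	req, err := http.NewRequest("POST", "http://example.invalid/token", body)
	if err != nil {
		return ""
	}
	if contentType != "" {
		req.Header.Set("Content-Type", contentType)
	}
	if err := req.ParseForm(); err != nil {
		return ""
	}
	return req.PostForm.Get("grant_type")
}

func main() {
	fmt.Printf("with header:    %q\n", parsedGrantType("application/x-www-form-urlencoded"))
	fmt.Printf("without header: %q\n", parsedGrantType(""))
}
```

Without the header the parser does not treat the body as a form, so the field comes back empty.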

When you run the program the output on the console is:

AccessToken = WS2bEYTrEeKxSQilf8JqmxWMmlqL7U

You would get a different value when you run the program again.

You can now use the “access_token” value to authorize your API calls.

Tag endpoint

The tag endpoint tags the contents of your images or videos. You send data to Clarifai, their deep learning platform processes it, and a list of tags is returned. Typical processing times are in the milliseconds.

If you’d like to get tags for one image or video using a publicly accessible URL, you may send either a GET or a POST request. We shall use a GET request.

We are going to analyze an image at the URL — which has the following image:


Open your browser and type the following URL —

Replace 1P3LNShlwE1HpL2xd0ZLL2rrMKMDzz with your own “access_token”.
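If you would rather build the request URL in Go than type it by hand, the net/url package can do the escaping for you. The sketch below is only an illustration: the query parameter names “url” and “access_token”, the function name buildTagURL, and the example.invalid addresses are all assumptions, so substitute the real endpoint and image address.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildTagURL appends the image URL and access token to the endpoint as
// query parameters. The parameter names are assumptions for this sketch.
func buildTagURL(endpoint, imageURL, accessToken string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("url", imageURL)
	q.Set("access_token", accessToken)
	u.RawQuery = q.Encode() // escapes the nested image URL
	return u.String(), nil
}

func main() {
	// Placeholder endpoint, image and token.
	tagURL, err := buildTagURL(
		"https://api.example.invalid/tag/",
		"https://example.invalid/train.jpg",
		"YOUR_ACCESS_TOKEN",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tagURL)
}
```

Escaping matters because the image address is itself a URL, and its “:” and “/” characters would otherwise be ambiguous inside the query string.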

You should see something like this in your browser:

{"status_code": "OK", "status_msg": "All images in request have completed successfully. ", "meta": {"tag": {"timestamp": 1463478660.484501, "model": "general-v1.3", "config": "34fb1111b4d5f67cf1b8665ebc603704"}}, "results": [{"docid": 17763255747558799694, "url": "", "status_code": "OK", "status_msg": "OK", "local_id": "", "result": {"tag": {"concept_ids": ["ai_HLmqFqBf", "ai_fvlBqXZR", "ai_Xxjc3MhT", "ai_6kTjGfF6", "ai_RRXLczch", "ai_VRmbGVWh", "ai_SHNDcmJ3", "ai_jlb9q33b", "ai_46lGZ4Gm", "ai_tr0MBp64", "ai_l4WckcJN", "ai_2gkfMDsM", "ai_CpFBRWzD", "ai_786Zr311", "ai_6lhccv44", "ai_971KsJkn", "ai_WBQfVV0p", "ai_dSCKh8xv", "ai_TZ3C79C6", "ai_VSVscs9k"], "classes": ["train", "railway", "transportation system", "station", "train", "travel", "tube", "commuter", "railway", "traffic", "blur", "platform", "urban", "no person", "business", "track", "city", "fast", "road", "terminal"], "probs": [0.9989112019538879, 0.9975532293319702, 0.9959157705307007, 0.9925730228424072, 0.9925559759140015, 0.9878921508789062, 0.9816359281539917, 0.9712483286857605, 0.9690325260162354, 0.9687051773071289, 0.9667078256607056, 0.9624242782592773, 0.960752010345459, 0.9586490392684937, 0.9572030305862427, 0.9494642019271851, 0.940894365310669, 0.9399334192276001, 0.9312160611152649, 0.9230834245681763]}}, "docid_str": "76961bb1ddae0e82f683c2fd17a8794e"}]}

The struct for the above is:

type TagResp struct {
	StatusCode string `json:"status_code"`
	StatusMsg  string `json:"status_msg"`
	Meta       struct {
		Tag struct {
			Timestamp float64 `json:"timestamp"`
			Model     string  `json:"model"`
			Config    string  `json:"config"`
		} `json:"tag"`
	} `json:"meta"`
	Results []struct {
		Docid      uint64 `json:"docid"`
		URL        string `json:"url"`
		StatusCode string `json:"status_code"`
		StatusMsg  string `json:"status_msg"`
		LocalID    string `json:"local_id"`
		Result     struct {
			Tag struct {
				ConceptIds []string  `json:"concept_ids"`
				Classes    []string  `json:"classes"`
				Probs      []float64 `json:"probs"`
			} `json:"tag"`
		} `json:"result"`
		DocidStr string `json:"docid_str"`
	} `json:"results"`
}

The complete program is:
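What follows is a minimal sketch of such a program rather than a definitive listing. It assumes the full tag request URL (including your access_token) is passed as the first command-line argument, and it uses a trimmed, hypothetical tagResp type that declares only the fields it reads, since encoding/json ignores JSON keys that have no matching field. The helper name decodeTags is also made up for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
)

// tagResp declares only the parts of the tag response that are printed.
type tagResp struct {
	StatusCode string `json:"status_code"`
	StatusMsg  string `json:"status_msg"`
	Results    []struct {
		Result struct {
			Tag struct {
				Classes []string `json:"classes"`
			} `json:"tag"`
		} `json:"result"`
	} `json:"results"`
}

// decodeTags returns the tag names of the first result in a response body.
func decodeTags(body []byte) ([]string, error) {
	var tr tagResp
	if err := json.Unmarshal(body, &tr); err != nil {
		return nil, err
	}
	if tr.StatusCode != "OK" {
		return nil, fmt.Errorf("tag request failed: %s %s", tr.StatusCode, tr.StatusMsg)
	}
	if len(tr.Results) == 0 {
		return nil, errors.New("response contains no results")
	}
	return tr.Results[0].Result.Tag.Classes, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run mycf.go <tag request URL including access_token>")
		return
	}
	resp, err := http.Get(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	tags, err := decodeTags(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, tag := range tags {
		fmt.Printf("Tag%d = %s\n", i, tag)
	}
}
```

Separating decodeTags from the network call keeps the JSON handling easy to check against a saved response.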


When you run the program, the output is:

Tag0 = train
Tag1 = railway
Tag2 = transportation system
Tag3 = station
Tag4 = train
Tag5 = travel
Tag6 = tube
Tag7 = commuter
Tag8 = railway
Tag9 = traffic
Tag10 = blur
Tag11 = platform
Tag12 = urban
Tag13 = no person
Tag14 = business
Tag15 = track
Tag16 = city
Tag17 = fast
Tag18 = road
Tag19 = terminal

Have fun!

Like what you read? Give Satish Manohar Talim a round of applause.
