A Deep Dive into Detecting Labels, Faces, and Landmarks in Images with the Cloud Vision API

Roya
5 min read · Aug 15, 2023

Photo by Antoine Beauvillain on Unsplash

This guide is meant to be read alongside the Detecting Labels, Faces, and Landmarks in Images with the Cloud Vision API lab in the Computer Vision Fundamentals with Google Cloud course.

Lab objectives

In this lab, the user will learn how to perform the following tasks:

  • Create a Vision API request and call the API with curl.
  • Use the Vision API's label, face, and landmark detection methods.

Task 0. Setup and requirements

Activate Cloud Shell:

As the lab instructions suggest, click the Activate Cloud Shell button at the top right of the console. When prompted, click Authorize to let Cloud Shell act on your behalf.

Next, list the active accounts:

gcloud auth list

This command lists every account that has credentials in the gcloud command-line interface (CLI) and marks the one that is currently active.

Next, show the project ID with this command:

gcloud config list project

This command shows the project that is currently active in the gcloud CLI. The active project is used for all subsequent gcloud commands unless the user explicitly specifies a different one.
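The project ID comes up again later (for example, as a bucket name), so it can be convenient to keep it in a shell variable. The following is a minimal sketch, assuming the gcloud CLI is installed and a project is configured; the function name current_project and the variable PROJECT_ID are illustrative and not part of the lab.

```shell
# Print the active project ID; fail with a hint if none is configured.
# current_project is a hypothetical helper name, not a gcloud command.
current_project() {
  local project
  project=$(gcloud config get-value project 2>/dev/null)
  if [ -z "$project" ] || [ "$project" = "(unset)" ]; then
    echo "No active project; run: gcloud config set project <PROJECT_ID>" >&2
    return 1
  fi
  echo "$project"
}

# Usage (in Cloud Shell): PROJECT_ID=$(current_project) && echo "$PROJECT_ID"
```

Wrapping the call in a function keeps the check for a missing project in one place.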

Task 1. Create an API key

An API key is a unique identifier that is sent with each request to an API. API keys are typically used to control API access and track usage. This lab calls the Vision API with curl, so an API key is needed.

As specified in the lab, to create an API key, go to APIs & Services > Credentials from the Navigation menu in the Cloud Console. Click Create Credentials and select API key. Save the key in a note for later use.

To avoid pasting the API key into every request, the lab advises storing it in an environment variable.

export API_KEY=<YOUR_API_KEY>

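export is a shell (Bash) built-in rather than a GCP command. It assigns a value to a variable and makes that variable available to commands started from the same shell. In this example, the placeholder <YOUR_API_KEY> is replaced with the actual key, which is then stored in the variable API_KEY.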
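To see what export adds over a plain assignment, the sketch below sets one variable each way and asks a child shell which of them it can see. The names LOCAL_ONLY and DEMO_KEY are made up for this illustration, and the sketch assumes neither is already set in the environment.

```shell
# A plain assignment stays inside the current shell.
LOCAL_ONLY="visible only here"
# An exported variable is copied into the environment of child processes.
export DEMO_KEY="visible to children"

# Ask a child shell what it can see; names it cannot see fall back to "unset".
child_view=$(sh -c 'echo "${LOCAL_ONLY:-unset} | ${DEMO_KEY:-unset}"')
echo "$child_view"
```

The child shell should report LOCAL_ONLY as unset and print the value of DEMO_KEY, which is why the lab exports API_KEY instead of only assigning it.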

Task 2. Upload an image to the Cloud Storage bucket

As explained in the lab, the next step is to create a bucket to store the images that will later be sent to the Vision API. To create a bucket, select Cloud Storage > Buckets from the Navigation menu, then click Create next to Buckets. The bucket name must be globally unique; the project ID is a convenient choice because it is already unique. It is essential to uncheck Enforce public access prevention on this bucket and to select Fine-grained under Access control.

By default, Enforce public access prevention is checked when a new bucket is created in Google Cloud Storage (GCS). With this setting, the bucket and its contents cannot be made public; only accounts that have been explicitly granted access can reach them.

In this lab the uploaded image needs to be publicly readable, so the setting must be unchecked. Selecting Fine-grained enables per-object access control lists (ACLs), which lets the user set access on an individual object instead of only on the bucket as a whole.

If this was not done when the bucket was created, the settings can be changed afterwards with the following steps.

  1. Go to the Cloud Storage browser.
  2. Click the bucket that must be edited.
  3. Click the Settings tab.
  4. In the Public access prevention section, uncheck Enforce public access prevention.
  5. Select the Fine-grained circle.
  6. Click Save.

The next step is to upload an image to the bucket and edit its access so that it is publicly readable.
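The lab performs the upload through the console, but the same two actions can also be done from Cloud Shell. The following is a sketch under some assumptions: the bucket already exists with fine-grained access control, gsutil is available, and the image is a local file. The function name upload_public is hypothetical.

```shell
# Upload a local file to a bucket and grant public read access to that one object.
# upload_public is an illustrative helper, not part of the lab.
# Usage: upload_public <local-file> <bucket-name>
upload_public() {
  local file="$1" bucket="$2"
  local object="gs://${bucket}/$(basename "$file")"
  gsutil cp "$file" "$object" || return 1
  # AllUsers:R adds a READ grant for anyone; this needs fine-grained access control.
  gsutil acl ch -u AllUsers:R "$object" || return 1
  # Print the gs:// URI so it can be pasted into the request file.
  echo "$object"
}

# Usage: upload_public ./donuts.png my-bucket-name
```

The ACL change applies only to the uploaded object, so the rest of the bucket stays private.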

Task 3. Create the request

After uploading an image, the user can create a Cloud Vision API request that points to the image in the bucket by its Cloud Storage URI.

To do that, first create a request file in Cloud Shell.

echo '
{
  "requests": [
    {
      "image": {
        "source": {
          "gcsImageUri": "gs://my-bucket-name/donuts.png"
        }
      },
      "features": [
        {
          "type": "LABEL_DETECTION",
          "maxResults": 10
        }
      ]
    }
  ]
}
' > request.json

This command writes a JSON file that defines a request to the Google Cloud Vision API. The request asks the API to detect labels in the image located at gs://my-bucket-name/donuts.png and to return up to 10 results. The bucket name my-bucket-name must be replaced with the name of the bucket created in Task 2.

The file is created with the echo command, which prints the quoted text to standard output. The > operator redirects that output into the file request.json, overwriting the file if it already exists.

The requests array contains a single request object. Its image object identifies the image that the API should process, and the nested source object gives the image location. In this case, gcsImageUri points to an object in the Cloud Storage bucket my-bucket-name.

The features array lists the types of detection that the API should run on the image; here it contains only LABEL_DETECTION. The maxResults property sets the maximum number of labels that the API should return.
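Because features is an array, one request can ask for several detection types at once, which is useful for the face and landmark parts of the lab. The sketch below generates such a file from a small shell function. The function name write_request and the file name multi_request.json are made up for this example, while LABEL_DETECTION, FACE_DETECTION, and LANDMARK_DETECTION are feature types of the Vision API.

```shell
# Write a Vision API request for one image and one or more feature types.
# write_request is an illustrative helper, not part of the lab.
# Usage: write_request <gcs-uri> <output-file> <feature-type>...
write_request() {
  local uri="$1" out="$2"
  shift 2
  local features="" type
  for type in "$@"; do
    # Join feature objects with commas; maxResults is fixed at 10 as in the lab.
    features="${features:+$features,}
        { \"type\": \"$type\", \"maxResults\": 10 }"
  done
  cat > "$out" <<EOF
{
  "requests": [
    {
      "image": { "source": { "gcsImageUri": "$uri" } },
      "features": [$features
      ]
    }
  ]
}
EOF
}

write_request "gs://my-bucket-name/donuts.png" multi_request.json \
  LABEL_DETECTION FACE_DETECTION LANDMARK_DETECTION
```

A heredoc is used instead of echo so that the image URI and feature list can be substituted into the template.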

Task 4. Label detection

To detect the labels, send the request to the Cloud Vision API using the following command.

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"

This is a curl command that sends a POST request to the Google Cloud Vision API. The body of the request is the file created in Task 3, which asks the API to detect labels in the image located at gs://my-bucket-name/donuts.png and return up to 10 results.

Here, curl is a command-line tool for transferring data to or from a URL. The -s flag puts curl in silent mode: it hides the progress meter and error messages, while the response from the API is still printed to the console. The -X POST flag tells curl to send a POST request. The -H "Content-Type: application/json" option adds a header stating that the request body is JSON.

The --data-binary option tells curl to send the request body exactly as given, without any extra processing. The @ prefix in @request.json tells curl to read that body from the file request.json.

The URL https://vision.googleapis.com/v1/images:annotate?key=${API_KEY} is the endpoint for the Vision API's image annotation method. The shell replaces ${API_KEY} with the value of the environment variable set in Task 1, so the key is passed as the key query parameter. Quoting the URL keeps the shell from treating the ? as a filename wildcard.
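The API answers with JSON in which label results appear under responses[0].labelAnnotations, each with a description and a score. As a rough sketch, the function below pulls only the description values out of a saved response with grep and sed. The function name label_descriptions and the file name response.json are hypothetical, and a JSON-aware tool such as jq would be more robust if it is available.

```shell
# Print one description per line from a saved Vision API response.
# label_descriptions is an illustrative helper, not part of the lab.
# Usage: label_descriptions <response-file>
label_descriptions() {
  grep -o '"description": *"[^"]*"' "$1" | sed 's/^"description": *"//; s/"$//'
}

# Usage, assuming request.json exists and API_KEY is set:
#   curl -s -X POST -H "Content-Type: application/json" \
#     --data-binary @request.json \
#     "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}" > response.json
#   label_descriptions response.json
```

Saving the response to a file first makes it possible to inspect the full JSON and rerun the extraction without calling the API again.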

At this point, the user should be able to follow along with the lab with a deeper understanding of the commands and code that are used to request annotation for a variety of images. This post is written to serve as a guide for those who are starting their ML journey with limited knowledge of GCP.

Disclaimer: To write this post, a variety of tools have been utilized, including but not limited to Bard, ChatGPT, and Grammarly.

Roya

Enthusiastic young professional with a love for science, women's rights, and photography. Roya means "a sweet dream" in Farsi.