Deep Dive into Detecting Labels, Faces, and Landmarks in Images with the Cloud Vision API
This guide should be used in parallel with the Detecting Labels, Faces, and Landmarks in Images with the Cloud Vision API lab in the Computer Vision Fundamentals with Google Cloud course.
Lab objectives
In this lab, the user will learn how to perform the following tasks:
- Create a Vision API request and call the API with curl.
- Use the Vision API's label, face, and landmark detection methods.
Task 0. Setup and requirements
Activate Cloud Shell:
As the lab instructions suggest, the developer should click the Activate Cloud Shell button at the top right of the console, then click Authorize when prompted to enable Cloud Shell.
Next, list the active accounts using:
gcloud auth list
This command lists all accounts currently authenticated to the Google Cloud Platform (GCP) command-line interface (CLI).
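If more than one account is listed, the active account can be switched before continuing. A minimal sketch, where the account address is a placeholder rather than a value from the lab:
# Switch the active account (the address below is a placeholder).
gcloud config set account student@example.com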
Next, list the active project ID with this command:
gcloud config list project
This command lists the project currently active in the GCP CLI. The active project is the project that will be used for all subsequent GCP commands unless the user explicitly specifies a different project.
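If a different project should be active, it can be changed directly from Cloud Shell. A minimal sketch, where PROJECT_ID is a placeholder for the lab's project ID:
# Set the active project used by subsequent gcloud commands (PROJECT_ID is a placeholder).
gcloud config set project PROJECT_ID

# Confirm which project is now active.
gcloud config get-value project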
Task 1. Create an API key
An API key is a unique identifier that is used to authenticate requests to an API. API keys are typically used to control API access and track usage. In this lab, the Vision API will be used, which requires an API key.
As specified in the instructions, to create an API key, the user must go to APIs & Services > Credentials from the Navigation menu in the Cloud Console. Click Create Credentials and select API key. Save the API key in a note for future use.
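Depending on the gcloud release available in Cloud Shell, an API key can also be created from the command line instead of the console. The following is a hedged sketch of that alternative, not part of the lab instructions; the display name is arbitrary and the exact flags may vary between gcloud versions:
# Create an API key with an arbitrary display name (availability depends on the gcloud version).
gcloud services api-keys create --display-name="vision-lab-key"

# List the project's API keys to find the new key's resource name.
gcloud services api-keys list

# Print the key string itself, using the resource name returned above (placeholder shown here).
gcloud services api-keys get-key-string KEY_RESOURCE_NAME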
To avoid copying the API key repeatedly, the instructions advise the developer to create an environment variable.
export API_KEY=<YOUR_API_KEY>
Here, export is a shell command that assigns a value to an environment variable and makes it available to any process started from the current shell session. In this example, the API key value (substituted for YOUR_API_KEY) is exported to the variable API_KEY.
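Because export only affects the current shell session, it is worth verifying the variable and re-exporting it if Cloud Shell reconnects. A minimal sketch:
# Verify that the variable is set in the current session.
echo $API_KEY

# Exported variables are inherited by commands started from this shell, such as curl.
env | grep API_KEY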
Task 2. Upload an image to the Cloud Storage bucket
As explained in the instructions, the next step is to create a bucket to store the images that will later be sent to the Vision API. To create a bucket, select Cloud Storage > Buckets from the Navigation menu, then click Create next to Buckets. The bucket needs a globally unique name; the project ID works well here because it is already unique. It is essential to uncheck Enforce public access prevention on this bucket and to select the Fine-grained access control option.
By default, when creating a new bucket in Google Cloud Storage (GCS), Enforce public access prevention is checked. This means the bucket and its contents cannot be made public, and only users to whom the user explicitly grants access can read them.
However, if the user wants to be able to make individual objects public, or to grant specific users and service accounts access at the object level, they can uncheck Enforce public access prevention and select the Fine-grained option. This gives more granular control over who has access to the bucket and its contents.
If this was not done at the time of creation, the following steps can be taken to uncheck Enforce public access prevention and select the Fine-grained option after the bucket has been built in GCS.
- Go to the Cloud Storage browser.
- Click the bucket that must be edited.
- Click the Settings tab.
- In the Public access prevention section, uncheck Enforce public access prevention.
- Select the Fine-grained circle.
- Click Save.
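For reference, the same bucket configuration can also be applied from Cloud Shell with gsutil. This is a hedged sketch rather than part of the lab instructions; the bucket name is a placeholder, and the pap values may differ between gsutil versions:
# Create a bucket with uniform bucket-level access turned off, i.e. fine-grained ACLs (placeholder name).
gsutil mb -b off gs://my-bucket-name

# Relax public access prevention so that individual objects can later be made public.
gsutil pap set inherited gs://my-bucket-name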
The next step is to upload an image and modify its access so that it is publicly accessible.
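As a sketch of how this could be done from Cloud Shell, assuming a local file named donuts.png and the placeholder bucket name used throughout this post:
# Copy the local image into the bucket (file and bucket names are placeholders).
gsutil cp donuts.png gs://my-bucket-name/

# With fine-grained access control, grant read access to all users so the object becomes public.
gsutil acl ch -u AllUsers:R gs://my-bucket-name/donuts.png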
Task 3. Create the request
After uploading an image, the user can create a Cloud Vision API request and pass the URL to access the uploaded image in the bucket.
To do that, first, a request file must be created in the Cloud Shell.
echo '
{
  "requests": [
    {
      "image": {
        "source": {
          "gcsImageUri": "gs://my-bucket-name/donuts.png"
        }
      },
      "features": [
        {
          "type": "LABEL_DETECTION",
          "maxResults": 10
        }
      ]
    }
  ]
}
' > request.json
This command creates a JSON file that defines a request to the Google Cloud Vision API. The request specifies that the API should detect labels in the image located at gs://my-bucket-name/donuts.png and return up to 10 results.
The JSON file is created using the echo command, which prints the specified text to standard output. The > operator redirects the output of the echo command to the file request.json.
The requests array contains a single request object. The image object specifies the image that the API should process, and the source object specifies the location of the image. In this case, the image is located in the Google Cloud Storage bucket my-bucket-name.
The features array specifies the types of features that the API should detect in the image. In this case, the API should detect labels, and the maxResults property specifies the maximum number of labels that the API should return.
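Later tasks in the lab apply the same request structure to face and landmark detection. As a hedged preview (the image file name below is a placeholder, not a file provided here), request.json could be rewritten with additional feature types:
echo '
{
  "requests": [
    {
      "image": {
        "source": {
          "gcsImageUri": "gs://my-bucket-name/selfie.png"
        }
      },
      "features": [
        { "type": "FACE_DETECTION" },
        { "type": "LANDMARK_DETECTION" },
        { "type": "LABEL_DETECTION", "maxResults": 10 }
      ]
    }
  ]
}
' > request.json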
Task 4. Label detection
To detect the labels, the user sends the request to the Cloud Vision API using the following command.
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}
This is a curl command that sends a POST request to the Google Cloud Vision API. The request specifies that the API should detect labels in the image located at gs://my-bucket-name/donuts.png and return up to 10 results.
Here, curl is a command-line tool that allows the user to transfer data to and from a URL. The -s flag tells curl to run silently, suppressing the progress meter and error messages while still printing the API response. The -X POST flag tells curl to send a POST request, and the -H "Content-Type: application/json" header specifies that the content type of the request is JSON.
The --data-binary option tells curl to send the request body exactly as provided, and the @request.json argument specifies that the body should be read from the file request.json.
The URL https://vision.googleapis.com/v1/images:annotate?key=${API_KEY} calls the Vision API and provides the API_KEY. It is the endpoint for the Google Cloud Vision API's image annotation method.
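To keep the response for later inspection, the output can be written to a file and pretty-printed. This is a minimal sketch, assuming python3 is available in Cloud Shell; quoting the URL also prevents the shell from interpreting the ? character:
# Send the same request, saving the response to a file instead of printing it.
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}" -o response.json

# Pretty-print the saved JSON response.
python3 -m json.tool response.json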
At this point, the user should be able to follow along with the lab with a deeper understanding of the commands and code used to request annotations for a variety of images. This post is written to serve as a guide for those who are starting their ML journey with limited knowledge of GCP.
Disclaimer: To write this post, a variety of tools have been utilized, including but not limited to Bard, ChatGPT, and Grammarly.