Process Images With Google Cloud AI

Gunjan Garg
Google Cloud - Community
8 min read · Jan 26, 2022

Whether you are an enterprise or a startup, almost every business today has a stack of images that it needs to process and derive value from. With the surge in tech innovation and the deeper reach of online services into our lives since the pandemic, businesses are adopting a technology-first approach. Agility and the ability to operate at scale are therefore becoming key parameters for success.

In this blog, I will discuss and try to answer the following questions:

  • Why and where do we need to process images beyond OCR?
  • What are the current challenges?
  • How can Google Cloud help?

Where we need to process images

To understand the need better, let’s take a few examples from industries where we see potential use cases related to images.

1. Retail: This segment is largely consumer driven. The rise of online shopping for even the smallest commodity has not only increased competition among retailers but has also given consumers a tool to share their reviews and feedback.

Feedback that consumers share about a product in the form of emojis, pictures or text is a valuable asset to the business. The sentiment of this feedback gives retailers and distributors an opportunity to understand their customers and their problems better. This allows businesses to assess the market and streamline their overall operations to achieve greater customer satisfaction.

Other important use cases from a retailer’s lens are product cataloging, product recommendations, safe and suggestive search etc.

2. Food Industry: Online food ordering has grown rapidly and has become a preferred choice for many of us. Whether a business works on a B2B model, i.e., provides a platform for food sellers to sell online, or owns a food chain, it deals with digital images.

Some popular use cases in this segment are menu digitisation, sentiment analysis, feedback value chain etc. For menu digitisation, local vendors or big restaurants can provide an image of their menu that acts as a source to onboard them on the digital platform quickly. Another use case emerges when you have a global presence and require multilingual capabilities.

3. Social Platforms: These platforms rely heavily on crowd-sourced content, i.e., they allow people to upload and share images. This is the new normal and the trend for millennials and Gen Z. While these platforms provide features that let people stay socially connected and more aware, they also have to meet many social obligations and regional compliance requirements.

The use case here is to scan images and flag or redact explicit content such as negative national sentiment, violence, pornography, abuse etc.

4. Publishers: Big publishing houses have a wealth of history preserved in boxes. This content resides in silos and is not searchable. Digitising that content can not only help preserve that history but can also help us uncover and understand it better. The same content can be used by researchers and universities, which opens up another source of revenue.

Similar work has been done by the New York Times and is a very interesting case study. Check this link for more information.

5. Manufacturing: Digitisation in this segment is picking up pace and has great potential. With the rise of smart devices, use cases range from finding defects in machines to smart homes and smart cameras, going well beyond autonomous cars.

In addition to the above list, there is a bundle of use cases across industries like automobile, healthcare, security, gaming, government, military, media and entertainment etc. where we need image processing and analytics to derive value. OCR technology has existed for decades and can extract text from an image, but today’s needs are not limited to that. We need enriched metadata associated with the source content to understand the data better and make informed decisions.

Understanding current challenges

Now that we have seen some use cases, let’s try to understand the existing challenges and complexity that AI/ML is trying to solve.

1. Scale and Agility: Doing what we have discussed so far at scale, for millions of requests in near real time, is not humanly possible. Hence there is a direct need for technology that can provide solutions at scale. Just imagine an operations team manually scanning images for a high-scale social platform.

2. Cost: Having a dedicated team that does metadata tagging or scans all the images to remain compliant is a cost-heavy model. Investing in the right technology that can reduce your operational overhead by a meaningful percentage is the smarter choice.

3. Text extraction: Although there are many sophisticated OCR products in the market, complexity increases with

  • Hand written text
  • Images with different fonts
  • Crumpled or historic images
  • Images with segmented columns like we see in newspapers

4. Multilingual: As businesses go global, industries need to provide applications that support multiple languages. At the same time, with a smartphone in everyone’s hand, there is a huge segment that requires support for native languages as well.

5. Metadata extraction: To derive business value out of the image repository we require metadata that includes genre, objects, sentiment, colour, labels, text, faces etc. This contextual data helps industries make the right choices.

How Google Cloud can help

Google has been a pioneer in providing AI/ML services to the world. Forrester Research has named Google Cloud a Leader in The Forrester Wave™: AI Infrastructure, Q4 2021 report authored by Mike Gualtieri and Tracy Woo.

Google Cloud provides a set of AI/ML services that can be stitched together to solve complex use cases related to images. Let’s take a brief look at these services and see how we can use them to build a solution.

Cloud Vision AI from Google Cloud is the core service for image analysis. It helps you derive insights from images using pre-trained machine learning models through REST and RPC APIs. It quickly classifies images into millions of predefined categories (e.g., “fruit”, “dog”, “Eiffel Tower”), detects faces with associated emotions, and recognises printed words in many languages. Other interesting features include logo detection, landmark detection and many more. One can build a valuable metadata library on their image catalog using Cloud Vision AI. It can analyse images uploaded on request or integrate with your image storage on Google Cloud Storage.

You can try this API just by uploading your image over here.
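
As a rough illustration of building such a metadata library, here is a minimal Python sketch that calls label detection through the `google-cloud-vision` client library. The function names `labels_to_metadata` and `describe_image`, and the 0.7 score cut-off, are assumptions made for this example and are not part of the original prototype.

```python
def labels_to_metadata(label_annotations, min_score=0.7):
    """Keep labels scoring at or above min_score as {description: score}.

    min_score is an illustrative cut-off, not a value recommended by the API.
    """
    return {
        label.description: round(label.score, 2)
        for label in label_annotations
        if label.score >= min_score
    }


def describe_image(gcs_uri):
    """Run Cloud Vision label detection on an image stored in Cloud Storage."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import vision  # pip install google-cloud-vision

    image = vision.Image(source=vision.ImageSource(image_uri=gcs_uri))
    response = vision.ImageAnnotatorClient().label_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return labels_to_metadata(response.label_annotations)
```

With credentials configured, `describe_image("gs://YOUR_BUCKET/YOUR_IMAGE.jpg")` would return a dictionary of labels that can be stored alongside the image as searchable metadata.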

Once text is extracted from the image, other text-based AI services like Cloud Translate and Cloud Natural Language AI can be used to add further dimensions.

Cloud Translate uses Google’s neural machine translation technology to instantly detect source language and translate texts into more than one hundred languages. Google has a rich history of providing translation services with industry-leading accuracy.

Refer to this link to check the supported languages.
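
For the menu digitisation use case, text extracted from a menu image could be passed to Cloud Translate roughly as follows. This is a sketch using the basic (v2) client from `google-cloud-translate`; the helper name `translate_lines` and the optional `client` parameter are assumptions added for illustration.

```python
def translate_lines(lines, target_language="en", client=None):
    """Translate each line of text; the API auto-detects the source language.

    Returns a list of (original, detected_language, translation) tuples.
    """
    if client is None:
        # Imported lazily; requires `pip install google-cloud-translate`.
        from google.cloud import translate_v2 as translate

        client = translate.Client()

    # format_="text" asks for plain text rather than HTML-escaped output.
    results = client.translate(
        list(lines), target_language=target_language, format_="text"
    )
    return [
        (r["input"], r["detectedSourceLanguage"], r["translatedText"])
        for r in results
    ]
```

Passing the client in as a parameter is only a convenience for this sketch: it lets the same function be exercised with a stub when no credentials are available.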

Cloud Natural Language AI uses Google machine learning to derive insights from unstructured text including sentiment analysis, entity extraction, content classification, and syntax analysis.

For a quick demo click here
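
To tie this back to the customer feedback use case, the sketch below scores a piece of feedback text with the `google-cloud-language` client library. The sentiment score returned by the API ranges from -1.0 to 1.0; the `sentiment_label` helper and its 0.25 neutral band are assumptions for this example, not thresholds defined by Google.

```python
def sentiment_label(score, neutral_band=0.25):
    """Map a sentiment score in [-1.0, 1.0] to a coarse label.

    neutral_band is an illustrative threshold chosen for this sketch.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def analyze_feedback(text):
    """Return (label, score, magnitude) for a piece of feedback text."""
    # Imported lazily; requires `pip install google-cloud-language`.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    return sentiment_label(sentiment.score), sentiment.score, sentiment.magnitude
```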

Cloud AutoML: Services leveraging AI/ML technologies are built to solve unique and complex problems that are otherwise difficult or cumbersome for humans to handle. Every image and use case is unique, so there can be a need to build custom models that are specific to an industry or use case. To address this, Google Cloud has AutoML, which enables developers with limited machine learning expertise to train high-quality models.

Time to build something

Let’s take a use case where an application allows users to upload profile pictures. These pictures are scanned to identify explicit content. As the application grows in popularity, demand becomes unpredictable and spiky, which creates a need for automation.

To implement this, I have put together a high-level schematic view with a prototype using Google Cloud. It is completely event driven and uses fully managed serverless services. This is just a sample and might not follow all the best practices.

1. As a prerequisite, I am assuming that you have a basic understanding of Google Cloud Platform. Make sure the Cloud Build API and Cloud Vision API are enabled. Now configure your project with gcloud:

$ gcloud config set project YOUR_PROJECT_ID

2. Create Cloud Storage buckets as specified in the diagram above.

You might not get these bucket names, since GCS bucket names are globally unique. In that case, choose different ones and adjust the later steps accordingly.

$ gsutil mb gs://project1-source-image
$ gsutil mb gs://project1-safe-image
$ gsutil mb gs://project1-flagged-image

3. Create the source code files main.py and requirements.txt for the Cloud Function

The SafeSearch Detection feature of Cloud Vision detects explicit content, such as adult or violent content, within an image. It uses five categories (adult, spoof, medical, violence, and racy) and returns the likelihood that each is present in a given image.
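
A minimal sketch of what main.py could look like is shown below. It is not the exact code behind this prototype: the function name `scan_image`, the set of categories checked, and the rule "flag at POSSIBLE or higher" are assumptions made for illustration (the rule is chosen to be consistent with the sample outputs later in this post). The matching requirements.txt would list `google-cloud-vision` and `google-cloud-storage`.

```python
# main.py (sketch): runs when an object is finalized in the source bucket.

SAFE_BUCKET = "project1-safe-image"
FLAGGED_BUCKET = "project1-flagged-image"

# Likelihood names in the order defined by the Vision API enum.
LIKELIHOODS = (
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
)

# Assumption: flag the image when any of these is POSSIBLE or higher.
CHECKED_CATEGORIES = ("adult", "violence", "racy")
FLAG_THRESHOLD = LIKELIHOODS.index("POSSIBLE")


def is_flagged(likelihoods):
    """Return True if any checked category meets the threshold.

    `likelihoods` maps category name to likelihood name,
    e.g. {"adult": "UNLIKELY", "violence": "POSSIBLE"}.
    """
    return any(
        LIKELIHOODS.index(likelihoods.get(category, "UNKNOWN")) >= FLAG_THRESHOLD
        for category in CHECKED_CATEGORIES
    )


def scan_image(event, context):
    """Background Cloud Function entry point for a Cloud Storage trigger."""
    # Imported lazily so is_flagged() can be used without the client libraries.
    from google.cloud import storage, vision

    bucket_name, blob_name = event["bucket"], event["name"]

    image = vision.Image(
        source=vision.ImageSource(image_uri=f"gs://{bucket_name}/{blob_name}")
    )
    response = vision.ImageAnnotatorClient().safe_search_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)

    annotation = response.safe_search_annotation
    likelihoods = {
        category: vision.Likelihood(getattr(annotation, category)).name
        for category in ("adult", "medical", "spoof", "violence", "racy")
    }
    for category, value in likelihoods.items():
        print(f"{category}: {value}")

    # Copy the image to the safe or flagged bucket, then remove the original.
    client = storage.Client()
    source_bucket = client.bucket(bucket_name)
    blob = source_bucket.blob(blob_name)
    target_name = FLAGGED_BUCKET if is_flagged(likelihoods) else SAFE_BUCKET
    source_bucket.copy_blob(blob, client.bucket(target_name), blob_name)
    blob.delete()
```

Keeping the decision rule in a small pure function such as `is_flagged` makes it easy to tune the threshold or the checked categories without touching the API calls.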

4. Deploy the Cloud Function by executing the command below

5. Test the code by uploading an image to the Cloud Storage bucket ‘project1-source-image’

Sample output 1: Image is moved to project1-flagged-image bucket

https://pixabay.com/photos/fist-aggression-abuse-1131143/
adult: VERY_UNLIKELY
medical: UNLIKELY
spoofed: VERY_UNLIKELY
violence: POSSIBLE
racy: POSSIBLE

Sample output 2: Image is moved to project1-safe-image bucket

https://www.whatsappprofiledpimages.com/wp-content/uploads/2021/08/Profile-Photo-Wallpaper.jpg
adult: VERY_UNLIKELY
medical: UNLIKELY
spoofed: VERY_UNLIKELY
violence: VERY_UNLIKELY
racy: VERY_UNLIKELY

6. Troubleshoot: Provide runtime Service Account (Optional)

If you get the following access error, continue with this step; otherwise skip it.

PROJECT_ID@appspot.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket.

Unless you specify a different runtime service account when deploying a function, Cloud Functions uses the App Engine default service account, PROJECT_ID@appspot.gserviceaccount.com, as its identity for function execution.

Note: Make sure you provide the required access to the correct service account.

$ gsutil iam ch serviceAccount:YOUR_PROJECT_ID@appspot.gserviceaccount.com:roles/storage.admin gs://project1-source-image
$ gsutil iam ch serviceAccount:YOUR_PROJECT_ID@appspot.gserviceaccount.com:roles/storage.admin gs://project1-safe-image
$ gsutil iam ch serviceAccount:YOUR_PROJECT_ID@appspot.gserviceaccount.com:roles/storage.admin gs://project1-flagged-image

This is just a brief overview of a few services from the complete portfolio that Google Cloud has. There are potential use cases and problems to be solved around images. If you have any questions about what you have read in this blog or have any interesting use cases that you might want to solve, please feel free to leave a comment below.

PS: Refer to this link for other high-level schematic views of various use cases provided by Google.
