Matthew Gilliard
Apr 11 · 6 min read
A group of people sitting around a table using computers. Azure Computer Vision has detected people and various objects in the image and highlighted them

AI services like Computer Vision are getting easier and easier to play with, and we can have some fun by making them available to use from our cellphones. In this post, we will use Java to connect the Twilio API for WhatsApp with Azure’s Computer Vision APIs to create a bot that can describe photos. It would be neat to use this for generating alt-text to help make your images more accessible online, for example.

We will need the following to get started with this post:

- the Java 8 JDK and Apache Maven installed
- a Twilio account, for the Twilio Sandbox for WhatsApp
- a free Azure account, for the Computer Vision API
- ngrok, to expose our local server to the internet
- WhatsApp on your phone, for testing

How it works

When Twilio receives a WhatsApp message it will send an HTTP request to a URL we provide.

Our mission is to create an app in Java which can handle those requests. The app will take the URL of any photo in the WhatsApp message and pass it to the Azure Computer Vision API which will generate a description of whatever is in the picture. The app will then grab Azure’s caption and use it as a reply to the original message on WhatsApp.
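Before wiring up any HTTP, the whole round trip can be sketched as one pure function. The names here are illustrative (`buildReply` is not part of Twilio's API); the real handler, with the emoji and the Azure call, comes later in the post:

```java
// Sketch of the request/response flow as a pure function.
// Twilio POSTs form parameters such as MediaUrl0 (the URL of the first
// attached image); we reply with a TwiML <Response> document.
public class ReplySketch {

    // Hypothetical helper: decide which TwiML to send back given the
    // media URL from Twilio and the caption from Azure (either may be null).
    static String buildReply(String mediaUrl, String caption) {
        if (mediaUrl == null) {
            return "<Response><Message>I can't help if you don't send an image</Message></Response>";
        }
        if (caption == null) {
            return "<Response><Message>Sorry, I couldn't describe that</Message></Response>";
        }
        return "<Response><Message>It's " + caption + "</Message></Response>";
    }
}
```

Everything else in the post is plumbing to get the two inputs to this decision: the webhook delivers `mediaUrl`, and Azure delivers `caption`.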

Are You Ready?

Create a new project

If you would like to check out the completed code, it can be found on my GitHub repo. Or you can follow along with the post where we will be building a fresh app using Maven.

mvn archetype:generate \
    -DarchetypeGroupId=pl.org.miki \
    -DarchetypeArtifactId=java8-quickstart-archetype \
    -DarchetypeVersion=1.0.0 \
    -DtestLibrary=none

This command will prompt for a groupId and artifactId. If you’re not sure what those are then check the Maven naming guide - I used lol.gilliard and twilio-whatsapp-azure. We can accept the defaults for version and package. The project will be created in a subdirectory with the same name as the artifactId. Open the project in your favourite IDE.

Create an HTTP server to listen for Twilio webhooks

SparkJava is a microframework for creating web applications in Java — it doesn’t need much code to get started so it’s perfect for this project.

Add SparkJava to the <dependencies> section of pom.xml

<dependency>
    <groupId>com.sparkjava</groupId>
    <artifactId>spark-core</artifactId>
    <version>2.7.2</version>
</dependency>

Create an App.java file in src/main/java and add a main method that configures SparkJava to respond to HTTP requests:

import static spark.Spark.*;

public class App {
    public static void main(String[] args) {
        get("/", (req, res) -> "Hello \uD83D\uDC4B");
    }
}

Run the project from the IDE and browse to http://localhost:4567. You will see the following in the browser:

screenshot of Hello from SparkJava

Now that the app is up and running, add the endpoint which will respond to Twilio’s webhooks. Add these handy error messages to the top of your class:

private static final String NO_IMAGE_MESSAGE =
    "<Response><Message>" +
    "I can't help if you don't send an image \uD83D\uDE09" +
    "</Message></Response>";

private static final String NO_DESCRIPTION_MESSAGE =
    "<Response><Message>" +
    "Sorry, I couldn't describe that \uD83D\uDE23" +
    "</Message></Response>";

Then put the following code inside the main method, after the get call we wrote previously:

post("/msg", (req, res) -> {
    String mediaUrl = req.queryParams("MediaUrl0");
    if (mediaUrl == null) return NO_IMAGE_MESSAGE;

    String description = getAzureCVDescription(mediaUrl);
    if (description == null) return NO_DESCRIPTION_MESSAGE;

    // Return TwiML to send the description back to WhatsApp
    return "<Response><Message>It’s " +
           description +
           "</Message></Response>";
});

The XML returned here is Twilio Markup Language (TwiML). For something this small TwiML can be written by hand, but there is also a comprehensive Java helper library for Twilio which can generate TwiML.
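One caveat when hand-rolling TwiML: the caption comes back from Azure as free text, so characters like `&` or `<` would make the XML invalid. A minimal escaping helper (my own addition, not part of the Twilio library) keeps the response well-formed:

```java
public class Twiml {

    // Escape the five XML special characters so free text (such as an
    // image caption) can be embedded safely in a hand-written TwiML body.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }
}
```

The Twilio helper library handles this escaping for you, which is one good reason to use it once your TwiML grows beyond a one-liner.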

The IDE will show an error as we haven’t written the getAzureCVDescription method yet.

Call the Azure Computer Vision API

To use Azure’s APIs we will need a free Azure account. Note that signing up requires a credit card, but everything in this tutorial is covered by Azure’s free trial.

Microsoft has written a great quickstart for calling the Azure Computer Vision API from Java. We can use the code they provide with a couple of modifications to be able to call it from our own main method, and to extract the image caption from the response.

Add the following Maven dependencies next to spark-core in the pom.xml:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.6</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpcore</artifactId>
    <version>4.4.10</version>
</dependency>
<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20180813</version>
</dependency>

Add these imports to App.java:

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;
import org.json.JSONObject;
import java.net.URI;

Also add the getAzureCVDescription method body, underneath the main method. Don’t forget to add the Computer Vision API subscription key. There are Azure docs on getting a subscription key:

private static String getAzureCVDescription(String mediaUrl) {

    // Replace <Subscription Key> with your valid subscription key.
    // SECURITY WARNING: Do NOT commit this to GitHub
    // or put it anywhere public
    String subscriptionKey = "<Subscription Key>";

    String uriBase =
        "https://westcentralus.api.cognitive.microsoft.com/" +
        "vision/v2.0/analyze";

    CloseableHttpClient httpClient =
        HttpClientBuilder.create().build();

    try {
        URIBuilder builder = new URIBuilder(uriBase);

        // Request parameters. All of them are optional.
        builder.setParameter("visualFeatures", "Description");
        builder.setParameter("language", "en");

        // Prepare the URI for the REST API method.
        URI uri = builder.build();
        HttpPost request = new HttpPost(uri);

        // Request headers.
        request.setHeader("Content-Type", "application/json");
        request.setHeader("Ocp-Apim-Subscription-Key",
            subscriptionKey);

        // Request body.
        StringEntity requestEntity =
            new StringEntity("{\"url\":\"" + mediaUrl + "\"}");
        request.setEntity(requestEntity);

        // Call the REST API method and get the response entity.
        HttpResponse response = httpClient.execute(request);
        HttpEntity entity = response.getEntity();

        if (entity != null) {
            // Parse the JSON response.
            String jsonString = EntityUtils.toString(entity);
            JSONObject json = new JSONObject(jsonString);

            // This extracts the caption from the
            // JSON returned by Azure CV
            return json
                .getJSONObject("description")
                .getJSONArray("captions")
                .getJSONObject(0)
                .getString("text");
        }
    } catch (Exception e) {
        // Display error message.
        System.out.println(e.getMessage());
    }
    return null;
}

App.java should now look like the final code on GitHub. With this new code added, restart the app in your IDE.

Expose your SparkJava server via ngrok

Ngrok is a great tool for helping to develop webhooks. It provides a temporary internet-accessible URL for your development environment. Install it and run ngrok http 4567 to forward public traffic to SparkJava’s default port, 4567.

Ngrok will start, and part of the output will show the hostname that ngrok has created for our web server:

ngrok output

Set up Twilio Sandbox for WhatsApp

Once logged into your Twilio account, visit the Twilio Sandbox for WhatsApp to add your phone number to the sandbox. My sandbox code is salmon-finally but yours will be different:

screenshot of how to join a WhatsApp sandbox

Configure the sandbox with the ngrok URL and a path of /msg, to be called when a message comes in:
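The value pasted into the sandbox’s “when a message comes in” field is the ngrok hostname plus the /msg path, for example (the subdomain here is made up — ngrok generates a random one each run):

```
https://a1b2c3d4.ngrok.io/msg
```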

screenshot of WhatsApp sandbox configuration

Save the settings, and we are ready to go.

Play with your IRL Alt-text generator

Everything is up and running. Twilio Sandbox for WhatsApp is configured to call a webhook when it gets a message. The Java app will receive HTTP requests from Twilio, will extract and forward the MediaUrl to the Azure Computer Vision API and will respond with TwiML that sends back a description of the photo in your message. Nice!

Try it out:

A conversation with the WhatApp bot

What Next?

There is so much you can make…

I can’t wait to see what you build — let me know about it by email or on Twitter:

mgilliard@twilio.com
@MaximumGilliard

Matthew is a Developer Evangelist for Twilio. He is based in the UK and serves the Java community worldwide. The image at the top of this post is a modified version of an image from WOCINTECH stock photos


Originally published at www.twilio.com.

Microsoft Azure

Any language. Any platform. Our team is focused on making the world more amazing for developers and IT operations communities with the best that Microsoft Azure can provide. If you want to contribute in this journey with us, contact us at medium@microsoft.com
