Ok Google, Charge $2 for Coffee

When I received my Google Home, I immediately felt the urge to build something with it. After a good night’s sleep, I had an idea: Baristas and business owners usually have their hands full behind the counter. What if they could complete transactions without tapping buttons at their point of sale? What if a merchant could accept Apple Pay transactions using only their voice?

In the video, we’re using Google Home to activate the Square Contactless Reader and take a real Apple Pay transaction backed by a Square Cash virtual card.

This was built in a few hours using only public APIs. Let’s look at how it works!

Turning voice into coffee

Here’s what seems to be happening:

While baristas make it look simple, pulling an espresso depends on many moving parts. Similarly, the diagram above is a big oversimplification: many more components were involved in enabling this seemingly simple transaction.

Roasting the Java beans

Let’s start with the connection to the Square Contactless Reader. Square does not currently offer APIs to connect directly to it. However, our Point-of-Sale app exposes an API that lets other apps move the Point-of-Sale app to the foreground to process in-person payments.

That means we can build a custom merchant app that calls the Point-of-Sale API. Once in the foreground, the Point-of-Sale app activates the NFC chip in the reader over Bluetooth:

Let’s create a new Android app, Home Charge, that triggers the Point-of-Sale API! According to the documentation, we just need to register our new app, then add a dependency and a few lines of code:

public class ChargeActivity extends Activity {
  // ...
  public void startTransaction(int dollarAmount, String note) {
    // The API takes the amount in cents, hence the multiplication by 1_00.
    ChargeRequest request =
        new ChargeRequest.Builder(dollarAmount * 1_00, USD)
            .note(note)
            .autoReturn(3_200, MILLISECONDS)
            .build();
    Intent intent = posClient.createChargeIntent(request);
    startActivityForResult(intent, CHARGE_REQUEST_CODE);
  }
}
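The `dollarAmount * 1_00` expression converts whole dollars to cents. As a minimal sketch, that conversion could be pulled into a small helper that rejects non-positive amounts and fails loudly on overflow. The `Cents` class and its method name are hypothetical and not part of the Home Charge sources:

```java
/** Hypothetical helper (not in the original app): converts whole dollars to cents. */
final class Cents {
  private Cents() {}

  static int fromDollars(int dollarAmount) {
    if (dollarAmount <= 0) {
      throw new IllegalArgumentException("Amount must be positive: " + dollarAmount);
    }
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(dollarAmount, 100);
  }

  public static void main(String[] args) {
    System.out.println(fromDollars(2)); // prints 200
  }
}
```

With a helper like this, the builder call would take `Cents.fromDollars(dollarAmount)` instead of the inline multiplication.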

Pushing the tamper down

Calling startTransaction() in Home Charge moves the Point-of-Sale app to the foreground and activates the Contactless Reader. We want to trigger this without any touch input, so we need a server that sends a message to our app, which then calls startTransaction().

Let’s use Firebase Cloud Messaging to push server messages to our app. The setup is straightforward, and we just need to add a few lines of custom code:

public class ChargeService extends FirebaseMessagingService {

  final Handler handler = new Handler(Looper.getMainLooper());

  @Override public void onMessageReceived(RemoteMessage remoteMessage) {
    Map<String, String> data = remoteMessage.getData();
    final int dollarAmount = Integer.parseInt(data.get("amount"));
    final String note = data.get("note");

    // Hop to the main thread before touching the activity.
    handler.post(new Runnable() {
      @Override public void run() {
        startTransaction(dollarAmount, note);
      }
    });
  }

  private void startTransaction(int dollarAmount, String note) {
    App app = (App) getApplicationContext();
    ChargeActivity chargeActivity = app.getResumedChargeActivity();
    // Only charge when the charge screen is currently in the foreground.
    if (chargeActivity != null) {
      chargeActivity.startTransaction(dollarAmount, note);
    }
  }
}
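One thing to keep in mind: Integer.parseInt() throws if the "amount" entry is missing or not a number. Here is a rough sketch of a more defensive parsing step, in plain Java so it can run outside of Android. The `ChargeCommand` class is hypothetical and is not part of the Home Charge sources. It assumes the same "amount" and "note" keys as the service above:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical value class for a parsed charge message; not in the original app. */
final class ChargeCommand {
  final int dollarAmount;
  final String note;

  private ChargeCommand(int dollarAmount, String note) {
    this.dollarAmount = dollarAmount;
    this.note = note;
  }

  /** Returns the parsed command, or null if the payload is missing or malformed. */
  static ChargeCommand parse(Map<String, String> data) {
    String rawAmount = data.get("amount");
    if (rawAmount == null) {
      return null;
    }
    int amount;
    try {
      amount = Integer.parseInt(rawAmount.trim());
    } catch (NumberFormatException e) {
      return null;
    }
    if (amount <= 0) {
      return null;
    }
    String note = data.get("note");
    // A missing note is harmless, so fall back to an empty string.
    return new ChargeCommand(amount, note == null ? "" : note);
  }

  public static void main(String[] args) {
    Map<String, String> data = new HashMap<>();
    data.put("amount", "2");
    data.put("note", "coffee");
    ChargeCommand command = parse(data);
    System.out.println(command.dollarAmount + " for " + command.note); // prints "2 for coffee"
  }
}
```

In onMessageReceived(), a null result would simply mean ignoring the message instead of crashing the service.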

We can retrieve the registration token with FirebaseInstanceId.getInstance().getToken() and manually test that the app works with curl:

curl \
  --header "Authorization: key=API_KEY" \
  --header "Content-Type: application/json" \
  https://fcm.googleapis.com/fcm/send \
  -d \
  "{ \
    \"to\": \"REGISTRATION_TOKEN\", \
    \"data\": { \
      \"amount\": \"2\", \
      \"note\": \"coffee\" \
    } \
  }"

It works!

Receipt printing works as well!

Now let’s focus on Google Home.

If Tired Then Triple (shot)

Google Home provides APIs to have a two-way dialog with users. There’s also a much simpler solution: IFTTT (if this, then that) supports Google Assistant triggers. It can match any text pattern with a number and a text parameter. This is perfect for us, as we want to parse a simple sentence: “Charge {amount} dollars for {note}”.
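IFTTT does this matching for us, so none of the following is needed in the project. Still, to make the pattern concrete, here is a small sketch of the same extraction done locally with a regular expression. The `ChargePhrase` class is hypothetical, and the regex is only my approximation of what “Charge {amount} dollars for {note}” accepts, not a description of how IFTTT works internally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the "Charge {amount} dollars for {note}" pattern. */
final class ChargePhrase {
  // Group 1 is the number parameter, group 2 is the text parameter.
  private static final Pattern PATTERN = Pattern.compile(
      "^charge\\s+(\\d+)\\s+dollars?\\s+for\\s+(.+)$", Pattern.CASE_INSENSITIVE);

  private ChargePhrase() {}

  /** Returns {amount, note}, or null when the sentence does not match. */
  static String[] parse(String sentence) {
    Matcher matcher = PATTERN.matcher(sentence.trim());
    if (!matcher.matches()) {
      return null;
    }
    return new String[] {matcher.group(1), matcher.group(2).trim()};
  }

  public static void main(String[] args) {
    String[] parts = parse("Charge 2 dollars for coffee");
    System.out.println(parts[0] + " / " + parts[1]); // prints "2 / coffee"
  }
}
```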

We also need to define an Action in response to that Trigger. IFTTT has a Maker channel that provides a Web Request Action. Unfortunately, we cannot use it to directly call the Firebase Cloud Messaging servers, because it does not support authorization headers.

Channeling the coffee stream

Let’s create an endpoint that will receive IFTTT HTTP requests and proxy those to Firebase Cloud Messaging. One quick way to do this is to set up an App Engine project and write a good old Java servlet:

public class MessageServlet extends HttpServlet {
  @Override public void doGet(
      HttpServletRequest request,
      HttpServletResponse response) throws IOException {
    Map<String, String> queryParams = parseQueryParams(request);
    String amount = queryParams.get("amount");
    String note = queryParams.get("note");
    String apiKey = queryParams.get("api-key");
    String token = queryParams.get("registration-token");

    // Same payload shape as the curl test: a data map plus the target token.
    String json = String.format(
        "{\"data\":{\"amount\":\"%s\",\"note\":\"%s\"},"
            + "\"to\":\"%s\"}",
        amount, note, token);

    URL url = new URL("https://fcm.googleapis.com/fcm/send");
    HttpURLConnection conn =
        (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setRequestProperty("Authorization", "key=" + apiKey);

    OutputStreamWriter writer =
        new OutputStreamWriter(conn.getOutputStream());
    writer.write(json);
    writer.close();

    // Reading the response code forces the request to be sent.
    conn.getResponseCode();

    response.getWriter().println("Over Extracted?");
  }
}
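Building JSON with String.format() is fine for a demo, but a note containing a double quote would produce an invalid payload. As a hedged sketch, here is one way to escape the values before formatting them, using only the JDK. The `FcmPayload` class is hypothetical, and a real project would more likely reach for a JSON library:

```java
/** Hypothetical helper (not in the original servlet): builds the FCM JSON body safely. */
final class FcmPayload {
  private FcmPayload() {}

  /** Escapes the characters that are not allowed unescaped inside a JSON string. */
  static String escapeJson(String value) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '"': out.append("\\\""); break;
        case '\\': out.append("\\\\"); break;
        case '\n': out.append("\\n"); break;
        case '\r': out.append("\\r"); break;
        case '\t': out.append("\\t"); break;
        default:
          if (c < 0x20) {
            // Remaining control characters use the four-digit hex escape form.
            out.append(String.format("\\u%04x", (int) c));
          } else {
            out.append(c);
          }
      }
    }
    return out.toString();
  }

  /** Builds a body with a data map (amount, note) and the target registration token. */
  static String build(String amount, String note, String token) {
    return String.format(
        "{\"data\":{\"amount\":\"%s\",\"note\":\"%s\"},\"to\":\"%s\"}",
        escapeJson(amount), escapeJson(note), escapeJson(token));
  }

  public static void main(String[] args) {
    System.out.println(build("2", "coffee", "REGISTRATION_TOKEN"));
  }
}
```

The servlet could then write `FcmPayload.build(amount, note, token)` to the connection instead of formatting the string inline.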

Then we just need to define the corresponding IFTTT Web Request Action:
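For reference, the Web Request Action boils down to a GET on our servlet with the four query parameters it reads. The sketch below only shows the shape of that URL. The host name is a placeholder I made up, and the `ActionUrl` class is hypothetical. In the real Action, IFTTT fills in the amount and note from the spoken sentence:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical illustration of the URL the Web Request Action ends up calling. */
final class ActionUrl {
  // Placeholder endpoint (an assumption); use wherever the servlet is deployed.
  private static final String ENDPOINT = "https://example.appspot.com/message";

  private ActionUrl() {}

  static String build(String amount, String note, String apiKey, String token) {
    return ENDPOINT
        + "?amount=" + encode(amount)
        + "&note=" + encode(note)
        + "&api-key=" + encode(apiKey)
        + "&registration-token=" + encode(token);
  }

  private static String encode(String value) {
    try {
      // Form encoding: spaces become '+', reserved characters are percent-encoded.
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build("2", "iced coffee", "API_KEY", "REGISTRATION_TOKEN"));
  }
}
```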

The Full Picture

Now that we know the steps needed to turn voice into coffee, we can update our initial sequence diagram to include all the interactions taking place:

I hope you enjoyed this blog post! The sources for this project are available on GitHub. What cool things are you going to build on top of our public APIs?