Towards machine-understandable service descriptions at Web scale


Part I: Mobile Apps and Affordances


Affordances

Compared with computers, we humans don’t have much difficulty discovering and using the services we need to get stuff done. Natural language, common sense, and social norms all make it relatively easy to navigate the world around us. For example, I know that I can ask my friend Shawn to play my favorite song by “Dual Core” on Spotify at a party; in computerspeak, my friend offers an affordance.

Shawn spinning

In addition to knowing what can be done, I also know how to ask for it (using a verbal request directed at my friend). What about computers? How can we describe and invoke affordances in a scalable and machine-understandable way? And why would we want to?

If you have done any kind of distributed computing (CORBA, XML-RPC, SOAP, REST, …), you’re probably quite familiar with interface descriptions expressed in a plethora of ways (IDL, XML-Schema, WSDL, JSON-Schema, …). While they’ve served us well over the years, with computing becoming more and more ubiquitous and eventually invisible, we’ve already reached certain limits of these technologies. Sam Goto explains it well in this blog post.

Today, our interactions with service providers are typically mediated by mobile apps. When you need something done, you launch the service provider’s app. Going back to our music use case, if I want my mobile device to play a song by “Dual Core”, I launch the music app of my music service provider, and, using the app-specific UI, I navigate to the music I want the app to play. I happen to know that the Spotify app offers the “listen to music” affordance. Simple enough (for a human).

Now, imagine I want my Android mobile phone to do something for me and I use natural language to request it. If the mobile device understands my accent (a big if) and successfully parses the request, it needs to know which app to invoke, what interface the app exposes, and how to pass the request information to it. In the music example, my phone needs to know:

a) that Spotify can play music,
b) that the app has the artist I am looking for (“Dual Core”), and
c) how the Spotify app can be triggered to play that artist.
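
To make the three requirements concrete, here is a minimal Python sketch of the lookup a device would have to perform. Everything in it is an assumption made for illustration: the `Affordance` record, the `REGISTRY` contents, the app names, and the deep links are invented, and a real system would draw this information from published markup rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Affordance:
    """One thing an app can do for one entity (hypothetical record)."""
    app: str        # which app offers the affordance (requirement a)
    action: str     # what it can do, e.g. "ListenAction"
    artist: str     # which entity it can do it for (requirement b)
    deep_link: str  # how to trigger it (requirement c)


# Invented example data; not taken from any real service provider.
REGISTRY: List[Affordance] = [
    Affordance("ExampleMusic", "ListenAction", "Dual Core",
               "android-app://com.example.music/examplemusic/artist/dual-core"),
    Affordance("ExampleVideo", "WatchAction", "Dual Core",
               "android-app://com.example.video/examplevideo/artist/dual-core"),
]


def resolve(action: str, artist: str,
            registry: List[Affordance] = REGISTRY) -> Optional[str]:
    """Return the deep link of the first app that supports `action` for `artist`."""
    for affordance in registry:
        same_action = affordance.action == action
        same_artist = affordance.artist.lower() == artist.lower()
        if same_action and same_artist:
            return affordance.deep_link
    return None  # no known app offers this affordance


link = resolve("ListenAction", "dual core")
```

The point of the sketch is that the phone never needs app-specific knowledge: as long as every provider describes its affordances with the same fields, a single generic lookup is enough.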

If you think that a system to do all this is still a bit far-fetched, think again. Shortly before Google I/O 2014, without much fanfare, a small search feature launched on Android devices in the US. It implements the steps described above. If you were at Google I/O this year, or watched the recording of “The future of Apps and Search” on YouTube, you know that the feature is powered by Schema.org Actions. What makes it possible is a shared vocabulary, which is supported by several music service providers and Google Search. Service providers publish the affordances their mobile apps support on the Web, using JSON-LD markup syntax similar to the one shown below:

https://gist.github.com/wjarek/5b97349e3fdb7fe09fd8

Note that the supported action (ListenAction), the way to invoke it (a deep link into the Android app) as well as information about the artist (“Dual Core”) are all expressed using JSON-LD markup published on the entity Web page for the music artist. If you find the last sentence rather confusing, you are not alone.
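
To unpack that sentence a little, the Python sketch below builds and reads markup of the same general shape. It is not a copy of the gist: the page URL and the deep link are hypothetical, and only the Schema.org terms `MusicGroup`, `potentialAction`, `ListenAction`, and `target` are taken from the vocabulary itself.

```python
import json


def artist_markup(name: str, page_url: str, deep_link: str) -> dict:
    """Build JSON-LD for an artist entity page that advertises a ListenAction."""
    return {
        "@context": "http://schema.org",
        "@type": "MusicGroup",         # the entity: a music artist
        "@id": page_url,               # the entity Web page (hypothetical URL)
        "name": name,                  # information about the artist
        "potentialAction": {
            "@type": "ListenAction",   # the supported action
            "target": deep_link,       # how to invoke it (hypothetical deep link)
        },
    }


def listen_target(doc: dict):
    """Return the deep link of the ListenAction in `doc`, or None if absent."""
    action = doc.get("potentialAction", {})
    if action.get("@type") == "ListenAction":
        return action.get("target")
    return None


doc = artist_markup(
    "Dual Core",
    "http://music.example.com/artist/dual-core",
    "android-app://com.example.music/examplemusic/artist/dual-core",
)

# On a real page this JSON would sit inside a
# <script type="application/ld+json"> element.
print(json.dumps(doc, indent=2))
```

A consumer that understands the vocabulary only has to call something like `listen_target(doc)` to learn how to start playback, without knowing anything else about the app.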


It is more like baseball rather than soccer

After switching projects late last year, I became interested in Schema.org Actions, and how they can enable machine-understandable service description and discovery at Web scale. From a mobile app developer’s point of view, the work required to publish Schema.org service descriptions (aka potentialActions) on the Web is not very involved. A simple update to one’s Android app, marking up entity pages as shown in the gist above, and some quality time with Webmaster Tools do the trick.

However, in order to understand the technology end-to-end, one must become acquainted with several building blocks which typically are not found in today’s developer’s toolset. But this is exactly what makes it so interesting! If you are one of those people who like to understand what’s under the hood, this blog post is for you.

Schema.org Actions Vocabulary

At first glance, the technology stack which makes a Schema.org Actions implementation possible is rather complex. Structured data, Schema.org, JSON-LD, RDF triples, vocabularies, graph processing, Android intents, hypermedia, … The list is pretty long. When I first started learning all this, the learning curve was frustrating. It was not until I managed to create (JSON-LD) graph visualizations in my head that it all started making sense.

Visualizing JSON-LD (via RDF N-triples)
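
One way to build that mental picture is to flatten a JSON-LD document into subject, predicate, object triples, each of which is one edge in the graph. The Python sketch below is a deliberate simplification and not a conforming JSON-LD processor: it assumes the context is plain Schema.org, treats every non-object value as a string literal, does not escape literals, and ignores lists. The document it processes uses a hypothetical page URL and deep link.

```python
from itertools import count

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def _walk(node: dict, ids, out: list) -> str:
    """Append the triples for `node` to `out` and return the node's subject."""
    # Nodes without an @id become blank nodes: _:b0, _:b1, ...
    subject = f"<{node['@id']}>" if "@id" in node else f"_:b{next(ids)}"
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            out.append((subject, f"<{RDF_TYPE}>", f"<{SCHEMA}{value}>"))
        elif isinstance(value, dict):
            # A nested object is another node; link to it with an edge.
            child = _walk(value, ids, out)
            out.append((subject, f"<{SCHEMA}{key}>", child))
        else:
            out.append((subject, f"<{SCHEMA}{key}>", f'"{value}"'))
    return subject


def to_triples(doc: dict) -> list:
    """Flatten a JSON-LD document into a list of (subject, predicate, object)."""
    out = []
    _walk(doc, count(), out)
    return out


doc = {
    "@context": "http://schema.org",
    "@type": "MusicGroup",
    "@id": "http://music.example.com/artist/dual-core",
    "name": "Dual Core",
    "potentialAction": {
        "@type": "ListenAction",
        "target": "android-app://com.example.music/examplemusic/artist/dual-core",
    },
}

triples = to_triples(doc)
for s, p, o in triples:
    print(s, p, o, ".")  # N-Triples-like output, one edge per line
```

Read this way, the artist page is one node, the action is a second (blank) node, and `potentialAction` is simply the edge that connects them.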

Since Actions technology builds upon structured data, RDF, the Knowledge Graph, and a few other things, developing a deeper understanding of it is very much an exponential, rather than a logarithmic, experience. To paraphrase one of my favorite writers, David Brooks of the New York Times, “it is more like baseball rather than soccer” (a follow-up olive branch to soccer fans who find such comparisons upsetting, by Brooks himself, here).


Crossing the chasm

For Google I/O 2014, together with Shawn Simister, and building on some excellent work by Barak Michener, we decided to create a gentle introduction to the technologies involved. Since the best way to learn something is to build it, we covered the service provider’s side (markup, mobile app) and introduced a small Knowledge Graph implementation running in an open source graph database called Cayley, to illustrate with open source examples how the service descriptions can be used in practice.

Reference Architecture for the “Build a Small Knowledge Graph” series

The 3-part series launched at Google I/O, and it includes a companion codelab. For best results, watch the videos linearly. The total running time is just short of 30 minutes, but the content is rather dense, so stop, rewind, and research at will.

In the first video “Build a Small Knowledge Graph Part 1 of 3: Creating and Processing Linked Data”, we introduce you to the reference architecture for support of Schema.org Actions in the context of a specific use case (a music service). The video then focuses on exposing entities using Schema.org markup with JSON-LD.

https://www.youtube.com/watch?v=W9pRpSW_KqA

The second video “Build a Small Knowledge Graph Part 2 of 3: Managing Graph Data With Cayley” introduces data loading and graph processing using Cayley, an open source graph database written in Go. Together, the first and second videos cover ETL for linked data.

https://www.youtube.com/watch?v=0oOwrBEeQss

The third video “Build a Small Knowledge Graph Part 3 of 3: Activating Graph Data With Actions” details activating the small Knowledge Graph stored in Cayley with Schema.org Actions.

https://www.youtube.com/watch?v=KB94dIamAQc

What’s next?

Once you digest the material and play around with the code, a good big-picture talk from Google I/O 2014 is “The future of Apps and Search” by Lawrence Chang and Jason Douglas. The session goes into more detail on another foundational technology involved: App Indexing, which, combined with Schema.org Actions, comprises what the presenters refer to as App Actions.

https://www.youtube.com/watch?v=O-bfVfxol1E

Standing on the shoulders of giants

If you have read this far, congratulations! The technology involved in Schema.org Actions is quite fascinating and it stands on the shoulders of giants. Even if it is only a small step forward on the long road towards enabling machines to do useful stuff for us, it is worth learning about.