AI Obsoletes Open Standards

I came to this realization tonight while thinking about Caavo, the set-top box that claims to unite all the others. The thing apparently achieves the impossible. Are you ready? Unplug all of your cables from your TV, plug them into this box, and plug this box into your TV. It presents a consistent, unified visual interface and can confidently control all of the boxes you own to navigate to the content you select. How do they do it? Using modern computer vision techniques, they analyze the output from each set-top box, and a proprietary agent learns what commands to send and how to send them.

They’ve been teaching the agent to control a wide range of different set-top boxes. Whether or not it can already, with enough data this agent should be able to generalize to new (even heretofore unseen!) boxes. Unlike the comparatively open world of modern mobile development, or even game development, the set-top box world has a strong track record of closedness and a complete lack of interoperability. Many of these boxes have no command APIs at all, and the ones that do often ship them riddled with implementation bugs. By teaching an AI to use the same interface that a human would, they’ve done an end run around the need for any kind of open API.
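To make the idea concrete, here is a toy sketch in Python. It is not Caavo’s system (their agent is proprietary and works on real video output); `ToyBox`, `learn_keys`, `play`, and the key codes are all made up for illustration, and reading a text "screen" stands in for the computer vision step. The agent never sees the box’s key mapping. It presses each code, watches how the screen changes, and labels the code accordingly. The sketch assumes the box starts on its menu, has at least three rows, and has some key that exits playback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyBox:
    """Hypothetical set-top box: no command API, just a screen and opaque key codes."""
    items: list
    keymap: dict                      # code -> action; hidden from the agent
    cursor: int = 0
    playing: Optional[str] = None

    def screen(self) -> str:
        """Render what a viewer (or a vision model) would see."""
        if self.playing is not None:
            return f"NOW PLAYING: {self.playing}"
        return "\n".join(
            ("> " if i == self.cursor else "  ") + item
            for i, item in enumerate(self.items)
        )

    def press(self, code: str) -> None:
        action = self.keymap.get(code)
        if self.playing is not None:  # during playback only "back" does anything
            if action == "back":
                self.playing = None
            return
        if action == "down":
            self.cursor = (self.cursor + 1) % len(self.items)
        elif action == "up":
            self.cursor = (self.cursor - 1) % len(self.items)
        elif action == "select":
            self.playing = self.items[self.cursor]


def highlighted(screen: str) -> Optional[int]:
    """Stand-in for the vision step: index of the highlighted row, or None if no menu."""
    for row, line in enumerate(screen.splitlines()):
        if line.startswith("> "):
            return row
    return None


def learn_keys(box: ToyBox, codes: list) -> dict:
    """Press each code once and label it by how the screen changed."""
    learned = {}
    rows = len(box.screen().splitlines())
    for code in codes:
        before = highlighted(box.screen())
        if before is None:            # stuck outside the menu; give up probing
            break
        box.press(code)
        after = highlighted(box.screen())
        if after is None:             # menu vanished, so playback started
            learned["select"] = code
            for undo in codes:        # find whichever code brings the menu back
                box.press(undo)
                if highlighted(box.screen()) is not None:
                    learned["back"] = undo
                    break
        elif after == (before + 1) % rows:
            learned["down"] = code
        elif after == (before - 1) % rows:
            learned["up"] = code
    return learned


def play(box: ToyBox, keys: dict, title: str) -> None:
    """Move the highlight to `title` using only the screen, then select it."""
    for _ in range(len(box.screen().splitlines())):
        lines = box.screen().splitlines()
        if lines[highlighted(box.screen())][2:] == title:
            box.press(keys["select"])
            return
        box.press(keys["down"])
    raise LookupError(title)


menu = ["Live TV", "Movies", "Settings"]
cable = ToyBox(menu, {"0x10": "up", "0x11": "down", "0x12": "select", "0x13": "back"})
stick = ToyBox(menu, {"A": "select", "B": "back", "C": "down", "D": "up"})
for box in (cable, stick):
    keys = learn_keys(box, list(box.keymap))  # the agent gets the codes, not their meanings
    play(box, keys, "Movies")
    print(box.screen())
```

The same two functions drive both boxes even though their remotes share no codes, which is the whole point: the agent depends on what the screen does, not on any documented interface. A real system would have to cope with video frames, latency, and far messier menus than this.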

What’s cool is that this concept generalizes! Your phone’s operating system can be emulated in the cloud, and agents can be trained to send taps or fake camera shots. Can’t decide which app’s story to send your latest photo to? Someone can make one story app to rule them all.

In fact, this could even be a case of the innovator’s dilemma for bigcos like Apple, Google, and Amazon. They’re all rolling out APIs and systems that companies have to integrate with one by one for Alexa and Siri, hoping to unify all of our internet services behind their voice AIs. A small startup could never hope to compete on those grounds. Indeed, even larger companies can’t hope to compete without some control at the platform level. But by using AI in this new fashion, a startup can actually beat Apple’s or Google’s approach, because it can generalize.

I’m busy building VR things, but someone, please go out and do this stuff!