Left: A human Jarvis goes above and beyond butler duties in assisting Agent Carter of Marvel fame. Right: 50 years later, a fully automated Jarvis assists Iron Man in much the same way, now as an integrated AI from within his helmet.

Finding Jarvis: An Insider’s Look at the Future of Digital Messaging Agents

Ivan Lee
Published in The Startup
10 min read · Mar 29, 2016


In October 2015 I joined a startup named GoButler as VP of Product to pursue the vision of a messaging-based, human-assisted, on-demand virtual assistant for everything. Six months later, the company announced it was pivoting to a fully automated travel assistant. This was no longer the mission I had signed on for, so I left the company. I remain very excited about messaging-based virtual assistants, and look forward to the innovation 2016 will bring. In the meantime, I’d like to share an insider’s experience and some of the obstacles we encountered along the way. I hope this can save other startups time and spark a discussion about the potential and pitfalls in this space.

This will be a longer post. I’ll start by explaining the opportunity I identified, then cover some challenges that have yet to be resolved, and end with predictions about the coming year.

The Power and Limitation of Search

Prior to GoButler I had the opportunity to work on and launch a new take on mobile search at Yahoo. During my time on that project, I had a close-up view of how users interact with the internet and the expectations we have come to associate with search engines and algorithmic intelligence. Over the last 15 years we’ve been taught a shorthand notation that has been at times wryly referred to as “Google-fu”. Let’s say I want to research a potential family trip to Miami. As a reasonably search-savvy user I might type in [miami top sights to see]. I’m returned a slew of results vying for my attention — reputable sites like Time Out and Fodors advertise attractions alongside Google’s own list of locations they’ve learned to be popular. But I had pre-conceived expectations of beaches and water activities — this is a tropical getaway, after all. I now have to reformulate my query — maybe I’ll try [miami beaches]. Hmmm… good list but that doesn’t really help me decide where to go. How about [top 3 beaches within 30 minutes of Miami]? The thread’s going cold. There are tasks at which search engines excel. But the nature of a single query box, with limited ability to refine as we would like in a natural conversation, is a situation we have come to accept.

Singular queries have come a long way, but ultimately don’t capture the conversational context inherent in a single search session.

One aspect I was excited to explore at GoButler was the conversational angle. As many in the industry have published in detail, messaging has quickly become the app genre where consumers spend the most time. Silicon Valley is sitting up and taking note of the meteoric rise of WeChat and its seat at the center of all user habits. Dozens of startups sit straight-faced in front of investors and promise to become the “WeChat of the West”. And for good reason. Our messaging apps have quickly become the digital embodiment and center of our relationships. Why shouldn’t that definition naturally expand to capture our (significantly more lucrative) relationships with businesses? E-commerce, local/restaurant guides, travel agents, news companies, and even legal companies are all scrambling to be the first to invent the conversational successors to eBay, Yelp, Expedia, Flipboard, and… lawyers. Chat bots, automated and otherwise, are launching an assault on traditionally search-oriented tasks. Unlike a search box, a chat bot handling a research task doesn’t have to play a guessing game as to the user’s intent and preferences. It can just ask. The dream:

Miami? Sure thing — where will you be staying?
Are you interested in water sports or tanning?
Will you have a car?
Perfect — I suggest these three beaches. Here are photos from Instagram and reviews from TripAdvisor to help you decide.
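The "it can just ask" flow above can be sketched as a simple slot-filling loop: the bot keeps asking until it has everything it needs, then recommends. This is only an illustrative sketch, not how GoButler or any other product worked; the slot names (`lodging_area`, `activity`, `has_car`) and the `Conversation` class are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical slots for the beach-trip example; the questions come from the
# dialogue above, the slot names are made up for illustration.
QUESTIONS = {
    "lodging_area": "Where will you be staying?",
    "activity": "Are you interested in water sports or tanning?",
    "has_car": "Will you have a car?",
}


@dataclass
class Conversation:
    slots: dict = field(default_factory=dict)

    def next_question(self):
        """Return (slot, question) for the first unfilled slot, or None."""
        for slot, question in QUESTIONS.items():
            if slot not in self.slots:
                return slot, question
        return None

    def answer(self, slot, value):
        """Record the user's answer for a slot."""
        self.slots[slot] = value


def run_dialogue(answers):
    """Drive the conversation with scripted answers.

    Returns the questions asked (in order) and the filled slots.
    """
    convo = Conversation()
    asked = []
    while (pending := convo.next_question()) is not None:
        slot, question = pending
        asked.append(question)
        convo.answer(slot, answers[slot])
    return asked, convo.slots
```

Once `next_question()` returns `None`, the bot has the context a single search query never captures and can move on to suggesting beaches.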

AI, AI, AI It’s Christmas

I also had the good fortune to work with Yahoo Labs, launching a machine-learned entity-detection system. The easiest way to describe what we worked on is with an example. If you type in the term [gravity] in a search box, are you a middle schooler looking for help with a homework assignment? Are you looking for the Oscar-winning movie starring Sandra Bullock? Or hours of operation for the Gravity Bar down the street? We can use a variety of aggregate and individual variables to determine what you most likely mean.
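As a rough illustration of the [gravity] example, here is a toy scorer that combines an aggregate prior for each meaning with boosts from individual context signals. It is a sketch under loud assumptions: the candidate names, signals, and every number are invented, and a real machine-learned system would learn these weights from data rather than hard-code them. It is not a description of the Yahoo system.

```python
# Hypothetical candidate meanings for the query [gravity], each with a made-up
# prior (how often that meaning is intended across all users).
CANDIDATES = {
    "physics_concept": 0.5,
    "movie": 0.3,
    "local_bar": 0.2,
}

# Hypothetical context signals and the multiplier each applies to a candidate.
SIGNAL_BOOSTS = {
    "weekday_afternoon": {"physics_concept": 2.0},
    "awards_season": {"movie": 3.0},
    "near_gravity_bar": {"local_bar": 5.0},
}


def rank_entities(signals):
    """Multiply each prior by the boosts of the active signals, then normalize.

    Returns (entity, probability) pairs sorted from most to least likely.
    """
    scores = dict(CANDIDATES)
    for signal in signals:
        for entity, boost in SIGNAL_BOOSTS.get(signal, {}).items():
            scores[entity] *= boost
    total = sum(scores.values())
    return sorted(
        ((entity, score / total) for entity, score in scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

With no signals the homework reading wins on its prior alone; a user standing down the street from the bar flips the ranking.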

Machine learning has been improving in leaps and bounds in the last few years, with many heralding a new Golden Age of AI. We are on the cusp of self-driving cars, robots standing up to human bullies, and we just cleared a monumental milestone in defeating the reigning human Go champion. There’s seemingly no end to the sweeping impact machine learning, neural networks, and (un)parallel(ed) computing power will have on our lives.

So when we talk about virtual assistants there’s understandably a significant amount of excitement around the ability to build a real-life, fully-automated Jarvis, Jeeves, or Alfred. It’s my personal opinion we are still some dedicated, well-funded years from such an accomplishment, but I am nevertheless excited about the potential for AI to assist and optimize the workflow of human operators.

Have you heard of Oz/Ginny/Jenny/Blitzen.ai?

These are (as of now) names I’ve entirely made up. But the point is, they may well be incorporated by the end of the year. There are a lot of apps, each with their own human names and personalities. Yet I don’t want to download, sign up for, and give my payment credentials to different apps for on-the-go alcohol, organic diamonds, or second-hand luxury bags. On-demand app fatigue is real. Per the original pitch points of Magic and Operator, I’m excited to see a consolidation of service apps to a single starting point. The on-demand service landscape is as disorganized as the web in 1998. A one-stop shop startup (like GoButler pre-pivot) could begin with a directory (Yahoo c. 1996) and eventually rank and automate service execution across the best providers (Google c. 2001). From a user perspective, all you would have to know is that they are constantly updating the list and filtering out everything but the best. If you want Pad Thai from Basil Spoon, it doesn’t matter if they’re on DoorDash, Seamless, or only take phone calls — they’ll handle it for you. If you want flowers delivered to your significant other in Portland, Oregon, they know what service has the best quality and can reliably deliver in your time frame. You focus on driving — they’ve got this covered.

Credits to Sarah Guo, and her insightful take on virtual agents.

Peeking beneath the hype

For every hot up-and-coming trend, there is an equal and corresponding backlash. Just as many hopefuls clamored for more messaging apps and bots over the last two years, critics have been piling on in recent months, as no clear winner has emerged despite the hundreds of millions of dollars poured into the space. Below are some of the challenges I believe need to be overcome before this catches on with the mainstream.

Mobile UI challenges 2.0

In the last decade we’ve witnessed websites struggle to adapt to the smaller screens and slower data speeds of mobile devices. As we consider the standard messaging application, a whole new batch of questions emerges.

Do interactions belong in the messaging area or the keyboard area?

As users, we currently interact primarily with the bottom half of the screen as we consume content from the top half. As richer content is produced, is it acceptable to be seamlessly moving back and forth between the two halves?

We’ve seen some innovation around customized keyboards. It makes sense to keep interaction where the users’ thumbs are. But content should go where the messages are, right? Should you be expected to interact with those too? If so, is it okay to be frequently switching back and forth, interacting with both the bottom and top halves of the screen? If not, the already small mobile screen is now effectively halved. We learned before that constraints mean better UI, because we limit the info to what is truly necessary. Just how far can we take this philosophy? ¯\_(ツ)_/¯

Which way is back?

A screenshot from Luka, an automated chat application to recommend nearby eateries. Luka has innovated frequently on chat-app UIs and has a hypothesis on how back-buttons should operate.

Let’s say you are presented with a mobile restaurant page featuring all the information you could need — images, reviews, a menu, and hours of operation. If you click through on a menu, what pops up? A full screen modal? A partial screen overlay? Is there a back button? In my usual conversations, when I want to see historical information I just… scroll up. Speaking of which — if I interacted with a card earlier and booked a reservation, can I still scroll up and interact with that reservation card to change the time? Or do I get a new card with which to interact?

Is texting really the fastest interface?

Expedia has an excellent and intuitive UI for selecting dates.

We’re all pretty happy with our Google-fu shorthand from above. We’re proud of our handle on this pseudo-language, and it saves us keystrokes. Do I want to go back to forming full sentences? Is texting a service the same as texting a friend? If I know I want tickets to a concert, do I want to type in “September 18” or do I want to scroll through a calendar showing me what dates fall on a Friday? There are times when an old-fashioned form actually saves time and keystrokes. Other times, it’s nice to be able to search for “a flight home for Thanksgiving”, without needing an exact date. What’s the balance between the two?

Trusting the robot

When it comes to automated systems, users still need to be convinced. How often have you visited an insurance website to “live chat” with an automated persona that is little more than an automated phone tree? Speaking of which, how much do we already despise automated phone tree bots? A lot of early chatting prototypes (read: startup MVPs) evoke a similar feeling — there are commands the system understands; but poke a little further and it falls apart. Users are primed to be disappointed, so we in the tech industry have an uphill battle in regaining their trust.

Robot ingenuity credit to the ever-innovative Simone Giertz.

Furthermore, for hybrid human-AI systems specifically, there’s a particular situation I’m still puzzling over. (Thanks to my colleague Tim Sturge for pointing this one out.) When a user speaks to a system and gets an immediate response, they switch mindsets: oh great, I can keep trying a lot of different things here. Let’s assume you are interfacing with a new messaging service for the local movie theater.

“What adventure movies are showing this weekend?” → [immediate answer]

“How about comedies?” → [immediate answer]

“How much are 3D tickets?” → “Thank you for your question. Our ticketing agent will look into this and get back to you in 5–10 minutes.” → [await human to hop on and respond]

Oh nooo! I didn’t mean to spend human time on that! I want to go back to the robot. How do I cancel this request? Start over? Hello? 😔
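One possible way to soften that surprise, sketched below, is to have the system ask before it spends human time. This is only one hypothetical mitigation, not something the post's author or any specific product is known to have shipped; the `route` function, the keyword matching, and the reply strings are all assumptions for illustration.

```python
# Hypothetical keywords the automated side can answer instantly.
BOT_ANSWERS = {
    "adventure": "[immediate answer]",
    "comedies": "[immediate answer]",
}


def route(message, confirmed_handoff=False):
    """Decide who handles a message: the bot, a confirmation prompt, or a human.

    Returns a (handler, reply) pair. A human is only queued once the user has
    explicitly confirmed that the wait is acceptable.
    """
    text = message.lower()
    for keyword, reply in BOT_ANSWERS.items():
        if keyword in text:
            return "bot", reply
    if not confirmed_handoff:
        # Ask first, so the user can back out and keep exploring with the bot.
        return "confirm", "I can't answer that instantly. Ask a human agent (5-10 minutes)?"
    return "human", "Our ticketing agent will get back to you in 5-10 minutes."
```

Here the 3D-ticket question returns a `confirm` prompt instead of silently queuing a human, so the user can decline and go back to quick-fire questions.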

Looking ahead

There is no shortage of technical and interface challenges ahead before the masses can begin delegating their errands and research. Most importantly, we have yet to see the killer use case: the visionary glimpse of a service we can no longer live without. That said, I believe a leader will emerge this year. Here are a handful of things I’m keeping an eye on.

Facebook

With two of the top messaging apps in the world in their portfolio and unrivaled distribution, Facebook’s moves simply can’t be ignored. They’ve been hinting at a couple of projects already, launching M and announcing partnerships allowing businesses to have direct conversations on Messenger. I’m definitely tuning in to the F8 conference to see what they have to announce this year.

Platforms

When I was first graduating from college, the Facebook Platform fostered the growth of wildly successful companies like Zynga and Slide. Meanwhile, startups were chasing dreams of success on the iOS App Store for the nominal cost of $99 a year. Since then, I’ve been waiting for the emergence of the Next Big Platform. It could be on the Oculus; it could be through a new dominant standard for the Internet of Things; in the meantime, there’s a race to attract developers to the relatively nascent Slack and Telegram messaging platforms. Facebook could announce something similar for Messenger.

User time is still split across a half dozen messaging apps. As such, advertisers, developers and businesses are biding their time until a platform with validated ROI emerges. Once that happens, I’d expect to see a quick consolidation.

Voice

I should throw in a note about voice as an interface. It’s a big feature on Asian platforms as users effectively utilize their phones as walkie talkies. It’s easier and faster to leave a quick voice message than it is to type out a message. For cultural reasons, this hasn’t taken off in the States. It hasn’t been acceptable to be leaving voice snippets with no context while in transit. Even as Siri and Google Now get smarter, I don’t really see this behavior changing soon in the public domain. That said, I’m intrigued by the Amazon Echo. Friends with the device rave about its abilities, from being able to read stories to their toddlers to placing a quick repeat Amazon shopping order. It makes sense that people are more willing to place voice commands in the privacy of their own home, and I like the idea of the Amazon Echo learning new skills.

At the end of the day, I’m not done with the space yet. Startups and large companies alike will iterate and find creative solutions to the challenges above. The rewards in winning this space are immense and therefore will continue to attract the best and the brightest.

Jarvis, remind me to check back in a year.

(Thanks to Pulah Shah, Jessica Lee, and my parents for proofreading earlier versions of this post.)
