An Entrepreneur's Tale and Why the Future of UIs isn't Just Text
I wanted to keep this short and simple. But I just couldn’t resist telling you both my story and my vision of the future. This is the first paragraph you’re reading but the last one I typed. That’s how my mind works. So first things first:
TL;DR Scroll down to the last subtitle for the short version, or the last 3 subtitles for a normal version. Read it all if you have more free time.
Chats are cool, but chats are shit. Texting has become a ubiquitous paradigm in a person’s everyday life. Compared to phone calls, somebody from the 19th century would consider it progress going backwards rather than evolving. Voice+Technology seems fancier than Text+Technology. And in fact it is, sort of. Voice data is way harder to transmit than plain text. Voice is also harder for machines to interpret than text. That makes it feel more futuristic. So why is text so popular again in the 21st century?
First, a little context. By the end of the 20th century a new form of interaction was born, GUIs: Graphical User Interfaces. Apple made them available to the public in 1984 with the Macintosh, but to be fair it was Xerox PARC that came first with the idea of a GUI and a mouse. But Apple stole the idea, broadcast a Super Bowl commercial and then introduced the Macintosh on January 24th, 1984. And humanity went mad. Or not?
At first, people didn’t quite get it. Typing commands seemed way more efficient than moving a mouse with your hand and clicking some items. It seemed like too much effort to accomplish the same old tasks. But years later GUIs unleashed an unprecedented revolution thanks to faster computer processors, better screens and more colours; ergo, more fluid UIs (it’s common to drop the ‘G’ in GUIs nowadays, because it’s now redundant).
UIs rich in content and features became ubiquitous, and people no longer considered text-based input a valid interaction pattern. But as everybody knows, too much of something is not that good. So, too much UI became a problem. That’s why Microsoft’s first tablets, which ran Windows XP (a desktop computer UI), failed miserably. And that’s also why the iPad was so successful, despite being underestimated and derided at first. It was successful not because its processor was faster than those in Microsoft’s tablets a decade before, or because the screen had more colours, or because it had built-in WiFi. No. It was successful because its UI was optimized for human fingers rather than for a mouse or a stylus. Fun fact: the stylus was also one of the novelties and reasons Palm sold thousands of devices at first, and then, ironically, nothing. Same for Blackberry’s physical QWERTY keyboards. Double-edged swords.
Now, the Internet is in the pockets of roughly a third of the human population, if not more. With gorgeous screens and beautiful -or ugly- UIs. And ironically, text as an input interface has gained a lot of traction once again. Firstly because of the many messenger services (AIM, ICQ, MSN Messenger, Skype, Whatsapp, Facebook, etc.). Partly because it now looks even prettier than it did two decades ago. And currently, especially because of the proliferation of bots. They’re intangible, but they’re becoming ubiquitous (that word again). It feels like the obvious next step in terms of human-machine interaction, of course. Or does it?
If you paid close attention to the last five paragraphs, you may have noticed that this “evolution” in human-machine interaction hasn’t always been that obvious. So then, what’s the next step? Mind reading. Not literally but almost. Imagine magically controlling things with your thoughts. Just staring at small and big screens and boom! You get sh*t done. Your brain does, to be more specific. Almost no physical effort is required. You don’t move a muscle, you spend way less energy on otherwise complex efforts to coordinate your eyes, hands and maybe other parts of your body. You don’t even speak. And yet, you get things done, just by thinking about them.
UIston, we’ve got a problem
But… And there’s always a but. Here’s the thing: we’re not there yet. It’s still an experimental technology, expensive and not very accurate. It’s been researched and used for years now, but it’s still not ready for mass production. I may be wrong, but there’s one thing I know for sure: there’s not a single mind-controlled electronic device that currently has, or intends to have, mass adoption. Not even close, I dare say. There isn’t an Apple iBrain, or some kind of brain-wave sensor embedded in the iPhone, or anything like that on the market.
And yet the question remains. What’s the next step? In chronological order, since the 19th century, communication and interaction technologies have been: text in the form of telegrams; then voice thanks to the telephone; then text once again, starring as the preferred input method for machines; then GUIs, a radical change in the way humanity interacted with something other than human beings; then voice wanted to star again thanks to speech recognition software, which basically transcribes spoken words into good ol’ text; then text starred once again thanks to messenger and chat services. And now we feel like it’s time for sci-fi to become reality and have mind-controlled interfaces. But you know what? Seen from another perspective it’s just the same story as the classic, but still valid, point-and-click method from the 70s and 80s, plus text input.
Read that last sentence again and focus on the word plus. Let me tell you, fellow reader, that’s the key. Or so I think. That’s where the current holy grail of usability lies. Addition. The simple sum of input and interaction methods might be the answer to our prayers.
Now, let’s assume you’ve got this far in my pretty repetitive text, which could be easily called: “Interaction and input methods in a nutshell”, “Obvious history of input methods” or “Usability for dummies”. But bear with me. I’ve got a point, I promise. Yes, we’ve been adding these methods one on top of another, and been using and perfecting them for years or decades now. But we also don’t want to just sit and wait until mind-control or some other form of magical interaction appears before our eyes (or thoughts).
A late preface
You may have noticed I’ve mixed pretty different concepts, and even used them interchangeably. Specifically: UI, usability, input method and bot. That’s because it’s where I’m headed. Focus on the last one. 2016 looks like the year when thousands of bots will be born, whereas 2008–2014 was the period when millions of apps were born. 2015 was more of a transition year. People got tired of (un)installing dozens of apps that eventually became useless or were never even opened. Zombie apps is what they’re called now. Slowly eating your phone’s memory and resources, but useless. That’s why most of the apps people actively use belong to just a handful of big companies.
It also looks like artificial intelligence (AI) is taking over, and is present everywhere. Self-driving cars, devices as waiters in restaurants, programs analyzing the stock market, customer support bots, or bots just for fun. Even more advanced stuff such as IBM’s Watson that can play Jeopardy! and do some other cool stuff, or the recently launched Amazon Web Service for Machine Learning, which kind of puts a piece of a sci-fi future in developers’ hands.
Most of us or, better put, most of the time, an average person behaves rather… well, average. By that I mean an average person’s daily tasks do not involve complicated calculations, analysis of huge amounts of data, or making well-informed decisions that affect other people’s lives. Imagine Bob, an average human being. He might be a data scientist, a musician or a taxi driver. Your call.
Aside from work, Bob and many other people in the world have about the same needs and might share a lot of preferences. Like millions of others, he needs transportation, he needs to buy food and groceries, and he keeps forgetting essential information that he ends up googling, such as subway or bus routes and timetables. Occasionally he also has food or groceries delivered to his doorstep, requests an Uber, uses Waze to avoid traffic, books restaurants, buys tickets for concerts or sports events, and whatnot.
Bob can easily accomplish the majority of these tasks thanks to his smartphone. And most other people are able to as well. It’s thanks to a huge number of apps, services and platforms that we can accomplish so much. Just with an internet connection and a mobile phone, tablet or computer. Of course, it depends on the infrastructure and reliability that service providers can offer. But, just like tap water, we all take that for granted.
It is also this huge number of apps, services and platforms that represents a problem. We’re presented with an overwhelming number of options, sometimes for the same task, and we don’t even know what to choose. Or how to actually use them. But deep down, at the bottom of our hearts, we all know that we end up using no more than a handful of services 90% of the time, and the rest might be pure luxury, or garbage. Sadly, companies now spend thousands or even millions of dollars on app development. Mobile apps seem to be their own version of a holy grail. It’s as if millions of users would somehow download their apps and magically become addicted to them, no matter how boring or useless they are.
So it’s 2016 and we have millions of incredibly unused apps. But it’s not their fault. It’s the ecosystem’s fault. Nobody predicted such an enormous growth of the app industry, and along with it, services and platforms. Unicorns now do exist. But “regular horses” are just gone. Or are they?
Why I’m telling you this
Much has been said and written about AI taking over everything, about the Singularity. And this all leads to a new concept, the “no-more-UIs” concept. A lot of money has been thrown into projects like Magic or Operator, which in 2015 promised to super-power their personal assistant / concierge services with AI and somehow revolutionize the way we ask for the stuff we need or want. And they said the answer was: text. Yeah, once again. And we’ve waited for over a year now, and the only major milestone Magic has achieved is launching an incredibly expensive service that basically does the same things it did a year ago. The only difference is, they realized that a hype-driven business with no clear business model at all is definitely not a sustainable one. Why am I stating this? Because I experienced it first hand. On a smaller scale, sure, but you don’t need that much to come to that conclusion.
I was part of a team that launched an almost exact copy of Magic, called Kiwi, with a few key differences. First of all, we launched in Bogotá (Colombia). Second, we used Whatsapp as our channel to receive requests, instead of SMS. Actually, we built our MVP on an unofficial version of Whatsapp’s API that got us banned at least four times, as far as I can remember. So eventually we ended up coding some stuff on top of Whatsapp web (which is an official client) and stuck with that for a few more months, before developing iOS and Android apps. Why Whatsapp? Because it’s widely used in Colombia. SMS is still incredibly expensive here. At the time, we also ran a grocery delivery startup called Lulo, so we had some valuable experience with logistics, especially valuable because of the way it works in our city, which is very different from US cities.
Our value proposition was the same as Magic’s: ask for whatever comes to your mind, we’ll try to get it done or delivered. And we had fun, for a while. We got all kinds of crazy requests, like having dwarfs delivered to people’s doorsteps, or us -the cofounders and core team- singing “Happy birthday” to a girl just because her boyfriend wanted us to, etc. And it worked pretty well. All of a sudden, we started receiving dozens of requests per day and were able to translate those into revenue. Which is the main reason startups don’t die. Not why they survive.
We got a lot of free press that boosted our active users and gave us a lot of visibility. But that’s where problems started, for me at least. Even though I never spoke with them, I felt Magic was facing the same problem we had. There was just no clear business model at all. Some orders were really profitable, others just weren’t, because people were sometimes not willing to pay that much. Most of our orders involved food deliveries and courier services. At first we did them all ourselves and were in charge of everything, which eventually became unsustainable. Then we partnered with a city courier service that helped us handle the workload. Then another player appeared in the market and began taking customers that wanted food delivered, especially from restaurants that didn’t do deliveries. For those, we had been going ourselves, buying the stuff and taking it to the customer’s address.
As we didn’t focus on one thing, it became harder and harder to keep user satisfaction at high levels, and to deliver. Deliver as in Silicon Valley jargon. Everybody told us focus was the first and most important item on the startup checklist for success. But somehow we forgot about it. At first, I was thrilled, but panicking. So many users, and yet so many of them told us they were disappointed. But then we came up with an idea that had been on our radar since the early days of our company. It was AI. We definitely had to put AI in the middle of the process somehow, even if we weren’t sure how we were going to do that. And we thought of it even before TechCrunch or somebody else told us Magic was also thinking about it. And we felt we were getting somewhere.
None of us had prior experience with AI. Not even close. At the time, there were two developers in the team. Just me and another dev. And, as a programmer, I’ve always seen my coding skills as superpowers. So I felt we could do anything we wanted. And we were on a mission to create the next generation chat, the future of user interfaces, or better yet: no UIs. So together with the team we agreed that the future of Kiwi was going to be AI-based. Our plan was to integrate as many services as we could into our system, especially on-demand services. And with the help of a third-party service, we used NLP (Natural Language Processing) to interpret our customers’ requests. Of course, not an easy task. While getting a user’s intention from a sentence or two was feasible, automating the entire process was a lot harder. And since we knew almost nothing about the subject, this time ignorance was no bliss at all. And context is a bitch.
Imagine having some people request the most ordinary things, while others want crazy things. Add some ambiguity to the mix, plus some random people not taking the service seriously and wasting our agents’ precious time on nonsense. Agents were the people behind the scenes who made it all work. We, the developers, were in charge of keeping systems up, payments working, etc. They were in charge of magic (wink, wink).
We had so many difficulties fulfilling our customers’ needs properly that most of the time it was all about fixing problems: user complaints, issues, bugs and other technical problems. And because of that, materializing our AI-driven service seemed more and more like a far-fetched possibility. By the end of the year I felt my vision was definitely not aligned with what was happening, and that we were heading everywhere and nowhere at all. I gave myself a lot of chances to just focus again, so I wouldn’t feel “derailed” from the much-desired path to success. But I couldn’t keep doing that forever, so I quit.
It was probably the hardest decision of my life. Three full years as an entrepreneur. Three products in total that my team and I had launched together. So many 100-hour weeks. So many good moments, so many hard nights. Some celebrations, some difficult moments. As much as we struggled sometimes, I always liked it. The rush, the excitement, the thrill. The opportunities we faced. The things I learned. The people I got to talk to. The places it took me. Silicon Valley is my own version of Disneyland, and being an entrepreneur got me there for the first time (for free). I was there not just once, but twice, and had the honor of being interviewed in one of the most inspiring places on earth: Y Combinator. I even got the chance to personally speak to Sam Altman for a while. And those are the kind of people I’ll always want to be surrounded by. Not bullshitty investors or speakers -there are good ones-, but doers.
I always wanted to feel like I was part of something awesome. I’ve always had zero interest in being an employee, and I knew the only way to make a significant impact in the world was by building a company. I knew it was going to be hard. But it was even worse than I imagined. As Mark Suster says, it should be called “entrepreneurshit”. I can build stuff, thanks to code. I love it. And thousands of lines of code later, all I can tell you is that it was definitely worth it. By the way, Kiwi’s business model has now pivoted into something more specific, based on previous experience and experiments, and I think that’s where the key to its success will be: focus.
I don’t speak English as my first language, as you may have noticed. My first language is Spanish. But I love the “baby steps” expression. It says a lot. If you’re super excited about something, and just need to prioritize and put things together, then tell yourself “Hey! Calm down, baby steps”. And while you’re working on it, do it again. And what seemed impossible, suddenly becomes the obvious sum of tasks that will accomplish it.
Back to the initial topic, and getting rid of #StartupDrama, let me finally tell you what my vision is, for all of these not-so-successful attempts to build an AI super-powered future. As I wrote before, it’s not about the what anymore but the sum of the many whats. Imagine an interface that alternates between plain text and more complete UIs, just when you need it. You just type what you need and this new GUI just answers back and lets you answer either with text, or another type of user input, depending on what it needs from you. That’s about it. Simple huh? Let’s dig deeper.
Did you notice how I used “GUI” instead of “UI” again? That was intentional. It’s because the G no longer stands for Graphical, but for Generic. So, Generic User Interface, as dumb as it sounds. Do not consider it a chat, and do not consider it an app either. Yes, you can type, and yes, you may picture it as a chat with “message bubbles” on each side. But each of those bubbles is magical. And the GUI will limit the way you interact with it. Here’s where the expression “baby steps” means a lot. You don’t just type like a maniac and ask for random stuff like “Hey! I want a car, buy me one”. No, for that kind of chatter you can conveniently text your mom anytime. A GUI is a place you go to ask for specific things, no matter if you need them or just want them, like Bob’s baseball tickets for tonight, for instance. Once you ask, you can’t just keep on typing messages; otherwise you’d be texting your friends on Snapchat or Whatsapp, right? That’s not what a GUI is for.
A GUI will only accomplish what you’ve taught it to. Or specified. Let’s say you type “I want an Uber”, and the GUI answers by showing you a map, so you can choose where you want that fancy Uber to pick you up (or your girlfriend). At this point you may say ordering an Uber that way is even harder than just opening the Uber app and doing it. You may be right, or not. Notice something: the only prerequisite for a GUI to work well with Uber’s service is that you’ve previously signed up for it. Maybe on their website or mobile app. The thing is, thanks to a GUI, you could just uninstall the Uber app, or never download it at all. It still might sound like a stupid use case, but again, bear with me.
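To make the idea a bit more concrete, here is a minimal sketch in Python of how such a Generic UI might answer a typed request with either plain text or a richer widget. Everything in it is hypothetical: the names GenericUI, Service and Reply, the naive keyword matching (a stand-in for real language understanding) and the "map_picker" widget are assumptions made up for illustration, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Reply:
    """One 'bubble' sent back to the user: plain text or a richer widget."""
    kind: str                      # e.g. "text", "map_picker", "menu"
    prompt: str
    options: List[str] = field(default_factory=list)


@dataclass
class Service:
    """A service the interface has been taught to handle (hypothetical)."""
    name: str
    keywords: List[str]            # naive stand-in for real intent detection
    first_step: Callable[[], Reply]


class GenericUI:
    def __init__(self) -> None:
        self.services: List[Service] = []

    def register(self, service: Service) -> None:
        self.services.append(service)

    def handle(self, message: str) -> Reply:
        """Return the first step of the first service whose keyword appears."""
        lowered = message.lower()
        for service in self.services:
            if any(keyword in lowered for keyword in service.keywords):
                return service.first_step()
        # Nothing has been registered for this request: answer in plain text.
        return Reply(kind="text", prompt="Sorry, I can't help with that yet.")


gui = GenericUI()
gui.register(Service(
    name="ride",
    keywords=["uber", "ride"],
    first_step=lambda: Reply(kind="map_picker",
                             prompt="Where should the car pick you up?"),
))

print(gui.handle("I want an Uber").kind)                 # map_picker
print(gui.handle("Hey! I want a car, buy me one").kind)  # text
```

The point of the sketch is the shape of the reply, not the matching: the same conversation can hand back a text bubble or a map, and anything nobody has registered simply gets a polite "no".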
Now imagine you’re at the cinema, watching Batman vs Superman, and you forgot to buy some popcorn, but you’d also rather starve to death than miss 5 or 10 minutes of this awesome movie. Suddenly you remember that “there’s an app for that!” And of course, you can go to the App Store, search for it, download it, sign up and, assuming you’re paying in cash so you don’t have to bother entering your credit card details, after at least 5 minutes you finally order some popcorn and a tasty hot dog. And it gets delivered to your seat 10 minutes later. Seems convenient, right? Kind of. Let’s be honest now, that app might never be used again, or a couple of times at most. Yet another zombie app on your phone.
Now imagine this: same situation, you’re hungry but not willing to stand up and go for food (first world problem). But you’ve got this great GUI that you open on your smartphone, and it automagically detects you’re at the movie theatre, so it suggests you use their food ordering service. You obviously say “Yes!” to that, and it displays a small menu listing the things you can buy and have conveniently delivered to your seat. You can pay in cash, or the GUI can have your credit card information beforehand. Ten minutes later and boom, you get what you want. It’s that easy. You had to install zero apps. You might use it again in a couple of months or years, and it may have improved over time with Bitcoin payments and more beverages. If you’re a frequent customer they might offer you special promos, who knows. And you just had to use one app.
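The cinema scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the place identifier "cinema-42", the REGISTRY lookup and the function names suggest and place_order are all made up, and how the phone actually detects where you are is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VenueService:
    """What a venue offers through the generic interface (hypothetical)."""
    venue: str
    offer: str
    menu: Tuple[str, ...]


# Hypothetical registry: place identifier -> service offered at that place.
REGISTRY: Dict[str, VenueService] = {
    "cinema-42": VenueService(
        venue="Movie theatre",
        offer="food delivered to your seat",
        menu=("Popcorn", "Hot dog", "Soda"),
    ),
}


def suggest(place_id: str) -> Optional[VenueService]:
    """Return the service available at the detected place, if any."""
    return REGISTRY.get(place_id)


def place_order(service: VenueService, item: str, seat: str,
                payment: str = "cash") -> str:
    """Build a confirmation message for an item on the venue's menu."""
    if item not in service.menu:
        raise ValueError(f"{item!r} is not on the menu at {service.venue}")
    return f"{item} to seat {seat}, paying by {payment}"


service = suggest("cinema-42")
if service is not None:
    print(f"Want {service.offer}? Menu: {', '.join(service.menu)}")
    print(place_order(service, "Popcorn", seat="F12"))
```

Notice that nothing here is specific to cinemas. A stadium or a restaurant would just be another entry in the registry, which is the whole appeal: the venue describes its service once and the one app on your phone does the rest.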
Imagine how product development could speed up thanks to this “Generic revolution”. Everybody would focus on perfecting their service rather than on multiple UIs and OS specifics. That would be a GUI’s task. And I truly believe in a future that looks like that. You can text, you can navigate through all sorts of menus, galleries, slideshows, Tinder-like UIs, 3D Touch enabled stuff, etc. Even cool new UIs and voice-commanded input methods could flourish. And of course, it should be AI powered, but not AI dependent. Almost everyone is on board, and almost any business model fits there. It’s an unbelievably big task, but I’m hoping to accomplish it someday. I just have that dream of a ubiquitous, intangible entity that presents me with all the stuff I need, sometimes with me actively using it, sometimes with it passively suggesting ideas.
No mind-controlling stuff, I know. But still pretty cool. And again, no extra apps involved. Companies and services in general would stop spending millions on apps and expensive UIs. They’d have to focus on fewer tasks, and the fewer the better. Many apps and platforms will still be device or OS dependent, for sure. But not all of them. No more iOS, Android, etc. apps. No more websites or desktop apps even. Just one UI to rule them all.
Yes I could’ve just published this last section of the post, I know. But again, I just couldn’t resist telling the world this story, my way. And possibly the beginning of something quite cool.