Iron Man’s Jarvis: is it still fiction?

Mirek Stanek
Oct 11, 2016

Who doesn’t dream about Iron Man’s suit? An infinite power source (the Vibranium Arc Reactor), the ability to fly and dive thanks to Repulsors and oxygen supplies, and almost indestructible single-crystal titanium armor with extremely dangerous weaponry.

While we’re still years or even decades (are we?) from having even a prototype of a flying metal suit, there is one piece of it that may be closer than we think.

JARVIS

While the Vibranium Arc Reactor is the heart of the Iron Man suit, its brain is equally important: Jarvis.
“Jarvis is a highly advanced computerized A.I. developed by Tony Stark, (…) to manage almost everything, especially matters related to technology, in Tony’s life.” Does it sound familiar?

At its latest conference, big G showed us a couple of things #MadeByGoogle. At the beginning, Sundar Pichai spoke about a new concept: AI-first. Living in times driven by mobile-first strategies, many of us got used to relying on a pocket device in many aspects of our lives: calculations, notes, information, entertainment, communication, navigation and much more. But a mobile device is just a tool in our hands. Versatile, but still a tool. In most cases it just extends our capabilities: we can travel faster (navigation, traffic information), speak with people far away from us, and declutter our brains by making notes. But the order of actions is always the same: we have an intention, and then the mobile device does the thing for us. We have a query, and then the search engine returns the answer.

AI-first

Now we’re slowly moving to another concept after mobile-first: AI-first. It means that our tools, like mobile devices (but not only!), are able to learn our habits and plans. And not only from us but also from the whole context around us: current location, time of day, weather. That’s why Google is able to tell us to leave earlier for a planned meeting because of traffic jams. Or Nest can set the best temperature to keep us comfortable while saving as much energy as possible during the day. A reminder app tells us to buy bread when we’re in a bakery, and Google Now reminds us about an umbrella when it’s raining. And we don’t need to express an intention for any of this. Our devices, or maybe better to say personal assistants, know us very well.

“Ok, Jarvis…”

Voice input has been with us for a long time now. Your current mobile device has it for sure, no matter if it’s an iPhone, Android or Windows Phone. There is a pretty big chance that the previous one, or even the one before it, lying somewhere in your desk, also supports voice commands.

In older-generation devices, commands were just commands, e.g. “call mom”. But now, thanks to Siri, Cortana or Google Now, we can ask our mobile device for the weather (no need to provide a location, since your phone already knows it), schedule a meeting in our calendar or take a note. And we can do it more naturally than ever before: “Hey Siri, should I take an umbrella tomorrow?”, “Ok Google, wake me up in one hour”. These and similar sentences work thanks to Natural Language Processing. Sure, some sentences (and their variations) are hardcoded somewhere on the servers, but others can be dynamically interpreted by speech-to-text and intent recognisers like api.ai or wit.ai.
“Find (buy) the cheapest flight from New York to London in 3 days” will give us:

  • Source, destination: New York, London
  • Date: 3 days from today (today’s date taken from the current context)
  • Criteria: the cheapest
  • Intention: Find (buy) a flight

Having all of this information will probably result in just one API call, e.g. to Skyscanner. What does it mean? That technology doesn’t limit us anymore. We are able to book a flight with just a voice command.
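
To make the idea concrete, here is a minimal Python sketch of how such a sentence could be reduced to structured parameters. It is a toy, rule-based stand-in written for illustration, not how api.ai or wit.ai work internally: the `FlightQuery` class, the `parse_flight_request` function and the regular expression are all hypothetical, and real services learn these mappings from example phrases rather than from one hand-written pattern.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FlightQuery:
    """Structured form of a spoken flight request (hypothetical type)."""
    intent: str
    origin: str
    destination: str
    travel_date: date
    criteria: Optional[str]


# Hand-written pattern covering a single sentence shape (an assumption for this
# sketch; an intent recogniser would handle many more variations).
PATTERN = re.compile(
    r"(?P<intent>find|buy)\s+(?:(?:the|a)\s+)?(?P<criteria>cheapest|fastest)?\s*flight\s+"
    r"from\s+(?P<origin>.+?)\s+to\s+(?P<destination>.+?)\s+"
    r"in\s+(?P<days>\d+)\s+days?",
    re.IGNORECASE,
)


def parse_flight_request(text: str, today: date) -> Optional[FlightQuery]:
    """Extract intent, places, date and criteria; return None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    criteria = match["criteria"]
    return FlightQuery(
        intent=f"{match['intent'].lower()}_flight",
        origin=match["origin"],
        destination=match["destination"],
        # The relative date is resolved against the current context ("today").
        travel_date=today + timedelta(days=int(match["days"])),
        criteria=criteria.lower() if criteria else None,
    )


if __name__ == "__main__":
    sentence = "Find the cheapest flight from New York to London in 3 days"
    print(parse_flight_request(sentence, today=date(2016, 10, 11)))
```

With `today` set to Oct 11, 2016, the example sentence resolves to a travel date of Oct 14, 2016, with “New York” and “London” as source and destination. An object like this holds everything a single flight-search request would need.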

For the end user it means:

  • Say what you need
  • Don’t open your mobile device/computer
  • Don’t open (travel) search engine
  • Don’t adjust the query to your needs
  • Don’t compare results
  • Enjoy your flight.

Personal assistant, not a device

Speech-to-text processing keeps getting better. Also thanks to us, the users. By having conversations with Google or Siri we feed neural networks with new samples of data, which makes them better and more accurate. Sure, we still don’t want to talk out loud to our phones or tablets. Fred Wilson mentioned this and published a poll on his Twitter:

Those results are probably even worse for non-native English speakers (me included).

What is the reason? It’s still not natural for us. Over the years we got used to operating our phones and tablets by touching them. Sure, we have always talked on the phone, but talking to it still looks weird.
But now a new kind of solution is appearing on the horizon: Google Home and Amazon Echo, devices for which the voice interface is the default one. Finally we have devices that we can tell to switch on the lights, play our favourite movie or get some information from Wikipedia. And finally speaking with devices becomes natural for us. Actually it won’t be just a device, but your assistant. Your personal Jarvis.

Take Jarvis with you

Let’s move back to the mobile device for a while. After some time and a couple of conversations with Amazon Echo or Android Auto, your car system, we’ll see mobile devices from a completely new angle. They won’t be just phones with voice control, but more like devices for taking your home assistant with you. And talking to them won’t be weird anymore.
Moreover, we won’t need to pull the device out of our pocket. Thanks to AirPods and dozens of other earphones (which don’t even require a headphone jack) we’ll be able to talk to our assistant without constraints. And the device hidden in your pocket, backed by cloud computing services, will do everything for you, confirming with “It’s done, Tony” at the end.

The future of holograms

In the end there is still one big piece missing from our personal Jarvis: visual feedback. We still haven’t invented technology able to draw a 3D projection in the air with just light. But no worries: there are projects like HoloLens which will mix reality with holograms, so you will be able to see the whole universe on your table, a projection of the missing parts when you build an engine, or even people who are thousands of kilometers away from you. Sitting on your couch.

And what is the most fascinating thing about all of these solutions and technologies? That we all have access to them. We can all build them: by creating bots for messaging apps, integrating natural language processing APIs, and building apps for mobile devices, home assistants, cars, even HoloLens. This is not secret knowledge anymore. These are just pieces which need to be connected properly.

There has never been a better time in the whole history of the world to invent something (← I really recommend this post!).
