(This is the second part of “Becoming machines”.)
We’ve left our users, ranging from grandpa to conversational natives, ready for the next big thing in human-machine interaction.
Our speech recognition algorithms are all ears, our natural language understanding libraries are fired up, and the industry’s best text-to-speech engines have just finished their mic check.
The problem is: what’s next?
Designers can’t sketch a dialogue with Photoshop. Developers can’t code a thought. Users can ask anything naturally, but only to a perfectly faceless and potentially naked machine stranger.
It is a bittersweet impasse: we’re going to play gods and shape a new sociable life form while we still blush at our first dates.
Time to roll up our sleeves.
Bringing humanity to machines is quite a hard job, mostly because we have spent years making machines the way they are.
Common, everyday software development tools and languages are shaped for telling a machine exactly what to do.
Let’s look at a few of them.
Box diagrams are still alive and kicking: you draw a labelled rectangle that does something (e.g. “prepare a pizza”), with some arrows going in (e.g. “mozzarella”, “tomatoes”…) and others coming out (e.g. “a tasty pizza”).
Then you draw more rectangles (e.g. “having dinner”) into which the previous arrows go (e.g. our freshly baked “tasty pizza”), together with new ones (e.g. “a good beer”), and something else comes out (no example needed, right?).
We can say that everything works well when every part of the scheme has the adequate input to provide the expected output. Let’s set this sentence aside for later.
Message sequence charts are our personal favorite: they are a sort of weird comic strip (more like a yonkoma) in which people (or machines) are represented by vertical lines and speech bubbles by horizontal arrows. As with comic strips, you have to read them from top to bottom but, sadly, in computer programming the stories are neither very interesting nor funny.
This time, everything happens mostly in a strict chronological order. Another sentence for later.
Lastly, something you can check in everyday life: interfaces are structured in a readable, uniform way as much as possible.
Contacts, posts, notifications and timetables are always structured as lists whose elements differ only in their contents and rarely in their structure. These design guidelines were built to improve readability and flatten learning curves.
Enough. Clearly humans don’t work like this.
Have you ever been in an Italian restaurant in Italy? Specifically, have you ever been in an Italian restaurant in southern Italy?
Coming in, walking around in a maze of rough wooden tables, under a faded colorful veranda close to the sea, sitting down and experiencing the absolute absence of a menu. Something great, we guarantee.
We swear you can go there, sit down and tell the waiter that you want fried squid, or just something good: in both cases most of the details are missing, and yet you’ll be served.
Do you remember our machines, the ones where everything works well as long as every part of the scheme gets the adequate input? What is the secret of this marvelous Sicilian ordering machine, which works nothing like an ordinary machine?
Our guess: contrary to what we might think, the most detailed command only sometimes gets you what you asked for, while the most generic one always does. Which, in our restaurant example, is, surprisingly, “something good”.
So the real challenge is understanding what is good from the fewest explicit details: our waiter will bring us something based on the freshness of the ingredients, the current season and how the customer looks, whether in a hurry, ready for a romantic dinner or in the middle of a crowded riunione di famiglia (one of our beloved family gatherings).
Are we saying that thinking with parameters is not good when all parameters are optional? Kinda.
Already leaving? Please stay in our restaurant a little longer and enjoy the seaside breeze! After all, what is the meaning of time? Isn’t it something humans defined, rather than something natural?
In these very restaurants, when asked what they want to order, kids usually answer “potato chips”. In the Mediterranean diet potato chips are a side dish, and one that accompanies the second course at that. For a machine in which everything happens mostly in a strict chronological order, this is a major issue.
People have intimate priorities; for them, absolute time doesn’t really exist. Following it takes effort, and it isn’t natural at all.
And now the chef’s favorite question: how is information delivered back to the user in our beloved Italian restaurant? You bet you know what’s coming.
Interfaces structured in a readable, uniform way? That doesn’t exist here in Italy. Italians are chatty: our waiter will flood us with a massive amount of random details about living in Italy, the history of Sicily from 8000 BC to a few days ago and, sometimes, the food you’re ordering.
It is awkward, but we act like that for a reason: we think information should always be educational and/or entertaining, in order to build a strong bond of trust, spread a positive vibe between customers and everything around them, and create good, unforgettable memories.
We aren’t saying that bots have to speak Italian or be annoying: the human attention span is rumored to be only 8 seconds long nowadays, and we aren’t always in the mood to be lectured about history or local gossip. But it is fair to point out that there is an interesting range between short, structured sentences, in which we hear just what we need (e.g. our name, or our destination in station announcements), and our noisy waiter.
The challenge is figuring out how to be entertaining while matching each person’s attentional control and staying within their attention span.
A sip of coffee and it’s time to go. Ultimately, what will usability look like in the future of conversational interfaces?
Are we going to become robots once and for all, learning a standard structure for communicating with machines (and with each other), using strictly sequential dialogues and getting algorithmically generated answers, warmed up with just a pinch of humanity to make our nostalgic ancestors happy?
Or are we going to take our humanity back, working hard on detecting information in streams of words, answering sketchy questions and enriching machine answers smartly and selectively?
We lied. We don’t have the right answer. We’re only here to ask you the right questions: the destiny of what humanity means lies in your own hands.
So, is what you’ve just watched the beginning or the end of conversational interfaces?