Opening ceremonies for the World Cyber Games 2019 in Xi’an, China

How Voice Technology is Transforming Gaming

My talk at the World Cyber Games 2019 in Xi’an, China

Jason Shen
Jul 28, 2019

The World Cyber Games is a historic esports franchise that was huge in the 2000s and early 2010s, before it ran into financial and organizational difficulties and closed in 2014.

The event was revived in 2019 and hosted a massive tournament featuring 27 nations (though the Chinese contingent was by far the largest), with competitions in Clash Royale, Crossfire (a Counter-Strike-like mobile game), Dota 2, Hearthstone, Honor of Kings (a mobile MOBA), Warcraft III, StarCraft II, and several other games.

Beyond the tournaments, they hosted other events like a cosplay competition, a robot battle event, and an esports / gaming conference. I was one of the speakers for that conference and gave a talk on how voice tech is transforming gaming.

Games have always been about creating immersive experiences.

Over the years, they have become a dazzling feast for the eyes, thanks to incredible advances in graphics and 3D processing. New developments in augmented reality and virtual reality promise to take that visual experience even further.

But today I want to talk about a less visible, but no less important, revolution in gaming. And it starts with our voice.

Imagine a world without voices. Most of us would struggle to understand and be understood.

We could still gesture and write, and the deaf and mute communities have shown that you can function fully in society without hearing or speech, so I’m sure we’d find a way. But there’s still something powerful and personal about having an intelligent dialogue with another being.

Before I became a tech entrepreneur, I earned two degrees in biology from Stanford. One of the things scientists have identified as separating humans from other animals is our mastery of language and vocal communication. Our species evolved this ability through three distinct developments:

The first is fine-tuned control over our lips, tongue, and vocal cords, something that other primates lack.

The second is a desire to share nonessential information. Animals communicate primarily about food, danger, and sex. Humans, on the other hand, share and ask about all kinds of thoughts and feelings.

For instance, “You just died five times in the last two minutes. Do you even know how to play this game?” (Definitely not a personal example.)

The third is the cognitive ability to understand syntax and complex phrases that depend on context.

When someone tells you, “I’ve picked up dozens of kills this weekend,” and you’re at an esports tournament, it’s probably just friendly banter. But it’s a whole different story if you’re on a battlefield or in a prison yard.

Catherine Mohr speaking ahead of me

This is a conference about technology, and one rule of new technology platforms is that we inevitably invent gaming experiences for them. This was true for TV, computers, the web, and mobile devices, and it will continue to be true for voice platforms.

The spread of high-speed internet, advances in speech recognition and natural language processing, and the ubiquity of microphones on internet-connected devices have already started to reshape gaming. Here are three ways it’s happening.

Ten or 15 years ago, if you were talking to someone while playing a game, chances are they were sitting next to you. Think about the popular online multiplayer games of the time: StarCraft, World of Warcraft, EverQuest, Counter-Strike. Most of them had little or no built-in voice communication. They were designed to be played effectively with text chat alone.

These days, games like Fortnite have become the new shopping mall, a place where people can talk and hang out while battling other players online. Discord has gathered hundreds of millions of users who use its text and voice chat features to connect with gaming communities all over the world. Some games now even support voice chat natively. The bottom line is that voice communication is quickly becoming a core part of the gaming experience.

When we watch high-level esports teams compete, as we have these past few days, we see players with incredible reactions, game knowledge, and mechanical skill. But if you look beneath the flashy action onscreen, you’ll also find a frenetic conversation among all the team members. They’re calling out targets, tracking enemy spells, and making strategic decisions, all in real time.

My company, Midgame, works with some of the biggest names in esports, helping top professional and collegiate teams better understand and analyze their team communication. Let’s take a look at one organization we work with, Roman Esports, which has a Masters-level League of Legends team.

Note: we take team privacy seriously and got permission from Roman before analyzing and sharing some of their comms.

If you’re not familiar with this game, it’s part of a genre known as multiplayer online battle arena, or MOBA for short. Two teams of five players start at opposite corners of a large map, where they try to advance into each other’s territory, kill their enemies, and eventually destroy the opposing team’s base to win the game.

Here’s a clip from an actual game: Roman, the Red team, is facing off against L9, the Blue team. Without the player communication audio, it may look like each player is just making independent decisions about what to do or whom to attack.

Let’s pause there for a second.

In a minute, I’m going to play more of their game with the team communication on. Here’s what I want you to listen for.

In this clip, we’re about 20 minutes into the game and Roman has a strong but not yet decisive lead against L9. They’re waiting near Baron, a jungle monster that spawns late in the game and provides a powerful boost to whichever team is able to defeat it. This is a major objective that both teams want to go after. Roman is patrolling the area to see if L9 tries to go for Baron.

Roman’s game plan is actually to take a minor objective called Dragon first, before going for Baron. Meanwhile, they’re keeping track of enemy players and sharing that information with their teammates. Roman faces some pressure from L9 but avoids a big team fight at first and tries to stay the course.

They reiterate the plan not to engage but begin to discuss how they might respond if a fight becomes inevitable. They notice that several L9 players have wasted big spells that didn’t land and are now vulnerable to a counterattack. Finally, one of Roman’s leaders sees an opening and decides to tunnel through the wall and attack L9 while they’re weakened and grouped up. Let’s watch the clip.

Jeff Bezos, CEO of Amazon, is famous for the idea of “disagree and commit.” The idea is that a team doesn’t have to be in 100% agreement to work in a coordinated fashion, and sometimes, even if you don’t agree, the best thing to do is to be decisive and follow through.

And that’s what Roman does: they wipe out most of L9, and rather than going for the lower-tier objective of Dragon, they’re now free to take the more valuable Baron without interference, which all but secures their eventual victory. This is an example of great team communication: planning, sharing information, responding to an opportunity, and providing encouragement and follow-up.

Our work with these teams is still at a very early stage, but we have found that helping teams understand their communication has allowed them to make better decisions and play more competitively in high-pressure situations. It’s an example of how important voice has become in these team games.

The next trend I want to share is voice-first games. In the United States, more than half of all households have a smart speaker like the Amazon Echo or Google Home. Apple has Siri on all its devices, and you can now access Siri through your AirPods. Baidu’s own smart voice assistant, DuerOS, is now installed on over 400 million devices. These products are in our homes, our cars, and really everywhere we are.

These smart speakers, of course, now have games. The first ones are pretty nascent: one popular genre is trivia, often about music or pop culture. Other games resemble old-school text-based adventure games, where you navigate the world based on verbal descriptions and give voice commands to advance the action.

One interesting spin on this idea is a game called Yes Sire, where you play as a feudal lord making one decision after another that affects your fiefdom, while trying to avoid exile by the King. This game is on Amazon’s Alexa platform, but I’m sure there is a game like it on DuerOS, if not now, then in the near future. Here’s a short clip of me playing through four tricky decisions. Unfortunately, there’s not much to see since it’s audio only, so listen carefully.

It turns out that being a feudal lord is hard work. Who knew?

While we may find this game rudimentary today, remember that people used to say the same about mobile games, and look how far they’ve come. Millions of people all over the world play single-player and multiplayer games like Angry Birds, Clash of Clans, and Honor of Kings.

Max Childs runs a game studio called Volley that created Yes Sire and many other popular voice games. He believes we’re just at the beginning of this journey:

“Voice enables a live multiplayer experience in a way that anyone from five years old to 85 years old can pick up immediately.”

Today’s voice games are an early and exciting indicator of where the genre might go, and they are worth following.

I’d like to share one final trend that is just starting to emerge: voice-enabled tools.

One of the joys of modern gaming is the intensity and focus it requires. Even if you’ve got a lot of stress in your life at school, work, or home, firing up a good game clears all that away. But you have to give the game your full attention if you want a chance of playing your best, which means that anything that distracts you or pulls you away from the game can be frustrating. As games have become more complex, they demand much more knowledge and skill, and many sites, such as ProGuides, exist to track win rates, document new techniques, and keep up with game updates. How will technology transform this experience?

Well, just as we have voice assistants to help us figure out how to get to work, cook a recipe, or find a piece of information, why wouldn’t we also have voice assistants for our games? Whether it’s looking up the latest patch changes, getting advice on the right champions to pick, or tracking information during gameplay, voice intelligence is a natural extension of gaming technology. Rather than going to a separate site or mobile app, you could simply ask your gaming assistant.

In 2017, Destiny 2 launched with an Alexa skill called Ghost that allowed you to call your teammates for help or swap out the weapons you were using. While the skill had some issues, it was a great proof of concept and voice companion apps are continuing to evolve.

My company is working on voice tools to help players stay in the game, from single-player games like Stardew Valley to multiplayer games like League of Legends. Here’s a clip of what we’re imagining:

Where else could voice technology take us? How about as an added dimension in playing the game?

One of my favorite types of games is the story-driven single-player adventure. Horizon Zero Dawn is a PlayStation 4 game that offered not just great graphics but a compelling story of the future and a heroine you could root for. Its dialogue system let players choose among several options when speaking with other characters in the game: an aggressive response, a wittier response, and a more heartfelt response. By selecting among these options, you could shape how the conversation would go.

But what if, instead of simply selecting the dialogue approach, I could speak into my headset and have that option chosen? What if I could use my voice to navigate a game just as I use my hands? This would add a further level of immersion and make the playing experience that much more enjoyable.

At their core, games are a way for us to challenge ourselves and our friends, explore new worlds, tell compelling stories, and exercise our creativity. Technology alone does not guarantee a great game, but like a video camera, a paintbrush, or a typewriter, it can be a tool for creating magical new experiences.

We’re moving into a world where machines can listen, understand, and respond to the human voice like never before. We need to make sure we harness this technology ethically and responsibly, because it’s already transforming the way we work and live. I can’t wait to see (and hear) how this technology can help us play better together.

Audience Q&A

After the talk, there were a few questions from the audience. I’ve included them and my answers here:

Q: How do you see AI evolving and further impacting gaming and esports?

A: I mean, we already have AI playing not just Go but StarCraft and Dota; we’re training them to play those games. But what we see with chess is that the most exciting chess now involves humans aided by computers. It’s a team: the human and the AI work together against another human and AI. You could see that happening in esports, and it already is in certain ways. Having that go further would be really interesting. So that’s my thought.

Q: What suggestions do you have for gamers who are interested in turning their love of esports into a career?

A: I was a competitive gymnast growing up, and I actually competed in gymnastics in the NCAA. That’s how I relate to a lot of esports: the same focus, discipline, training, and learning, and dealing with setbacks, injuries, and losses, is tremendously valuable. So my advice is:

  1. Recognize that what you’re learning as a competitive player can translate into so many other parts of the world. That’s something to value and cherish, even if you don’t go on to become number one in the world at your game.
  2. Understand that esports is an industry: it’s a business enterprise like many others, and there are many different roles and positions you can play. There’s media that covers it, there are analysts who study the game, and there are coaches, marketers, salespeople, designers, engineers, and all kinds of other roles.

You can be a pro player, but you can be so much more than a player. Get out there and develop those skills so you can contribute, because these teams need your help. They’re not making money, so if you can say, “Hey, I have a skill and I’m willing to help you for free,” they will take it.

(Note: we’re not saying that people shouldn’t be paid for their work or skills; Midgame has a paid internship program. We’re just acknowledging that a lot of esports is powered by volunteer and unpaid effort.)

Q: What does “Level Up”, the theme of our show, mean to you?

A: To me, “Level Up” means facing things that are unfamiliar and hard for you. It’s easy to do things you’re comfortable and familiar with. It’s when you challenge yourself, whether socially, physically, or intellectually, that you grow. That’s where the growth is.


Your trusty gaming sidekick

Jason Shen

Written by

Serial entrepreneur & Asian American advocate. Co-Founder and CEO of - esports analytics co. TED, Etsy, Stanford, Y Combinator alum. BOS ✈ SF ✈ NYC.


