The Modularization of Real-Time Communication

Interview with Agora.io CEO Tony Zhao

Tony Zhao, Agora.io CEO

These days it’s getting increasingly easy for people to build full-featured products and companies. Why? Fundamental features are becoming modularized and can be readily incorporated into existing software. That way, companies can focus on developing their core capabilities while outsourcing other requirements to third-party providers. For example, use Strikingly to build your own website within an hour, activate payments with Stripe, and host everything on AWS. Or, in the case of most mobile apps, incorporate Agora’s SDK or API suite to activate voice and video calls over the public internet.

Today we have joining us Tony Zhao, who is the founder and CEO of Agora.io. Agora’s SDKs enable mobile and web users to make high definition voice & video calls, share screens, whiteboard, and use data channels to exchange files.

Tony helps us understand: what are the major trends in the global VoIP market and what are the key success factors? Who are the top competitors in this space and how do they differentiate? Why is Agora rolling out a virtual lens SDK on top of its classic VoIP offerings? Why does Agora have extensive operations in both Silicon Valley and China? What are examples of Chinese innovation, especially around video-based social engagement and networking?

Listen to the podcast on SoundCloud (here)

Audio transcribed by Shaolong Lin


An Introduction to VoIP

Adam: Today we have joining us Tony Zhao, who is the founder and CEO of Agora.io. Agora’s SDKs enable mobile and web users to make high def voice & video calls, share screens, whiteboard, and use data channels to exchange files. Agora’s proprietary, proven technology handles billions of voice minutes every year, and allows developers to support over 2,000 users in a single call.

Before founding Agora.io in January 2014, Tony was the CTO of YY.com, a $4 billion NASDAQ-listed social, gaming and entertainment company that supported over 400 billion VoIP call minutes a year in China. Before YY, Tony was the founding engineer and first VoIP & video communications engineer at WebEx, which was acquired by Cisco in 2007 for $3.2 billion.

So Tony, welcome.

Tony: Thanks for having me here.

Adam: It’s our pleasure. Today we’ll cover the VoIP (voice over internet protocol) market, Agora’s value proposition and strategy, and how it compares to well-known companies such as Skype. But before we do that, can you tell us a bit more about your experience from WebEx to YY to Agora?

Tony: I come from a technology background and have been a programmer since high school. WebEx was the first time I programmed something that transmitted real-time audio / video through the internet. Initially, I really hated the job because every time I made a version of the audio or video call, there would always be people coming back to me asking, “why is the quality not good?” Eight times out of ten I would eventually prove to them that it was a network or device problem.

At that time I felt frustrated because if it was a network issue, then how could I, a software module programmer, possibly solve that? It was really hard to explain to people, and it was hard to fix from my end. Later, at YY (a gaming communication platform), there really was no time to explain to gamers that it was a network problem as opposed to a software problem. I had no choice but to focus on improving the software implementation and algorithms. That was when we started to build new technologies / platforms that could improve audio and video call quality over the public internet.

Of course, as you’ve mentioned, YY eventually grew very big. In 2012, it served 400 billion voice minutes in that single year. While at YY, I felt that this technology, or this direction of calls made over the public internet, was much more valuable. It could be widely adopted by many mobile apps or new types of applications in a lot of verticals. That’s when we started to build Agora.io to make a simple API and to help developers integrate audio and video call features.

Adam: That’s very interesting, and it actually sounds quite a bit like the story of Slack. They started off doing gaming as well but eventually switched to building an internal messaging app, which turned out to be very valuable. And now they’re worth billions!

Before we get into Agora, can you take a step back and help the audience understand the market itself? What does the VoIP market look like, what are the key segments, and where do you guys play?

Tony: In the past there weren’t any professional service providers like us. They were all direct products, like WebEx, Skype, Slack, Facebook or WeChat. They built something on their own. With growth in the market of third-party professional service providers, it doesn’t make sense for every company to build an entire technology stack to support a single feature, as it might not be the most important feature of that company. That’s where we started to grow.


What is Agora.io?

Adam: Right, and we see that trend happening across the market. You see all these different services and technologies becoming modularized, initially addressing startup and SME needs and then moving to larger corporates as well.

But let’s get back to Agora again. I provided a quick introduction, but I don’t think it does your company justice. Can you talk us through your product and how it works?

Tony: Sure. We build a very simple API and make it user-friendly, as our goal is to help developers integrate real time video / audio or interaction features into their apps so that they don’t need to do it all over again from scratch. And the value we provide is not just to save them time and money, but also to provide them with an exponential quality advantage.

The VoIP algorithm side is already complicated, but the customer might also run into diverse network issues and all the last-mile issues. If the call happens to go through long links, like from China to the US, or from India to Europe, a lot of the time those connections experience packet loss, so if you want to work around those network issues it becomes extremely difficult. We create this virtual network with around 100 distribution centers across the globe, and between these distribution centers we have our own transmission routing paths, which help us dodge those bottlenecks and packet loss. That way our customers will enjoy premium VoIP call quality from anywhere around the world.
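Editor's sketch: the routing idea Tony describes can be illustrated with a small example. This is not Agora's implementation, which is proprietary and not described in the interview; it is a minimal sketch assuming each link between distribution centers has a measured packet-loss rate and that losses on successive links are independent. Under those assumptions, minimizing end-to-end loss is a shortest-path problem with link cost -log(1 - loss). The function name `best_path` and all node names and loss figures below are hypothetical.

```python
import heapq
import math


def best_path(links, source, target):
    """Find the path with the highest end-to-end delivery probability.

    links maps a directed (from_node, to_node) pair to a packet-loss
    rate in [0, 1). Returns (path, delivery_probability), or (None, 0.0)
    if the target is unreachable.
    """
    # Turn loss rates into additive costs: multiplying delivery
    # probabilities is the same as adding -log(1 - loss).
    graph = {}
    for (a, b), loss in links.items():
        graph.setdefault(a, []).append((b, -math.log(1.0 - loss)))

    # Standard Dijkstra over the additive costs.
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))

    if target not in dist:
        return None, 0.0

    # Walk back from the target to rebuild the chosen path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])


# Hypothetical measurements: a lossy direct long-haul link versus
# a relay through two distribution centers with cleaner links.
links = {
    ("shanghai", "san_jose"): 0.08,
    ("shanghai", "tokyo_dc"): 0.01,
    ("tokyo_dc", "seattle_dc"): 0.005,
    ("seattle_dc", "san_jose"): 0.01,
}
path, delivered = best_path(links, "shanghai", "san_jose")
print(path, round(delivered, 4))
```

With these made-up numbers the relayed route delivers about 97.5% of packets versus 92% on the direct link, which is the kind of trade-off a relay network is meant to exploit. A real system would also have to weigh latency and measure links continuously.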

Adam: As you’ve mentioned, there is a lot of underlying infrastructure required, not only on the software side but also in the data centers and the global network. So how much of that do you cover? How much is really under your control, and how much is not?

Tony: We cover pretty much every continent, and every large population center. Our infrastructure grows with our customer expansion. As the volume grows, if we identify any specific problems in that region, we would come up with improvements including on the algorithm side or the deployment side. On the deployment side, we might test and pick one of the distribution centers that can cover that region better so that the region’s quality will be up to standard.


Mobile VoIP Competitive Landscape

Adam: Is that something other companies are doing as well? Are comparable companies providing the full service including global data center coverage, or are they more focused on algorithms, for example?

Tony: There aren’t many comparable companies that are just doing video / audio calls. What we do is essentially a software-defined network, a proprietary network. We organize all the routing paths. It might not be just a simple relay, but rather a more complicated routing architecture. Any given call probably routes through not just one distribution center, but two or three, so that the quality is ensured across the data centers.

The part we don’t have control over is the last-mile side. Say you’re at a Starbucks, where the packet loss can be extremely high, and the link can be unstable so that you drop off constantly. That part is really hard to fix, but we do have some algorithms to help manage that. For example, we can use minimal bandwidth and minimal transmission to still allow the other end to hear clearly, and on the video side there are similar algorithms, so that packet loss does not have as big an impact on the overall user experience.

Also, remember that we specialize in VoIP for the public internet. Some traditional corporate high-end conference systems, for example, focus on improving the video quality, but they run in a corporate environment where the network is a lot better than the public internet. Our design really deals with all varieties of connectivity. For example, you could even have an Indian cellphone hook up through a 2G network!

Adam: And is that your core strategy, to focus on the mass market as opposed to the high-end corporate market, or do you also deal with some corporate clients as well?

Tony: We deal with corporate clients as well, but it’s true that we focus on what we call “mobile first”. Smartphones are used by everyone for everything. We believe that smartphone usage is most important, so we try to optimize the experience for that.

Adam: If you look at the competitive landscape, there are some corporate providers, but also other mobile-first players. The ones that come to mind are Twilio, which went public last year, and Plivo. They also seem to provide similar SDKs for a lot of start-ups and increasingly some corporates as well. Talk to us a little more about those competitors and how you guys compare.

Tony: Twilio is a big company, and they went public last year. Their strategy is more on SMS messages, phone calls and other talk APIs. Every mobile app needs SMS registration or authentication features. Twilio started to provide video features as well, perhaps a year ago, but I think what they provide is quite different from a lot of our implementations. For example, we have this virtual network and our own codec, and we focus on the mobile experience instead of relying on WebRTC standards.

Adam: Let’s say I’m working at a startup that wants to use these SDKs in my own app. How would I go about choosing amongst the different options? For example, would I choose Twilio, because they are pretty big and well known, with good customer service, or would I choose Agora because of the superior user experience?

Tony: It definitely depends on several different aspects. Number one is: what do you prioritize on an audio / video call in your product? Is it stability, e.g. can we hear each other clearly, or is it accessibility or pretty UX? For a lot of developers, I would say quality is most important, followed by simple and easy integration.

Adam: Last bit about integration and other usage, I imagine customer support is quite important (as a differentiator). What does Agora provide when it comes to that?

Tony: Customer support is definitely important. I’d like to think that the Agora product is simple and neat, so that without even needing to call support, most developers should be able to work through the steps. If there are special needs or specific questions for us, we have a support team available to help them, 24/7.


Move into Virtual Lens

Adam: That’s really good to hear. I won’t name some of the other companies, but after doing some research, it seems that while they provide a quality product and ease of SDK or API integration, sometimes the customer service can really be lacking. I think Agora not only provides good customer service but also owns some of the data centers as well, to support different regions.

Now, before we get to China vs. US, one more question about Agora. I noticed recently that you’ve launched Agora virtual lens, which is basically another new SDK that allows developers to add face-tracking and special effects to real time videos and even live streaming apps. Could you explain that further, why the focus on that?

Tony: Because we saw a lot of demand in the social and gaming areas, like Pokemon! Capabilities like face-tracking technology are part of the deep tech stack and not something a start-up of 2 to 3 people can build from scratch. We pack one of the best implementations into the product so that developers can write fewer lines of code. They can easily add such features into their social apps or gaming experiences.

Adam: What does the traction look like thus far for Agora? Are a lot of folks jumping on to try it out?

Tony: We’ve been growing very fast. There are more than 30,000 developers already on the platform, and we are still growing. Virtual lens and those new features also attract a lot of interest.

Adam: I quite like what you’re doing here. I think the thing about AR is that creating a consumer app with relevant use cases based on special effects is actually pretty hard. You have to make sure it’s focused and really test it out to make sure there’s good traction… not just initial engagement but longer term retention. But if you are creating the underlying tools themselves and you open them up for other people to figure things out, then it’s a much smarter business model. Hopefully that works out for you guys and a lot of new cool apps are created.


Is Agora an American or Chinese Company?

Adam: A slight pivot now, and let’s talk about China vs. US. Our podcast is very much focused on learning about Chinese technology developments. First of all I noticed that you come from a Chinese background, but have experience on both sides. Agora is currently located in Santa Clara, but there are significant operations in China as well. Could you explain that? Is Agora a Chinese or a U.S. company, and how does that work?

Tony: I would say it’s a U.S. company, but not a traditional one. It’s really a Silicon Valley start-up, but also a distributed team where we try to find the best team we can in each region. For the engineering team, we put lots of people in the Shanghai office because we can build a large and effective team around there, and it helps that I know a lot of audio / video talent in that area.

As you mentioned, I am Chinese, but I have also worked in Silicon Valley for WebEx, and in my seven years there I saw the company grow from 10 people to a public company. It planted the notion in my head that a lot of great companies in Silicon Valley are actually created by immigrants. As long as you focus on creating differentiated value for your customers, you will be able to penetrate the market and grow. I see us as a new type of start-up. We’re based in the U.S. market because it’s still the leading market, but we do have a globally distributed team.


China vs. U.S. Tech Environment and Innovation

Adam: Having worked on both sides of the market, what differences do you see in terms of talent, mindset, management requirements, and how do you accommodate different requirements on both sides?

Tony: There are definitely a lot of differences. The China side is changing a lot. The engineering teams there are getting better and better, much more professional than before. The market is very interesting also, for example in the social and gaming space. China’s mobile usage is higher than that in the U.S., yet innovative new products still come from the States. Even in social, apps like Houseparty and Monkey were invented there.

The U.S. still has better 4G coverage, and average device quality is higher, so a lot of use cases can come out first in the U.S. But with China’s population and its young generation’s love for social applications, expect China to come up with some very engaging, interactive use cases. These differences actually give us a lot of insight into what’s going to happen in the other market. There will be a lot of crossover.

Adam: Can you give us a few more examples that you see? On past episodes, we’ve covered mobile payment in China, WeChat Pay, and bike sharing with Ofo and Mobike. We’ve looked at musical.ly, a Chinese company that went to the US. We looked at some mobile cleaner apps, Cheetah Mobile, etc. Given your breadth of experience and what you’ve seen, is there anything else you would like to share, anything interesting and innovative we should keep our eyes on?

Tony: On the social side, you’ve probably heard about Momo, a social discovery app. Earlier this year, they started to talk about their new strategy of a video-based social platform. This essentially means that people start to do all their social activities based on video communication. If you look into their product, there’s interactive broadcasting, there are party games, there are multiple video chats in interest groups, and of course there are one-on-one video chats as well.

If you think about it, in the real world people run into new friends in some social setting, for example at the bar or on the street. If you run into someone and start to chat on certain topics, you get to know each other and eventually you might become friends. Online social used to be about adding someone as a friend, sending text messages and sometimes pictures or stickers. It’s not what people really do in real life. Now with video calls, you can replicate that offline experience online. I think that is one example of the trend.

We are seeing similar things happening in the US market as well. For dating apps, in the past you just used mobile apps to browse and connect with people. All you could do was send text messages and maybe schedule an offline meeting. But now you can save all of that hassle. You can jump onto a video call and start to get to know each other immediately. That’s definitely happening in the market.

Adam: That’s very interesting to note. I should give Momo a try! I know originally they were almost a Tinder-equivalent in China but it seems that they’re doing more live-video and social features as well. Sounds very exciting. Thank you so much for your insights Tony, and we’ll catch you another time.