A closer look at the APIs fueling the new wave of video chat apps
In January 2014, a former WebEx engineer launched a video communications company that would go on to raise $125m from VCs and scale to serve hundreds of millions of end-users. Eric Yuan of Zoom? Try again.
Tony Zhao is the Founder and CEO of Agora, a video chat/streaming API that powers applications like Bunch, Run the World, Airmeet, Pragli, and Talkspace. As video chat boomed during COVID, Agora’s network saw a 300% increase in signups, a 60% increase in usage, and 40 billion minutes streamed on 1.7 billion devices in April.
In “The Verticalization of Zoom,” we looked at the 250+ companies carving out their own verticals of video communication. Now we’ll explore the evolving API landscape that is fueling this growth by making it easier than ever to build video chat applications.
API Growth Waves
Infrastructure APIs are important because they can handle highly complex processes at scale with just a few lines of code. They commoditize value centers for industry incumbents and lower the barrier to entry for new entrants.
The most successful APIs have benefited from (and enabled) massive paradigm shifts. Twilio was launched just before the growth in mobile messaging, Stripe grew during the shift to online commerce, and Plaid benefited from the proliferation of personal finance applications. In 2020, it’s APIs like Agora that are seeing rapid growth in response to the surging demand for video chat and streaming.
The Video API Wars
Most video chat APIs are built using WebRTC, an open-source real-time communications protocol that enables video chat without plug-ins or downloads. Popular video APIs like Agora take the complexity out of WebRTC implementation, offer additional features, and provide more scalability.
Video API providers look to differentiate through the ease of implementation, customization & features, reliability, and price. Larger companies have increasingly been entering the space through M&A:
- Twilio acquired Tikal Technologies (2016), the team behind Kurento, in order to launch its video calling API
- Vonage acquired TokBox (2018) to expand its communication offerings
- 8x8 acquired Jitsi (2018), an open-source video chat project, from Atlassian
- Enghouse Systems acquired Vidyo (2019), which had previously raised $170 million in funding but struggled due to new competition
- 8x8 acquired Wavecell (2019) to broaden its CPaaS segment
- Dolby acquired Voxeet (2019) to offer 3D audio and HD video APIs, which recently relaunched as Dolby.io
Even though it’s a crowded segment, startups are still launching new video chat APIs. Sendbird launched its Video API in early 2020 and acquired Roundee to further advance its video capabilities. Daily.co released its video chat API in 2019. The company originally sold hardware/software subscriptions for conference rooms when it went through YC in 2016, but it has since pivoted and raised $4.6m to power emerging video apps like Tandem.
Beyond video communication, companies are using streaming APIs to host live events, concerts, and performances. Well-known streaming applications like Periscope, Kaltura, and Vimeo were built using Wowza. Another YC grad, Mux, has raised over $30 million and provides video streaming and analytics for Crowdcast, Udemy, and Wistia.
APIs Beyond Video
There are even more categories of APIs that provide advanced features and AI capabilities around video chat:
Video AR & Avatars
DeepAR, ARGear, and Banuba provide AR masks, lenses, and backgrounds for users to customize their appearance and surroundings. Video AR is not just for self-expression; it can also give users more privacy and comfort on calls. Loom.ai and Avatar SDK give users even more control of their digital identity with custom avatars that mimic facial movement and gestures. Avatars may be an ideal compromise between the privacy of audio calls and the “presence” of video.
Audio Improvement
On a video chat, poor audio quality is often more distracting than poor video quality. Dolby.io launched APIs that automatically adjust loudness, audio levels, and voice clarity. Companies like Krisp and BabbleLabs use AI to mute background noises from conversations, such as dogs barking, babies crying, and cars driving by. Discord recently integrated Krisp’s noise suppression to improve the clarity of communication for its 100m+ gamers.
Captioning & Notes
Transcription and translation APIs enable real-time captioning, meeting notes, and searchable archives of conversations for compliance. Large companies like Google, Microsoft, and Amazon offer their own transcription engines, but there are still startups like Assembly AI and Deepgram (which just raised $12m) that are focused on increasing accuracy through more custom ASR models.
Conversational Insights
Other APIs leverage AI to provide deeper insights into conversations or participants. Kairos provides facial recognition APIs that estimate a user’s age and demographic. Banuba is able to track facial micro-movements to detect a user’s emotions, heartbeat, and gaze. nVISO’s API reads faces to help medical professionals assess a patient’s pain level. Oto uses “Acoustic Language Processing” to analyze both words and intonation, which proves more effective than NLP alone for analyzing sentiment. Similarly, Voicebase measures conversational insights such as overtalk, silence, and sentiment to predict a customer’s likelihood to churn or openness to a new sale.
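To make "overtalk" and "silence" concrete, here is a minimal Python sketch of how those two numbers could be derived from speaker-labeled segments. It is not how Voicebase or any other vendor computes them. The `Segment` type, the `conversation_metrics` function, and the sample timings are all hypothetical, and the sketch assumes each speaker's own segments never overlap one another.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stretch of speech (hypothetical type); times are seconds into the call."""
    speaker: str
    start: float
    end: float


def conversation_metrics(segments: list[Segment], call_length: float) -> dict[str, float]:
    """Return seconds of overtalk (2+ people talking) and silence (nobody talking)."""
    # Each segment becomes two events: +1 when speech starts, -1 when it ends.
    events = []
    for seg in segments:
        events.append((seg.start, 1))
        events.append((seg.end, -1))
    events.sort()  # chronological; at equal times, ends sort before starts

    overtalk = 0.0
    silence = 0.0
    active = 0            # number of people talking right now
    previous_time = 0.0
    for time, delta in events:
        span = time - previous_time
        if active == 0:
            silence += span
        elif active >= 2:
            overtalk += span
        active += delta
        previous_time = time

    # Anything after the last speech segment counts as silence.
    silence += max(0.0, call_length - previous_time)
    return {"overtalk_seconds": overtalk, "silence_seconds": silence}


# Illustrative (made-up) timings for a 35-second exchange.
sample_call = [
    Segment("investor", 0, 10),
    Segment("founder", 8, 20),
    Segment("investor", 25, 30),
]
metrics = conversation_metrics(sample_call, call_length=35)
print(metrics)
```

In this sample the two speakers overlap between seconds 8 and 10, and nobody speaks from 20 to 25 or from 30 to 35. A real product would feed numbers like these, along with sentiment, into a model rather than reading them directly.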
Collaboration
Screensharing can be done through WebRTC, but APIs like Surfly offer more powerful sharing features like control switching and faster performance. To improve remote collaboration, companies like Miro offer whiteboarding APIs that can be embedded into applications.
Next… no-code video chat?
Infrastructure APIs have made it much easier for developers to design new video chat applications, but we’re also interested in a future where the end-user is empowered to customize video chat for their particular use cases.
Today, non-technical users can build a website on Webflow, a database on Airtable, a web app on Bubble, and a voice app with VoiceFlow. Who will build the no-code for communication apps?
To paint the picture, here is an illustration of how I envision a customized video conferencing app for one of my own use cases — intro calls with startup founders:
When I log in, a prepopulated profile would appear with helpful context on the founder/startup (bio, recent Tweets/news, funding rounds, etc.). Below that would be our recent email exchange and the attached investor deck that I can reference within the app.
On the other end, the founder sees a bio of me and TechNexus. They can scroll through our portfolio to get a better idea of our investment focuses. Ideally, they would have done this ahead of time, but I know everyone is busy.
Embedded meeting notes would appear on-screen (no switching between apps!). These will sync with our CRM.
Throughout the call, an AI bot provides real-time insights based on our conversation. For example, when the founder mentions a competitor, a mini-profile of that company appears for me to take a glance or save to my notes.
The same AI bot reminds me that I have another call in 5 minutes so I need to wrap it up. Fortunately, the prominent countdown timer has helped keep the meeting focused.
At the end of the call, I can select a “quick action.” In this case, I’m really interested in this company, so I pull up my calendar within the app to schedule a follow-up meeting.
Following the call, I immediately get coaching feedback from our AI bot — did I dominate the conversation? Cut them off too often? Sound disinterested? etc.
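As a rough illustration of the first two coaching questions, the sketch below computes my share of total talk time and counts how often I started speaking while someone else was mid-sentence. Everything here is an assumption for the sake of the example: the `coaching_summary` function, the `(speaker, start, end)` turn format, and the definition of an interruption are hypothetical, not taken from any existing product.

```python
from __future__ import annotations

from collections import defaultdict


def coaching_summary(turns: list[tuple[str, float, float]], me: str) -> dict[str, float]:
    """Summarize one speaker's behavior from (speaker, start, end) turns in seconds."""
    # Total talk time per speaker.
    talk_time = defaultdict(float)
    for speaker, start, end in turns:
        talk_time[speaker] += end - start
    total = sum(talk_time.values())
    talk_share = talk_time[me] / total if total else 0.0

    # Assumed definition: I "interrupt" when one of my turns starts strictly
    # inside a turn that belongs to someone else.
    interruptions = 0
    for speaker, start, _ in turns:
        if speaker != me:
            continue
        if any(other != me and o_start < start < o_end
               for other, o_start, o_end in turns):
            interruptions += 1

    return {"talk_share": talk_share, "interruptions": interruptions}


# Illustrative (made-up) turns from a short intro call.
sample_turns = [
    ("me", 0, 60),
    ("founder", 55, 90),
    ("me", 85, 150),
]
summary = coaching_summary(sample_turns, me="me")
print(summary)
```

In this made-up call I hold roughly 78% of the talk time and cut in once, which is the kind of signal a coaching bot could turn into a nudge. Judging whether I sounded disinterested would need tone analysis, which this sketch does not attempt.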
I imagine there are endless possibilities as we consider the needs of fitness coaches, teachers, financial advisors, etc. The endpoint for the verticalization of Zoom will be a no-code tool that gives users the power to design highly custom video apps for their particular needs.
If you’re working on this, I’d love to learn more 🤙
Are you building a startup? Reach out 👋 @oslundjj
Credits to @itsalisonh for the logo charts
Note: Several companies discussed are portfolio companies of TechNexus, including Voxeet (acquired by Dolby), Krisp, and Assembly AI