How Headroom is Using the Latest Innovation in AI to Build the Future of Remote Collaboration

Seva Leonov
5 min read · Feb 17, 2023

I worked for many years at big tech companies in offices around the world, from Moscow to London to Mountain View, California. These global roles required that I spend much of my time in virtual meetings with participants scattered across the globe, and in those roles I witnessed firsthand the evolution of remote work and collaboration.

Interestingly, while many other aspects of our work lives have evolved dramatically over the last couple of decades, the way we meet and interact during meetings has not changed much. At the start of my career, I would sit on conference calls and take notes on a notepad. Now I connect over video conference and take notes in a Word document, an email or a notes app. The tools are digital, but how I work with my remote colleagues is largely unchanged: I am still relying on real-time audio and my own ability to capture the important moments of the conversation while also participating in the discussion.

This is a problem because humans are generally pretty bad at multitasking. Taking notes and writing follow-ups while also trying to listen, participate and retain information is a tall order. We typically end up doing one of those things well to the detriment of the others. Solving this problem, in part, is what attracted me to Headroom, a first-in-class video conferencing technology powered by artificial intelligence (AI) that provides embedded meeting experiences within collaboration platforms.

The other main factor in my move to join Headroom as Head of Partnerships was the recent exponential growth in the field of AI and machine learning (ML). For a professional with a background in adtech and the gaming industry, this wasn’t an obvious jump, but the progress and potential of AI were too intriguing to ignore.

Writing in Forbes, Grace Chang, founder and CEO of Kintsugi, put it this way: “We’ll likely mark 2022 as the year when AI became truly accessible to the general public.” But the innovations that allow us to leverage multimodal AI to power smarter remote meetings started long before 2022.

For AI systems to learn and make accurate predictions, they require vast amounts of data, gathered and organized in a form that large language models (LLMs) can learn and generate from. Two advancements in data acquisition are key to enabling smarter meetings: audio-to-text (training a model to convert spoken audio into an intelligible text format) and pixel-to-action (training a model to make decisions based on visual inputs, such as video or pictures).
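
As an illustration of the audio-to-text piece, here is a minimal sketch using the open-source Whisper model to turn a meeting recording into a timestamped transcript. The file name and model size are placeholders, and this is not Headroom’s production pipeline.

```python
# Minimal audio-to-text sketch with the open-source Whisper model
# (pip install openai-whisper). "meeting.wav" and the model size are
# placeholders, not Headroom's actual pipeline.
import whisper

model = whisper.load_model("base")        # small, general-purpose speech model
result = model.transcribe("meeting.wav")  # returns full text plus timestamped segments

print(result["text"])                     # the whole transcript as one string
for segment in result["segments"]:        # per-segment timing, useful for highlight reels
    print(f'[{segment["start"]:.1f}s - {segment["end"]:.1f}s] {segment["text"]}')
```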

These advancements allow us to use LLMs to extract useful information from virtual meetings, such as action items, summaries and highlight reels. At Headroom, we aim to bring the best parts of human interaction to virtual meetings, while automating the parts that detract from connection and collaboration. By leveraging the latest innovations in AI and ML, we can capture meeting context, generate useful notes, assign action items and provide rich text summaries. We then make all that information searchable and shareable, so meetings are not just moments in time but contributions to a personalized knowledge base.
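
To make that concrete, here is a minimal sketch of pulling a summary and action items out of a transcript excerpt with an off-the-shelf completion model. The prompt, the toy transcript and the choice of the OpenAI completion API with text-davinci-003 are illustrative assumptions, not Headroom’s internal models.

```python
# Sketch: extracting a summary and action items from a transcript excerpt
# with OpenAI's completion API (openai-python < 1.0, as of early 2023).
# The prompt and model choice are illustrative, not Headroom's internals.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

transcript = """Alice: Let's ship the beta on Friday.
Bob: I'll finish the billing integration by Thursday.
Alice: Great, and I'll draft the launch announcement."""

prompt = (
    "Summarize the meeting below in two sentences, then list each "
    "action item as 'owner: task'.\n\nTranscript:\n" + transcript
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=200,
    temperature=0,   # deterministic output suits extraction tasks
)
print(response["choices"][0]["text"].strip())
```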

I was excited by recent news that Microsoft Teams, building on Microsoft’s investment in OpenAI, now integrates with GPT-3.5 to power automated meeting notes, recommend tasks and generate personalized highlights that help users extract the most important information from their meetings. This confirms there’s an appetite for what we’ve been building at Headroom for the last two years: a communication platform that captures, analyzes and surfaces meaningful moments of collaboration.

It’s not just Microsoft that’s applying generative models to the problem of unproductive meetings; there is a swath of startups tackling bad meetings. But at Headroom, we’re not just offering an alternative to traditional video conferencing; we’re bringing AI-powered, embedded video conferencing to collaboration tools, from whiteboards to task managers, to make collaboration more effective and teams more productive. There are two primary concerns these platforms should weigh when deciding how to augment their services with AI: the safety of user data and cost.

With LLMs, an important consideration is the handling of private content, because they can easily leak information during training or fine-tuning. For example, Bruce Schneier reported a tip that Amazon suspects ChatGPT is ingesting proprietary corporate information when employees use the service as a coding assistant.

When you think about the information discussed in meetings, it’s easy to see why it’s critical that sensitive material is never used to train any model accessible outside your organization, or your users’ organizations. That’s why, at Headroom, we employ a hierarchical model structure: organization-level content is used only to fine-tune an organization-level model, available only to those inside that organization. Such an approach lets you be confident that private and proprietary information will not accidentally leak to users outside the organization.
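
Here is a rough sketch of that idea, with entirely hypothetical names: each organization maps to its own fine-tuned model, and anything else falls back to a shared base model trained without customer content.

```python
# Hypothetical sketch of org-scoped model routing. The registry and model
# names are illustrative; the point is that weights fine-tuned on one
# organization's content are only ever served to that organization.
BASE_MODEL = "summarizer-base"               # trained on public/licensed data only

ORG_MODELS = {
    "acme-corp": "summarizer-ft-acme-corp",  # fine-tuned on Acme's meetings only
    "globex": "summarizer-ft-globex",
}

def resolve_model(org_id: str) -> str:
    """Return the model an org may query: its own fine-tune, else the shared base."""
    return ORG_MODELS.get(org_id, BASE_MODEL)

def fine_tune_for_org(org_id: str, org_transcripts: list[str]) -> None:
    """Register a private fine-tune for one org; its data never touches the base model."""
    assert org_transcripts, "need organization data to fine-tune"
    ORG_MODELS[org_id] = f"summarizer-ft-{org_id}"   # stand-in for a real training job

assert resolve_model("acme-corp") == "summarizer-ft-acme-corp"
assert resolve_model("new-startup") == "summarizer-base"
```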

Second, you must consider cost. Looking just at information extraction (though Headroom trains on multimodal AI), it is possible to use the powerful LLMs on the market to extract and summarize meeting information. But it will cost you: extracting the important material from a one-hour meeting with a fine-tuned Davinci model runs around $1.50 (based on an estimate of 9,000 spoken words per hour).

When you multiply that by the number of users you serve and the number of meetings they have, then divide by the average number of people per meeting, you arrive at a considerable sum. For example, an organization of one hundred people, each attending ten hours of meetings per week with an average of two people per meeting, generates roughly 500 distinct meeting-hours a week, which works out to a cost on the order of $3,000 per month, just for the LLM. Any lower-cost solution is either using less powerful models to process your data, resulting in inferior results, or likely selling your data to train better models. Neither is an optimal outcome.
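
The rough math behind those figures, assuming about 1.33 tokens per English word and the roughly $0.12 per 1,000 tokens that fine-tuned Davinci usage was listed at around that time (both approximations, not a quote):

```python
# Back-of-the-envelope LLM cost using the assumptions in the text above.
WORDS_PER_HOUR = 9_000
TOKENS_PER_WORD = 1.33        # rough English average
PRICE_PER_1K_TOKENS = 0.12    # approximate fine-tuned Davinci usage price, early 2023

cost_per_meeting_hour = WORDS_PER_HOUR * TOKENS_PER_WORD / 1_000 * PRICE_PER_1K_TOKENS
print(f"Per meeting-hour: ${cost_per_meeting_hour:.2f}")        # ~ $1.44

people, hours_per_week, avg_attendees = 100, 10, 2
meeting_hours_per_week = people * hours_per_week / avg_attendees  # 500 distinct hours
monthly_cost = meeting_hours_per_week * 4 * cost_per_meeting_hour
print(f"Per month (4 weeks): ${monthly_cost:,.0f}")             # ~ $2,900
```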

Instead, at Headroom, we use a number of smaller, specialized models to handle specific tasks such as visual engagement tracking, live action item detection and summary generation. By breaking our models down according to their precise tasks, we are able to offer secure, personalized, state-of-the-art discriminative and generative features at a fraction of the cost of larger models.
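
A simplified, hypothetical sketch of that decomposition: each task gets its own small model behind a common interface, and only the models a feature actually needs are invoked. The class and task names are placeholders, not Headroom’s real components.

```python
# Hypothetical sketch of task-specific models behind one interface.
# Each class stands in for a small model sized to its task, instead of
# routing everything through one large general-purpose LLM.
from typing import Protocol

class TaskModel(Protocol):
    def run(self, transcript_chunk: str) -> str: ...

class ActionItemDetector:
    def run(self, transcript_chunk: str) -> str:
        return "action items: ..."   # stand-in for a small fine-tuned extractor

class SummaryGenerator:
    def run(self, transcript_chunk: str) -> str:
        return "summary: ..."        # stand-in for a compact generative model

MODELS: dict[str, TaskModel] = {
    "action_items": ActionItemDetector(),
    "summary": SummaryGenerator(),
}

def process(transcript_chunk: str, tasks: list[str]) -> dict[str, str]:
    """Run only the requested task-specific models over a chunk of transcript."""
    return {task: MODELS[task].run(transcript_chunk) for task in tasks}

print(process("Alice: let's ship the beta on Friday...", ["summary", "action_items"]))
```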

Meetings are simple; productive, collaborative and effective meetings are hard. That’s why we’re using the latest innovations in AI and ML to amplify virtual collaboration in all the places teams love to work, without compromising on security or affordability.

Seva Leonov

Head of Partnerships @Headroom. Taking the work out of meetings with AI. ex-Googler, ex-Microsoft