Why We Built AI-Powered Note-Taking for Virtual Meetings

Mar 20, 2023

If you were asked to list your favorite job responsibilities, I am willing to bet that taking notes in meetings would not top the list. It’s a necessary part of working with others, particularly in a remote, global environment. We meet, whether in person, virtually or hybrid, to discuss, collaborate, share ideas and plan for the future. We take notes to capture all that important information and the next steps. But the point of meeting is not the notes, per se; it’s the connection between people, the shared understanding and what you do with that information. Notes are only important because memory is fleeting and we rely on records to surface what we forget or miss.

The problem with taking notes yourself is that, in addition to not being great at remembering, we humans are mostly not very skilled at multitasking. According to a study by a neuroscientist at Stanford University, multitasking can reduce productivity by up to 40%. If you’re in the large majority that struggles to juggle simultaneous tasks, you may recognize the experience of attempting to take notes in a meeting, only to miss a chunk of what was said and find your notes lacking crucial context when you review them later. This is particularly true in a remote context, where even basic communication takes an extra measure of focus and attention.

At Headroom, we believe that innovation in Artificial Intelligence (AI) and Machine Learning (ML) is perfectly poised to solve this problem and make meetings more productive, effective and engaging. Foundational models are able to process large amounts of data (such as words, sentences and paragraphs) and generate new data that is similar but not identical. In recent years we’ve witnessed the growing popularity of these models; they are particularly good at (and explicitly trained for) interpreting text.

The power of Headroom is that we go beyond processing text. We have built a truly multimodal model, one that combines the innovation of these foundational models with our proprietary models that understand video and audio. The result doesn’t just summarize text; it understands context and emotion, and uses that information to inform the output.

Practically, this looks like producing automated meeting notes in the form of text recaps and action items. When you meet in Headroom, we record the meeting and leverage multimodal inputs, including transcript and video, to produce a text summary (recap) and a list of action items with assignees. We use computer vision to track eye gaze and to detect head motion and facial expressions, among other inputs, to provide greater context for the summarizations. As a result, the “essence” or “energy” of the meeting is also conveyed. These recaps and action items reflect not only what happened in a meeting, but also the relative importance the group attached to each topic, based on engagement metrics.
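To make the idea of weighting topics by engagement more concrete, here is a minimal Python sketch. It is not Headroom’s implementation: the `Segment` fields, the equal weighting inside `engagement`, and the `rank_topics` helper are all hypothetical, chosen only to illustrate how visual signals could be combined with transcript segments to rank what mattered most to the group.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of a meeting. Every field is a hypothetical input."""
    topic: str
    text: str              # transcript text a summarizer would consume
    gaze_on_screen: float  # assumed 0..1 signal derived from eye gaze
    head_motion: float     # assumed 0..1 signal derived from head motion
    expression: float      # assumed 0..1 signal derived from facial expressions


def engagement(seg: Segment) -> float:
    """Average the three visual signals (equal weights are an assumption)."""
    return (seg.gaze_on_screen + seg.head_motion + seg.expression) / 3


def rank_topics(segments: list[Segment]) -> list[tuple[str, float]]:
    """Return (topic, mean engagement) pairs, highest engagement first."""
    scores: dict[str, list[float]] = {}
    for seg in segments:
        scores.setdefault(seg.topic, []).append(engagement(seg))
    means = {topic: sum(vals) / len(vals) for topic, vals in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up example data, for illustration only.
segments = [
    Segment("launch date", "We need to move the launch.", 0.9, 0.6, 0.9),
    Segment("office snacks", "Any snack requests?", 0.3, 0.3, 0.3),
    Segment("launch date", "Agreed, two weeks later.", 0.6, 0.6, 0.6),
]
print(rank_topics(segments))
```

In this toy example the launch-date discussion ranks above the snack discussion because its segments carry higher engagement scores, so a recap built on this ranking would lead with the launch date.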

With AI-powered meetings, humans can do what they do best: converse, emote, gesticulate and engage with each other while discussing important topics. In essence, they can be present and leave the rest to AI. As a Product Manager, I find numerous practical applications for this in my daily work. First, recaps free me from worrying about attending every meeting my team holds. I can miss a meeting and have confidence that at least the major points will be communicated immediately after the meeting ends in the form of a generative recap. If I have questions or want to dig into a specific point, I have the full meeting recording, as well as a video highlight reel, at my fingertips. This frees me up to have more focused work time and direct my meeting time to the discussions where I know my input will be most valuable.

Second, I can be more attentive in the meetings I do attend rather than scrambling to submit tickets for issues as they’re raised. Every meeting is followed by a list of what happened in that conversation and what needs to happen next. I use the recap and action items to create tickets, adjust roadmaps and update stakeholders. In the future, we plan to automate the process of getting these items directly into the systems where you track projects and issues for a truly seamless experience.

Third, recaps and action items make it easy to share knowledge gained in a meeting across my organization. Instead of sending a memo or scheduling a meeting to share an update with my supervisor or any other stakeholder, I can simply share the text recap. This has the dual benefit of saving me time and making the content of my meetings a shareable (and searchable) knowledge repository.

These developments are exciting and already changing the way we work, but they are just the beginning of our vision to increase team productivity and effectiveness through connection, understanding and knowledge sharing. Our recaps and action items are generative, meaning they are not pulled directly from a transcript, but rather created from multimodal inputs. Because of their generative nature, they are not perfect; no model is! But we’re constantly improving our models, adding and experimenting with new inputs to produce the best possible result and protect against hallucination.

We’re also interested in providing more detailed statistics about how a meeting went, beyond just a gauge of group energy. Emoting is one of the primary ways humans communicate and it’s rarely done with words. Training our models to better understand human emotion, particularly on an individual, personal level, will improve the accuracy and value of generative outputs, such as recaps and action items.

Last, and most importantly, we are building intuitive ways to leverage these generative elements in the ways you already work. Our goal is to make it easy to get recaps, action items and replays into whichever systems you use to collaborate and manage your projects. Essentially, we view embedded intelligent video conferencing as the future of collaboration for tools such as visual design, white-boarding, and project management, so your discussions about your work are no longer divorced from the work itself.

There have long been fears that as technology advances, we become less human, less aware of one another and less capable of genuine connection. And in many ways, this is true. But innovative technology, particularly in the area of AI, also has great potential to make us more connected by offloading the tedium of distracting tasks, like taking notes in a meeting, so we can focus on one another and the interesting ideas fueling our work.

Try Headroom today to experience the power of AI to automate note-taking for your virtual meetings.

Written by Jon Pappas