Transforming Meetings and Docs with AI Speech-to-Text
As a technical guy, I hate meetings. Even more, I hate taking meeting notes: it divides my attention during the meeting and leaves me exhausted.
At some point, I decided there had to be a way to make this process easier, or even better.
That’s how I started experimenting with AI speech-to-text (transcribing).
Why AI Speech-to-Text Is a Game-Changer for Meetings
In addition to never having to write meeting notes by hand again, I now get the following bonuses:
- Comprehensive Records: Easily searchable and organized records of every meeting.
- Conversation History: Referring back to specific discussions.
- Improved Accountability: Simplified tracking of commitments and action items.
- Increased Focus: Engage in meetings without the distraction of writing everything.
- Faster Decision-Making: Instantly usable documents for quick implementation.
- Easy Documentation: Turn verbal onboarding into written tutorials.
- Multilingual Support: Effective speech-to-text for meetings in languages other than English.
While the benefits were compelling, the journey to achieving them wasn’t straightforward. It all started with an experiment.
How I Experimented with AI Speech-to-Text
Initially, the challenge was gathering the recordings. Of course, I could always record something on the spot, but I needed more: audio recordings that are both concise and meaningful, recordings that showcase a real-life scenario.
Then I remembered that I have been a technical trainer for the past 15 years and have an almost unlimited supply of video tutorials. They were a perfect source for creating written tutorials.
Later, I tried this with recorded meetings, podcasts, and more.
Tools and Solutions for AI Speech-to-Text
After obtaining the recordings, I was ready for the next step: running AI speech-to-text on the recordings and producing a result, such as meeting notes or tutorials.
I used a combination of tools:
- OpenAI’s Whisper, running locally
  - There is a hosted service available, but as a tech guy, I wanted to run it myself.
- ChatGPT, Claude, or any other AI chatbot
  - I prefer Claude for now, but others work well, too.
- FFmpeg, to extract audio from video recordings
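To make the toolchain concrete, here is a minimal Python sketch that builds the FFmpeg and Whisper command lines. The file names, model choice, and helper names are illustrative, and it assumes both tools are installed locally.

```python
import subprocess

def extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an FFmpeg command that extracts the audio track from a video."""
    # -vn drops the video stream; 16 kHz mono is the format Whisper expects
    return ["ffmpeg", "-i", video_path, "-vn", "-ar", "16000", "-ac", "1", audio_path]

def transcribe_cmd(audio_path: str, language: str = "Bulgarian") -> list[str]:
    """Build a command line for the open-source Whisper CLI."""
    return ["whisper", audio_path, "--model", "medium", "--language", language]

# To actually run the pipeline (requires FFmpeg and Whisper on PATH):
# subprocess.run(extract_audio_cmd("meeting.mp4", "meeting.wav"), check=True)
# subprocess.run(transcribe_cmd("meeting.wav"), check=True)
```

Larger Whisper models are slower but noticeably more accurate on accented or dialect speech, which is why I default to at least `medium` here.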
The results were great! After the initial experimentation overhead, I was able to generate the needed output in about 15 minutes!
Now, I use it daily.
Practical Applications of AI Speech-to-Text
Create Technical Documentation
A couple of weeks ago, Joro, the team leader of SoftUni’s Judge system, held an onboarding session to introduce new team members to the project. He gave an extensive overview of the system’s components and functionality in an audio-recorded meeting.
To repurpose this recording into documentation, I took the following steps:
- Extracted the audio from the video recording using FFmpeg
- Fed the audio through Whisper to get an accurate transcription
- Provided the Whisper transcript along with some context about Judge to a chatbot (ChatGPT/Claude/other)
- Instructed the chatbot to generate technical documentation based on the transcript, summarizing the key information about the system
- Edited the chatbot’s output to polish the documentation, correct any unclear sections, and format it properly
- Shared the documentation with the team as a reference for ramping up on Judge quickly
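In practice, the chatbot step boils down to assembling one prompt from the transcript plus some project context. A sketch of that step; the prompt wording and function name are my own, not a fixed recipe:

```python
def build_doc_prompt(transcript: str, project_context: str) -> str:
    """Combine project context and a meeting transcript into a documentation prompt."""
    return (
        "You are a technical writer.\n"
        f"Project context: {project_context}\n\n"
        "Based on the following onboarding transcript, write technical documentation "
        "summarizing the system's key components and functionality.\n\n"
        f"Transcript:\n{transcript}"
    )
```

Giving the chatbot a sentence or two of project context up front noticeably reduces hallucinated names and components in the output.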
This demonstrates how AI transcription and writing assistants can transform meeting recordings into usable artifacts like technical docs, user guides, readmes, and more. The combination of speech recognition and natural language generation makes it possible to unlock trapped information and create high-quality documentation with minimal manual effort.
Automated Meeting Notes
It is pretty straightforward, and I use it all the time:
- Record the meeting
- Transcribe it with AI speech-to-text (Whisper)
- Provide an AI chatbot with the context of the meeting, i.e., “Meeting to discuss feature X for client Y.”
- Feed the AI chatbot with the transcription
- Feed the AI chatbot with the agenda, if available
- Feed the AI chatbot with the attendees; this is useful for accountability later on
- Ask the AI chatbot to write meeting notes
- Clean up the notes
The results are great. The AI chatbots make some mistakes with uncommon names and nicknames (in Bulgaria, Gogo, Joro, and Gosho are common nicknames for Georgi), but with a few corrections, everything turns out great. In addition, we now have easily searchable content from meetings.
Transform Podcasts into Textual Assets
I collaborated with Georgi Nenov, host of the famous Bulgarian podcast “The Superhuman” (“Свръхчовекът с Георги Ненов”), to set up a workflow for transcribing his podcast episodes.
The steps included:
- Installing Whisper locally on his computer to control the AI speech-to-text process.
- Showing him how to use OpenAI’s Whisper to automatically transcribe the audio from his podcast episodes with great accuracy, even in Bulgarian and its dialects.
- Demonstrating how he can take Whisper’s transcripts and feed them into AI chatbots like Claude and ChatGPT to generate overviews, key takeaways, citations, and other derivative content.
- Providing tips on how to edit the AI chatbot outputs to clean up unclear or incorrect sections.
- Discussing workflows for repurposing the transcripts into written blog posts, social media snippets, and more.
- Advising on best practices for AI speech-to-text and repurposing podcast content using this AI-powered approach.
Georgi was already creating transcriptions from his audio content, but when we met, he mentioned that this had proved time-consuming and expensive. Using AI speech-to-text tools like Whisper enabled him to get the transcripts quickly and at a lower cost.
Tutorials from Lectures
As a technical trainer, I frequently give live coding demonstrations during my programming lectures. While this helps students learn interactively, it can be difficult for them to replicate the examples on their own later.
To address this, I have started using AI speech-to-text to turn my lecture recordings into step-by-step coding tutorials. The process involves:
- Recording my lectures with audio and screen capture
- Using FFmpeg to extract just the audio track
- Feeding the audio into Whisper to get a transcription
- Selecting relevant code snippets from my demo to provide context
- Instructing a chatbot to generate a tutorial in Markdown based on the transcript and code
- Editing the Markdown to polish the tutorial content and format it nicely
- Publishing the tutorial on GitHub/GitLab for students to practice the concepts at home
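For the tutorial use case, the transcript and the selected code snippets go into the chatbot together. A minimal sketch of that assembly step (the prompt text and helper name are my own):

```python
def build_tutorial_prompt(transcript: str, code_snippets: list[str]) -> str:
    """Pair a lecture transcript with demo code so a chatbot can draft a Markdown tutorial."""
    # Label each snippet so the chatbot can reference them by number in the tutorial
    labeled = "\n\n".join(
        f"Snippet {i}:\n{code}" for i, code in enumerate(code_snippets, start=1)
    )
    return (
        "Write a step-by-step coding tutorial in Markdown based on this lecture "
        "transcript and the code snippets shown during the live demo.\n\n"
        f"Transcript:\n{transcript}\n\nCode snippets:\n{labeled}"
    )
```

Including the actual demo code matters: the transcription alone rarely captures exact identifiers and syntax, so the snippets anchor the chatbot to what was really written on screen.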
The results have been fantastic. By combining Whisper’s accurate transcription with an AI chatbot’s natural language capabilities, I can create custom tutorials that walk students through the live examples at their own pace. That enhances learning outcomes as students can solidify their understanding by coding the solutions themselves. The ability to do this with minimal effort has transformed my workflow and teaching approach.
Are you wondering how to implement this in your workflow? Allow me to break down the process for you. It’s pretty straightforward.
Step-by-Step Process to Use AI Speech-to-Text
Requirements:
- A good microphone helps, but even poor recordings work relatively well.
- A PC with a decent (though not all-powerful) GPU
- Optionally, software that can extract audio from a video recording; I use FFmpeg
- Recording software; I use OBS, Google Meet, or Zoom, depending on the meeting or presentation.
Steps:
- Record a meeting.
- (Optional) If your recording is a video, extract the audio.
- Feed the audio to OpenAI’s Whisper. It works surprisingly well. It even understands Bulgarian with dialects!
  - This will take some time, depending on the machine, but it runs in the background, so you can do something else while it’s transcribing.
  - My Mac with an M1 Max processor transcribes a 1-hour recording in around 40 minutes.
- Feed the transcript and some context into an AI chatbot.
  - The context can be code, a meeting agenda, attendees, etc.
  - I am still deciding whether ChatGPT or Claude is better.
- Ask the AI chatbot for the output you need.
  - It can be a tutorial, technical documentation, an issue description, meeting notes, etc.
- Clean up the output, i.e., fix unclear or wrong results from the AI chatbot.
- Voilà! It works like a charm.
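One practical detail the steps gloss over: a one-hour transcript can be too long for a chatbot's context window. A simple workaround is to split it into overlapping chunks before feeding it in; the sizes below are illustrative, not tuned:

```python
def chunk_transcript(text: str, max_chars: int = 12000, overlap: int = 500) -> list[str]:
    """Split a long transcript into overlapping chunks that fit a chatbot's context window."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Overlap chunks slightly so sentences cut at a boundary appear in both
        start = end - overlap
    return chunks
```

Each chunk can then be summarized separately, and the partial summaries combined in a final chatbot pass.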
Considerations and Future Prospects
All this is very exciting, but I am still unsure about some things:
- Privacy: Using Whisper locally is great, as the audio recording stays internal, but sending the transcript to Claude or ChatGPT could potentially leak it.
  - Maybe introducing local LLMs can solve this?
- Data Integrity: Errors in speech-to-text and AI-generated text could propagate misinformation.
  - Introducing validation processes?
  - How can the risk of inaccuracies be mitigated?
  - Maybe this cannot be fully automated?
- Improving Processes: As a direct consequence of these processes, we now keep meetings more structured, as this proves better for the final output.
  - Leading to shorter and more concise meetings?
- Speaker Diarization: This can help with recordings that have multiple speakers.
  - Is this really important? Maybe for some cases.
These considerations lead us to ponder the broader implications and future potential of using AI professionally.
Conclusion
I am very excited about the potential of AI speech-to-text. It has already helped me create meeting notes and documentation much faster. But this is just the beginning.
There are so many possibilities to explore. Analyzing meeting transcripts could uncover insights about project status and team dynamics. Integrating AI speech-to-text into more processes will likely lead to better collaboration and productivity. I will continue experimenting. We can transform how we work with the right balance of human creativity and AI capabilities. Meetings and documentation are just the start.