AI-Powered Media Processing: Smooth automated annotation and poster creation for audio content

Andrew Zaikin

Published in

firstlineoutsourcing

3 min readSep 25, 2023

This article is the last article from the series of two. Subscribe to our blog to be notified about new articles.

Last time, we described how to make an annotation and a poster with OpenAI API and Midjourney.

Some time has passed since the publication of that article, and during that time, changes have occurred in the tools we use. For example, OpenAI introduced their updated 3.5 class model: gpt-3.5-turbo-instruct. GPT-3 models aren’t trained to follow user instructions, but InstructGPT models (highlighted) generate much more helpful outputs in response to user instructions. The Completion API has become a legacy. Chat Completion API should be used instead. GPT-4 model is available to use with API, but it is 20 times more expensive than gpt-3.5-turbo.

This solution provides an automated and streamlined process for annotating audio content and generating posters. It is designed to simplify the annotation process and increase efficiency in managing audio archives. By leveraging this tool, users can quickly and accurately create side files and metadata for their audio content, as well as visually appealing posters that can be used to promote or showcase the content. It is a valuable asset for businesses and academic organizations looking to improve their audio content management workflows and enhance the discoverability of their audio archives.

What kind of cases this automation can be used:

Podcast Production
Audiobooks e-shop
A public platform for digital content
Social networks
Media asset management (MAM)

Let’s consider the last example. We at First Line Outsourcing help media and broadcasting companies with challenges in their daily workflows. Most of them work with MAM systems like iconik to handle terabytes of content locally and in cloud storage across people in teams and continents. It requires proper management and automated processes if you want to move fast. You can see a typical workflow in the picture:

In this case, we process all ingested content with automation that works with Lambda functions in AWS integrated with OpenAI and Midjourney.

It's time to move back to our solution. The goal is to build a stable process using the maximum opportunities of OpenAI API and Midjourney. As a reminder, we have an archive with 1600 MP3 files to process. Here is the resulting design of the process that should be moved to the code:

Let me highlight the bottlenecks here:

OpenAI can handle just 50 requests per minute and 6000 requests per day. It’s enough for our case, but can be an inflexible limitation for real-time workflows.
OpenAI API can return error statutes, which should be handled with while-do loops.
The limitation of tokens per request can be managed by switching a model, but it will affect the cost.
Midjourney doesn’t have API and can be utilized with Discord API, which also has limitations.

My results:

The script found 89 stories without annotation.
For all files, I spent around $60 on OpenAI.
The whole process took around 2 hours.

Generated images for the audio story “The Letter from Earth.”

Did you face use cases of such workflow in your work? Tell me about it in the comments! 😉

Did you enjoy this article? 👏 Clap for the story!
Do you have any thoughts about the article? 💬 Leave a comment!
Want to stay updated on future content like this? 🙌 Don’t forget to follow us on Medium to get notified about my latest articles and insights in AI, machine learning, and more.
Do you have a similar case to automate? ✉️ Mail us, and we will help!

AI-Powered Media Processing: Smooth automated annotation and poster creation for audio content

Written by Andrew Zaikin