Use Azure OpenAI to get a transcript of any MP3 file using C#
Microsoft now offers the Whisper model by OpenAI through the Azure OpenAI Service to transcribe any provided MP3 file. In this post I will show you how to get the transcript using a simple C# application.
Introduction
OpenAI introduced the new Whisper API and Microsoft added this model to their Azure OpenAI Service. In this post I will show you how you can easily set up an Azure OpenAI service and how you can use the API to get a transcript of any provided audio file.
Request access to Azure OpenAI
You need an active Azure Subscription. Currently you also need to request access to the Azure OpenAI Services: fill out the request form from Microsoft, answer a few questions, and give Microsoft some time to approve your request.
Add needed Azure resources
Once you have access to the Azure OpenAI services, you can open the Azure Portal and start adding an OpenAI resource. Search in the Marketplace for Azure OpenAI. Make sure that you select the Subscription which has access to the Azure OpenAI services. I’ve explained in another post how you can create the Azure OpenAI Service, and you can use that post to set up everything accordingly.
What is new is the deployment of the Whisper model. Currently the Whisper model is only available in North Central US or West Europe, so make sure that your Azure OpenAI resource is in one of these regions. Open the Azure OpenAI Studio, click on Deployments and add the Whisper model. You can use any deployment name, but we need this value later on in our C# application, so make sure to remember it. This is why I simply use whisper.
In the Azure Portal we open our Azure OpenAI Service and open Keys and Endpoint to make a copy of the Endpoint and KEY 1, because we need these values later in our C# application.
Let’s code
I am using Visual Studio to create a new .NET console application using .NET 8. Currently this API is not supported by any NuGet package, so we have to call the API ourselves.
Add the Spectre.Console NuGet package to the project and create a new folder called Models where we will add the response model.
Let’s start with the SpeechToTextResponse class. This class contains the properties to deserialize the response from the API.
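A minimal version of this model could look like the following sketch; the only property we really need is the text field the API returns (the namespace is an assumption to match the Models folder):

```csharp
using System.Text.Json.Serialization;

namespace WhisperTranscription.Models;

// Minimal response model: the transcriptions endpoint returns a JSON
// object whose "text" property holds the transcribed audio.
public class SpeechToTextResponse
{
    [JsonPropertyName("text")]
    public string? Text { get; set; }
}
```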
Now that we have the needed model available, we can open the Program.cs file to build the business logic.
First, the console application asks the user to provide the needed parameters. In this case we need the Azure OpenAI Endpoint, the Azure OpenAI API Key, the Model Deployment Name and the file path of a stored MP3 file. Theoretically the API also supports other audio formats, but the easiest way is to work with MP3 files, which is why I have limited it.
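With Spectre.Console, gathering these four values could be sketched roughly like this (the prompt labels are illustrative):

```csharp
using Spectre.Console;

// Ask the user for everything we need to call the API.
var endpoint = AnsiConsole.Ask<string>("Azure OpenAI [green]Endpoint[/]:");
var apiKey = AnsiConsole.Prompt(
    new TextPrompt<string>("Azure OpenAI [green]API Key[/]:").Secret());
var deploymentName = AnsiConsole.Ask<string>("Model [green]Deployment Name[/]:");
var filePath = AnsiConsole.Ask<string>("Path to the [green]MP3 file[/]:");
```

The `.Secret()` call masks the API key while the user types it, which is a nice touch for a console tool.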
We use the transcriptions endpoint to post our properties to the API. The response is a JSON object containing the transcribed text, which the console application then writes to the console.
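Putting it together, a minimal sketch of the request could look like the following. The placeholder values, the URL shape and the api-version are assumptions based on the Azure OpenAI REST conventions; check the current documentation before relying on them.

```csharp
using System.Net.Http.Headers;
using System.Text.Json;
using WhisperTranscription.Models; // assumed namespace of SpeechToTextResponse

// Values normally gathered from the user prompts; placeholders here.
var endpoint = "https://my-resource.openai.azure.com/"; // hypothetical endpoint
var apiKey = "<your-key>";
var deploymentName = "whisper";
var filePath = @"C:\audio\macbeth.mp3"; // hypothetical file path

using var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.Add("api-key", apiKey);

// Whisper expects a multipart/form-data body with the audio under "file".
using var content = new MultipartFormDataContent();
var fileContent = new ByteArrayContent(await File.ReadAllBytesAsync(filePath));
fileContent.Headers.ContentType = new MediaTypeHeaderValue("audio/mpeg");
content.Add(fileContent, "file", Path.GetFileName(filePath));

// The api-version below is an assumption; use the latest one from the Azure docs.
var url = $"{endpoint}openai/deployments/{deploymentName}/audio/transcriptions?api-version=2024-02-01";
var response = await httpClient.PostAsync(url, content);
response.EnsureSuccessStatusCode();

var json = await response.Content.ReadAsStringAsync();
var result = JsonSerializer.Deserialize<SpeechToTextResponse>(json);
Console.WriteLine(result?.Text);
```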
Sample
Let’s run the console application. You will see the header and the first input for the Azure OpenAI Endpoint. Now you can provide all the needed data.
Here is the transcribed text of the first scene of Macbeth. In one of my previous posts I’ve created this MP3 file using the Text-To-Speech API from OpenAI.
But the API is not only limited to the English language. The next screenshot shows a transcript of my channel trailer of my YouTube channel in German.
The API also supports different languages within one audio file and will transcribe the text in the spoken language.
Conclusion
In this blog post I’ve written a simple .NET console application that uses the Whisper API published by Microsoft and OpenAI. As you have seen, it is pretty simple to use the API.
You can find the source code of the console application on my GitHub profile.
If you want to use the API directly from OpenAI without Azure you can take a look at this blog post.