Speech-to-Text: Transcribe audio without writing a single line of code

Published in NeuralSpace · 4 min read · Sep 23, 2022

Introduction

The most natural form of human communication is speech. However, existing AI models do not understand speech nearly as well as they understand text. Wouldn’t it be great if we could use speech as an interface, interpreting the emotions and meaning behind it, to have a meaningful interaction?

Speech-to-text, or automatic speech recognition (ASR), does exactly that and bridges the gap.

NeuralSpace’s Speech-to-Text is a technology that automatically converts human speech into text. It is built using cutting-edge AI models to provide precise transcriptions of any type of speech (and any person’s voice), whether in conversations or other contexts. Once speech is transcribed, it is just text, which can be analyzed by text services like Language Understanding or Entity Recognition.

With NeuralSpace’s Speech-to-Text models you can get audio transcriptions for various languages. We support two different ways of converting your speech to text: dictation and file transcription.

In this blog, we talk about STT’s features and use-cases, and give a tutorial on how you can use it on the NeuralSpace Platform (in a no-code way)!

Features

State-of-the-art Models: Access our own pre-trained state-of-the-art models through APIs and integrate them into any of your applications.

Domain Specialization: Our models are specialized in pre-defined domains such as finance or medical. We also have specialized models for different accents. For example, our medical-domain English STT model can accurately transcribe medical terms, and our Indian-domain English STT model can accurately transcribe English spoken with an Indian accent.

Low-Resource Language Support: Get going with our STT to support a wide range of languages worldwide, even those that are not widely represented in the digital world.

Use-Cases

Captioning for Videos or Meetings: Our APIs and CLIs can be used to generate transcriptions for your videos or meetings quickly and easily.

Voice Bots: With our Speech-to-Text service, you can extend your chatbot interface to voice while re-using the same NLU pipeline. Our Speech-to-Text APIs also give you support for various low-resource languages along with standard high-resource languages.

Automatic Transcription: With our STT models, you can automatically transcribe long audio recordings within a few hours, a job that could otherwise take days to do manually.

Voice Typing: Enable hands-free real-time transcription with our STT models in over 20 languages.
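The Voice Bots use-case above comes down to placing a transcription step in front of a text bot you already have. Here is a minimal Python sketch of that wiring. Note that `fake_transcribe`, `text_bot` and `make_voice_bot` are hypothetical names made up for illustration; none of them is a NeuralSpace API, and the "transcription" here simply decodes bytes so that the example runs on its own.

```python
from typing import Callable


def fake_transcribe(audio: bytes, language: str) -> str:
    """Hypothetical STT step. A real one would call a speech-to-text service.

    For this sketch we pretend the audio bytes already are the spoken words.
    """
    return audio.decode("utf-8")


def text_bot(message: str) -> str:
    """Hypothetical existing text chatbot (the NLU pipeline being re-used)."""
    if "balance" in message.lower():
        return "intent: check_balance"
    return "intent: unknown"


def make_voice_bot(
    transcribe: Callable[[bytes, str], str],
    bot: Callable[[str], str],
) -> Callable[[bytes, str], str]:
    """Wrap a text bot so it accepts audio: transcribe first, then re-use the bot."""

    def voice_bot(audio: bytes, language: str) -> str:
        return bot(transcribe(audio, language))

    return voice_bot


voice_bot = make_voice_bot(fake_transcribe, text_bot)
print(voice_bot(b"What is my balance?", "en"))
```

The text bot itself is untouched, which is the point of the use-case: only the input side changes when you add voice.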

Language Support

Speech-to-Text currently supports the following 24 languages. We are working hard to offer many more in the near future.

Arabic (ar)

Catalan (ca)

Chinese (zh)

Czech (cs)

Dutch (nl)

English (en)

Esperanto (eo)

French (fr)

German (de)

Greek (el)

Hindi (hi)

Italian (it)

Japanese (ja)

Kazakh (kk)

Odia (or)

Portuguese (pt)

Persian (fa)

Russian (ru)

Spanish (es)

Swedish (sv)

Tagalog (tl)

Turkish (tr)

Ukrainian (uk)

Vietnamese (vi)
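If you keep track of these languages in your own scripts, the list above fits naturally into a lookup table. The Python sketch below copies the names and codes from the list as published here; the helper name `is_supported` is our own invention for illustration and is not part of the NeuralSpace Platform.

```python
# Language codes and names copied from the list above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "ca": "Catalan",
    "zh": "Chinese",
    "cs": "Czech",
    "nl": "Dutch",
    "en": "English",
    "eo": "Esperanto",
    "fr": "French",
    "de": "German",
    "el": "Greek",
    "hi": "Hindi",
    "it": "Italian",
    "ja": "Japanese",
    "kk": "Kazakh",
    "or": "Odia",
    "pt": "Portuguese",
    "fa": "Persian",
    "ru": "Russian",
    "es": "Spanish",
    "sv": "Swedish",
    "tl": "Tagalog",
    "tr": "Turkish",
    "uk": "Ukrainian",
    "vi": "Vietnamese",
}


def is_supported(code: str) -> bool:
    """Return True if the language code appears in the list above (hypothetical helper)."""
    return code.strip().lower() in SUPPORTED_LANGUAGES


print(len(SUPPORTED_LANGUAGES), "languages listed")
print("Hindi supported:", is_supported("hi"))
```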

Tutorial

Step 1:

Step 2:

Click on “Speech-to-Text” on the left side, under All Services.

Step 3:

Choose the mode of transcription: File Transcription or Dictation.

File Transcription and Dictation for Speech-to-Text on the NeuralSpace Platform

For File Transcription:

Step 1

Upload your desired file (the file size can be between 10 MB and 500 MB).

Step 2

Click on the “Select Language” dropdown to choose the language, and on the Domain dropdown to choose the domain of the audio file.

File Transcription for Speech-to-Text on the NeuralSpace Platform

Step 3

Then click on “Transcribe” and wait for the file to be transcribed.

Step 4

Click on the “View Transcript” button beside the audio player to get the corresponding transcription for the audio.

Voila! You have successfully converted your first audio file to text!
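If you would like to check a file against the size range from Step 1 before uploading it, a small Python sketch like the one below would do. It assumes that 1 MB means 1,000,000 bytes, which this post does not specify, and the function names are hypothetical helpers rather than anything provided by the NeuralSpace Platform.

```python
import os

# Assumption: 1 MB = 1,000,000 bytes. The post does not say which definition is used.
MB = 1_000_000
MIN_BYTES = 10 * MB   # lower end of the range stated in Step 1
MAX_BYTES = 500 * MB  # upper end of the range stated in Step 1


def size_in_range(num_bytes: int) -> bool:
    """Return True if a size in bytes falls within the stated 10 MB to 500 MB range."""
    return MIN_BYTES <= num_bytes <= MAX_BYTES


def file_in_range(path: str) -> bool:
    """Check an actual file on disk against the same range."""
    return size_in_range(os.path.getsize(path))


print(size_in_range(25 * MB))   # a 25 MB file is inside the range
print(size_in_range(600 * MB))  # a 600 MB file is outside the range
```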

For Dictation:

Step 1

Select the desired Language and Domain from the dropdowns.

Step 2

Then click on the yellow microphone button to start streaming the transcription. To stop transcribing the audio, press the yellow mic button once again.

Dictation (Speech-to-Text on the NeuralSpace Platform)

There you go! You have successfully transcribed your speech to text!

The team at NeuralSpace is working on adding more languages to our STT service. Feel free to reach out to us if you have any preferences.

Try our Speech-to-Text service on the NeuralSpace Platform now! Sign up and get $200 worth of credits!

Check out our Documentation to read more about the NeuralSpace Platform and its different services.

Join the NeuralSpace Slack Community to connect with us, ask questions and collaborate on exciting projects with other community members. Also, receive updates and discuss topics in NLP for low-resource languages with fellow developers and researchers.

Happy NLP!