Inside SpeakX: How Deepgram Powers Next-Gen Language Learning at SpeakX.ai
As artificial intelligence advances rapidly across sectors, SpeakX.ai is at the forefront of making English language learning in India more accessible, engaging, and personalized. Deepgram’s transcription technology plays a crucial role in how users interact with our platform.
The Role of Transcription in Language Learning
For any language learning platform, speech-to-text transcription plays a vital role, because speech is the primary form of interaction with the user base. Traditional models, although quite effective, often struggle to transcribe accurately across a wide range of accents, intonations, and background noise.
At its core, SpeakX.ai aims to help users improve their spoken English by offering real-time feedback on their pronunciation, grammar, and fluency. This feedback is made possible by Deepgram’s robust transcription capabilities, which accurately convert spoken English into text so that the platform’s AI can analyse and assess the user’s speech as they speak.
Why Deepgram?
Deepgram stands out in the transcription landscape thanks to its end-to-end deep learning model, trained on a large corpus of audio data spanning many languages, accents, and dialects. This approach allows Deepgram to deliver high accuracy even in noisy environments or with non-native accents, both of which are common in SpeakX’s diverse user base.
Some key benefits SpeakX enjoys by using Deepgram include:
- Fast and Accurate Transcriptions: Deepgram’s transcription model is both accurate and quick, which enables SpeakX to analyse user responses and provide feedback on the go.
- Real-time Transcription: Deepgram powers SpeakX’s end-to-end user-AI conversation system, eliminating waiting time and creating a seamless experience for users.
- Scalability: As SpeakX continues to grow, the scalability of Deepgram ensures that the platform can handle increasing user demand without compromising on speed or accuracy.
Deepgram in SpeakX
Deepgram offers easy-to-use SDKs and APIs that have allowed us to integrate transcription services into our applications quickly.
Transcribing an Audio File:
The snippet below shows how to get a transcription in just a few lines of code. SpeakX uses this approach to transcribe user inputs for further analysis and feedback.
import { createClient } from '@deepgram/sdk';

// Transcribes a hosted audio file and returns the top transcript.
const getDeepgramResponse = async ({
  audioUrl,
}: {
  audioUrl: string;
}): Promise<{
  transcription: string;
}> => {
  const deepgram = createClient(process.env.DEEPGRAM_API_KEY);
  const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
    { url: audioUrl },
    {
      model: 'nova-2',
      smart_format: true, // apply punctuation and readable formatting
      language: 'en-IN', // Indian English
    },
  );
  if (error) throw error;
  // Fall back to an empty string so the declared return type holds
  // even when the response contains no transcript.
  const transcription =
    result?.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? '';
  return { transcription };
};
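The nested lookup used to read the transcript can be pulled into a small pure function, which makes it easy to unit test without calling the API. The sketch below is only an illustration: the `PrerecordedLike` interface and the `extractTranscript` helper are hypothetical names introduced here, and the interface is a hand-written subset of the fields the snippet above reads, not the SDK's own type definitions.

```typescript
// Hand-written subset of the response fields read by the snippet above.
// This is an illustrative assumption, not the SDK's own type definitions.
interface PrerecordedLike {
  results?: {
    channels?: Array<{
      alternatives?: Array<{ transcript?: string }>;
    }>;
  };
}

// Hypothetical helper: returns the top alternative of the first channel,
// or an empty string when any part of the path is missing.
function extractTranscript(result: PrerecordedLike | null | undefined): string {
  const transcript =
    result?.results?.channels?.[0]?.alternatives?.[0]?.transcript;
  return (transcript ?? '').trim();
}
```

An empty string is a reasonable signal that no speech was recognised, so the calling code can prompt the learner to try again instead of analysing an empty response.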
Real-time Transcription Using WebSockets
For a more engaging and continuous conversation experience, we use Deepgram’s WebSocket integration, which provides an efficient way to stream audio and receive transcription data almost instantly. Below is a quick example of how to set this up.
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';

// Opens a live transcription connection. `sid` and `emitFn` are accepted
// but unused in this trimmed-down example, which only logs transcripts.
export const createDeepgramConnection = (sid, emitFn) => {
  const deepgramClient = createClient(process.env.DEEPGRAM_API_KEY);
  const live = deepgramClient.listen.live({
    model: 'nova-2',
    language: 'en-IN',
    smart_format: true,
    encoding: 'linear16', // raw 16-bit signed PCM
    sample_rate: 48000,
    channels: 1,
  });

  // Register the remaining handlers once the socket is open.
  live.on(LiveTranscriptionEvents.Open, () => {
    live.on(LiveTranscriptionEvents.Transcript, (data) => {
      const transcript = data?.channel?.alternatives?.[0]?.transcript;
      console.log(transcript);
    });
    live.on(LiveTranscriptionEvents.Error, (error) => {
      console.error('LiveTranscriptionEvents.Error: ', error);
    });
    live.on(LiveTranscriptionEvents.Close, () => {
      console.log('Connection closed.');
    });
  });

  return live;
};
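In a conversation, transcript events arrive as a stream of short segments rather than one complete sentence, so some code has to stitch them together. The sketch below is one possible way to do that, not a description of SpeakX's production code. It assumes each event carries an `is_final` flag alongside the `channel.alternatives[0].transcript` field read above, and that interim results have been enabled on the connection; if they are not enabled, every event should already be final. `LiveTranscriptLike` and `TranscriptBuffer` are hypothetical names introduced for this example.

```typescript
// Hand-written subset of the fields this sketch reads from a Transcript
// event. Illustrative assumption, not the SDK's own type definitions.
interface LiveTranscriptLike {
  is_final?: boolean;
  channel?: { alternatives?: Array<{ transcript?: string }> };
}

// Hypothetical helper: keeps finalized segments and the latest interim one.
class TranscriptBuffer {
  private finalized: string[] = [];
  private interim = '';

  // Feed one Transcript event into the buffer.
  push(event: LiveTranscriptLike): void {
    const text = (event.channel?.alternatives?.[0]?.transcript ?? '').trim();
    if (event.is_final) {
      // A final segment replaces whatever interim text was pending.
      if (text.length > 0) this.finalized.push(text);
      this.interim = '';
    } else {
      // Interim text is provisional and overwritten by the next event.
      this.interim = text;
    }
  }

  // Text that will not change any more, suitable for analysis.
  finalText(): string {
    return this.finalized.join(' ');
  }

  // Finalized text plus the current interim guess, suitable for display.
  displayText(): string {
    return this.finalized
      .concat(this.interim)
      .filter((segment) => segment.length > 0)
      .join(' ');
  }
}
```

With a buffer like this, the Transcript handler in the example above could call `buffer.push(data)` instead of logging, show `displayText()` to the learner while they speak, and send only `finalText()` on for feedback.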
Conclusion
Integrating Deepgram’s transcription technology into our mobile app has enabled us to offer a real-time, accurate, and personalized language learning experience to our users. With easy-to-use SDKs and powerful transcription capabilities, developers can quickly incorporate similar functionality into their own applications, whether for education, accessibility, or any other use case that requires reliable speech-to-text conversion.