AWS Transcribe — Add Text to your Voice Messages using Node

Shreyan Mehta
Simform Engineering
5 min read · Dec 20, 2019
AWS transcribe to s3

I am currently working on a chat app in which users can send voice messages. We needed a voice-to-text feature for purposes like search and intent understanding, but we didn't need live transcription, and the voice message itself still had to be delivered in near real time. Since we were already using AWS for most of this product, we decided to opt for AWS Transcribe using transcription jobs.

The flow:

  • The API via which the audio file is uploaded
    - uploads the file to S3
    - creates a transcription job with that S3 URL
  • A Lambda function that
    - deletes the completed transcription job
    - reads the JSON output created by that job and extracts the text
  • A CloudWatch rule that detects completion of the job and triggers the Lambda created above

1. API that accepts an audio file or voice message

Note: To keep this tutorial short I will skip installing aws-sdk and configuring it with accessKeyId, secretAccessKey, region, etc.
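For completeness, a minimal configuration might look like this (reading credentials from environment variables and the ap-south-1 region are my assumptions; adjust to your setup):

```javascript
const AWS = require('aws-sdk')

// Credentials are read from environment variables here; the SDK can also
// pick them up from ~/.aws/credentials or an IAM role automatically.
AWS.config.update({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  region: 'ap-south-1', // use the region your buckets live in
})
```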

Let's get our hands dirty!

// initialize aws-sdk for node
const AWS = require('aws-sdk')
// you can even configure it here if you want
const s3 = new AWS.S3()

const payload = {
  Key: 'AudioFiles/NameOfFile.mp3',
  Bucket: 'bucket-name-where-audio-file-would-be-stored',
  Body: <file-blob-which-is-received-from-client-side>,
  ContentType: 'audio/mp3', // this would vary according to the file
}
await new Promise((resolve, reject) => {
  s3.upload(payload, (err, data) => {
    if (err) {
      reject(err)
    } else {
      resolve(data.Location)
    }
  })
})

The audio file is now saved on S3, and its URL would be:

const MediaFileUri = 'https://s3.amazonaws.com/' + 'bucket-name-where-audio-file-would-be-stored' + '/' + 'AudioFiles/NameOfFile.mp3'

Now we have to create a transcribe job.

const transcribeService = new AWS.TranscribeService()
const params = {
  TranscriptionJobName: 'NameOfFile_1234',
  Media: { MediaFileUri },
  MediaFormat: 'mp3',
  OutputBucketName: 'transcribe-bucket-name', // S3 bucket names must be lowercase
  LanguageCode: 'en-IN', // English (India), or use en-US for US English
}
await new Promise((resolve, reject) => {
  transcribeService.startTranscriptionJob(params, function (err, data) {
    if (err) {
      reject(err) // an error occurred
    } else {
      console.log(data) // successful response
      resolve(data)
    }
  })
})

You should also create (or update) a custom vocabulary containing, for example, the names of the people sending and receiving messages; this helps AWS recognize proper nouns more reliably. A full treatment is beyond the basic scope of this blog.
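As a sketch, the vocabulary name and phrases below are made up; creating such a vocabulary and wiring it into the job could look like this:

```javascript
// Hypothetical custom vocabulary of proper nouns (contact names) for the chat app.
const vocabularyParams = {
  VocabularyName: 'chat-contact-names', // made-up name
  LanguageCode: 'en-IN', // must match the transcription job's language
  Phrases: ['Shreyan', 'Asambhav'],
}

// With the aws-sdk in scope it is created like any other call:
//   const transcribeService = new AWS.TranscribeService()
//   transcribeService.createVocabulary(vocabularyParams, (err, data) => { /* ... */ })
// and referenced when starting the job by adding to its params:
//   Settings: { VocabularyName: vocabularyParams.VocabularyName }
console.log(vocabularyParams.Phrases.length)
```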

Hurray! The job has been created and will start transcribing your audio file as soon as resources are available.

2. Create a Lambda function

Go to AWS console > Login to the dashboard and go to Lambda.

Click on Create function and choose Author from scratch.

Create a Lambda named transcribe-job and start by requiring the SDK:
const AWS = require('aws-sdk')

Now, inside the exports.handler() method, which receives the event as a parameter:

Note: This complete code should also be wrapped in a try/catch block.

Use event.detail.TranscriptionJobStatus to check whether the job completed or failed. Use event.detail.TranscriptionJobName to get the name of the job, which contains the messageId in a specific format, <nameOfFile>_<messageId>, that we will parse out of it.

const nameOfFile = event.detail.TranscriptionJobName.split('_')[0]
const messageId = parseInt(event.detail.TranscriptionJobName.split('_')[1])
// close that job
const transcribeService = new AWS.TranscribeService()
const transcribeServiceParams = {
  TranscriptionJobName: event.detail.TranscriptionJobName,
}

Now let's delete the Job:

await new Promise((resolve, reject) => {
  transcribeService.deleteTranscriptionJob(
    transcribeServiceParams,
    function (err, data) {
      if (err) {
        console.log(err, err.stack) // an error occurred
        reject(err)
      } else {
        console.log(
          'successfully closed transcribe job : ',
          event.detail.TranscriptionJobName,
          ' on state : ',
          event.detail.TranscriptionJobStatus,
          data
        ) // successful response
        resolve('successfully closed transcribe job')
      }
    }
  )
})

Wow, the job has been deleted.

Let's see what has been converted:

const s3 = new AWS.S3()
const s3params = {
  Bucket: 'transcribe-bucket-name',
  Key: event.detail.TranscriptionJobName + '.json',
}
const transcripts = await new Promise((resolve, reject) => {
  s3.getObject(s3params, function (err, data) {
    if (err) {
      console.log(err, err.stack) // an error occurred
      reject(err)
    } else {
      // successful response; this could contain multiple transcripts
      resolve(JSON.parse(data.Body.toString()).results.transcripts)
    }
  })
})
// fetch and concat all transcripts
let finalTranscript = ''
transcripts.forEach(transcriptObject => {
  finalTranscript = finalTranscript + transcriptObject.transcript
})

Yes, now you got the data that you wanted!

You can now insert this into your database and reference the message by either nameOfFile or messageId, which is what we passed while creating the job. The transcript is in the variable finalTranscript.

Alternatively, you could store the job name returned on creation of the job in your database and match it against the event in this Lambda.

You could similarly delete the JSON file after its data is consumed. That said, it is often better to keep it: it contains much more data than just the transcript, which you might need in the future for more detailed insights into your audio message.
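If you do choose to delete it, a sketch could look like this (the helper function is mine, and the bucket name is the same output bucket assumed earlier):

```javascript
// Build the delete parameters from the job name; the output JSON is stored
// under <jobName>.json in the transcription output bucket.
function buildDeleteParams(jobName) {
  return {
    Bucket: 'transcribe-bucket-name',
    Key: jobName + '.json',
  }
}

// With the S3 client from earlier in scope:
//   s3.deleteObject(buildDeleteParams(event.detail.TranscriptionJobName), (err, data) => { /* ... */ })
console.log(buildDeleteParams('NameOfFile_1234').Key)
```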

In the end, add this before ending the lambda.

console.log('ended ----')
const response = {
  statusCode: 200,
  body: JSON.stringify('Transcription successful'),
}
return response

3. Create a Cloudwatch rule

Log in to the AWS console and go to CloudWatch > Rules. Click on the Create rule button and fill in these details.

Create a CloudWatch Rule
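For reference, the rule's event pattern should match Transcribe job state changes; something along these lines, filtering to completed and failed jobs:

```json
{
  "source": ["aws.transcribe"],
  "detail-type": ["Transcribe Job State Change"],
  "detail": {
    "TranscriptionJobStatus": ["COMPLETED", "FAILED"]
  }
}
```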

Add a target to the CloudWatch rule that triggers the Lambda we created above. To do so, click on the Add target button, select Lambda function, and enter the Lambda name we used: transcribe-job.

Add a trigger to lambda transcribe-job

That's it! You are ready to test. We now have a basic flow that transcribes audio messages that are not being streamed.

Alternatively, you could use a cron job (in other words, a scheduled job) on your own API server. The downside is that you would need to run it every minute, check the status of all transcription jobs, loop over the ones that are completed or failed, fetch the output file for each, and do the same processing there.
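A rough sketch of that polling approach (the helper below is my own; the listTranscriptionJobs call is shown in comments since it needs live credentials):

```javascript
// Pure helper: given the job summaries returned by listTranscriptionJobs,
// keep only the ones the poller should process.
function jobsToProcess(summaries) {
  return summaries.filter(
    job =>
      job.TranscriptionJobStatus === 'COMPLETED' ||
      job.TranscriptionJobStatus === 'FAILED'
  )
}

// Polling loop sketch, run every minute (with aws-sdk in scope):
//   const { TranscriptionJobSummaries } = await transcribeService
//     .listTranscriptionJobs({ Status: 'COMPLETED' })
//     .promise()
//   for (const job of jobsToProcess(TranscriptionJobSummaries)) {
//     // fetch the JSON from S3 and process it as in the Lambda above
//   }
console.log(
  jobsToProcess([
    { TranscriptionJobName: 'a_1', TranscriptionJobStatus: 'COMPLETED' },
    { TranscriptionJobName: 'b_2', TranscriptionJobStatus: 'IN_PROGRESS' },
  ]).length
)
```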

These are my two cents on it. Share your approach to this situation below; who knows, it could be more optimized than mine.

If this information helped you in some way, feel free to give me a pat on the back with some claps; it encourages me to continue writing posts like this. Also share it with co-developers working on the backend, and with anyone whose product could benefit from this feature.

Adding a database could bring in a VPC, security groups, and then a NAT, but that is out of scope for the current scenario; comment below if you want a blog on it. You can also reach out to me on LinkedIn to discuss the topic.
