How to Customize the Houndify ASR Service
The Custom ASR Enhancement domain lets you customize the Houndify speech-to-text (ASR) service for your particular use case. By default, the ASR service is trained for general-purpose transcription, so it can recognize most words that commonly occur across the domains Houndify supports. However, for some applications it can be useful to add new words, or to increase the Houndify algorithm's bias towards them, by uploading a custom vocabulary suited to your domain.
Let’s say you work on medical reports, and your service needs to transcribe reports that include specialized medical jargon — anatomical terms, names of instruments, specific drugs, etc. Your report may contain phrases like “A 5-French Yueh catheter needle combination was taken.” The default ASR service may have trouble transcribing the word “Yueh” as it does not commonly occur in typical English conversations. But since you know that this is a common term in the context of medical reports, you can include the word “Yueh” in your custom vocabulary so that Houndify can correctly transcribe it.
Writing a Custom Vocabulary
We made customizing Houndify’s ASR incredibly simple and powerful. To get started quickly, all you have to do is upload a list of words to add to the vocabulary. But customization doesn’t stop there — you can also optionally define context by providing multi-word phrases, expressions, or grammars surrounding the words. This can help the ASR system understand how the words are used in the language.
First, if you don’t already have a Houndify account, create one for free at soundhound.com. After registering, you’ll need to enable domains. For Custom ASR Enhancement, you will need to enable the CustomASREnhancement domain along with the Speech to Text Only or Transcription domains. To upload a custom vocabulary, you will need to have your client ID and client key, so make sure you write those down.
Returning to our medical report example, we could upload the word "Yueh" by itself, or we could upload an expression such as "Yueh" . ["centesis"] . "catheter" . "needle". The square brackets indicate that the word "centesis" is optional, so the ASR would recognize both "Yueh catheter needle" and "Yueh centesis catheter needle". For more information on expression syntax, see the Houndify Client Match documentation.
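To make the effect of the optional term concrete, here is a toy Python sketch that lists the phrases a simple expression can match. It is a hypothetical illustration, not the Houndify expression parser. It only understands one sequence of terms joined by ".", where each term is either "word" (required) or ["word"] (optional). Anything else in the full expression syntax is out of scope.

```python
from itertools import product

# Toy illustration only: NOT the Houndify expression parser. Handles a single
# sequence of terms joined by ".", each either "word" or ["word"] (optional).
def parse_sequence(expression):
    """Split an expression into (word, is_optional) pairs."""
    terms = []
    for token in expression.split("."):
        token = token.strip()
        is_optional = token.startswith("[") and token.endswith("]")
        if is_optional:
            token = token[1:-1].strip()
        terms.append((token.strip('"'), is_optional))
    return terms

def expand_sequence(expression):
    """List every phrase the simple sequence can match."""
    choices = []
    for word, is_optional in parse_sequence(expression):
        # An optional term may be present or absent; a required term always appears.
        choices.append([word, None] if is_optional else [word])
    return [" ".join(w for w in combo if w is not None) for combo in product(*choices)]

phrases = expand_sequence('"Yueh" . ["centesis"] . "catheter" . "needle"')
print(phrases)  # ['Yueh centesis catheter needle', 'Yueh catheter needle']
```

Each optional term doubles the number of matched phrases. This is one reason expressions are a compact way to describe variations without listing every phrase by hand.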
Custom ASR Enhancement expects expressions to be uploaded in a grammar format. A grammar is simply a set of blocks, where each block is a list of expressions. In this case, we can define a single block containing our list of medical terms.
The grammar should be added under the CustomASREnhancementWriteData field of the Houndify request info JSON. This data contains a list of "Blocks", and each block has a "Name" (which can be any string) and a list of "Expressions". We also need a "RootBlock" field with the same value as our block name. For example:
{
  "CustomASREnhancementWriteData": {
    "Version": "1.0",
    "GrammarName": "medical",
    "IsGlobal": true,
    "Blocks": [
      {
        "Name": "MEDICAL_TERMS",
        "Expressions": [
          "\"Yueh\" . [\"centesis\"] . \"catheter\" . \"needle\"",
          "\"hemostasis\"",
          "\"paracentesis\"",
          "\"tracheostomy\" . \"tube\"",
          "\"endotracheal\""
        ]
      }
    ],
    "RootBlock": "MEDICAL_TERMS"
  }
}
See the CustomASREnhancement domain documentation for more information about the CustomASREnhancementWriteData format. Note that quotes inside the expressions need to be escaped when written in JSON strings. Custom ASR Enhancement is not case-sensitive, so "Yueh" will be transcribed as "yueh".
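Escaping quotes by hand is error-prone. A minimal sketch is to build the same request info as a Python dictionary and let the standard json module do the escaping. The helper name build_write_data is hypothetical; the field names and values are copied from the example above.

```python
import json

# Hypothetical helper: assembles the CustomASREnhancementWriteData structure
# shown above from a grammar name, a block name, and a list of expressions.
def build_write_data(grammar_name, block_name, expressions, is_global=True):
    return {
        "CustomASREnhancementWriteData": {
            "Version": "1.0",
            "GrammarName": grammar_name,
            "IsGlobal": is_global,
            "Blocks": [{"Name": block_name, "Expressions": expressions}],
            # A single-block grammar uses its only block as the root.
            "RootBlock": block_name,
        }
    }

# Written with Python single quotes, so no manual escaping is needed here.
expressions = [
    '"Yueh" . ["centesis"] . "catheter" . "needle"',
    '"hemostasis"',
    '"paracentesis"',
    '"tracheostomy" . "tube"',
    '"endotracheal"',
]
request_info = build_write_data("medical", "MEDICAL_TERMS", expressions)

# json.dump escapes the inner double quotes as \" when writing the file.
with open("custom_asr_enhancement_upload.json", "w") as f:
    json.dump(request_info, f, indent=2)
```

The resulting file can be passed directly as the request info file in the upload step.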
Uploading and Testing the Custom Vocabulary
For this tutorial, we’ll use the Python 3.x SDK to send requests to the Houndify service. You can download the SDK here.
To upload the custom vocabulary, send a text request with the special text write_custom_asr_enhancement, and include the CustomASREnhancementWriteData from above in the request info. You can send the request using the query_houndify.py script in the SDK:
./query_houndify.py --text-query write_custom_asr_enhancement --client-id <client id> --client-key <client key> --request-info-file custom_asr_enhancement_upload.json
where custom_asr_enhancement_upload.json is a file containing the CustomASREnhancementWriteData JSON shown above.
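If you want to script the upload rather than type the command, one option is to call the same script from Python with subprocess. This is a sketch under several assumptions. The wrapper functions and the environment variable names HOUNDIFY_CLIENT_ID and HOUNDIFY_CLIENT_KEY are hypothetical. The flags are taken from the command line above. The script is assumed to be executable in the current directory.

```python
import os
import subprocess

# Hypothetical wrapper: builds the argument list for the SDK's query_houndify.py
# using the flags from the command line above.
def build_query_command(query_flag, query_value, client_id, client_key,
                        request_info_file, script="./query_houndify.py"):
    return [
        script,
        query_flag, query_value,
        "--client-id", client_id,
        "--client-key", client_key,
        "--request-info-file", request_info_file,
    ]

def upload_custom_vocabulary(request_info_file="custom_asr_enhancement_upload.json"):
    # HOUNDIFY_CLIENT_ID / HOUNDIFY_CLIENT_KEY are hypothetical variable names.
    cmd = build_query_command(
        "--text-query", "write_custom_asr_enhancement",
        os.environ["HOUNDIFY_CLIENT_ID"], os.environ["HOUNDIFY_CLIENT_KEY"],
        request_info_file,
    )
    # check=True raises CalledProcessError if the script exits with an error code.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list, instead of one shell string, avoids quoting problems if a path contains spaces.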
If the upload is successful, you'll get a response that says "Successfully uploaded 1 block, 0 pronunciations.":
{
  "AllResults": [
    {
      "AutoListen": false,
      "BlockCount": 1,
      "CommandKind": "CustomASREnhancement",
      "CustomASREnhancementKind": "Write",
      "PronunciationCount": 0,
      "SpokenResponse": "Successfully uploaded 1 block, 0 pronunciations.",
      "SpokenResponseLong": "Successfully uploaded 1 block, 0 pronunciations.",
      "ViewType": [
        "Native",
        "None"
      ],
      "WrittenResponse": "Successfully uploaded 1 block, 0 pronunciations.",
      "WrittenResponseLong": "Successfully uploaded 1 block, 0 pronunciations."
    }
  ],
  ...
Once you receive this response, the custom vocabulary is ready to be used!
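In an automated pipeline, you may prefer to check the response programmatically instead of reading the message. Here is a minimal sketch that assumes you already have the complete response JSON as a string, for example saved from the script's output. The field names come from the sample response above. Treating "at least one written block" as success is an assumption, not documented Houndify behavior.

```python
import json

def upload_succeeded(response_text, min_blocks=1):
    """Return True if a Custom ASR Enhancement write result reports enough blocks."""
    response = json.loads(response_text)
    for result in response.get("AllResults", []):
        # Match on the fields shown in the sample write response.
        if (result.get("CommandKind") == "CustomASREnhancement"
                and result.get("CustomASREnhancementKind") == "Write"
                and result.get("BlockCount", 0) >= min_blocks):
            return True
    return False
```

For the medical grammar above, which has one block, the default min_blocks=1 matches the expected "1 block" message.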
To enable the custom vocabulary in audio requests, add the name of the grammar (e.g. medical) to the EnabledCustomASREnhancementGrammars field of the request info JSON. You can send audio requests using query_houndify.py as follows:
./query_houndify.py --audio-query <path to wav file> --client-id <client id> --client-key <client key> --request-info-file custom_asr_enhancement_enable.json
where custom_asr_enhancement_enable.json contains
{
  "EnabledCustomASREnhancementGrammars": ["medical"]
}
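Your request info may already carry other fields, or you may enable more than one grammar. A small sketch can add grammar names to an existing request info dictionary without duplicating them. It then writes the file with straight JSON quotes. The helper name enable_grammars is hypothetical.

```python
import json

# Hypothetical helper: returns a copy of request_info with the given grammar
# names appended to EnabledCustomASREnhancementGrammars (no duplicates).
def enable_grammars(request_info, grammar_names):
    updated = dict(request_info)
    enabled = list(updated.get("EnabledCustomASREnhancementGrammars", []))
    for name in grammar_names:
        if name not in enabled:
            enabled.append(name)
    updated["EnabledCustomASREnhancementGrammars"] = enabled
    return updated

request_info = enable_grammars({}, ["medical"])
with open("custom_asr_enhancement_enable.json", "w") as f:
    json.dump(request_info, f, indent=2)
```

Returning a copy instead of mutating the input keeps a base request info reusable across requests that enable different grammars.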
After following these steps, you’ll hopefully see words such as “Yueh” transcribed correctly. However, it’s important to understand that uploading words and phrases with Custom ASR Enhancement does not guarantee that those words/phrases will always be transcribed. It only biases the ASR service more towards those words/phrases, making them more likely to be transcribed.
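Because the vocabulary only biases recognition, it helps to measure how often your custom terms actually appear in transcripts. Below is a minimal sketch that checks a transcript for whole-word occurrences of each term, ignoring case. The transcript string is hypothetical and is not real Houndify output. Extracting the transcript from the response is left out.

```python
import re

def recognized_terms(transcript, terms):
    """Return the custom terms that occur as whole words or phrases in a transcript."""
    text = transcript.lower()  # compare case-insensitively, since output is lowercase
    found = []
    for term in terms:
        # \b anchors prevent "stasis" from matching inside "hemostasis".
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(term)
    return found

# Hypothetical transcript, not actual Houndify output.
transcript = "a 5 french yueh catheter needle combination was taken"
print(recognized_terms(transcript, ["Yueh", "hemostasis", "catheter needle"]))
# ['Yueh', 'catheter needle']
```

Running this over a set of test recordings, with and without the grammar enabled, gives a rough sense of how much the custom vocabulary helps.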
For more detailed documentation on the Custom ASR Enhancement domain and information about additional features not covered here, such as context-free grammars, weights, and custom pronunciations, see the CustomASREnhancement domain documentation.