With the release of NVIDIA’s Riva SDK as a public beta, getting a high performance, end-to-end speech AI pipeline configured and deployed has never been easier.
In this article, we’ll dive into the installation and setup process for Riva on Apollo, SmartCow’s first AI Engineering kit based around the NVIDIA® Jetson Xavier™ NX, and use it to run applications including automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS).
Readers who do not have an Apollo can still follow along using any other standard Jetson Xavier NX or AGX development kit; I will highlight any Apollo-specific steps.
What is Riva?
Riva is NVIDIA’s GPU-accelerated SDK that simplifies the development and deployment of real-time Speech AI applications.
Depending on your application needs, you can download a number of Riva-compatible pretrained models from the NGC Catalog, such as those performing ASR, TTS with different languages, and NLP.
After downloading pretrained models, you can use Riva to create a containerized inference server, which frees you from dependency issues. Note that for most of its development cycle, Riva was available only for server-class machines; support for embedded platforms is more recent.
About Apollo
Last month, SmartCow released the Apollo Audio/Visual AI Engineering kit. This is a development platform based around the NVIDIA® Jetson Xavier™ NX embedded system that enables users to use state-of-the-art AI models in their applications. For more information about Apollo, check out the Apollo website and the introductory article.
Apollo is primarily intended to relieve developers of the burden of sourcing compatible hardware on their own, as it includes both the hardware and the drivers needed to run it.
Installing Riva
The following installation procedure closely mirrors the instructions in the README file shipped on Apollo, which you can view by running:
$ cat /opt/apollo/sdk/riva/README.md
Prerequisites
Before you begin, ensure that you meet the following requirements.
- Ensure that you have at least 7 GB of free disk space on Apollo.
- Ensure that JetPack is installed on your Jetson. This is installed by default but can be re-installed by running the following command.
$ apt depends nvidia-jetpack | awk '{print $2}' | xargs -I {} sudo apt install -y {}
- Set Apollo to its maximum power mode, lock the clocks, and set the fan profile to cool by running the following commands.
$ sudo nvpmodel -m 8
$ sudo jetson_clocks
$ sudo nvpmodel -d cool
- Ensure that the NVIDIA GPU Cloud (NGC) CLI tool is available on your kit. By default, the tool is pre-installed on Apollo. Non-Apollo users can obtain this tool by following the instructions provided in the NVIDIA guide.
Procedure
Step 1 — Create a new NVIDIA account or sign in to an existing account.
https://ngc.nvidia.com/signin
Step 2 — Generate your NGC API key, which is required to download Riva.
For more information, visit: https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key
Step 3 — Configure the NGC CLI tool with the API key obtained in the previous step by running the following command.
$ ngc config set
The system prompts you to enter the API key. You can paste the key you copied previously by pressing Ctrl+Shift+V in your terminal and then pressing Enter to confirm.
Step 4 — After specifying the API key, the system prompts you to specify values for the remaining configuration items. Here is an example of user input and system output during the configuration process.
$ ngc config set
Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']: th15i5ju5t@g3n3r1cap1k3y,1td03sntr3allyw0rk
Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii
Enter org [no-org]. Choices: ['g3n3ric0rgan1sat10n', 'cu5t0m0rgan1sat10n']: g3n3ric0rgan1sat10n
Enter team [no-team]. Choices: ['no-team']: no-team
Enter ace [no-ace]. Choices: ['no-ace']: no-ace
Successfully saved NGC configuration to /home/nvidia/.ngc/config
Step 5 — Download Riva.
Apollo users can use the download_riva.sh script, which checks if all dependencies are present, ensures that the NGC CLI tool is configured, and then downloads Riva.
$ cd /opt/apollo/sdk/riva
$ ./download_riva.sh
Note: Readers with other kits can download Riva for the ARM64 Jetson systems by following the Riva quickstart on the NGC Catalog.
Using Riva
After downloading Riva, you can find a quickstart folder containing all of the files required to deploy a number of Riva applications. I won’t go into too much detail about these in this article because NVIDIA covers them in the accompanying user’s guide.
Instead, I’ll focus on deploying various models using Riva in conjunction with Apollo’s hardware.
It is important to understand how Riva is configured, launched, and controlled. The five files critical to these processes are as follows.
- config.sh: contains instructions on how to use Riva services and which models to download and launch.
- riva_init.sh: sources config.sh and pulls the Riva containers and model files from the NGC, converting them if necessary.
- riva_start.sh: launches the Riva server and client containers, serving the models that were prepared by riva_init.sh. The script also attaches the current console to the Riva container.
- riva_stop.sh: stops the running Riva containers, as the containers do not stop automatically when detaching the console.
- riva_clean.sh: deletes any Riva containers and model directories that are specified in config.sh.
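As an illustration, config.sh is a plain shell file of switches and paths. The variable names below follow the public Riva quickstart, but the exact contents vary by release, so treat this as a sketch rather than the file shipped on Apollo.

```shell
# Illustrative excerpt of a Riva quickstart config.sh.
# Variable names follow the public quickstart; exact contents vary by release.

# Enable or disable individual Riva services.
service_enabled_asr=true
service_enabled_nlp=true
service_enabled_tts=true

# Language of the models to download.
language_code="en-US"

# Location where the converted model repository is stored.
riva_model_loc="riva-model-repo"
```

riva_init.sh sources this file, so toggling a `service_enabled_*` flag or changing the language code before running it is all that is needed to change which models get downloaded and served.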
Automatic Speech Recognition — ASR
You can access the ASR feature at:
cd /opt/apollo/sdk/riva/ASR
The sample-1-English-asr example demonstrates how to use the English ASR Citrinet model with Apollo's MEMS microphones. In addition to raising the sampling rate to 48 kHz, the example adds a bit-depth conversion step: Riva models assume the incoming audio stream is 16-bit, whereas Apollo's microphones stream data in a 32-bit format.
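That conversion amounts to discarding the low 16 bits of each sample. Here is a minimal sketch of the idea using NumPy (not Apollo's actual implementation; the function name is ours):

```python
import numpy as np

def int32_to_int16(samples: np.ndarray) -> np.ndarray:
    """Convert 32-bit PCM samples to 16-bit PCM by dropping the low 16 bits.

    An arithmetic right shift preserves the sign, so positive and negative
    samples both land in the int16 range.
    """
    return (samples.astype(np.int64) >> 16).astype(np.int16)

# A full-scale 32-bit sample maps to the corresponding 16-bit extreme.
frame = np.array([0, 2_147_483_647, -2_147_483_648], dtype=np.int32)
converted = int32_to_int16(frame)  # → [0, 32767, -32768]
```

A per-frame conversion like this is cheap enough to sit in the audio callback before the samples are handed to the Riva client.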
You can run sample-1-English-asr by running the following commands.
$ cd /opt/apollo/sdk/riva/ASR/sample-1-English-asr
$ ./riva_stop.sh
$ ./riva_init.sh
$ ./riva_start.sh
From within the container, run the following command.
# python3 transcribe_mic.py
sample-2-nonEnglish-asr modifies config.sh to download a non-English ASR model, in this case Spanish. You can run sample-2-nonEnglish-asr by running the following commands.
$ cd /opt/apollo/sdk/riva/ASR/sample-2-nonEnglish-asr
$ ./riva_stop.sh
$ ./riva_init.sh
$ ./riva_start.sh
From within the container, run the following command.
# python3 transcribe_mic.py --input-device 4 --language-code es-US
Note: The language code argument is now set to es-US, reflecting the Spanish language model used for transcription.
Natural Language Processing — NLP
You can access the NLP feature at:
cd /opt/apollo/sdk/riva/NLP
The sample-1-intentslot example demonstrates how to use the Riva intent-slot classification model trained on the Misty domain. Intent and slot labels are typically task-specific and defined as labels in the data. This model's primary use case is jointly identifying the intent and the entities in a given user query.
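To make "jointly identifying" concrete, the sketch below shows how a sentence-level intent and token-level slot labels combine into one result. The slot labels here are hypothetical placeholders, not the Misty model's actual label set:

```python
def combine_intent_slots(tokens, slot_labels, intent):
    """Pair each token with its predicted slot label, dropping "O" (outside) tags."""
    slots = [(tok, lab) for tok, lab in zip(tokens, slot_labels) if lab != "O"]
    return {"intent": intent, "slots": slots}

result = combine_intent_slots(
    ["Will", "it", "be", "cloudy", "tomorrow", "?"],
    # Hypothetical slot labels for illustration only.
    ["O", "O", "O", "weatherforecastdaily", "weathertime", "O"],
    "weather.weather",
)
# result["slots"] → [("cloudy", "weatherforecastdaily"), ("tomorrow", "weathertime")]
```

The model produces both pieces in a single forward pass, which is why a single client call returns the intent together with the tagged entities.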
You can run sample-1-intentslot by running the following commands.
$ cd /opt/apollo/sdk/riva/NLP/sample-1-intentslot
$ ./riva_stop.sh
$ ./riva_init.sh
$ ./riva_start.sh
From within the container, run the following command.
# python3 intentslot_client.py --model riva_intent_misty
You can also provide custom queries through the command line.
# python3 intentslot_client.py --model riva_intent_misty --query "Will it be cloudy tomorrow?"
The model assigns an intent for the given query. For example, for the previous query, the model assigns an intent of weather.weather.
# python3 intentslot_client.py --model riva_intent_misty --query "How are you?"
For the previous query, the model returns the intent "smalltalk.personality_how_is_bot_doing".
Text-To-Speech — TTS
You can access the TTS feature at:
cd /opt/apollo/sdk/riva/TTS
The sample1-female-voice example demonstrates how to use the Riva TTS model in conjunction with Apollo's speaker driver circuitry. Apart from varying the sample rate, the example also allows for experimentation with features such as voice, rate, pitch, and pronunciation.
You can run sample1-female-voice by running the following commands.
$ cd /opt/apollo/sdk/riva/TTS/sample1-female-voice
$ ./riva_stop.sh
$ ./riva_init.sh
$ ./riva_start.sh
From within the container, run the following command.
# python3 talk_stream.py --output-device <N>
The config.sh file in sample1-female-voice is modified to also download a secondary voice model, in this case a male one. From within the container, you can specify the voice model using the --voice argument as follows:
# python3 talk_stream.py --output-device <N> --voice "English-US-Male-1"
# python3 talk_stream.py --output-device <N> --voice "English-US-Female-1"
You can adjust the rate or speed of the output speech to make it faster or slower. If you do not specify the rate, the default speed of 100 is assumed.
# python3 talk_stream_rate.py --output-device <N> --rate <R>
You can adjust the pitch or frequency of the output speech to produce higher or lower output. If you do not specify the pitch, the default pitch of 0 is assumed.
# python3 talk_stream_pitch.py --output-device <N> --pitch <P>
Finally, you can influence pronunciation by manipulating phonemes (the units of sound that distinguish one word from another in a particular language). Within Riva, you can use phonemes to define how the TTS model pronounces a word.
"""<speak>You say <phoneme alphabet="x-arpabet" ph="{@T}{@AH0}{@M}{@EY1}{@T}{@OW2}">tomato</phoneme>, I say <phoneme alphabet="x-arpabet" ph="{@T}{@AH0}{@M}{@AA1}{@T}{@OW2}">tomato</phoneme></speak>"""
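If you generate such strings programmatically, a small helper keeps the tag format consistent. This is plain string construction around the example above; the function name is ours:

```python
def phoneme_tag(word: str, arpabet: str) -> str:
    """Wrap a word in an x-arpabet phoneme tag.

    Each space-separated ARPABET phone is wrapped in {@...},
    matching the format shown in the example above.
    """
    ph = "".join("{@%s}" % p for p in arpabet.split())
    return '<phoneme alphabet="x-arpabet" ph="%s">%s</phoneme>' % (ph, word)

# "tomato" with the first pronunciation from the example
tag = phoneme_tag("tomato", "T AH0 M EY1 T OW2")
```

The returned tag can then be embedded in a <speak>...</speak> string and passed to the TTS client in place of the plain word.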
Conclusion
This article walked through the installation and setup of Riva on Apollo, SmartCow's Audio/Video AI Engineering kit, and demonstrated ASR, NLP, and TTS applications running at the edge.
Be sure to check out more off-the-shelf software examples, such as audio and video AI applications based on NVIDIA Riva and NVIDIA DeepStream.
About the Authors
Ryan Agius is the AIoT Engineering Manager at SmartCow AI Technologies Ltd. in their Malta office. Before his current position, Ryan worked as an engineering intern at CERN as part of his thesis for a Master's in ICT (Signal Processing & Machine Learning).
Luke Abela is an AIoT Engineer at SmartCow AI Technologies Ltd. in their Malta office. Before his current position, Luke pursued a Master of Science in Artificial Intelligence at Queen Mary, University of London, while also working as a part-time AI analyst.