The Audio/Video Engineering Kit with Edge AI Capabilities!

SmartCow
3 min read · Feb 25, 2022

Overview

Have you been searching for development kits to evaluate audio-visual AI applications or models? Look no further! Introducing Apollo, SmartCow’s first audio/video engineering kit based around the NVIDIA® Jetson Xavier™ NX computing module, which enables developers to build applications with conversational AI capabilities.

Apollo comes preloaded with the NVIDIA Riva, JetPack and DeepStream SDKs, so you can start evaluating audio-visual AI models right away instead of spending time enabling peripheral devices and verifying system integration.

Apollo’s integrated hardware includes a base frame that allows the device to stand upright, four microphones, two speaker terminals, an 8MP camera module, a 2.08-inch OLED display and a 128GB NVMe SSD, all in one small package.

The device includes a board support package (BSP) with the Ubuntu operating system and example audio-visual AI applications. Additionally, the board has two programmable buttons for adding custom applications. By default, one of the buttons is set to “One-Key Recovery”, which lets you easily reflash Apollo or restore it to its default state. The second button is left unconfigured so that you can assign your own application to it.

(Photo: SmartCow)

Key features

Apollo hardware features

  • Computing module: NVIDIA® Jetson Xavier™ NX (384-core NVIDIA Volta™ GPU, up to 21 TOPS of AI performance)
  • Preloaded with NVIDIA Riva and DeepStream SDKs
  • Multimodal AI interface that has a small form factor
  • Equipped with an OLED display that shows the device’s status
  • One 8MP IMX179 camera module
  • Built-in Audio Codec and MEMS microphones
  • Two programmable buttons
  • Assembled with a base frame that allows the product to stand upright for a better user experience
  • Processes audio and video data simultaneously

Apollo software features

  • Computer vision and deep learning workbooks
  • GStreamer examples (WebRTC)
  • One-touch factory reset or upgrade
  • NLP examples (text-to-text, text-to-speech, speech translation, and so on)

Hardware specifications

Example audio-visual AI applications

Automatic Speech Recognition

The Automatic Speech Recognition application uses the NVIDIA Citrinet model to transcribe spoken language (speech-to-text).

Text-to-Speech

The Text-to-Speech application uses NVIDIA Riva models to generate speech audio for your applications based on the input text.

Natural Language Processing

The Natural Language Processing (NLP) application analyzes text data to determine its sentiment (how positive or negative the text is) using the open-source NLTK library. Based on the sentiment of the input text, the application displays an emoting cow on the OLED screen.
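
The article does not show how a sentiment result is turned into an expression, so the following is only a minimal sketch of that step. It assumes the analyzer reports a single score between -1.0 (most negative) and 1.0 (most positive), as NLTK's VADER analyzer does with its compound score. The function name `pick_expression`, the expression names and the ±0.05 cut-offs are illustrative assumptions, not Apollo's actual code.

```python
def pick_expression(compound: float) -> str:
    """Map a sentiment score in [-1.0, 1.0] to the name of a display expression.

    The expression names and thresholds are hypothetical; a real application
    would map them to whatever images it draws on the OLED screen.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must be within [-1.0, 1.0]")
    if compound >= 0.05:
        return "happy"
    if compound <= -0.05:
        return "sad"
    return "neutral"


if __name__ == "__main__":
    # Made-up scores standing in for the output of a sentiment analyzer.
    for score in (0.8, 0.0, -0.6):
        print(score, pick_expression(score))
```

Keeping the mapping in its own function means the thresholds can be tuned without touching the code that runs the analyzer or drives the display.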

The application can also perform Named Entity Recognition (NER), which takes a sentence and identifies the named entities in it, such as people, organizations and locations.

Chatbot

The chatbot application uses the open-source ChatterBot library, which is hosted on GitHub, to create chatbot instances. The application runs on a single Jetson and starts two chatbot instances that communicate with one another. You can run other NLP tasks, such as sentiment analysis, on the text generated by the chatbots, and you can also pass that text to text-to-speech.
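
The turn-taking between two chatbot instances can be sketched roughly as follows. This is not the ChatterBot API: the lookup-table bots created by `make_bot` are hypothetical stand-ins, and in the real application each responder would wrap a chatbot instance instead. The `converse` helper and its parameters are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A responder takes the last message and returns a reply.
Responder = Callable[[str], str]


def make_bot(replies: Dict[str, str], fallback: str) -> Responder:
    """Build a toy bot that answers from a lookup table (stand-in for a real chatbot)."""
    def respond(message: str) -> str:
        return replies.get(message.strip().lower(), fallback)
    return respond


def converse(bot_a: Responder, bot_b: Responder,
             opening: str, turns: int) -> List[Tuple[str, str]]:
    """Alternate between two bots, feeding each one the other's last reply."""
    transcript: List[Tuple[str, str]] = []
    message = opening
    speakers = (("A", bot_a), ("B", bot_b))
    for turn in range(turns):
        label, bot = speakers[turn % 2]
        message = bot(message)
        transcript.append((label, message))
    return transcript


if __name__ == "__main__":
    bot_a = make_bot({"hello": "How are you?",
                      "i am fine.": "Glad to hear it."}, "Tell me more.")
    bot_b = make_bot({"how are you?": "I am fine."}, "I see.")
    for speaker, line in converse(bot_a, bot_b, "hello", 4):
        print(f"{speaker}: {line}")
```

Because the transcript is returned as plain text, each line could then be handed to other steps such as sentiment analysis or text-to-speech.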

Speaker Verification

The Speaker Verification application is the audio equivalent of facial recognition. It extracts the distinguishing characteristics of a person’s voice, which are present whenever that person speaks, regardless of what they say, and uses them to check the speaker’s identity.
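
The article does not describe how the comparison is made. One common approach, shown here only as a sketch, is to represent each voice sample as a fixed-length vector of characteristics (an embedding) and accept the speaker when the cosine similarity between the enrolled and the new vector is high enough. The function names, the 0.7 threshold and the example vectors are all assumptions for illustration; a real system would obtain the vectors from a speaker model and tune the threshold on real data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled: Sequence[float], candidate: Sequence[float],
                    threshold: float = 0.7) -> bool:
    """Accept the candidate if its voice vector is close enough to the enrolled one."""
    return cosine_similarity(enrolled, candidate) >= threshold


if __name__ == "__main__":
    # Made-up three-dimensional vectors; real voice embeddings are much longer.
    enrolled = [0.9, 0.1, 0.3]
    same_person = [0.8, 0.2, 0.35]
    other_person = [-0.2, 0.9, -0.1]
    print(is_same_speaker(enrolled, same_person))
    print(is_same_speaker(enrolled, other_person))
```

Comparing vectors rather than transcripts is what makes the check independent of the words being spoken.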

Conclusion

To summarize, Apollo is an ideal engineering kit in a small form factor that allows you to get started quickly and focus your efforts on developing edge AI audio-visual solutions. And what’s more, there is a dedicated website where you can learn about Apollo’s new capabilities and stay up to date.

Learn more about Apollo: https://apollo.smartcow.ai/

About the Author

Sunil Devarbhavi
Sr. Technical Writer
SmartCow AI Technologies, Singapore

SmartCow

SmartCow is an AI engineering company that specializes in advanced video analytics, applied artificial intelligence & electronics manufacturing.