Voix: Car voiceprint authentication system concept

Published in Voice Tech Podcast, May 9, 2020

Exploring a Voice User Interface (VUI) system for vehicle voiceprint recognition and authentication.

Figure 1. The Voix voiceprint system and mobile app

While doing my master’s this year, I’ve been dabbling in voice user interfaces (VUI), voice interaction design, and multi-modal experiences that go beyond the screen. I wrote a technical specification paper for a voice user interface system concept I designed for one of my classes at the University of Washington, Master of Human-Computer Interaction + Design 🤓

The Concept

Voix is a voiceprint authentication system that lets users lock, unlock, start, and turn off their car from inside or outside the vehicle without using their hands or a mobile device, ultimately turning the user’s voice into a key. The system can be integrated into the electromechanical system of any new vehicle that has an existing in-car assistant, and would be offered as an add-on by the car dealership at the point of purchase. The car company would install Voix as the vehicle is being made, which would give Voix the ability to reach into the car’s systems in a way similar to the Alexa Auto SDK.(1) Voix builds a user’s voiceprint from voice biometrics, the voice characteristics that are unique to each person. With voiceprint recognition, the system extracts phonetic features from the speaker’s voice signal to validate the speaker’s identity.(2) Much like thumbprint recognition, drivers create a voiceprint and add it to their vehicle’s permitted voice profiles. If a vehicle has more than one driver, users can also add other drivers’ voiceprints to the vehicle’s permitted drivers.

Voiceprint authentication benefits users because each voiceprint is unique to one person, which makes it difficult to replicate. It is also a form of authentication that most people can use. In an increasingly digital world where physical keys and text-based passwords are no longer the norm, Voix’s voiceprint authentication provides security and peace of mind without adding to the cognitive load users often experience with text passwords. Voiceprints are also accessible to most users, especially those with physical disabilities, because they require no physical input and no contact with a surface. However, some people who are deaf or hard of hearing might not be able to benefit from voiceprint recognition.

Voix’s target users are people in the market for a new vehicle from a car manufacturer. The technology may benefit busy parents who have their hands full or are carrying a child and can’t reach their keys. It could also be particularly helpful for an elderly person who does not have the dexterity to reach for their car keys and open their vehicle. Another use case is law enforcement officers who need to get into their vehicles quickly in an emergency but have dropped their keys or cannot reach them.

Interaction Experience

Onboarding and Pairing

Once a user has picked up their new vehicle with the Voix integration, the user will first need to go through the onboarding and pairing process. With a personalized email from Voix, the user will be directed to download the Voix app, and then will be asked to create an account. Following this, the user will be asked to connect their mobile device to their vehicle via Bluetooth. Once the mobile app has been paired with the vehicle, the user will then be directed to begin creating their voiceprint.

Voiceprint creation

Similar to the way users can create a thumbprint profile on the iPhone, users will use the Voix mobile app to create their voiceprint. To do this, they will be shown a set of phrases on the screen to read aloud into their phone’s microphone, which creates a template of their voice. The interface will signal that the user must keep reading sentences aloud until the profile has been fully created. Depending on the quality of the recordings the user produces, the app will continue to prompt them with sentences until it has a full voice profile. Once created, the voiceprint serves as the template the system checks against to authenticate the user.

Figure 2. User experience of creating a voiceprint
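
The prompting loop described above could be sketched roughly as follows. This is only an illustration, not the Voix implementation: the `enroll` function, the `record` callback, the loudness check used as a stand-in for recording quality, and both thresholds are assumptions made for the example.

```python
from typing import Callable, Iterable, List


def rms(samples: List[float]) -> float:
    """Root-mean-square level of one recording."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def enroll(phrases: Iterable[str],
           record: Callable[[str], List[float]],
           needed: int = 3,
           min_rms: float = 0.05) -> List[List[float]]:
    """Keep prompting phrases until `needed` usable recordings are collected."""
    accepted: List[List[float]] = []
    for phrase in phrases:
        samples = record(phrase)        # hypothetical: capture audio for this prompt
        if rms(samples) >= min_rms:     # crude quality gate (assumption)
            accepted.append(samples)
        if len(accepted) == needed:
            return accepted
    raise RuntimeError("ran out of phrases before the profile was complete")


# Demo with a stand-in recorder: some takes are too quiet to use.
takes = iter([[0.0] * 8, [0.3, -0.3] * 4, [0.01] * 8, [0.2, -0.2] * 4, [0.25, -0.25] * 4])
profile = enroll(["phrase %d" % i for i in range(5)], lambda _phrase: next(takes))
print(len(profile))
```

In this demo two of the five takes are rejected as too quiet, so the app would have prompted five phrases to collect three usable recordings.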

Adding other voiceprints to the car profile

In addition to creating their own voiceprint, the primary user of the vehicle will be able to create voiceprints for other people who drive the vehicle. To do this, the primary user goes through the voiceprint creation process in the app together with the new user. The primary user can also customize each additional user’s permissions. For example, if the primary user has added their partner’s voiceprint, they can enable permissions for unlocking, locking, turning on, and turning off the vehicle. However, if the primary user has children and wants them to be able to get things out of the vehicle but not turn it on, they can limit the children’s permissions to unlocking and locking only.

Figure 3. User experience of setting vehicle permissions
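
One way to picture the permission model described above is a set of flags per voiceprint. The sketch below is an assumption for illustration only: the `Permission` flags, the `VehicleProfile` class, and the user names are hypothetical and do not come from the Voix specification.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import Dict


class Permission(Flag):
    NONE = 0
    LOCK = auto()
    UNLOCK = auto()
    START = auto()
    STOP = auto()


FULL_ACCESS = Permission.LOCK | Permission.UNLOCK | Permission.START | Permission.STOP
DOORS_ONLY = Permission.LOCK | Permission.UNLOCK


@dataclass
class VehicleProfile:
    primary_user: str
    permissions: Dict[str, Permission] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The primary user always holds every permission.
        self.permissions[self.primary_user] = FULL_ACCESS

    def grant(self, granted_by: str, user: str, allowed: Permission) -> None:
        """Only the primary user may add a driver or change their permissions."""
        if granted_by != self.primary_user:
            raise PermissionError("only the primary user can change permissions")
        self.permissions[user] = allowed

    def is_allowed(self, user: str, action: Permission) -> bool:
        """Unknown voices fall back to no permissions at all."""
        return action in self.permissions.get(user, Permission.NONE)


car = VehicleProfile(primary_user="alex")
car.grant("alex", "partner", FULL_ACCESS)
car.grant("alex", "child", DOORS_ONLY)
print(car.is_allowed("child", Permission.UNLOCK))   # True
print(car.is_allowed("child", Permission.START))    # False
```

Using flags keeps the "unlock and lock only" case from the example a single value that the app can store alongside each voiceprint.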

Technical Specifications

Building the Voix system requires engineering a robust voiceprint recognition algorithm that can identify and understand a user’s voice, as well as assembling the system hardware within the vehicle itself: directional sound detection microphones, the main controller module, and speech recognition modules. This section goes into further detail about these components.

Voiceprint Creation and Algorithms

Each person’s voiceprint is unique because everyone’s vocal cavities, and the way their mouth moves when they speak, are different.(3) This makes voiceprint recognition valuable for authenticating who is using a system or device. The Voix voiceprint creation process is modeled on a combination of text-dependent and text-independent biometric techniques invented by Lousky et al. of Nice Systems Ltd (Figure 4). As Lousky et al. describe in their US patent, text-dependent voiceprints require the user to utter particular words, whereas text-independent voiceprints use speech analytics on past recordings of speech to authenticate the user. Voiceprint creation begins with the system applying a short-time Fourier transform to several overlapping segments of the voice recording the user makes during setup in the Voix mobile app. The system then creates a three-dimensional image of the voiceprint that details magnitude versus frequency for each specific moment in time (Figure 5).(4)

Figure 4. Voiceprint creation process diagram. (Lousky et al. 2017)

Figure 5. Example of what the spectrogram will look like following voiceprint creation (Chen and Xu 2015)
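
To make the short-time Fourier transform step more concrete, here is a minimal sketch in plain Python. It is not the method from the Lousky et al. patent: the frame length, hop size, Hann window, and the 8 kHz test tone are all assumptions chosen to keep the example small.

```python
import cmath
import math
from typing import List


def stft_magnitudes(signal: List[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Return one magnitude spectrum per overlapping, Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    bins = frame_len // 2 + 1           # non-negative frequencies only
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the frame; slow, but fine for a small illustration.
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(bins)
        ]
        spectrogram.append(spectrum)
    return spectrogram


# Demo: a 1 kHz tone sampled at 8 kHz should peak in the bin for 1 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
spec = stft_magnitudes(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin * rate / 64)
```

Each row of `spec` is one moment in time and each column one frequency bin, which is the magnitude-versus-frequency-over-time picture a spectrogram plots.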

Voiceprint recognition and algorithms

Voix uses voiceprint recognition to match a user’s voice to the voiceprint stored in the system. Voiceprint recognition systems primarily rely on low-level acoustic features, with feature parameters extracted from the physiological structure of the glottis, the pitch profile, the formant frequencies and bandwidths, their trajectories, and so on.(6)
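
As a rough illustration of matching a voice against stored voiceprints, the sketch below compares a feature vector to each enrolled template with cosine similarity and accepts the best match only above a threshold. The three-number feature vectors, the user names, and the 0.9 threshold are made up for the example; a real system would use far richer features and a tuned decision rule.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(features: List[float],
             enrolled: Dict[str, List[float]],
             threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching enrolled user, or None if nobody is close enough."""
    best_user, best_score = None, threshold
    for user, template in enrolled.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


# Hypothetical stored voiceprints (feature vectors are placeholders).
enrolled = {"alex": [1.0, 0.2, 0.1], "sam": [0.1, 0.9, 0.4]}
print(identify([0.9, 0.25, 0.1], enrolled))   # close to alex's template
print(identify([0.5, 0.5, 0.5], enrolled))    # not close enough to anyone
```

Returning `None` for an uncertain match is the important design choice here: for a car key, rejecting a genuine user occasionally is safer than accepting the wrong voice.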

To account for the noise interference that is likely when a user attempts voiceprint recognition outside the vehicle, Voix’s system is modeled on a robust voiceprint recognition method that combines a feature extraction algorithm, a recognition algorithm, and a spectrogram-based endpoint detection algorithm. This method, developed by State Grid Electronic Commerce and State Grid Xiongan Financial in 2018, reduces the influence of noise while improving both the efficiency of voiceprint recognition and the recognition rate for authentication.(6) Figure 6 shows the flow of Voix’s voice recognition system.

Figure 6. Robust Voice Recognition System (H. Shen et al. 2018)
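
Endpoint detection means finding where speech starts and stops in a recording so that only that span is analyzed. The method Voix is modeled on does this from the spectrogram; the sketch below swaps in a much simpler short-time energy threshold purely to show the idea. The frame length and threshold ratio are assumptions.

```python
from typing import List, Optional, Tuple


def detect_endpoints(signal: List[float],
                     frame_len: int = 80,
                     threshold_ratio: float = 0.1) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices spanning frames whose energy
    reaches a fraction of the loudest frame, or None if the signal is silent."""
    energies = [
        sum(s * s for s in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not energies or max(energies) == 0:
        return None
    threshold = threshold_ratio * max(energies)
    active = [i for i, energy in enumerate(energies) if energy >= threshold]
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Demo: quiet background, a loud stretch standing in for speech, quiet again.
recording = [0.001] * 400 + [0.5, -0.5] * 200 + [0.001] * 400
print(detect_endpoints(recording))
```

A plain energy threshold like this breaks down in loud environments, which is one reason a spectrogram-based approach is attractive for use outside a vehicle.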

System Hardware

Building the robust voice recognition method for the Voix system requires engineering the main controller module, power supply module, the LD3320 speech recognition module, and vehicle electrical appliances.(6) The Voix hardware has been modeled on the speech recognition unit hardware design developed by Yuping Su from Northwest Minzu University (Figure 7).

This design uses the STC11L08XE MCU as its control unit and the STC89C52RC as the main controller, and the recognition process is controlled by reading and writing the corresponding registers of the LD3320 speech chip.(7) The MICN and MICP pins of the microphone are connected to pin 9 and pin 10 of the LD3320 speech chip, which receives the voice signal from the microphone and then transmits it to the STC11L08XE MCU.(7)

Figure 7. System Schematic diagram (Su 2019)

Voix Sound Detection

To detect sound outside the vehicle, Voix requires multiple ultra cardioid microphones. These microphones, also called “shotgun” microphones, mainly pick up sound from the direction they are pointing. Because shotgun microphones are directional, it is easier to pick up the user’s voice as they approach the vehicle from a variety of angles.(8) The microphones will be placed in both side mirrors and in the door handles of the car, wrapped in batting to absorb water and protect them in cold weather conditions.

To keep the user’s voiceprint and utterances clear, Voix uses Acoustic Echo Cancellation (AEC) and beamforming. AEC is used inside and outside the vehicle to remove the acoustic echo component from the microphone signal so that the automatic speech recognition (ASR) engine can clearly understand the user’s voice. To do this, the AEC algorithm adaptively estimates the acoustic echo path between the loudspeaker and the microphone, and the estimated echo is then subtracted from the microphone signal to leave a clear signal from the user.(7) To attain the best signal, Voix will also use beamforming, a signal processing technique for multi-microphone arrays that reduces audio interference from other directions.(9)
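
The adaptive estimate-and-subtract idea behind AEC can be sketched with a normalized least-mean-squares (NLMS) filter, one common choice for this job. Nothing here is specified by Voix: the filter length, step size, and the three-tap echo path in the demo are assumptions, and the demo contains echo only, with no user speech.

```python
import random
from typing import List


def nlms_echo_cancel(far_end: List[float], mic: List[float],
                     taps: int = 4, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated echo of `far_end` from `mic`."""
    weights = [0.0] * taps        # running estimate of the echo path
    history = [0.0] * taps        # latest loudspeaker samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate                 # microphone signal minus estimated echo
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Demo: the microphone hears only the loudspeaker through a short echo path.
rng = random.Random(0)
far_end = [rng.uniform(-1, 1) for _ in range(2000)]
echo_path = [0.6, 0.3, 0.1]       # hypothetical loudspeaker-to-microphone response
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]
cleaned = nlms_echo_cancel(far_end, mic)
echo_energy = sum(s * s for s in mic[-200:])
residual_energy = sum(s * s for s in cleaned[-200:])
print(residual_energy < 1e-3 * echo_energy)
```

Once the filter has adapted, almost none of the loudspeaker signal remains in the output, so whatever is left would be the user’s voice.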

Figure 8. Placement of shotgun microphones in car exterior side mirrors and door handles
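
With several microphones around the car, the simplest form of the beamforming mentioned above is delay-and-sum: shift each microphone’s signal so the user’s voice lines up across channels, then average. The sketch below is an assumption-laden illustration; the four microphones, the whole-sample delays, and the noise source arriving from another direction are all invented for the demo.

```python
import math
import random
from typing import List


def delay_and_sum(channels: List[List[float]], delays: List[int]) -> List[float]:
    """Advance each channel by its arrival delay (in samples) and average them."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


def delayed(source: List[float], delay: int, length: int) -> List[float]:
    """Copy of `source` that starts `delay` samples late, zero-padded to `length`."""
    return ([0.0] * delay + source + [0.0] * length)[:length]


# Demo: the voice and an interfering noise reach four microphones with different delays.
rng = random.Random(1)
voice = [math.sin(2 * math.pi * n / 40) for n in range(400)]
noise = [rng.gauss(0, 0.5) for _ in range(400)]
voice_delays = [0, 3, 5, 2]       # hypothetical arrival delays from the user's direction
noise_delays = [5, 1, 0, 4]       # hypothetical arrival delays from another direction
channels = [
    [v + i for v, i in zip(delayed(voice, dv, 410), delayed(noise, dn, 410))]
    for dv, dn in zip(voice_delays, noise_delays)
]
output = delay_and_sum(channels, voice_delays)

single_mic_error = sum((channels[0][n] - voice[n]) ** 2 for n in range(400))
beamformed_error = sum((output[n] - voice[n]) ** 2 for n in range(400))
print(beamformed_error < single_mic_error)
```

Because the channels are aligned on the voice, the voice adds up coherently while the interfering source, which arrives with different delays, is smeared and attenuated by the averaging.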

References

1 Amazon Alexa Auto SDK. https://developer.amazon.com/en-US/alexa/alexa-auto/sdk

2 Voiceprint Recognition System — Not Just a Powerful Authentication Tool. Alibaba Cloud (2017). https://medium.com/@Alibaba_Cloud/voiceprint-recognition-system-not-just-a-powerful-authentication-tool-6b3702b5c5a

3 How Biometrics Work. Tracy V. Wilson. https://science.howstuffworks.com/biometrics.htm#pt3

4 Lousky et al. System and method for voice print generation. US 9,721,571 B2, United States Patent and Trademark Office, August 1, 2017. https://patentimages.storage.googleapis.com/fe/0c/44/a48cbc221c9e09/US9721571.pdf

5 P. Li, M. Chen, F. Hu and Y. Xu, “A spectrogram-based voiceprint recognition using deep neural network,” The 27th Chinese Control and Decision Conference (2015 CCDC), Qingdao, 2015, pp. 2923–2927. DOI: 10.1109/CCDC.2015.7162425

6 Haijuan Shen, Bo Wang, and Junsheng Wang. 2018. Research on Robustness of Voiceprint Recognition Technology. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence (ACAI 2018). Association for Computing Machinery, New York, NY, USA, Article 65, 1–5. DOI:https://doi.org/10.1145/3302425.3302467

7 Yuping Su. 2019. Research on Vehicle Control System Based on Speech Recognition Technology. In Proceedings of the 2019 The 2nd International Conference on Robotics, Control and Automation Engineering (RCAE 2019). Association for Computing Machinery, New York, NY, USA, 88–92. DOI:https://doi.org/10.1145/3372047.3372105

8 Understanding Microphone Polar Patterns. David Willard (2018). Azden. https://www.azden.com/understanding-microphone-polar-patterns/

9 Amazon Alexa / Amazon Alexa Auto Developer resources. https://developer.amazon.com/en-US/docs/alexa/alexa-voice-service/audio-hardware-configurations.html