Voice as a Biometric in a Multi-Factor Authentication Scheme for Enterprises

Duc Duong
Feb 13, 2021
Photo: LuckyStep/Shutterstock.com

As more and more information is moved to the cloud, there is an urgent need for more secure methods of authenticating users. Current password-based authentication is insecure and is compromised regularly. Biometric-based authentication offers a promising alternative. Traditional methods such as PINs and passwords provide indirect authentication based on something a person knows, whereas biometrics provide direct verification by examining an attribute of the person him- or herself.

In this article, I propose a multi-factor authentication scheme based on voice biometrics that can offer a better user experience and stronger security for enterprises, especially in banking and financial services.

A LANDSCAPE OF BIOMETRIC TECHNOLOGY

Biometric-based technology (or simply biometrics) verifies or identifies individuals by analyzing an aspect of their physiology or behavior (e.g., voice, fingerprint, face, iris). Biometric authentication can be divided into two categories: active and passive.

In active biometric authentication, the user must physically take part in the authentication process by performing an action such as speaking, placing a finger on a scanner, or presenting an eye to a scanner.

Analyze physical characteristics

In passive biometric authentication, the user is identified without his or her active participation. During the authentication process, the system compares a captured trait, such as the user's voice, with the voice print registered for that user.

Many mobile banking applications can now quietly track user behavior through patterns such as geographic location and typing cadence. As more and more biometric methods emerge, banks and financial services are looking for technologies that are both easy to use and secure. Among current biometric methods, I consider voice biometrics the most secure to deploy.

Voice biometrics recognizes a person by analyzing characteristics of his or her voice, such as pitch, tone, and manner of speech. In terms of security, it has a major advantage over traditional methods such as passwords or PINs, which can easily be tracked or stolen. An individual's voice, by contrast, is as distinctive and unique as a fingerprint. Voice biometric systems operate in one of two modes: text-dependent or text-independent.

Why use a multi-factor authentication scheme? Although voice biometrics are unique, using them in place of passwords carries risks. A voice print cannot be changed the way a password can, so a leaked voice print can have serious consequences. Voice authentication may therefore be unsuitable as a standalone method for authenticating users, but it is a good option as part of a multi-factor authentication protocol. With this approach, an attacker who obtains a user's password still cannot bypass the biometric check to reach the user's data. In other words, multi-factor authentication means more user choice and better security.

Combining two or more biometric modalities improves security and confidence

AN OVERVIEW OF VOICE AUTHENTICATION METHODS

The voice authentication system consists of a client-side application and a server. The client application can run on a mobile device or a personal computer. It needs access to the device's recording mechanism (the microphone) and a connection to the internet. To authenticate users, the client application makes API calls to the server and sends the recorded audio files over the internet.
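The sketch below makes the client side concrete. It wraps raw 16-bit PCM samples, in the form a microphone API might return them, in a WAV container, then prepares an HTTPS POST to the server. The endpoint URL `https://auth.example.com/v1/voice/verify`, the `X-User-Id` header, and the 16 kHz sample rate are hypothetical placeholders, not a real service's API. Only the Python standard library is used, and the request is built but not sent.

```python
import io
import sys
import urllib.request
import wave
from array import array

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono, a common rate for speech
VERIFY_URL = "https://auth.example.com/v1/voice/verify"  # hypothetical endpoint


def pcm_to_wav_bytes(samples: array, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap signed 16-bit mono PCM samples in an in-memory WAV file."""
    data = array("h", samples)      # copy so the caller's buffer is untouched
    if sys.byteorder == "big":      # WAV stores samples little-endian
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)         # mono
        wav.setsampwidth(2)         # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(data.tobytes())
    return buf.getvalue()


def build_verify_request(user_id: str, wav_bytes: bytes,
                         url: str = VERIFY_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying the recording and the claimed identity."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        method="POST",
        headers={
            "Content-Type": "audio/wav",
            "X-User-Id": user_id,   # hypothetical header for the claimed identity
        },
    )
```

To send the request, the client would call `urllib.request.urlopen(req)`. The shape of the response depends entirely on the server implementation.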

The server analyzes the audio, fingerprints it, and determines whether the user is who he or she claims to be. For each user's account, the server stores voice features collected during an enrollment phase. In that phase, multiple audio samples from one user are used to extract features and generate a general voice-print.
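Here is a sketch of the enrollment phase. It assumes that a feature extractor (not shown, and hypothetical) has already turned each enrollment recording into a fixed-length embedding vector. Averaging length-normalized embeddings is one simple way to form a "general voice-print". It is an illustration only, not necessarily the method any particular product uses.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors (along the last axis) to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def enroll(sample_embeddings: np.ndarray) -> np.ndarray:
    """Combine per-recording embeddings of shape [n_samples, dim] into one voice-print.

    Each embedding is length-normalized first so every recording contributes
    equally, then the mean is normalized again so the voice-print can later
    be compared with cosine similarity.
    """
    if sample_embeddings.ndim != 2 or sample_embeddings.shape[0] < 2:
        raise ValueError("expected several enrollment embeddings of shape [n, dim]")
    return l2_normalize(l2_normalize(sample_embeddings).mean(axis=0))
```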

A voice-print is a stored representation of a person's voice, derived from his or her utterances and used to verify that person's identity. Voice-prints are usually captured by biometric software. Because a voice-print is as distinctive as a fingerprint, it can be used to verify and recognize a person.

The core of a voice authentication system is its voice recognition (speaker recognition) component. There are two common recognition tasks. Speaker verification determines whether a speaker's claimed identity is true or false. Speaker identification determines which of a set of known speakers produced an unknown voice. Verification and identification algorithms may require the speaker to utter a specific phrase (text-dependent recognition), or they may work regardless of what is said (text-independent recognition). A common method for both tasks is to map utterances into a feature space where cosine distances correspond to speaker similarity.
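Below is a minimal sketch of cosine scoring for both tasks. It assumes utterances have already been mapped to embedding vectors. The `threshold` value of 0.7 is a hypothetical placeholder; in practice it would be tuned on held-out data to reach a target false-accept rate. `identify` performs closed-set identification, meaning it always returns the best-matching enrolled speaker.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embeddings (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(test_emb: np.ndarray, voiceprint: np.ndarray, threshold: float = 0.7) -> bool:
    """Speaker verification: accept the claimed identity if similarity clears the threshold."""
    return cosine_similarity(test_emb, voiceprint) >= threshold


def identify(test_emb: np.ndarray, voiceprints: dict[str, np.ndarray]) -> tuple[str, float]:
    """Closed-set speaker identification: return the most similar enrolled speaker and its score."""
    scores = {name: cosine_similarity(test_emb, vp) for name, vp in voiceprints.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```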

Generally, there are two approaches to speaker recognition. The traditional approach uses i-vectors and probabilistic linear discriminant analysis (PLDA). It has three steps: collecting sufficient statistics, extracting speaker embeddings (i-vectors), and classifying with PLDA. The statistics are usually collected with a Gaussian Mixture Model (GMM).
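The "collecting sufficient statistics" step can be illustrated with a diagonal-covariance GMM acting as the universal background model (UBM). The sketch computes per-frame component posteriors along with the zeroth- and first-order (Baum-Welch) statistics that an i-vector extractor consumes. The UBM parameters are assumed to be already trained. The total-variability matrix that turns these statistics into an i-vector is not shown.

```python
import numpy as np


def gmm_posteriors(frames, weights, means, variances):
    """Per-frame component responsibilities gamma[t, c] under a diagonal-covariance GMM.

    frames: [T, D] acoustic features; weights: [C]; means, variances: [C, D].
    """
    diff = frames[:, None, :] - means[None, :, :]                       # [T, C, D]
    log_dens = -0.5 * (np.sum(diff**2 / variances[None], axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1)[None])  # [T, C]
    log_joint = np.log(weights)[None] + log_dens
    log_joint -= log_joint.max(axis=1, keepdims=True)  # stabilize before exponentiating
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)


def baum_welch_stats(frames, weights, means, variances):
    """Zeroth-order N_c = sum_t gamma[t, c] and centered first-order
    F_c = sum_t gamma[t, c] * (x_t - mu_c)."""
    gamma = gmm_posteriors(frames, weights, means, variances)  # [T, C]
    N = gamma.sum(axis=0)                                      # [C]
    F = gamma.T @ frames - N[:, None] * means                  # [C, D]
    return N, F
```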

A more modern approach uses deep neural network (DNN) acoustic models to extract the sufficient statistics. The high-dimensional statistics are then converted into a single low-dimensional i-vector that encodes speaker identity along with other utterance-level variability. Finally, a PLDA model produces verification scores by comparing i-vectors from different utterances.
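For the PLDA scoring step, the sketch below uses the simplified "two-covariance" form of PLDA. In this model an i-vector is written as x = m + y + e, with a speaker term y ~ N(0, B) and a residual e ~ N(0, W). The score is the log-likelihood ratio between the same-speaker and different-speaker hypotheses. The parameters `mean`, `B`, and `W` are assumed to have been trained beforehand. The scoring itself does not depend on whether the statistics came from a GMM or a DNN.

```python
import numpy as np


def _log_gauss(x: np.ndarray, cov: np.ndarray) -> float:
    """Log-density of a zero-mean multivariate Gaussian evaluated at x."""
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    sol = np.linalg.solve(cov, x)
    return float(-0.5 * (x @ sol + logdet + x.size * np.log(2 * np.pi)))


def plda_llr(x1, x2, mean, B, W) -> float:
    """Two-covariance PLDA score:
    log p(x1, x2 | same speaker) - log p(x1, x2 | different speakers)."""
    z = np.concatenate([x1 - mean, x2 - mean])
    T = B + W                                   # total covariance of one i-vector
    zero = np.zeros_like(B)
    same = np.block([[T, B], [B, T]])           # a shared speaker term y couples x1 and x2
    diff = np.block([[T, zero], [zero, T]])     # independent speakers: no coupling
    return _log_gauss(z, same) - _log_gauss(z, diff)
```

A positive score favors the same-speaker hypothesis. The accept/reject threshold would again be tuned on held-out data.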

COMMON USE CASES OF VOICE BIOMETRICS IN MULTI-FACTOR AUTHENTICATION

As discussed above, voice biometrics should be one of several controls in a multi-layered control framework designed to protect processes and transactions. Depending on the risk assessment for the transactions or processes, there are three typical use cases for voice biometric authentication. In the Contact Center (agent-assisted) use case, the customer is identified, authenticated, and routed to a live agent by default to process the request. In the Interactive Voice Response (IVR) (automated) use case, the customer is identified, authenticated, and routed to the appropriate call flow to process the request, with no human interaction. The third is Mobile App Authentication (automated).
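One way to express "depending on the risk assessment" in code is a policy table that maps each risk tier to the factors a request must pass, with voice always one layer among several. The tiers, example transactions, and factor names below are hypothetical. They only illustrate the idea of a multi-layered control framework.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1      # illustrative: balance enquiry
    MEDIUM = 2   # illustrative: changing contact details
    HIGH = 3     # illustrative: large outbound transfer


# Hypothetical policy: voice is always required, but never on its own.
REQUIRED_FACTORS = {
    Risk.LOW: frozenset({"voice", "registered_device"}),
    Risk.MEDIUM: frozenset({"voice", "pin"}),
    Risk.HIGH: frozenset({"voice", "pin", "one_time_code"}),
}


def is_authorized(risk: Risk, passed_factors: set[str]) -> bool:
    """Allow the request only if every factor its risk tier requires has been passed."""
    return REQUIRED_FACTORS[risk] <= passed_factors
```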

Let's take the Mobile App Authentication use case as an example. In principle, voice biometrics can serve as the first authentication layer. The second layer can be another biometric method (such as facial recognition or fingerprint scanning) or a traditional method (such as a PIN or password).

Two-factor Biometrics Authentication Scheme
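The two-layer scheme could be sketched as follows, assuming the voice embedding is produced elsewhere. A face or fingerprint check cannot be shown in a few lines, so a salted, PBKDF2-hashed PIN stands in for the second layer. The `Account` structure, the 0.7 voice threshold, and the iteration count are illustrative assumptions. Both layers must pass.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

import numpy as np

PBKDF2_ITERATIONS = 100_000  # illustrative work factor


@dataclass
class Account:
    voiceprint: np.ndarray  # enrolled voice-print, stored at unit length
    pin_salt: bytes
    pin_hash: bytes         # only a salted hash of the PIN is stored


def hash_pin(pin: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, PBKDF2_ITERATIONS)


def make_account(voiceprint: np.ndarray, pin: str) -> Account:
    salt = os.urandom(16)
    return Account(voiceprint / np.linalg.norm(voiceprint), salt, hash_pin(pin, salt))


def authenticate(account: Account, voice_embedding: np.ndarray, pin: str,
                 voice_threshold: float = 0.7) -> bool:
    # Layer 1: voice biometrics, cosine similarity against the enrolled voice-print
    score = float(voice_embedding @ account.voiceprint / np.linalg.norm(voice_embedding))
    if score < voice_threshold:
        return False
    # Layer 2: knowledge factor (stand-in for face/fingerprint), constant-time comparison
    return hmac.compare_digest(hash_pin(pin, account.pin_salt), account.pin_hash)
```

In a mobile deployment, a fingerprint or face check used as the second layer would typically run on the device through the operating system, rather than in server code like this.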

For the voice recognition part, several service providers let developers integrate voice authentication into their applications. Developers can also build their own speaker recognition services, either on a sufficient-statistics model such as a Gaussian Mixture Model (GMM) or with neural network and deep learning approaches.
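For developers who want to build their own GMM-based verifier, a classic option is GMM-UBM scoring. The score is the average per-frame log-likelihood ratio between a speaker-specific GMM and a universal background model. The sketch assumes both diagonal-covariance models are already trained (for example, by MAP-adapting the UBM to the speaker's enrollment audio, which is not shown). Each model is passed as a `(weights, means, variances)` tuple, and the decision threshold applied to the score is left to the caller.

```python
import numpy as np


def gmm_frame_loglik(frames, weights, means, variances):
    """log p(x_t) for each frame under a diagonal-covariance GMM.

    frames: [T, D]; weights: [C]; means, variances: [C, D]. Returns shape [T].
    """
    diff = frames[:, None, :] - means[None, :, :]                       # [T, C, D]
    log_dens = -0.5 * (np.sum(diff**2 / variances[None], axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1)[None])  # [T, C]
    log_joint = np.log(weights)[None] + log_dens
    m = log_joint.max(axis=1, keepdims=True)
    # log-sum-exp over mixture components
    return (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True)))[:, 0]


def gmm_ubm_score(frames, speaker_gmm, ubm) -> float:
    """Average per-frame log-likelihood ratio; positive means the speaker model fits better."""
    return float(np.mean(gmm_frame_loglik(frames, *speaker_gmm)
                         - gmm_frame_loglik(frames, *ubm)))
```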

Source: Duc Duong, The CuderWorld Magazine

Please feel free to reach out to me at [qduc] [dot] [duong][at][gmail][dot][com] for further discussion.

Author

Duc Duong (Twitter)
