Diving Deep into Speech Accent Classification: A Case Study

Binary classification on the Speech Accent Archive with the facebook/wav2vec2-base-960h model

Published in Data And Beyond

2 min read · May 27, 2023

In this article, I describe how to build a simple speech accent classifier. It is based on this Kaggle notebook with minor changes, such as using the updated facebook/wav2vec2-base-960h model, which was pre-trained and fine-tuned on 960 hours of LibriSpeech speech audio sampled at 16 kHz. Full details of the analysis can be found in this public Kaggle notebook.
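As a minimal sketch of the setup described above, the base checkpoint could be loaded with a two-class head through the Hugging Face `transformers` library. The label names `native` and `foreign` and the helper `load_accent_classifier` are illustrative assumptions, not taken from the notebook.

```python
# Illustrative label names (an assumption, not taken from the notebook).
LABELS = ["native", "foreign"]
LABEL2ID = {name: idx for idx, name in enumerate(LABELS)}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}

MODEL_NAME = "facebook/wav2vec2-base-960h"


def load_accent_classifier(model_name=MODEL_NAME):
    """Load the feature extractor and a wav2vec2 model with a 2-class head."""
    # Imported lazily so the label maps above work without transformers installed.
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
    model = AutoModelForAudioClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        label2id=LABEL2ID,
        id2label=ID2LABEL,
    )
    return feature_extractor, model
```

The classification head created this way is newly initialized, so the model only becomes useful for accent classification after fine-tuning.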

After training on the full sample for 10 epochs (about 1 hour on the NVIDIA Tesla P100 GPU available to Kaggle users), the accuracy increased from 26% to about 92.7%:

Source: author, speech_accent_classification | Kaggle
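For reference, accuracy here means the fraction of samples whose predicted class matches the true label. The sketch below shows one way to compute it from model logits, written in the style of a `compute_metrics` hook for the Hugging Face `Trainer`. Whether the notebook computes the metric exactly this way is an assumption, and the function names are illustrative.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of examples whose highest-scoring class matches the true label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = 0
    for row, label in zip(logits, labels):
        row = list(row)
        # Index of the largest logit is the predicted class id.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == int(label):
            correct += 1
    return correct / len(labels)


def compute_metrics(eval_pred):
    """Trainer-style hook: receives (logits, labels), returns a dict of metrics."""
    logits, labels = eval_pred
    return {"accuracy": accuracy_from_logits(logits, labels)}


# Toy example with made-up logits: three of four predictions are correct.
example = compute_metrics(
    ([[2.0, 0.1], [0.3, 1.2], [0.9, 0.4], [0.2, 0.8]], [0, 1, 1, 1])
)
print(example)  # {'accuracy': 0.75}
```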

Here is an example of a classification pipeline based on two audio samples (foreign and native):
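A minimal sketch of such a pipeline, assuming the fine-tuned model was saved to a hypothetical local directory `./accent-classifier` and that the Hugging Face `transformers` audio-classification pipeline is used. The directory, the file names, and the helper functions are illustrative and not taken from the notebook.

```python
MODEL_DIR = "./accent-classifier"  # hypothetical path to the fine-tuned model


def top_label(predictions):
    """Return the (label, score) pair with the highest score."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify_files(paths, model_dir=MODEL_DIR):
    """Classify each audio file and keep only its most likely label."""
    # Imported lazily; decoding audio from file paths also requires ffmpeg.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_dir)
    # For each file the pipeline returns a list of {"label": ..., "score": ...} dicts.
    return {path: top_label(classifier(path)) for path in paths}
```

With two hypothetical recordings, the call would be `classify_files(["foreign_sample.wav", "native_sample.wav"])`, which returns a dictionary mapping each file to its top label and score.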

Written by Dmytro Iakubovskyi