Diving Deep into Speech Accent Classification: A Case Study

Binary classification for Speech Accent Archive with facebook/wav2vec2-base-960h model

Dmytro Iakubovskyi
Published in Data And Beyond · 2 min read · May 27, 2023


Photo by Kelly Sikkema on Unsplash

In this article, I describe how to build a simple speech accent classifier. It is based on this Kaggle notebook with minor changes, such as using an updated model, facebook/wav2vec2-base-960h, which was pre-trained and fine-tuned on 960 hours of LibriSpeech speech audio sampled at 16 kHz. Full details of the analysis can be found in this public Kaggle notebook.
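For orientation, the model setup can be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual code: it assumes the Hugging Face `transformers` library, a two-class label set named `foreign` and `native` (the id order is my assumption), and a hypothetical helper called `load_accent_model`.

```python
# Assumed label set for the binary task; the id order is illustrative only.
LABEL2ID = {"foreign": 0, "native": 1}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}

MODEL_NAME = "facebook/wav2vec2-base-960h"
SAMPLING_RATE = 16_000  # the checkpoint expects 16 kHz audio


def load_accent_model(model_name: str = MODEL_NAME):
    """Load the checkpoint with a new 2-class head (hypothetical helper)."""
    # Imported lazily so the constants above work without transformers installed.
    from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
    model = Wav2Vec2ForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABEL2ID),
        label2id=LABEL2ID,
        id2label=ID2LABEL,
    )
    return feature_extractor, model
```

Calling `load_accent_model()` downloads the checkpoint; the classification head is newly initialized, so it only becomes useful after fine-tuning on the accent data.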

After training on the full sample for 10 epochs (about 1 hour on the NVIDIA Tesla P100 GPU available to Kaggle users), the accuracy increased from 26% to about 92.7%:

Source: author, speech_accent_classification | Kaggle
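The accuracy figure quoted above is the share of samples whose predicted class matches the true label. Here is a minimal sketch of such a metric, written in the shape of a `compute_metrics` hook for a Hugging Face `Trainer`; whether the notebook uses exactly this hook is an assumption on my part, and the toy numbers are made up.

```python
def compute_metrics(eval_pred):
    """Return accuracy from a (logits, labels) pair, as passed to a Trainer hook."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(predictions)}


# Toy check with made-up logits: three of four predictions match the labels.
toy_logits = [[2.0, 0.1], [0.3, 1.5], [1.2, 0.4], [0.2, 0.9]]
toy_labels = [0, 1, 1, 1]
print(compute_metrics((toy_logits, toy_labels)))  # {'accuracy': 0.75}
```

The function works on plain lists as well as NumPy arrays, so it can be checked without a GPU or a trained model.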

Here is an example of the classification pipeline applied to two audio samples (one foreign, one native):

Source: author, speech_accent_classification | Kaggle
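In code, a pipeline like this might look as follows. This is a sketch under stated assumptions: it uses the `audio-classification` pipeline from `transformers`, `model_dir` is a hypothetical path to a fine-tuned checkpoint, and `top_label` and `classify_files` are illustrative helper names, not functions from the notebook.

```python
def top_label(scores):
    """Pick the highest-scoring entry from one pipeline result (a list of label/score dicts)."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


def classify_files(paths, model_dir):
    """Classify audio files with a fine-tuned checkpoint at model_dir (hypothetical path)."""
    # Imported lazily; needs transformers and an audio decoding backend such as ffmpeg.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_dir)
    return {path: top_label(classifier(path)) for path in paths}


# Made-up scores in the shape the pipeline returns for a single file.
example = [{"label": "native", "score": 0.08}, {"label": "foreign", "score": 0.92}]
print(top_label(example))  # ('foreign', 0.92)
```

Keeping `top_label` separate from the model call makes the decision rule easy to check on its own, without loading a model.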


