Sound Pattern Recognition with Python

Sound Pattern Black & White

As you can probably tell from the title, in this post I will be toying around with Python and sound to detect sound patterns. More specifically, knocking patterns, like the ones you make when you knock on a door.

Knock Knock

Before we can talk about knock sounds and patterns, we first need to know how sounds are represented on a computer. At a very basic level, sound can be thought of as a function of pressure over time, so we can represent a sound in a 2D graph, such as the one below.

The image graphs a sine wave with a frequency of 440 Hz, sampled at a rate of 44100 Hz. We haven't talked about sampling yet, but we will below.

Although it may appear that this sound is a continuous function, it isn't. To store sound on a computer we do a little trick: we sample it. The sampling process is simple: we choose a frequency, the sampling rate or fs, and every second we take fs samples of the air pressure. The result is a vector of size fs x (duration in seconds). So our sound is just a very big numeric array.
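To make the sampling idea concrete, here is a minimal sketch that builds the samples of a 440 Hz sine wave at fs = 44100 using only the standard library. The one-second duration and the variable names are my own choices for illustration.

```python
import math

fs = 44100        # sampling rate: samples taken per second
frequency = 440   # frequency of the sine wave, in Hz
duration = 1.0    # length of the sound in seconds (arbitrary choice)

# The array has fs x duration entries, one per sample
n_samples = round(fs * duration)

# Sample n is taken at time n / fs seconds
samples = [math.sin(2 * math.pi * frequency * n / fs) for n in range(n_samples)]

print(len(samples))
```

One second of sound at this rate is already 44100 numbers, which is why the array gets big so quickly.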

Knock Knock?

But how is that going to help us detect knocking patterns?

Well, what defines a knocking pattern is the time between one knock and the next, so we just have to figure out a way to determine the time between the knocks from our sound array. We will use this knock.wav sound for our first test.

Below is the graph of the knock sound file.

Notice how clear it is, just from looking, where the knocks are. Based on this, our problem is reduced to finding peaks in the sound signal and calculating the distance between them.

First we need to read the .wav file as a Python array. Assuming knock.wav is in the same directory as your script, we can do this with scipy:
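A minimal sketch of the reading step, using scipy.io.wavfile.read, which returns the sampling rate and the samples as a NumPy array. The helper name load_sound is hypothetical, and keeping only the first channel of a stereo file is my assumption, not something the recording requires.

```python
from scipy.io import wavfile


def load_sound(path):
    """Read a .wav file and return (fs, data) with data as a 1-D array."""
    fs, data = wavfile.read(path)
    # Stereo files come back with shape (n_samples, n_channels);
    # keep only the first channel (an assumption made for simplicity)
    if data.ndim > 1:
        data = data[:, 0]
    return fs, data


if __name__ == "__main__":
    fs, data = load_sound("knock.wav")
    print(fs, len(data))
```

Dividing len(data) by fs gives the duration of the recording in seconds.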

To detect the individual knocks, we need two things:

  • A minimum value for a point in the signal to be a peak
  • A time distance within which nearby peaks are clustered into a single knock; we will call this the merging size

We need the sampling rate to turn the merging size from seconds into a number of array indices. As for the minimum value, the choice is rather arbitrary and should take into account how loud a sound needs to be to count as a valid knock.

Now the juice of the algorithm:
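Here is a minimal sketch of that loop. So that it runs on its own, it uses a small synthetic signal standing in for the array read from knock.wav. The low sampling rate, the burst positions and the threshold of 5000 are all assumptions for illustration, and comparing the absolute value of each sample (so negative swings count too) is my choice.

```python
# Synthetic stand-in for the sound array: 2 seconds of silence
# with three short loud bursts (all values here are made up)
fs = 1000
data = [0] * (2 * fs)
for start in (200, 700, 1500):
    for k in range(20):
        data[start + k] = 12000 if k % 2 == 0 else -12000

min_value = 5000      # minimum amplitude for a sample to count as a peak
merge_seconds = 0.1   # merging size, in seconds
merge_size = round(merge_seconds * fs)  # merging size, in array indices

focuses = []
i = 0
while i < len(data):
    if abs(data[i]) > min_value:
        # Everything in the next merge_size samples is one knock;
        # its time is the middle of that window, in seconds
        focuses.append((i + merge_size / 2) / fs)
        i += merge_size
    else:
        i += 1

# Time between consecutive knocks
gaps = [later - earlier for earlier, later in zip(focuses, focuses[1:])]
print(gaps)
```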

Here we are doing two things. First, we look for any value exceeding our minimum value, which we consider a peak. Then, every time we find such a value, we treat it and every value in the next 0.1 seconds as one individual knock, which we call a focus.

The time of occurrence of the knock is the middle point of its focus. Lastly, we take the differences between consecutive focuses and print them.
For the sake of modularity, let's wrap this in a function:
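One way the wrapped version could look is sketched below. The function names knock_times and knock_gaps and the default parameter values are hypothetical. The functions accept any sequence of numbers, so they work with a plain list as well as with the array scipy returns.

```python
def knock_times(data, fs, min_value=5000, merge_seconds=0.1):
    """Return the time in seconds of each detected knock.

    min_value and merge_seconds are illustrative defaults and
    should be tuned to the recording.
    """
    merge_size = max(1, round(merge_seconds * fs))
    times = []
    i = 0
    while i < len(data):
        if abs(float(data[i])) > min_value:
            # Middle of the merged window, converted to seconds
            times.append((i + merge_size / 2) / fs)
            i += merge_size
        else:
            i += 1
    return times


def knock_gaps(times):
    """Return the time between each knock and the next one."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Calling knock_gaps(knock_times(data, fs)) gives the list of intervals that describes a knocking pattern.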

Comparing the Knocks

Suppose we have a second knock sound, like knock2.wav here:

If we calculate its focuses, we have:

Something interesting: our code reports 4 peaks, but if you look at the graph you can clearly see five. This happens because the knocks come faster than our merging size can handle. There are two ways to fix this: the first is to tune the parameters, and the other is to switch to a more complete (and more complex) peak-detecting algorithm.

Ok, we’re almost there… To compare the desired pattern (knock.wav) with the test pattern (knock2.wav) we can do this:
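A minimal sketch of one possible comparison: check each interval of the desired pattern against the interval at the same position in the test pattern, and ignore any extra knocks at the end of the test sound. The function name same_pattern, the 0.15 second tolerance and the example intervals are all assumptions for illustration, not measurements from the two files.

```python
def same_pattern(wanted_gaps, test_gaps, tolerance=0.15):
    """Return True if test_gaps starts with intervals close to wanted_gaps.

    Both arguments are lists of times between knocks, in seconds.
    tolerance is the largest allowed difference per interval.
    """
    # The test sound needs at least as many intervals as the pattern
    if len(test_gaps) < len(wanted_gaps):
        return False
    # zip stops at the shorter list, so extra test knocks are ignored
    return all(abs(w - t) <= tolerance for w, t in zip(wanted_gaps, test_gaps))


# Made-up intervals: the test sound has more knocks than the pattern
print(same_pattern([0.5], [0.55, 0.3, 0.3]))
print(same_pattern([0.5], [0.9, 0.3, 0.3]))
```

A tighter tolerance makes the lock stricter, at the cost of rejecting honest attempts at the right knock.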

We can see that the test was accepted, and if you listen to both, you will notice that the first two knocks are about the same distance apart in both sounds, even though the second sound has more knocks.

As you can see, it actually works. Although it's not detecting patterns to the microsecond, it's good enough for a toy secret-knock electronic lock or any other simple application. There are many possibilities in the field of sound processing, and Python surely is useful for it. I hope that you liked this as much as I did.