Kim Hendrikse
10 min read · Dec 17, 2023

TDOA Sound Localization with the Raspberry Pi

TDOA sound localization means determining the location of a sound source when all you know is the differences in the time of arrival of the sound event at several recorders (Time Difference Of Arrival).

Sound travels quite slowly, around 343 m per second at 20 degrees Celsius. This makes it possible to determine where a sound came from using Raspberry Pis to pretty good accuracy, provided the clocks on the Pis are well synchronized.
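To get a feel for why clock accuracy matters, here is a small sketch that converts a timing error into a distance error. The linear temperature formula is a common textbook approximation and is my assumption here, not something taken from sbts-aru; the function names are made up for illustration.

```python
def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air in m/s (linear textbook approximation)."""
    return 331.3 + 0.606 * temp_c


def range_error_m(timing_error_s: float, temp_c: float = 20.0) -> float:
    """Distance that sound travels during a given timing error."""
    return speed_of_sound(temp_c) * timing_error_s


# A few illustrative timing errors: 1 microsecond, 0.1 ms and 1 ms
for err in (1e-6, 1e-4, 1e-3):
    print(f"{err * 1e6:8.0f} us of timing error ~ {range_error_m(err) * 100:.3f} cm")
```

The point is simply that a millisecond of timing error is worth roughly a third of a metre of sound travel, so clocks and event times need to be good to well under a millisecond.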

TDOA sound localization requires the sound’s arrival times at a minimum of three recorders for 2-dimensional localization. With only three recorders there can be two possible solutions, and four recorders are needed to guarantee a single one. In practical terms, though, three recorders are often sufficient to determine the best localization solution.
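For readers who like to see it written down, the geometry can be stated as follows. The symbols are my own notation and not taken from the sbts-aru code: s is the unknown source position, r_i the position of recorder i, t_i the arrival time at recorder i, and c the speed of sound.

```latex
\[
  \lVert \mathbf{s} - \mathbf{r}_i \rVert - \lVert \mathbf{s} - \mathbf{r}_1 \rVert
  = c \,(t_i - t_1), \qquad i = 2, \dots, N
\]
```

Each equation describes one branch of a hyperbola with the two recorders as foci. Three recorders give two independent equations, and two hyperbola branches can cross in two places, which is where the possible second solution comes from. A fourth recorder adds a third equation that normally rules one of them out.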

As luck would have it, I developed a project on the Raspberry Pi that achieves this, my “StalkedByTheState Autonomous Recording Unit” project or sbts-aru.

StalkedByTheState is the name of my suite of security programs to which this belongs, and Autonomous Recording Unit is a bio-acoustic term for a recorder that runs standalone on batteries.

With the addition of a cheap GPS, sbts-aru keeps the time on a Pi synchronized to typically less than 1 microsecond of error, even when disconnected from a network, such as when running on batteries. It will run on all Raspberry Pi versions that support an SD card and a USB port, such as the setup in this picture:

sbts-aru
Portable sound localizing recorder unit

So with accurate timekeeping set up, all that’s needed now is a USB-connectable microphone to get going.

sbts-aru initiates a recording program that:

  • Timestamped files: Creates timestamped files in FLAC format, organized within a date-stamped directory and file structure.
  • Precise time recording: Alongside each audio file, a tracking file is written. This file records the exact time of each sound buffer’s arrival.
  • Date-structured directory: For instance, today’s recordings would be stored in a directory named:
/home/pi/disk/audio-sbts1/2023/2023-12/2023-12-17

That contains files such as these:

2023-12-17_13-49-20.906033--audio_sbts1--2023-12-17_14-09-21.074827.flac
2023-12-17_13-49-20.906033--audio_sbts1--2023-12-17_14-09-21.074827.tracking
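If you want to handle these names in a program of your own, a minimal sketch for pulling them apart could look like this. I am assuming here that the first time stamp is the start of the recording and the second is the end; the function name is made up for illustration and is not part of sbts-aru.

```python
from datetime import datetime
from pathlib import Path

STAMP = "%Y-%m-%d_%H-%M-%S.%f"  # layout of the time stamps in the file names


def parse_recording_name(filename: str):
    """Split '<start>--<label>--<end>.<ext>' into (start, label, end)."""
    stem = Path(filename).stem  # drops only the final extension (.flac / .tracking)
    start, label, end = stem.split("--")
    return datetime.strptime(start, STAMP), label, datetime.strptime(end, STAMP)


start, label, end = parse_recording_name(
    "2023-12-17_13-49-20.906033--audio_sbts1--2023-12-17_14-09-21.074827.flac"
)
print(label, "covers", (end - start).total_seconds(), "seconds")
```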

And the tracking file could contain something like:

0 13-49-20.906033
1 13-49-20.952503
2 13-49-20.998936
3 13-49-21.045414

Which records the precise time of the arrival of each buffer of sound in the .flac file.
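As a quick illustration, the sketch below reads lines in the format shown above and prints the spacing between consecutive buffers. It is my own minimal parser, not the one sbts-aru uses, and it ignores recordings that cross midnight.

```python
from datetime import datetime


def parse_tracking(lines):
    """Return a list of (buffer_index, time_of_day) from tracking-file lines."""
    entries = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        index, stamp = line.split()
        entries.append((int(index), datetime.strptime(stamp, "%H-%M-%S.%f")))
    return entries


sample = """0 13-49-20.906033
1 13-49-20.952503
2 13-49-20.998936
3 13-49-21.045414""".splitlines()

entries = parse_tracking(sample)
# Spacing between consecutive buffer arrivals, in seconds
intervals = [(b[1] - a[1]).total_seconds() for a, b in zip(entries, entries[1:])]
print(intervals)
```

For these four lines the buffers arrive roughly 46.4 ms apart, with a few tens of microseconds of variation from one buffer to the next. That spacing would be consistent with, for example, 2048 frames per buffer at 44.1 kHz, but that is only my guess from the numbers.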

With the above we can accurately determine the arrival time of the start of a sound at each recorder, and in turn determine the differences in arrival time between the recorders.

Enough of the details, let’s have a look at a field test.

For the field test, I prepared 4 sbts-aru units that run on batteries and chose a suitably representative site in which the sound that I generate could be heard at all 4 recorders. Rather than using something loud but illegal like a gun or big fireworks, I used an air horn.

Air horn
Not as loud as you might think

After walking around for hours to distribute the recorders such that the distances would be in the vicinity of 700 m, I discovered that these air horns are not as loud as I had thought. In order to hear the horn on all 4 recorders I had to use a relatively small area of around 200 m by 200 m.

I found a field in the small village of Guttecoven in the Netherlands, which I was careful not to disturb, to place the recorders around.

Guttecoven

sbts-aru
A recorder

Super. By this time I was fairly tired so I went to an interesting location within the recorders and let it blast. Just once!

As a note, TDOA sound localization works best within the polygon formed by the recorders and less well outside of it. Empirically I found that areas extending a bit outside of a vertex also have an increased range of good accuracy. It all depends on the rate of change of the differences in arrival time in the direction you place the sound source. As the differences are all the localization has to go on, the iterative process that determines the sound source will stop once further movement would no longer result in a sufficient difference in the arrival times.
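The effect is easy to see numerically. The sketch below uses a made-up square of four recorders 200 m apart (not the actual field layout) and measures how much the arrival-time differences change when the source moves by one metre, once in the middle of the square and once 2 km outside it.

```python
import math

C = 343.0  # m/s, speed of sound at about 20 degrees Celsius
# Hypothetical layout: four recorders on the corners of a 200 m square
RECORDERS = [(0.0, 0.0), (200.0, 0.0), (200.0, 200.0), (0.0, 200.0)]


def tdoas(x, y):
    """Arrival-time differences (s) of recorders 2..4 relative to recorder 1."""
    d = [math.hypot(x - rx, y - ry) for rx, ry in RECORDERS]
    return [(di - d[0]) / C for di in d[1:]]


def sensitivity(x, y, dx=1.0, dy=0.0):
    """Size of the change in the TDOA vector (s) when the source moves by (dx, dy) metres."""
    before, after = tdoas(x, y), tdoas(x + dx, y + dy)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))


inside = sensitivity(100.0, 100.0)    # centre of the square
outside = sensitivity(2100.0, 100.0)  # 2 km to the east, moving further away
print(f"inside: {inside * 1e3:.3f} ms per metre")
print(f"far outside: {outside * 1e6:.3f} us per metre")
```

Inside the square a one metre move changes the differences by milliseconds. Far outside, moving further away changes them by about a microsecond, so the arrival times can no longer tell those positions apart.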

Okay, we will also need to know the co-ordinates of each of the recorders. I set up a script to run every minute that logged the location with the provided “get_location.sh” program so I could determine this later.

Nice, we are ready to do the magic. The next step is an important one; it’s the key to it all. For this manual sound localization example, we will need to determine the start time of the air horn blast as heard by each recorder. There are various tools that can do this, such as Audacity, but my favorite is Raven Lite from the Cornell Lab of Ornithology. Raven Lite is a truly lovely program for this purpose. We will examine the waveform and the spectrogram of the recording in order to determine exactly when the sound of the air horn was received at each recorder.

Recorder #1

When the file is loaded, Raven Lite will show you both a waveform and a spectrogram view. Here we can see the air horn shown in the spectrogram.

Raven Lite
Initial zoom

I don’t use many of the controls in Raven, but the ones I find indispensable are the zoom buttons at the top.

Raven zoom

The second and third buttons zoom the x-axis and the last two buttons zoom the vertical axis. The waveform and the spectrogram view can have their zoom on the y-axis altered independently.

Let’s zoom in on the spectrogram. This is the most important zoom. Highlight in the waveform graph the area of interest and click on the first button, the one with the red square (Double clicking this button reverts the zoom).

Your first zoom might look something like this:

Small zoom
Greater zoom

We can successively zoom on the X and Y axes until we have a view that allows us to accurately determine the start time, preferably to sub-millisecond level. That would be the 4th digit to the right of the decimal point when zoomed in sufficiently.

Big zoom
Zoomed right in

The above picture is zoomed in sufficiently. Note that the spectrogram is calculated over a window in time, that is, over a range of samples. So using the spectrogram alone with the default settings is not the best way to determine the start time; use it as a guide and use the waveform graph as well. The spectrogram can appear to show the event starting before it actually did, but the waveform graph will show you reality. You can see that I’ve clicked the time line such that it is at the point where the waveform starts to show strong variation from the previous portion. Check with the left and right arrow keys that moving the point you are marking changes the 4th digit, i.e. that you are selecting a time point in the sub-millisecond range.
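Here is a small synthetic illustration of why a windowed view can look early. It does not model Raven Lite itself: the sample rate, window size and hop are assumptions I picked for the example, and a frame is labelled by its first sample.

```python
import math

RATE = 44_100            # samples per second (assumed for the example)
ONSET = 1_000            # sample index where the synthetic tone starts
WINDOW, HOP = 512, 128   # analysis window and hop size in samples (assumed)

# Silence followed by a 1 kHz tone that starts abruptly at ONSET
signal = [0.0] * ONSET + [math.cos(2 * math.pi * 1_000 * n / RATE) for n in range(2_000)]


def frame_energy(start):
    """Energy of one analysis window, which is what a spectrogram column summarizes."""
    return sum(s * s for s in signal[start:start + WINDOW])


# First window that contains any of the tone
first_active_frame = next(
    s for s in range(0, len(signal) - WINDOW + 1, HOP) if frame_energy(s) > 0.0
)
# First individual sample that clearly departs from silence
waveform_onset = next(i for i, s in enumerate(signal) if abs(s) > 0.01)

early_ms = (waveform_onset - first_active_frame) / RATE * 1e3
print(first_active_frame, waveform_onset, f"window lights up {early_ms:.1f} ms early")
```

Any window that overlaps the start of the tone already shows energy, so the windowed view reacts before the waveform does. How early depends on the window size and on how the tool labels each frame.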

Once you have your time selection, take note of the time in the bottom left box you see above. In this case, it has the value “76.4207”. This is the offset in seconds from the start of the file as perceived by Raven Lite.

We need to map this to the actual time of the buffer of sound in which this point was found. You would expect the two to be the same, but life is not perfect: in reality, jitter in the system means that if you simply added the offset to the start time of the file, the calculated time of arrival could be out by several milliseconds. Instead, this value is used together with the tracking file to find out the actual time of the buffer of sound, as close as possible to when it really happened. We do this with the provided program “gps_event_time.py” as follows:

$ gps_event_time.py 2023-09-17_15-48-32.101997--audio_sbts1--2023-09-17_15-53-32.102047.tracking 76.4207
2023-09-17_15-49-48.522301

The parameters we pass are the path to the tracking file we are using and this offset in seconds from the window on the bottom left in Raven Lite at the point we selected for the start of the time event.

Great! We have our first recorder’s time of arrival:

2023-09-17_15-49-48.522301
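To see what the tracking file buys us here, we can compare this result with the naive calculation of simply adding the Raven Lite offset to the start time in the file name. This is only a sanity check of my own, assuming the first time stamp in the file name is the time of the first sample; it is not what gps_event_time.py does internally.

```python
from datetime import datetime, timedelta

STAMP = "%Y-%m-%d_%H-%M-%S.%f"

file_start = datetime.strptime("2023-09-17_15-48-32.101997", STAMP)  # from the file name
offset_s = 76.4207                                                   # read off in Raven Lite
reported = datetime.strptime("2023-09-17_15-49-48.522301", STAMP)    # tool output above

naive = file_start + timedelta(seconds=offset_s)
drift_s = (naive - reported).total_seconds()

print("naive   :", naive.strftime(STAMP))
print("tracked :", reported.strftime(STAMP))
print(f"difference: {drift_s * 1e6:.0f} us, or about {drift_s * 343 * 100:.1f} cm of sound travel")
```

In this recording the two differ by a little under half a millisecond, 76 seconds into the file. That is small, but it is already in the range that matters when you are aiming for sub-millisecond arrival times.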

Recorder #2

At this point, I’m going to diverge slightly from the analysis of this particular field test. It is often the case that the sound event is clearly visible in one of the recorders but barely detectable in one of the others. In cases like this, which are very common when localizing events like fireworks explosions over long distances, you need a little help in finding the event in the other recorders’ sound files. We need a place to zoom in and start looking.

We know the exact time from the perspective of one of the recorders, so it can be helpful to know the offset of this time within the recording file of one of the other recorders. We can make use of the provided helper program “time_diffs.py”. This program prints the differences in time between the time stamps provided on standard input. If we provide the start time of the second file along with the time stamp of the event, then we get the offset in seconds of this time in the second file, which is a hint as to where to start looking.

$ time_diffs.py 
2023-09-17_15-48-40.911472
2023-09-17_15-49-48.522301

0.000000
67.610829

Note, this is also useful if we do not wish to localize with map co-ordinates but wish to use relative co-ordinates in some program of our own.

In this case it suggests that if we look around the 67 second mark in the second file the air horn blast should be somewhere around there.
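If you want the same calculation inside a program of your own, a minimal sketch might look like the following. It mirrors the output shown above but is my own illustration, not the source of the helper program.

```python
from datetime import datetime

STAMP = "%Y-%m-%d_%H-%M-%S.%f"


def offsets_from_first(stamps):
    """Seconds elapsed between the first time stamp and each time stamp in the list."""
    times = [datetime.strptime(s.strip(), STAMP) for s in stamps if s.strip()]
    return [(t - times[0]).total_seconds() for t in times]


offsets = offsets_from_first([
    "2023-09-17_15-48-40.911472",  # start time of the second recorder's file
    "2023-09-17_15-49-48.522301",  # event time as heard by the first recorder
])
for value in offsets:
    print(f"{value:.6f}")
```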

If we look at the zoomed out graphs it is indeed somewhat difficult in this recording to find the air horn blast. It looks like this:

Blast2 full zoom
Where is the air horn blast?

But if we use the hint and look around the 67 second mark now we can clearly recognise the air horn blast:

Blast2

We now zoom in and select the precise time of the start of the event as before, for this and the last two recordings, and we have a collection of the times of arrival of the air horn blast at each of the recorders as follows:

sbts1: 2023-09-17_15-49-48.522301
sbts2: 2023-09-17_15-49-48.810330
sbts3: 2023-09-17_15-49-48.710824
sbts4: 2023-09-17_15-49-48.543376
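Before localizing, it can be reassuring to turn these into time differences and see what they mean in metres. This is a small sketch of my own that assumes a speed of sound of 343 m/s.

```python
from datetime import datetime

STAMP = "%Y-%m-%d_%H-%M-%S.%f"
SPEED = 343.0  # m/s, assumed (about 20 degrees Celsius)

arrivals = {
    "sbts1": "2023-09-17_15-49-48.522301",
    "sbts2": "2023-09-17_15-49-48.810330",
    "sbts3": "2023-09-17_15-49-48.710824",
    "sbts4": "2023-09-17_15-49-48.543376",
}

times = {name: datetime.strptime(stamp, STAMP) for name, stamp in arrivals.items()}
first = min(times.values())  # the recorder that heard the blast first
tdoa = {name: (t - first).total_seconds() for name, t in times.items()}

for name, dt in sorted(tdoa.items(), key=lambda item: item[1]):
    print(f"{name}: +{dt * 1e3:8.3f} ms, about {dt * SPEED:6.1f} m farther from the source")
```

The horn was closest to sbts1, with sbts4 only about 7 m farther away, while sbts2 was nearly 100 m farther away than sbts1.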

Great! Almost there. We need to augment those times with the co-ordinates of their respective recorders in order to provide the input for the localization program. Then we run the localize_event.sh program with the temperature in Celsius as its argument and the augmented arrival times as input, as in:

$ localize_event.sh 20
Enter GPS coordinates and timestamps. Press enter twice to finish.
51.014131667,5.813736667 2023-09-17_15-49-48.522301
51.015373333,5.811656667 2023-09-17_15-49-48.810330
51.016368332,5.814084879 2023-09-17_15-49-48.710824
51.015238333,5.815941667 2023-09-17_15-49-48.543376

Location: 51.014903961182846,5.814442712946816

Web links:

OpenStreetMap: https://www.openstreetmap.org/?mlat=51.014903961182846&mlon=5.814442712946816#map=15/51.014903961182846/5.814442712946816

Google Maps: https://www.google.com/maps?q=51.014903961182846,5.814442712946816&t=h&z=15

And out pops the localized location as co-ordinates, along with map links for both OpenStreetMap and Google Maps!
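For the curious, here is a rough sketch of how such a localization can be computed. To be clear, this is not the method localize_event.sh uses. It is a brute-force, coarse-to-fine grid search for the least-squares position, with a flat-earth projection around the recorders and a textbook speed-of-sound formula. All function names and constants are my own assumptions.

```python
import math
from datetime import datetime

STAMP = "%Y-%m-%d_%H-%M-%S.%f"
M_PER_DEG = 111_194.93  # metres per degree of latitude on a spherical Earth (assumed)


def speed_of_sound(temp_c):
    return 331.3 + 0.606 * temp_c  # linear textbook approximation, m/s


def localize(observations, temp_c=20.0):
    """observations: list of (lat, lon, time stamp string). Returns (lat, lon)."""
    c = speed_of_sound(temp_c)
    lat0 = sum(o[0] for o in observations) / len(observations)
    lon0 = sum(o[1] for o in observations) / len(observations)
    m_lat, m_lon = M_PER_DEG, M_PER_DEG * math.cos(math.radians(lat0))
    t_ref = datetime.strptime(observations[0][2], STAMP)
    # Recorder positions in local metres (east, north) plus relative arrival time
    recs = [((lon - lon0) * m_lon, (lat - lat0) * m_lat,
             (datetime.strptime(ts, STAMP) - t_ref).total_seconds())
            for lat, lon, ts in observations]

    def cost(x, y):
        # For the true source, distance - c * arrival_time is the same for every recorder
        e = [math.hypot(x - rx, y - ry) - c * t for rx, ry, t in recs]
        mean = sum(e) / len(e)
        return sum((v - mean) ** 2 for v in e)

    best_x = best_y = 0.0
    half = 400.0  # half-width of the first search box in metres
    for _ in range(5):
        step = half / 20
        candidates = [(best_x + i * step, best_y + j * step)
                      for i in range(-20, 21) for j in range(-20, 21)]
        best_x, best_y = min(candidates, key=lambda p: cost(*p))
        half = 2 * step  # zoom in around the best grid point
    return lat0 + best_y / m_lat, lon0 + best_x / m_lon


observations = [
    (51.014131667, 5.813736667, "2023-09-17_15-49-48.522301"),
    (51.015373333, 5.811656667, "2023-09-17_15-49-48.810330"),
    (51.016368332, 5.814084879, "2023-09-17_15-49-48.710824"),
    (51.015238333, 5.815941667, "2023-09-17_15-49-48.543376"),
]
lat, lon = localize(observations, temp_c=20.0)
print(f"Location: {lat:.6f},{lon:.6f}")
```

On the four observations above, a fit of this kind should land close to the location the tool reported. Small differences are to be expected, since the method, the projection and the speed of sound are all assumptions on my part. A grid search is slow and crude compared to a proper iterative solver, but it is easy to follow and it cannot diverge.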

How did we do? Well, the recorder locations were approximately as follows:

Recorder locations
Microphones placed on the 4 corners

I took a screenshot in Google Maps of where I was standing after blasting the horn:

Where it was
Where I blasted the air horn

And the following screenshot is where it localized to. I’m afraid that I neglected to take note of the actual co-ordinates of the toot at the time, but I estimate the difference to be about 1–2 m from the calculated location:

Calculated
Where the math said it was

That’s exciting!

After this I got together with a bunch of friends so that we had 6 recorders spread out over various parts of Limburg in the Netherlands, and had fun localizing the many illegal fireworks that are regularly set off. They became our test data to gain insights into the range and accuracy. The results are pretty impressive: I have been able to accurately localize the explosions over distances of several kilometers. In one localization, the result was indicated to be a car park in the center of a nearby town. I hopped in the car and drove there, and some 30 minutes later I could still smell the heavy smell of sulphur.

In another, closer localization, a launch container was found just 1–2 m from the calculated launch location and exactly where eye witnesses reported it to have occurred.

In other localizations a recorder more than 5 kilometers away helped determine the source location. We started to see patterns, repeated explosion source locations. Many of these seemed to be some kind of mortar with a small initial explosion and 3 seconds later a very large one. We could in some cases localize both the launch and the air burst. These were consistent with expectations based on the location and predominant wind direction.

I hope that you enjoyed this article and will be inspired to get together and localize for yourselves. The software for this project installs with just one command on any Raspberry Pi.

The code and instructions are in my GitHub repository.

In future articles I will go into the details, outlining the architecture of the sound recorder and more specific details of the commands.

Kim Hendrikse

I’m a die-hard computer Nexialist and author of the computer vision powered home security software “StalkedByTheState”.