Fun with Software-Defined Radios: Mapping the Spectrum in the Mission, SF
Written by Nick Penaranda, Software Engineer at Beep
TL;DR: We’re scanning the RF activity near our office in San Francisco. The data is live at spectrum-map.beepnetworks.com.
We recently got our hands on some software-defined radios (SDRs). Unlike hardware radios, certain functions are not permanently “hard-wired” in SDRs but rather can be controlled or implemented in software.
For example, let’s compare an FM radio module like this one and a simple, hypothetical SDR. The radio module has specialized circuits that enable it to tune to a certain frequency, receive and process those signals, then output an audio signal. In contrast, the SDR may consist solely of an antenna and an analog-to-digital converter, outputting only raw waveforms that must still be amplified, filtered, demodulated, and so on, before you have the same audio signal.
It seems like the SDR has only succeeded in creating more work for us, but it is precisely the lack of implementation that empowers SDRs: We are now no longer limited to the FM radio module’s operating band, nor frequency modulation, nor audio signals! [In reality, SDRs have their own limitations, conveniently omitted here.]
After playing around in gnuradio and gqrx — snooping around on local transit channels and listening to volleys of Wi-Fi traffic — we thought it’d be interesting to systematically survey the RF environment around our office. We automated the process and made a small Heroku app that lets us browse the data more effectively. Check it out here or read on for more details.
The Nuand BladeRF x115 is a relatively inexpensive SDR that has a generous 28MHz of bandwidth, tunable from 300MHz to 3.8GHz. You can do lots of things with this SDR, many of which would make the FCC very upset…
The BladeRF is connected to a MacBook Air running a modified version of osmocom’s spectrum_sense, sweeping the entire UHF band about once every 7 minutes, in 10MHz chunks. For each chunk, we tune to the center frequency, dwell for 0.25 seconds, then sample for 1 second. We then Fourier transform the sampled waveform into the frequency domain in 100kHz bins and dump that to a file. Repeat forever. This should give us nice clean cross-sectional data of the RF environment, right?
Well, that was the plan, anyway.
In reality, there are a host of problems with this approach, or at least with our expectations of the data we’d get. osmocom_spectrum_sense actually steps through the spectrum in 7.5MHz steps and discards the bins at the “tails” of each chunk. Furthermore, bins that lie on the boundary between two chunks are sampled and reported by both chunks, often with different measured powers. Another problem with the raw FFT data is the large peak at each chunk’s center frequency bin. This is the DC component of the baseband signal, which lands on the center frequency once the chunk is mapped back to RF, and which osmocom_spectrum_sense (more specifically, the bin_statistics_f block in gnuradio) does not remove. By the time we’re dealing with FFT data, correcting the spurious peaks becomes impossible. There are ways to correct this at the SDR level, but we couldn’t be bothered. So in our raw FFT data, we have arbitrary peaks at every chunk’s center frequency.
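To see why a DC offset shows up as a peak in the middle of each chunk, here is a toy sketch in plain Python. It is not the gnuradio code: the naive DFT, the offset value, and the tone position are all made up for illustration. A constant offset added to the complex samples lands entirely in the 0 Hz bin, which sits at the center of the spectrum once the bins are reordered around the tuned frequency.

```python
import cmath

def shifted_power_spectrum(samples):
    """Naive DFT power spectrum, reordered so index len(samples)//2 is 0 Hz offset."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        acc = sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, s in enumerate(samples))
        spectrum.append(abs(acc) ** 2 / n ** 2)
    half = n // 2
    # Move the negative-frequency half in front, like an "fftshift".
    return spectrum[half:] + spectrum[:half]

N = 64
TONE_BIN = 10              # hypothetical signal, 10 bins above the center
DC_OFFSET = 0.5 + 0.5j     # hypothetical receiver DC offset

samples = [0.1 * cmath.exp(2j * cmath.pi * TONE_BIN * i / N) + DC_OFFSET
           for i in range(N)]
power = shifted_power_spectrum(samples)
center = N // 2

print("center bin power:", round(power[center], 4))
print("tone bin power:  ", round(power[center + TONE_BIN], 4))
```

In this made-up case the offset dwarfs the real signal, which is roughly what the spikes in our raw data look like.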
Taken together, this means we have pretty rough-looking data that is essentially meaningless at the granularity at which we collected it. We could have gone through and processed the resulting data to average the bins at the boundaries, but that doesn’t really fix anything: It becomes average power in that particular bin between two sampling periods, plus whatever sideband and filtering effects each chunk contributes. What ground truth does this represent?
Also, the signals are intermittent. Each 7.5MHz chunk is only being sampled for one second every seven minutes. That’s like scanning a 270 megapixel (that’s roughly 16400x16400, by the way) video feed for interesting but briefly visible objects, one 86x86 square at a time, in realtime! You are bound to miss the vast majority of said objects. And I haven’t even mentioned antenna characteristics, but don’t worry, even more bad news there. The point here is this:
This data is basically useless as it is. GIGO and all that.
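To put rough numbers on how much we miss, here is a back-of-the-envelope sketch. The band edges (300MHz to 3GHz) are our assumption for "the entire UHF band", retuning overhead is ignored, and the burst model is deliberately simplistic: one transmission of a given length per survey period, starting at a uniformly random time.

```python
BAND_START_MHZ = 300.0   # assumed lower edge of the sweep
BAND_STOP_MHZ = 3000.0   # assumed upper edge of the sweep
STEP_MHZ = 7.5           # actual step size, from the text
DWELL_S = 0.25           # settle time after tuning, from the text
SAMPLE_S = 1.0           # sampling time per chunk, from the text

def survey_timing():
    """Return (number of chunks, seconds per full survey)."""
    chunks = int((BAND_STOP_MHZ - BAND_START_MHZ) / STEP_MHZ)
    return chunks, chunks * (DWELL_S + SAMPLE_S)

def catch_probability(burst_s, surveys):
    """Chance that at least one of `surveys` passes overlaps a burst.

    Toy model: one burst of length burst_s per survey period, at a random
    time, independent from one survey to the next.
    """
    _, period_s = survey_timing()
    p_single = min(1.0, (SAMPLE_S + burst_s) / period_s)
    return 1.0 - (1.0 - p_single) ** surveys

chunks, period_s = survey_timing()
print(chunks, "chunks,", period_s / 60, "minutes per survey")
for surveys in (1, 192, 1344):   # one pass, about a day, about a week
    print(surveys, "surveys:", round(catch_probability(0.5, surveys), 3))
```

Under these assumptions a single pass almost never catches a half-second transmission, but the odds improve steadily as surveys accumulate.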
OK, so what questions can we answer? Really, we just want an idea of what bands are being used, and when. We don’t actually care about the absolute power in a given band, or even power in one band relative to another (more on this later). We look simply for non-zero powers, or any variation in a given band. Thankfully, we can see those features in this data. Sample for long enough and even those short, intermittent signals will show up as a bump in one of the surveys.
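A minimal sketch of that "any variation" test follows. The function name, the 3 dB threshold, and the sample readings are hypothetical, and we assume each survey is a list of per-band powers in dB. Note that a perfectly steady signal would not be flagged this way; it would have to be compared against a noise floor instead.

```python
def active_bands(surveys, min_swing_db=3.0):
    """Return indices of bands whose power varies by at least min_swing_db.

    surveys: list of surveys, each a list of per-band power readings in dB.
    """
    n_bands = len(surveys[0])
    active = []
    for band in range(n_bands):
        readings = [survey[band] for survey in surveys]
        if max(readings) - min(readings) >= min_swing_db:
            active.append(band)
    return active

# Three made-up surveys over four bands.
surveys = [
    [-90.0, -89.5, -70.0, -91.0],
    [-90.2, -75.0, -70.1, -90.8],   # band 1 bumps up in the second survey
    [-89.9, -89.7, -69.9, -91.1],
]
print(active_bands(surveys))
```

Band 2 is strong but steady here, so only band 1 counts as "something changed."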
But before we start looking for these patterns, we must first process the data to a level of granularity appropriate for the questions we want to answer. Fortunately, the FCC has divided the UHF band into subbands whose widths are typically on the order of megahertz. We can re-bin our raw FFT data into larger, more stable bins by averaging groups of bins together and calling that value the “average spectral power centered at X frequency.” We average away the peaks and duplicate bins, and keep a simplified cross-section of the UHF spectrum.
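The re-binning step can be sketched in a few lines. This is an illustration rather than our production code: the function name is made up, dropping the leftover tail is an assumption, and the values should be linear power (averaging dB values directly gives a different, less meaningful number).

```python
def rebin(samples, factor):
    """Average consecutive groups of `factor` samples.

    Any leftover samples that do not fill a whole group are dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]

# A made-up strip of nine raw bins with one spurious spike (the 9).
raw = [1, 1, 9, 1, 2, 2, 2, 2, 5]
print(rebin(raw, 4))   # [3.0, 2.0]
```

The spike is still there in the first aggregate bin, just diluted, which is exactly the tradeoff: bigger groups tame the spikes but also blur real signals.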
The graphs below plot three consecutive surveys with four different bin sizes (ignore the numbers at the top of each graph). As we increase the bin size — and consequently, reduce the number of bins — we smooth out the DC peaks as well as make our measurements more stable over time. But, past a certain point, we begin to smooth out useful information too and approach a single scalar value for the entire spectrum. Again, useless!
As a convention, we call each pass through the entire UHF band at a single point in time a survey, and each FFT bin (raw or aggregate) a sample. The raw FFT data consists of 27,359 samples over 360 chunks (that would be 27,000 samples even, if not for the duplicate bins between chunks). We arbitrarily decided to reduce this by a factor of 400, 200, and 100, providing us with 68, 136 and 273 aggregated samples respectively.
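For the curious, the sample counts work out as follows. The one assumption here is how the duplicates arise: we take it to be one shared bin per pair of adjacent chunks, which is what makes the total come out to 27,359.

```python
CHUNKS = 360
BINS_PER_CHUNK = 75              # 7.5MHz step divided into 100kHz bins
SHARED_BOUNDARIES = CHUNKS - 1   # assumption: one duplicated bin per adjacent pair

raw_samples = CHUNKS * BINS_PER_CHUNK + SHARED_BOUNDARIES
# Whole groups only; the leftover bins at the end do not form a full group.
aggregated = {factor: raw_samples // factor for factor in (400, 200, 100)}

print(raw_samples)   # 27359
print(aggregated)    # {400: 68, 200: 136, 100: 273}
```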
We built a small Heroku app to collect and view the data. The server is written in Go, and the frontend uses Bootstrap and Chart.js to display both cross-sectional and longitudinal spectral power data. The interface is admittedly bare but you can get a rough idea of what’s going on:
For any given survey, say, this one, we have some distinct peaks that indicate activity at or near that frequency. The labels in the graph above are rough estimates of where each peak or clump of peaks is.
The NTIA provides a U.S. Frequency Allocation map that we can use to get a rough idea of who or what is using each band:
The peaks we see at ~300MHz and 390MHz are government fixed and mobile radios. If we use a program like gqrx, we can look around more closely to see what’s actually going on in this band:
Not much going on here, just a few weak control signals… Let’s look at one of the more energetic bands:
From the NTIA chart we see that we are in the broadcast TV band. Searching through this list, it appears that this is channel 29, and that the constant signal at 560.31MHz is the ATSC pilot signal.
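That identification is easy to sanity-check. US UHF TV channels are 6MHz wide starting at 470MHz for channel 14, and the ATSC pilot sits roughly 310kHz above the lower edge of the channel. The helper names below are made up for this sketch.

```python
ATSC_PILOT_OFFSET_MHZ = 0.309441   # pilot offset above the lower channel edge
UHF_FIRST_CHANNEL = 14
UHF_FIRST_EDGE_MHZ = 470.0
CHANNEL_WIDTH_MHZ = 6.0

def uhf_channel_lower_edge_mhz(channel):
    """Lower band edge of a US UHF TV channel (14 and up)."""
    if channel < UHF_FIRST_CHANNEL:
        raise ValueError("only UHF channels (14 and up) are handled here")
    return UHF_FIRST_EDGE_MHZ + CHANNEL_WIDTH_MHZ * (channel - UHF_FIRST_CHANNEL)

def atsc_pilot_mhz(channel):
    """Expected ATSC pilot frequency for a US UHF TV channel."""
    return uhf_channel_lower_edge_mhz(channel) + ATSC_PILOT_OFFSET_MHZ

print(uhf_channel_lower_edge_mhz(29))   # 560.0
print(round(atsc_pilot_mhz(29), 2))     # 560.31
```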
In the 800MHz range, we find lots of public service communications channels. Below, each column is a different channel and each line segment is someone keying their radio:
Unfortunately, our BladeRF is busy collecting data and these shots were taken with an RTL-SDR that has a max frequency of ~1.7GHz, preventing us from looking at cellular and Wi-Fi traffic. The takeaway here is that our wide-band surveys give us a very rough idea of where activity is, then we can make another, more detailed “live” pass with the bandwidth-constrained SDR.
What we learned, or: What not to do.
I hinted earlier that we discovered lots of problems with our approach. The biggest of these is the fact that the UHF spectrum is huge with respect to the bandwidth of the signals found on it. And, having only one SDR scanning the aether, we miss lots of activity that isn’t continuous (that is, basically all RF communication). Given the constraint of one scanner, we are forced to balance two opposing parameters: The quantity of surveys versus the “sensitivity” of each individual survey to intermittent signals. After looking at the data we’ve collected to date, it seems prudent to increase our dwell time, thereby increasing sensitivity and lowering the number of surveys we get per day.
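Here is a rough sketch of that tradeoff with a single scanner. It assumes that the "dwell time" we would increase is the sampling window per chunk, that the 0.25 second settle time stays fixed, and that retuning costs nothing extra.

```python
CHUNKS = 360
SETTLE_S = 0.25          # pause after retuning, from the text
SECONDS_PER_DAY = 86400

def tradeoff(sample_s):
    """Return (surveys per day, fraction of time each chunk is observed)."""
    survey_s = CHUNKS * (SETTLE_S + sample_s)
    return SECONDS_PER_DAY / survey_s, sample_s / survey_s

for sample_s in (1.0, 2.0, 5.0):
    per_day, coverage = tradeoff(sample_s)
    print(f"{sample_s:>4} s window: {per_day:6.1f} surveys/day, "
          f"{coverage * 100:.3f}% of the time on each chunk")
```

Longer windows cost surveys quickly, while the share of time spent listening to any one chunk creeps up only slightly: with one radio and 360 chunks it can never exceed 1/360.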
The solution is to buy more SDRs and have them scanning different parts of the spectrum at different times. The ideal and completely impractical solution is to have 270 radios constantly listening and reporting activity in each 10MHz chunk. At that point, the benefit of using SDRs is mainly bragging rights and nothing else.
The other big issue preventing us from getting more utility out of the data is the receive antenna attached to the BladeRF. Simply put, the antenna we are using is less than ideal for this sort of use case. It has low gain (read: non-directional) and is narrowly tuned to a specific frequency: 2.4GHz Wi-Fi. Because this was just intended as a side project, we didn’t want to spend hundreds or thousands of dollars for a good ultra wideband, omnidirectional antenna. Instead, we committed a crime of convenience and grabbed the first antenna we found in the lab, a reference antenna from Texas Instruments tuned to 2440MHz.
After finding some suspiciously quiet bands where there ought to be activity, we looked more carefully for a suitable antenna. As it turns out, a Wi-Fi hardware startup only has Wi-Fi tuned antennas on hand. So, we built one. Out of pennies. And some coax.
As funny as it may appear, this is a legitimate design called a planar elliptical (or disc) dipole antenna. We switched to this antenna at roughly 5pm on Aug 31, and the difference in characteristics between the two antennas is easily visible in the surveys immediately before and after the switch:
Of course, this antenna isn’t better, per se, just more appropriate for our goals. We end up attenuating some signals in order to strengthen others. Our goal is not to extract data from or measure the power of any given signal, only to detect them.
But say we did want to assign meaning to the power values we’re seeing at these bands. To do that, we would need to characterize the antenna. Roughly, the process involves sending a signal to the antenna and, using special instruments at the antenna’s input, measuring both the voltage we send out and the voltage that gets reflected back, across the range of frequencies we are interested in. Characterizing the antenna informs us of its sensitivity at various frequencies. We’d then normalize observed spectral power by scaling each frequency bin by the sensitivity in that band. Then, we have relative power between bands! But, we are neither equipped nor motivated to do that for this project, so maybe later.
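As a sketch of what that normalization might look like: the reflected-to-forward voltage ratio gives a reflection coefficient, and one minus its magnitude squared is the fraction of power the antenna actually accepts. All the numbers and function names below are hypothetical, powers are assumed to be in linear units, and a real characterization would also have to account for the antenna's efficiency and radiation pattern, not just mismatch.

```python
def accepted_power_fraction(v_forward, v_reflected):
    """Fraction of power not reflected at the antenna input (1 - |gamma|^2)."""
    gamma = abs(v_reflected / v_forward)
    return 1.0 - gamma ** 2

def normalize(power_by_band, sensitivity_by_band):
    """Scale each band's observed linear power by the antenna's sensitivity there."""
    return [p / s for p, s in zip(power_by_band, sensitivity_by_band)]

# Made-up measurements: well matched near 2440MHz, badly matched near 400MHz.
sensitivity = [
    accepted_power_fraction(1.0, 0.9),   # ~400MHz band
    accepted_power_fraction(1.0, 0.2),   # ~2440MHz band
]
observed = [1.9e-9, 9.6e-9]              # hypothetical linear power readings
print([round(s, 2) for s in sensitivity])
print(normalize(observed, sensitivity))
```

In this invented case the two bands look five times apart in the raw data but turn out to be about equal once the antenna's mismatch is divided out.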
Even if we did normalize spectral power with respect to the antenna characteristics, our ability to draw conclusions from such data is still limited. A host of interference effects that spuriously increase or decrease observed power at the antenna would prevent us from making even basic assertions about the source of the signal, such as its distance from our surveying machine. These are the same effects responsible for that mysterious one-foot square in your home where you get inexplicably dismal cell reception, or rhythmic bursts of static on the radio when you’re driving down the road (a phenomenon called “picket-fencing”).
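A toy two-path model shows how dramatic these effects can be. It assumes a direct signal and a single reflection of configurable relative amplitude, and ignores path loss and any phase flip at the reflecting surface; the function name and the 850MHz example are ours.

```python
import cmath

SPEED_OF_LIGHT = 299_792_458.0   # m/s

def two_path_power(freq_hz, extra_path_m, reflected_amplitude=1.0):
    """Received power, relative to the direct path alone, with one reflection.

    extra_path_m is how much farther the reflected copy travels.
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    phase = 2 * cmath.pi * extra_path_m / wavelength
    total = 1.0 + reflected_amplitude * cmath.exp(-1j * phase)
    return abs(total) ** 2

freq = 850e6
half_wave = SPEED_OF_LIGHT / freq / 2    # about 17.6 cm at 850MHz
print(two_path_power(freq, 0.0))         # paths in phase: 4x the power
print(two_path_power(freq, half_wave))   # paths out of phase: nearly nothing
```

Moving the receiver a fraction of a wavelength is enough to swing between those two extremes, which is why a single power reading says so little about the transmitter.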
Looking back, we severely underestimated the difficulty of conducting a wideband RF survey. Our naivete was not apparent until we tried to coerce meaning from the data. But through failure, we learn.
So what are we going to do with this data? For now, we’ll continue collecting it, just in case. We envisioned a service that overlays a heatmap of RF activity over San Francisco — similar to cell phone signal strength maps — but for a range of frequencies and services. One where you could point to a city block and say, “There’s an LTE tower here,” or “This building must be a call center.” Though we are convinced that the only conclusions to be drawn from this data, as it is, are very general and obvious, it has the potential to grow, either through more useful transformations and analyses, or simply more data.
We built this system partially out of curiosity, partially for fun, but mostly as a learning exercise. Understanding the pitfalls of RF communications will be extremely important as we move forward with other projects. With that in mind, this project — flawed as it was — provided invaluable lessons. Sometimes the best way to answer the question, “Why doesn’t this exist?” is to try and build it yourself.
That said, we still have this neat backend to ingest data and possibly do more useful things with in the future. If you’d like to help or have ideas for improving our process, we’d love to hear from you. Send us a note at firstname.lastname@example.org.