Authenticating ‘low-end wireless sensors’ with deep learning + SDR

Nihal Pasham
Aug 3, 2019 · 5 min read
Machine learning with RF data. Software/firmware identities can be spoofed. An un-spoofable physical identity is needed.

Any device that emits radio waves (i.e. a radio transmitter) has a unique RF fingerprint. RF fingerprints arise from slight variations within hardware manufacturing tolerances: no two transmitters are exactly alike, so each one emits a slightly different signal, even when the make and model are the same.

The reason: an RF transmit chain is built from analog and mixed-signal components such as digital-to-analog converters, band-pass filters, frequency mixers, and power amplifiers. A tiny variation in any of these hardware primitives yields a slightly altered radio signal. In technical terms, manufacturing variations affect the following properties of an RF signal: I/Q imbalance, phase noise, carrier frequency and phase offsets, harmonic distortion, and power amplifier distortion. Together, these make up a unique RF fingerprint.
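To make that a bit more concrete, here is a minimal NumPy sketch of how two of those impairments (I/Q imbalance and a carrier frequency offset) change an otherwise identical baseband signal. The function name `apply_impairments`, the simplified imbalance model and all the numbers are my own illustrative assumptions, not measurements from real hardware.

```python
import numpy as np

def apply_impairments(iq, gain_imbalance_db=0.0, phase_imbalance_deg=0.0,
                      cfo_hz=0.0, sample_rate=1e6):
    """Apply a simplified I/Q imbalance + carrier frequency offset model.

    Hypothetical helper: models s(t) = I*cos(wt) - g*Q*sin(wt + phi),
    followed by a constant frequency offset of cfo_hz.
    """
    g = 10 ** (gain_imbalance_db / 20.0)      # Q-branch gain relative to I
    phi = np.deg2rad(phase_imbalance_deg)     # quadrature phase error
    i, q = iq.real, iq.imag
    i_out = i - g * q * np.sin(phi)
    q_out = g * q * np.cos(phi)
    n = np.arange(len(iq))
    offset = np.exp(2j * np.pi * cfo_hz * n / sample_rate)
    return (i_out + 1j * q_out) * offset

# The same ideal 10 kHz tone, sent through two "different" transmitters.
fs = 1e6
n = np.arange(1024)
ideal = np.exp(2j * np.pi * 10e3 * n / fs)
radio_a = apply_impairments(ideal, 0.2, 1.0, 150.0, fs)    # made-up values
radio_b = apply_impairments(ideal, -0.3, -2.0, -400.0, fs)  # made-up values

# How far each radio's output drifts from the ideal signal.
print(np.abs(radio_a - ideal).max(), np.abs(radio_b - ideal).max())
```

The two outputs carry the same "message" but differ sample by sample, and that difference is the kind of pattern a fingerprinting system tries to learn.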

A simplified block diagram of an RTL-SDR (i.e. a radio) and its components.

So, if every transmitter is bound to differ from every other, can we not use these un-spoofable distinguishing markers to uniquely identify a radio transmitter? In short, the answer is yes.

With that context, let's get to the whole point of this blog.

The Goal:

Explore the feasibility of an authentication system to uniquely identify a specific radio via its RF transmissions and provide an additional layer of security for low-end wireless devices.

Conventional RF fingerprinting involves detecting a transient or steady-state signal and extracting the fingerprint from it, i.e. building a database of RF fingerprints and using it to uniquely identify a transmitter. Manually extracting features this way takes time, requires detailed knowledge of the signals, and can get complicated depending on the RF characteristics you’re trying to fingerprint, especially for low-end edge devices.
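For a feel of what the "detection" half of that conventional pipeline looks like, here is a rough sketch of an energy-based burst detector in NumPy. It is only an illustration: `find_bursts`, the window length and the threshold ratio are assumptions of mine, and using the median as the noise floor assumes the transmitter is silent for most of the capture.

```python
import numpy as np

def find_bursts(iq, window=64, threshold_ratio=4.0):
    """Return (start, end) sample indices where smoothed power exceeds the noise floor.

    Hypothetical helper: `end` is exclusive, and the noise floor is taken as
    the median of the smoothed power (valid only if bursts are the minority).
    """
    power = np.abs(iq) ** 2
    kernel = np.ones(window) / window
    smoothed = np.convolve(power, kernel, mode="same")   # moving-average power
    noise_floor = np.median(smoothed)
    active = smoothed > threshold_ratio * noise_floor
    # Pad with False so every burst has both a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[0::2], edges[1::2])]

# Synthetic capture: weak noise with one strong burst in the middle.
rng = np.random.default_rng(0)
capture = 0.01 * (rng.standard_normal(10_000) + 1j * rng.standard_normal(10_000))
capture[3000:4000] += 1.0
bursts = find_bursts(capture)
print(bursts)
```

The detected edges land slightly outside the true burst because the moving average smears the power over `window` samples. Everything after this step (picking and measuring features by hand) is the part a neural network is meant to replace.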

But if you think about it, fingerprinting is mostly about learning fine-grained patterns in hardware-specific imperfections. So, all you probably need is a good pattern detector, like a deep-learning neural network.

Machine learning has had remarkable success in image recognition, thanks to breakthrough advances in deep-learning algorithms, chief among them Deep Convolutional Neural Networks. DCNNs (for short) form the backbone of many modern computer vision systems. At their core, though, CNNs are just very good feature extraction engines (in other words, they’re very good at pattern recognition). So I thought to myself: why not apply DCNNs to our problem?

To my surprise, I stumbled upon some research with promising results in this exact area: http://www.ece.neu.edu/fac-ece/ioannidis/static/pdf/2018/radio_identification.pdf

What follows is an attempt at implementing a PoC based on this research. Preliminary observation: using cheap hardware, it classifies my 2 test emitters with 97% accuracy, with the best results at distances of 10 ft or shorter.
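As a reminder of what an accuracy figure like that measures, here is a small sketch of a confusion matrix and accuracy calculation for a 2-radio classifier. The labels below are toy values I made up to show the arithmetic; they are not results from the PoC, and the helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are the true radio, columns are the predicted radio."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def accuracy(cm):
    """Fraction of examples on the diagonal (correctly identified)."""
    return np.trace(cm) / cm.sum()

# Toy labels only: 4 windows from radio 0 and 4 from radio 1.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)            # [[3 1]
                     #  [0 4]]
print(accuracy(cm))  # 0.875
```

The confusion matrix is worth looking at alongside the single accuracy number, since it shows which radio gets mistaken for which.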

The set-up:

  • Deep learning libraries: TensorFlow as my backend and TFLearn as the high-level API
  • For raw radio data collection: an RTL-SDR with a pretty basic antenna, the pyrtlsdr and NumPy libraries, and 2 standard garage door remotes operating at 433 MHz
  • Programming environment: Visual Studio Code, GitHub
  • Miscellaneous stuff: a bit of SDR know-how, a little experimentation in a Jupyter notebook, and a couple of hours of uninterrupted peace.
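With that set-up, grabbing raw I/Q samples could look roughly like the sketch below. The helper `capture_reads` and its defaults are my assumptions (the post only says the remotes operate at 433 MHz, so the exact carrier may differ). It accepts any object that exposes pyrtlsdr's `sample_rate`, `center_freq`, `gain` and `read_samples`, which also makes it easy to try without a dongle plugged in.

```python
import numpy as np

def capture_reads(sdr, num_reads=10, samples_per_read=256 * 1024,
                  sample_rate=1.0e6, center_freq=433e6, gain="auto"):
    """Configure an SDR object and collect several blocks of I/Q samples.

    Hypothetical helper: returns a complex64 array of shape
    (num_reads, samples_per_read).
    """
    sdr.sample_rate = sample_rate    # samples per second
    sdr.center_freq = center_freq    # Hz; assumed tuning for the remotes
    sdr.gain = gain
    reads = [np.asarray(sdr.read_samples(samples_per_read), dtype=np.complex64)
             for _ in range(num_reads)]
    return np.stack(reads)

# With real hardware and pyrtlsdr installed, usage would be along these lines:
#
#   from rtlsdr import RtlSdr
#   sdr = RtlSdr()
#   try:
#       radio_0 = capture_reads(sdr)   # hold down the button on remote #0
#   finally:
#       sdr.close()
```

Each remote would be captured separately (ideally over several sessions and distances) so that every block of samples can be labeled with the radio it came from.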

Steps:

  1. Collect raw radio data (I/Q samples) over multiple transmissions via the RTL-SDR hardware and the pyrtlsdr library
  2. Label, prepare and store the data with NumPy in a format your neural network can consume
  3. Define your neural network with the TFLearn API on top of TensorFlow
  4. Train the neural network with labeled data for about 50–100 epochs
  5. Use the trained model to make predictions
  6. Evaluate for accuracy.

A typical radio identification workflow with deep learning. Both training and the pre-trained models can be deployed on, say, an edge/gateway device.
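Step 2 is mostly array shuffling, so here is a minimal sketch of one way to do it with NumPy: slice each capture into fixed-length windows, split I and Q into two rows, and attach one-hot labels. The function names, the 128-sample window and the `(N, 2, window, 1)` layout are assumptions for illustration, not necessarily what the scripts in the repo do.

```python
import numpy as np

def make_examples(iq, label, window=128, num_classes=2):
    """Slice one radio's complex samples into (N, 2, window, 1) float32 examples."""
    usable = (len(iq) // window) * window          # drop the ragged tail
    frames = iq[:usable].reshape(-1, window)
    x = np.stack([frames.real, frames.imag], axis=1).astype(np.float32)
    x = x[..., np.newaxis]                         # trailing channel axis
    y = np.zeros((len(frames), num_classes), dtype=np.float32)
    y[:, label] = 1.0                              # one-hot label
    return x, y

def build_dataset(captures, window=128, seed=0):
    """captures: dict mapping integer label (0..K-1) -> complex sample array."""
    parts = [make_examples(iq, label, window, len(captures))
             for label, iq in captures.items()]
    x = np.concatenate([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    order = np.random.default_rng(seed).permutation(len(x))  # shuffle radios together
    return x[order], y[order]

# Stand-in data: random samples in place of two real captures.
rng = np.random.default_rng(1)
fake_captures = {
    0: rng.standard_normal(1000) + 1j * rng.standard_normal(1000),
    1: rng.standard_normal(1300) + 1j * rng.standard_normal(1300),
}
x, y = build_dataset(fake_captures)
print(x.shape, y.shape)
```

The resulting arrays can be stored with something like `np.savez_compressed("dataset.npz", x=x, y=y)` and loaded back when it's time to train.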

The code for the PoC is available at https://github.com/nihalpasham/fingerprinting_radios_w_ML. It includes scripts to:

  1. Capture, prepare, label and format I/Q data-sets
  2. Define and train a DCNN

The chosen DCNN (from the paper) has 2 convolutional layers and 2 fully connected layers, and classifies 2 distinct radios.
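To show how data flows through a network of that shape, here is a framework-free forward pass in plain NumPy. To be clear, this is a stand-in and not the TFLearn code from the repo: the filter counts, kernel sizes (1x7 and 2x7), hidden width and 128-sample window are placeholder assumptions, biases are left out for brevity, and the weights are random, so the output probabilities mean nothing until the network is trained.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def relu(x):
    return np.maximum(x, 0.0)

def conv_valid(x, w):
    """'Valid' cross-correlation. x: (C_in, H, W), w: (C_out, C_in, kH, kW)."""
    kh, kw = w.shape[2:]
    patches = sliding_window_view(x, (kh, kw), axis=(1, 2))  # (C_in, H', W', kH, kW)
    return np.einsum("chwij,ocij->ohw", patches, w)

def init_params(rng, f1=8, f2=8, hidden=32, classes=2, window=128):
    """Random placeholder weights for 2 conv layers + 2 fully connected layers."""
    flat = f2 * (window - 12)   # width shrinks by 6 per 7-tap conv layer
    return {
        "w1": rng.normal(0, 0.1, (f1, 1, 1, 7)),   # conv 1: along time, per I/Q row
        "w2": rng.normal(0, 0.1, (f2, f1, 2, 7)),  # conv 2: spans both I and Q rows
        "w3": rng.normal(0, 0.1, (flat, hidden)),  # fully connected 1
        "w4": rng.normal(0, 0.1, (hidden, classes)),  # fully connected 2 (output)
    }

def forward(iq_window, p):
    """Map one window of complex samples to class probabilities."""
    x = np.stack([iq_window.real, iq_window.imag])[np.newaxis]  # (1, 2, W)
    x = relu(conv_valid(x, p["w1"]))       # (f1, 2, W-6)
    x = relu(conv_valid(x, p["w2"]))       # (f2, 1, W-12)
    x = relu(x.reshape(-1) @ p["w3"])      # (hidden,)
    logits = x @ p["w4"]                   # (classes,)
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
params = init_params(rng)
window = rng.standard_normal(128) + 1j * rng.standard_normal(128)
probs = forward(window, params)
print(probs)
```

In a real framework the same structure is a handful of layer calls plus a training loop. The point here is only that I and Q enter as two rows and the second convolution is where the network first gets to combine them.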

Requirements:

  • Any piece of low-cost SDR hardware (sub-1 GHz will do for most IoT stuff).
  • Any IoT edge device capable of running a deep learning model.
  • Robust RF data samples (i.e. samples should cover everything from low to high SNR, temperature variations, injected noise, etc.). A model is only as good as its data.
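One cheap way to widen the SNR range of a data-set is to inject noise into clean captures in software. Below is a minimal sketch that adds white Gaussian noise at a chosen SNR. The function `add_awgn` is a hypothetical helper of mine, and synthetic noise is only a rough substitute for recording in genuinely noisy conditions.

```python
import numpy as np

def add_awgn(iq, snr_db, rng):
    """Return iq plus complex white Gaussian noise at the requested SNR (in dB)."""
    signal_power = np.mean(np.abs(iq) ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    # Split the noise power equally between the I and Q components.
    scale = np.sqrt(noise_power / 2.0)
    noise = scale * (rng.standard_normal(iq.shape) + 1j * rng.standard_normal(iq.shape))
    return iq + noise

# Make several noisy copies of one clean window (a synthetic tone here).
rng = np.random.default_rng(0)
clean = np.exp(2j * np.pi * 0.01 * np.arange(100_000))
augmented = {snr: add_awgn(clean, snr, rng) for snr in (0, 5, 10, 20)}
```

Training on a mix of such copies gives the model at least some exposure to low-SNR inputs before it meets them in the field.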

Benefits: no need to

  • Rely on higher-level authentication protocols
  • Rely on schemes involving encryption, challenge-response pairs, etc.
  • Manage a database of stored credentials
  • Deal with masquerading or impersonation attacks

Challenges:

  • Accuracy drops progressively as the distance (range) between transmitter and receiver increases.
  • The performance (speed) of the trained model still needs evaluation.
  • The computational overhead on targets such as low-end edge devices is an open question. This is just an early PoC and hasn’t been put through a full suite of tests, such as low-SNR scenarios.
  • My RTL-SDR is a cheap $25 dongle that has neither the bandwidth nor the frequency range to capture high-frequency RF signals such as Bluetooth, WiFi, etc., and it struggles to sample beyond 1 million samples/s with the pyrtlsdr library.

Credits:

Deep Learning Convolutional Neural Networks for Radio Identification: http://www.ece.neu.edu/fac-ece/ioannidis/static/pdf/2018/radio_identification.pdf

RF Machine Learning Systems (RFMLS): https://www.darpa.mil/attachments/RFMLSIndustryDaypublicreleaseapproved.pdf

Data Driven Investor

from confusion to clarity not insanity

Nihal Pasham

Written by

Product Security | IoT Edge & Cloud Security | Security Strategist | Adversarial Resilience & Neural Networks
