cuSignal 0.13 — Entering the Big Leagues and Focused on Screamin’ Streaming Performance

Adam Thompson
Mar 24 · 5 min read

Authors: Matt Nicely (NVIDIA); John Ferguson, Dan Bryant, and Peter Witkowski (Deepwave Digital)

What’s New in cuSignal?

Last fall, the cuSignal development team announced their standing as the newest member of RAPIDS, the NVIDIA-led GPU-accelerated data science project. If you are late to the party, cuSignal is a GPU-accelerated version of the popular SciPy Signal library, aiming to deliver orders-of-magnitude GPU speedups for common signal processing functions.

Since then, we’ve been hard at work on bringing the code-base up to community and project standards, and have been engaging with the user-community, fixing bugs, adding features, building docs, and generally making stuff faster.

Notably, in cuSignal v0.13, we’ve added:

  • Anaconda installation support — If you’re a conda user on a Linux system, simply type:
# Stable - Coming soon in RAPIDS version >= 0.13
conda install -c rapidsai -c conda-forge cusignal
# Nightly - Available now
conda install -c rapidsai-nightly -c conda-forge cusignal
  • Enabling an aarch64 build — While Jetson users will still need to build the library from source, we’re actively working on enabling an aarch64 build. Windows users will need to continue to build from source.
  • GPU-accelerated acoustics — If you’re interested in source separation or speech signal processing, we’ve added support for both the real and complex cepstrum.
  • Speed, speed, speed — We overhauled our polyphase resampler (resample_poly/upfirdn) to use raw CuPy CUDA kernels rather than Numba CUDA kernels. This change yields a ~2x speedup and is enabled by default.
  • Full integration into RAPIDS — cuSignal follows the overall project’s coding, documentation, project planning, and CI/CD standards.
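
For readers new to the resampler mentioned above, here is a minimal sketch of what an upfirdn-style operation computes: upsample by inserting zeros, apply an FIR filter, then downsample. This is a plain-Python reference for illustration only, not cuSignal's CUDA kernel; the function name upfirdn_reference is hypothetical, and its edge handling may differ from the library's.

```python
def upfirdn_reference(h, x, up, down):
    """Zero-stuff x by `up`, convolve with FIR taps h, keep every `down`-th sample.

    Illustrative reference only (hypothetical helper, not part of cuSignal).
    """
    # Upsample: place each input sample `up` slots apart, zeros in between.
    stuffed = [0.0] * (len(x) * up)
    stuffed[::up] = x

    # Full linear convolution of the zero-stuffed signal with the filter taps.
    n_out = len(stuffed) + len(h) - 1
    filtered = [
        sum(h[k] * stuffed[n - k] for k in range(len(h)) if 0 <= n - k < len(stuffed))
        for n in range(n_out)
    ]

    # Downsample: keep every `down`-th filtered sample.
    return filtered[::down]


# A two-tap filter of ones repeats each sample when upsampling by 2.
held = upfirdn_reference([1, 1], [1, 2, 3], up=2, down=1)
print(held)
```

Most of the multiplications in this naive version hit inserted zeros or produce samples that are then thrown away. A polyphase implementation avoids that wasted work, which is part of why the resampler is a natural target for optimization.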

Online Signal Processing with cuSignal

If you’ve taken the time to browse cuSignal’s published results, you may have thought, “Geez, these speedups are compelling, but they’re all for really large signals; I don’t have 100 billion samples to process!”

We hear you.

Many signal processing workflows occur online, meaning that samples are streamed from a sensor to the GPU as the application is running. To reduce the probability of data loss during streaming, small buffers are used to move data from the sensor to the data processor. cuSignal implements a shared memory buffer between CPU and GPU (allocated with get_shared_mem in the example later in this post) to reduce the overhead of this data movement.

Interfacing directly with a sensor has long been viewed as the domain of FPGAs; while FPGAs offer ultra-low latency and deterministic performance, they’re often difficult to program. By enabling signal processing workflows to be built from Python, cuSignal drastically reduces the learning curve for building real-time solutions. Using cuSignal 0.13 and its refactored polyphase resampler, one of the RAPIDS contributors, Deepwave Digital, was able to achieve real-time performance on their AIR-T, a GPU-enabled software-defined radio (SDR), at 62.5 MSPS of complex64 samples for a total of 3.7 Gbps throughput.
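
As a quick sanity check on that throughput figure, the arithmetic can be sketched as follows. It assumes complex64 means 8 bytes per sample (two 32-bit floats) and that the quoted 3.7 figure uses binary gigabits (2**30 bits), which is our reading rather than something stated above; the variable names are illustrative.

```python
# Back-of-the-envelope throughput for a 62.5 MSPS complex64 stream.
sample_rate = 62.5e6     # complex samples per second
bytes_per_sample = 8     # complex64 = two 32-bit floats

bits_per_second = sample_rate * bytes_per_sample * 8

# Decimal gigabits (10**9 bits) versus binary gigabits (2**30 bits).
# Treating the quoted figure as binary gigabits is an assumption.
gbps_decimal = bits_per_second / 1e9
gbps_binary = bits_per_second / 2**30

print(f"{gbps_decimal:.1f} Gb/s (decimal), {gbps_binary:.1f} Gib/s (binary)")
```

Either way, the stream amounts to 500 MB of samples every second, which is why the cost of moving data between CPU and GPU matters so much here.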

Addressing Streaming Signal Processing Challenges with Deepwave Digital’s AIR-T

Image of Deepwave Digital’s Artificial Intelligence Radio Transceiver (AIR-T)

Deepwave Digital’s AIR-T leverages the embedded NVIDIA Jetson TX2 GPU and has a tunable radio front end that can receive and transmit signals anywhere in the 300 MHz to 6 GHz range. Since the Jetson series of GPUs share a memory space between the Arm CPU and the GPU, zero-copy data movement can be used to reduce latency compared to traditional GPU-accelerated SDR platforms, making the AIR-T an ideal platform for streaming signal processing applications.

Comparing workflows of an SDR with Discrete GPUs and Deepwave Digital’s AIR-T

cuSignal on the AIR-T

Deepwave and the RAPIDS cuSignal team have been working closely together to improve the online performance of cuSignal, specifically focused on the polyphase resampler. In the video below, we demonstrate the real-time performance of resample_poly on the AIR-T, showing an 8x speedup over the embedded Arm CPU, all from Python with an almost identical API.

Video showing performance of resample_poly on Deepwave Digital’s AIR-T

For more information, including step-by-step instructions for getting cuSignal installed on the AIR-T, read the tutorial. Below is a quick look at the simple Python code that runs the improved GPU polyphase resampler on streaming data.

import cupy
import simplesoapy
import cusignal as signal
from matplotlib import pyplot as plt

buffer_size = 2**19  # Number of complex samples per transfer
fs = 62.5e6  # Sample rate (samples per second)

# Create polyphase filter
fc = 1. / max(16, 25)  # cutoff of FIR filter (rel. to Nyquist)
nc = 10 * max(16, 25)  # reasonable cutoff for our sinc-like function
win = signal.fir_filter_design.firwin(2*nc+1, fc, window=('kaiser', 0.5))
win = cupy.asarray(win, dtype=cupy.float32)

# Init buffer and polyphase filter
# (resample_poly runs once before streaming starts, so any one-time
# setup cost is paid up front)
buff = signal.get_shared_mem(buffer_size, dtype=cupy.complex64)
s = signal.resample_poly(buff, 16, 25, window=win, use_numba=False)

# Initialize the AIR-T receiver
sdr = simplesoapy.SoapyDevice(sample_rate=fs, channel=1, auto_gain=True)
sdr.freq = 1350e6  # Set receiver frequency (Hz)
sdr.start_stream(buffer_size=buffer_size)

# Run test: read one buffer from the radio and resample it by 16/25
sdr.read_stream_into_buffer(buff)
s = signal.resample_poly(buff, 16, 25, window=win, use_numba=False)
sdr.stop_stream()

# Plot power spectral density before and after resampling
plt.figure(figsize=(7, 5))
plt.subplot(211)
plt.psd(cupy.asnumpy(buff), Fs=fs, Fc=sdr.freq, NFFT=16384)
plt.ylim((-160, -75))
plt.title('Before Filter')
plt.subplot(212)
plt.psd(cupy.asnumpy(s), Fs=fs*16/25, Fc=sdr.freq, NFFT=16384)
plt.ylim((-160, -75))
plt.title('After Filter')
plt.show()
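
To put the listing's numbers in perspective, here is a small sketch of the real-time budget they imply. It reuses the buffer size, sample rate, and 16/25 ratio from the code above. The output-length formula assumes the SciPy resample_poly convention of rounding up, which may not match cuSignal exactly, and the variable names are illustrative.

```python
buffer_size = 2**19   # complex samples per transfer (from the listing)
fs = 62.5e6           # input sample rate, samples per second
up, down = 16, 25     # resampling factors passed to resample_poly

# Time the radio needs to fill one buffer. To keep up with the stream,
# each buffer has to be processed in less than this, on average.
buffer_seconds = buffer_size / fs

# Sample rate after resampling by up/down.
fs_out = fs * up / down

# Output samples per buffer, assuming ceil(n * up / down) as in SciPy.
out_len = -(-buffer_size * up // down)

print(f"budget per buffer: {buffer_seconds * 1e3:.2f} ms")
print(f"output rate: {fs_out / 1e6:.1f} MSPS, {out_len} samples per buffer")
```

In other words, sustained real-time operation means finishing each resample_poly call in roughly 8.4 ms or less.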

Final Thoughts

We’ve been humbled and amazed by the growth and interest in cuSignal. As always, open-source projects are driven by each of you, so don’t hesitate to use GitHub to ask questions, file feature requests, submit PRs, and report bugs. Further, Deepwave Digital is hosting an online webinar on March 25th that dives into more detail about the AIR-T and cuSignal integration. You may register for this event, and it will be available for streaming afterward.

Be sure to join the conversation on Twitter as well! The hashtag is a good way to keep up with the latest and greatest GPU accelerated signal processing conversations.

We know these are troubling and uncertain times. From the cuSignal, RAPIDS, and Deepwave Digital teams, we’re wishing you and yours happiness, safety, and health.
