Recurrent Neural Active Noise Cancellation


In my previous post I described my Active Noise Cancellation system based on a neural network. Here I outline the experiments with sound prediction using recurrent neural networks that I made to improve my denoiser.

Noise sound prediction might become important for Active Noise Cancellation systems because non-stationary noises are hard to suppress with classical approaches like FxLMS. That’s why I tried a simple two-layer perceptron before, and why I’ve tried a recurrent network this time.

The task is simple: I took on the challenge of predicting samples ahead, using Leva’s Polka as a guinea pig. The task requires a lot of computational power, so I trimmed the song to its first 5 seconds in order to be able to train the RNN on my laptop without a discrete GPU.
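For illustration, preparing such a clip could look roughly like the sketch below. The filename is hypothetical, and the 8 kHz target rate is the one used in the implementation section further down; this is not the author's actual preprocessing code.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample

# Hypothetical filename; any WAV copy of the song would do.
rate, samples = wavfile.read("levas_polka.wav")
if samples.ndim > 1:
    samples = samples.mean(axis=1)     # mix stereo down to mono
clip = samples[: 5 * rate]             # keep only the first 5 seconds
clip = resample(clip, 5 * 8000)        # resample to 8 kHz
clip = clip / np.abs(clip).max()       # normalise to [-1, 1]
```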

Waybackprop

Image from Magenta blog

The core idea of the network architecture is derived from a post on the Magenta project blog. The author brings the spirit of multi-rate signal processing into the RNN domain. The post describes the idea perfectly and even has lucid illustrations, so I’m not going to compete with it; still, I’ll give a short description of the idea here.

Music, and audio in general, is driven by long-term underlying processes, so the model needs to be trained on correspondingly long examples. In that experiment I built a network with an input width of 400 samples. That’s only 50 ms; the rest of the information the network had to take from its previous outputs.

The main problem with RNNs in music prediction (or generation) is the cost of training. Truncated backpropagation through time unrolls the history of input samples and states tens or hundreds of steps back. That means the feed-forward pass is tens or hundreds of times faster to compute than the backward pass.

The main idea is to stack several layers of ordinary recurrent cells, where each cell takes the state from the previous step together with the input values and returns the next state and an output. The higher layers are clocked at lower rates, so they cover a long history in just a few steps and are cheap to backpropagate through.
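Below is a minimal, runnable sketch of this multi-rate clocking; the stride value and all names are my assumptions, not something taken from the Magenta post:

```python
# Toy illustration of multi-rate clocking in a stacked RNN: layer 0
# ticks every step, layer 1 every STRIDE steps, layer 2 every
# STRIDE**2 steps.
STRIDE = 8
N_LAYERS = 3

def ticks(layer, t):
    """A layer updates only when t is a multiple of its period."""
    return t % (STRIDE ** layer) == 0

for t in range(10):
    active = [layer for layer in range(N_LAYERS) if ticks(layer, t)]
    print(t, active)  # t=0 -> [0, 1, 2]; t=8 -> [0, 1]; otherwise [0]
```

Because a slow layer performs few updates per second of audio, unrolling it over a long history adds only a handful of steps to the backward pass, which is what makes long-term dependencies affordable.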

Implementation

As in the previous experiment, I used Python 3.4, TensorFlow 1.0, and Linux Mint 17.2. No real-time experiments were conducted, though, due to the extensive computational requirements.

I built a three-layer deep network of GRU cells. I implemented the GRU cells myself as well, because that made them simpler to integrate into the whole system. Each layer takes inputs from:

  • the output of the layer below, which is the faster one (higher sample rate);
  • its own previous output;
  • the output of the layer above.

The first layer takes raw samples at an 8 kHz sample rate; the input width is 1 sample. The internal state is a vector of 64 values. A sketch of such a cell is given below.
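For concreteness, here is a minimal NumPy sketch of such a GRU cell for the first layer, whose input concatenates the raw sample with the previous output of the layer above, per the list above. The weight layout, initialisation, and names are my assumptions, since the actual TensorFlow implementation isn’t published:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """A GRU cell in plain NumPy; a sketch, not the author's code."""

    def __init__(self, n_in, n_state, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_in + n_state)
        # One weight matrix per gate: update (z), reset (r), candidate (h).
        self.W = rng.uniform(-scale, scale, (3, n_in + n_state, n_state))
        self.b = np.zeros((3, n_state))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ self.W[0] + self.b[0])        # update gate
        r = sigmoid(xh @ self.W[1] + self.b[1])        # reset gate
        xrh = np.concatenate([x, r * h])
        h_cand = np.tanh(xrh @ self.W[2] + self.b[2])  # candidate state
        h_new = (1.0 - z) * h + z * h_cand
        return h_new, h_new        # the output is the new state

# First layer: input is 1 raw sample plus the 64-dim previous output of
# the layer above; the state is a 64-dim vector, as described above.
cell = GRUCell(n_in=1 + 64, n_state=64)
h = np.zeros(64)
x = np.concatenate([np.array([0.1]), np.zeros(64)])  # sample + top-down
h, y = cell.step(x, h)
```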

Outline

I also tried my own cells made of plain perceptrons, and they worked too, but the GRU cells performed better, so I settled on them. The topmost layer didn’t add any appreciable value in this experiment, but I left it in anyway.

Results

The part of the song used for the experiment:

The original and reproduced samples (top) and their spectra:

Another part:

The training objective was mean squared error: after the first epoch the error was 0.0176689, and by the last, 800th epoch it was down to 0.000453576.
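For reference, the error norm quoted here is the plain mean squared error between predicted and original samples:

```python
import numpy as np

def mse(predicted, original):
    """Mean squared error between predicted and original samples."""
    predicted, original = np.asarray(predicted), np.asarray(original)
    return np.mean((predicted - original) ** 2)
```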

There are no patents on this problem, and I have no commercial plans so far. However, I’d love to have a full-time job devoted to this theme, and that’s why I’m not putting the source code on github.com. Maybe someday I’ll release it.