Experiment: finding objects with a Neural Network

Published in

techburst

Oct 22, 2017

Last week I talked about a localization problem for RFID networks where the aim was to find tags based on the signal strength of one or multiple sensors. This can be achieved using trilateration.

Because I’ve read that Neural Networks are able to solve almost any problem, I expected that finding objects using sensors would be a walk in the park ;-)

The goal of this experiment is to see how well a Neural Network performs, and to learn a thing or two about Machine Learning.

You can find the full source at github.com/stetelepta/

Question: can we find the location of point P by measuring the distances (signal strengths) to the sensors?

The plan

Set up the environment
Prepare the data
Set up the Neural Network
Experiments:
- Playing around
- Tuning hyperparameters
Conclusion

1. Set up the environment

The main libraries for this experiment are:

NumPy, for number crunching and matrix calculations
Keras, a high-level deep learning library for Python
Matplotlib and Seaborn, for making nice charts
HDF5, a file format to save the trained models

I use virtualenvwrapper and pip to install the required libraries in a virtual environment for this Python project.

pip install -r requirements.txt

2. Prepare the data

Generate random points

Training a neural network requires a lot of data, so for this experiment I generate random points in a 2D grid.

In a real-world application one has to deal with noisy sensor data, but for now I use this simplified simulation.

Helper functions to generate and plot random points
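The helper functions themselves live in the repository and aren’t reproduced in this post. As a rough idea of what get_simulated_points might do, here is a minimal sketch that assumes uniformly distributed points and the argument order (m, x_min, x_max, y_min, y_max) implied by the call below; the `seed` parameter is a hypothetical addition for reproducibility, and plotting is left out.

```python
import numpy as np

def get_simulated_points(m, x_min, x_max, y_min, y_max, seed=None):
    """Return m random points drawn uniformly from the rectangle [x_min, x_max) x [y_min, y_max).

    Sketch only: the result has shape (2, m), row 0 holds x-coordinates and
    row 1 holds y-coordinates, so each column is one training example.
    """
    rng = np.random.default_rng(seed)  # hypothetical seed argument for reproducibility
    xs = rng.uniform(x_min, x_max, size=m)
    ys = rng.uniform(y_min, y_max, size=m)
    return np.vstack([xs, ys])

points = get_simulated_points(100, 0, 10, 0, 10, seed=0)
print(points.shape)  # (2, 100)
```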

Now you can create as many data points as you need, yay!

# generate m=100 points in a 10x10 grid
points = get_simulated_points(100, 0, 10, 0, 10)  # shape: (2, m)

# visualize points
plot_scatter(points, 10, 10)

This plot shows 100 points. For proper training, we will need a lot more.

Next, I need to define the locations of the sensors. The idea is to calculate the distance between each point and the sensors, and use that as our training data (X).

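For a sensor s located at (s1, s2), the Euclidean distance between the sensor and a point p at (p1, p2) is given by:

$$d(s, p) = \sqrt{(p_1 - s_1)^2 + (p_2 - s_2)^2}$$

For example, the distance between a point at (4.78, 4.06) and a sensor at (0, 0) is $\sqrt{4.78^2 + 4.06^2} \approx 6.27$.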

Now, let’s use three sensors at (0, 0), (10, 0) and (0, 10). With three sensors and m training examples, the shape of the training data will be (3, m), so there are three features for each training example.

Note that the sensor locations are not used as input features.

import numpy as np

# define sensors as an array with shape (2, nr_sensors)
sensors = np.array([
    [0, 10,  0],
    [0,  0, 10]
])

# get distances between generated points and sensors
distances = get_distances(points, sensors)  # shape: (nr_sensors, m)
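get_distances also comes from the repository. One possible implementation, sketched here under the assumption that points and sensors are stored one per column as in the comments above, uses NumPy broadcasting to compute every sensor-to-point distance in one go:

```python
import numpy as np

def get_distances(points, sensors):
    """Euclidean distance from every sensor to every point (sketch).

    points:  shape (2, m), one point per column
    sensors: shape (2, nr_sensors), one sensor per column
    returns: shape (nr_sensors, m)
    """
    # (2, nr_sensors, 1) - (2, 1, m) broadcasts to (2, nr_sensors, m)
    diff = sensors[:, :, np.newaxis] - points[:, np.newaxis, :]
    # sum the squared differences over the coordinate axis, then take the root
    return np.sqrt(np.sum(diff ** 2, axis=0))

sensors = np.array([
    [0, 10,  0],
    [0,  0, 10]
])
point = np.array([[4.78], [4.06]])  # a single point as a (2, 1) column
print(get_distances(point, sensors).round(2))  # one distance per sensor, shape (3, 1)
```

Broadcasting avoids a Python loop over the m examples, which matters once you generate tens of thousands of points.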

Normalize training data

Before the distances can be used as training data, I need to normalize the data, to make sure the features are ‘on the same scale’.

I actually forgot to normalize the data, and I spent an evening puzzling over why the performance of the network degraded significantly when the sensors were further away from each other.

# normalize by dividing the data by the maximum possible distance (the diagonal of the 10x10 grid)
X = distances / np.sqrt(np.square(10) + np.square(10))

Generate output vector

Now that the training data (X) is ready, I need to create an output matrix (Y) with labels (the true locations) for each example.

We will use discrete bins, where each bin corresponds to the (rounded-up) coordinates. For example, a point (4.78, 4.06) will fall into bin (5, 5) when using a grid with 10 x 10 bins. The resulting output for this point will be a (100 x 1) column vector.

The resulting matrix Y contains a column vector for each training example, so it will be a (100, m) matrix.

The more bins are used in the output layer, the more precise the network can predict a location.
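A minimal sketch of building Y for a 10 x 10 grid. It assumes each bin is a 1 x 1 cell indexed from 0 with np.floor, and that the two cell indices are flattened as x_cell * n_bins + y_cell; the helper name get_one_hot_labels and this ordering are illustrative choices, not necessarily what the repository does. With this indexing, (4.78, 4.06) lands in cell (4, 4), i.e. bin (5, 5) when counting from 1.

```python
import numpy as np

def get_one_hot_labels(points, n_bins=10, size=10.0):
    """Map each point to a one-hot column over an n_bins x n_bins grid (sketch).

    points: shape (2, m); returns Y with shape (n_bins * n_bins, m).
    """
    # 0-based cell index per axis; clip so a point exactly on the far edge stays inside
    cell = np.floor(points / size * n_bins).astype(int)
    cell = np.clip(cell, 0, n_bins - 1)
    # flatten (x_cell, y_cell) into a single row index of Y (hypothetical ordering)
    rows = cell[0] * n_bins + cell[1]
    m = points.shape[1]
    Y = np.zeros((n_bins * n_bins, m))
    Y[rows, np.arange(m)] = 1.0
    return Y

Y = get_one_hot_labels(np.array([[4.78], [4.06]]))
print(Y.shape, int(np.argmax(Y[:, 0])))  # (100, 1) 44
```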

Gaussian noise
It feels right to reward the network for predictions that are near the correct bin. Therefore, I added some Gaussian noise to the output vector. This way, there is a gradual penalty for predictions further away from the true location.

The amount of noise can be controlled by the parameter sigma.
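The post doesn’t show exactly how the noise is applied. One way to get the gradual penalty described above is Gaussian label smoothing: replace the one-hot label with a Gaussian bump centred on the true location and renormalise it so each column still sums to 1. The helper get_gaussian_labels, the use of bin centres, and measuring sigma in grid units are all assumptions for this sketch; with sigma = 0 it falls back to a plain one-hot label.

```python
import numpy as np

def get_gaussian_labels(points, n_bins=10, size=10.0, sigma=1.0):
    """Soft labels: a Gaussian bump over the bin centres, peaked at the true point (sketch).

    points: shape (2, m); returns shape (n_bins * n_bins, m), each column sums to 1.
    sigma is measured in grid units; sigma=0 falls back to a one-hot label.
    """
    cell_size = size / n_bins
    centres = (np.arange(n_bins) + 0.5) * cell_size
    # bin centre coordinates, flattened x-major (row index = x_cell * n_bins + y_cell)
    cx, cy = np.meshgrid(centres, centres, indexing="ij")
    cx, cy = cx.ravel()[:, None], cy.ravel()[:, None]          # shape (n_bins**2, 1)
    sq_dist = (cx - points[0]) ** 2 + (cy - points[1]) ** 2     # shape (n_bins**2, m)
    if sigma == 0:
        # degenerate case: put all mass on the nearest bin centre
        Y = np.zeros_like(sq_dist)
        Y[np.argmin(sq_dist, axis=0), np.arange(points.shape[1])] = 1.0
        return Y
    weights = np.exp(-sq_dist / (2 * sigma ** 2))
    return weights / weights.sum(axis=0, keepdims=True)

Y = get_gaussian_labels(np.array([[4.78], [4.06]]), sigma=1.0)
print(Y.shape, int(np.argmax(Y[:, 0])))  # (100, 1) 44
```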

3. Set up the neural network

I start with a simple 2-layer network with one hidden layer of four nodes, to see how things go. The hidden layer uses tanh as the activation function; the output layer uses sigmoid.

from keras.models import Sequential
from keras.layers import Dense

# create a 2-layer neural network
model = Sequential()
n_h = 4           # number of hidden nodes
n_x = X.shape[0]  # number of sensors (3)
n_y = Y.shape[0]  # output dimension (100)

# add fully connected hidden layer
model.add(Dense(n_h, input_dim=n_x, activation='tanh'))
# add fully connected output layer
model.add(Dense(n_y, activation='sigmoid'))
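To make the shapes concrete, the sketch below writes out the forward pass of the same architecture in NumPy with random, untrained weights. It is an illustration only, since Keras creates and trains these weights internally (and stores each Dense kernel as (inputs, units), transposed relative to W1 here). Note that the data in this post is laid out with one example per column, while Keras expects one example per row, so the arrays would need to be transposed (X.T, Y.T) when they are passed to model.fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 3, 4, 100, 5   # sensors, hidden nodes, output bins, examples

# random, untrained weights; in the real network Keras learns these
W1, b1 = rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(n_y, n_h)), np.zeros((n_y, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.uniform(0, 1, size=(n_x, m))   # stand-in for normalized distances, one example per column
A1 = np.tanh(W1 @ X + b1)              # hidden layer activations, shape (n_h, m)
A2 = sigmoid(W2 @ A1 + b2)             # output layer activations, shape (n_y, m)
print(A1.shape, A2.shape)  # (4, 5) (100, 5)
```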

4. Experiments

4.1 Playing around

With the network set up, I ran a few experiments to see what happens if you vary the number of sensors and their positions.

One sensor with a 10x10 grid
When using one sensor, there is no way for the network to know where the object is. The probabilities spread in a circle around the sensor, which is exactly what I’d expect.

prediction and true location for one sensor located at (0, 0)

prediction and true location for one sensor located at (5, 5)

Two sensors with a 10x10 grid

When using two sensors, there are two possible locations for the object: the two points where the circles around the sensors intersect.

The prediction shows that the probabilities are highest at these intersections, so this result is also what I’d hoped for.

prediction and true location for two sensors located at (0, 0) and (10, 10)

If both sensors are placed on the same edge of the grid, only one of the two intersections falls inside the grid (the other is its mirror image on the far side of that edge), which improves the performance.

prediction and true location for two sensors located at (10, 0) and (10, 10)
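The two candidate locations are simply the intersection points of two circles, each centred on a sensor with the measured distance as radius. Here is a minimal sketch of the standard circle-circle intersection, applied to both sensor pairs above; the helper circle_intersections is hypothetical and not part of the network code.

```python
import numpy as np

def circle_intersections(c1, r1, c2, r2):
    """Return the 0, 1 or 2 intersection points of two circles in the plane."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    d = np.linalg.norm(c2 - c1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []  # concentric, too far apart, or one circle inside the other
    # distance from c1 to the midpoint of the common chord, along the line c1 -> c2
    a = (r1**2 - r2**2 + d**2) / (2 * d)
    h = np.sqrt(max(r1**2 - a**2, 0.0))  # half the chord length
    mid = c1 + a * (c2 - c1) / d
    # unit vector perpendicular to c1 -> c2
    perp = np.array([-(c2 - c1)[1], (c2 - c1)[0]]) / d
    if h == 0:
        return [mid]  # circles touch in a single point
    return [mid + h * perp, mid - h * perp]

p = np.array([4.78, 4.06])  # true location of the object
for s1, s2 in [((0, 0), (10, 10)), ((10, 0), (10, 10))]:
    r1, r2 = np.linalg.norm(p - s1), np.linalg.norm(p - s2)
    print(s1, s2, np.round(circle_intersections(s1, r1, s2, r2), 2))
```

For sensors at (0, 0) and (10, 10), the two candidates are the true point (4.78, 4.06) and its mirror image (4.06, 4.78) across the line y = x. For sensors at (10, 0) and (10, 10), the mirror image lies at x = 15.22, outside the 10 x 10 grid.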

Three sensors with a 10x10 grid

Three intersecting circles should be enough to pinpoint the location, and indeed, using three sensors, the network is able to predict the correct location of the object.

prediction and true location for three sensors located at (0, 0), (0, 5) and (10, 10)

4.2 Tuning hyperparameters

To find out which hyperparameter values give the best performance, I ran a series of experiments. I used a ‘base experiment’ and varied one hyperparameter at a time to find the value with the minimum loss. After each experiment I set that hyperparameter in the base experiment to the best-performing value, so each subsequent experiment starts from a better baseline.
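The procedure described above is a greedy, one-hyperparameter-at-a-time search. A minimal sketch in plain Python: `evaluate` is a hypothetical stand-in for training a network with the given settings and returning its loss, and the toy loss function in the usage lines is made up purely to exercise the loop.

```python
def greedy_search(base, grid, evaluate):
    """Tune one hyperparameter at a time, keeping the best value found so far.

    base:     dict of starting hyperparameters (the 'base experiment')
    grid:     dict mapping a hyperparameter name to the candidate values to try
    evaluate: function(dict) -> loss, lower is better (hypothetical training call)
    """
    best = dict(base)
    history = []
    for name, candidates in grid.items():
        losses = {}
        for value in candidates:
            trial = dict(best, **{name: value})  # base settings with one value swapped in
            losses[value] = evaluate(trial)
        # adopt the best value before sweeping the next hyperparameter
        best[name] = min(losses, key=losses.get)
        history.append((name, best[name], losses[best[name]]))
    return best, history

def toy_loss(params):
    # made-up loss with its minimum at n_h=10, sigma=0
    return (params["n_h"] - 10) ** 2 + abs(params["sigma"])

best, history = greedy_search(
    {"n_h": 4, "sigma": 1.0},
    {"n_h": [3, 4, 8, 10, 16], "sigma": [0, 0.5, 1.0]},
    toy_loss,
)
print(best)  # {'n_h': 10, 'sigma': 0}
```

Because each sweep keeps the previous winners fixed, this search can miss combinations where two hyperparameters interact, but it needs far fewer training runs than a full grid search.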

Experiments

Comparing optimization algorithms
loss: 2.593
I compared Stochastic Gradient Descent (SGD) with the Adam method. Adam performs best in this experiment, but I should say that I used default settings for both methods. Apparently SGD is more sensitive to its settings than Adam and needs careful tuning of the learning rate.

Number of nodes in the hidden layer
loss: 2.588
If you use more hidden nodes in a layer, the network is able to find more complex patterns, but at the cost of higher computational complexity. I varied between 3 and 16 nodes, and found that 10 nodes gave the best results.

Sigma
loss: 0.858
I added Gaussian noise to the output vector to ‘help’ the network and reward it for predictions near the true location. The best value for sigma is 0, so adding noise does not help to reduce the loss. I expect this parameter might still be useful when using more realistic sensor data.

Activation of the hidden layer
loss: 0.86
I compared a number of activation functions for the hidden layer: tanh, relu, elu, softplus and softsign. For some reason relu, elu and softplus all resulted in NaN losses; I need to dig into that some other time. For now I’ll go with the tanh function, because it performs better than softsign.

Activation of the output layer
loss: 0.451
For the output layer I compared sigmoid and softmax. Sigmoid is used primarily for binary classification, while softmax is used for multiclass classification and produces a valid probability distribution. Softmax performs a lot better for our problem.

Number of training examples
loss: 0.269
As expected, feeding more data to the network improves the results. The best performance is achieved with the highest number of examples, 10,000 in this experiment. I expect that using even more examples will keep improving performance.

Number of epochs
loss: 0.142
Increasing the number of epochs (training passes over all training examples) improves performance. The best result is with 50,000 epochs, the maximum number of epochs I tested in this experiment.

Comparison of losses for different hyper parameters
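The difference between the two output activations compared above shows up on a single column of raw scores: sigmoid squashes each bin independently, so the outputs need not sum to 1, whereas softmax normalises across all bins into a probability distribution. A small NumPy illustration with made-up scores for three bins:

```python
import numpy as np

def softmax(z):
    # subtract the column max for numerical stability; each column then sums to 1
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def sigmoid(z):
    # applied element-wise, so every bin is squashed independently
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([[2.0], [1.0], [0.1]])  # made-up raw output scores for 3 bins, one example
print(sigmoid(z).sum(), softmax(z).sum())
```

Here the sigmoid outputs add up to more than 1, while the softmax outputs sum to 1 (up to floating-point rounding), which matches a target that puts all of its probability mass on the bins of a single location.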

5. Conclusion

This first experiment was quite fun to do. I’ve learned a lot, and I can conclude that Neural Networks can indeed be used for finding objects :)

Whether it’s the best way to go, I’m not so sure. You might as well just use the mathematical formulas for trilateration, but I find it really interesting to see that networks can learn these formulas by looking at samples.
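For comparison, the closed-form route fits in a few lines. This sketch uses the common linearised least-squares formulation of trilateration (subtract the first circle equation from the others so the quadratic terms cancel), which is one standard approach and not necessarily the one from last week’s post; the helper name trilaterate is hypothetical. The usage example assumes noise-free distances, but the least-squares step also returns a best-fit position when the circles don’t quite meet.

```python
import numpy as np

def trilaterate(sensors, distances):
    """Estimate a 2D position from distances to three or more sensors (sketch).

    sensors:   shape (2, n), one sensor per column (n >= 3, not all collinear)
    distances: shape (n,)
    Subtracting the first circle equation |p - s_0|^2 = d_0^2 from the others
    removes the |p|^2 term, leaving a linear system A @ p = b.
    """
    s0, d0 = sensors[:, 0], distances[0]
    A = 2 * (sensors[:, 1:] - s0[:, None]).T                  # shape (n - 1, 2)
    b = (d0**2 - distances[1:]**2
         + np.sum(sensors[:, 1:]**2, axis=0) - np.sum(s0**2))  # shape (n - 1,)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)                  # least-squares solution
    return p

sensors = np.array([[0, 10, 0], [0, 0, 10]], dtype=float)
true_p = np.array([4.78, 4.06])
d = np.linalg.norm(sensors - true_p[:, None], axis=0)  # simulated, noise-free distances
print(trilaterate(sensors, d).round(2))  # [4.78 4.06]
```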

Ideas for improvement

Run on a faster machine. I ran this experiment on a MacBook Pro, and training the network took many hours. I’ve read a very useful article about running projects with Docker on Google Compute Engine, so I guess I’ll try that with my next project.
Test with a real-world use case. This experiment used a simulation under ideal circumstances. It would be nice to use the network for a real problem. For example, it should be possible to find cellular towers with my phone by making multiple measurements of the signal strength, using the Field Test Mode.


3D localization. For this experiment I used a 2D grid, but it would not be that hard to train the network with 3D locations. It would require a lot more training time, though, and some new plots. If I have some spare time, I will try that.
Finer grids. I used just 10x10 grids, but I’d expect better accuracy with a finer grid. This also requires a lot of training time.