Testing a neural network solution

Mike Talks
Published in TestSheepNZ
8 min read · Feb 6, 2017

Last time we looked at some core basics about neural networks, a form of machine learning I used back in the 90s during my research year at the University of Liverpool.

There was a lot of content, and in this series I want to avoid going into too much detail on neural networks, focusing instead on a testing approach. However, I’m sure you’re still asking …

But how do they work?

Yesterday I took an overall look at them, but I didn’t look too much inside the neural network. I didn’t really want to overwhelm you in one article!

So let’s take a look at a model of the network itself, and explore a little bit more …

Model of a neural network.

In a neural network, you have a set of inputs, which results in an output. For my PhD that output was “that tap is on/off”; for the Xero application discussed in Stephanie’s presentation it’s “that field should be X”.

Between the input and the output are multiple layers of artificial neurons. Each neuron takes its inputs, applies a set of weights, and works out whether it triggers or not. The result is then passed to the next layer, which does likewise. The more complex your problem, the more layers and artificial neurons you might need.

The whole thing works as a series of multilayered matrix transformations.
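To make that a little more concrete, here’s a minimal sketch in Python with NumPy (purely illustrative; it’s not the tooling I used in the 90s). Each layer is just a weighted transformation of its inputs, squashed by an activation function that decides how strongly each neuron triggers:

```python
import numpy as np

def layer(inputs, weights, biases):
    # One layer: a weighted sum of the inputs, squashed by a sigmoid
    # activation that decides how strongly each neuron "triggers".
    return 1.0 / (1.0 + np.exp(-(weights @ inputs + biases)))

# A toy network: 3 inputs -> 4 hidden neurons -> 1 output.
rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(4, 3)), np.zeros(4)
w_out, b_out = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.2, 0.7, 0.1])          # e.g. readings from a sensor array
hidden = layer(x, w_hidden, b_hidden)  # first transformation
output = layer(hidden, w_out, b_out)   # second transformation
print(output)                          # e.g. "how strongly is the tap on?"
```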

What happens is you feed through training data, and the system compares the output it gets with the expected output. It then applies feedback which adjusts the weighted trigger levels of some of the neurons. Then it tries again with another piece of data, and makes another adjustment.

It keeps doing this, repeating cycles of data and adjusting the trigger levels, and hopefully getting closer to a repeatable pattern.
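Here’s a sketch of that feedback cycle, shrunk down to a single neuron with a simple gradient update (real networks use backpropagation through every layer, but the adjust-and-repeat shape is the same):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy labelled data: two inputs per case, plus the expected outcome.
X = np.array([[0.1, 0.9], [0.8, 0.2], [0.9, 0.1], [0.2, 0.8]])
y = np.array([1, 0, 0, 1])

w, b, lr = np.zeros(2), 0.0, 0.5
for cycle in range(1000):              # repeated cycles of data ...
    for xi, target in zip(X, y):
        out = sigmoid(w @ xi + b)      # the network's current guess
        error = target - out           # compare with the expected output
        grad = error * out * (1 - out)
        w += lr * grad * xi            # ... adjusting the trigger levels
        b += lr * grad

print(sigmoid(X @ w + b).round(2))     # now close to [1, 0, 0, 1]
```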

Data is key

As you can see from the description above, data is key. More importantly, you need data for which you’ve already got an outcome transcribed, in other words, labelled data.

In the 90s, neural networks were seen as a bit of a silver bullet for problems we couldn’t easily solve with mathematics or traditional logical computation.

One of the problems was that many organisations who tried to jump on this technology believed you could put through any sort of data, and ‘magic would happen’.

Sadly no — that would be way too good to be true.

You need to be painstaking with your data. During my PhD attempt, that meant I was collecting data for weeks with my sensor array. I also did my best to clean it up, as well as attempting some manual interpretation: basically doing a fast Fourier transform, and trying different kinds of spectral analysis on the result.

At one point I got really excited about picking up a strong signature at 50Hz, at which point my colleagues looked a little bit like this …

The Picard facepalm … in my defence it was the 90s …

Yup — the power supply in England is at 50Hz, so I could pick up whether electricity was on, but not the tap!
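For the curious, here’s roughly what that discovery looks like in modern terms: a NumPy sketch (the trace itself is made up) of a noisy sensor signal whose Fourier transform is dominated by a 50Hz spike.

```python
import numpy as np

# One second of a made-up noisy sensor trace, sampled at 1 kHz and
# contaminated by 50 Hz mains hum.
fs = 1000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
trace = 0.1 * rng.normal(size=fs) + np.sin(2 * np.pi * 50 * t)

spectrum = np.abs(np.fft.rfft(trace))
freqs = np.fft.rfftfreq(fs, d=1 / fs)
peak = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC component
print(peak)  # 50.0 ... the electricity supply, not the tap
```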

How to train and test your neural network

To train up a neural network you need a huge pool of data, data which has been manually processed to set an expected output. We were going for a simple “is water running” vs “is it not” on our first pass. Had we succeeded, we hoped we’d have been able to determine flow rates at a later date. [You can tell from this statement that I wasn’t able to train up my neural network; the problem essentially came down to the sensor, and the data I was getting from it being too poor and noisy for this to work.]

During this period, I still had to train up and teach neural networks, so I got them to work on some sample data a professor at the university had available. My neural network could work with other people’s data, just not mine!

The key — as discussed last time — is to get a broad and diverse set of data. You run it through your neural network with it switched to learning mode.

A graph showing overfitting and two important error lines which are explained below

The first step is to switch your neural network over from a learning operation to a running operation. You then run the same training data you’ve just used through your system, and observe the error rate you get from comparing the neural network’s output with the expected result from your data. This is the “training error” line in the overfitting graph above.

So far so good: this is the first step of verification of your neural network (and where my tap system completely failed). Hopefully you’re seeing that line trend downward. However, if you’re feeding data into your system, replaying the training cycles, and you’re not seeing the “training error” line go down, then your neural network is struggling to find a pattern.

It is possible there isn’t a relationship that can be mapped between your inputs and your outputs. But as we talked about last time, you can review your neural network, the quality of your data, and the size of the data pool. All this is painstaking, because alas neural networks are not a quick and easy solution.

The second verification step is to test it a bit more. For this you can’t simply use data that you’ve already used for training, because the neural network has learned to cope with those exact cases (as seen in the “training error” line). This means you need some additional data set aside; as a rule of thumb, when I did neural networks for other people, it was about 25% of the volume of data used for training.

Typical results from this stage of verification can be seen above as the “test error” line; notice how it always has a higher error rate than the “training error” line. That shouldn’t be unexpected: the neural network was trained on the exact data behind the “training error” line, so it will always do better on data it has already seen.
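Here’s a sketch of both verification steps using scikit-learn (which certainly didn’t exist in the 90s; the labelled pool is made-up stand-in data). Roughly 25% of the pool is held back, and after each training cycle both error lines are measured:

```python
import warnings
import numpy as np
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore")  # one-cycle fits warn about convergence

# A made-up labelled pool: two noisy clusters standing in for on/off.
rng = np.random.default_rng(2)
X_all = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y_all = np.array([0] * 200 + [1] * 200)

# Hold back roughly 25% of the data; the network must never see it
# while in learning mode.
idx = rng.permutation(len(X_all))
cut = int(0.75 * len(X_all))
train, test = idx[:cut], idx[cut:]

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1, warm_start=True)
for cycle in range(50):                 # repeated training cycles
    net.fit(X_all[train], y_all[train])                    # learning mode
    train_err = 1 - net.score(X_all[train], y_all[train])  # "training error"
    test_err = 1 - net.score(X_all[test], y_all[test])     # "test error"
    print(cycle, round(train_err, 3), round(test_err, 3))
```

Watch the printed columns: the training error should fall as the cycles repeat, with the test error tracking it from above, just like the two lines in the graph.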

So where am I, the manual tester?

So far we’ve done most of this checking automatically. Looking at such a neural network, or any other machine learning system, we can feel lost as to what to test.

That feeling of being at a loss is fundamentally because, as testers, we often fall back on testing boundaries, especially for business rules. As we discussed last time, the fundamental issue with a neural network is that there are no hard and fast boundaries; there’s a pattern. A pattern we can’t see, because it’s inside the neural network, so we have to explore it.

There are a few patterns I learned from James Bach’s Rapid Software Testing course which I think apply here. What you need to do though is start by mapping out the data which has been used to date. Then why not try …

Initial exploration. You can start out by just taking a few data samples from your training and test data and running them through your neural network system to “get a feel”.

Try a few obvious scenarios, then make a change. Choose a few items of data which are far from where any decision boundaries should be, and see how it behaves. You might have an expert around who can provide you a simple example of “in this scenario, no questions asked, X should happen”.

For example, if we had a neural network for our “can you go on this ride” decision, try someone who is very young, short and light, or someone who is very old, tall and heavy. Then try a “leap and creep” approach, creeping (making small changes) or leaping (making much larger changes) with the input data toward where you think a decision should tip (there’s a sketch of this after the list below). Keep notes of what happens, and talk with colleagues on “does this feel right?”.

Look at how the data used is clustered. Try scenarios which are in the gaps. Try scenarios which are ludicrously outside of the data provided. How does the system respond? Personally, if I provided inputs which were ludicrously outside the training data and the system still tried a guess, I would raise a defect, because any output it gave would be just that, a guess, but the user might not know this. Far better to give a response along the lines of “eeek … cannot compute”.

Push those inputs. You’re providing inputs to your neural network system, so try pushing the limits of the data types provided. This is classic tester territory: try big and small. Big numbers, big strings. If your system uses text, try weird capitalisations, use non-letters like @$#$% and, of course, everyone’s favourite, spelling mistakes. Again, there are no hard-and-fast rules here, but make notes, and discuss anything which feels wrong. The classic, as mentioned above, is trying to get a result for something which is absurd.
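To make the “creep and leap” idea concrete, here’s a sketch against the “can you go on this ride” example. The ride_model function is a made-up stand-in; in real testing you’d be calling your actual neural network system:

```python
# Made-up stand-in for the system under test; in practice you'd call
# your deployed neural network here instead.
def ride_model(age, height_cm, weight_kg):
    return "allowed" if height_cm >= 120 and age >= 8 else "refused"

# Obvious cases, far from any decision boundary: no questions asked.
print(ride_model(age=35, height_cm=180, weight_kg=80))  # expect "allowed"
print(ride_model(age=3, height_cm=90, weight_kg=15))    # expect "refused"

# "Creep": small steps towards where we think the decision should tip.
for height in range(115, 126):
    print(height, ride_model(age=10, height_cm=height, weight_kg=35))

# "Leap": jump ludicrously outside the training data. Arguably the
# system should respond "cannot compute" here rather than guess.
print(ride_model(age=200, height_cm=300, weight_kg=-5))
```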

Pitting expert vs AI

How does your neural network stand up against the experts?

For most forms of neural network, we’re trying to replicate some form of human decision making. The Xero case discussed last time is a great example — basically it’s functioning as an assistant to fill in data missing from an accounting entry, to make it easier to reconcile.

Back when I worked at Kiwibank, I worked alongside the reconciliation team on one of our products. Their job was basically about balancing up transactions on a card with money being paid out. The bank hired three people to do it, but even this team said some of the work was so repetitive they’d love the system to do more for them, so that only one person would need to do it, with more time spent handling the exceptions.

Almost always there should be an expert around, and a great form of testing is to effectively pit them against your neural network, essentially peer reviewing the decisions it is making. If there are major issues, return to the source data you’re using to train your neural network. You might notice that typically, whenever we have an issue, we return here: the training data you’re using is essentially the closest thing you have to source code on such a project!
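A sketch of what that comparison might look like; the decisions below are made up, and in practice the network’s column would come from running real cases through your system:

```python
# Run the same reconciliation cases past both the expert and the
# network, then review every disagreement (all values made up here).
expert_decisions = ["match", "match", "no match", "match", "no match"]
model_decisions  = ["match", "no match", "no match", "match", "match"]

disagreements = [
    (case, e, m)
    for case, (e, m) in enumerate(zip(expert_decisions, model_decisions))
    if e != m
]
agreement = 1 - len(disagreements) / len(expert_decisions)
print(f"agreement: {agreement:.0%}")        # 60% in this made-up run
for case, expert, model in disagreements:   # candidates for data review
    print(f"case {case}: expert said {expert!r}, network said {model!r}")
```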

And finally, a word from Jonathon …

I went to school with a guy named Jonathon Tepper who’s become more of an expert in neural networks than I am. He had this comment on the first part of the series, useful information if you want to go off and explore this more …

I enjoyed reading what you’d written; you covered some key areas such as overfitting and the curse of dimensionality in a very engaging way for non-techies.

Vanilla neural nets such as multi-layered perceptrons trained with backprop are not biologically plausible models, although they are non-linear function approximators that can form complex internal representations to learn any computable function (assuming the appropriate data is available).

The issue is the solution space is non-convex, and so there are many suboptimal solutions; thus these nets are very difficult to train and optimise … more so with so-called deep neural nets, which need to be trained in a particular way to overcome the vanishing gradient issue. Also, does every problem need such a complex featural representation to be solved?

Occam’s razor is a common guiding principle in the machine learning world … use the best-performing model with respect to the number of resources and computations required. Note that there is a branch of neural network research that aims to model biological neural networks, called Spiking Neural Networks; however, this is not my area, so I cannot comment on the usefulness of these models.
