Introducing Neural Networks — thinking about AI

Mike Talks
Published in TestSheepNZ
Feb 6, 2017 · 8 min read

It seems that Artificial Intelligence (AI) and machine learning are all set to become the next very hyped buzzwords.

I started to experiment again last year with some systems which use elements of AI, with a view to seeing how the technology was being developed and getting a general ‘feel’ for how it was working.

Then last week I saw Stephanie Wilson from Xero give a presentation at the Unicom conference in Wellington on the “rise of the machine (learning)”, about some applications under development at Xero, particularly one that uses machine learning to help users complete their accounts.

[It was a great talk, and you should be able to catch it again if you sign up for the CASTx conference in Sydney in March, which looks to be a great event. Book here!]

My experience in AI goes back to my failed PhD in 1996 which was titled “intelligent opto-electronic sensors for nuclear devices”. It sounds impressive, but the research was relatively easy to explain…

A traditional sensor has a whole bunch of electronics in it; in a nuclear environment (this research was funded by British Nuclear Fuels) the lifetime of such a sensor can be dramatically shortened. [But don’t forget kids, radiation is perfectly safe]

The sensor I was working with was unique because it was very simple, and involved having the sensor electronics far away from the thing being monitored. So the sensor head could be in a hostile environment, whilst the core components could be much further away in a much safer place.

The challenge was tough though — because the sensor was relatively simple, the data we got out of it was quite messy. The idea was to use different signal processing to get a useful signal out from the system. One of the prime candidates for this was neural networks!

Neural networks (the form I’m most familiar with) are a type of machine learning which mimics, to a degree, how a brain operates and learns. [For the record I’m not sure which kind of machine learning Xero used for their accountancy assistant, but I suspect it was a neural network]

We’re very used to being humbled before computers: they can do programmed, computational tasks really quickly, and they can remember much more than we can. You’ll never be able to calculate 2837 x 2383 as fast.

But there’s an area they’ve traditionally been lousy at: pattern matching. Our brains are wired like mad not for algebra and arithmetic (so ease off on yourself if you’ve never been good at them) but for pattern matching.

Here’s something I bet you’ve never noticed before: if you’re looking at about 3 or 4 objects or people, your brain never really counts them. It’s wired to pattern match groups of up to 4 really easily and efficiently. Only superhumans like Donald Trump are able to do a pattern match count up to a million — what a guy!

You use the pattern matching capability of your brain to recognise faces (there’s a whole cortex in primates like ourselves, gorillas and chimps devoted to this) and to recognise voices. You can be in a room and see only 3 people, yet know there are 4 people in that room because you can hear another behind you. That’s taking multiple sources of information and making a fit.

Traditionally, computers were not good at this. But they’ve been getting better. It helps that as they grow more powerful, they can simulate more neurons in more layers — back in 1996 I had a 33MHz PC with 16MB of RAM to power my artificial neural network!

In the case of both my own project and the project Stephanie was talking about from Xero, where we’re applying neural networks or machine learning, we’re not at all talking about creating a self-aware computer.

That would be monstrous — mine would have been in charge of a reactor, but more frighteningly, Xero’s machine would be a computer that was self-aware but assigned menial accountancy work. I cannot get out of my head what happened in Red Dwarf when they created a self-aware toaster…

The goal with a lot of neural network applications, such as the one I worked on, is simply to create a system which, given a set of inputs, can make a decision.

For my PhD project, I spent my year trying to teach a computer to tell me from the signals coming from my sensor whether water was flowing through a pipe or not. [Essentially ‘did I leave the tap on’]

For the project Stephanie mentioned from Xero, it was more a case of ‘if a user has entered data in one column, but left the second blank for reconciliation, can we make an educated guess at what should go in the second column based on usual behaviour?’.

The business rules for “to ride this ride”

Now of course we have machines which do that already, typically mapped out as business rules. Let’s consider some such rules for a children’s ride — to be allowed on:

a) the user must be between the ages of 6 and 18 inclusive

b) the user must be taller than 1.1 metres, but shorter than 1.8 metres

The decision making in this example is really simple, and it’s the kind of thing computers do really well. It’s the simple application of logic.
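Just to make that concrete, here’s a minimal sketch of those rules as code (Python, with the function name made up for illustration):

```python
def may_ride(age: int, height_m: float) -> bool:
    """The ride's business rules: a fixed, explicit decision."""
    return 6 <= age <= 18 and 1.1 < height_m < 1.8

print(may_ride(age=10, height_m=1.3))  # True
print(may_ride(age=5, height_m=1.3))   # False - too young
```

There’s no learning here: the relationship between inputs and decision is spelled out in advance.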

Consider this broker …

But what if there is no obvious or established relationship? Let’s consider a stock market broker — she doesn’t just make a decision to buy or sell a stock based on its current price. She looks at how it’s been trending. She checks to see how other similar stocks are behaving. Then she decides to buy.

What’s important to notice is she won’t always be right. But if she’s good at her job, she’ll be right significantly more times than she’s wrong. She doesn’t really know how she makes the decision, it’s more a hunch or intuition. But the longer she’s been at it, the more her hunches have paid off.

What’s going on there is that the longer she’s been a broker, the more she’s been exposed to data, watching behaviour and seeing how it plays out. She’s not learning a set of rules, but becoming aware of a pattern — what she ascribes to intuition is really pattern matching.

All that information, all that learning, and in the end it boils down to three very simple decisions — buy, wait or sell.

This in essence is how neural networks work as well. You create a system of artificial neurons to process information, then you set it to learning mode and run through a whole set of data covering the inputs and what the correct response was. You then switch it to decision making mode, and run through some separate data to see if it’s learned to make good decisions.
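As a rough sketch of that workflow, here’s how it might look using scikit-learn’s MLPClassifier on made-up data (the data and settings are purely illustrative, not from any real project):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Made-up data: each row is a set of inputs, y is the known response.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold some data back; the network never sees it while learning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 'Learning mode': fit the network to the training data.
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
net.fit(X_train, y_train)

# 'Decision making mode': check its decisions on data it hasn't seen.
print("accuracy on unseen data:", net.score(X_test, y_test))
```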

It’s a tricky process — like the broker above, the result will never be 100% accurate. [Heck, if you think about it, if you made a neural network stock market predictor which was only 60% accurate, you’d still be a millionaire] If you’re getting poor decision making, you might need to develop a system with more layers of neurons, you might need more data, you might need better data, or you might even need less data.

More layers of neurons?

Yup — this is all down to the design. Neural networks operate in layers of artificial neurons which process your input and pass it on to the next layer. Each neuron has its own trigger setting which adjusts as it learns.

You have to play around with how many layers of artificial neurons, and how many neurons you have in each layer, to find what works best.
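If you’re curious what one of those artificial neurons actually does, here’s a toy sketch (the numbers are made up): a weighted sum of the inputs pushed through a trigger function, where the weights and bias are what get adjusted during learning.

```python
import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """One artificial neuron: a weighted sum of inputs, then a 'trigger'
    (a sigmoid here). The weights and bias adjust as the network learns."""
    activation = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-activation))

# A layer is just several of these fed the same inputs; the next layer
# takes this layer's outputs as its inputs.
print(neuron(np.array([0.5, -1.2, 0.3]), np.array([0.8, -0.4, 0.1]), bias=0.05))
```

In a library like scikit-learn, that playing around is mostly a matter of changing hidden_layer_sizes=(10,) to, say, (20, 10) and comparing the results.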

Better data?

Good learning depends on good data, and one problem I saw in the neural networks we tried at the University of Liverpool was that our data would cluster together. Clustered data doesn’t help us build up a decision map across the larger area of what we’d call “decision space”.
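Here’s a small illustration of why that matters, with an invented pattern: one network is trained on data spread across the decision space, the other on data clustered into a narrow strip, and both are then asked to decide across the whole space.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
label = lambda X: (X[:, 1] > np.sin(2 * X[:, 0])).astype(int)  # the 'true' pattern

X_broad = rng.uniform(-3, 3, size=(800, 2))        # covers the decision space
X_clustered = np.column_stack(
    [rng.uniform(-0.5, 0.5, 800), rng.uniform(-3, 3, 800)])  # one narrow strip
X_eval = rng.uniform(-3, 3, size=(2000, 2))        # where decisions are needed

for name, X_train in [("broad", X_broad), ("clustered", X_clustered)]:
    net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
    net.fit(X_train, label(X_train))
    print(name, "accuracy across decision space:", net.score(X_eval, label(X_eval)))
```

The clustered network only ever saw one sliver of the map, so it typically does noticeably worse once it has to make decisions outside it.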

More data?

If you’ve got broad data but are still having issues, it might be that you don’t have enough data to learn a pattern. There was an awful temptation to throw a lot of data at the neural network and treat it as a kind of deus ex machina or ‘god from the machine’ which could work out the relationships that matter. [You’re basically asking the computer to make good decisions from being given scrappy data. The learning can only be as good as the data provided]

But too many inputs created problems. Each input adds a dimension to the decision map the computer is trying to create, which means a geometric increase in the data needed.

It’s similar to what we experience testing basic business rules. Let’s consider our ‘to ride this ride’ ruleset, and explode it a bit. If we were just doing basic boundary testing of decision making (and full testing of every combination had been mandated), if we had rules based on …

Just height. Then we’d need 4 tests in total (around the maximum and minimum heights). Note I’m not testing exactly on the boundary here — just bear with me though.

Height and age. We’d now have a combination of 16 tests.

Height, age and shoe size. We’d now need 64 tests.

Height, age, shoe size and weight. We’d now need 256 tests.

The data sets you need for multiple decision dimensions can explode, and as before, you have to be very wary of your data clustering, in which case having more will not mean ‘better’.
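That growth is easy to spell out: with 4 boundary tests per rule, each extra rule multiplies the total by 4.

```python
# Each rule adds a dimension to the decision map; with 4 boundary
# tests per rule, the combinations multiply as 4**n.
for n, up_to in enumerate(["height", "age", "shoe size", "weight"], start=1):
    print(f"{n} rule(s), up to {up_to}: {4 ** n} tests")
# 1 rule(s), up to height: 4 tests
# 2 rule(s), up to age: 16 tests
# 3 rule(s), up to shoe size: 64 tests
# 4 rule(s), up to weight: 256 tests
```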

Less data?

Yes, sometimes less is more. Neural networks are odd creatures. If you were really lucky you’d find that they had a successful response rate of 95–97%, but never 100%.

[Figure: a graph showing overfitting — it’s the test error you want to look at. As training continues (the ‘training cycles’ axis, meaning repeated passes through the data), the test error starts to increase rather than decrease.]

What you tend to get when you overload your neural network is what’s officially called ‘overfitting’. It’s a phenomenon we see in human behaviour too, such as the breakdown the mathematician John Nash has in the film A Beautiful Mind. Your neural network starts to see patterns in your data which aren’t useful, but are down to noise in your system.

It’s the equivalent of what the character with Asperger’s in the book The Curious Incident of the Dog in the Night-Time suffers from. Something really bad happened to him on a day when he saw a large number of red cars, and hence he attributes a relationship between the number of red cars he sees and whether something bad has happened. He has tuned his life to a piece of random noise.
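You can watch the same thing happen in code. This sketch (entirely made-up data: a sine curve plus noise, a small data set and a deliberately over-sized network) trains one cycle at a time and prints the training and test error as it goes.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(40, 1))             # deliberately little data
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 40)     # true pattern plus noise
X_test = rng.uniform(-3, 3, size=(200, 1))
y_test = np.sin(X_test[:, 0]) + rng.normal(0, 0.3, 200)

net = MLPRegressor(hidden_layer_sizes=(100,), learning_rate_init=0.01,
                   random_state=0)
for cycle in range(1, 2001):
    net.partial_fit(X, y)                        # one training cycle at a time
    if cycle % 400 == 0:
        print(f"cycle {cycle}: "
              f"train error {mean_squared_error(y, net.predict(X)):.3f}, "
              f"test error {mean_squared_error(y_test, net.predict(X_test)):.3f}")
# The training error keeps falling; past some point the test error tends to
# creep back up as the network starts memorising the noise in its 40 samples.
```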

Okay — that’s covered a little about what neural networks are. Tomorrow we’ll look at what’s most important: how to test them. And I’ll include items to think about as a tester.
