Optimising Neural Nets With Biomimicry

Leon Fedden
9 min read · Oct 12, 2017


This is part two in a series on studying algorithms inspired by nature. See part one here, where we look at the implications that arise when we select an algorithm to solve a target problem. The next two parts, part three and part four, can be found at their respective links.

As mentioned in the first post, I’ll be trying to take the topics learnt in the Natural Computing course taught at Goldsmiths and apply them to (or against) neural networks where possible, to keep things interesting and different from the other posts. Towards the end of this post, I will attempt to weave the first post into this one by reviewing how the No Free Lunch theorem bears on the suitability of Dispersive Flies Optimisation for finding network weights.

Finding inspiration in nature to fuel our ideas for algorithms

Dispersive Flies Optimisation

Nature is an integral source of inspiration for optimisation and search algorithms. Dispersive Flies Optimisation, or DFO for short, is inspired by the swarming behaviour of flies over food sources.

Swarms of flies may comprise anything from a single fly to thousands or even countless millions, depending on the species. When disturbed, a swarm of flies disperses and then regathers, which summarises the behaviour of the algorithm: the formation and then weakening of swarms. Each member of the swarm is represented by a position vector.
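In the standard DFO notation, the i’th fly’s position at time step t looks roughly like this:

x_i^t = (x_{i1}^t, x_{i2}^t, ..., x_{iD}^t),    i = 1, 2, ..., NP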

The first thing to note is that the swarm is essentially a matrix with the shape ‘NP’ by ‘D’. In the above equation, ‘t’ is the current time step, ‘D’ is the dimensionality of the problem space and ‘i’ indexes the member of the swarm’s population, which has a total size of ‘NP’. When ‘t’ is 0 (that is, on the first iteration), each fly is initialised randomly within the bounds of the search space.
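In the same notation, the initialisation can be written roughly as:

x_{id}^0 = L_d + r · (U_d − L_d),    r ~ U(0, 1)

where L_d and U_d are the lower and upper bounds of the d’th dimension.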

The initialisation is very simple: assign the position random locations across all dimensions. To be more specific, for every dimension ‘d’ of the i’th fly’s position, assign that element the lower bound of the d’th dimension plus a random number ‘r’, drawn from a uniform distribution, scaled by the range between the upper and lower bounds of that dimension.

Each iteration, the fitness of each fly in the swarm is computed, and the swarm’s best fly is stored. As a deviation from the original algorithm, I found it prudent to also store the best solution over all iterations. Once the iteration’s best fly has been found, each fly’s new position is calculated.
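The standard DFO update can be written roughly as follows, where x_{i_n} denotes the i’th fly’s best nearest neighbour and x_s denotes the iteration’s best fly:

x_{id}^{t+1} = x_{i_n d}^t + u · (x_{sd}^t − x_{id}^t),    u ~ U(0, 1)

and, with probability equal to the disturbance threshold, a dimension is instead reset to a random value within the bounds, exactly as in the initialisation.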

So essentially there are a few things going on here, but it’s not that complex. Firstly, if a random number drawn from a uniform real distribution is less than ‘dt’, our disturbance threshold, the i’th fly’s d’th dimension will be dispersed and reinitialised to a random position in the search space. This stochasticity is essential to prevent the flies from prematurely converging on local minima. If the random number does not fall below the disturbance threshold, the i’th position is set to that of its best nearest neighbour in the array of flies (left or right), and then moved a random amount towards the iteration’s best-performing fly.

One of the really fascinating things about this algorithm and its swarm intelligence comrades is the emergent, rich and deeply complex behaviour that arises from the simple rules that each individual fly must follow.

Shut Up And Show Me The Code

Below we can see a toy example of this algorithm in Python, optimising the flies until they find an acceptable solution for the target vector. It’s commented fairly well so I won’t go into it too much, but a high-level overview is that the flies must optimise towards the numpy array called ‘target_solution’, and they have reached the global minimum, or the optimal position, when a fly’s position vector is at zero Euclidean distance from the target solution.
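Here is a minimal sketch of that idea (the population size, bounds, seed and tolerance below are illustrative rather than the notebook’s exact settings):

```python
import numpy as np

np.random.seed(42)

# The vector the swarm should converge on; fitness is Euclidean distance to it.
target_solution = np.array([1.0, -2.0, 3.5, 0.25, -4.0])

population_size = 30                     # NP
dimensionality = target_solution.size    # D
lower_bound, upper_bound = -5.0, 5.0
disturbance_threshold = 0.001            # dt
max_iterations = 2000

def fitness(fly):
    # Lower is better; zero means the fly sits exactly on the target.
    return np.linalg.norm(fly - target_solution)

# Initialise each fly uniformly at random within the bounds.
flies = np.random.uniform(lower_bound, upper_bound,
                          size=(population_size, dimensionality))

best_fly_ever, best_fitness_ever = None, np.inf

for iteration in range(max_iterations):
    fitnesses = np.array([fitness(fly) for fly in flies])
    best_index = int(np.argmin(fitnesses))

    # Keep a memory of the best solution over all iterations.
    if fitnesses[best_index] < best_fitness_ever:
        best_fitness_ever = fitnesses[best_index]
        best_fly_ever = flies[best_index].copy()
    if best_fitness_ever < 1e-3:
        break

    new_flies = np.empty_like(flies)
    for i in range(population_size):
        # Best of the two ring-topology neighbours (left and right).
        left, right = (i - 1) % population_size, (i + 1) % population_size
        neighbour = left if fitnesses[left] < fitnesses[right] else right

        for d in range(dimensionality):
            if np.random.uniform() < disturbance_threshold:
                # Disturbance: reset this dimension to a random position.
                new_flies[i, d] = np.random.uniform(lower_bound, upper_bound)
            else:
                # Start from the best neighbour and move a random amount
                # towards the iteration's best fly.
                new_flies[i, d] = (flies[neighbour, d] +
                                   np.random.uniform() *
                                   (flies[best_index, d] - flies[i, d]))
    flies = new_flies

print('Best distance found:', best_fitness_ever)
print('Best fly:', best_fly_ever)
```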

What About Something Harder?

That’s obviously a fairly straightforward toy problem, so what about something a little more complex?

Instead of using a standard Reinforcement Learning algorithm, I thought an interesting way to apply DFO would be to optimise a neural network’s trainable weights to complete the cartpole problem. Cartpole is a standard task in the Reinforcement Learning literature, and the OpenAI gym library provides a Python cartpole environment for our neural net to interact with. The problem is described on their website as follows:

A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the centre.

A perfectly balanced pole

So above is the network in action, running for 200 timesteps, and below is the code to reproduce it. Once the network has successfully balanced the pole for 200 steps, we consider the problem solved. I should add that this is not the place to learn about neural networks or the computational library TensorFlow; I will make no effort to explain the network or the TensorFlow-specific code. A good place to learn more about neural nets is here, and you can start with TensorFlow at their official docs. Have a look through, and then I’ll break down the code in the last section of this post:
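The core idea is that each fly’s flat position vector is reshaped into the network’s weights, and the fly’s fitness is the reward the resulting policy earns over one episode. A minimal numpy sketch of that mapping (the actual notebook builds the network in TensorFlow; the layer sizes, helper names and classic gym API below are assumptions for illustration):

```python
import gym
import numpy as np

# A tiny policy: 4 observations -> 8 hidden units -> 2 actions (push left/right).
HIDDEN_UNITS = 8
SHAPES = [(4, HIDDEN_UNITS), (HIDDEN_UNITS,), (HIDDEN_UNITS, 2), (2,)]
DIMENSIONALITY = sum(int(np.prod(shape)) for shape in SHAPES)

def unflatten(fly):
    # Slice the flat position vector back into weight matrices and bias vectors.
    weights, index = [], 0
    for shape in SHAPES:
        size = int(np.prod(shape))
        weights.append(fly[index:index + size].reshape(shape))
        index += size
    return weights

def act(observation, weights):
    w1, b1, w2, b2 = weights
    hidden = np.tanh(observation @ w1 + b1)
    return int(np.argmax(hidden @ w2 + b2))

def episode_reward(fly, env, max_steps=200):
    # A fly's fitness: total reward its network earns over one episode
    # (uses the classic gym API where step() returns obs, reward, done, info).
    weights = unflatten(fly)
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        observation, reward, done, _ = env.step(act(observation, weights))
        total_reward += reward
        if done:
            break
    return total_reward

env = gym.make('CartPole-v0')
random_fly = np.random.uniform(-1.0, 1.0, size=DIMENSIONALITY)
print(episode_reward(random_fly, env))  # a random network rarely reaches 200
```

The fitness DFO minimises is then just the negated episode reward (hence the optimal score of -200 mentioned below), so lower is better, exactly like the Euclidean distance in the toy example.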

So WTF Are These Changes?

So when I was hacking around with DFO, trying to improve the results it yielded, a bunch of changes were made to the main DFO algorithm. Again, I won’t cover the necessary boilerplate code required for plotting or representing neural nets! In summary, the most notable changes made to DFO were:

  • Methods of updating (seven in total).
  • Elitism of flies.
  • Decaying disturbance threshold.
  • Multithreaded computation for quicker fitness estimation.
  • Eternal god fly.

Updating

There were seven updating methods made available in the above code, and combined with the other modifications there were quite a few combinations to try out. I’ll aim to cover most of them, but you can read them all in the code from the above notebook.

The original method of updating was slightly changed to randomly reinitialise the flies’ positions using a gaussian distribution, with a mean and deviation derived from the network’s original weight settings, rather than a uniform distribution across the search bounds. Anecdotally, this seemed to deliver better performance.
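One possible reading of that change, as a sketch (the function name and the choice of scalar mean and deviation are assumptions):

```python
import numpy as np

def disperse(shape, initial_weights):
    # Instead of a uniform reset within the search bounds, resample from a
    # gaussian whose statistics come from the network's initial weights.
    mean = np.mean(initial_weights)
    deviation = np.std(initial_weights)
    return np.random.normal(loc=mean, scale=deviation, size=shape)
```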

Another updating method was dubbed the hybrid mode.

The major difference here is that the fly is not assigned its best neighbour’s position, but retains its own and moves towards the average of the best neighbour and the global best. In practice this didn’t lead to very good results, as the average of the two is not necessarily optimal: there are many configurations of network parameters that could work, which means the problem is multimodal rather than unimodal, and the mean of two good sets of weights may not score well at all.
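A sketch of what the hybrid update looks like (function and variable names are illustrative):

```python
import numpy as np

def hybrid_update(fly, best_neighbour, swarm_best):
    # The fly keeps its own position and steps a random amount (per dimension)
    # towards the midpoint of its best neighbour and the swarm's best fly.
    midpoint = (best_neighbour + swarm_best) / 2.0
    return fly + np.random.uniform(size=fly.shape) * (midpoint - fly)
```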

The next step up from the hybrid mode was the n_fittest mode.

This code assigned the fly its best nearest neighbour’s position and moved it towards the average of the ‘n’ fittest flies; in the graph below, ‘n’ equals 20. It never optimised in time, but most interestingly the population’s mean and deviation never really changed after the first iteration.
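A sketch of the n_fittest update (names are illustrative; fitness is assumed to be the negated episode reward, so lower is better):

```python
import numpy as np

def n_fittest_update(best_neighbour, flies, fitnesses, n=20):
    # Start from the best nearest neighbour and move a random amount towards
    # the mean position of the n fittest flies in the swarm.
    fittest_indices = np.argsort(fitnesses)[:n]
    fittest_mean = flies[fittest_indices].mean(axis=0)
    u = np.random.uniform(size=best_neighbour.shape)
    return best_neighbour + u * (fittest_mean - best_neighbour)
```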

As we look at the graphed results, one major thing to note is that one iteration’s fittest fly may not be the next iteration’s fittest fly, even if it has found the best-rewarded neural network parameters for a given iteration of DFO. This is because of the cartpole environment’s random starting angle, which can lead to a poorly constructed network being good at balancing the pole from some angles but not others. We can see this in the reward graphs, when a fly seemingly reaches the optimal score of -200, but the next round the reward is a lot worse!

There were also a couple of random modes, such as random_uniform and random_gauss. These modes just randomly assigned values to the flies using the specified distributions. This was generally less successful when searching for larger networks’ weights, but had plenty of success in lower-dimensional search spaces.
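For completeness, a sketch of the random modes (the bounds, mean and deviation are illustrative defaults):

```python
import numpy as np

def random_uniform_update(shape, lower=-1.0, upper=1.0):
    # Ignore the swarm entirely and resample the position uniformly.
    return np.random.uniform(lower, upper, size=shape)

def random_gauss_update(shape, mean=0.0, deviation=1.0):
    # As above, but sample from a gaussian instead.
    return np.random.normal(mean, deviation, size=shape)
```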

Elitism

Elitism was introduced to ensure the fittest individuals were left just as they were.

In the code, the variable ‘elitism’ is an integer setting how many of the fittest flies should be safe from updating. The indices of that many of the fittest flies are then stored in ‘n_fittest’ so that the program knows not to update them.
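A sketch of the idea (the helper name is illustrative; fitness is again assumed to be lower-is-better):

```python
import numpy as np

def apply_elitism(flies, new_flies, fitnesses, elitism=20):
    # The 'elitism' fittest flies are copied across unchanged, so whatever
    # update mode is in use never touches them.
    n_fittest = np.argsort(fitnesses)[:elitism]
    new_flies[n_fittest] = flies[n_fittest]
    return new_flies
```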

Elitist approach
No elitism

Above we can see two run-throughs of the program, one with elitism set to 20 and one set to 0. For both runs the mode was set to the original DFO update method, the population size was 1000, and the disturbance threshold was set to 0.1, which is very high!

The elitist approach took longer to find the best reward, which is likely due to the best-scoring flies being forced to stay at local minima rather than stochastically optimising. The program with no elitism quickly updated the flies towards the global minimum, but had problems retaining the global-minimum position every iteration, as the best flies were never safe from being updated.

Disturbance Threshold Decay

Another idea I tried was reducing the disturbance threshold by a specified amount each iteration. The rationale was to reduce the number of random position resets as the flies optimised closer to the global minimum. In practice, however, this is just another parameter that is hard to configure without intimately knowing the problem at hand.
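A sketch of the decay (the decrement value is an illustrative choice):

```python
def decayed_threshold(initial_threshold, decrement, iteration):
    # Linearly shrink the disturbance threshold over time, clamped at zero,
    # so random position resets become rarer as the swarm converges.
    return max(0.0, initial_threshold - decrement * iteration)
```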

Multi-Threading

So when you have to wait for a population of 1000 flies to complete many steps of the cartpole simulation, it can get a little boring waiting for the environment to finish. A quick hack was to make the simulation multithreaded, which was managed mostly by the code here. As many threads are made as there are computing cores available, and this led to massive speedups.

By the way, this code is collated from around the DFOStrategy class and is not run in the above Gist’s order!
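A sketch of the shape of that evaluation (the helper name is illustrative; each worker scores flies with whatever fitness function is in use):

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

def evaluate_population(flies, fitness_function):
    # One worker per available computing core, each scoring flies concurrently.
    workers = multiprocessing.cpu_count()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fitness_function, flies))
```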

Eternal God Fly

Okay, so this is a silly name (it’s late and I’m getting tired) for the fittest fly of all time during a given run. By keeping a memory of the best fly, we ensure that the best network is always available to test. Note that I use the ‘≤’ operator: this is because we want to keep a newer network that has more ‘experience’ against the environment, compared to a luckier earlier iteration.
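A sketch of that bookkeeping (names are illustrative; fitness is again the negated episode reward, so lower is better):

```python
import numpy as np

def update_god_fly(flies, fitnesses, best_fly_ever, best_fitness_ever):
    best_index = int(np.argmin(fitnesses))
    # '<=' rather than '<': a newer fly that merely ties the record replaces
    # the old one, as it has been tested against more random starting angles.
    if fitnesses[best_index] <= best_fitness_ever:
        return flies[best_index].copy(), fitnesses[best_index]
    return best_fly_ever, best_fitness_ever
```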

Thoughts On DFO’s Performance For Neural Weight Updates

DFO had a noticeable decrease in performance when the dimensionality became very large. For example, trying to optimise a network with 256 neurons in each layer would be practically impossible in a sensible amount of time. Perhaps if it could run on a GPU, with a similar amount of time lent to it as to an algorithm like backprop or policy gradients with experience replay, it would be a fairer contest. However, as per the supplied video, with enough tweaking DFO did eventually manage to produce a working network that solved the problem!

In the No Free Lunch post we looked at the relationship between an algorithm and its search space. It is this relationship that determines the performance of an algorithm on a given problem. So how well does DFO fit its search space of neural network weights?

I would argue that DFO is not best suited to the problem at hand. Because of the regular updates to the weight positions and the non-linear relationships between the weights, this seems a poor way to improve towards a global minimum. The most grating thing is the random resetting of flies’ positions: an essential weight can easily be reset to an entirely random value and cause a massive, systematic chain reaction of bad output from the neural network.

That being said, with enough tweaking, it did work, and there are likely many further improvements that would better the performance.

Concluding Remarks

I’d like to wrap up with this great quote from AI heroes Stuart Russell and Peter Norvig:

The quest for “artificial flight” succeeded when the Wright brothers and others stopped imitating birds and started using wind tunnels and learning about aerodynamics.

In short, finding inspiration in nature for novel algorithms is a useful strategy, and a great lesson to remember when facing the daunting task of deriving novel algorithms to solve arbitrary problems. It is also integral to remember that we only learnt to fly once we stopped creating machines with feathers and flapping wings, so we must remember to sprinkle a little fiction where necessary and come up with the relevant maths or computations to obtain good results.

Thanks for reading!

