Stupid TensorFlow tricks
A new take on an old (Thomson) problem
Google’s machine intelligence library, TensorFlow (TF), has become synonymous with deep learning. Despite the name, deep learning involves just a few simple operations; the complexity comes from repeating them millions of times (concretely, a deep network is the composition of millions of elementary functions). To “solve” a problem in TF, you find the minimum of some function. The hard part is backpropagation, which requires the derivative of this massive function. This is where TF excels: it removes the drudgery of algorithmic differentiation and automagically moves the computation to the GPU. This lets you do amazing things like write fake Shakespeare or draw cats. In practice, however, the framework can solve any differentiable minimization problem. I wanted to see how far I could push this idea.
The Thomson problem is a classical physics question: “What configuration of N positive charges on the unit sphere minimizes the energy?” The potential energy of each pair of charges is 1/r, where r is the distance between them, so the function we are trying to minimize is the sum of 1/r over all pairs of charges.
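Written out, the objective is E = Σ_{i<j} 1/|r_i − r_j|, where r_i is the position of charge i on the unit sphere. A minimal plain-Python sketch of that sum, assuming points are given as (x, y, z) tuples already on the sphere (the name `thomson_energy` is illustrative, not from the original project):

```python
import itertools
import math

def thomson_energy(points):
    """Total Coulomb energy: sum of 1/r over every unordered pair of charges.

    `points` is a sequence of (x, y, z) tuples, assumed to lie on the unit sphere.
    """
    energy = 0.0
    for p, q in itertools.combinations(points, 2):  # each pair is visited exactly once
        energy += 1.0 / math.dist(p, q)
    return energy

# N = 2: two charges at opposite poles are a distance 2 apart, so E = 1/2.
print(thomson_energy([(0, 0, 1), (0, 0, -1)]))  # 0.5
```

The double loop costs O(N²) pairs, which is exactly the work a vectorized or GPU version parallelizes.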
Seems simple, right? For low values of N, it is. N=2 places the two charges at opposite poles of the sphere, N=3 places three charges in an equilateral triangle on the equator, and N=4 gives a tetrahedron. Larger values of N, especially prime values, break these nice geometric descriptions. N=11 puts the charges in a configuration that completely breaks the symmetry: while the charges are in equilibrium, there are more of them on one side than the other, so the configuration has a net dipole moment!
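The small cases can be checked numerically. This NumPy sketch builds the N=2, 3, and 4 configurations by hand and reports each one's energy and the magnitude of its net dipole moment, taken here (for unit charges) as |Σ r_i|. It cannot reproduce N=11, since that configuration has to be found numerically; the helper names are hypothetical.

```python
import numpy as np

def pair_energy(x):
    """Sum of 1/r over unique pairs; x has shape (N, 3)."""
    i, j = np.triu_indices(len(x), k=1)           # indices of each pair i < j
    return float(np.sum(1.0 / np.linalg.norm(x[i] - x[j], axis=1)))

def dipole(x):
    """Magnitude of the net dipole moment for unit charges: |sum of positions|."""
    return float(np.linalg.norm(x.sum(axis=0)))

# N = 2: antipodal points.
two = np.array([[0, 0, 1], [0, 0, -1]], dtype=float)

# N = 3: equilateral triangle on the equator, 120 degrees apart.
angles = 2 * np.pi * np.arange(3) / 3
three = np.stack([np.cos(angles), np.sin(angles), np.zeros(3)], axis=1)

# N = 4: regular tetrahedron, alternate cube corners scaled onto the unit sphere.
four = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float) / np.sqrt(3)

for name, x in [("N=2", two), ("N=3", three), ("N=4", four)]:
    print(name, round(pair_energy(x), 6), round(dipole(x), 6))
```

The energies come out to 0.5, √3 ≈ 1.732051, and ≈ 3.674235, and all three dipole moments are zero, as expected for these symmetric arrangements.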
Solving this in TF is surprisingly easy. We set up our input variables to be normalized onto the unit sphere, compute the symmetric distance matrix, and pull out all unique pairwise distances. The potential energy is the sum of 1/r over these distances, so we use that as our objective function.
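The same four steps can be sketched in NumPy as a stand-in for the TF graph. This is only the forward computation; in TF the equivalent tensor ops would let autodiff supply the gradient. Normalizing inside the objective means the optimizer can work with unconstrained variables. The function name and the use of NumPy are assumptions, not the project's actual code.

```python
import numpy as np

def energy_from_raw(raw):
    """Objective built from unconstrained raw coordinates of shape (N, 3)."""
    # 1. Normalize each row so every charge sits on the unit sphere.
    x = raw / np.linalg.norm(raw, axis=1, keepdims=True)
    # 2. Symmetric N x N distance matrix.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    # 3. Keep each pair once: strictly upper-triangular entries (skips the zero diagonal).
    i, j = np.triu_indices(len(x), k=1)
    # 4. Potential energy is the sum of 1/r over those unique pairs.
    return float(np.sum(1.0 / dist[i, j]))

rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 3))
print(energy_from_raw(raw))
```

Because of step 1, rescaling `raw` leaves the energy unchanged; only the directions of the raw vectors matter.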
Running the model and saving the configurations takes a bit more work. The details of the project are in this GitHub repo.
How well does our model work? For any value of N, we can converge to a stable energy minimum in a matter of seconds, and we can refine that to full floating-point precision in a matter of minutes by tapering down the learning rate. As N gets larger, we find more and more solutions that are stable (the gradient is zero) but are not the global minimum. We can compare these solutions to those posted on Wikipedia, which lists 470 configurations. For low values of N, the first solution found is the known best solution. Around N>30 this breaks down, and it takes exponentially longer to find the known solution. These “almost” solutions are extremely close in energy to the global one, which makes the true minimum a needle in a haystack: a rare and special configuration hidden among countless near-misses.
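The descend, taper, and restart loop described above can be sketched in NumPy, with two substitutions named plainly: the gradient is hand-derived (∂E/∂r_i = −Σ_{j≠i} (r_i − r_j)/|r_i − r_j|³) instead of coming from TF's autodiff, and the sphere constraint is enforced by re-normalizing after each step (projected gradient descent) rather than inside the graph. The names `relax` and `best_of`, the 1/(1 + t/100) decay schedule, and all default parameters are illustrative assumptions.

```python
import numpy as np

def energy_and_grad(x):
    """Energy sum_{i<j} 1/|x_i - x_j| and its gradient with respect to each point."""
    diff = x[:, None, :] - x[None, :, :]            # diff[i, j] = x_i - x_j
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                  # self-pairs contribute 1/inf = 0
    energy = 0.5 * np.sum(1.0 / dist)               # full matrix counts each pair twice
    grad = -np.sum(diff / dist[..., None] ** 3, axis=1)
    return energy, grad

def relax(n, steps=2000, lr0=0.1, seed=0):
    """Projected gradient descent on the unit sphere with a tapering learning rate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    for t in range(steps):
        lr = lr0 / (1 + t / 100)                    # taper the step size over time
        _, g = energy_and_grad(x)
        x = x - lr * g
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # project back onto the sphere
    return energy_and_grad(x)[0], x

def best_of(n, restarts=5):
    """Random restarts; keep the lowest energy (which may still be a local minimum)."""
    return min(relax(n, seed=s)[0] for s in range(restarts))

print(best_of(4))
```

For N=4 the result should land very close to the tetrahedron energy of about 3.674235. For larger N, different seeds can settle into different stable minima, which is where restarts and comparison against the published table come in.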
TensorFlow’s GPU computation performs admirably here. With a large value of N=2000, my GTX 980 GPU computes 38 iterations per second, while my poor 8-core CPU manages only 3.6 iterations per second. That’s an impressive speedup of roughly 10x (38/3.6 ≈ 10.6)!
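For scale, each iteration at N=2000 touches N(N − 1)/2 = 1,999,000 unique pairs. The sketch below times CPU iterations of the same energy-and-gradient step in NumPy. It is not the original TF benchmark, the step size is an arbitrary assumption, and absolute numbers will depend entirely on hardware.

```python
import time
import numpy as np

def energy_and_grad(x):
    """Energy sum_{i<j} 1/|x_i - x_j| and its gradient (same as a plain CPU step)."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)
    return 0.5 * np.sum(1.0 / dist), -np.sum(diff / dist[..., None] ** 3, axis=1)

def iterations_per_second(n, iters=20, seed=0):
    """Wall-clock rate of projected gradient steps for n random charges."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    start = time.perf_counter()
    for _ in range(iters):
        _, g = energy_and_grad(x)
        x = x - 1e-4 * g                            # small fixed step, for timing only
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return iters / (time.perf_counter() - start)

print(2000 * 1999 // 2)      # 1999000 unique pairs at N = 2000
print(round(38 / 3.6, 1))    # 10.6, the GPU/CPU ratio quoted above
print(f"{iterations_per_second(500):.1f} it/s at N=500 on this machine")
```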
Visualizing the configurations illustrates their regularity and apparent symmetry, even though we know a given configuration might not be the global minimum. Doing this in Python’s plotting library, matplotlib, really pushes it to its absolute limit! While 3D visualization isn’t my forte, I found you can get a decent result by spinning the camera angles at rates governed by sines and cosines.
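One way to spin the camera as described: drive the elevation with a sine and the azimuth with a full turn plus a cosine wobble, then call matplotlib’s `view_init` once per frame. The specific formulas, amplitudes, file naming, and the helper names `camera_angles` and `render_frames` are assumptions for illustration; matplotlib is imported inside the render function so the angle logic works without it.

```python
import math

def camera_angles(frame, n_frames, elev_amp=30.0, wobble=15.0):
    """Elevation and azimuth (degrees) for one frame of a looping spin."""
    t = frame / n_frames                                   # fraction of the loop, in [0, 1)
    elev = elev_amp * math.sin(2 * math.pi * t)            # camera bobs up and down
    azim = 360.0 * t + wobble * math.cos(2 * math.pi * t)  # full turn plus a cosine wobble
    return elev, azim

def render_frames(points, n_frames=36, prefix="frame"):
    """Save one PNG per frame of a rotating 3D scatter of (x, y, z) points."""
    import matplotlib
    matplotlib.use("Agg")                  # headless backend: write files, no window
    import matplotlib.pyplot as plt

    fig = plt.figure(figsize=(4, 4))
    ax = fig.add_subplot(projection="3d")
    xs, ys, zs = zip(*points)
    ax.scatter(xs, ys, zs, s=40)
    ax.set_box_aspect((1, 1, 1))           # keep the sphere from looking squashed
    ax.set_axis_off()
    for f in range(n_frames):
        elev, azim = camera_angles(f, n_frames)
        ax.view_init(elev=elev, azim=azim)
        fig.savefig(f"{prefix}_{f:03d}.png")
    plt.close(fig)
```

With these defaults the azimuth rate, 360 − 30π·sin(2πt) degrees per loop, never goes negative, so the view keeps turning forward while it wobbles.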
For me, this was a fun project; it was explicit yet simple, with tangible rewards that led to new knowledge. This is the epitome of low-hanging fruit. How many other simple projects out there could be accelerated with algorithmic differentiation and GPU acceleration?
What else can we do with TensorFlow?
Leave a comment or link to some other stupid TensorFlow tricks.