A Short Study in Code Tuning

Tomi Tiihonen
Nov 1 · 7 min read

A thousand fold improvement in 3 short steps?

One of my favourite descriptions of an approach to improving something is by arguably the greatest artist of all time, Michelangelo di Lodovico Buonarroti Simoni.

“The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material.”

Atlas Slave by Michelangelo

We will now attempt to do is what was so succinctly put by Michelangelo to do to for a piece of code that calculates Happy Primes. We will start with a naive rough implementation and chisel away.

The following code calculates sum of 5000 first happy primes, i.e. a prime number that is also a happy (a natural number in a given number base that eventually reaches 1 when iterated over the perfect digital invariant function for). While this function is unlikely to have any real life use it makes a good example for couple of reasons, firstly there are no samples of highly efficient implementations readily available (so we cannot simply google the answer), secondly we are dealing with more than one problem to solve and finally the problem is entirely CPU bound (making it platform agnostic).

Sample Application

Here is an initial naive Java implementation put together with samples found in the net.

package codetuning;

import java.util.HashSet;
import java.util.Set;

public class HappyPrime {

public static Integer happyPrimeSum(){
Integer happyPrimeCount = 0;
Integer happyPrimeSum = 0;
for (Integer n = 0; happyPrimeCount < 5000; n++) {
if (isPrime(n)) {
if (isHappy(n)) {
happyPrimeCount++;
happyPrimeSum+=n;
}
}
}
return happyPrimeSum;
}

private static boolean isPrime(Integer n) {
if (n <= 1) {
return false;
}
for(int i=2;2*i<n;i++) {
if(n%i==0)
return false;
}
return true;
}

private static boolean isHappy(Integer n) {
Set<Integer> seenNumbers = new HashSet<Integer>();
while(seenNumbers.add(n)){
int digitsSum = 0;
while(n > 0) {
digitsSum += Math.pow(n % 10 , 2);
n /= 10;
}
n = digitsSum;
if(n == 1)
return true;
}
return false;
}

}

There are several issues with this but one biggie — see if you can spot it?

Beware of the time complexity

Let’s put this code in for a test. I have created a unit test that calls the happyPrimeSum() function 100 times in a loop, and times it. Result — a whopping on my Early 2013 MacBook Pro.

Now lets run this with a profiler.

JVM Profiler — CPU Hotspots

Result is pretty obvious, we are spending 99% of the time in the isPrime(Integer n) function.

In order to ascertain that a number is a Prime number, the happyPrimeSum() function iterates through integers until we have found 5000 happy primes.

for (Integer n = 0; happyPrimeCount < 5000; n++) {
if (isPrime(n)) {

The 5000th happy prime happens to be 400157, so this loop has been called 400157 times and in this loop we call isPrime(Integer n) the equal number of times.

private static boolean isPrime(Integer n) {
... for(int i=2;2*i<n;i++) {

The isPrime function contains another for loop and so we have discovered our biggie. The reason why nested for loops can be dangerous, is because you will often end up with quadratic time complexity — O(n²). This is product of two nested linear time complexities of O(n).

As an example of quadratic function — say you would have to find a particular name in telephone book (remember them still?) by trawling each page each line until the number is found. Now imagine the book has 10 times more pages and each page is 10 times longer — your task is increased by 100 fold (i.e. 10²), i.e. time needed is increased exponentially.

This is a classic problem, small datasets can still be reasonably fast but larger datasets expose the fundamental flaw.

So what to do?

We can do two simple modifications to isPrime(Integer n) that significantly improve the performance of this function.

if (n%2==0) return false;

for(Integer i=3;i*i<=n;i+=2) {

With these changes the result of our test is now . A massive improvement by changing just two lines of code.

What a difference a type makes

I must admit, I do not think I have ever written a for loop using an Object representation of an integer.

for (Integer n = 0; happyPrimeCount < 5000; n++)

The reason why I wanted to use Integer and not an int was to investigate the costs associated with Objects vs primitive types. If we simply replace all Integers in our code with primitive int, the test now runs at .

for (int n = 0; happyPrimeCount < 5000; n++)

So not an insignificant saving, in this case as we are still working on fairly small set of data we are really not getting hit by garbage collection issues as we have abundant memory. However in large object graphs object churn (persistent creation and destruction of objects) can have a significant impact on performance.

Know your algorithms and dataset

So far we have made a significant improvement in our code — but we can still do better. Identifying primes is exhaustively studied subject and the most efficient way of identifying primes is using a prime sieve. A function using a sieve works differently to our iterative checking in as much as you first generate the sieve and then check your number with the sieve.

boolean[] isPrime = primeSieve(500000);
for (int n = 0; happyPrimeCount < 5000; n++) {
if (isPrime[n]) { ...

As we pre-populate the sieve, we need to know its dimension. This is where understanding your dataset becomes important, as we know what the last happy prime is we can ensure the sieve is large enough. In this case we are sacrificing memory for performance by creating a sieve that is too large for our needs. By using the sieve our test now runs at .

Now lets run this with a profiler.

JVM Profiler — CPU Hotspots

Now our costliest operation is the isHappy(int n) function and the subsequent adding into the seenNumbers HashSet.

The reason why we are using seenNumbers set is because if number is not happy the process turns into a cyclical repeating pattern and seenNumbers set is used to validate if we are in that cycle (not a happy number) and break the loop. We can improve this function by using more efficient data structure of boolean array instead.

static private boolean isHappy(int n) {
boolean[] seenNumbers = new boolean[1000];
boolean numberNotSeen = true;
while(numberNotSeen){
int digitsSum = 0;
while(n > 0) {
digitsSum += Math.pow(n % 10 , 2);
n /= 10;
}
n = digitsSum;
if(seenNumbers[n]){
numberNotSeen = false;
}
seenNumbers[n] = true;
if(n == 1) {
return true;
}
}
return false;
}

And now our test runs at .

Don’t repeat yourself

After our last test the profiler shows that isHappy function is still the costliest.

JVM Profiler — CPU Hotspots

The algorithm in this function is a constant repetition of splitting numbers to digits, squaring them and summing up — rinse repeat. There are only 10 digits, do we really need to calculate to find out is 2² is still 4 repeatedly? We can pre-calculate them instead.

private static int[] powerOfTwo = new int[]{0,1,4,9,16,25,36,49,64,81};...
digitsSum += powerOfTwo[n % 10];

We can do the same for numbers that will be unhappy, as numbers that will never be happy will indefinitely circulate between known set of values. And we can replace the cache set seenNumbers by simple pre-calculated look up instead.

private static boolean[] willBeUnhappy = new boolean[500];
willBeUnhappy[4] = true;
willBeUnhappy[16] = true;
willBeUnhappy[37] = true;
willBeUnhappy[58] = true;
willBeUnhappy[89] = true;
willBeUnhappy[145] = true;
willBeUnhappy[42] = true;
willBeUnhappy[20] = true;
...
if(willBeUnhappy[n]){
return false;
}

After these changes our test now runs at .

So, by using these three principles we have reduced the time taken from 29 minutes to half a second:

  • Reduce the time complexity of our iterative functions
  • Use better performing data types and structures
  • Pre-calculate often needed static results

Here is the final resulting code, I am sure it can be still be improved?

package codetuning;public class HappyPrime {
private static int[] powerOfTwo = new int[]{0,1,4,9,16,25,36,49,64,81};
private static boolean[] willBeUnhappy = new boolean[500];

public static int happyPrimeSum(){
willBeUnhappy[4] = true;
willBeUnhappy[16] = true;
willBeUnhappy[37] = true;
willBeUnhappy[58] = true;
willBeUnhappy[89] = true;
willBeUnhappy[145] = true;
willBeUnhappy[42] = true;
willBeUnhappy[20] = true;

int happyPrimeCount = 0;
int happyPrimeSum = 0;
boolean[] isPrime = primeSieve(500000);
for (int n = 0; happyPrimeCount < 5000; n++) {
if (isPrime[n]) {
if (isHappy(n)) {
happyPrimeCount++;
happyPrimeSum+=n;
}
}
}
return happyPrimeSum;
}

private static boolean isHappy(int n) {
while(true){
int digitsSum = 0;
while(n > 0) {
digitsSum += powerOfTwo[n % 10];
n /= 10;
}
n = digitsSum;
if(willBeUnhappy[n]){
return false;
} else if(n == 1) {
return true;
}
}
}

private static boolean[] primeSieve(int n){
boolean[] sieve = new boolean[n+1];
for (int i = 2; i <= n; i++) {
sieve[i] = true;
}
for (int factor = 2; factor*factor <= n; factor++) {
if (sieve[factor]) {
for (int j = factor; factor*j <= n; j++) {
sieve[factor*j] = false;
}
}
}
return sieve;
}
}

ThePEG

Performance Engineering Guide

Tomi Tiihonen

Written by

ThePEG

ThePEG

Performance Engineering Guide

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade