From 6 hrs to 2.5 hrs to 3 mins
From the title of this post, it would be difficult to guess what I am going to talk about. Ok, so no more suspense its actually the run time for one my machine learning algorithm, which I have recently implemented over vacations around Diwali.
Recently I got interested around machine learning, artificial intelligence, intelligent web, data mining, information retrieval topics. One of the interesting problem is building recommendation systems.
Broadly there are two steps involved in building systems based on machine learning theory.
1. Learning — which is essentially building statistical model, can be called build phase.
2. Predicting — which is applying the learning of step 1 to predict in a given context. can be called apply phase.
I am going to concentrate mainly on predicting stage. Where the algorithm essentially tries to predict what customer could have bought given he/she has already bought something. So to test the quality of this algorithm I have applied (step 2) on set of 3520 user transactions and then averaging the result quality.
Now the time to process this 3520 records were initially 6 hrs. bad, very bad.
After some debugging to find out which particular step is taking most of the time. I figured out it was the reading value from the model, which was standard HashMap implementation in java. And I couldn’t realize why the read is slow in HashMap, because HashMap has O(1) best case time complexity and O(n) worst case for accessing/searching an element.
Assuming even O(n) should not be a problem, I discussed this with one of my colleague and we started thinking to utilize my dual core laptop in a more better way, so I essentially spawned two threads and shared the computation among this two threads. And guess what the time to process 3520 records reduces to 2.5 hrs.
However, getting further greedier didn’t help because when I spawned 4 threads the processing time went back close to 6 hrs. May be the over head of thread management was more than time gain by parallel. Also since dual core machine can actually run only two instructions at a time, so its not parallel for 4 thread in true sense.
Learning: Having more threads than CPU, would not help finishing task faster. But having single threads when you have multiple cores is bad as well. So try as many thread as number of available CPUs. You can use Runtime.getRuntime().availableProcessors(); to get cpu count.
Also when discussing with colleague we realized HashMap can go very bad if proper hashcode is not defined, but we never though it could be so important, so after doing multi-threading, I just wondered what could be the best approach to return good hash codes. And came across this link which has reference to “Effective Java” book by Joshua Bloach, the same person who was the key engineer in developing java collection framework, inclusing HashMap, yes you can see his name on HashMap source code.
And this time it was eureka moment for me. Following above reference, I wrote new hash code for my keys in HashMap. And the result was just unbelievable, the whole process got finished in 3 mins. Just 3 mins!!
Learning: Never ignore writing good hash code for your maps.
So now you can relate the topic with content.
I shared this findings, so that you and I always remember the importance of above two learning when it comes to performance of algorithms doing large data crunching.