The Pilot Week

The GSoC results were announced the day after my last semester exam. The timing couldn’t have been better, I would say. With no internship in hand, and an unproductive summer staring me in the face, I was holding on dearly to the hope of getting selected.

Calvin pulls back on the throttle and lurches ahead!

After a month of waiting, the calendar read the 4th of May. But the results were nowhere to be seen. You know why? Because the good ol’ men decided upon this frustrating geographical system which tries to align everyone’s clocks. So the results page said, ‘Results in India will be announced at 9:30 pm’, you just need to wait another 12 hrs. JUST 12 HOURS. JUST. Having worked on this for more than a month, I could barely wait another minute. Eventually the clock struck 9:30, the results came, and the rest is for another blog post.

I will be working on Data Retriever for the whole summer! Don’t unplug me from the matrix now.

‘There should be one — and preferably only one — obvious way to do it’ ~ Zen of Python

Getting to know the team

The week started out with a formal welcome into the Data Retriever team. Though I already knew the members of the team, it was a totally different feeling being invited by them. I have also been invited to Ethan White’s lab, ain’t it amazing?!

I will take this chance to thank my mentors, Ethan White and Henry Kironde. The way they maintain the repository along with other members is admirable. It will take something really special to impress them this summer.


Julia much!?

This is a coders-only section. As a heads-up, these sections will be there in all of the coming weekly entries.

A fresh approach to technical computing.

Julia is the new scripting language in town, started in 2009 by Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah. Its syntax is so easy it would put Python to shame. And with a JIT compiler, I bet it’s fast; faster than Python ever will be (let’s ignore the C bindings for now). It is an even better glue language. The built-in REPL also works like a charm.

Printing statements, doing calculations. Perfect.

julia> println("I'm Julia. Nice to meet you")
I'm Julia. Nice to meet you
julia> 3 - 2
1

Assigning a variable? Done.

julia> a = 34 * 2
68

A taste for LLVM code? No problemo.

julia> f(x) = x * x
f (generic function with 1 method)

julia> f(2.0)
4.0

julia> code_llvm(f, (Float64,))

define double @julia_f662(double) {
top:
  %1 = fmul double %0, %0, !dbg !3553
  ret double %1, !dbg !3553
}

In a split second, you are looking at the LLVM IR that Julia generated for f (code_native(f, (Float64,)) would show the native assembly instead). Don’t need it? No problem, just check out how this optimized code compares in speed against other languages.

Consider an arbitrary n x n x 3 matrix A. We want to perform the following operations on A, for every i and j:
A(i,j,1) = A(i,j,2)
A(i,j,3) = A(i,j,1)
A(i,j,2) = A(i,j,3)
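For instance, in Python (where indices start at 0) the code looks like:
import numpy as np

n = 5000  # matrix size; the table below uses 5000, 7000 and 9000
A = np.random.rand(n, n, 3)  # assumed setup: any n x n x 3 array will do

for i in range(n):
    for j in range(n):
        A[i, j, 0] = A[i, j, 1]
        A[i, j, 2] = A[i, j, 0]
        A[i, j, 1] = A[i, j, 2]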

The above code segment uses loops. We are also interested in how the same operations are done using vectorization.
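
As a rough illustration of what vectorization means here, this is a minimal sketch assuming NumPy is available. The function names copy_with_loops and copy_vectorized are made up for this post, not taken from the benchmark. It replaces the two nested loops with whole-slice assignments and checks that both versions give the same result on a small array.

```python
import numpy as np


def copy_with_loops(A):
    """Element-by-element version, same as the loop code above."""
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            A[i, j, 0] = A[i, j, 1]
            A[i, j, 2] = A[i, j, 0]
            A[i, j, 1] = A[i, j, 2]
    return A


def copy_vectorized(A):
    """Same three assignments, each applied to a whole n x n slice at once."""
    A[:, :, 0] = A[:, :, 1]
    A[:, :, 2] = A[:, :, 0]
    A[:, :, 1] = A[:, :, 2]
    return A


# Small array so the loop version finishes instantly.
rng = np.random.default_rng(0)
A = rng.random((50, 50, 3))

looped = copy_with_loops(A.copy())
vectorized = copy_vectorized(A.copy())
print(np.array_equal(looped, vectorized))  # True
```

Both versions end up with all three layers equal to the original middle layer; the vectorized one just hands the looping over to NumPy’s compiled code.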

+--------------+--------+--------+--------+
| Language     | n=5000 | n=7000 | n=9000 |
+--------------+--------+--------+--------+
| Python       |  19.12 |  37.49 |  61.97 |
| Python+Numba |   0.25 |   0.22 |   0.30 |
| Julia        |   0.10 |   0.22 |   0.34 |
| R            | 233.78 | 451.77 | 744.93 |
+--------------+--------+--------+--------+
Elapsed times obtained by copying a matrix using loops.
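
If you want to try this kind of measurement yourself, here is a minimal sketch of how elapsed times for the loop version could be collected in Python. It is not the script behind the numbers above; the helper time_copy and the small values of n are assumptions picked so that it runs in moments.

```python
import time

import numpy as np


def copy_with_loops(A):
    """The loop-based copy from the example above."""
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            A[i, j, 0] = A[i, j, 1]
            A[i, j, 2] = A[i, j, 0]
            A[i, j, 1] = A[i, j, 2]
    return A


def time_copy(n):
    """Return the wall-clock seconds spent copying a random n x n x 3 array."""
    A = np.random.rand(n, n, 3)
    start = time.perf_counter()
    copy_with_loops(A)
    return time.perf_counter() - start


# Much smaller sizes than in the table, purely for a quick local run.
for n in (50, 100, 200):
    print(f"n={n}: {time_copy(n):.4f} s")
```

The timings you get will depend on your machine and Python version, so treat them as a feel for the trend rather than something comparable to the table.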

This is just one example, a very basic one. But it doesn’t fail to show that simple code written in Julia would drub plain Python and R in speed.

The language shows such promise even without its 1.0 release. With its speed and beautiful syntax, Julia has my bet on it.

I will be using Julia for this project. See you around :)