A Dive Down the Levels of Abstraction

Ishan Bhanuka
Jul 13, 2018


As a Computer Science undergraduate, I spend most of my time in the first four levels of this stack. This leaves me with a lot of unanswered questions about how things work the way they do. This summer I got the opportunity to dive deeper, peel off the layers of abstraction and learn a bit about the technology behind the technology we use today.

Let’s take an essential computer part.

Can you guess what this is?

Does the structure seem familiar? Does the hierarchical model with banks, rows and columns remind you of something?

This is how memory is arranged in RAM (DRAM, to be more precise). The green box is the “Memory Controller” and the pink box is the structure of a single “DRAM Bank”, many of which together make up the gigabytes of memory we use.

This perspective makes me wonder at the enormous amount of technological innovation that has gone into computer science! A single line of code, printf(“Hello world”);, goes through a complex sequence of operations before the letters appear on the screen. But it’s not all sparkle and dazzle: all this complexity is very practical, which I’ll illustrate with an example.

The above diagram shows that memory is stored in rows as contiguous chunks. Caching exploits this property: the diagram below shows the layers of cache that store data loaded from main memory to allow faster access.

Memory hierarchy (in terms of clock cycles)

In line with the principle of spatial locality, caches store contiguous blocks of memory fetched from RAM. Matrix multiplication is a computation where this property of caches can be exploited extensively to increase performance. Take these two methods for multiplying two NxN matrices A and B.

void naive_method() {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            for (int k = 0; k < N; ++k) {
                naive[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

void locality_aware_method() {
    /* Only the j and k loops are swapped relative to naive_method(). */
    for (int i = 0; i < N; ++i) {
        for (int k = 0; k < N; ++k) {
            for (int j = 0; j < N; ++j) {
                locality[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

The change in the position of j and k in the nested loops makes all the difference. This small change turns the inner loop into a row-major traversal of matrix B: all the elements of a row of B are iterated over before moving to the next row.

The improvement in performance is enormous, because iterating over a row is much faster. The values of a row form a contiguous segment of memory and are stored as a block in the Level 1 (L1) or Level 2 (L2) cache. This makes memory access much faster than in the naive method, where each value of a column has to be fetched individually from main memory (RAM). The results for the multiplication of two 1024x1024 matrices:

Matrix Initialization Time: 0.016488
Naive Matrix Multiplication Time: 21.023548
Locality-aware Matrix Multiplication Time: 3.429461
Speedup compared with the naive method: 6.13x

At CASS 2018, I learnt these and many more microarchitecture and computer system architecture design principles that have made possible the computing power we use on a daily basis. It was an intense 5-day learning experience that introduced advanced topics like branch predictors, cache prefetching, multi-core systems and GPU architecture. Through lectures, hands-on sessions and talks on the industry scope of this field, I was made aware of this complex and exciting area of computer science. I must thank all the lecturers at CASS for this amazing experience, and especially Biswabandan Panda for being an awesome host!
