(Mostly meant for Indian undergraduates who have taken a course on computer organization/architecture and want to pursue research in the field of computer architecture)
Preamble: What is computer architecture, and how do you get the basics right before jumping into research? If you want to brush up your computer architecture basics, then look out for resources. If you are not interested in computer architecture, then you may refer to the postscript of this blog :)
Real blog begins now: Since my early post-undergraduate days, I have often been asked by undergrads/potential post-grads: “What does computer architecture research entail? What does one really do?” My immediate response would be to ask them what they thought it meant. This question would typically receive mixed/confused responses, with the primary keywords being: Intel, AMD, Verilog, pipeline, caches, 8085 pin diagram, a chip, VLSI circuits, FPGA, Verilog/Bluespec code, processor, performance, and power. To be frank, some would even get away with saying, “I am from CS and computer architecture is taught in EE, so no idea what you are talking about.”
Before I get into what “computer architecture research” means, let’s take a step back and understand what “Computer Architecture” means. What is Computer Architecture? You must be familiar with definitions like “the art of, the science of, the organization of a processor…” The definition that I like the most is the following: “Computer theorists invent algorithms that solve important problems and analyze their asymptotic behavior (e.g., O(N log N) or O(N^3)). Computer architects set the constant factors for these algorithms…” If you like an explanation then please refer. I guess this definition would have cleared the air a bit, if not completely. “Architects are really good at building computer systems that work, and work well.”
With this definition of computer architecture, I would describe computer architecture research as “an effort to provide substantial data and reasoning on why a particular insight/idea/trade-off works better than others”. Still confused? Let’s take a small example and work through it as an “architect”. Let’s enter a world of computer systems where there are no hardware caches. So your latest laptop will not have caches.
At this point, you want to do architecture research (do not forget the world with no caches). Let me spell out the situation: on every memory access, the processor needs to access DRAM (primary memory) to either read or write (known as LOADs and STOREs, respectively, in the computer-architecture community), and for some reason accessing DRAM is costly (hundreds of processor cycles). So, on every read/write, the processor has to spend hundreds of cycles. The figure above illustrates the scenario.
An ideal solution would be to make these LOADs/STOREs less costly (say, one cycle :)) so that there would be almost a 100X speedup. Let us see how a computer architect would solve this problem. A computer architect will not go for a theoretical/analytical proof that LOADs/STOREs can be processed within one clock cycle. Rather, (s)he will run a set of representative programs (often known as benchmarks) and find out how important this problem is. Let me spell out “important”. If the benchmarks show that only 0.00000001% of the entire execution time is spent on LOADs/STOREs, then optimizing DRAM accesses will be a futile exercise. Why? (Well, Amdahl will not be happy about it. Who is Amdahl? Use your favorite search engine.)
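The reasoning above is exactly Amdahl’s law: the overall speedup from optimizing some fraction of the execution time is capped by that fraction itself. A quick back-of-envelope sketch in Python (the fractions and the 100X factor are the illustrative numbers from this example, nothing more):

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time is made s times faster."""
    return 1.0 / ((1.0 - f) + f / s)

# If only 0.00000001% of time is in LOADs/STOREs, a 100X memory speedup buys nothing:
print(amdahl_speedup(1e-10, 100))  # ~1.0 (no visible benefit)

# If 50% of time is in LOADs/STOREs, the same 100X speedup gives roughly 2X overall:
print(amdahl_speedup(0.5, 100))    # ~1.98
```

Notice that even an infinitely fast memory cannot push the second case beyond 2X, which is why the architect first measures how big the fraction really is.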
Assuming your search engine answered you satisfactorily, let’s make the problem more realistic by saying the benchmarks spend 50% of their time on LOADs/STOREs (reading from/writing into memory, if you forgot what LOAD/STORE is). This sounds like a real problem. The architect within you is excited. The architect starts again with a thought process of understanding the memory access (LOAD/STORE) patterns. There comes the bingo moment: the architect finds that most of the accesses show two kinds of access patterns: (i) a1, a2, a3, a4, … (ii) a1, a2, a3, a1, a2, a3, a1, a2, a3, … Here a1, a2, … are distinct memory addresses. OK, big deal. Does this ring a bell? Oh yes, spatial and temporal locality, right? Absolutely.
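To make the two patterns concrete, here is a tiny Python sketch with made-up byte addresses and an assumed 16-byte line size (purely illustrative values). Grouping addresses into fixed-size lines is what lets hardware exploit pattern (i), while remembering recently used lines exploits pattern (ii):

```python
# Two hypothetical address streams (all numbers are illustrative assumptions):
spatial  = [0x100, 0x104, 0x108, 0x10C, 0x110]  # pattern (i): consecutive nearby addresses
temporal = [0x100, 0x200, 0x300] * 3            # pattern (ii): the same addresses repeat

def to_lines(addresses, line_size=16):
    """Map byte addresses to line numbers, assuming a 16-byte line."""
    return [addr // line_size for addr in addresses]

print(to_lines(spatial))   # [16, 16, 16, 16, 17] -- one fetch of line 16 serves four accesses
print(to_lines(temporal))  # [16, 32, 48, 16, 32, 48, ...] -- keeping 3 lines around serves every repeat
```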
What next? Based on the above two access patterns, the architect needs a container that can exploit temporal and spatial locality, so that a LOAD/STORE that accesses the same address repeatedly, or subsequent nearby addresses, does not have to wait for the DRAM to respond. The architect is in a dilemma now. What should the container be? What should its size be? What should its access latency be? Can the architect use registers (thousands of them) to exploit the locality? Why not a buffer? Why not a small DRAM (small relative to the actual physical memory)? Well, your undergraduate architecture course would have talked about an SRAM alternative. So, now, the architect has multiple options, and there comes the trade-off. The trade-offs can be in the form of latency, energy, power, chip area, and whatnot.
Now, the architect has to evaluate the performance benefits of all these options. But how? The architect is a poor guy who cannot afford to build a chip (several chips in this case, one for each option). Completely impractical. So the architect decides to write software that can mimic (simulate) the hardware. Welcome to the world of simulators, the bread and butter of computer architecture research.
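To give a flavor of what such software looks like, here is a deliberately tiny sketch of one design point: a fully-associative LRU cache model in Python, with made-up parameters. A real research simulator (gem5, for instance) models pipelines, coherence, timing, and much more; this toy only reports a hit rate for a trace of addresses:

```python
from collections import OrderedDict

def simulate_cache(trace, num_lines=64, line_size=64):
    """Toy fully-associative LRU cache model (a sketch, not a real simulator).
    Returns the hit rate for a trace of byte addresses."""
    cache = OrderedDict()  # line number -> None, ordered from least to most recently used
    hits = 0
    for addr in trace:
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = None             # fill the new line
    return hits / len(trace)

# A temporally local trace: the same three lines accessed over and over.
trace = [0x0, 0x40, 0x80] * 100
print(simulate_cache(trace))  # 0.99 -- only the first three accesses miss
```

Swapping in a different replacement policy, size, or latency model and re-running the same traces is precisely how the architect compares the options without taping out a single chip.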
Finally, the architect simulates all the options and decides to go for an SRAM-based container (why? why? why? brush up your computer architecture course again) that you all know as a cache. So for an architect, a cache is just a speculation technique that speculates based on spatial and temporal locality, and helps in bridging the latency gap between the processor and the DRAM. Bridging this gap fully is still an open problem in the architecture community, and it would be awesome to have a memory hierarchy with a 100% cache hit rate.
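How much of the gap does the cache actually bridge? The standard yardstick is average memory access time (AMAT) = hit time + miss rate × miss penalty. A sketch with assumed numbers (a 1-cycle cache hit and a 200-cycle DRAM access; real latencies vary by design):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty

print(amat(1, 1.0, 200))   # 201.0 -- every access misses (roughly the cache-less world)
print(amat(1, 0.05, 200))  # 11.0  -- a respectable 95% hit rate
print(amat(1, 0.0, 200))   # 1.0   -- the dream: 100% hit rate
```

Even at a 95% hit rate, the average access is an order of magnitude faster than the cache-less world, yet still far from the one-cycle dream, which is why the hunt for better hierarchies continues.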
In conclusion, an architect takes a problem that is worth solving, performs experiments to confirm that it is actually worth solving, finds insights that can lead to a solution, simulates different forms of ideas/insights/designs and evaluates their trade-offs, and finally quantifies the effect of the proposed solution.
Stay tuned for Computer Architecture Research-201, or maybe 001. I am planning to post a series of blogs related to computer architecture research. Please do suggest themes for the next blog if you have any. This blog would have been impossible without suggestions from my students, colleagues, and friends.
Postscript: I forgot to mention the job prospects. OK, here is the list (not exhaustive though): Intel, AMD, ARM, IBM, Qualcomm, Apple, NVIDIA, Microsoft, Google, Facebook (yes, Microsoft, Google, and Facebook too), and many more. In fact, one of my students joined Tower Research as a “low-latency software developer” because of his aptitude in computer architecture.