The David vs. Goliath Story of Parallel Computing

Jonathan Beard
Published in cat /dev/urandom
Mar 17, 2016

Have you ever looked at a wind turbine, a car piston, or a jet turbine blade and wondered how it was made? How about a racing yacht? Perhaps you’ve wondered about the shapes of the most fuel-efficient vehicles. If you’ve had any of these thoughts, you probably got as far as realizing that the part was designed on a computer, with someone using a computer-aided drafting program to lay out the shape that performs the function. There is another step, though: simulating the design in order to optimize it.

What would happen if Company A had the funds to build a simulation of a product that removes excess material and strengthens it only where it absolutely must be? The result is a product that is extremely lightweight, yet still extremely strong. Company B makes a similar product and can afford large amounts of compute power from cloud service providers, yet it can’t afford to pay to have a simulation built. Company B produces an inferior product, despite perhaps having the best initial design. Company A simply has more funding and can purchase the talent needed to write parallel simulation code. (See, for example, scaling performance for molecular dynamics simulation.)

What would happen if Company B decided to go ahead and simulate its designs with code written by your average programmer (something sequential, perhaps in Ruby, Python, or Java)? Company B would eventually produce the same product as Company A, but it would take orders of magnitude longer. Company B would still lose. Why? Both companies have access to essentially the same hardware through cloud service providers, who offer top-notch hardware at quite affordable rates. What hasn’t come down in cost is the difficulty and technical skill needed to build highly parallel, performant software.
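To make the contrast concrete, here is a minimal, entirely hypothetical C++ sketch: the same toy update kernel written sequentially, then parallelized with OpenMP. The arithmetic is made up and stands in for real physics; the point is only that the parallel version drags in concerns (compiler flags, thread scheduling, data races) that the sequential version never has.

```
// Hypothetical sketch: the same kernel, sequential vs. OpenMP-parallel.
// The update formula is a stand-in, not real simulation physics.
#include <cstddef>
#include <vector>

// What "your average programmer" writes first: a plain sequential loop.
void update_sequential(std::vector<double>& field, double dt)
{
    for (std::size_t i = 0; i < field.size(); ++i)
    {
        field[i] += dt * field[i] * (1.0 - field[i]);
    }
}

// Same arithmetic, but now performance depends on compiling with -fopenmp,
// on how threads are scheduled, and on the programmer having verified that
// iterations are independent (no data races).
void update_parallel(std::vector<double>& field, double dt)
{
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < field.size(); ++i)
    {
        field[i] += dt * field[i] * (1.0 - field[i]);
    }
}
```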

Why is it so hard to write parallel code? In short, it is a dark art. It is hard because of the kinds of interfaces that have been built and the hardware details that must be taken into account to make an application perform. Often each type of hardware has its own proprietary programming model, or (perhaps worse) the same generic code must be rewritten (new loop bounds, algorithms, cache blocking) for each hardware set. Worse still, top programmers must know how the hardware works (e.g., how locks are implemented, how atomic operations work, often details of processor coherence protocols, and even how instructions are scheduled) to edge out the most performance. In many cases the details they need are unknown, or are only available with experience or insider knowledge; the most famous case is the undocumented interrupt for Minix, and I can assure you there are many more of these undocumented “features.”

This is what big companies do when building the leading edge of their product lines: they hire people who can code at this level. Those people come at a price, though. When efficiency of the product itself draws a high premium, it is often profitable to spend on those who can write high-performance code. Where does this leave the small companies? Current US DOE estimates put HPC code at approximately $100 per line (oddly, I’ve seen EU estimates at €100; this seems to be a nice round figure). That is hugely expensive, and out of reach for most small companies given that many simulations are thousands of lines long.
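As a small illustration of the “rewrite per hardware set” problem, here is a hypothetical cache-blocked matrix multiply in C++. The BLOCK constant is an assumption tuned to one imaginary machine’s cache; on different hardware the “right” value, and sometimes the loop structure itself, changes, so the code gets rewritten rather than merely recompiled.

```
// Hypothetical cache-blocked matrix multiply (C = A * B, row-major, n x n).
// Assumes C has been zero-initialized by the caller.
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumption: this magic number was tuned for one particular L1/L2 cache.
// A different CPU wants a different value, hence the per-hardware rewrite.
constexpr std::size_t BLOCK = 64;

void blocked_gemm(const std::vector<double>& A,
                  const std::vector<double>& B,
                  std::vector<double>& C,
                  std::size_t n)
{
    for (std::size_t ii = 0; ii < n; ii += BLOCK)
        for (std::size_t kk = 0; kk < n; kk += BLOCK)
            for (std::size_t jj = 0; jj < n; jj += BLOCK)
                // These loop bounds are exactly the part that keeps changing.
                for (std::size_t i = ii; i < std::min(ii + BLOCK, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BLOCK, n); ++k)
                        for (std::size_t j = jj; j < std::min(jj + BLOCK, n); ++j)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```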

Reducing the barriers to building high-performance code, and parallel code in general, is hugely important to the economics of the world (hopefully I can find time to quantify that statement in a future post). I realize we’re just getting to the point where we’re focusing on teaching people to code at all, but we need to be looking at the next step. Reducing the barrier to entry for parallel programming is critical not only to the competitive advantage of start-ups and small businesses, but also to continuing the technological growth that many of these companies bring to the table. We’ve made big-data storage quite cheap. Getting that data is relatively easy in many cases as well (via web scraping, or even purchasing in bulk), but processing that data on an actionable time scale is the differentiator. In my opinion we have plenty of cores (and we’re only going to get more). We have lots of cheap computation. It is, however, largely unused by the people who could use it most, because of the cost of actually writing the programs to use it.

The next big challenge, in my opinion, is reducing the cost of entry to coding performant parallel systems. Part of the solution is treating programming languages as a user interface. Languages must be as intuitive as our best graphical user interfaces. How much time did Apple spend optimizing their GUI? My guess is quite a bit. The same effort needs to go into the design of the next generation of programming tools. Programming tools shouldn’t be designed for the best programmers; they should be made more accessible. Languages should be easy to understand and truly abstracted from the hardware on which they run. Will we get perfect hardware utilization with this abstraction? Probably not. The goal shouldn’t be perfection; several NP-hard problems would have to be solved in polynomial time to do so efficiently (look up P != NP to get the subtle joke). The next focus should be on automating the partitioning, scheduling, and distribution of processing across distributed systems. Much work has been done in this area, but again, it is largely a manual art. One day, a programmer will be able to write an application in a single language, without hardware-specific code, and it will execute on all the hardware available to it. That day is probably far in the future (although things like OpenCL come close, they’re definitely not intuitive to your average programmer, nor do they optimize globally).
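A hedged hint of the kind of abstraction I mean (not the fully automatic, globally optimizing future described above): C++17’s standard parallel algorithms let a programmer request parallelism without naming threads, cores, or vendors, and the runtime decides how to partition and schedule the work.

```
// C++17 parallel algorithm sketch: the programmer only states that the
// operation may run out of order; no hardware-specific code appears.
// (Depending on the toolchain, this may need a backing threading library
// such as TBB at link time.)
#include <algorithm>
#include <execution>
#include <vector>

void scale(std::vector<double>& data, double factor)
{
    std::transform(std::execution::par, data.begin(), data.end(), data.begin(),
                   [factor](double x) { return x * factor; });
}
```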
