Abstraction: the most ignored computer science concept

Lane Holloway
Apr 6, 2016


Ask a developer what they think the most important concept in computer science is and you’ll get a plethora of answers: algorithms, data structures, regexes, and other implementation-level concepts. You rarely hear someone answer abstraction.

Abstractions at the Language Level

Programming languages fall into roughly three buckets:

  1. tell the machine exactly what to do
  2. tell the machine how to do it
  3. tell the machine what you want done

Telling the machine exactly what to do is programming in assembly language. Move bits from here to there, now add them, now store the result back in this register.

Telling the machine how to do it falls into the realm of C/C++ and Java. You aren’t twiddling bits; you’re instructing the machine to loop over a collection using an index variable and perform an operation on each piece of data in the collection.
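Here's a minimal sketch of that "how" style. It's written in Python for brevity, even though the paragraph above names C/C++ and Java; the function name `double_all` and the doubling operation are made up purely for illustration.

```python
def double_all(values):
    """Double every element, spelling out each step of the iteration."""
    result = []
    i = 0                               # explicit index variable
    while i < len(values):              # explicit loop bound
        result.append(values[i] * 2)    # the operation applied to each element
        i += 1                          # explicit step to the next element
    return result


print(double_all([1, 2, 3]))  # prints [2, 4, 6]
```

Notice how much of the code is bookkeeping (the index, the bound, the increment) rather than the operation itself.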

Telling the machine what you want done falls into the realm of Haskell, F#, and Java’s Streams (to an extent; I know the comparison isn’t exactly right). Here you write programs that say: given this collection, filter the items so only particular ones remain, then transform those items into a different format, then create a histogram of the number of times some value appears.
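The filter, transform, histogram pipeline described above might look something like this. It's a sketch in Python rather than Haskell or F#, and the function name `word_length_histogram` and the choice of "alphabetic words" as the filter are assumptions picked only to have something concrete.

```python
from collections import Counter


def word_length_histogram(words):
    """Count how often each word length occurs among the alphabetic words."""
    alphabetic = filter(str.isalpha, words)   # keep only particular items
    lengths = map(len, alphabetic)            # transform items into a different format
    return Counter(lengths)                   # histogram of how often each value appears


histogram = word_length_histogram(["to", "be", "or", "not", "42", "be"])
print(histogram[2], histogram[3])  # prints 4 1
```

There is no index variable and no loop bound anywhere; the code states what is wanted and leaves the iteration to the language.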

The higher the level of abstraction, the less involved you are as a programmer in twiddling bits at the electron level, and the more involved you are in actually getting work done(tm).

Abstractions at the Design Level

Design, in this case, covers everything from method to class to component to system to multiple systems. At each point, you need to understand the level of abstraction at which you’re working.

At the method level, this is as simple as figuring out what problem you are actually solving and choosing the appropriate data structures and algorithms.

Following the hierarchy upwards, we find ourselves at the class / object / related functions and methods level. Here we’re trying to coordinate a set of operations that make sense together. At this point we’re more concerned with larger operations, such as persisting a set of data to a datastore or submitting work to a job queue. We aren’t as concerned with how we’re going to do it, only that we’re going to do it. However, the algorithm we use to do this work and the data we send and receive still matter.
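To make the "only that we're going to do it" idea concrete, here's a rough sketch of the datastore case. Everything named here (`Datastore`, `InMemoryDatastore`, `register_user`) is hypothetical and exists only to illustrate the split between what the caller asks for and how it gets done.

```python
from abc import ABC, abstractmethod


class Datastore(ABC):
    """Hypothetical interface: callers say *that* data is persisted, not how."""

    @abstractmethod
    def persist(self, key, record):
        ...

    @abstractmethod
    def load(self, key):
        ...


class InMemoryDatastore(Datastore):
    """One possible implementation; a file- or database-backed one could replace it."""

    def __init__(self):
        self._records = {}

    def persist(self, key, record):
        # Store a copy so later changes by the caller do not leak into the store.
        self._records[key] = dict(record)

    def load(self, key):
        return dict(self._records[key])


def register_user(store, user_id, name):
    """Class-level concern: coordinate the operation without knowing storage details."""
    store.persist(user_id, {"name": name})


store = InMemoryDatastore()
register_user(store, "u1", "Ada")
print(store.load("u1"))  # prints {'name': 'Ada'}
```

`register_user` never learns how the record is stored, but the shape of the data it sends (a key and a small record) is still a decision made at this level.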

Next comes coordinating multiple classes / objects / related functions: how do all of these work together? At this point, I think you’ll see I’m beating a dead horse: at each higher level the same thing is done. Figure out the real problem, determine the correct data structures and algorithms, then implement (or tell people how to implement).

Learning How to Handle Abstractions

Having talked about abstractions and how we don’t really think about them, you’re probably asking yourself: “Self, how do I apply abstractions so I can become a kick-ass take-no-prisoners developer / architect / demi-god?” The answer: practice.

No matter what level of developer you are, all it takes is sitting down and thinking about the problem. As an example problem, think about a simple rsync case. You want to keep directories on two different machines up-to-date with one another (copying and overwriting files/directories only). What does this mean? First, let's write some assumptions:

  1. When the operation is complete, all files and directories exist on both machines and are the same.
  2. If a file or directory on the source does not exist on the target, it will be copied or created (in the case of directories) on the target, and vice versa.
  3. If a file exists on both machines, the newest file wins the conflict.

These three assumptions define how we expect the sync to behave at a high level.
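One way to pin those three assumptions down is to write them as a tiny function over a made-up model. In this sketch each machine is assumed to be just a dict mapping a relative path to a modification time; directories are left out to keep it short, and `sync_result` is a hypothetical name.

```python
def sync_result(source, target):
    """Return the state both machines should share once syncing is complete.

    Each machine is modelled (as an assumption) as a dict mapping a
    relative path to that file's modification time.
    """
    merged = dict(source)
    for path, mtime in target.items():
        # Assumption 2: anything missing on one side is copied over.
        # Assumption 3: when both sides have the file, the newest wins.
        if path not in merged or mtime > merged[path]:
            merged[path] = mtime
    # Assumption 1: both machines end up with this same state.
    return merged


source = {"notes.txt": 100, "todo.txt": 250}
target = {"notes.txt": 180, "pics/cat.png": 90}
print(sync_result(source, target) == sync_result(target, source))  # prints True
```

Nothing here says how bytes move between machines. It only describes the end state, which is exactly the level these assumptions live at.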

At the next level down, we want to look at how we would accomplish this on a single machine. Again, writing down some assumptions:

  1. The machine has a list of all directories and files on the other machine along with metadata to determine the differences between files.
  2. A machine is defined as the source machine and another the target.
  3. A list of files to receive and send is created.
  4. A list of directories to create is created.
  5. Files are copied appropriately.
  6. Directories are created appropriately.

Now, going down one more level, we’re looking at each bullet point above and figuring out the algorithms needed to solve each of them. At this point I’m going to step aside and call it “left as an exercise to the reader” so you, dear reader, can get some practice. :)
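If you'd like a nudge before trying it yourself, here's one possible starting point covering only the two "build a list" bullets from the single-machine assumptions. The `Listing` type and `plan_sync` function are hypothetical, and comparing modification times is just one assumed way to decide which file is newest.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical snapshot of one machine: directories plus file modification times."""
    dirs: set = field(default_factory=set)
    files: dict = field(default_factory=dict)   # relative path -> modification time


def plan_sync(local, remote):
    """Work out, from the local machine's point of view, what to send, receive, and create."""
    # Send files the remote lacks or holds an older copy of.
    to_send = sorted(
        path for path, mtime in local.files.items()
        if path not in remote.files or mtime > remote.files[path]
    )
    # Receive files we lack or hold an older copy of.
    to_receive = sorted(
        path for path, mtime in remote.files.items()
        if path not in local.files or mtime > local.files[path]
    )
    # Directories that exist remotely but are missing locally.
    dirs_to_create = sorted(remote.dirs - local.dirs)
    return to_send, to_receive, dirs_to_create


local = Listing(dirs={"docs"}, files={"docs/a.txt": 10, "b.txt": 30})
remote = Listing(dirs={"docs", "pics"}, files={"docs/a.txt": 20, "pics/c.png": 5})
print(plan_sync(local, remote))
```

Files with identical modification times land in neither list, which is a choice you may want to revisit. Actually copying the files and creating the directories is still yours to work out.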
