Layers of Abstraction

Too much of a good thing is bad…

What is Abstraction?

In my university studies, I am currently learning about distributed systems. In the tutorials, we have been implementing a toy multi-tiered application that has:

  • A data tier
  • A business tier
  • A GUI (presentation tier)

Each of these tiers can run in a different place: the data tier may run on one server machine, and the business tier on another. The GUI may run on a client machine or a mobile phone, or it may be HTML and JavaScript running in a web browser.

At a lower level, in networking, we also talk about layers of abstraction. The Data Link Layer talks to the Internet Layer, which talks to the Transport Layer, which talks to the Application Layer. Each layer “abstracts” the gritty details away from the layer above it: it handles its own details so the layer above doesn’t need to worry about them.

These are forms of abstraction. We are creating layers in the program, in order to hide the details from the callers. There are several reasons why we want to do this:

  • Re-use: If we are going to use something multiple times, it is better to wrap it in some ‘container’ (a function, class or module) so we can use it again.
  • Working in teams: In large-scale programs, it may not be possible for every developer to know all parts of the system. In this case, you want to abstract away as many of the details as possible, to make it easy for another team member to use your work.
  • Intellectual property: If you are selling the code you write, you probably want to make it as abstract as possible, to hide the details that make you money from competitors.
  • Maintainability: If the code were just a jumbled mess, it would be hard to figure out where things were. If there’s a bug, or we want to change or add a feature, we need to be able to easily find where to start.
  • Security: Oftentimes there are checks that need to happen every time some action occurs. Putting the action behind an abstraction lets us ensure the checks cannot be skipped.
  • Easy to change: If other code doesn’t know about the details of our implementation, we can change the code to do things in a different, better way, and the calling code won’t break (unless it does).
  • I’m sure there are more… there’s a reason this is a much-loved concept.
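To make the “Security” and “Easy to change” points a little more concrete, here is a minimal C++ sketch. The Account class and its members are names I made up purely for illustration; the point is only that the balance can be changed through one function, so the check in that function always runs.

```cpp
#include <stdexcept>

// Hypothetical example: the balance can only be changed through withdraw(),
// so no caller can skip the overdraft check.
class Account {
public:
    explicit Account(long cents) : balance_cents_(cents) {}

    void withdraw(long cents) {
        if (cents <= 0) throw std::invalid_argument("amount must be positive");
        if (cents > balance_cents_) throw std::runtime_error("insufficient funds");
        balance_cents_ -= cents;
    }

    long balance_cents() const { return balance_cents_; }

private:
    long balance_cents_;  // hidden representation, free to change later
};
```

Because callers only ever see withdraw() and balance_cents(), the private field could later be swapped for something else (a transaction log, say) without touching the calling code.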

We use this all the time in computing, and it is taught from the very beginning of any decent computing course: from the simple procedure/function/method, to classes and modules, to Model-View-Controller, to multi-tiered distributed systems. Abstraction is a fundamental concept of software engineering.

Your subtitle implies there’s a problem…

Performance Concerns

Here’s one issue with too much abstraction I hear a lot: “All this abstraction is clogging up my system!” It’s well known that there are some use-cases where you avoid piling on abstraction. Video games are one. It is important to reach that smooth 30 or 60 fps, or your customers will cry foul. Another example is Big Data. Using abstractions that bloat your system and add a couple of milliseconds to each operation may seem harmless at first, but if you’re iterating over a lot of data, those milliseconds add up! For instance, 2 ms of overhead per record across a million records is more than half an hour of extra run time.

An issue I see in many modern apps is that we seem to have accepted bloated, slow-ish software. In Windows 10, half the time, I tend to go for the old-style applications, rather than their “Modern Apps”. They seem to be quite slow at times. You open the program, and whereas the old version loads up pretty much instantly, the “new and shiny” one takes 5–10 seconds or so to load up before you can start using it.

I don’t think this is a matter of “They used C#. Garbage Collector, lolz!” C# and Java can be plenty fast enough for a GUI application. What I suspect may be the issue is too much abstraction.

There are certain costs that come with adding various types of abstraction. Often, when developers generalize code, they put checks in to ensure that the parameters being passed into their class/method/whatever are valid. When you add layer upon layer inside your application, you perform multiple checks, some of which are redundant in your specific use-case, and the cost adds up as you slap on more layers. There’s also the matter of things going on inside these abstractions that the caller is not aware of or did not intend. See the next section for more on this.
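Here is a small C++ sketch of what I mean by redundant checks. The three function names and the checks_performed counter are hypothetical, invented only to make the repetition visible; real layers would do far more than this.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical three-layer stack. Each layer defensively re-validates the
// same index because it cannot assume its caller already did.
int checks_performed = 0;  // exists only to make the repetition visible

int data_layer_get(const std::vector<int>& rows, std::size_t i) {
    ++checks_performed;
    if (i >= rows.size()) throw std::out_of_range("data layer: bad index");
    return rows[i];
}

int business_layer_get(const std::vector<int>& rows, std::size_t i) {
    ++checks_performed;
    if (i >= rows.size()) throw std::out_of_range("business layer: bad index");
    return data_layer_get(rows, i);
}

int presentation_layer_get(const std::vector<int>& rows, std::size_t i) {
    ++checks_performed;
    if (i >= rows.size()) throw std::out_of_range("presentation layer: bad index");
    return business_layer_get(rows, i);
}
```

A single call to presentation_layer_get performs the same bounds check three times. Each check is sensible when you look at one layer on its own; it is only the stack as a whole that is wasteful.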

Everything has a cost. The way inheritance is typically implemented in C++ has one, and as far as I know Java and C# use broadly similar mechanisms. Virtual functions are usually implemented with a table of function pointers (the vtable). Whenever you call a virtual function, the code must first follow a hidden pointer in the object to its class’s vtable, load the function’s address from a slot in that table, and then jump to that address. That extra indirection costs a little time, and it can also stop the compiler from inlining the call. The cost may be acceptable, but you must be aware of it.
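The dispatch steps can be spelled out by hand. The sketch below puts an ordinary virtual class next to a hand-written imitation of what a typical compiler generates. The Manual* names and kSquareVTable are my own inventions, and real compilers differ in the details (the C++ standard does not require a vtable at all), so treat this as a rough model rather than a description of any particular compiler.

```cpp
// What the programmer writes.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

// Roughly the same thing written by hand (hypothetical names).
struct ManualShape;
struct ManualVTable {
    double (*area)(const ManualShape*);  // one slot per virtual function
};
struct ManualShape {
    const ManualVTable* vptr;  // the hidden pointer stored in every object
    double side;
};

double manual_square_area(const ManualShape* s) { return s->side * s->side; }
const ManualVTable kSquareVTable = {&manual_square_area};

// Virtual call: the compiler emits the pointer-chasing for us.
double call_area(const Shape& s) { return s.area(); }

// Manual call: follow vptr, load the slot, make an indirect call.
double call_area(const ManualShape& s) { return s.vptr->area(&s); }
```

Both versions of call_area do the same amount of pointer-chasing; the virtual keyword just hides it from view.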

Keep your code simple if you want speed. Overly generalized, over-engineered code is often a sign of slow code.

And don’t say that speed isn’t important. If customers see a program that responds faster and more fluidly to their interactions, and gets in their way as little as possible, they are likely to choose it over one that takes a little longer to respond. I’m not saying we should all abandon high-level languages to write everything in procedural C or assembler; I’m saying that we need to be aware that our choices have costs, and make sure we are making the right choices for our domain… and make sure we are not under-estimating the cost.

Not knowing what’s going on underneath

“But isn’t that the point of abstraction and encapsulation?” I hear you ask. Yes! As I mentioned in the first section, this is kind of a good thing… but it’s also a bad thing.

In my class on distributed systems, we are using .NET to make things easy for us as we create our toy client-business-data multi-tiered program. One of the things our lecturer has been stressing is that, behind the scenes, every call between tiers goes over the network. It goes down through the transport layer, to the data link layer, through the network, to the client, possibly over long distances, taking a long time and using up bandwidth. When we write distributed code, we need to keep in mind the various costs that come with running it.

Whilst I was reading about the recent left-pad blowout in the JavaScript world, I saw someone point out that left-pad was actually kind of inefficient. People were blindly using this code in production, because they decided they should use someone else’s abstraction, and didn’t bother looking at the details (even though it was open source, so they easily could have). They just trusted that whoever created that package knew what they were doing.

In C++, std::string is mutable, but each string owns its own buffer, which lives on the heap unless the string is short enough for the small-string optimization. Copying a string, passing one by value, or implicitly converting a const char* into a std::string can each allocate space for a new string behind the scenes. If one is not careful, one can easily perform loads of allocations and de-allocations without even really realizing it. A couple of years back, it was reported that something like this was happening in Google Chrome.
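Here is a minimal sketch of how such hidden allocations can sneak in. It assumes C++17 (for std::string_view), and the function names and the URL prefix are made up for illustration. I have not measured anything here; whether a given copy really allocates depends on the standard library and on the string’s length.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Takes std::string by value: every call copies its arguments, and copying a
// string too long for the small-string optimization allocates a heap buffer.
bool has_prefix_by_value(std::string text, std::string prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}

// Takes std::string_view: just a pointer and a length, so nothing is copied.
bool has_prefix_by_view(std::string_view text, std::string_view prefix) {
    return text.substr(0, prefix.size()) == prefix;
}

// Hypothetical hot loop: both branches give the same answer, but the by-value
// branch also builds a fresh std::string from the literal on every iteration.
std::size_t count_with_prefix(const std::vector<std::string>& lines, bool use_view) {
    std::size_t n = 0;
    for (const std::string& line : lines) {
        if (use_view ? has_prefix_by_view(line, "https://example.com/a/long/prefix/")
                     : has_prefix_by_value(line, "https://example.com/a/long/prefix/"))
            ++n;
    }
    return n;
}
```

Nothing at the call site hints at the difference: both versions read the same and return the same result. You only see the extra copies if you look at the function signatures or run a profiler.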

Not having to think about the goings on of every part of the code is a blessing, but it can also be a curse.

It can lead to code bloat

Last year, I embarked on a year-long group project at university, written in Java. Because we were relatively inexperienced, we made several mistakes along the way. It was quite an eye-opener into the problems of developing a project of somewhat larger scale!

As we sat down and thought about our problem, we naively started writing all these abstractions, and programming all sorts of things that weren’t necessary. Before we knew it, we were experiencing code bloat!

I remember several times throughout the semester looking through other sections of the code, some of which I’d written, and having little idea what on Earth was going on. It was difficult to reason about the project. We had to jump all over the place, and some things were not clear.

That experience shattered my faith in object-oriented programming. Since then, I’ve come to forgive it a little (after all, it is an industry standard, and in all likelihood I’m going to need to be able to use it if I want a job in the industry), but I still distrust it, and am not fond of the dogma that often comes with it.

Premature Optimization is often quoted as being the root of all evil. Premature Abstraction is just as bad.

You aren’t going to need it!
Keep it simple, stupid!

Are you done with your rant yet?

In Conclusion:

Abstraction is a good thing, but too much of a good thing can be bad. Rigid following of a set of “best practices” can sometimes lead to pain. Of course, not having a set of standards is even worse, but like many things, I believe in a balance.

Abstract away as much as is helpful, but every time you add another layer, think:

“Is this really necessary?”

“What are the costs?”

“What are the benefits?”

“Is it worth the problems it will cause?”

“Is there anything I can do to lessen the problems it will cause?”

I think moderation, and a willingness to compromise when necessary, are what is called for.

Disclaimer:

This is my first blog post, and I am inexperienced at writing these. If I have made any mistakes, or not framed my thoughts very well, I humbly request you go easy on me, and perhaps give me some constructive feedback.

I will also freely admit that I am writing from the point of view of a student, who has little experience out in the real world. All I know is what I have learned at University, my own meager experience, and what I have read and listened to on the Internet.

My views will mature as I gain more experience, and I suspect that learning “the right way to program” will be a life-long journey of mine. I look forward to the learning experience.