Reliable apps with (HA)Proxy — Intro

I’ve been away from web application development for quite a few years.

To be honest, it has been a pleasant experience: no “annoying” customer requests, no endless changes and rewrites, only important stuff (TM).

Just kidding, of course, but systems programming is indeed different. One of the defining features of such development is that your app is just supposed to work, never stop working, recover automatically from failures, and log all interesting problems on its own. Fancy that: it would be a good idea to have that everywhere, don’t you think?

Intro

I’m not a front-end guy by any means, so we’re going to limit ourselves to the main course/foundation, a typical three-tier app, maybe just with a few sprinkles of micro-services, REST APIs, etc. Don’t be afraid of the acronyms — the bunch mentioned above just means that our application talks HTTP, the lingua franca of the modern Internet, and occasionally speaks with a database or other services/applications (again over HTTP).

Language & framework

We are spoiled for choice here. I don’t think you can make a wrong selection.

My first pick would be Python, a fairly popular and approachable language. It’s not the fastest gunslinger in the West, but it gets the job done and is fairly easy on the eyes and the brain.

Other popular languages are just fine; there are plenty of developers for JavaScript, Go, Ruby, Java, C#, and modern PHP (did I tell you a joke about senior PHP developers?). Just make sure you and your team know the language intimately, and that it is a good fit for the problem domain.

I have a confession to make. When I said Python is my first choice I lied, don’t tell anyone. These days I try to use Rust for almost everything. Pain and suffering are never out of style, and there seem to be many people like me, with Stockholm syndrome, who love Rust. It’s a mad world indeed.

Full metal jacket development is not for the faint of heart, but I’d still suggest that everybody try weird or exotic languages. Programming with Rust, Lisp/Clojure, Haskell, or Prolog really changes your perspective and mutates the way you program in your favorite language.

You ask me about frameworks? Find the 3 most popular for your language, pick any, and life will be good.

Data(base) layer

Most useful applications need to store data somewhere. Sometimes even plain files are enough, but as soon as more than one process or thread touches them at the same time, reading, writing, and locking files becomes tedious.
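To give a taste of that tedium, here is a minimal sketch of appending a record to a shared file under an exclusive advisory lock. It assumes a Unix-like system (the `fcntl` module is not available on Windows), and the `append_record` helper and the file name are made up for illustration. Note that advisory locks only help if every writer plays by the same rules.

```python
import fcntl
import os
import tempfile


def append_record(path: str, line: str) -> None:
    """Append one line while holding an exclusive advisory lock (Unix only)."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another process holds the lock
        try:
            fh.write(line + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before unlocking
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


# Usage: two appends to a throwaway file (hypothetical name).
demo_path = os.path.join(tempfile.mkdtemp(), "records.log")
append_record(demo_path, "first")
append_record(demo_path, "second")
```

And this only covers appends; safe updates, partial reads, and crash recovery are each a separate chore, which is roughly where a database starts to look attractive.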

So you go ahead and select a classical RDBMS, a modern NoSQL database, a document database, or some vendor/cloud-specific product made to resemble one of the above.

And now you’ve made your first mistake. You see, while programming language selection seems important, it is actually data that is the king. Successful applications/projects store so much data, and that information is so important, that it trumps everything else. You will be held hostage by the data.

In addition to that, there is an intrinsic connection between the data and the way you need to program your app. I hope that’s not just my imagination.

Unless I know something very specific about the way my app is going to work, I’d choose a classical RDBMS, namely PostgreSQL. It is a lovely database system with a long history and good street cred (it is more than 25 years old, and it has been steadily gaining popularity over the last decade).

If I need to use a hosted service, I’d pick a PostgreSQL-based or PostgreSQL-compatible one, since it should be easy to move the data to another PostgreSQL system.

Caching layer

Computers and programs are made of parts/layers with different speeds. Aside from buying faster computers and coding faster algorithms (good luck with that), there is only one thing you can do: cheat! Avoid doing the work, or cache the results. Keep the handy table under your pillow.

Is a cache really a necessary part of a reliable app? Absolutely! You don’t have to use it right away, but you should never miss the opportunity to speed things up with a little cache. It’s not that hard. You see, one closely guarded secret is that when your program is slow or under stress, it becomes unreliable. There are many examples of fine applications that became trash just because they were not fast enough and could not keep up or scale (horizontally or vertically). As your application grows, performance can become an important feature!

There are actually many program/app layers where caching is possible: lookup tables, function memoization tricks, private in-memory caches, Redis/Memcached, HTTP caching (Nginx), REST API micro-caching (Nginx, HAProxy, Envoy), etc.
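To make the "private in-memory cache" idea concrete, here is a small sketch of a cache whose entries expire after a fixed time, which is also roughly the idea behind a micro cache. The `TTLCache` class and its `get_or_compute` method are my own illustrative names, not part of any library, and a real app would also want size limits and thread safety.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable, so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: skip the expensive work
        value = compute()  # miss or stale entry: do the work once
        self._store[key] = (now, value)
        return value


# Usage: the second call within one second reuses the stored value.
cache = TTLCache(ttl=1.0)
greeting = cache.get_or_compute("greeting", lambda: "hello")
greeting_again = cache.get_or_compute("greeting", lambda: "never computed")
```

For pure functions, the standard library's `functools.lru_cache` decorator gives you memoization without writing any of this.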

Proxy layer

You already knew about the previous ingredients, but this is probably something new. Maybe you are not aware of it, but if you deploy your app to one of the popular clouds, you are already using a proxy in front of your app, maybe even “inside” it if you are a Kubernetes person (I must admit I’m not). These proxies expose only a very limited set of functionality, and there is a lot more to explore and use. We’ll explore our options and some ideas for use with HAProxy, one of the oldest, fastest, and most versatile proxies around. There are other proxies that can be used when the situation requires it:

  • Cloud proxies: When in Rome, do as the Romans do. Use what’s available (AWS ALB/NLB, Google LB, …). Useful as an edge proxy, as an access and logging layer, and as a WAF.
  • Nginx: an excellent web server, but a rather rudimentary reverse proxy (the commercial NGINX Plus version is better). If you already use it as a web server and don’t need advanced stuff, go for it.
  • Envoy: a newcomer to the scene, and a rather popular and much-advertised proxy. It really tries to follow HAProxy’s lead in the feature department. I consider it the MongoDB of the proxy world (you didn’t hear it from me). It should be good to use by now (2022); Google uses it as an edge load balancer on Google Cloud.

HAProxy recipes

HAProxy has so many features that one can seriously ponder calculating its Schwarzschild radius. No living person knows all the HAProxy stuff anymore (to quote its creator Willy Tarreau). The configuration language is somewhat arcane but terse; at least it’s not some YAML horror.

Having said all that, it is still the best of the best. It can proxy Layer 4 (TCP) or Layer 7 (HTTP) traffic, it speaks TLS/SSL, it has the biggest selection of load balancing algorithms, and it has a few tricks for popular databases (MySQL, PostgreSQL, Redis). You can even teach it new tricks via binary checks, Lua scripts, etc. It can reliably log your traffic, shadow the traffic, passively and actively monitor/observe HTTP traffic…
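If "load balancing algorithm" sounds abstract, here is a toy sketch of the two classic strategies, which HAProxy offers under the names `roundrobin` and `leastconn`. This is only an illustration of the ideas in Python, not how HAProxy implements them; the server names and connection counts are invented.

```python
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a fixed rotation, forever."""
    return itertools.cycle(servers)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections (ties go to the first listed)."""
    return min(active, key=active.get)


# Usage with hypothetical backends.
rotation = round_robin(["app1", "app2", "app3"])
picks = [next(rotation) for _ in range(4)]
calmest = least_connections({"app1": 7, "app2": 2, "app3": 2})
```

Round robin is a fine default when requests cost about the same; least connections tends to suit long-lived or uneven requests better, because a busy server naturally gets fewer new ones.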

One additional area where a proxy can help in your micro-service-oriented or distributed applications is in applying the DRY principle. A proxy can provide services such as logging, authentication, and transformation, so you don’t need to implement the same stuff over and over for every micro-service (this is especially important if your micro-services are not written in the same language or framework).

We will try to present some ideas in the following sections.
