Elastic scaling on the fly

NoOps: Hotswapping slow function calls with RabbitMQ

reinman
Sep 3, 2018 · 8 min read

Here’s an example of how AI enables autoscaling. AIOps is just the beginning of AI sweeping through the entire software process. Just stay away from Big Tech, however, if you hope to stay in business.

The Problem: Function Calls are Broken

DevOps, NoOps, and now AIOps are all the rage today and a centerpiece of the so-called “Digital Transformation” buzzword. Yet almost 80% of the cost associated with “DevOps” is painful, labor-intensive manual work required to scale up a codebase. Not exactly a sexy transformation.

Why is code so hard to scale in the first place? The proliferation of non-trivial (if not formidable) tools should be a red flag to any CIO: Kubernetes, Terraform, SaltStack, Ansible, Chef, etc. The list keeps expanding.

Maybe these are misguided attempts to address a fundamental flaw in how function calls are built. Yes, this is a trollish statement: after all, function calls are a core construct in just about any mainstream programming language. However, decades ago when IBM and Silicon Valley mapped “lambda calculus” (which almost all programming languages are based on) to “von Neumann hardware” (which almost all hardware is based on), they made a problematic assumption — namely that the caller and callee must sit on the same processor.

The Fallout: DevOps

Along comes distributed computing… and developers started breaking up function calls into various client and server components via microservices, event queues, etc. Unfortunately, no one knows where the bottlenecks are until they put the system under load. The result has been a phobia of anything that smells monolithic: you can hardly write a line of code without first getting expensive DevOps and cloud specialists involved. Might as well let them write it all instead. But isn’t that also premature optimization?

NoOps = Rethinking Operating Systems

Asking developers to rip up code to address runtime performance concerns sounds like another kludgy workaround for Silicon Valley’s lack of real innovation (/troll). It certainly increases DevOps cost and overall system complexity. Isn’t this the sort of thing you pay operating systems and cloud providers for?

Backdooring Higher Order Calculus

The lambda computation model never played well with distributed computing. Bottom line: lambda calculus has no way to express runtime considerations. As a result, most programmers aren’t terribly comfortable developing inside a runtime (several blockchain disasters are proof of this). From a mathematical perspective, the DevOps macro trend is really an attempt to place a higher-order orchestration calculus such as rho/reactive (or phi) on top of lambda. But why wasn’t this taught somewhere? Tragically, much of the computer science leadership from Berkeley and Stanford is now gone. So basically you are on your own.

Big Tech Cloud-Native or Bust

This leaves one obvious answer: hand everything to a Big Tech cloud provider and hope they acqui-hire you before copying it (or buying someone else). It’s no secret that AWS has been key to Amazon’s ability to expand quickly into new verticals. The self-hosting alternative is pretty grim: DevOps is a software development process unto itself and horizontal configuration is almost orthogonal to traditional coding.

Functional Programming

But not all hope is lost. There is a bit of a renaissance in computer science that can collectively be called “functional programming” (FP) or category theory. Category theory is an intersection of algebra (code) and geometry (data) that attempts to put some mathematical rigor behind a lot of the languages/frameworks/fads that have been churned out by the west coast over the years.

Returning Home to NYC

The home of FP of course is Wall Street — NYC finance has been pioneering advanced functional technology at scale for decades. Spreadsheets and trading systems are denotational and reactive by nature. Unfortunately, NYC is not known for sharing its secrets. But in a sense, the computer industry is returning back east to its roots (IBM is in midtown — hence the location of the famous Steve Jobs photo).

The FP community, however, is starting to figure things out. I don’t want to dive too deep into the weeds, but I should give readers awareness of projects like Unison, which basically realizes that any function can be distributed:

Any mapReduce operation can be distributed
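To see why, consider a plain map: each application of the function is independent of the others, so nothing forces them to run on the same machine. Here is a minimal JavaScript sketch of the idea (this is not Unison code, and the dispatch step is just a local stand-in for shipping work to a remote worker):

```javascript
// Illustration only: because each application of f in a map is independent,
// the calls can be dispatched concurrently (threads, processes, or remote
// workers) and the results reassembled in their original order.
async function distributedMap(f, xs) {
  // stand-in for sending f(x) to a remote worker; here it just runs locally
  const dispatch = (x) => Promise.resolve().then(() => f(x));
  return Promise.all(xs.map(dispatch));
}

// usage: distributedMap(n => n * n, [1, 2, 3]).then(console.log); // [1, 4, 9]
```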

Hotswapping = True Continuous Delivery

But what if you don’t know where the bottlenecks are in advance? After all, you don’t really want to shut down a trading system to fix a problem in your code. Hotloading/hotswapping is one key area where the west coast just doesn’t get it. Hotloading code changes (e.g. at the function level) is very useful for rapid Agile turnaround when developing new ideas or collaborating with testers. It is also critical for properly leveraging newer tech like persistent memory hardware. Only hotloading provides for true CI/CD, and in 24x7 systems you have no other choice.

Autoscaling: the Holy Grail of DevOps?

As you probably guessed, hotloading is the key NYC sauce for automatically resolving performance bottlenecks by replacing traditional lambda calls with message queues, e.g. Unison’s distributed mapReduce.

Next-Gen Node

In the walkthrough below, we will be using NodeServerless, an Estonian ‘serverless’ flavor of NodeJS. It is a reference implementation of the Inception AI project and borrows some ideas from NYC finance.

Example Walkthrough

Forget trying to learn Kubernetes or Terraform: suppose all you know is JavaScript. Below are two functions, foo() and bar(). Note that bar() calls foo() (as indicated by the red arrow). Let’s pretend that system profiling discovered a bottleneck here.

Suppose our bottleneck is the call to foo() here
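Since the screenshot is not reproduced here, a minimal sketch of the two functions follows. The bodies are assumptions, chosen only so that they match the test values used later in the walkthrough (foo(10) = 15 and bar(4) = 12):

```javascript
// Assumed bodies for illustration; imagine foo() is the slow, CPU-heavy part.
async function foo(x) {
  return x + 5; // foo(10) === 15
}

async function bar(x) {
  // bar() depends on foo(): this call is the suspected bottleneck
  return (await foo(x)) + 3; // bar(4) === 12
}
```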

First, some background on how NodeServerless works. Applying a Multics concept called single-level memory, it automatically exposes foo() and bar() as endpoints:

Our context called ‘tallinn’ already provides a mesh/topology, which aids in service discovery. Note these methods are already declared async (sprinkling code with async/await is an antipattern I covered previously).
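To make “exposed as endpoints” concrete, the idea is that any node in the mesh can invoke foo() over the wire. The URL scheme, port, and payload shape below are pure assumptions for illustration; the article does not show the actual NodeServerless API:

```javascript
const fetch = require('node-fetch'); // assumed HTTP client; any would do

(async () => {
  // hypothetical path: functions in the 'tallinn' context addressable by name
  const res = await fetch('http://localhost:8080/tallinn/root/foo', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ args: [10] }),
  });
  console.log(await res.json()); // expected: 15
})();
```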

Basic hotloading capability already allows us to edit these functions on the fly. They can also be synced with a traditional filesystem (as shown below) for replication across nodes.

Kubeless and Fission

For those who are not DevOps savvy, here is what Kubeless looks like:

Love those carriage returns

And here is Fission:

UNIX is a hostile world

RabbitMQ

Getting RabbitMQ working properly can be tricky… and this is where the system can apply a bit of automation magic. First, let’s verify the behavior of the existing code:

Automated testing is yet another Silicon Valley fail

Note that foo(10) = 15 and bar(4) = 12. Also note how the system is asked to “remember” these test cases for later… that’s only possible in an interactive system.
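In code terms, the check amounts to something like this (the exact shell syntax for “remembering” a test case is not shown here, so treat this as a session sketch):

```javascript
console.log(await foo(10)); // 15
console.log(await bar(4));  // 12
// the interactive shell is then asked to keep these calls as regression tests
```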

Let’s look at our RabbitMQ console and see that it is rather empty:

Resolving Bottlenecks on the Fly

Okay, back to our example. Suppose that system profiling discovered a bottleneck at foo(), meaning we need to split foo() up into message queue calls. Ideally we would probably want to hide all of this, but for this article let’s open up the sausage maker so you can see what is going on.

First, I run mq convert to (automagically?) transform foo() into RPC client and server components:

foo() is broken up into client and server RPC

There are new functions foo_mq_base() and foo_mq_start(). What happened?

First, the original function foo() has been hot-swapped with an autogen RPC client call. This ensures the function call from bar() still works:

The contents of the original function foo() have been moved to foo_mq_base():

The function foo_mq_start() allows us to manually start an RPC worker that consumes the message queue. Note that our code will use a queue called tallinn:root/foo.
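Purely for illustration, here is roughly what the generated pieces could look like if written by hand against the standard amqplib client. The function names and the queue name tallinn:root/foo come from the walkthrough; the connection URL, payload format, and per-call reply queue are assumptions rather than the actual NodeServerless output:

```javascript
const amqp = require('amqplib'); // standard RabbitMQ client for Node

const REQUEST_QUEUE = 'tallinn:root/foo'; // queue name used in the walkthrough

// The original body of foo(), moved aside by `mq convert`
async function foo_mq_base(x) {
  return x + 5; // assumed body, matching foo(10) === 15
}

// foo() itself is hot-swapped with an RPC client that publishes to the queue.
// (The walkthrough shares one persistent reply queue across the 'tallinn'
// context; this sketch uses a throwaway exclusive reply queue per call.)
async function foo(x) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue(REQUEST_QUEUE);
  const { queue: replyTo } = await ch.assertQueue('', { exclusive: true });
  const correlationId = Math.random().toString(36).slice(2);

  const reply = new Promise((resolve) => {
    ch.consume(replyTo, (msg) => {
      if (msg.properties.correlationId === correlationId) {
        resolve(JSON.parse(msg.content.toString()));
      }
    }, { noAck: true });
  });

  ch.sendToQueue(REQUEST_QUEUE, Buffer.from(JSON.stringify([x])),
                 { correlationId, replyTo });
  const value = await reply;
  await conn.close();
  return value;
}

// foo_mq_start() launches an RPC worker that consumes the request queue,
// runs the original body, and sends the result back on the reply queue.
async function foo_mq_start() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue(REQUEST_QUEUE);
  await ch.consume(REQUEST_QUEUE, async (msg) => {
    const args = JSON.parse(msg.content.toString());
    const result = await foo_mq_base(...args);
    ch.sendToQueue(msg.properties.replyTo,
                   Buffer.from(JSON.stringify(result)),
                   { correlationId: msg.properties.correlationId });
    ch.ack(msg);
  });
}
```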

Let’s now re-invoke our earlier call to foo(). This will hang because we have not yet enabled things on the server side:

So let’s enable a service worker:

This immediately allows our call to foo() to complete with the correct value 15:
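Using the hand-written sketch above, the same sequence would look something like this (an assumed session, matching the values in the screenshots):

```javascript
const pending = foo(10);    // publishes to tallinn:root/foo and waits for a reply
foo_mq_start();             // enable one RPC worker on the server side
console.log(await pending); // the pending call now completes: 15
```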

At the same time, the RabbitMQ console updates and shows our request and reply queues (note the reply queue is persistent/shared across the context ‘tallinn’ and uses correlation ids):

In the real world, we would continue to ramp up workers until the ready queue size gets under control.

Remember this is a running server. To prove the hotswapping did not break anything, we re-run bar(4) and get back the correct result:

Automated Testing

Let’s re-run our test cases from earlier to make sure:

The tests have passed. We’ve successfully hotswapped foo() with a RabbitMQ counterpart with just a single command: mq convert!!!

Final Considerations

In practice, however, scaling lambdas is usually not the bulk of the problem (unless you are doing CPU-intensive operations like image processing). Usually some underlying data is the bottleneck, and that must also be split out. Hence this sort of operation must be automated.

Just please don’t hand things over to Big Tech!

Written by reinman

Estonia is bringing back 1960s Multics using applied category theory and space cat memes
