Key Words: Transient

Luke Chavers
Published in DevelUp · 7 min read · Jan 16, 2017

adj. lasting only a short time; existing briefly; temporary (dictionary.com)

In a Nutshell

Transient means “temporary” or “non-permanent”. For our purposes, the opposite of “transient” is “persistent”. Anything that is meant to exist for a short time can be considered to be “transient”.

Why it Matters

Understanding the difference between transient and persistent data (and data stores) allows technicians to reduce workloads by trading RAM for CPU time and disk reads/writes (collectively "ops" or "IOPS"). This can speed your applications up dramatically, leading to higher application performance and reduced infrastructure costs.

The trend toward containers also mandates fundamental changes in how we approach problems, making it hard for anyone to achieve success without first mastering the term.

About the “Key Words” Series

Being an autodidact definitely has a major downside: those of us who learned to code on the hard streets of IRC often miss out on some extremely fundamental terms and concepts. I’ve found that very few ideas are truly original, and often my eyes are opened the moment that I learn what something is called.

So, in hoping to save others a bit of grief, this 101/102 series attempts to tackle some of the most useful words in modern web development and outline some of their common usages.

CS Usage: Transient Memory

In computer science, the most obvious example of the word "transient" is in basic computer hardware. Virtually all CS/IT professionals understand how RAM and CPU cache (transient) differ from hard drives (persistent).

Data written to RAM rarely survives a full [user] session and never survives a reboot. Conversely, data written to a hard disk usually stays there until it is explicitly removed (or the disk fails).

CPU cache (L1, L2, and L3) is even less permanent than RAM; its contents are usually recycled and replaced very rapidly as the CPU executes.

Dev Usage: Transient Data Stores

The most common usage of “transient” for developers is in reference to “data stores” (anywhere that you can store data).

As you may have guessed, development concepts are often extensions of basic computer science concepts, and therefore, closely related. Transient data stores almost always refer to data stores that exist in “transient memory” (RAM) and are sometimes referred to as “[in] memory stores”.

Variables

As developers, we leverage transient data stores constantly and without realizing that we’re doing it; every time you define a variable, you’re allocating space in RAM and every time you set the value of a variable, you’re storing data to RAM.

All variables, unless and until they are persisted, are transient.
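In Python, for instance (a minimal sketch; the `session.json` file name is just illustrative), the line between transient and persistent is the moment you write to disk:

```python
import json
import os

# Transient: this dictionary exists only in RAM and dies with the process.
session = {"user": "luke", "cart": ["book", "pen"]}

# Persisting it: once written to disk, the data survives a restart.
with open("session.json", "w") as f:
    json.dump(session, f)

# A later run (or another process) can restore it from disk.
with open("session.json") as f:
    restored = json.load(f)

assert restored == session
os.remove("session.json")  # clean up the demo file
```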

Transient Databases

Perhaps less known to developers is the concept of “transient databases”, which are applications dedicated to the storage and retrieval of transient data.

Transient databases work in much the same way that variables do and usually have simple interfaces, such as "key/value". Of course, they cannot be as fast as variables because of the overhead involved in communicating with them, but they have the added benefit of allowing multiple applications to "share" transient data, and they are still extremely fast compared to traditional, persistent databases (MySQL, MongoDB).

Transient databases are fast because they store all (or most) of their data in RAM. The downside, of course, is that data is often at risk of being discarded or lost.

Many transient databases offer some form of persistence, but developers should be skeptical of such things. Data persistence has an unavoidable cost and as reliability increases, efficiency and speed, inevitably, decrease.

If you want to get the most out of transient databases, you should disable persistence and only store data in them that you can afford to lose (because you can rebuild it, even if doing so requires extra work).

I’m sure that dozens of transient databases exist but the two that stand out the most, to me, are Redis and Memcached.
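As a hedged illustration of how such stores behave (this is a toy class, not the actual Redis or Memcached API), a key/value store with per-key expiry might look like:

```python
import time

class TransientKV:
    """A toy in-memory key/value store with per-key TTL, sketching how
    Redis- or Memcached-style transient databases behave."""

    def __init__(self):
        self._data = {}  # everything lives in RAM

    def set(self, key, value, ttl=None):
        # TTL ("time to live") in seconds; None means "until eviction".
        expires = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # expired: the data is simply gone
            return None
        return value

kv = TransientKV()
kv.set("greeting", "hello", ttl=0.05)
assert kv.get("greeting") == "hello"   # still alive
time.sleep(0.1)
assert kv.get("greeting") is None      # expired: transient data lost
```

A real transient database adds networking, eviction policies, and concurrency on top of this, but the core contract is the same: fast reads and writes, no durability guarantee.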

Caches

Although it is a broad term, a cache is almost always a transient data store that allows persistent data to be served rapidly from transient memory (such as RAM).

Within the context of web development, a "cache" (aka "caching layer", "forward cache", "caching proxy" or "CDN") is a middle-man apparatus that exists between an application and a persistent data store. It stores the responses to "read" operations against persistent data stores, in RAM, for a finite amount of time.

Cache Example 1: CDNs
AWS CloudFront is a CDN ("content delivery network") service that sits in front of your website. Whenever a user requests a unique resource, it checks to see if that resource is in its cache. If so, it will not contact your website but will, instead, send the cached data to the user (this is called a "cache hit"). If not, it requests the resource from your website, forwards the response to the user, and then caches (stores) the resource/response for a predefined period of time (this is called a "cache miss").

In this way, CDNs leverage transient data to trade consistency (how up-to-date the resources your users receive are) for a reduced compute workload (CPU and disk ops) at your origin (in this case, your website).

In extreme cases, you might be able to reduce the workload of your dynamic website, and thus the size of your infrastructure, by as much as 95% by allowing your website content to be just a few minutes old for any given viewer.

CDNs usually have the added benefit of being geographically dispersed (hence the “N” in “CDN”). Often, CDNs will include multiple caching servers (called “edge servers”) and will route users to the closest one. By doing so, “cache hits” will further improve the response time of your website by, effectively, moving your website closer to your users.

Cache Example 2: Query Caches
Another example of caching persistent data is MySQL’s query cache. In short, MySQL allows you to cache (store) the responses to some (or all) queries in RAM, so that subsequent requests for the same data can be served much, much more quickly.

Although it can be difficult to do, properly configuring your DBMS and its internal caching mechanisms can radically reduce the compute burden on your DBMS, especially if a significant portion of the queries to your database are identical.

Query caching, being at the database level, usually has the added benefit of being able to automatically “invalidate” (clear) stale data whenever the data is changed within the database, which means you don’t even need to trade consistency for the speed and efficiency boost (though, you may still need a lot of RAM).
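That invalidation behavior can be sketched as follows (a toy model, not MySQL's actual implementation: the "database" is a dict and the cache is keyed on the raw query text):

```python
database = {"users": 2}   # stands in for real tables on disk
query_cache = {}          # cached query results, in RAM
db_hits = 0               # how much real DBMS work was done

def run_query(sql):
    """Stand-in for actually executing a query against the DBMS."""
    global db_hits
    db_hits += 1
    return database["users"]

def cached_read(sql):
    if sql not in query_cache:              # miss: run the real query
        query_cache[sql] = run_query(sql)
    return query_cache[sql]                 # hit: no DBMS work at all

def write_users(n):
    database["users"] = n
    query_cache.clear()  # invalidate: cached results may now be stale

assert cached_read("SELECT COUNT(*) FROM users") == 2
assert cached_read("SELECT COUNT(*) FROM users") == 2  # served from cache
assert db_hits == 1
write_users(3)
assert cached_read("SELECT COUNT(*) FROM users") == 3  # fresh after invalidation
```

Because the write path clears the cache, readers never see stale results; that is the consistency-for-free property described above, paid for in RAM and cache churn.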

DevOps Usage: Containers

Although Linux containers (LXC) have existed since 2008, they really caught their stride in 2013 thanks to Docker.

Containers, led by Docker, are revolutionizing systems and devops, for many reasons, but they come with a catch: in order to use them as intended and reap all of their benefits, technicians and teams have to fundamentally change the way they build application environments.

In short, the most important thing that we all have to embrace, is the transient nature of containers and the environments (and data) within.

Of course, there are ways to persist data from and within containers, but all of those ways go against the grain, to at least some degree. If you want to save yourself a lot of hardship, don’t try to fight it, except in the narrow subset of cases in which you absolutely must.

I’ll save the container “deep dive” for future articles, but I should, at least, explain the fundamentals of their transient nature for the sake of being comprehensive.

The Transient Nature of Containers

Containers exist at rest as “images”. From the outside you can execute applications (usually, exactly one application) within the file system of an image.

Whenever you execute an application within an image, that image becomes instantiated, and an instantiated image is referred to as a “container”. If you’re following best-practices, containers are deleted, entirely, as soon as the application stops.

This means that, like bare-metal machines and virtual machines, containers lose their RAM whenever they stop. Unlike machines and VMs, though, they also lose any changes made to their file system since instantiation. In this way, containers can be thought of as transient virtual-machines.

Fully coming to terms with the transient nature of containers can be quite a struggle, but it really just boils down to the primary application being unable to persist anything to disk.

You can still pre-bake the disk with persistent data during the image “build” process and communicate with external data stores. So, you’ll be happy to know that almost every problem can be mitigated, with a bit of refactoring, and after doing the work, you’ll realize that you’re much better off, for many reasons that extend beyond containers.
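As a rough sketch (the file names are illustrative), a minimal Dockerfile shows the split: data copied in at build time is baked into the image and persists across runs, while anything the application writes afterward disappears when the container is deleted (for example, when started with `docker run --rm`):

```dockerfile
# Build time: everything written here becomes part of the image, so it
# persists across container runs (the "pre-baked" persistent data).
FROM alpine:3.19
COPY config.json /app/config.json

# Run time: files the application writes now live only in the container's
# ephemeral layer and are lost when the container is deleted.
CMD ["sh", "-c", "echo scratch > /tmp/scratch.txt && cat /app/config.json"]
```

Anything that genuinely must survive, such as a database’s data directory, belongs in an external data store or a mounted volume rather than the container’s own file system.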

Conclusion and The Rise of Transience

Computer science has been trending toward transience for quite a while, maybe since the utterance of the term “computer science”, so it’s not a trend that is likely to go away anytime soon.

You might be familiar with other concepts, such as “commodity hardware” and “disposability”, both of which paved the way to the modern trend towards transient environments and infrastructures; the modern state of the art should not be surprising to anyone that has been following along.

Virtual machines came onto the scene and were probably, initially, praised for the advancements in resource isolation that came with them. However, as nerds learned to automate VM image builds and cloud services brought the “commodity hardware” advantage to the masses, the disposable nature of the VM started to lend itself to dynamic, and increasingly transient, infrastructures.

Containers just take the next, obvious, step, by removing the overhead that comes with turning a computer on and waiting for it to boot up.

A pleasant side-effect of this is that containers are making micro-service architectures more practical and now we’re seeing more of those explode onto the scene.

Meanwhile, persistent databases have been learning to trade CPU and disk ops for RAM by ramping up their internal caching capabilities, such as with MySQL’s query cache and similar DBMS trends toward index and full-on data caching.

Likewise, most compute-heavy web applications are learning to use forward caches, such as Varnish, and IaaS CDNs such as AWS CloudFront, to leverage transient data as a way of reducing compute workload and disk ops, in exchange for slightly reduced consistency (see also: CAP and PACELC).

Because of these trends, it’s important that all CS/IT professionals understand how transient data and, by extension, transient data stores, databases, and infrastructures are facilitating advancements throughout the industry.
