Software Engineering 101: Stuffing multiple values into a single 64bit value

Skrew Everything
From The Scratch
Published in
10 min readAug 11, 2020

There are some things which every software engineer should know but unfortunately it is not taught in the schools. One of them is: How to stuff multiple values into a single value.

Software Engineering 101 Series —

  1. Stuffing multiple values into a single 64bit value (you are here)
  2. Time Zones and Working with Dates

If you don’t know about this, you may be wondering

  1. How does that help?
  2. Why can’t I use different variables for each value?
  3. Why to complicate things unnecessarily? I have 8GiB of RAM, why should I care about saving 8 bytes?

The main aim of stuffing multiple values into a single large value is not to save memory or decrease the number of variables in the program.

The main aim is to encode multiple useful values into a single value to identify it uniquely, most of the time

Instead of going into “how to do it” first, let’s take an unconventional route and first go into a use case to understand why it is needed and where you can use it. This can make you understand the “how to do it” intuitively.

Use Case

Let’s consider a scenario where this technique is used extensively, database sharding at application level.

Database sharding at an application level means the application itself knows where the data is residing among your swarm of database instances. Instead of having some kind of routing mechanism or hash tables to locate the data on the server, the application knows which database has the data it needs. This type of sharding doesn’t depend on the database you are using and doesn’t matter if the database doesn’t provide any sharding features like MySQL.

For example, you are just starting your website and designing a database for it. Normally, if you don’t have any natural key, you would use the auto-increment value as the primary key. And also you use that auto-incremented ID as a part of the url to map to the data(like how Stackoverflow does).

15505515 is the postID

After 1 year, your database server is struggling to catch up with the traffic requirements. To solve that, you have 2 options.

  1. Vertical Scaling
  2. Horizontal Scaling

Vertical Scaling is costly and finite. So, you want to opt for horizontal scaling but MySQL doesn’t provide horizontal scaling feature. So, you add a new MySQL instance to shard the data. Now, you need to find a new mapping to find the sharded data directly instead of searching all the instances.

This is where we can use the “stuffing multiple values into the single 64bit value” technique.

Before doing that, we need to do some things:

  1. Give every database shard a unique ID and call it as dbID
  2. Now, you already have auto-incremented postID for every row in a database.
  3. Decide how many bits you want to allocate for each ID above. For example, you allocated 10 bits for dbID and 42 bits for postID.

10 bits can store 2¹⁰ = 1024 values, 42 bits can store 2⁴² = 43,98,04,65,11,104

It means you can create 1024 shards and each shard can have 43,98,04,65,11,104 rows.

So, instead of identifying the post using the postID, you also need to identify shard i.e., dbID.

So, instead of encoding your url like “your_domain.com/posts/postID”, we encode it like “your_domain.com/posts/newID”. Now, the newID contains the info about postID and dbID so that your app knows which shard to ask for the data directly without any middleware.

Now, we are going to encode postID and dbID into the newID like:

(I’m using Java, so Java’s Long is 63 bytes. Why is this important? We’ll see about it below in “how to do it”.)

Disclaimer: Don’t be intimidated if you don’t know what it is. We will talk about how to do it after this.

long dbID = 997;
long postID = 8_04_65_11_104L;
long newID = (dbID << 53) | (postID << 11);
System.out.println("newID: " + newID);
System.out.println("dbID: " + ((newID >> 53) & 0x3FF))
System.out.println("postID: " + ((newID >> 11) & 0x3FFFFFFFFFFL));

The output will be:

newID: 8980194136231510016
dbID: 997
postID: 8046511104

So, now your new url can be “your_domain.com/posts/8980194136231510016” instead of “your_domain.com/posts/8046511104

This “8980194136231510016” has the information about the shard ID and post ID.

This is called application level sharding because your application knows that postID: 8046511104 lives in database id: 997. There is no need for middleware which needs to check in which database the postID: 8046511104 is.

I’m pretty sure that now you know why it is useful with a real-world use-case. Now, let’s see how it is done

How to do it

First and foremost thing you need to be careful about the size of Integer in your preferred language.

For example,

Javascript has only one type both for floating and integers i.e., Number. Javascript only uses 53 bits to store the integers but not 64 bits. In that 53 bits, 1 bit is used for sign. So, you can only use 52 bits to encode.

Java has “Long” type which can store 64 bit integers. But unlike C and C++, Java doesn’t have “unsigned”. So, you can only use 63 bits because 1 bit is used for sign.

So assume “B” is your number of bits you can use. In the example above, we have used Java, so B = 63.

NOTE: Whatever you wanna encode, your sum of individual values in bits cannot exceed “B” and also only positive numbers are allowed

Let’s assume we want to store the following values in a single value:

  1. A with 15 bits — range can be from 0 to 32, 767 (2¹⁵-1)
  2. B with 35 bits — range can be from 0 to 34,35,97,38,367 (2³⁵-1)
  3. C with 10 bits — range can be from 0 to 1,023 (2¹⁰-1)
  4. D with 3 bits — range can be from 0 to 7 (2³-1)
  5. E with 1 bit — range can be from 0 and 1 (2¹-1)

The sum of individual values in bits is 15 + 35 + 10 + 3 + 1 = 64 bits.

But our “B” is only 63. So, you have to let go any one of them to bring it below 64. We will let go “E” , which is 1 bit. Now, our sum of bits will be 63 bits, which is equal to “B”.

This is how we want to arrange all our values at the end

visual representation of our 63 bits with various values

There are 2 parts for this.

  1. Encoding
  2. Decoding

Encoding —

This is how “A” with 15 bits looks in a 63 bit data type:

Visual representation of 15 bits in a 63 bits

We need to move all the 15 A ’s to the leftmost empty bit. We need to left shift by 63–15(A) = 48 bits

A << 48

After the left shift, this is how it looks:

Visual representation of 15 bits in a 63 bits after left shift by 48 bits

This is how “B” with 35 bits looks in a 63 bit data type:

Visual representation of 35 bits in a 63 bits

If you look at the previous image, “A” has already occupied 15 bits from the left side. So, we need to shift all the 35 B’s where 15 A’s end.

We need to left shift by 63-15(A)-35 (B)= 13 bits

B << 13

After the left shift, this is how it looks:

Visual representation of 35 bits in a 63 bits after left shift by 13 bits

This is how “C” with 10 bits looks in a 63 bit data type:

Visual representation of 10 bits in a 63 bits

If you look at the previous image, “A” has already occupied 15 bits and “B” has already occupied 35 bits from the left side. So, we need to shift all the 10 C’s where 35 B’s end

We need to left shift by 63–15(A)–35(B)-10(C) = 3 bits

C << 3

After the left shift, this is how it looks:

Visual representation of 10 bits in a 63 bits after left shift by 3 bits

This is how “D” with 3 bits looks in a 63 bit data type:

Visual representation of 3 bits in a 63 bits

If you look at the previous image, “A” has already occupied 15 bits, “B” has already occupied 35 bits and “C” has already occupied 10 bits from the left side. So, we need to shift all the 3 D’s where 10 C’s end

We need to left shift by 63–15(A)–35(B)-10(C)-3(D) = 0 bits

D << 0

After the left shift, this is how it looks:

Visual representation of 3 bits in a 63 bits after left shift by 0 bits

Now, we have all the “A”, “B”, “C”, “D” bits at their respective positions like below

(A << 48)
(B << 13)
(C << 3)
(D << 0)

Now we want to combine these into one variable. The | operator works by looking at each bit, and returning 1 if the bit is 1 in either of the inputs. So:

0011 | 0101 = 0111

If a bit is 0 in one input, then you get the bit from the other input. Looking at (A << 48), (B << 13), (C << 3)and (D << 0) you'll see that, if a bit is 1 for one of these, it's 0 for the others. So:

encodedValue = (A << 48) | (B << 13) | (C << 3) | (D << 0)
encodedValue = (A << 48) | (B << 13) | (C << 3) | (D << 0)

Now that we have completed with the Encoding part, let’s get into Decoding part.

Decoding —

In encoding, we have used left shift( << ) and OR( | ) but we use the opposite in decoding. We use right shift ( >> ) and AND ( & ).

Now we want to unpack the bits. Let’s start with the D. We want to get the last 3 bits, and ignore the first 60 bits.

To do this, we use the & operator, which returns 1 only if both of the input bits are 1. So:

0011 & 0101 = 0001

So, if you want to unpack only “n” bits from right side, you need to perform AND with “n” bits in which all the “n” bits should be 1.

We got “D”

To get “C”, we need to right shift till it reaches the rightmost bit. Right shift “C” with the same number you have used to left shift during encoding and perform AND operation with 10 bits(because “C” is 10 bits) set to 1.

We got “C”

Keep performing the same thing for the rest of “B” and “A”.

Right shift “B” by 13 and perform AND operation with 35 bits(because “B” is 35 bits) set to 1.

We got “B”

Right shift “A” by 48 bits and perform AND operation with 15 bits(because “A” is 15 bits) set to 1.

We got “A”

Now, we have decoded the single encoded value into their own individual values

decodedD = (encodedValue >> 0)  & 0x7
decodedC = (encodedValue >> 3) & 0x3FF
decodedB = (encodedValue >> 13) & 0x7FFFFFFFF
decodedA = (encodedValue >> 48) & 0x7FFF

The magic numbers —

There is a very high probability that you must be wondering what is that Hex value and how to arrive at that particular value.

In many programming languages, you can use different radixs like 2(binary), 6(hexa), 8(octal) and 10(decimal) to represent numbers.

Here, we are using Hex to represent the numbers. Why? Because it is easy to represent multiple 1 in Hex than in any other radix systems.

For example, if you want a number which has 4 bits set to 1, then in binary it will be represented as 0b1111 and in hex it is represented as 0xF. If you don’t know Hexadecimal, then binary might be easy for you to write but you can’t write 48 1s in binary. But in Hex you can write it like 0xFFFFFFFFFFFF. Instead of writing 48 1s in binary system, we write 12 Fs in hexadecimal system.

So, if you don’t know how hexadecimal system works, you need to learn about that. It’s an interesting topic of its own.

Want to contact me? Contact me on Twitter.com/@SkrewEverything.

Found any mistakes? Comment down below.

Liked it? 👏👏👏 it and Share it.

--

--

Skrew Everything
From The Scratch

A wannabe artist 👨‍🎨, but can’t draw 😫. A wannabe athlete 🏃‍♂️,but can’t run 🥵.Found my peace with coding 👨‍💻 and writing ✍️. Twitter.com/SkrewEverything