ES2015 Quickie: Binary and Octal Literals

Kerri Shotts
Aug 22, 2017 · 5 min read
Retroputer having a bad moment (this is what executing random code does!)

So, you might recall a story I wrote last year about a faux emulator I’m writing called Retroputer. Due to the nature of retro emulation, working with bits and shifting them around is something that the code has to do quite a bit of, and I’ve enjoyed being able to use the new literal forms in ES2015.

In ES5, we could write numbers using these formats:

  • Decimal (base 10)
  • Hexadecimal (leading 0x)

In ES2015 we get the following new formats:

  • Octal (leading 0o)
  • Binary (leading 0b)

Of course, ES2015 numbers are still double-precision floats, so integers are only exact up to 2^53 − 1 (and the bitwise operators work on 32-bit signed integers), which means the octal and binary literals are similarly constrained. Even so, they are incredibly useful.
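As a quick check, here is a minimal sketch showing that all four literal forms produce the same ordinary number, and where the exact-integer limit sits. The variable names are just for illustration.

```javascript
// All four literal forms produce the same kind of value: a plain Number.
const fromDecimal = 255;
const fromHex = 0xff;
const fromOctal = 0o377;
const fromBinary = 0b11111111;

console.log(fromDecimal === fromHex);   // true
console.log(fromOctal === fromBinary);  // true
console.log(typeof fromBinary);         // "number"

// Integers are only exact up to 2^53 - 1, whichever literal form is used.
const largestSafe = Number.MAX_SAFE_INTEGER;
console.log(largestSafe === Math.pow(2, 53) - 1); // true
```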

Using Binary Literals

Let’s imagine a simple program (in a faux assembly language) for an imaginary computer:

LD A, 0x02      ; A = 2
LD B, 0x03      ; B = 3
ADD A, B        ; A = A + B
ST [0xFF], A    ; Memory at address 0xFF gets the value of A

When stored as a text file, this program might appear small, but it’s large for what it does. We can do so much better by encoding this program into a smaller representation. When memory was exceedingly expensive, it was important to store information as compactly as made sense (with consideration given to how expensive it might be to unpack the information). So let’s imagine that we have a faux machine language where the instruction and its operands are packed into smaller representations of just a few bytes (and sometimes only one byte).

It’s pretty easy to understand what our simple program does — it just adds 2 and 3 together and stores the result into the highest byte of memory (in this case, we only have 256 bytes). How might this be encoded into a more compact form? It might look like this:

(   HEX   )       BINARY        DISASSEMBLY
----------- ------------------- ------------
(0xC0 0x02) 1100 0000 0000 0010 LD A, 0x02
(0xC4 0x03) 1100 0100 0000 0011 LD B, 0x03
(0x41     ) 0100 0001           ADD A, B
(0xD2 0xFF) 1101 0010 1111 1111 ST [0xFF], A

As you can see, our little program turns into seven bytes of code. That’s pretty good!
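To make the encoding concrete, here is a small sketch that writes those seven bytes twice, once with hexadecimal literals and once with binary literals, and confirms they are the same bytes. Storing the program in a Uint8Array is my assumption for the example, not necessarily how Retroputer holds its memory.

```javascript
// The seven-byte program from the table, written with hex literals...
const programHex = Uint8Array.of(0xc0, 0x02, 0xc4, 0x03, 0x41, 0xd2, 0xff);

// ...and the same bytes written with binary literals.
const programBin = Uint8Array.of(
  0b11000000, 0b00000010, // LD A, 0x02
  0b11000100, 0b00000011, // LD B, 0x03
  0b01000001,             // ADD A, B
  0b11010010, 0b11111111  // ST [0xFF], A
);

// Compare the two encodings byte by byte.
const sameBytes = programHex.every((byte, i) => byte === programBin[i]);
console.log(programHex.length, sameBytes); // 7 true
```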

Now let’s imagine that we want to write an emulator that decodes these instructions and executes them. How would we decode these instructions? Easy — we can mask out certain bits and preserve others using JavaScript’s bitwise AND operator (&), and then shift bits as needed using JavaScript’s bitwise shift operators (<< and >>). I’m not going to get into just how that all works, but the useful part is this — when we need to specify a mask, previously we would always have to work in decimal or hexadecimal. That’s not hard, but it can be easier to visualize what’s going on by using a binary literal.

Warning: JavaScript’s bitwise operators truncate anything beyond 32 bits!
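Here is a short sketch of what that truncation looks like in practice. The values are arbitrary ones I picked to show the edges of the 32-bit range.

```javascript
// Bitwise operators first convert their operands to 32-bit signed integers.
const tooBig = Math.pow(2, 32) + 5; // needs 33 bits
const truncated = tooBig | 0;
console.log(truncated);             // 5: the 33rd bit is dropped

// Bit 31 is the sign bit, so a value with the top bit set comes out negative.
const topBit = 1 << 31;
console.log(topBit);                // -2147483648
console.log(0x80000000 | 0);        // -2147483648

// The unsigned right shift (>>>) reinterprets the result as unsigned.
const unsignedTopBit = topBit >>> 0;
console.log(unsignedTopBit);        // 2147483648
```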

For example, compare:

if ((opcode & 0xF0) === 0x40) {         // ADD dr, sr instruction
    dr = (opcode & 0x0C) >> 2;
    sr = (opcode & 0x03);
    regs[dr] = regs[dr] + regs[sr];     // Simplified greatly! ;-)
                                        // normally you'd do more,
                                        // like setting flags
}

to

if ((opcode & 0b11110000) === 0b01000000) { // ADD dr, sr
    dr = (opcode & 0b1100) >> 2;
    sr = (opcode & 0b0011);
    regs[dr] = regs[dr] + regs[sr];
}

Note: Of course, you’d actually want to use constants instead of the magic values I’ve used in this example.
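As a rough idea of what that might look like, here is a sketch of the ADD decode with named constants. The constant names and the executeAdd function are hypothetical, made up for this example, and the register file is just a plain array.

```javascript
// Hypothetical names for the masks used in the ADD example.
const OPCODE_MASK = 0b11110000; // upper nibble selects the instruction
const OPCODE_ADD  = 0b01000000; // ADD dr, sr
const DR_MASK     = 0b00001100; // destination register bits
const DR_SHIFT    = 2;
const SR_MASK     = 0b00000011; // source register bits

// Returns true if the opcode was an ADD and has been executed.
function executeAdd(opcode, regs) {
  if ((opcode & OPCODE_MASK) !== OPCODE_ADD) {
    return false; // not an ADD instruction
  }
  const dr = (opcode & DR_MASK) >> DR_SHIFT;
  const sr = opcode & SR_MASK;
  regs[dr] = regs[dr] + regs[sr]; // still simplified: no flags, no overflow handling
  return true;
}

const regs = [2, 3, 0, 0]; // A = 2, B = 3, plus two unused registers
const handled = executeAdd(0x41, regs); // 0x41 is ADD A, B
console.log(handled, regs[0]); // true 5
```

Keeping the masks as binary literals in one place means the bit layout is only spelled out once, and the decoding code reads in terms of names.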

Personally I find using the binary literals a bit more readable — as long as they aren’t too long. Beyond a certain point, my eye just blurs over everything and using something like hexadecimal is actually easier to read.

That said, you might want to check out and evangelize the stage 1 numeric separators proposal so that we could use underscores in our literals as separators. While for decimals we’d almost certainly use them for thousands separators, with binary we could group in whatever form made sense. For example:

0b1111_0000   // easy to see that the four highest bits are 1s
0b0000_11_00  // here we're grouping by the upper nibble and then
              // splitting the lower nibble

In this case, longer binary literals with separators would still be pretty easy to read and grok.
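The same grouping idea is handy when printing values while debugging. Here is a minimal sketch of a formatter that groups binary digits; toGroupedBinary is a hypothetical helper of my own and is not part of any proposal or library.

```javascript
// Hypothetical helper: format a value as binary digits in groups.
function toGroupedBinary(value, bits = 8, groupSize = 4) {
  let digits = (value >>> 0).toString(2);
  while (digits.length < bits) {
    digits = "0" + digits; // left-pad with zeros to the requested width
  }
  const groups = [];
  // Walk from the right so the least significant group is always full.
  for (let end = digits.length; end > 0; end -= groupSize) {
    groups.unshift(digits.slice(Math.max(0, end - groupSize), end));
  }
  return "0b" + groups.join("_");
}

console.log(toGroupedBinary(0xf0)); // 0b1111_0000
console.log(toGroupedBinary(0x0c)); // 0b0000_1100
```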

Using Octal Literals

If you’re in any way familiar with Linux or macOS (and other Unix-likes), you’ve probably encountered the permissions system, which is specified as follows:

owner   group   other
rwx     rwx     rwx     (read, write, execute)

These are often represented as octal numbers. Here are some examples:

own grp oth
rwx rwx rwx
100 100 100 = 0o444 = A file readable by everyone
111 000 000 = 0o700 = A file only I can read, write, and execute
111 111 111 = 0o777 = A file everyone can read, write, and execute
111 101 101 = 0o755 = I can read, write, exec, but everyone else can
                      only read and execute.

Just like with binary literals, these octal literals are useful when masking portions of an integer in order to determine which permissions are set:

const ownPermissions = (permissions & 0o700) >> 6,
      grpPermissions = (permissions & 0o070) >> 3,
      othPermissions = (permissions & 0o007);
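Putting the octal masks together with a few binary ones, here is a sketch that turns a permission value into the familiar rwx string. Both function names are hypothetical, and the sketch only looks at the lowest nine bits.

```javascript
// Hypothetical helper: turn three permission bits into a string like "r-x".
function describeTriplet(bits) {
  return (bits & 0b100 ? "r" : "-") +
         (bits & 0b010 ? "w" : "-") +
         (bits & 0b001 ? "x" : "-");
}

// Split the value into owner, group, and other, then describe each part.
function describePermissions(permissions) {
  const own = (permissions & 0o700) >> 6;
  const grp = (permissions & 0o070) >> 3;
  const oth = permissions & 0o007;
  return describeTriplet(own) + describeTriplet(grp) + describeTriplet(oth);
}

console.log(describePermissions(0o755)); // rwxr-xr-x
console.log(describePermissions(0o444)); // r--r--r--
```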

Parsing Literals

parseInt accepts a radix argument, which lets us specify which base we’re parsing, but it’s also incredibly easy to forget. For example, what might "010" parse to? It depends on the radix, but if the radix is omitted, it might be parsed as octal, or it might be parsed as decimal. (ES5 and later specify decimal, but older engines were allowed to treat a leading 0 as octal.)

Furthermore, if we use parseInt with map, it is all too easy to forget the radix, and have a spectacularly unexpected outcome:

["42", "19.2", "0xF3", "0b011101"].map(parseInt);
// [42, NaN, 0, 0]

… which is so not what we wanted!
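If you're wondering where those results come from: map calls its callback with the value and its index (plus the array itself), so parseInt ends up treating the index as the radix. Here is a small sketch that spells this out; the variable names are just for illustration.

```javascript
const inputs = ["42", "19.2", "0xF3", "0b011101"];

// map passes (value, index, array), so these two lines do the same thing:
const viaMap = inputs.map(parseInt);
const spelledOut = inputs.map((value, index) => parseInt(value, index));
// parseInt("42", 0)       -> 42   (radix 0 means "work it out from the string")
// parseInt("19.2", 1)     -> NaN  (1 is not a valid radix)
// parseInt("0xF3", 2)     -> 0    (in base 2, parsing stops at the "x")
// parseInt("0b011101", 3) -> 0    (in base 3, parsing stops at the "b")
console.log(viaMap, spelledOut);

// Passing the radix explicitly removes the surprise, but parseInt still
// drops the fraction in "19.2" and stops at the "x" and "b" prefixes.
const explicit = inputs.map(value => parseInt(value, 10));
console.log(explicit); // [ 42, 19, 0, 0 ]
```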

Turns out, there’s a better way, and we’ve already had it for a while: Number. Number provides a lot of useful static methods for working with numbers, and calling it as a function converts a value into a number using very specific rules. ES2015 just extended those rules to cover binary and octal literals.

For example:

Number("42")      =>  42
Number("12.4")    =>  12.4
Number("12.5e3")  =>  12500
Number("0x10")    =>  16
Number("0b1100")  =>  12
Number("0o10")    =>  8
Number("010")     =>  10
Number("")        =>  0
Number("hi")      =>  NaN

In short, the rules here are very simple: if no radix prefix exists, a string is always parsed as decimal. If a radix prefix is present, the string is parsed as hexadecimal, octal, or binary accordingly. The prefix must be 0x, 0o, or 0b; a lone leading 0 doesn’t count, and so 010 will be converted as decimal 10.

What about other values? Well, numbers are passed through as-is, and text that can’t convert to a number is given as NaN, unless the string is blank, in which case the result is 0. true becomes 1, and false becomes 0. undefined becomes NaN, and null becomes 0.
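A quick sketch of those rules, with a couple of extra string cases that are easy to trip over (surrounding whitespace, and trailing text that parseInt would have tolerated):

```javascript
// Non-string values
const conversions = [
  Number(42),        // 42  (numbers pass through)
  Number(true),      // 1
  Number(false),     // 0
  Number(null),      // 0
  Number(undefined)  // NaN
];
console.log(conversions);

// Strings: surrounding whitespace is ignored, anything else must be numeric.
console.log(Number("   "));      // 0   (blank strings become 0)
console.log(Number(" 0b101 "));  // 5
console.log(Number("12px"));     // NaN (the whole string has to be a number)
console.log(parseInt("12px", 10)); // 12 (parseInt stops at the first bad character)
```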

So, instead of using parseInt in your code, you can use Number. It works especially well in our failed parseInt example:

["42", "19.2", "0xF3", "0b011101"].map(Number);
// [42, 19.2, 243, 29]

Ah, much better!

