It’s all just 0 and 1: About Bits and Bytes

By Jeremy Buisson, published in CodeX on Sep 13, 2021

We often hear or say that computers are just a bunch of 1s and 0s. This series of short articles tries to explain what that means!

How we imagine computers using 0s and 1s

This article is part of a series; you can find the others here:

After publishing the first article, I created a small website here to quickly play around with Binary conversions; you can also check its source code on GitLab here.

PART 2: About Bits and Bytes

Nowadays, computers and digital information are so present in our day-to-day lives that we don’t even notice we are using the Binary system. Still, we have all used the terms Bit and Byte, whether to buy a laptop, choose a new smartphone, or compare mobile phone plans. Most of the time, even without knowing exactly what they are, we understand what they mean.

The Bit is considered the most basic unit in computing, and the word is a contraction of “Binary digit”. Each digit in a Binary number is a Bit, and it can be either 0 or 1.

Even though it is very widely used in computers and digital communications, the Bit isn’t part of the International System of Units (SI). Nevertheless, it is commonly used with the same metric prefixes: 1,000 bits is written 1 kb (one kilobit), 1,000,000 bits is written 1 Mb (one megabit), etc.

The Byte is a unit of digital information equal to 8 bits. Its number of bits varied in the early days of computing, with different companies and technologies doing as they needed. As more modern communication appeared, a standard quickly became necessary, and the 8-bit Byte established itself as the most commonly used.
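Each extra bit doubles the number of possible combinations, so 8 bits give 2⁸ = 256 different values, from 0 to 255. Here is a small Python sketch to visualize it. The helper `to_byte_bits` is hypothetical and written just for this example:

```python
# A byte is 8 bits, so it can hold 2**8 distinct patterns of 0s and 1s.
BITS_PER_BYTE = 8
patterns_per_byte = 2 ** BITS_PER_BYTE  # 256 values, from 0 to 255

def to_byte_bits(value: int) -> str:
    """Return the 8-bit binary representation of a value between 0 and 255."""
    if not 0 <= value < patterns_per_byte:
        raise ValueError("a single byte holds values from 0 to 255")
    return format(value, "08b")  # binary, zero-padded to 8 digits

print(patterns_per_byte)   # 256
print(to_byte_bits(0))     # 00000000
print(to_byte_bits(66))    # 01000010
print(to_byte_bits(255))   # 11111111
```

For instance, 66 (42 in hexadecimal) is the ASCII code of the letter “B”.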

Like the Bit, the Byte isn’t part of the International System of Units, but it still commonly follows its rules. While the Bit uses a lowercase “b” as its symbol, the Byte uses a capital “B”: 1,000 bytes is written 1 kB, 1,000,000 bytes becomes 1 MB, etc.
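The two symbols differ only by a factor of 8, so converting between them is simple arithmetic. This matters in practice because internet connection speeds are usually advertised in bits per second, while file sizes are shown in bytes. Here is a small Python sketch. The helpers `to_bits` and `convert` are hypothetical and written for this example:

```python
# Decimal (SI-style) prefixes, as used for kb/Mb/Gb and kB/MB/GB.
PREFIXES = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

def to_bits(amount: float, unit: str) -> float:
    """Convert an amount like (100, "Mb") or (2, "GB") to a number of bits.

    A lowercase "b" means bits; an uppercase "B" means bytes (8 bits each).
    """
    prefix, symbol = unit[:-1], unit[-1]
    if symbol not in ("b", "B"):
        raise ValueError(f"unknown unit: {unit}")
    bits = amount * PREFIXES[prefix]
    return bits * 8 if symbol == "B" else bits

def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert between bit and byte units, e.g. Mb to MB."""
    return to_bits(amount, from_unit) / to_bits(1, to_unit)

print(convert(100, "Mb", "MB"))  # 12.5
print(convert(1, "MB", "kb"))    # 8000.0
```

So a “100 Mb/s” connection can move at most 12.5 MB of data per second, before any protocol overhead.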

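You may also come across the kibibyte. Because powers of two are natural in binary, “kilobyte” was long used informally to mean 1,024 (2¹⁰) bytes. To remove the ambiguity, the IEC introduced binary prefixes: a kibibyte (KiB) is exactly 1,024 bytes, while a kilobyte (kB) is exactly 1,000 bytes. Neither is more “correct”; they are simply two different units.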
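The gap between the two conventions grows with the prefix. A gibibyte (GiB) is 1,024³ bytes, while a gigabyte (GB) is 1,000³ bytes. Here is a quick Python sketch of the difference:

```python
KILO = 1000   # SI prefix: kilo (k)
KIBI = 1024   # IEC binary prefix: kibi (Ki), equal to 2**10

drive_bytes = 256 * KILO**3          # a "256 GB" drive: 256,000,000,000 bytes
drive_gib = drive_bytes / KIBI**3    # the same capacity expressed in gibibytes
print(f"{drive_bytes:,} bytes = {drive_gib:.1f} GiB")
# 256,000,000,000 bytes = 238.4 GiB

print(KIBI - KILO)  # 24: one KiB is 24 bytes more than one kB
```

This gap is why a “256 GB” drive typically shows up as about 238 GB in Windows. Windows counts in binary units while still labeling them “GB”.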

You may have already noticed that you use these units more often than you would think! When buying a new computer, most people look at its storage capacity or memory (RAM). You may also look at the CPU, and don’t worry, this is also linked to 0s and 1s, but we’ll discuss it later in this series.

When we talk about a 256 GB (gigabyte) hard drive, it simply means that it can store 256,000,000,000 Bytes. As a single Byte is made of 8 Bits, the drive can hold 8 × 256 × 10⁹ Bits, in other words over two trillion individual 0s and 1s.

The same goes for a computer’s memory. When you have 16 GB of RAM, your computer can store up to 8 × 16 × 10⁹ Bits (0s and 1s) in its Random Access Memory.
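The arithmetic from these two paragraphs fits in a few lines of Python. This sketch assumes decimal gigabytes (10⁹ bytes), as in the text. In practice, RAM is usually built in powers of two, so “16 GB” of RAM typically means 16 × 2³⁰ bytes, slightly more than computed here. The function name `capacity_in_bits` is hypothetical:

```python
BITS_PER_BYTE = 8
GIGA = 10**9  # decimal gigabyte, as used in the text

def capacity_in_bits(gigabytes: int) -> int:
    """Number of individual bits (0s and 1s) in a capacity given in decimal GB."""
    return gigabytes * GIGA * BITS_PER_BYTE

print(f"{capacity_in_bits(256):,}")  # 2,048,000,000,000 bits for a 256 GB drive
print(f"{capacity_in_bits(16):,}")   # 128,000,000,000 bits for 16 GB of RAM
```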

You can even check this right now! If you’re on Windows, open the Task Manager (right-click the taskbar at the bottom of your screen and choose Task Manager), then click “More details” to expand it. In the Processes tab, the Memory column shows how much memory each process uses. The Disk and Network columns show how much data it is currently reading from or writing to storage, and sending over the network.

Task Manager on Windows 10 showing how many Bytes each process uses.

In the same way, if you played with old video game consoles like the NES, then the Super Nintendo or the Sega Mega Drive, you probably remember that they were called 8-bit and 16-bit systems, followed by 32-bit consoles and the 64-bit N64. As you may have guessed, this refers to the size of the chunks of data the console’s processor could work with at once: the more bits, the better the console’s capabilities!
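Here is a quick Python sketch that shows why more bits meant more capable hardware. It computes how many different values fit in 8, 16, 32, and 64 bits. The function name `max_unsigned` is mine. This is a simplification: the “bits” of a console mainly describe its processor, and real capabilities also depended on memory, graphics chips, and more.

```python
def max_unsigned(bits: int) -> int:
    """Largest whole number that fits in `bits` bits (all of them set to 1)."""
    return 2**bits - 1

for bits in (8, 16, 32, 64):
    print(f"{bits:>2}-bit: {2**bits:,} possible values, from 0 to {max_unsigned(bits):,}")
#  8-bit: 256 possible values, from 0 to 255
# 16-bit: 65,536 possible values, from 0 to 65,535
# 32-bit: 4,294,967,296 possible values, from 0 to 4,294,967,295
# 64-bit: 18,446,744,073,709,551,616 possible values, from 0 to 18,446,744,073,709,551,615
```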

As we’ve seen, the Binary system is widely used in our day-to-day lives in the form of Bits and Bytes, and it is present in all modern systems. But how exactly is it used?

Binary in practice

One of the most common things we do on a computer is store text files. But before we can write or read a text file, we need to quickly discuss filesystems and encodings.

A filesystem defines how files are organized on a storage medium, and how they are written, searched, read, modified, and deleted. One of the most widely used filesystems is NTFS (New Technology File System), introduced by Microsoft in 1993 with Windows NT 3.1. When you create a file, NTFS stores information about it (generally not its content) in the MFT, the Master File Table. Among other metadata, this includes the file’s name and extension, its size, its permissions, its creation date, and where to find its content on the physical storage. On Windows, you can easily see this information by right-clicking a file and choosing “Properties”.
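The filesystem keeps metadata apart from the content. Here is a small Python sketch that shows this using the standard `os.stat` function, which asks the operating system for a file’s metadata. On NTFS, this information lives in the MFT. The file name `binary_example.txt` is hypothetical, and the file is created on the spot in the temporary directory:

```python
import os
import tempfile
import time

# Hypothetical demo file, written to the system's temporary directory.
path = os.path.join(tempfile.gettempdir(), "binary_example.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Binary ! 🤘")

# os.stat returns the metadata the filesystem keeps about the file,
# not the file's content.
info = os.stat(path)
print("size in bytes:", info.st_size)  # 13: 9 one-byte ASCII characters + 4 bytes for the emoji
print("last modified:", time.ctime(info.st_mtime))
```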

An encoding can be seen as a mapping between binary codes and the characters they represent. There are multiple encodings, like ASCII, UTF-8, or ISO-8859-1, because there are multiple character sets. The most basic one, ASCII (American Standard Code for Information Interchange), was originally built for the English alphabet. It contains 128 characters, 95 of which are printable: the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, punctuation symbols, and the space. The other 33 are non-printable control characters, which give information on how to display or transmit the content. Among them are the tab, the line feed (line break), and the form feed (page break).

ASCII is a very limited encoding. I can’t even write my own first name with it, since it doesn’t include accented letters like the “é” in Jérémy. More modern encodings based on the Unicode standard, like UTF-8 or UTF-16 (16-bit Unicode Transformation Format), can represent up to 1,112,064 characters. Unicode supports 154 modern and historic scripts, as well as symbols and emoji, like 🦄 or 🤘.
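Python makes this limitation easy to see. Encoding my first name as ASCII fails on the “é”, while UTF-8 handles it by using two bytes for that letter. A minimal sketch:

```python
name = "Jérémy"

# Each ASCII character maps to a number between 0 and 127 (0x00 to 0x7F).
print([hex(ord(c)) for c in "Jeremy"])  # ['0x4a', '0x65', '0x72', '0x65', '0x6d', '0x79']

try:
    name.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII can't encode:", name[err.start])  # ASCII can't encode: é

# UTF-8 can: "é" (U+00E9) takes 2 bytes, the other letters 1 byte each.
print(name.encode("utf-8").hex(" "))  # 4a c3 a9 72 c3 a9 6d 79
```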

Text as Binary

Now that we know a bit more about filesystems and encodings, we can finally write some text to a file and read its content as binary. Let’s create a simple text file (.txt) containing: “Binary ! 🤘” (download this example file here).

Most modern systems and text editors will assume this file is encoded in UTF-8, the most common encoding, and will use it to decode and display the content:

A simple text file displayed by Notepad on Windows 10. Note that depending on the operating system or software you are using, the emoji may look different.

But if we read this file with a binary (hex) viewer, either through a VS Code extension like Hex Editor or directly with the small website I’ve created here (source code available on GitLab), we find the following binary content:

The Binary equivalent of “Binary ! 🤘”
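The same bytes can be reproduced in Python by encoding the text as UTF-8. This sketch assumes the example file was saved as UTF-8 without any extra header bytes at the beginning:

```python
text = "Binary ! 🤘"
data = text.encode("utf-8")  # the bytes a UTF-8 text file would contain

print(len(text), len(data))   # 10 13: 10 characters, 13 bytes
print(data.hex(" ").upper())  # 42 69 6E 61 72 79 20 21 20 F0 9F A4 98

# The first three bytes written as 8-bit binary numbers:
print(" ".join(format(b, "08b") for b in data[:3]))  # 01000010 01101001 01101110
```

The first nine bytes are plain ASCII codes: 42 is “B”, 69 is “i”, and so on. The last four belong to the emoji.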

Using the ASCII table, we can confirm the hexadecimal value of each symbol. As said before, the ASCII table only covers 128 characters, from hexadecimal 00 to 7F, so any byte above 7F means we are dealing with a non-ASCII character. In our example, the emoji needs 4 bytes: F0 9F A4 98. In UTF-8, the first one, F0 (11110000 in binary), starts with 11110, which means “I’m the first byte of a 4-byte symbol”. Each of the following bytes starts with 10 to mark it as a continuation. The remaining bits, combined, uniquely identify the 🤘 emoji (Unicode code point U+1F918).
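To make the “combined” part concrete, here is a Python sketch that decodes a 4-byte UTF-8 sequence by hand. In practice, `bytes.decode("utf-8")` does this for you. The function name `decode_utf8_4byte` is just for illustration, and it only handles the 4-byte case:

```python
def decode_utf8_4byte(seq: bytes) -> int:
    """Decode one 4-byte UTF-8 sequence into its Unicode code point.

    Lead byte:          11110xxx  (3 payload bits)
    Continuation bytes: 10xxxxxx  (6 payload bits each)
    """
    if len(seq) != 4 or seq[0] >> 3 != 0b11110:
        raise ValueError("not a 4-byte UTF-8 sequence")
    code_point = seq[0] & 0b00000111  # keep the 3 payload bits of the lead byte
    for byte in seq[1:]:
        if byte >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        code_point = (code_point << 6) | (byte & 0b00111111)  # append 6 payload bits
    return code_point

cp = decode_utf8_4byte(bytes.fromhex("F0 9F A4 98"))
print(f"U+{cp:X}", chr(cp))  # U+1F918 🤘
```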

But we don’t only save simple text files on a computer. To help the system understand how to read a file, we can provide what is called a file header or file signature: a few extra bytes at the beginning of the file. This sequence of bytes tells the system, or any software reading the file, how to process it. For example, to explicitly state that our file is a UTF-8 text file, we can add 3 bytes, EF BB BF, known as the UTF-8 byte order mark or BOM (download this example file here).

The first bytes are used to determine the file type; here, the three hexadecimal values EF BB BF mean UTF-8.
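Python’s standard library exposes this signature as `codecs.BOM_UTF8`. Its `utf-8-sig` codec writes the signature when encoding and strips it when decoding. Here is a short sketch; the helper `has_utf8_bom` is hypothetical:

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'

def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(UTF8_BOM)

plain = "Binary ! 🤘".encode("utf-8")
signed = "Binary ! 🤘".encode("utf-8-sig")  # "utf-8-sig" prepends EF BB BF

print(signed.hex(" ").upper()[:8])                 # EF BB BF
print(has_utf8_bom(plain), has_utf8_bom(signed))   # False True
print(signed.decode("utf-8-sig"))                  # Binary ! 🤘 (signature stripped)
```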

There is a huge list of file signatures, and it keeps evolving as more and more file types appear. The most complete list I’ve found so far is from Gary Kessler and can be found here. I’ve tried to add as many as I can to my simple binary reader, but with thousands of them and only 2 hands, it’s far from complete. Feel free to play around anyway!

To give you one last example, I’ve created a very simple image file in Bitmap format (download this example file here). For a system to understand that it’s a picture to display (and not a text file), the file starts with the hexadecimal signature 42 4D, which decodes to “BM” in ASCII.

A simple picture using the Bitmap format

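If you look at the Bitmap format definition, you can see that the 4 bytes starting at the 3rd byte (offset 2) store the size of the file as a little-endian integer, meaning the least significant byte comes first. Here they read 8A 00 00 00, and 8A in hexadecimal is 138 in decimal, so our file contains 138 bytes.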
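You only need the first few bytes of the file to read these header fields in code. This Python sketch assumes the standard BMP layout: the signature is at offset 0, and the file size is a little-endian unsigned 32-bit integer at offset 2. It uses hand-written header bytes rather than the actual example file. With a real file, you would pass it the result of `open(path, "rb").read()`. The function name `read_bmp_file_size` is hypothetical:

```python
import struct

def read_bmp_file_size(data: bytes) -> int:
    """Check the 'BM' signature and return the file size stored in the header.

    Offsets 0-1: signature b"BM" (42 4D)
    Offsets 2-5: file size in bytes, little-endian unsigned 32-bit integer
    """
    signature, file_size = struct.unpack_from("<2sI", data, 0)
    if signature != b"BM":
        raise ValueError("not a Bitmap file")
    return file_size

# First 6 bytes of a hypothetical 138-byte Bitmap: 42 4D 8A 00 00 00
header = bytes.fromhex("42 4D 8A 00 00 00")
print(read_bmp_file_size(header))  # 138
```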

Don’t hesitate to play around with the binary reader here, and feel free to leave any questions or remarks in the comments; I’ll do my best to answer!

I hope you found this second article of the Binary series interesting. In the next one, we’ll discuss new math operations related to the Binary system and what they are used for.
