Bits on the Move

Nikhil Mahajan
6 min read · Dec 25, 2023

Hey there, tech adventurer! 🚀 Get ready for a byte-sized thrill ride through the wild world of streams, chunks, and binary. Let’s dive right in and make your data dance! 😉

We all know that, at the end of the day, computers only understand 0s and 1s (two states, basically low or high voltage 🪫). Now, when we want computers to remember data (collections of 0s and 1s) for later, we store it in files. Hence, in the fundamental sense, files are collections of binary digits. Whether it’s an image, audio, video, text, or any other data type, the interpretation of those binary values determines what the data represents.

So, hypothetically, you can end up creating an interpreter that interprets the binary data of audio in textual form (a bunch of owdrs with no meaning). [Refer]

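When playing an audio file on your computer, you use specific software to interpret that binary data. In a simplified picture, the music player reads groups of bits as numbers that describe how loud the sound wave is at each instant: small values mean low energy (low amplitude) and large values mean high energy (high amplitude). The rest is physics: the speaker membrane vibrates accordingly, and we end up listening to music.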

Every color can be created from a combination of red, green, and blue (RGB, the primary colors of light). So, when we open an image on the computer, the image viewer interprets the binary data. In a simplified one-bit-per-color picture, 111 instructs the computer to “light all three RGB bulbs at the pixel”, 110 instructs the computer to “light the Red and Green bulbs at the pixel”, and again, the rest is physics: when thousands of such pixels group together, we end up seeing a beautiful image. (Real image formats typically spend 8 bits on each color.)

Therefore, two things are clear: ~ How the software interprets the binary data determines what we end up with. ~ Files are nothing but collections of binary data kept in storage. [Refer]
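To make the "same bits, different meaning" idea concrete, here is a minimal Node.js sketch. The three bytes and the variable names are made up for illustration; the point is only that one piece of binary data can be read as text, as numbers, or as a pixel.

```javascript
// Three bytes, written out bit by bit.
const bytes = Buffer.from([0b01001000, 0b01101001, 0b00100001]);

// Interpretation 1: text (each byte is read as a character code).
const asText = bytes.toString("utf8");

// Interpretation 2: plain numbers between 0 and 255.
const asNumbers = Array.from(bytes);

// Interpretation 3: one pixel, with one byte per color channel.
const [red, green, blue] = bytes;
const asPixel = `rgb(${red}, ${green}, ${blue})`;

console.log(asText);    // Hi!
console.log(asNumbers); // [ 72, 105, 33 ]
console.log(asPixel);   // rgb(72, 105, 33)
```

Nothing about the bytes changed between the three lines; only the interpreter did.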

Technical Jargon

Before moving further, let’s get familiar with the following technical terms.

Data formats

In Node.js, binary data is typically stored in a Buffer object (in the browser, the closest equivalents are Uint8Array, ArrayBuffer, and Blob), and in Python, it’s commonly stored in a bytes object. These are the standard ways to store and manipulate binary data in their respective environments. [Refer]
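Here is a small sketch of what working with a Buffer looks like in Node.js. The sample string "ab" and the variable names are just for illustration.

```javascript
// Node.js: Buffer is a subclass of Uint8Array with extra helpers.
const buf = Buffer.from("ab", "utf8");

// Each element is one byte (a number from 0 to 255).
const firstByteBits = buf[0].toString(2).padStart(8, "0");

console.log(buf);           // <Buffer 61 62>
console.log(buf.length);    // 2
console.log(firstByteBits); // 01100001

// The same two bytes in a plain Uint8Array, which also exists in browsers.
const view = new Uint8Array([0x61, 0x62]);
const decoded = new TextDecoder().decode(view);

console.log(decoded);          // ab
console.log(buf.equals(view)); // true
```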

HTTP Content types

The content-type is an HTTP header field that specifies the media type or format of the data in an HTTP message, whether it’s in the request or the response. It informs the recipient, typically a web browser or a web server, about how to interpret and process the content of the message. Following are the common content types:

  1. Images: content-type: image/png, content-type: image/jpeg
  2. Videos: content-type: video/mp4
  3. Audio: content-type: audio/mpeg
  4. The application/ in content-type is used for data that doesn’t fall into the text, image, audio, or video categories. It’s a general-purpose category for data formats. Example: application/pdf, application/zip.
  5. The text/ in content type is used for textual data that can be interpreted as human-readable text. Example: text/html, text/plain, text/css, text/javascript, text/csv.
  6. Binary data: When a server responds with content-type: application/octet-stream, it’s essentially telling the client to handle the data as raw binary without trying to interpret it as text or any other specific content type.
  7. Form Data: multipart/form-data is used when you need to submit forms that include file uploads. This allows for the transmission of binary data (files) along with regular form fields. If the form contains only text fields, content-type: application/x-www-form-urlencoded is suitable.

[Refer]
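As a rough sketch of how a server might pick one of these content types, the snippet below maps a file extension to a media type. The `CONTENT_TYPES` table and the `contentTypeFor` helper are hypothetical names invented for this example, and the table is deliberately tiny.

```javascript
const path = require("node:path");

// Hypothetical lookup table; a real server would cover many more types.
const CONTENT_TYPES = new Map([
  [".png", "image/png"],
  [".jpg", "image/jpeg"],
  [".mp4", "video/mp4"],
  [".mp3", "audio/mpeg"],
  [".pdf", "application/pdf"],
  [".html", "text/html"],
  [".txt", "text/plain"],
]);

// Anything unknown is treated as raw binary data.
function contentTypeFor(fileName) {
  const extension = path.extname(fileName).toLowerCase();
  return CONTENT_TYPES.get(extension) ?? "application/octet-stream";
}

console.log(contentTypeFor("holiday.PNG")); // image/png
console.log(contentTypeFor("backup.xyz"));  // application/octet-stream
```

Falling back to application/octet-stream matches the "handle it as raw binary" meaning described in the list above.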

Encoding

Encoding refers to the process of representing data, such as text or binary information, in a specific format or scheme that can be stored, transmitted, or processed by computers or communication systems.

  1. Base64 Encoding: A binary-to-text encoding scheme that represents binary data as a string of ASCII characters, so that it can travel through text-only channels. Example: Copy and paste data:text/plain;base64,VGhhbmtzIGZvciByZWFkaW5nIHRoZSBibG9nIDopCkF1dGhvcjogTmlraGlsIE1haGFqYW4KQG1kZ3NwYWNlIA== into the browser’s address bar. It is the Base64-encoded form of a short text. I first tried sharing the Base64-encoded form of the binary data of an image (which image? 🫣), but the string is so big that it crosses the maximum word count limit of Medium.

⚠️Pitfall: Since Base64 encoding converts binary data into a string, you might think you can store files directly in a text field of your database table. The problem with this approach is that the encoded string needs more storage than the raw binary data. The Base64 representation of “0110000101100010” (2 bytes) is “YWI=” (4 bytes). Every 3 bytes of input become 4 characters of output, so the encoded form is always about 33% larger, and that overhead adds up quickly for big files.
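You can measure the size difference yourself. The following sketch encodes the two bytes 01100001 01100010 with Node.js and then compares raw and encoded sizes for a few larger inputs. The `base64Length` helper is a hypothetical name; it just applies the "4 characters per 3 bytes, rounded up" rule.

```javascript
// The two bytes 01100001 01100010 (the characters "ab").
const raw = Buffer.from([0b01100001, 0b01100010]);
const encoded = raw.toString("base64");
const decodedBack = Buffer.from(encoded, "base64");

console.log(encoded);                 // YWI=
console.log(decodedBack.equals(raw)); // true

// Padded Base64 always emits 4 characters for every started group of 3 bytes.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

for (const size of [3, 3000, 3000000]) {
  const actual = Buffer.alloc(size).toString("base64").length;
  console.log(`${size} raw bytes -> ${actual} Base64 characters`);
}
```

The ratio stays at roughly 4:3 no matter how large the input gets.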

[I don’t like the markdown of medium 😤]

  2. URL Encoding: A method for encoding characters in URLs to ensure that special characters do not interfere with the structure of a URL. For example, spaces are encoded as %20. Example: Original URL https://example.com/query?name=John Doe&age=30 Encoded URL https://example.com/query?name=John%20Doe&age=30.
  3. UTF-8 Encoding: A variable-length encoding used to represent text characters from various languages and scripts. It uses between 1 and 4 bytes to represent a character.

Decoding is the process of converting encoded data back into its original form so it can be understood and used as intended. Decoding is essentially the reverse of encoding.
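Here is a short round trip through URL encoding and UTF-8 using built-in JavaScript functions. The sample strings are only for illustration, and the non-ASCII characters are written as escapes so the byte counts do not depend on how this page is saved.

```javascript
// URL encoding: the space in "John Doe" becomes %20.
const name = "John Doe";
const url = `https://example.com/query?name=${encodeURIComponent(name)}&age=30`;
const decodedName = decodeURIComponent("John%20Doe");

console.log(url);         // https://example.com/query?name=John%20Doe&age=30
console.log(decodedName); // John Doe

// UTF-8: characters take 1 to 4 bytes each.
const encoder = new TextEncoder();
const samples = ["A", "\u00e9", "\u20ac", "\u{1F680}"]; // A, e-acute, euro sign, rocket
const byteCounts = samples.map((ch) => encoder.encode(ch).length);

console.log(byteCounts); // [ 1, 2, 3, 4 ]

// Decoding reverses encoding and gives back the original text.
const original = "h\u00e9llo \u{1F680}";
const roundTrip = new TextDecoder().decode(encoder.encode(original));
console.log(roundTrip === original); // true
```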

Streaming

  1. Chunking is a technique used to divide a large piece of data into smaller, more manageable pieces or “chunks.” Each chunk is a subset of the original data.
  2. Streams: It’s an abstraction that allows data to be processed piece by piece as it becomes available, rather than loading the entire dataset into memory at once. Streams are commonly used for reading from or writing to files (Input/Output I/O Operations) and network sockets.
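The two ideas can be sketched in a few lines of Node.js. `chunkBuffer` and `countBytes` are hypothetical helper names, and the chunk size of 5 bytes is unrealistically small so the output stays readable.

```javascript
const { Readable } = require("node:stream");

// Chunking: split a buffer into fixed-size pieces (the last one may be smaller).
function chunkBuffer(buffer, chunkSize) {
  const pieces = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    pieces.push(buffer.subarray(offset, offset + chunkSize));
  }
  return pieces;
}

const data = Buffer.from("Bits on the Move");
const chunks = chunkBuffer(data, 5);
console.log(chunks.map((chunk) => chunk.toString()));
// [ 'Bits ', 'on th', 'e Mov', 'e' ]

// Streaming: handle one chunk at a time instead of holding everything at once.
async function countBytes(stream) {
  let total = 0;
  for await (const chunk of stream) {
    total += chunk.length;
  }
  return total;
}

countBytes(Readable.from(chunks)).then((total) => {
  console.log(`${total} bytes streamed`); // 16 bytes streamed
});
```

With a real file you would not build the chunk array yourself; a readable file stream hands you the chunks as they are read from disk.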

Demonstration

Alright, it’s time to get our hands dirty with some coding! We’re going to create a simple backend in Node.js and a vanilla JS frontend. The idea here is to let users upload a file, have the backend store it, and request it back later. We will not use any third-party package, not even express 😮. Let’s dive in and start building this!

Backend

  • Set up a Node.js server
  • We can read a file in Node.js using the fs (file system) module to create a readable stream and then pipe that stream to the HTTP response using the http module. This simply means that we read the file in small pieces (chunks) and send each piece in the response as soon as it is read.
  • For uploads, we extract the content type from the request header and use it to choose a file extension for the stored file. The file extension is necessary so that the computer can suggest the appropriate software to open the file.
  • I’ve got a task for you: can you work out how to send the requested file in the response?
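The original code is not reproduced here, so treat the following as a minimal sketch of the steps above, not as the author’s implementation. The `/upload` and `/files/` routes, the `uploads` folder, and the `extensionFor` and `createFileServer` helpers are all assumptions made up for this example. It also shows one possible answer to the task, so skip the GET branch if you want to try it yourself first.

```javascript
const fs = require("node:fs");
const http = require("node:http");
const path = require("node:path");

// Assumed storage folder (relative to where the server is started).
const UPLOAD_DIR = path.resolve("uploads");

// Hypothetical Content-Type -> file extension table.
const EXTENSIONS = new Map([
  ["image/png", ".png"],
  ["image/jpeg", ".jpg"],
  ["video/mp4", ".mp4"],
  ["audio/mpeg", ".mp3"],
  ["application/pdf", ".pdf"],
  ["text/plain", ".txt"],
]);

// "text/plain; charset=utf-8" -> ".txt"; unknown or missing types -> ".bin".
function extensionFor(contentTypeHeader = "") {
  const mediaType = contentTypeHeader.split(";")[0].trim().toLowerCase();
  return EXTENSIONS.get(mediaType) ?? ".bin";
}

function createFileServer() {
  return http.createServer((req, res) => {
    if (req.method === "POST" && req.url === "/upload") {
      // Upload: pipe the request body to disk chunk by chunk.
      fs.mkdirSync(UPLOAD_DIR, { recursive: true });
      const extension = extensionFor(req.headers["content-type"]);
      const fileName = `upload-${Date.now()}${extension}`;
      const target = fs.createWriteStream(path.join(UPLOAD_DIR, fileName));
      req.pipe(target);
      target.on("finish", () => {
        res.writeHead(201, { "Content-Type": "text/plain" });
        res.end(fileName);
      });
      target.on("error", () => {
        res.writeHead(500);
        res.end("Could not store the file");
      });
      return;
    }

    if (req.method === "GET" && req.url.startsWith("/files/")) {
      // Download: pipe the file from disk into the response.
      // basename() drops any directory parts, so "../" cannot escape UPLOAD_DIR.
      const fileName = path.basename(req.url.slice("/files/".length));
      const source = fs.createReadStream(path.join(UPLOAD_DIR, fileName));
      source.on("open", () => {
        res.writeHead(200, { "Content-Type": "application/octet-stream" });
        source.pipe(res);
      });
      source.on("error", () => {
        if (!res.headersSent) res.writeHead(404);
        res.end();
      });
      return;
    }

    res.writeHead(404);
    res.end("Not found");
  });
}
```

To try it, call `createFileServer().listen(3000)` (the port is an arbitrary choice). If the frontend is served from a different origin, you would also need to add CORS headers, which this sketch leaves out.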

Yeah, that’s all for the backend. Pretty simple, right?

Frontend

Alright, so, we’ve got the backend sorted, but now we need to tackle the frontend part. We want to make it so we can send and receive the file in chunks. Let’s dive into that!

  • Set up index.html
  • Receive the chunks as they are streaming from the backend and make the binary data available for download through a link.
  • Send the file
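Again, the original snippets are not shown here, so this is only a sketch of the last two steps built on the Fetch API. The function names (`readInChunks`, `downloadFile`, `uploadFile`) and their parameters are hypothetical; the URLs you pass in depend on how your backend is set up.

```javascript
// Read a fetch() response chunk by chunk and collect the pieces into a Blob.
async function readInChunks(response, onProgress = () => {}) {
  const reader = response.body.getReader();
  const chunks = [];
  let received = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value); // value is a Uint8Array holding one chunk
    received += value.length;
    onProgress(received);
  }
  const type = response.headers.get("content-type") ?? "application/octet-stream";
  return new Blob(chunks, { type });
}

// Browser only: fetch a file and expose it through an <a> element.
async function downloadFile(url, linkElement, fileName) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  const blob = await readInChunks(response, (bytes) => console.log(`${bytes} bytes received`));
  linkElement.href = URL.createObjectURL(blob);
  linkElement.download = fileName;
}

// Send a File (for example from <input type="file">) as the raw request body.
async function uploadFile(url, file) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": file.type || "application/octet-stream" },
    body: file,
  });
  return response.text();
}
```

In a page, you might call `uploadFile("/upload", input.files[0])` from a change handler and `downloadFile("/files/some-name.png", link, "some-name.png")` from a button click. How the browser splits the upload into chunks on the wire is up to the browser, not this code.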

With this, we are done with our basic Bits on the Move application.

Sign Off

Binary data might seem a bit strange, but it’s the backbone of our digital world. Don’t worry, you usually won’t have to mess with it directly, thanks to helpful libraries. So, with that, it’s time to say goodbye!

Further Readings

  1. Buffer in nodejs
  2. Blob storage
  3. Nodejs Streams
  4. Browser Fetch API
