Exploring Video Streaming #1

Sanil Khurana
4 min read · Jan 15, 2022


How it started

A few days ago, a friend and I were discussing technology when we realized that neither of us had any idea how video streaming works. How do platforms like Youtube, Netflix, and Prime Video stream video? How do they detect and switch resolutions automatically based on our internet bandwidth? How does live streaming work?

Sure, the underlying fundamentals are the same: send data to a server over a protocol, store it somewhere, and then send it back to whoever wants to watch the video. But the specifics were a mystery. What protocol would you use to send this data? What kind of database stores it? Is the data stored as flat files? How does the adaptive resolution switch happen (when Youtube automatically switches you to 1080p once it detects you are on a fast network)? What is the system architecture like? How are these files distributed to clients? And so much more…

So, I decided to take some time to list down my questions, try to demystify some of this, and see where that takes me.

Just a side note: if you want to build a simple audio/video calling app, this might not be the right series of articles for you. Platforms like Agora are much easier to work with and provide far better performance and reliability than a custom solution ever would. However, if you would like to explore an interesting piece of technology that we are surrounded by, and delve into how Netflix, Youtube, and the like work, then you are in the right place!

Asking the questions

Which protocol is used to stream audio/video?

I know that data is streamed a few bytes at a time (judging from the gray bar on Youtube). This makes sense: I might skip ahead in the video (and the player would not want to waste bandwidth loading video I am not going to watch), the player might need to switch to a higher or lower resolution, or I might not watch the entire video at all. Ok, so there is a continuous data download happening. But what protocol is this data being sent and received over? Surely it cannot be HTTP, which has much higher latency than more lightweight full-duplex protocols like WebSockets.

After a lot of googling, I found a multitude of standards and ways people stream audio/video.

HTTP Live Streaming

Apparently, one of the most popular protocols for streaming video, maybe the most popular one, is built on top of HTTP! It is called HTTP Live Streaming, or HLS for short. In fact, this is what websites like Youtube and Netflix use as well.

The idea is that audio/video is sent in small chunks (for example, 10 seconds each), along with a static playlist file with the extension M3U8, which describes these chunks: the total duration of the video across all chunks, the number of chunks, and links from which the client can download them. The client then requests these chunks from the server and plays them to the user. So the client may play the first 10-second chunk while loading the next 2 or 3 (which you notice as the gray bar). As you progress through the video, more and more chunks are loaded. When you skip ahead (beyond the gray bar), the client simply requests the chunks for the point you skipped to, and so on.
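As an illustration, a simplified M3U8 playlist for a video split into 10-second chunks might look roughly like this (the chunk names and durations here are made up):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
chunk0.ts
#EXTINF:10.0,
chunk1.ts
#EXTINF:10.0,
chunk2.ts
#EXT-X-ENDLIST

Each #EXTINF line gives the duration of the chunk named right below it, and #EXT-X-ENDLIST marks the end of a video on demand (a live stream simply keeps appending new chunks instead).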

Not just that, the protocol also supports live streaming and makes it easy to adaptively switch resolutions based on bandwidth. Everything seems so simple, yet so effective.
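The adaptive part works through a second, "master" playlist that points to one media playlist per resolution, each tagged with its bandwidth; the player measures how quickly chunks are downloading and switches variants accordingly. A made-up sketch of such a master playlist:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8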

To further explore the protocol, I decided to experiment with it a little. I set up a very simple server that would stream video content via HLS. The plan here was not to build an actual system but to see how it works.

To start with, I needed a simple MP4 file. I found this one online that worked for my use case.

With my MP4 file ready, I had to generate an M3U8 file and multiple TS files (the chunks I talked about before). FFMPEG can help with that! For those that don’t know about FFMPEG, it is an extremely powerful tool that lets you perform a huge number of operations on audio/video files.

So I simply execute an FFMPEG command to split my MP4 file into multiple chunks and generate an M3U8 file.
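A command along these lines does the job (input.mp4 and index.m3u8 are just the names I picked):

ffmpeg -i input.mp4 -codec copy -start_number 0 -hls_time 10 -hls_list_size 0 -f hls index.m3u8

Here -hls_time 10 asks for roughly 10-second chunks, -hls_list_size 0 keeps every chunk in the playlist instead of only the most recent ones, and -codec copy avoids re-encoding (which only works if the source codecs are already HLS-friendly, like H.264/AAC).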

Running this command creates a bunch of TS files and an M3U8 file.
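In my case the directory ended up looking something like this (the exact number of chunks depends on the length of the video):

index.m3u8
index0.ts
index1.ts
index2.ts
...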

I can also open these files in VLC; playing the M3U8 file plays the entire video, so it works!

Let’s set up a simple frontend on which we can stream this video! There is a popular JS library, hls.js, that can help us here.

We don’t need to do anything fancy here, so let’s go with the simple <video> HTML5 tag.

Create an index.html file and add this code:
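Here is a minimal version using hls.js, assuming the playlist is named index.m3u8 (as in the FFMPEG command above) and is served from the same directory as the page:

<!DOCTYPE html>
<html>
  <body>
    <!-- plain HTML5 video element; hls.js feeds it the chunks -->
    <video id="video" controls width="640"></video>

    <script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
    <script>
      const video = document.getElementById('video');
      if (Hls.isSupported()) {
        const hls = new Hls();
        hls.loadSource('index.m3u8'); // download the playlist
        hls.attachMedia(video);       // let hls.js drive the <video> element
      } else if (video.canPlayType('application/vnd.apple.mpegurl')) {
        video.src = 'index.m3u8';     // Safari can play HLS natively
      }
    </script>
  </body>
</html>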

And that is pretty much it!

Go to the terminal, run the built-in Python server from the directory where you put your video, your TS and M3U8 files, and index.html, and you are done!

python3 -m http.server


Go to http://localhost:8000 and you should see your video start up. Instead of loading the entire video, the player loads segments of it over time, so you don’t waste bandwidth.

This was just a simple experiment to get my hands dirty, but I plan on exploring this in much more detail. I want to check out other protocols, look at how to store this data, how to distribute it, and so much more! Once I have some basic idea of what I am doing, I might try something fun with all the bells and whistles, like building a video calling application with features like adaptive resolution!
