Developing a Progressive Fetch YouTube Downloader

Hola JS lovers, after a long while☺️. Lately, I’ve been itching to develop an Electron app for downloading YouTube videos in a performant way, using streams rather than keeping everything in memory and flushing it all at once at the end.

The technologies I’ve chosen for it are Electron, because it’s easy to access it as a dmg installed app on mac and VueJS because it’s simply brilliant and I found no reason to not love it. It just hasn’t stolen the thunder of React yet as per me :P

Agenda

To build an Electron app that takes a YouTube video URL and downloads it over the wire chunk by chunk (streaming) and shows a progress bar with a bonus of download speed as well. You can find the github link here

Snapshot of the Final result of YouTube Downloader Electron App

Without much discussion, let’s get into technicalities.

Let’s get started

Saving time is important and hence, the folks at Vue have created such cool scaffolding tools for lazy developers like me namely vue-cli. Just install it globally using npm or yarn.

npm install -g @vue/cli @vue/cli-init

And to create the project, simply run

vue init simulatedgreg/electron-vue youtube-downloader-app

I’m using this electron boilerplate for easy setup. You can run your electron app using

npm run dev

If you wanna skip electron part, you can simply initialize your project using

vue create youtube-downloader-app

Creating Main Component

Time to create our component as Downloader.vue inside the components folder. Markup will be written inside the <template> tag which is normal html (or you may choose pug as well) interspersed with Vue interpolation syntax as {{variableName}} or attributes bindings as <img v-bind:src="video.url" />

For styling, I’m using SCSS with some styles copied from codepen for a Material UI style text input box.

Single File Vue Component

Now, the approach is to get YouTube url as an input from the user which looks something like this https://www.youtube.com/watch?v=videoId&list=listId&foo=bar

All we are interested in is the v=videoId part. So, we’ll create a little validation utility as

const validateUrl = url => /^(?:https?:\/\/)?(w{3})?\.?youtube\.com/.test(url);
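Once the url passes validation, the videoId still has to be pulled out of it. Here's a minimal sketch of how that could look, using the standard URL api. The getVideoId name is my own placeholder for illustration and not necessarily what the app uses.

```javascript
// Same validation utility as above.
const validateUrl = url => /^(?:https?:\/\/)?(w{3})?\.?youtube\.com/.test(url);

// Hypothetical helper: returns the value of the `v` query param,
// or null when the url is not a YouTube url or has no `v` param.
function getVideoId(url) {
  if (!validateUrl(url)) return null;
  // The URL constructor needs a protocol, so add one if the user left it out.
  const withProtocol = /^https?:\/\//.test(url) ? url : `https://${url}`;
  return new URL(withProtocol).searchParams.get("v");
}
```

For the sample url above, getVideoId returns just the videoId and ignores list, foo and any other params.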

Moving ahead, we need to get a download link of the youtube video, which is not very straightforward. But as I’ve researched, there exists a legacy api that lets you retrieve all the video information from its videoId as follows:

http://www.youtube.com/get_video_info?html5=1&video_id=d8NmkSQOdTc

If you hit it in your browser, you’ll observe a text file getting downloaded.

This is nothing but a very long query string which doesn’t seem legible at all at first glance, but it’s just a combination of key value params concatenated with ampersands. Also, the values in the key value pairs are URI encoded, so we need to decode them as well. Here’s an implementation for the same.
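A minimal sketch of such a parser could look like the following. It uses the qsToJson name that shows up later in this article, but the body is my own assumption of how it might be written: split on ampersands, split each pair on the first equals sign, and URI-decode both sides.

```javascript
// Sketch: turn "a=1&b=hello%20world" into { a: "1", b: "hello world" }.
function qsToJson(qs) {
  const result = {};
  for (const pair of qs.split("&")) {
    if (!pair) continue; // skip empty segments such as a trailing "&"
    const idx = pair.indexOf("=");
    const key = idx === -1 ? pair : pair.slice(0, idx);
    const value = idx === -1 ? "" : pair.slice(idx + 1);
    // "+" stands for a space in query strings; decode the rest as URI components.
    result[decodeURIComponent(key)] = decodeURIComponent(value.replace(/\+/g, " "));
  }
  return result;
}
```

Values that are themselves encoded query strings come out decoded only one level, so they can be fed through the same function again.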

Now, let’s try to fetch the video info using the same url through AJAX using fetch api.

fetch('http://www.youtube.com/get_video_info?html5=1&video_id=d8NmkSQOdTc')

Soon, you’ll realise that you’ve run into a CORS error

CORS Error

What this means is that youtube.com hasn’t whitelisted your http://localhost:9080 origin to fetch resources from http://www.youtube.com, so the browser blocks the response. By default, browsers enforce the same-origin policy to prevent unsolicited access to a site’s resources. CORS (Cross Origin Resource Sharing) is the mechanism through which a server can whitelist other origins using response headers.

Creating a proxy server

As I said earlier, this get_video_info api is sort of deprecated. The only easy way to tackle this is to create your own little server and proxy this url through it.

Here’s a Node server implementation which we are running on http://localhost:8082 and we’ve allowed all the other origins to access it by setting response header as

res.set("Access-Control-Allow-Origin", "*");

Hence, we won’t face an error now if we try to access localhost:8082 from our localhost:9080

Going forward with our downloader.js util, we’ll now hit our own localhost:8082 rather than hitting the youtube host directly. This gets us around the CORS error.

We’re just validating the url and grabbing the videoId from it and passing it to our express server route which just relays the response it gets back from the youtube server to our client app. And then we parse it through our qsToJson function we created earlier.

If you apply a debugger here or log out qsToJson(res) you’ll find the video download links hidden in a property called url_encoded_fmt_stream_map which is exactly what it says. It’s a map/object of various formats of that video and their corresponding streaming urls but it’s an encoded string in itself, hence we’ll need to apply our qsToJson functionality on top of it to reveal what’s inside it.
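Here's a rough sketch of that second level of parsing. It assumes that the decoded url_encoded_fmt_stream_map value is a comma-separated list of query strings, one per format, which is how I remember the legacy api behaving. The parseStreamMap name and the sample keys are placeholders for illustration.

```javascript
// Sketch: split the stream map into one entry per format and decode each
// entry's key value pairs. URLSearchParams handles the URI decoding.
function parseStreamMap(streamMap) {
  return streamMap
    .split(",")
    .filter(Boolean) // drop empty entries, e.g. when the map is blank
    .map(entry => Object.fromEntries(new URLSearchParams(entry)));
}

// Usage with made-up data (example.com stands in for the real hosts):
const formats = parseStreamMap(
  "itag=22&quality=hd720&url=https%3A%2F%2Fexample.com%2Fv%3Fid%3D1," +
  "itag=18&quality=medium&url=https%3A%2F%2Fexample.com%2Fv%3Fid%3D2"
);
console.log(formats.map(f => f.quality)); // one quality label per format
```

Each entry then carries its own url key, which is the streaming url we are after.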

Cool, this is what we’ve been chasing after for long. We found the urls where the videos are hosted. Yet again, if we try to fetch those urls using Ajax, we’ll face the same CORS error because *.googlevideo.com has probably only whitelisted youtube.com, hence allowing only network requests from youtube.com host. So, same way as we did for video_info api, we’ll create a proxy route in our express server.js to fetch it.

You see what we’re doing here: we encode the youtube video download url and pass it in the query param as ?url=... and, as each chunk arrives over the network, we call response.write(chunkData) instead of response.send(data). The send method sends the response body in one go, so the whole body has to be available first. For smaller resources like tiny videos, images or text, that works fine. But for resources that run into hundreds of MBs or even GBs, the entire response has to sit in memory, slowing down the system where the server is running.

A better approach is to stream the data progressively, calling response.write(chunkData) every time a chunk is emitted rather than sending the entire body at once. When the end event is fired, we terminate the response stream by calling response.end(). In other words, you can conclude that

response.send = response.write + response.write + ... + response.end
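To make the write + end idea concrete, here's a minimal sketch of a relay function. It assumes upstream is a Node readable stream (for example the response you get from https.get) and res is the Express response object. The relayInChunks name is hypothetical and not taken from the app's server.js.

```javascript
// Sketch: forward each upstream chunk to the client as soon as it arrives,
// instead of buffering the whole body and sending it in one go.
function relayInChunks(upstream, res) {
  // Allow any origin to read this response, as we did for the video info route.
  res.setHeader("Access-Control-Allow-Origin", "*");
  upstream.on("data", chunk => res.write(chunk)); // response.write per chunk
  upstream.on("end", () => res.end());            // response.end when done
  upstream.on("error", () => res.end());          // give up on upstream failure
}
```

In real code, upstream.pipe(res) does the same job and also takes care of backpressure when the client reads slower than the upstream sends. The explicit version just makes the individual write and end calls visible.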

Consumption on Client Side

Now, we’ve built a stream source for our video on the server. We need to consume it progressively chunk by chunk on the client vue app.

Using fetch api the way we do normally would suffer from the same problem of overloading memory with response data. Hence, we’ll need to read the fetch api response as a stream as follows:

It works like this

/* Fetch the resource */
fetch('https://picsum.photos/4000/2000')
  /* Retrieve its body as ReadableStream */
  .then(response => response.body)
  .then(body => {
    const reader = body.getReader();
    /* ...read chunks out of the reader here */
  });

response.body is a getter which exposes the body contents as a readable stream. It’s not an asynchronous operation, hence we don’t actually need a separate then for it and can rewrite it as

fetch('https://picsum.photos/4000/2000')
  .then(response => {
    const reader = response.body.getReader();
    /* ...read chunks out of the reader here */
  });

Now that we have our response body reader available, we can read chunks out of it one by one using the read() method, which returns a promise that resolves to an object with two keys, done and value

reader.read().then(({ done, value }) => { ... });

value key gives you the chunk value at that moment and done indicates whether it’s end of the stream and there are no more chunks available to be read.

Please mind that all we’re doing here is trying to tap into the response handling of the fetch api. In the end, our fetch construct has to provide a full resource as a blob (in this case) to be used in our app.

Hence, we need to create our own custom readable stream, with the chunks emitted by the body reader as its source, and then create a Response object out of it, which takes a stream argument in its constructor. You can think of this as again proxying or tapping into our response body, since we need finer control over the way our response is being downloaded in order to publish its downloaded percentage and speed on our UI.

The ReadableStream constructor takes an object containing a start method that defines the source of the stream. The start method receives a controller argument that’s used to control the stream by enqueuing contents into its internal queue.

We recursively read the contents of our reader until the value of done is true. When done is true, we indicate the termination of our ReadableStream by calling controller.close() and returning from our read function.
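Putting the last few paragraphs together, a minimal sketch of such a custom stream could look like this. The tapStream name and the onChunk callback are my own placeholders. The callback is simply the hook where progress and speed would be published.

```javascript
// Sketch: wrap a fetch Response body in our own ReadableStream so that we
// get to see every chunk as it passes through.
function tapStream(response, onChunk) {
  const reader = response.body.getReader();
  return new ReadableStream({
    start(controller) {
      function read() {
        return reader.read().then(({ done, value }) => {
          if (done) {
            controller.close(); // no more chunks, terminate our stream
            return;
          }
          onChunk(value);            // value is a Uint8Array chunk
          controller.enqueue(value); // hand the chunk over to the consumer
          return read();             // keep reading until done is true
        });
      }
      return read();
    }
  });
}
```

It could be used as fetch(url).then(response => tapStream(response, chunk => { ... })), and the resulting stream can then be passed to a Response constructor.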

For calculating the download progress, we need to know the content-length response header value which gives us the whole length of the data to be downloaded. On reception of each chunk inside read() promise, we increment our loaded variable with the chunk byteLength and calculate the percentage simply as (loadedBytes/totalByteLength) * 100
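As a small sketch of that calculation, assuming the total comes straight from response.headers.get('content-length') as a string. The percentLoaded name is hypothetical, and returning null when the header is missing is my own choice for illustration.

```javascript
// Sketch: percentage of the download completed so far.
function percentLoaded(loadedBytes, contentLength) {
  const total = Number(contentLength); // header values arrive as strings
  // Without a usable content-length we cannot compute a percentage.
  if (!Number.isFinite(total) || total <= 0) return null;
  return Math.min(100, (loadedBytes / total) * 100);
}
```

Inside the read() promise this would be fed with a running total, for example loaded += value.byteLength followed by percentLoaded(loaded, total).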

For the download speed over a time of Δt as bytes per second, we can find it as

speed = Δd/Δt      // Δd is the data loaded over the time Δt

We record a timestamp before reading the next chunk and, once it’s read inside the promise body, we calculate the time diff in seconds. This diff can come out as 0 seconds on fast networks, and having 0 in the denominator gives ∞, so just apply a little check.
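The check itself can be tiny. Here's a sketch that takes the time diff in milliseconds (as you'd get from subtracting two Date.now() values) and skips the reading when no measurable time has passed. The bytesPerSecond name and the choice of returning null are assumptions for illustration.

```javascript
// Sketch: speed = Δd/Δt in bytes per second, guarding against Δt = 0.
function bytesPerSecond(deltaBytes, deltaMs) {
  if (deltaMs <= 0) return null; // chunk arrived too fast to measure
  return deltaBytes / (deltaMs / 1000);
}
```

When null comes back, the UI can simply keep showing the previous speed until the next measurable chunk.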

Now that download progress and speed are in place, we’re approaching completion of our requirements. We need to pass our ReadableStream into the Response constructor in order to consume it as

.then(readableStream => new Response(readableStream))
.then(response => response.blob())
.then(blob => URL.createObjectURL(blob))

We chain these operations one by one. First we make a Response object, then we call the blob() method on it, which is async and returns a promise that resolves once the response stream has been read in its entirety. Finally we call URL.createObjectURL, which creates a url for the specified blob that can be passed to anything that expects a url.

In our case, we need to trigger a download. One way to do this is by creating an anchor element in the document without actually appending it to the DOM, setting its href to the url obtained above and triggering a click on it.
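A minimal sketch of that trick is below. The triggerDownload name, the filename argument and the doc parameter are my own additions. The doc parameter only exists so the function can be exercised outside a browser, and it defaults to the real document.

```javascript
// Sketch: start a download by clicking an anchor that never enters the DOM.
function triggerDownload(objectUrl, filename, doc = document) {
  const anchor = doc.createElement("a");
  anchor.href = objectUrl;    // the url from URL.createObjectURL(blob)
  anchor.download = filename; // suggested name for the saved file
  anchor.click();
  return anchor;
}
```

In the promise chain above it would slot in as one more step, something like .then(url => triggerDownload(url, 'video.mp4')).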

And, that’s it!

Gotchas

Please be mindful that we’re using a deprecated or legacy api for getting youtube video info. The purpose of developing this app is to learn some of the cool concepts of playing with streams and not distributing it or commercialising it.

Also, some of the videos on youtube, like Vevo videos, don’t support download, hence you may not be able to download each and every video. The url_encoded_fmt_stream_map property sometimes simply comes back blank in those cases. That’s just one of the restrictions imposed by YouTube. Apparently, youtube also tracks the origin IP address of incoming download requests. So, be watchful of downloading the same video over and over.

Hope you enjoyed the article. Let me know your thoughts on this.

Happy Coding 😀🤓

UI Engineer at Swiggy, Ex-Flipkart. JS enthusiast. Cynophilist. GrammerNazi. Environmentalist
