Creating a Saavn Downloader Bot for Telegram — Part 2

Pranav Gajjewar
Mar 13, 2018

In this part, we will look at the weakness in Saavn’s URL obfuscation mechanism and exploit it to gather direct download links for any song.

1. How does Saavn encode URLs?

If you visit Saavn.com, open the page for any album, and examine the source code, you can see how Saavn loads the song info into the HTML page.

Dynamic content like this is generally loaded asynchronously, but in Saavn’s case all the info about the songs is embedded in the page source itself.

To demonstrate, if you go here and examine the source code, you will find numerous HTML tags like these. (I’ve cleaned up the code a little for readability.)

<div class="hide song-json">
{"title":"Boond Boond Mein",
"album":"Hate Story IV",
"perma_url":"https:\/\/www.saavn.com\/p\/song\/hindi\/Hate-Story-IV\/Boond-Boond-Mein\/Qw0-RkJcW1Q",
.
. "url":"NMKyboFo\/FhGPhYGl6xqPardID+JcIcVlIgK85q\/awGqOy285yGG9GE+zsSf5\/iO",
"songid":"3eTw6llg",
.
.
}
</div>

So as you can see, all the information about the songs is written in the source code itself.
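To make that concrete, here is a minimal sketch that pulls the JSON out of such a tag using only the Python standard library. The class name SongJsonParser and the SAMPLE_HTML string are hypothetical names made up for this illustration, the sample is a trimmed copy of the snippet above, and the sketch assumes the song-json div never contains another div.

```python
import json
from html.parser import HTMLParser


class SongJsonParser(HTMLParser):
    """Collects the JSON text of every <div class="... song-json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while we are inside a song-json div
        self._buffer = []      # text chunks of the current div
        self.songs = []        # parsed song dictionaries

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "song-json" in classes:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Assumption: no nested <div> inside a song-json div.
        if tag == "div" and self._inside:
            self.songs.append(json.loads("".join(self._buffer)))
            self._inside = False


# Trimmed copy of the tag shown above (the elided fields are left out).
SAMPLE_HTML = r'''
<div class="hide song-json">
{"title":"Boond Boond Mein",
 "album":"Hate Story IV",
 "url":"NMKyboFo\/FhGPhYGl6xqPardID+JcIcVlIgK85q\/awGqOy285yGG9GE+zsSf5\/iO",
 "songid":"3eTw6llg"}
</div>
'''

parser = SongJsonParser()
parser.feed(SAMPLE_HTML)
parser.close()
songs_from_sample = parser.songs
print(songs_from_sample[0]["title"])  # Boond Boond Mein
```

The scraper later in this post does the same job with BeautifulSoup, which is more forgiving with real-world HTML.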

The most interesting part of the above information is the url field.

"url":"NMKyboFo\/FhGPhYGl6xqPardID+JcIcVlIgK85q\/awGqOy285yGG9GE+zsSf5\/iO"

This is what we’re after. But this doesn’t look like a URL at all. What can we do with this string?

The way Saavn is meant to work, the url field holds the encrypted URL for that song. Their player decrypts this string with their own algorithm to obtain the URL of the actual file on their servers, and that file is then played in their app or on their website.
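A quick way to see that the field is not plain text is to undo the two harmless layers around it. The sketch below assumes the value is a JSON string holding Base64 data, which is what the decryption code later in this post relies on. The variable names are made up for this illustration.

```python
import base64
import json

# The field exactly as it appears in the page source, JSON escapes included.
raw_field = r'"NMKyboFo\/FhGPhYGl6xqPardID+JcIcVlIgK85q\/awGqOy285yGG9GE+zsSf5\/iO"'

# Layer 1: JSON decoding turns every \/ back into a plain /.
url_field = json.loads(raw_field)

# Layer 2: Base64 decoding gives the raw encrypted bytes.
payload = base64.b64decode(url_field)

print(url_field)
print(len(payload))      # 48
print(len(payload) % 8)  # 0
```

The payload is 48 bytes long, a whole number of 8-byte blocks, which is what you would expect from a block cipher with an 8-byte block size such as DES.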

2. Problem with Saavn’s URL encryption:

It should be obvious that you would not want other people to know how the URL is decrypted. Unfortunately for Saavn, they messed up. Because the URLs have to be decrypted before a song can be played, the decryption happens on the client side, i.e. in the app and on the website, where anyone can inspect it.

Armed with this knowledge, it is only a matter of analysis to figure out exactly how the decryption works. And that is what happened: one GitHub user figured out the decryption method.

You can find the original analysis and its breakdown here.

In that person’s own words:

"I sent a total of 3 mails i suppose to their only email id that i could possibly find from their website and app, “feedback@saavn.com”, along with the PoC but in return i got no reply, which shows that they don’t check their mails, God bless those customers, or they don’t believe me.

On 30th October 2016 i sent them my first mail and till the date of writing (6th December 2016) i didn’t receive any mail from Saavn, so i thought to do a full disclosure along with PoC."

The person who did the analysis also reported the issue to Saavn and, as described above, never got a reply. Since they don’t care, why should we?

Next, we will use this decryption method to gather download links for our songs.

Building a Scraper and Downloader:

We will use the decryption code from the original analysis of the Saavn loophole to build our own scraper and downloader.

1. Gathering song information:

For scraping, we will use a very popular Python module known as BeautifulSoup.

To install this module, use the command: pip install bs4

We will also need the requests module to make HTTP requests, so go ahead and install that too (pip install requests).

In the example above, where we looked at the Saavn page source, we saw that the song information sits in a <div> tag in JSON format. Loading it is simple:

import json
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent along with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0'
}

def get_songs(url):
    """Return the song-info dictionaries embedded in a Saavn album page."""
    r = requests.get(url, headers=headers)
    # Name the parser explicitly so BeautifulSoup does not have to guess one.
    soup = BeautifulSoup(r.content.decode("unicode_escape"), "html.parser")
    # Each song sits in its own <div class="hide song-json"> as a JSON object.
    all_song_divs = soup.select('div[class="hide song-json"]')
    songs = []
    for div in all_song_divs:
        songs.append(json.loads(div.text))
    return songs

songs = get_songs(input("Input URL: "))
for song in songs:
    print(song['url'])

If you run the above code for any album link on Saavn, you should see output like this:

NMKyboFo/Fj+fDcCIDWbmCWWp2A3//RoILw8Wdei0eNjiqQyCinjRjvyLEde7/s9
NMKyboFo/FgZ+/Zzj8X5E/TQpIdAHGfBjZJLDBxgfrJR8ysOuLoZHQNGoknJjaqa
NMKyboFo/FgJgsxQpq8K14vtJGURMzUQAjZGehZJLbDlI71AVEC3F6jo+5DC2fmf
NMKyboFo/FjYP0H/vdtPr7aR7R/NRvsVwN01byBtYAxKIAFq3RDhc0QI41bR4FUx
NMKyboFo/Fi/cA5i1Se3RP6WQI+KKn+gp07vJnwlz66GlaSn0jLIPhftNSe1tFrY
NMKyboFo/FjIJzyLaURJlAP2Xesb3aZTNaPXzbd//W1jSk51aucHDv4tPdyWZhRc
NMKyboFo/Fi/cA5i1Se3RM45Vvm9NQdzXcWI94lKG78yoyv8GfoaYrO455oaMejR
NMKyboFo/FgFeu7RYx34PZxPScC9GXRKA733NxOLZmON+6cowgF3Zd9I5tOVLaHz
NMKyboFo/FiWR8cRSio6UsLUYRfodscNMaUX978n9ERavbn0zS2qMO6c2ZAZkiOS
NMKyboFo/FiC56FwJ5eEMJxF6Jou77hQwdrhUtAOh/XPq+OCPALS86rHVmd6GiKM
NMKyboFo/FhObrlpnNwO0nlQc6CoSHA6PRGJapCGC4ZcMiy14Fli70hmazpW/sl0

These are the encrypted URLs for all the songs in the album.

Next up, we’ll decrypt these URLs to get the actual download links.

2. Decrypting URLs:

The original analysis sums up the scheme as follows: "I could figure out that it was using DES with ECB with no IVs."

If you have no background in cryptography, you don’t need to worry about that explanation.

Understanding how the encryption and decryption work is beyond the scope of this post. We will only see how to use this code to decrypt our URLs.
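For a little intuition, though, one property of ECB mode is visible in the scraped data even without the key: identical 8-byte plaintext blocks always encrypt to identical ciphertext blocks. The sketch below takes three of the encrypted strings from the output above, Base64-decodes them, and compares their first blocks. The helper split_blocks is a made-up name for this illustration, and the idea that the shared first block is the start of the same path prefix is my assumption, not something the original analysis states.

```python
import base64

DES_BLOCK_SIZE = 8  # DES always works on 8-byte blocks

# Three of the encrypted url fields printed by the scraper above.
encrypted_urls = [
    "NMKyboFo/Fj+fDcCIDWbmCWWp2A3//RoILw8Wdei0eNjiqQyCinjRjvyLEde7/s9",
    "NMKyboFo/FgZ+/Zzj8X5E/TQpIdAHGfBjZJLDBxgfrJR8ysOuLoZHQNGoknJjaqa",
    "NMKyboFo/FhObrlpnNwO0nlQc6CoSHA6PRGJapCGC4ZcMiy14Fli70hmazpW/sl0",
]


def split_blocks(data, size=DES_BLOCK_SIZE):
    """Cut a byte string into consecutive blocks of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


first_blocks = set()
for text in encrypted_urls:
    blocks = split_blocks(base64.b64decode(text))
    first_blocks.add(blocks[0])
    print(blocks[0].hex(), "blocks:", len(blocks))

# Only one distinct first block across all three songs.
print(len(first_blocks))  # 1
```

Every song shares the same first ciphertext block, which is consistent with every plaintext starting with the same 8 characters. A mode that mixes in a fresh IV per message would normally hide this kind of repetition.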

To use this decryption code, you need the pyDes module installed on your machine or placed somewhere in your project directory.

Do the usual: pip install pydes

Or better yet, download this file and save it as pyDes.py in your working directory.

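import base64
from pyDes import des, ECB, PAD_PKCS5

# DES in ECB mode with PKCS5 padding. ECB does not use the IV, so the
# all-zero value is only a placeholder.
des_cipher = des(b"38346591", ECB, b"\0\0\0\0\0\0\0\0", pad=None, padmode=PAD_PKCS5)
base_url = 'http://h.saavncdn.com'

def decrypt_url(url):
    """Turn one encrypted url field into a direct download link."""
    enc_url = base64.b64decode(url.strip())
    dec_url = des_cipher.decrypt(enc_url, padmode=PAD_PKCS5).decode('utf-8')
    return base_url + dec_url.replace('mp3:audios', '') + '.mp3'

# Collect (title, download_url) pairs for the download step.
urls = []
for song in songs:
    download_url = decrypt_url(song['url'])
    urls.append((song['title'], download_url))
    print(download_url)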

Note that the songs list is the song information we retrieved in the previous step.

Running this code will give you the following output:

http://h.saavncdn.com/493/ce49ccc5221d814fbdd61c69a40e0b15.mp3
http://h.saavncdn.com/732/9c793c4172fd78b0261d44cc56791896.mp3
http://h.saavncdn.com/607/a4a802eb8166cfeaca911aa5f3ae1b5a.mp3
http://h.saavncdn.com/607/e9b00c927bb2cd5cd42f30f2b0f0a245.mp3
http://h.saavncdn.com/607/6052c9bb691d6c9e56313fbb1bb50b3c.mp3
http://h.saavncdn.com/607/39f226f9045275e0d286d55147d40e0f.mp3
http://h.saavncdn.com/607/6c26f1b4c64b783e86a3f4765be4e369.mp3
http://h.saavncdn.com/607/da05782db7dabbf3cd355e2460d00bea.mp3
http://h.saavncdn.com/628/507409d1e9bf92911f65bee5b82c7bb0.mp3
http://h.saavncdn.com/022/f428ebf540d9a6689513c9e397a5bdc7.mp3
http://h.saavncdn.com/022/753701a191f3ebd035f41adf5b44bcad.mp3

And that’s it! These are the direct download URLs for our songs.
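If you are curious what happens to the string after DES has done its work, here is a small sketch of the two remaining steps: removing the PKCS5 padding and turning the decrypted path into a link. The value of assumed_path is my reconstruction, worked backwards from the first URL in the output above and the replace call in decrypt_url; it was not captured from a real decryption. The helper names are also made up for this illustration.

```python
BASE_URL = "http://h.saavncdn.com"
BLOCK_SIZE = 8  # DES block size in bytes


def pkcs5_pad(data, block_size=BLOCK_SIZE):
    """Append N bytes of value N so the length becomes a multiple of 8."""
    missing = block_size - len(data) % block_size
    return data + bytes([missing]) * missing


def pkcs5_unpad(data):
    """Drop the padding again: the last byte says how many bytes to remove."""
    return data[:-data[-1]]


def build_download_url(decrypted_path):
    """Same string handling as decrypt_url, without the DES step."""
    return BASE_URL + decrypted_path.replace("mp3:audios", "") + ".mp3"


# Assumption: reconstructed from the first output URL, not a captured value.
assumed_path = "mp3:audios/493/ce49ccc5221d814fbdd61c69a40e0b15"

padded = pkcs5_pad(assumed_path.encode("utf-8"))
print(len(assumed_path), len(padded))  # 47 48

restored = pkcs5_unpad(padded).decode("utf-8")
print(build_download_url(restored))
# http://h.saavncdn.com/493/ce49ccc5221d814fbdd61c69a40e0b15.mp3
```

A 47-character path padded to 48 bytes also matches the 48-byte length of each Base64-decoded url field, so the reconstruction is at least consistent with the data.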

3. Downloading songs:

If you download these songs with an external downloader, you will get a bunch of .mp3 files with coded names. That isn’t helpful at all, and you don’t want to rename each file individually.

A better way is to download the files from our program itself. You can do this using the wget module for Python.

Install the module the usual way: pip install wget

import wget

# urls holds the (title, download_url) pairs built in the decryption step.
for title, download_url in urls:
    wget.download(download_url, title + '.mp3')

This will download all the songs into your current working directory, each named after its title.

Note: These downloaded files still contain the DRM. To remove the DRM, use the explanation and bash script given by the person who did the original analysis. You can find both of those here.

In the next part, we will equip our Telegram bot with the functionality to download these songs.
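One thing to watch out for is that a song title can contain characters such as / or ? that are not allowed in file names on some systems. Below is a hedged sketch of a variant that cleans the title first and uses urlretrieve from the standard library instead of the wget module. The function names safe_filename and download_all are hypothetical, and the set of replaced characters is just a conservative guess.

```python
import re
from urllib.request import urlretrieve


def safe_filename(title, extension=".mp3"):
    """Replace characters that are unsafe in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return (cleaned or "untitled") + extension


def download_all(named_urls, fetch=urlretrieve):
    """Download every (title, url) pair and return the file names used.

    `fetch` is any callable taking (url, filename); it defaults to
    urllib's urlretrieve and can be swapped out for a dry run.
    """
    saved = []
    for title, url in named_urls:
        target = safe_filename(title)
        fetch(url, target)
        saved.append(target)
    return saved


print(safe_filename("Boond Boond Mein"))  # Boond Boond Mein.mp3
print(safe_filename("AC/DC: Live?"))      # AC_DC_ Live_.mp3
```

With the urls list from the decryption step, calling download_all(urls) would save each song under its cleaned-up title.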

So stay tuned!
