Latest TV show episodes and where to find them

Hey there!
And welcome to a new episode of “holy shit, this guy’s lazy”.
And in this episode, we’re going to shed light on some of the pesky tasks that a TV nerd has to go through on a daily basis, and on how one such nerd (yours truly) managed to automate them.
This story dates back 2–3 years (coming up with exact dates has a lot of math involved), back when things like Amazon Prime didn’t exist and Netflix pricing made you say ‘But why though’.
Either way, I was still living in the college dorm at the time, and I might have solved bandwidth restrictions with some neat little TRICKS of my own invention (shameless plug), but the connection wasn’t exactly stream-worthy. Of course you can think “But dude, you got all that download speed in the SECOND EPISODE (yep, still shameless, still plugging inside a make-believe quote unquote) while everyone else was capped at 1 Mbps”. In that case, streaming is not really a job for parallel processing, so here’s a “BIG NO NO”, a “Go read the second episode properly” and a “JUST GOOGLE IT” to you. And even if I did all that, larger bandwidth limits don’t really mean a jump from 480p to 1080p if you can’t even stream the JPEG thumbnails on the website.
So what do you do?
Normally, I would use some website with an episode guide like http://epguides.com/ and keep checking it for the shows I watched to see which ones were getting released today. And then I’d look for the released episodes the next morning (I live in the IST timezone) on some really awesome websites for p2p sharing. Not proud, but yeah.
And then the number of TV shows started increasing and suddenly a small problem became an hourly routine.
So how do you solve this problem? Like any other problem: break it into smaller problems.
So, a 2-step process:
- Check daily for shows to be released today.
- Check if the episode torrent has been released or not.
That’s it, right? For now.
Little did I know, I’d reinvent the term “blowing out of proportion”.
So simple process, simple automation tasks.
Python and its cronies (pip modules) to the rescue:
NOTE: Can’t share code for this one because you know, cyber laws, jail time, family shame and other such unpleasantries.
First problem:
I used a website called pogdesigns.co.uk/cat. I do not remember how I got there, but I did, and I have been happy ever since. No APIs, no TVDB nonsense. Plain old scraping. So here’s how I went about my business:
- Search the episode guide for episodes being released today. Pay the URL pogdesigns.co.uk/cat/today a visit by making a GET request using the REQUESTS module.
- In response, you get what you see (quite literally): the HTML. Here you have to find the elements that interest you. You can probably just REGEX your way through like a psychopath, but that might not be as easy as expected. Instead, you can parse your HTML using an HTML parser named BEAUTIFULSOUP like a sane person and filter out the TV show names.
- Once you get the TV show and episode number, format them neatly in the format happily accepted by rippers and pirates, i.e. SxxEyy, which means season number xx and episode number yy. So the pilot (first) episode would be S01E01. (Technically, pilots should be S01E00, but I got over it and so should you.)
- Create a DB/txt file (meh)/variable (ughh) where you store the TV shows you want to track.
- Out of all the TV shows releasing today, filter out the ones you are tracking and voila, problem one solved.
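The real code stays locked away, but the last three steps are harmless enough to sketch. This is a minimal sketch and not the original script: it assumes the scraping step has already produced (show, season, episode) tuples, and the names `episode_code`, `filter_tracked` and all the sample data are made up for illustration.

```python
def episode_code(season, episode):
    """Format season/episode numbers as SxxEyy, e.g. (1, 1) -> 'S01E01'."""
    return f"S{season:02d}E{episode:02d}"


def filter_tracked(releases, tracked):
    """Keep only releases of tracked shows and return their search strings."""
    tracked_lower = {name.lower() for name in tracked}
    return [
        f"{show} {episode_code(season, episode)}"
        for show, season, episode in releases
        if show.lower() in tracked_lower
    ]


# Hypothetical scrape result for "today" and a hypothetical tracking list.
todays_releases = [("Sherlock", 4, 1), ("Some Other Show", 2, 13)]
tracked_shows = {"Sherlock", "The Big Bang Theory"}

print(filter_tracked(todays_releases, tracked_shows))  # ['Sherlock S04E01']
```

The tracking list here is the "Variable (ughh)" option; swapping it for a DB table or a text file only changes where `tracked_shows` comes from.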
The second problem is where shit really hits the fan (sorry for the mental image *cringe*).
So after you know what TV shows are getting released today and which episode, you know what torrents to look for.
For instance, assume Sherlock Season 4 premieres today: the filtered shows list has an element called ‘Sherlock S04E01’, so you search that exact string on a torrent tracking website (shouldn’t name any, but it’s the bay of the pirates that needs treasure maps naming the proxy islands you need).
So you query the website, and if any result shows up with a word match of at least 70% (counted by number of words) and has ‘720p’ or ‘1080p’ in the name (cos I’m so fancy), then it is a candidate torrent. The candidate with the largest number of seeders wins.
WHY 70%? Because rippers often like to flex in the torrent names about their codecs and their aliases and cult names.
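The candidate rules can be sketched roughly like this. Again, this is not the original code: it assumes the search results have already been turned into dicts with hypothetical `name` and `seeders` keys, and it reads "70% word match" as "at least 70% of the words in the search string show up in the torrent name", which is only one possible interpretation. The release names and seeder counts are invented.

```python
import re

QUALITY_TAGS = {"720p", "1080p"}
MATCH_THRESHOLD = 0.7  # the 70% rule


def words(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def match_ratio(query, name):
    """Fraction of the query's words that appear in the torrent name."""
    query_words = words(query)
    if not query_words:
        return 0.0
    name_words = set(words(name))
    hits = sum(1 for w in query_words if w in name_words)
    return hits / len(query_words)


def pick_torrent(query, results):
    """Return the qualifying result with the most seeders, or None."""
    candidates = [
        r for r in results
        if match_ratio(query, r["name"]) >= MATCH_THRESHOLD
        and QUALITY_TAGS & set(words(r["name"]))
    ]
    return max(candidates, key=lambda r: r["seeders"], default=None)


# Invented search results for illustration only.
results = [
    {"name": "Sherlock.S04E01.720p.HDTV.x264-GROUP", "seeders": 1200},
    {"name": "Sherlock.S04E01.HDTV.x264-GROUP", "seeders": 5000},
    {"name": "Sherlock.S04E01.1080p.WEB.x265-OTHER", "seeders": 800},
    {"name": "Sherlock.S03E01.720p.HDTV", "seeders": 9000},
]

best = pick_torrent("Sherlock S04E01", results)
print(best["name"])  # Sherlock.S04E01.720p.HDTV.x264-GROUP
```

The 5000-seeder result loses because it has no quality tag, and the 9000-seeder one loses because only half of the query words match.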
So you have the torrent, now what? Now rewind your brains, because this shit is about to get ELABORATE.
See, torrents are as illegal as the files that are being shared via them.
But you can’t really (more like don’t wanna) track every person for all the files to see if they are copyright protected or not. So, simple solution: block P2P. There are ways to bypass those blocks, but the easiest is to download the torrents somewhere P2P access is unrestricted, and then it’s like getting the episode from your friend (who lives in a datacenter in the United States and is Virtual).
So a remote server, running Ubuntu 14 at that time, was bought, and rtorrent was installed and set up to pick up every torrent thrown into a specific directory called the ‘watch directory’.
Let us recap: you check the episode guide, find torrents for all the filtered TV shows and download them on your server, where you are allowed to. Problem solved, right?
WRONG, you still have to get the contents from the server.
So I wrote another scraper (*shrugs* no big deal) running on my local machine (laptop for normal people, lappy for hipsters/kids/hipster-kids) that downloads any new torrents as soon as they are on the server’s public directory.
Solved?
WRONG. The torrents can take some time to download, but they get indexed and appear on the web directory as soon as the download starts. The scraper is DUMB, so it will try to download them and fail, because the files aren’t complete on the server yet.
So, rtorrent RPC to the rescue. See the part where they teach you how to send email alerts on download completion? That’s cute, but my version updated its ‘downloaded’ boolean in the torrents DB on the server.
The local scraper was updated to download only if the ‘downloaded’ boolean was True and voila, problem solved.
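The ‘downloaded’ flag idea, as a minimal sketch. The post doesn't say which DB was used or what the table looked like, so SQLite, the `torrents` schema and every function name here are assumptions. How the torrent client's completion hook ends up calling `mark_downloaded`, and how the local scraper reaches the DB over the network, are left out on purpose.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the hypothetical torrents DB."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS torrents ("
        " name TEXT PRIMARY KEY,"
        " downloaded INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_torrent(conn, name):
    """Record a torrent when it is handed to the client; not finished yet."""
    conn.execute("INSERT OR IGNORE INTO torrents (name) VALUES (?)", (name,))
    conn.commit()


def mark_downloaded(conn, name):
    """Flip the flag; meant to be triggered when the download completes."""
    conn.execute("UPDATE torrents SET downloaded = 1 WHERE name = ?", (name,))
    conn.commit()


def ready_to_fetch(conn):
    """Names the local scraper is allowed to download."""
    rows = conn.execute(
        "SELECT name FROM torrents WHERE downloaded = 1 ORDER BY name"
    )
    return [name for (name,) in rows]


conn = open_db()
add_torrent(conn, "Sherlock S04E01")
add_torrent(conn, "Some Other Show S02E13")
mark_downloaded(conn, "Sherlock S04E01")

print(ready_to_fetch(conn))  # ['Sherlock S04E01']
```

The point is that the scraper stops trusting the directory listing and trusts the flag instead, so a half-finished download never shows up as "ready".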
Well, most of it. But you know rippers and pirates don’t like prison time, so they use elaborate (word of the week) ways to release these episodes, like compressing the files into an archive and splitting it. How that works is: you first compress the file into an archive format like 7z, zip or rar, and then you split the archive into multiple small files that are easier to circulate around in places like, say, Telegram channels, 4chan forums, etc. (elaborate, right? But not as elaborate as the cigarette barter system in prisons).
So I could extract the downloaded split archives every morning and wait for 3–4 minutes, but I called it elaborate in CAPS for a reason.
So after the local scraper was done downloading an episode, it would check whether there were any archives in what it downloaded. If there were, it classified each one as a split archive or just a regular archive by checking file extensions, like a run of *.r00 to *.rxx files sitting next to a *.rar. Then it extracted the archive, verified the checksum and, if all was good, deleted the archive files, leaving a clean directory full of the day's TV show episodes. All I had to do after I woke up at 2 P.M. (yes, I got free attendance from my college teachers, deal with it) was double click and munch on my lunch.
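The "split or regular archive" check boils down to looking at file extensions. Here is a minimal sketch of just that classification step, under my own assumptions: the function name `classify`, the three labels and the sample file names are invented, and the actual extraction and checksum verification (which would be handed to an external archive tool) are not shown.

```python
import re
from pathlib import Path

SPLIT_PART = re.compile(r"\.r\d{2}$", re.IGNORECASE)  # .r00, .r01, ...
ARCHIVE_EXTS = {".rar", ".zip", ".7z"}


def classify(filenames):
    """Return 'split', 'archive' or 'plain' for one downloaded directory."""
    names = [Path(f).name for f in filenames]
    has_parts = any(SPLIT_PART.search(n) for n in names)
    has_rar = any(n.lower().endswith(".rar") for n in names)
    if has_parts and has_rar:
        return "split"
    if any(Path(n).suffix.lower() in ARCHIVE_EXTS for n in names):
        return "archive"
    return "plain"


print(classify(["show.s04e01.rar", "show.s04e01.r00", "show.s04e01.r01"]))  # split
print(classify(["show.s04e01.zip"]))  # archive
print(classify(["show.s04e01.mkv", "show.s04e01.nfo"]))  # plain
```

A "split" result means the extractor should be pointed at the *.rar file, and the *.rxx parts only get deleted once the extracted file checks out.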
AFTER CREDITS SEQUENCE(the part where they show the lead actor and actress got a dog, got married, had kids and bought a nice home with baby proofed kitchen and staircase and a back yard with a trampoline) [ELABORATE]:
A couple of additions I made: I bought another server to stream my TV shows away from the laptop (this is when I finally moved out of the dorm and felt WHY NOT again). The second server ran the same scraper that ran on my local machine and served those files through an on-the-fly transcoder, with web/Android and all sorts of awesome client applications.
Bought a Raspberry Pi and a 2TB HDD and created a local network NAS server for storing TV shows. So now my laptop and the second server were both running the same technology stack which consisted of the scraper scraping my first server for new torrents and a media server serving those TV shows on the client of my choosing.
In the house, I used the local server (my laptop) to get the best latency and overall bandwidth advantage; outside, I bounced around between 480p and 720p streaming from the second server.
Wrote some housekeeping scripts for the first server to delete the oldest torrents (5:1 seed ratio, fellas) and similar things for the second server running the media server.
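A housekeeping pass along those lines might look like the sketch below. It is a guess at the logic, not the real script: I am reading "5:1 seed ratio" as "a torrent only becomes deletable once it has uploaded five times its size", and the `Torrent` fields, `deletable` and the sample numbers are all hypothetical. Actually removing the torrent and its files from the client is left out.

```python
from dataclasses import dataclass

TARGET_RATIO = 5.0  # the 5:1 seed ratio


@dataclass
class Torrent:
    name: str
    added_at: float  # Unix timestamp when the torrent was added
    uploaded: int    # bytes uploaded so far
    size: int        # bytes in the torrent payload

    @property
    def ratio(self):
        """Upload ratio; 0.0 for an empty payload to avoid dividing by zero."""
        return self.uploaded / self.size if self.size else 0.0


def deletable(torrents):
    """Torrents that reached the target ratio, oldest first."""
    done = [t for t in torrents if t.ratio >= TARGET_RATIO]
    return sorted(done, key=lambda t: t.added_at)


# Invented sample data.
torrents = [
    Torrent("newer, fully seeded", 200.0, 500, 100),
    Torrent("still seeding", 50.0, 100, 100),
    Torrent("older, fully seeded", 100.0, 600, 100),
]

print([t.name for t in deletable(torrents)])
# ['older, fully seeded', 'newer, fully seeded']
```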
The local server had a 2TB HDD, which is at 1.3TB at the time of writing this article, so I think I’m good there.
Wrote a Flask application for the first server, with routes for adding new shows to be tracked or removing them if I was done with a show *cough* big bang theory *cough*. HEY, I am not a quitter, I am a survivor.
College students get a .edu Google account that has unlimited Google Drive storage, so I wrote a Flask webapp on Heroku that runs as a free backup service for all my precious torrents and other data. Think of it as another scraper, but after it downloads the torrent from the first server, it uploads it to Google Drive and stores the link to it in the DB.
I’ve done a bunch of shit throughout my fun days, but this one is so stable: the system has an effective uptime of more than 6 months with no failures.
But then I finished college (I was also surprised I didn’t drop out), I got a job, and after 9 hours of discussions, meetings, coffee breaks, lunch breaks and poop breaks, a man needs a system like this to hook him up with the latest episodes at the end of the day.
