Web Scraping with Python for Avengers: Infinity War

Rohith Bandikatla
Jun 11, 2018

I am a great fan of Iron Man and the Avengers series, so when Avengers: Infinity War was about to release, I had to watch it, and on an IMAX screen at that. But given the frenzy the movie had generated, it was close to impossible to get tickets for the screen that I was looking for. So I thought it was the right time to put a skill that I had picked up recently to use: web scraping!

Web Scraping: Web scraping is a technique for extracting data from websites programmatically. The extracted data is then either processed to trigger some action or stored in a local file or database.

Prerequisites:

Python: All the code that we are going to discuss is in Python. A basic understanding of Python is required to follow this post.

HTML: Though we are not writing any HTML, we are going to inspect some HTML and pick out a few elements to analyze. A basic understanding of HTML structure makes this post easier to follow.

Problem statement: Write a script that scrapes a ticket booking site (bookmyshow in this case) and notifies the user if a particular screen is open for bookings, then run this script on a scheduler so that it checks periodically.

We are going to take this very specific use case, see how scraping can help us here, and learn some web scraping along the way.

Setup:

We are going to use Python v2.7 for this task. Once we have Python installed, we need two Python libraries: BeautifulSoup and requests. We can install these libraries using either the pip or the easy_install utility.

Windows: The Windows Python installation provides a utility called easy_install for installing Python modules. We can find it in the Scripts directory under the Python installation directory.

easy_install.exe requests
easy_install.exe beautifulsoup4

Mac or Linux: The pip utility installs Python modules on Mac or Linux.

pip install requests
pip install beautifulsoup4

Now that we have the setup ready, let’s get started with the code. We start off by importing the requests module and reading the HTML content of the page into a variable, htmlContent.
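A minimal sketch of this step might look like the following. The helper name fetch_html and the 10-second timeout are assumptions made for illustration, not something taken from the original script.

```python
import requests


def fetch_html(page_url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(page_url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling htmlContent = fetch_html(url) with the booking page link then gives us the HTML string that the rest of the post works with.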

In my case, url was the link to the Avengers booking page: url = "https://in.bookmyshow.com/buytic...". At this point we have all the HTML content of the page. We could simply check whether the required screen name appears in htmlContent and report success, but let’s go ahead and process the HTML to find the list of all theaters screening this movie.

The BeautifulSoup package helps us filter out the content we need from the huge HTML we are holding. We import BeautifulSoup and initialize it on our HTML content, storing the result in a variable named soup.
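A minimal sketch of that initialization is shown here. It assumes Python's built-in html.parser backend, and the htmlContent string is made-up markup standing in for the downloaded page.

```python
from bs4 import BeautifulSoup

# Stand-in for the htmlContent string downloaded earlier (made-up markup).
htmlContent = "<html><body><h1>Now showing</h1></body></html>"

# "html.parser" ships with Python, so no extra parser install is needed.
soup = BeautifulSoup(htmlContent, "html.parser")

# Quick sanity check that the document was parsed.
print(soup.h1.get_text())
```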

Now, to filter out the required elements from the HTML, we need to understand the structure of the HTML holding our data. To do that, open the booking page in any web browser, right-click on any one of the theater names, and select Inspect element.

Our HTML tags of interest:

ul tag: ul is an unordered list in HTML

li tag: li is an individual entry in the list

Inspecting the page shows that all our screen names are part of a list, <ul id="venuelist">. We now need to get each individual list item li from this unordered list and read its data-name value, which holds the screen name that we are looking for.

From the soup variable we created above, we can now do a find of the ul tag with id venuelist, then do a find_all of li and read the data-name attribute from each li tag to populate the availableScreens list.
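The find and find_all steps can be sketched roughly as follows. SAMPLE_HTML is made-up markup that only mimics the structure described above (the real page carries far more), and get_available_screens is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking the structure described above, not the real page.
SAMPLE_HTML = """
<ul id="venuelist">
  <li data-name="Prasads: IMAX Screen"></li>
  <li data-name="Some Other Cinema"></li>
  <li></li>
</ul>
"""


def get_available_screens(html_content):
    """Return the data-name of every <li> inside <ul id="venuelist">."""
    soup = BeautifulSoup(html_content, "html.parser")
    venue_list = soup.find("ul", id="venuelist")
    if venue_list is None:
        # No venue list found: bookings not open yet, or the layout changed.
        return []
    return [
        li["data-name"]
        for li in venue_list.find_all("li")
        if li.has_attr("data-name")  # skip list items without a screen name
    ]


availableScreens = get_available_screens(SAMPLE_HTML)
print(availableScreens)
```

Returning an empty list when the ul is missing is a design choice in this sketch: it lets a scheduled run treat "no list yet" the same as "screen not listed" instead of crashing.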

We can now look for the required screen ‘Prasads: IMAX Screen’ in availableScreens and notify the user if it is available. Our script is now ready to take on the responsibility of getting us the tickets. We can schedule it to run in the background every few minutes using crontab or the Windows Task Scheduler.
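A minimal sketch of the check and the notification is below. The function names are hypothetical, and notify simply prints a message as a placeholder for whatever alert we prefer (email, SMS, a desktop popup).

```python
REQUIRED_SCREEN = "Prasads: IMAX Screen"


def is_screen_open(available_screens, required_screen=REQUIRED_SCREEN):
    """Return True if the required screen is in the scraped list."""
    return required_screen in available_screens


def notify(message):
    # Placeholder notification: print to the console. Swap in any
    # alerting mechanism here.
    print(message)


def check_and_notify(available_screens):
    """Notify the user when the required screen is listed."""
    if is_screen_open(available_screens):
        notify("Bookings are open for " + REQUIRED_SCREEN)
        return True
    return False
```

With crontab, an entry along the lines of */5 * * * * python /path/to/scraper.py (the path is a placeholder) would run the script every five minutes.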

Note: Respect the service providers and be nice to servers. Do not spam servers with requests. Schedule the job to run only once every few minutes. Remember, do not get blocked.
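If we would rather keep the waiting inside the script than rely on crontab or Task Scheduler, a polling loop is one alternative. This is only a sketch of that swapped-in approach: the poll helper, the five-minute interval, and the max_checks cap are all assumptions.

```python
import time

# "Every few minutes": five minutes is an assumed value, pick your own.
CHECK_INTERVAL_SECONDS = 5 * 60


def poll(check, interval=CHECK_INTERVAL_SECONDS, max_checks=100, sleep=time.sleep):
    """Call check() until it returns True, pausing between attempts.

    Returns the number of checks performed, or None if max_checks ran out.
    """
    for attempt in range(1, max_checks + 1):
        if check():
            return attempt
        if attempt < max_checks:
            # Wait before the next request so we do not hammer the server.
            sleep(interval)
    return None
```

Here check would be any function that fetches the page once and returns True when the screen is listed. The max_checks cap keeps a forgotten script from polling the site forever.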

And bam! There you have it. We have just put our newly acquired skill to work to get tickets for our favorite movie. Enjoy the show. 🍟

The complete Python script can be found here: https://github.com/rbkio/bms-web-scraper


Rohith Bandikatla

Software Developer | Technology Enthusiast | Constant Learner