Learn web app development while solving a real world problem

Ajay Sainy · Published in Analytics Vidhya · 4 min read · Jan 6, 2020

In this series, we will learn to create a web application from scratch, scrape data from the web, and store the scraped data, all while solving a real-world problem.

This is going to be a fun, practical series divided into three parts:

1. Scrape the data

2. Store it

3. Create a web application

Note: I am not sure whether web scraping is legal or not; it is a complex topic, and this series does not discuss the legal aspects of web scraping. However, I believe that web scraping done ethically (what counts as ethical is debatable) should not be a problem for the websites being scraped.

The problem we are going to solve

Assume that you live in the USA and want to send some money to a friend or family member in India. You would first google the USD to INR rate, then look for a money transfer service that lets you send money from the USA to India. But there are many such services, each offering different exchange rates and charging different fees. So first you collect a list of money transfer services, and then you visit each website to check the rate it offers. This takes a lot of time and effort.

In this tutorial, we are going to solve this problem by creating a web application that shows the exchange rates offered by these services in one place, so that you only have to open one website to decide which service to use.

Let's solve the problem by breaking it into three parts:

  1. Collect data from various services
  2. Store the data
  3. Create web app using the data

This article covers part 1.

Sound interesting?

We have a lot to cover, so without wasting a moment, let’s get started.

Requirements

1. Python (web scraping)

2. BeautifulSoup (Python library for web scraping)

3. urllib.request (Python library for opening URLs)

In this part, we will cover the web scraping section.

Basics

In a typical client-server scenario, a client (e.g. a web browser) sends a request to a server, and the server responds with data. For example, when we open google.com in a web browser, the browser sends a request to Google's server to get the search page. The server returns the page as HTML, and the browser renders that HTML into the beautiful UI the user sees. The same thing happens when we open any other website. Web scraping means reading that HTML programmatically and extracting the required data/information from it.
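To make "reading the HTML and extracting data" concrete, here is web scraping in miniature: a hard-coded HTML snippet (standing in for a server response) parsed with BeautifulSoup, the library we will use throughout. The markup and the `#usd-inr` selector are made up for illustration.

```python
from bs4 import BeautifulSoup

# Pretend this HTML came back from a server in response to our request.
html = """
<html><body>
  <div class="rates">
    <span id="usd-inr">71.23</span>
  </div>
</body></html>
"""

# Parse the HTML, then use a CSS selector to pull out the one value we want.
soup = BeautifulSoup(html, "html.parser")
rate = soup.select_one("#usd-inr").text
print(rate)  # -> 71.23
```

This is the whole idea: the rest of the article is just fetching real HTML instead of a hard-coded string and using real selectors copied from the browser.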

Let's understand this while solving the problem at hand. Follow the steps below:

Step 1: Select the target

We are going to create an application where users can view the USD to INR exchange rates offered by various services that let users send money from the USA to India, so the obvious first requirement is a list of such services. We are going to use Remitly and Transferwise (randomly selected).

Step 2: Find the element to scrape in the website

We learned in the basics that every website is delivered as HTML (HyperText Markup Language). In commonly used browsers (Chrome, Safari, Edge, Firefox), you can see the underlying HTML of a page by right-clicking on it and selecting the Inspect option.

We want the USD to INR rate, so open the first website (Remitly) and right-click on the place where it shows the INR rate you would like to scrape. In the developer console that opens, right-click on the highlighted element and choose Copy -> Copy Selector.

The selector is now on the clipboard; save it somewhere (important!), as we will use it later. Then open the second website (Transferwise) and do the same.

Now we have the selectors (the CSS paths to the elements that display the INR rate in the DOM). Next, we will write a Python script that programmatically requests the Remitly and Transferwise pages (using urllib), reads the HTML response (using the BeautifulSoup library), and extracts the INR rate using the selectors obtained in the previous step.
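The embedded script is not reproduced here, so below is a minimal sketch of what it might look like. The URLs and the selector strings are placeholders (assumptions): paste in the page URLs you actually visited and the selectors you copied in Step 2.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


# One class per money transfer service: a display name, the page URL,
# and the CSS selector copied from the browser in Step 2.
class Remitly:
    name = "Remitly"
    url = "https://www.remitly.com/us/en/india"     # placeholder URL
    selector = "PASTE-REMITLY-SELECTOR-HERE"        # placeholder selector


class Transferwise:
    name = "Transferwise"
    url = "https://transferwise.com/us/send-money"  # placeholder URL
    selector = "PASTE-TRANSFERWISE-SELECTOR-HERE"   # placeholder selector


# To add a new service later, add its class to this list.
MONEY_TRANSER_SERVICES = [Remitly, Transferwise]


def get_rate(service):
    # 1. Call the page (URL) that shows the USD to INR exchange rate.
    #    Some servers reject requests without a User-Agent header, so set one.
    request = Request(service.url, headers={"User-Agent": "Mozilla/5.0"})
    # 2. Get the HTML returned by the server.
    html = urlopen(request).read()
    # 3. Create a BeautifulSoup object to navigate the HTML easily.
    soup = BeautifulSoup(html, "html.parser")
    # 4. Extract the rate using the selector we got in Step 2.
    element = soup.select_one(service.selector)
    return element.get_text(strip=True) if element else None


if __name__ == "__main__":
    for service in MONEY_TRANSER_SERVICES:
        try:
            print(service.name, get_rate(service))
        except Exception as exc:  # network errors, blocked requests, etc.
            print(service.name, "failed:", exc)
```

Sites change their markup often, so expect to re-copy a selector whenever a service redesigns its page; the class-per-service layout keeps that change local.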

The above script is commented to explain what each line does. If it needs more explanation, let me know in the comments section :). In short, the script performs the steps below:

1. Call the page (URL) that shows the exchange rate from USD to INR.

2. Get the HTML.

3. Create a BeautifulSoup object to navigate the HTML easily.

4. Extract the rate using the selectors we got in Step 2.

Execute

Copy the above script into a file and save it with a .py extension (e.g. scrapper.py).

Run the script by executing the command below in a terminal/cmd (python3 should already be installed):

python scrapper.py

Now we have a nice script that scrapes the exchange rate from two money transfer services (Remitly & Transferwise). The script can easily be extended to cover more services without much change to the code: simply create a new class for the service and add it to the MONEY_TRANSER_SERVICES array. That's it.
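For example, adding a third service is just a new class with the same three attributes plus one list entry. "Xoom" here is purely hypothetical; its URL and selector are placeholders you would look up yourself.

```python
class Xoom:
    name = "Xoom"
    url = "https://www.xoom.com/india/send-money"  # hypothetical URL
    selector = "PASTE-XOOM-SELECTOR-HERE"          # copy it from the browser

# ...and register it alongside the existing services:
MONEY_TRANSER_SERVICES = [Xoom]  # in the real script, append to the full list
```

Nothing else in the script changes; the loop that prints the rates picks up the new service automatically.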

Conclusion

In this part, we saw how to extract information from a web page. In the next part, we will see how to structure the data and store it in MongoDB for long-term storage. Stay tuned!
