Python + BeautifulSoup: Simple Coronavirus Data Tracking App for Beginners

As a beginner, I was super excited to step into the tech world and begin my coding journey. I wanted to create an application I would use every day. Given the current Covid-19 situation across the globe and the amount of fake numbers floating around the web and the media, I wanted to develop an app that tracks global and country-level data along with the latest news articles.

Action Plan

  1. Tracking country-specific, state-wise data
  2. Tracking global data
  3. Tracking trending news across the globe

Data Sources

  1. Country-Specific Data: Government Official Website
  2. Global Data: Worldometers
  3. News: Google News API

Note: Use only trusted sources. Avoid relying on individual news articles to track numbers.

  1. Frontend: HTML, CSS, Bootstrap & Javascript
  2. Backend: Python-Flask Framework

Web-scraping: BeautifulSoup 4

  1. Install Flask (using pip3): I used Flask because it simplifies designing a web application and takes away much of the headache of creating endpoints and navigating between pages. Flask lets you control what users request and what response to give back. This article gives an idea of how a Flask app works.
  2. Install BeautifulSoup4 (using pip3): I used BeautifulSoup because it offers a simple approach to scraping data from web pages and is well suited to beginners. Browse through this article to understand the nitty-gritty of web scraping with BeautifulSoup.

Open your coding environment and create the file structure that the Flask framework expects. The ‘static’ folder stores static files such as the icons and images used on the website. The ‘templates’ folder stores the HTML files that will be displayed.

1. Let’s start by building a skeleton for our Python script and defining the endpoints/routes and functions required for the application.

2. We have defined three routes, so three HTML pages need to be created under the templates folder. @app.route is a decorator that binds a URL to each of our page functions.
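A minimal sketch of what such a skeleton could look like. The route paths, the function names, and the template names ‘state_data.html’ and ‘news.html’ are assumptions made for illustration; only ‘global_data.html’ and the state_num list are named in this article.

```python
from flask import Flask, render_template

app = Flask(__name__)


@app.route("/")
def global_data():
    # Placeholder: the rows scraped from the global table would go here.
    data = []
    return render_template("global_data.html", data=data)


@app.route("/india")  # hypothetical path for the state-wise page
def state_data():
    # Placeholder: the sorted state-wise rows would go here.
    state_num = []
    return render_template("state_data.html", state_num=state_num)


@app.route("/news")  # hypothetical path for the news page
def news():
    # Placeholder: the articles returned by the news API would go here.
    articles = []
    return render_template("news.html", articles=articles)
```

Saved as app.py next to the templates folder, this can be started with `flask run`. The three templates have to exist before the pages will render.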

3. I always like to work on the backend first, so I know exactly what kind of data I can extract and can decide the HTML page structure accordingly. Now let’s start scraping our data.

4. Begin by scraping the global data. I have defined functions that I can call within my methods later. Let’s look at the data_scrape function.

5. BeautifulSoup is used to extract the content from the HTML webpage (base_url). To scrape the data from the table on the worldometers/coronavirus page, we need to provide the table id and save the result as ‘table’ so that we can use it to extract row-wise data. The data for all table rows is stored in table_rows by calling find_all on the <tr> tag.

6. Since every row holds its data in multiple columns marked by the <td> tag, a for loop extracts the data row by row and stores it in a list named ‘data’. The extracted data is a list of lists, one per row, and looks like this: [[], [‘China’, ‘81,171’, ‘+78’, ‘3,277’, ‘+7’, ‘73,159’, ‘4,735’, ‘1,573’, ‘56’], [‘Italy’, ‘63,927’, ‘’, ‘6,077’, ‘’, ‘7,432’, ‘50,418’, ‘3,204’, ‘1,057’], [‘USA’, ‘46,168’, ‘+2,434’, ‘582’, ‘+29’, ‘295’, ‘45,291’, ‘1,040’, ‘139’]…..].
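Steps 5 and 6 can be sketched roughly as follows. To keep the example runnable without a network connection, it parses a small inline HTML string instead of downloading base_url. The table id ‘sample_table’, the SAMPLE_HTML constant, and the function signature are assumptions for illustration, not the real page's markup or the original function.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML that would be downloaded from base_url.
SAMPLE_HTML = """
<table id="sample_table">
  <tr><th>Country</th><th>Total Cases</th><th>New Cases</th></tr>
  <tr><td>China</td><td>81,171</td><td>+78</td></tr>
  <tr><td>Italy</td><td>63,927</td><td></td></tr>
</table>
"""


def data_scrape(html, table_id):
    """Return the <td> text of every row in the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    table_rows = table.find_all("tr")

    data = []
    for tr in table_rows:
        # The header row only has <th> cells, so it yields an empty list.
        cells = tr.find_all("td")
        data.append([td.get_text(strip=True) for td in cells])
    return data


rows = data_scrape(SAMPLE_HTML, "sample_table")
print(rows)
# [[], ['China', '81,171', '+78'], ['Italy', '63,927', '']]
```

The empty first list comes from the header row, whose cells are <th> rather than <td>.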

7. The first row is empty because we did not scrape the headers. All that’s left to do is pass the data on to our ‘global_data.html’ page using render_template.

8. The state-wise data specific to India is extracted in a similar way using BeautifulSoup4.

9. One thing to note in the above function: the for loop iterates over every row except the last one.

10. The last row is skipped because it holds the totals for the table. On line 50, we convert the row data into integers purely to simplify sorting (line 52) the states from the highest to the lowest number of cases. The table is not sorted at the source, and when we scrape it, everything, including the numbers, is stored as a string.
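Here is a small sketch of that skip, convert and sort sequence. The rows are made-up placeholders, not real figures, and the two-column layout is an assumption; the real table has more columns.

```python
# Rows as scraped: every value is a string and the final row is the table total.
# Placeholder values for illustration only.
scraped_rows = [
    ["State A", "12"],
    ["State B", "1,024"],
    ["State C", "87"],
    ["Total", "1,123"],
]

state_num = []
for row in scraped_rows[:-1]:  # [:-1] leaves out the totals row
    name, cases = row
    # Remove thousands separators before converting the string to an integer.
    state_num.append([name, int(cases.replace(",", ""))])

# Sort from the highest to the lowest case count.
state_num.sort(key=lambda r: r[1], reverse=True)
print(state_num)
# [['State B', 1024], ['State C', 87], ['State A', 12]]
```

Sorting the raw strings would compare them character by character, which is why the conversion to integers comes first.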

https://www.mohfw.gov.in/

11. The data stored in the state_num list is passed on to the HTML page using render_template.

12. Lastly, coronavirus news scraping using Google NewsApi. This was one of the simplest things to do, as the documentation provided by Google has all the answers. I am just pasting my code below for reference:
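The original snippet is not shown here, so what follows is only a rough sketch of what such a lookup might involve. It assumes the service is NewsAPI.org and its top-headlines endpoint, which may not be the exact API used for this app. The helper names, the query string, and the sample payload are hypothetical, and no real request is made.

```python
from urllib.parse import urlencode

# Assumption: NewsAPI.org's top-headlines endpoint.
NEWS_ENDPOINT = "https://newsapi.org/v2/top-headlines"


def build_news_url(api_key, query="coronavirus", language="en"):
    """Build the request URL; api_key is your own key from the provider."""
    params = {"q": query, "language": language, "apiKey": api_key}
    return f"{NEWS_ENDPOINT}?{urlencode(params)}"


def extract_headlines(payload):
    """Pull (title, url) pairs out of a decoded JSON response."""
    return [(a.get("title"), a.get("url")) for a in payload.get("articles", [])]


# Made-up payload with the same shape as a response, for illustration only.
sample_payload = {
    "status": "ok",
    "articles": [
        {"title": "Example headline", "url": "https://example.com/story"},
    ],
}

print(build_news_url("YOUR_API_KEY"))
print(extract_headlines(sample_payload))
```

In the app, the URL would be fetched with an HTTP client, and the resulting list would be handed to the news template in the same way as the scraped tables.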

I have used HTML and Bootstrap4 to design the pages, as I am not too big on designing with pure CSS. Another advantage of the Flask framework is template inheritance through the ‘Layout.html’ template. This template contains the HTML skeleton, the header, the navbar and the Bootstrap/CSS links. Its {% block body %} block can be replaced by a block of the same name (body) in a child template.
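To see how that block replacement works, here is a small self-contained sketch that uses Jinja2 (the template engine behind Flask) directly, with both templates held in a dictionary instead of files. The markup inside the templates is a simplified assumption, not the actual layout of this app.

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for the files in the templates folder.
templates = {
    "Layout.html": (
        "<html><body>"
        "<nav>Covid-19 Tracker</nav>"
        "{% block body %}{% endblock %}"
        "</body></html>"
    ),
    "global_data.html": (
        '{% extends "Layout.html" %}'
        "{% block body %}<h1>Global Data</h1>{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))
html = env.get_template("global_data.html").render()
print(html)
# <html><body><nav>Covid-19 Tracker</nav><h1>Global Data</h1></body></html>
```

The child template only supplies the body block; the skeleton and navbar come from the layout, so they are written once and shared by every page.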

Displaying the global and state-wise data on an HTML page:

Since the datasets that we have passed to the HTML pages are lists, we have to iterate over each list to display the data for each row on a separate line. I am not going into much depth on how to display the data on HTML pages.
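For a quick idea of that iteration, here is a minimal sketch with a Jinja2 for loop, rendered from Python so it can be tried without a running Flask app. The table markup is an assumption; in the app, the same loop would sit inside the page template and receive the list through render_template.

```python
from jinja2 import Template

# One <tr> per scraped row and one <td> per value in that row.
ROW_TEMPLATE = Template(
    "<table>"
    "{% for row in data %}"
    "<tr>{% for cell in row %}<td>{{ cell }}</td>{% endfor %}</tr>"
    "{% endfor %}"
    "</table>"
)

data = [["China", "81,171", "+78"], ["Italy", "63,927", ""]]
html = ROW_TEMPLATE.render(data=data)
print(html)
```

Each inner list becomes one table row, so the page grows or shrinks with the scraped data without any change to the template.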

Final Product
