Graphical view of coronavirus live updates using Python
Web scraping data from a table on a web page using Python
In this article, we extract data from the table on a website (https://www.worldometers.info/coronavirus/), store it as CSV or JSON, and visualize it using D3.js.
What is web scraping?
In simple terms, it is the process of gathering information or data from different web pages (HTML sources). The data gathered this way can be used to build datasets or databases for applications such as data analysis or a price comparison tool.
Prerequisites:-
1. Basic understanding of Python 3 programming.
2. Python 3.0 or above installed on your PC (don't forget to add Python to the PATH while installing).
Libraries we are using:-
1. BeautifulSoup.
2. Pandas.
3. Requests.
The following are the steps to proceed with the project.
Step-1:- Creating the Virtualenv (same for Windows and Linux).
Creating a Virtualenv keeps the project independent of the system Python: all the libraries required for this project are installed into this Virtualenv.
#Upgrading pip
python -m pip install --upgrade pip
#installing Virtualenv
pip install virtualenv
#creating Virtualenv
virtualenv [Name of environment] #enter the name of env without [].
Ex:- virtualenv env
Step-2:- Activating the Virtualenv and installing the required libraries.
Windows:-
If required, open Windows PowerShell as administrator and allow scripts to run, so that the env can be activated in the PowerShell window, using the command below:
Set-ExecutionPolicy RemoteSigned
Now to activate the env:-
env/Scripts/activate
If the env is activated, you will see (env) at the beginning of the next line.
Linux:-
source env/bin/activate
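Besides looking for the (env) prefix, activation can also be checked from Python itself. The following is a minimal sketch; `in_virtualenv` is a hypothetical helper name, and the check assumes the environment was created with virtualenv or the built-in venv module.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # Otherwise, inside an environment sys.prefix points at the env directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Virtual environment active:", in_virtualenv())
```

Run it with the same `python` command you use for the project; it should report True only after the env has been activated.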
Installing Required Libraries:-
#installing BeautifulSoup
pip install bs4
#installing pandas.
pip install pandas
#installing requests.
pip install requests
It is always good practice to freeze the installed libraries into requirements.txt:
pip freeze > requirements.txt
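To confirm that the three libraries actually landed in the active env, a small check like the one below can help. It is only a sketch: `installed_versions` is a hypothetical helper, it needs Python 3.8 or above for `importlib.metadata`, and it looks up `beautifulsoup4`, which is the package that `pip install bs4` pulls in.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per library used in this project.
for name, ver in installed_versions(["beautifulsoup4", "pandas", "requests"]).items():
    print(name, ver or "not installed")
```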
Step-3:- Open the web page, navigate to the table you want to collect data from, right-click it, and click Inspect.
Now study the HTML structure of the table: the table element, its header cells, and its data rows.
Step-4:- Now proceed with the program.
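To make that structure concrete, here is a small sketch that parses a hand-written table with BeautifulSoup. The markup, the id `example_table`, and the values are made-up placeholders for illustration, not the real page or real figures; the real table may be laid out differently.

```python
from bs4 import BeautifulSoup

# Made-up sample markup with placeholder values, not real figures.
SAMPLE_HTML = """
<table id="example_table">
  <thead>
    <tr><th>Country</th><th>Total Cases</th></tr>
  </thead>
  <tbody>
    <tr><td>Country A</td><td>100</td></tr>
    <tr><td>Country B</td><td>250</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", id="example_table")

# Column names live in the <th> cells inside <thead>.
headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]

# Each data row is a <tr> inside <tbody>; each value is a <td>.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find("tbody").find_all("tr")
]

print(headers)
print(rows)
```

The same three lookups (find the table, read the header cells, loop over the rows) are what you identify in the browser's Inspect panel.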
D3.js Chart template:-
Python Programming:-
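As a rough illustration of this step, the sketch below downloads the page, reads one table into a pandas DataFrame, and writes CSV and JSON files. It is not the author's script (that is in the GitHub repository linked at the end of the article). The function names, the `User-Agent` header, the output file names, and the table id `main_table_countries_today` are all assumptions; check the id in the Inspect panel before relying on it.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = "https://www.worldometers.info/coronavirus/"
# Assumption: the id of the table as seen in the browser inspector.
TABLE_ID = "main_table_countries_today"


def fetch_html(url, timeout=10):
    """Download the page and return its HTML text."""
    # Assumption: a browser-like User-Agent, since some sites reject bare clients.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def table_to_dataframe(html, table_id):
    """Parse the table with the given id into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"No table with id {table_id!r} found")
    rows = table.find_all("tr")
    if not rows:
        raise ValueError("The table has no rows")
    # The first row is treated as the header row.
    headers = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])]
    data = []
    for row in rows[1:]:
        cells = [c.get_text(strip=True) for c in row.find_all("td")]
        # Skip rows whose cell count does not match the header.
        if len(cells) == len(headers):
            data.append(cells)
    return pd.DataFrame(data, columns=headers)


def save_outputs(df, csv_path="data.csv", json_path="data.json"):
    """Write the DataFrame to CSV and to a JSON list of records."""
    df.to_csv(csv_path, index=False)
    df.to_json(json_path, orient="records")
```

A typical run would be `df = table_to_dataframe(fetch_html(URL), TABLE_ID)` followed by `save_outputs(df)`. Keeping the download separate from the parsing lets you test the parser on a saved HTML file without hitting the site repeatedly.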
D3.js image output
Data.json Output file
Refer to the code on GitHub here:- https://github.com/saicharankr/WebScrap
Originally published at https://just-python.blogspot.com.