COVID19-API.org Data Exploration and Visualization Using Python

Dwi Gustin Nurdialit
Published in Analytics Vidhya
Oct 14, 2020

COVID-19 is a pandemic that has spread throughout the world, and most countries have reported cases. Each country has handled it differently, according to its government's policies. This leads to different trends of rising or falling case counts from one country to another.

In this article, we will dig into COVID-19 data from one of the available open APIs, https://covid19-api.org/.

In simple terms, an API (application programming interface) is a way for one computer program to communicate with another so that they can exchange data. In this project, we will learn how to retrieve data from an API, how to prepare that data, and how to analyze and visualize it.

Import Libraries

First of all, import the required libraries.

import json
import numpy as np
import pandas as pd
import requests
import matplotlib.pyplot as plt
import datetime

Load The Data

Create a Python function named get_json that takes an api_url parameter. If the response status_code is 200, it returns the decoded JSON content as Python objects (dictionaries and lists). Otherwise, it returns None.

def get_json(api_url):
    response = requests.get(api_url)
    if response.status_code == 200:
        return json.loads(response.content.decode('utf-8'))
    else:
        return None
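get_json has no timeout, and it lets connection errors propagate as exceptions. Below is a minimal sketch of a more defensive variant. It keeps the same "return the JSON on 200, otherwise None" rule, adds a timeout, and returns None on network, URL, or JSON errors. To stay dependency-free, it deliberately swaps in the standard library's urllib.request for requests. The names get_json_safe and decode_json_body are hypothetical and not part of the original code.

```python
import http.client
import json
import urllib.request


def decode_json_body(status_code, body):
    """Apply get_json's rule: parse the body as UTF-8 JSON only for HTTP 200."""
    if status_code != 200:
        return None
    return json.loads(body.decode("utf-8"))


def get_json_safe(api_url, timeout=10):
    """Hypothetical stdlib-only variant of get_json with a timeout and error handling."""
    try:
        with urllib.request.urlopen(api_url, timeout=timeout) as response:
            return decode_json_body(response.status, response.read())
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers connection errors, timeouts and HTTPError (4xx/5xx);
        # ValueError covers malformed URLs and invalid JSON bodies
        return None
```

Calling get_json_safe('https://covid19-api.org/api/status') behaves like get_json on success. On failure it returns None instead of raising.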

Global COVID-19 data is available from covid19-api.org. We set the record_date variable to '2020-10-12' and store the API response in the df_covid_worldwide variable.

record_date = '2020-10-12'
covid_url = 'https://covid19-api.org/api/status?date=' + record_date
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas 1.0+)
df_covid_worldwide = pd.json_normalize(get_json(covid_url))
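Joining strings works, but it is easy to pass a date in the wrong format. Here is a small sketch that builds the same status URL from a datetime.date and lets urllib.parse.urlencode assemble the query string. date.fromisoformat rejects strings that are not valid YYYY-MM-DD dates. The helper name status_url is hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def status_url(record_date: date) -> str:
    """Hypothetical helper: build the status endpoint URL for a single day."""
    query = urlencode({"date": record_date.isoformat()})
    return "https://covid19-api.org/api/status?" + query


# fromisoformat raises ValueError for malformed dates such as "2020-13-45"
covid_url = status_url(date.fromisoformat("2020-10-12"))
print(covid_url)
```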

The code above fetches the data with get_json and uses json_normalize to flatten the returned JSON records into a data frame. Let's look at the first five rows.

df_covid_worldwide.head()

  country          last_update    cases  deaths  recovered
0      US  2020-08-16T23:27:50  5401167  170019    1833067
1      BR  2020-08-16T23:27:50  3340197  107852    2655017
2      IN  2020-08-16T23:27:50  2589682   49980    1862258
3      RU  2020-08-16T23:27:50   920719   15653     731444
4      ZA  2020-08-16T23:27:50   587345   11839     472377
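json_normalize turns a list of JSON objects into the rows of a data frame, and it flattens nested objects into dotted column names. The status records above are already flat, so for them it behaves like pd.DataFrame(records). The sketch below runs offline on two rows copied from the sample output. The nested stats field is hypothetical and only illustrates the flattening.

```python
import pandas as pd

# Two records shaped like the status endpoint output above (values copied from it)
records = [
    {"country": "US", "last_update": "2020-08-16T23:27:50",
     "cases": 5401167, "deaths": 170019, "recovered": 1833067},
    {"country": "BR", "last_update": "2020-08-16T23:27:50",
     "cases": 3340197, "deaths": 107852, "recovered": 2655017},
]
flat = pd.json_normalize(records)
print(flat)

# Hypothetical nested record: inner keys become dotted column names
nested = pd.json_normalize([{"country": "US", "stats": {"cases": 5401167, "deaths": 170019}}])
print(nested.columns.tolist())
```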

Data Exploration and Visualization

— Cases on Each Continent

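Next, let's load a second data set, countries.csv from my GitHub, as a pandas data frame and merge it with the COVID-19 data on the country name. The API identifies countries by two-letter codes (US, BR, IN, ...). This means the COVID-19 frame needs a name column holding full country names before the merge can work. The code below uses such a frame, df_covid_denormalized.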

countries_df = pd.read_csv("countries.csv")
countries_df = countries_df.rename(columns={'location': 'name'})
covid_df = countries_df.merge(df_covid_denormalized, on='name')
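The default merge is an inner join. It keeps only rows whose name values match exactly in both frames, so a naming difference silently drops a country. The sketch below uses small in-memory frames. It assumes countries.csv has location and continent columns, as the code above implies, and that df_covid_denormalized is built by mapping the API's country codes to names. The code_to_name lookup is hypothetical. An outer merge with indicator=True lists the rows the inner merge would lose.

```python
import pandas as pd

# Toy API frame: country holds two-letter codes (case counts copied from the sample output)
df_covid_worldwide = pd.DataFrame({
    "country": ["US", "BR", "IN"],
    "cases": [5401167, 3340197, 2589682],
})

# Hypothetical code-to-name lookup used to "denormalize" the codes into names
code_to_name = pd.DataFrame({
    "country": ["US", "BR", "IN"],
    "name": ["United States of America", "Brazil", "India"],
})
df_covid_denormalized = df_covid_worldwide.merge(code_to_name, on="country", how="left")

# Stand-in for countries.csv with the columns the article's code relies on
countries_df = pd.DataFrame({
    "location": ["United States", "Brazil", "India"],
    "continent": ["North America", "South America", "Asia"],
}).rename(columns={"location": "name"})

# indicator=True adds a _merge column: "both", "left_only" or "right_only"
check = countries_df.merge(df_covid_denormalized, on="name", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["name", "_merge"]])

# The inner merge used in the article keeps only the matching names
covid_df = countries_df.merge(df_covid_denormalized, on="name")
```

Here "United States" and "United States of America" fail to match, so the inner merge keeps only Brazil and India.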

Next, we visualize each continent's share of the confirmed cases.

continent_case = covid_df.groupby('continent')['cases'].sum()
plt.style.use('fivethirtyeight')
plt.figure(figsize=(13, 7))
plt.title("Percentage of Confirmed Cases on Each Continent")
g = plt.pie(continent_case, labels=continent_case.index, autopct='%1.1f%%', startangle=180)
plt.show()

As the pie chart shows, Asia accounts for the largest share of confirmed COVID-19 cases.
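plt.pie sizes each wedge as its value divided by the sum of all values, and autopct prints that share as a percentage. The sketch below computes the same shares as numbers, which is handy for checking the chart or ranking continents. The case counts are made-up toy values, not real data.

```python
import pandas as pd

# Toy rows: one continent label and case count per country
covid_df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Africa"],
    "cases": [600, 200, 150, 50],
})

continent_case = covid_df.groupby("continent")["cases"].sum()
# Same quantity the pie wedges encode: each value divided by the total, as a percentage
share_pct = (continent_case / continent_case.sum() * 100).round(1).sort_values(ascending=False)
print(share_pct)
```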

—Case Fatality Ratio (CFR)

In epidemiology, the case fatality ratio (CFR) is the proportion of deaths from a certain disease among all people diagnosed with that disease over a particular period.

So, let’s make a new column for the fatality ratio.

df_covid_denormalized['fatality_ratio'] = df_covid_denormalized['deaths']/df_covid_denormalized['cases']
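If a row reports zero confirmed cases, plain division gives NaN (0/0) or inf (deaths/0), and an inf would jump to the top of a descending sort. The minimal sketch below uses toy numbers. It treats zero cases as missing before dividing, then ranks only the countries with a defined ratio. Note that this ratio is a proportion (0 to 1), while the mortality and recovered rates later in the article are multiplied by 100.

```python
import pandas as pd

# Toy numbers, not real country data; Country C mimics a reporting inconsistency
toy = pd.DataFrame({
    "name": ["Country A", "Country B", "Country C"],
    "cases": [200, 50, 0],
    "deaths": [10, 5, 2],
})

# where() turns zero case counts into NaN, so the ratio becomes NaN instead of inf
toy["fatality_ratio"] = toy["deaths"] / toy["cases"].where(toy["cases"] > 0)

# Rank only the countries whose ratio is defined
top = toy.dropna(subset=["fatality_ratio"]).sort_values(by="fatality_ratio", ascending=False)
print(top[["name", "fatality_ratio"]])
```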

Next, we plot the 20 countries with the highest fatality ratios.

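# Keep the 20 countries with the highest fatality ratio
df_top_20_fatality_rate = df_covid_denormalized.sort_values(by='fatality_ratio', ascending=False).head(20)
plt.figure(figsize=(17, 7))
y = df_top_20_fatality_rate['name']
x = df_top_20_fatality_rate['fatality_ratio']
plt.ylabel('Country Name')
plt.xlabel('Fatality Rate')
plt.title('Top 20 Highest Fatality Rate Countries')
plt.xticks(rotation=45)
# Lollipop chart: a thick horizontal line per country plus a marker at its value
plt.hlines(y=y, xmin=0, xmax=x, color='indianred', alpha=0.8, linewidth=10)
plt.plot(x, y, "o", markersize=8, color='#007acc', alpha=0.8)
plt.tight_layout()
plt.show()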

— Mortality Ratio

The mortality rate, or death rate, measures the number of deaths in a particular population, scaled to the size of that population, per unit of time. Here we calculate it as the number of deaths divided by the population, expressed as a percentage.

covid_df["mortality_rate"] = round((covid_df['deaths'] / covid_df['population']) * 100, 2)
covid_mortality_top_20 = covid_df.sort_values(by='mortality_rate', ascending=False).head(20)
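COVID-19 deaths were a small fraction of any national population. As a result, a percentage rounded to two decimals can make clearly different countries tie. The minimal sketch below uses toy numbers to compare the article's percentage with deaths per 100,000 people, a scale commonly used for mortality rates. The deaths_per_100k column is a hypothetical addition, not part of the original analysis.

```python
import pandas as pd

# Toy numbers, not real country data
toy = pd.DataFrame({
    "name": ["Country A", "Country B"],
    "deaths": [123, 180],
    "population": [1_000_000, 2_000_000],
})

# The article's definition: deaths as a percentage of population, rounded to 2 decimals
toy["mortality_rate"] = round((toy["deaths"] / toy["population"]) * 100, 2)

# The same ratio scaled per 100,000 people, without rounding away the difference
toy["deaths_per_100k"] = toy["deaths"] / toy["population"] * 100_000
print(toy)
```

Both toy countries round to 0.01%, while the per-100,000 figures (about 12.3 and 9.0) still separate them.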

Next, we plot the 20 countries with the highest mortality rates.

plt.figure(figsize=(17, 7))
y = covid_mortality_top_20['name']
x = covid_mortality_top_20['mortality_rate']
plt.ylabel('Country Name')
plt.xlabel('Mortality Rate')
plt.title('Top 20 Highest Mortality Rate Countries')
plt.xticks(rotation=45)
plt.hlines(y=y, xmin=0, xmax=x, color='darkblue', alpha=0.8, linewidth=10)
plt.plot(x, y, "o", markersize=8, color='#007acc', alpha=0.8)
plt.tight_layout()
plt.show()

— Recovered Ratio

We calculate the recovered rate as the number of recovered cases divided by the number of confirmed cases, expressed as a percentage.

Next, we plot the 20 countries with the highest recovered rates.

covid_df["recovered_rate"] = round((covid_df['recovered'] / covid_df['cases']) * 100, 2)
covid_recovered_top_20 = covid_df.sort_values(by='recovered_rate', ascending=False).head(20)
plt.figure(figsize=(15, 7))
y = covid_recovered_top_20['name']
x = covid_recovered_top_20['recovered_rate']
plt.ylabel('Country Name')
plt.xlabel('Recovered Rate')
plt.title('Top 20 Highest Recovered Rate Countries')
plt.xticks(rotation=45)
plt.hlines(y=y, xmin=0, xmax=x, color='dodgerblue', alpha=0.8, linewidth=10)
plt.plot(x, y, "o", markersize=8, color='#007acc', alpha=0.8)
plt.tight_layout()
plt.show()
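With very small case counts, a country where a dozen cases have all recovered scores 100% and can crowd out larger countries from a top-20 list. One common remedy is a minimum case threshold before ranking. The minimal sketch below uses toy data. The helper top_n_by_rate and the default min_cases=1000 are hypothetical choices, not part of the original analysis.

```python
import pandas as pd


def top_n_by_rate(df, rate_col, n=20, min_cases=1000):
    """Hypothetical helper: rank by rate_col, ignoring rows with fewer than min_cases cases."""
    eligible = df[df["cases"] >= min_cases]
    return eligible.sort_values(by=rate_col, ascending=False).head(n)


# Toy data, not real countries
toy = pd.DataFrame({
    "name": ["Small", "Medium", "Large"],
    "cases": [12, 5000, 200000],
    "recovered": [12, 4500, 150000],
})
toy["recovered_rate"] = round((toy["recovered"] / toy["cases"]) * 100, 2)

print(top_n_by_rate(toy, "recovered_rate", n=2))
```

The same helper works for the fatality and mortality rankings by passing a different rate_col.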

COVID-19 Data by Timeline

In this section, we look at how COVID-19 developed over time. Using the same API, we request the timeline endpoint and normalize the response into a data frame.

timeline_url = 'https://covid19-api.org/api/timeline'
covid_timeline_df = pd.json_normalize(get_json(timeline_url))

Now we plot the timeline columns against the update date.

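# Parse the timestamps so the x-axis is treated as dates rather than text
covid_timeline_df['last_update'] = pd.to_datetime(covid_timeline_df['last_update'])
fig, ax = plt.subplots(figsize=(15, 7))
covid_timeline_df.plot(x='last_update', kind='line', ax=ax, lw=3, color=['salmon', 'slategrey', 'olivedrab'])
ax.set_title("Dynamic COVID-19 Cases")
ax.set_xlabel('')
plt.grid()
plt.tight_layout()
plt.show()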
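The timeline holds running totals. To see new cases per interval instead, one option is to parse the timestamps, sort by them, and take differences between consecutive rows. The minimal sketch below uses toy data. The column name total_cases is a hypothetical placeholder, because the article does not show the field names of the timeline response.

```python
import pandas as pd

# Toy cumulative totals, deliberately out of order; "total_cases" is a placeholder name
timeline = pd.DataFrame({
    "last_update": ["2020-10-03T00:00:00", "2020-10-01T00:00:00", "2020-10-02T00:00:00"],
    "total_cases": [130, 100, 110],
})

timeline["last_update"] = pd.to_datetime(timeline["last_update"])
timeline = timeline.sort_values("last_update").set_index("last_update")

# diff() subtracts the previous row; the first row has no predecessor and becomes NaN
timeline["new_cases"] = timeline["total_cases"].diff()
print(timeline)
```

Plotting new_cases alongside the cumulative column shows whether growth is speeding up or slowing down, which a cumulative curve alone can hide.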

Conclusion

From the data and charts above, we can conclude that COVID-19 cases worldwide kept rising over the period the data covers. Therefore, we must take care of our health and hygiene to avoid COVID-19. Stay home and stay safe.
