[TUTORIAL] A BASIC DATA ANALYSIS OF POSITIVE COVID-19 CASES IN PERU USING PYTHON

A data analysis of Peru’s COVID-19 data up to August 19, 2020, using libraries such as Pandas, Matplotlib and Seaborn.

Alexander Roman
3 min read · Feb 5, 2022

We are currently living through a very unexpected situation. That’s right, I’m referring to the COVID-19 pandemic, which so far has unfortunately left many people infected and dead in many countries.

Well, let’s go back in time: today is August 19, 2020. You work at MINSA (the Peruvian Ministry of Health), and your boss gives you the task of analyzing the COVID-19 data to date to find out which departments, provinces and districts have the highest number of positive cases, so that biosecurity measures can be targeted there.

The data is available on Kaggle and can be found at this link: Peru COVID-19 | Kaggle

You can find my notebook at this link: AlexRoman938/POSITIVES-COVID-19-PERU-BASIC-ANALYSIS (github.com)

Step 1: Let’s look at the Peru COVID-19 dataframe

First, the dataset you need is positivos_covid.csv; load it into a variable called df_positives.

Next, let’s see df_positives and its information.
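Here is a minimal sketch of this step, assuming pandas is installed. With the real file you would call pd.read_csv("positivos_covid.csv"); here a tiny in-memory sample with made-up rows stands in for it so the snippet runs on its own. Apart from FECHA_RESULTADO, the column names and the YYYYMMDD date layout are assumptions about the file, so adjust them to what you actually see.

```python
import io

import pandas as pd

# Made-up sample standing in for positivos_covid.csv (assumed column layout).
sample_csv = io.StringIO(
    "DEPARTAMENTO,PROVINCIA,DISTRITO,EDAD,SEXO,FECHA_RESULTADO\n"
    "LIMA,LIMA,SAN JUAN DE LURIGANCHO,42,MASCULINO,20200615\n"
    "AREQUIPA,AREQUIPA,CERRO COLORADO,35,FEMENINO,20200720\n"
)

# With the real file: df_positives = pd.read_csv("positivos_covid.csv")
df_positives = pd.read_csv(sample_csv)

# First rows of the dataframe, then column names, dtypes and non-null counts.
print(df_positives.head())
df_positives.info()
```

In the output of info(), check the dtype reported for each column: a date stored as a plain number shows up as int64.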

Can you see something wrong there? The column “FECHA_RESULTADO” is in a format that doesn’t help you: its dtype is int64, when it should be a date type.

Hence, you need to deal with “FECHA_RESULTADO” in the next step…

Step 2: Data Transformation

Now, you will convert the column “FECHA_RESULTADO” from int64 to string. Next, you will apply the function “pd.to_datetime” to convert it to datetime data.

This last step is very important because it lets you create a new column, “month”.

If you want to see the changes, take a look at the dataframe.
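The transformation described in this step could look like the sketch below. It assumes the integers are laid out as YYYYMMDD (for example 20200819), which is why format="%Y%m%d" is passed; if your file uses another layout, change the format string. The sample values are made up.

```python
import pandas as pd

# Made-up dates stored as integers, assuming a YYYYMMDD layout.
df_positives = pd.DataFrame({"FECHA_RESULTADO": [20200306, 20200415, 20200819]})

# int64 -> string -> datetime
df_positives["FECHA_RESULTADO"] = pd.to_datetime(
    df_positives["FECHA_RESULTADO"].astype(str), format="%Y%m%d"
)

# The .dt accessor only works on datetime columns, hence the conversion above.
df_positives["month"] = df_positives["FECHA_RESULTADO"].dt.month

print(df_positives)
print(df_positives.dtypes)
```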

Step 3: Data Analysis

It’s time for you to think… How will you know which departments, provinces, districts, etc. are the ones with the highest number of positives?

You could start with basic questions.

Q1: Which 10 departments have the most positive cases?

Q2: Which 10 provinces have the most positive cases?

Q3: Which 10 districts have the most positive cases?

Q1: Which 10 departments have the most positive cases?

You can see that Lima has a lot of positive cases, far more than any other department.

The big difference may be because Lima is the most populated department in Peru, according to both the 2007 census and the 2017 population estimate.
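One way to get a ranking like the one discussed above is to count rows per department, since each row is one positive case. This is only a sketch: DEPARTAMENTO is an assumed column name and the rows are made up.

```python
import pandas as pd

# Made-up rows; DEPARTAMENTO is an assumed column name.
df_positives = pd.DataFrame(
    {"DEPARTAMENTO": ["LIMA", "LIMA", "LIMA", "AREQUIPA", "AREQUIPA", "PIURA"]}
)

# value_counts() sorts from most to least frequent, so head(10) is the top 10.
top_departments = df_positives["DEPARTAMENTO"].value_counts().head(10)

print(top_departments)
```

With the real data, top_departments.plot(kind="bar") would turn this Series into a bar chart (Matplotlib required).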

Q2: Which 10 provinces have the most positive cases?

Q3: Which 10 districts have the most positive cases?

You can see that almost all of the districts belong to Lima. That is because the department of Lima has a lot of positive cases (as you saw in question 1).
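Questions 2 and 3 follow the same pattern, so a small helper can answer both. In this sketch, top_n is a hypothetical helper written for illustration, PROVINCIA and DISTRITO are assumed column names, and the rows are made up.

```python
import pandas as pd


def top_n(df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Return the n most frequent values of `column` with their row counts."""
    return df[column].value_counts().head(n)


# Made-up rows; PROVINCIA and DISTRITO are assumed column names.
df_positives = pd.DataFrame(
    {
        "PROVINCIA": ["LIMA", "LIMA", "LIMA", "CALLAO", "AREQUIPA"],
        "DISTRITO": [
            "SAN JUAN DE LURIGANCHO",
            "SAN JUAN DE LURIGANCHO",
            "COMAS",
            "CALLAO",
            "CERRO COLORADO",
        ],
    }
)

top_provinces = top_n(df_positives, "PROVINCIA")
top_districts = top_n(df_positives, "DISTRITO")

print(top_provinces)
print(top_districts)
```

Counting on the district name alone merges districts that share a name across provinces. If that matters for your data, grouping by department, province and district together is a safer choice.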

So far, you have answered 3 questions, but you want to know more. So you decide to look at the age and sex distributions and to analyze the top district over time.

Age distribution

You can see that the median age of the positive cases is just over 42 years. There are also infected people over 100 years of age.
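The figures quoted above are the kind of summary that describe() returns: its “50%” row is the median, and “max” is the oldest case. A minimal sketch, assuming the age column is called EDAD and using made-up ages:

```python
import pandas as pd

# Made-up ages; EDAD is an assumed column name.
df_positives = pd.DataFrame({"EDAD": [5, 28, 35, 42, 43, 60, 101]})

# count, mean, std, min, quartiles (the 50% row is the median) and max.
age_summary = df_positives["EDAD"].describe()
median_age = df_positives["EDAD"].median()

print(age_summary)
print("Median age:", median_age)
```

A histogram or box plot of the same column shows the full shape of the distribution rather than just these summary numbers.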

Sex distribution

There are more male positive cases than female positive cases.
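To check a comparison like this one, you can count the cases per sex and also express them as shares. The sketch assumes the column is called SEXO and holds the values MASCULINO and FEMENINO; the rows are made up.

```python
import pandas as pd

# Made-up rows; SEXO and its two values are assumptions about the dataset.
df_positives = pd.DataFrame(
    {"SEXO": ["MASCULINO", "MASCULINO", "MASCULINO", "FEMENINO", "FEMENINO"]}
)

# Absolute counts per sex.
sex_counts = df_positives["SEXO"].value_counts()

# normalize=True returns proportions that sum to 1 instead of counts.
sex_shares = df_positives["SEXO"].value_counts(normalize=True)

print(sex_counts)
print(sex_shares)
```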

Analysis of the top district over time

You can see the positive cases of the top district for each month.

The count drops in August because the data only runs until August 19, so 12 days of the month are still missing.
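A monthly series like this can be built by filtering the rows of the top district and counting them per month. The sketch below assumes pandas and Matplotlib are installed, that the columns are called DISTRITO and FECHA_RESULTADO, and that dates are integers in YYYYMMDD layout; all rows are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; column names and the YYYYMMDD layout are assumptions.
df_positives = pd.DataFrame(
    {
        "DISTRITO": ["SAN JUAN DE LURIGANCHO"] * 4 + ["COMAS"] * 2,
        "FECHA_RESULTADO": [20200610, 20200705, 20200721, 20200803, 20200611, 20200702],
    }
)

# Same transformation as in Step 2: integer -> string -> datetime -> month.
dates = pd.to_datetime(df_positives["FECHA_RESULTADO"].astype(str), format="%Y%m%d")
df_positives["month"] = dates.dt.month

# The district with the most rows, i.e. the most positive cases.
top_district = df_positives["DISTRITO"].value_counts().idxmax()

# Number of positive cases per month for that district only.
monthly = df_positives[df_positives["DISTRITO"] == top_district].groupby("month").size()

fig, ax = plt.subplots()
ax.plot(monthly.index, monthly.values, marker="o")
ax.set_xlabel("month")
ax.set_ylabel("positive cases")
ax.set_title(top_district)

print(monthly)
```

In a notebook the figure is displayed once you remove the matplotlib.use("Agg") line; in a script, fig.savefig("top_district.png") writes it to a file (the file name is just an example).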

CONCLUSIONS

During the analysis you wanted to understand the situation of COVID-19 positive cases in Peru. Based on the results, you could suggest the following to your boss:

  • Implement drastic measures in the top 10 departments and provinces, especially Lima, e.g., continue the quarantine, extend curfew hours, etc.
  • Increase the number of police and military personnel in the top 10 districts so that citizens comply with sanitary measures, especially in San Juan de Lurigancho, given its upward trend from June to July.

RECOMMENDATIONS

This is a basic analysis, so we recommend further analysis of the data. Ask yourself more questions.

Remember that the dataset only runs until August 19, 2020, so we recommend choosing a more current dataset.

Finally, thank you for reading this post. If you would like to contact me, this is my LinkedIn: Alexander Daniel Roman Gabriel | LinkedIn

REFERENCES

List of regions of Peru by population — Wikipedia
