Data Analysis: Countries of the World

Surabhi Basak
Published in CampusX
Aug 19, 2019

Wouldn’t it be fun if you could get answers to some questions straight from data? That is what data analysis is for. In this article, I will analyse a dataset and draw some conclusions from it.

Data analysis is the process of evaluating data using analytical and statistical tools to discover useful information and aid decision making. It involves collecting, cleaning, transforming, and modelling data to find the required information. The results are then communicated to suggest conclusions and support decision making. Data visualization is at times used to portray the data so that useful patterns are easier to discover.

In this article, I will be demonstrating the process of data analysis using a dataset.

Dataset Overview

This is a CSV file that contains details of different countries in the world. Let us first see how large the data is.
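A minimal sketch of this first look, assuming pandas is used. The three rows below are made-up placeholder values standing in for the real file, and the variable name df is my own choice; with the actual CSV you would pass its path to pd.read_csv instead of the in-memory text.

```python
import io
import pandas as pd

# Hypothetical excerpt with made-up values; it only mimics the file's layout.
sample_csv = io.StringIO(
    "Country,Region,Population,Literacy (%)\n"
    'Country A,REGION X,1000,"99,0"\n'
    'Country B,REGION Y,2000,"85,5"\n'
    'Country C,REGION X,3000,"70,1"\n'
)

df = pd.read_csv(sample_csv)

# shape is a (rows, columns) tuple
print(df.shape)

# The column names, in file order
print(df.columns.tolist())
```

On the full dataset, df.shape is where the row and column counts come from.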

We can observe that the dataset contains 227 rows and 20 columns. Next, let us see which columns are present in the dataset.

The columns are:

  • Country: It stores the name of the country.
  • Region: It stores the region where the country is situated.
  • Population: It stores the population of the country.
  • Area (sq. mi.): It stores the area of the country in square miles.
  • Pop Density (per sq. mi.): It stores the population density of the country, in people per square mile.
  • Coastline (coast/area ratio): It stores the ratio of the coastline to the total area.
  • Net migration: It stores the difference between the number of immigrants (people coming into an area) and the number of emigrants (people leaving an area) throughout the year.
  • Infant mortality (per 1000 births): It stores the number of deaths of children under the age of one year per 1000 births.
  • GDP ($ per capita): It stores the GDP per person, in dollars.
  • Literacy (%): It stores the literacy percentage of the country.
  • Phones (per 1000): It stores the number of phones per 1000 people in the country.
  • Arable (%): It stores the percentage of the country's land that is arable.
  • Crops (%): It stores the percentage of the country's land under crops.
  • Others (%): It stores the percentage of the country's land under other uses.
  • Climate: It stores the climate of the country.
  • Birthrate: It stores the birthrate of the country.
  • Deathrate: It stores the deathrate of the country.
  • Agriculture: It stores the share of agriculture in the country's economy.
  • Industry: It stores the share of industry in the country's economy.
  • Service: It stores the share of services in the country's economy.

First, we will remove the rows that contain NaN values.
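One way to do this in pandas is dropna, which by default drops every row that has at least one missing value. The sketch below uses a small made-up frame rather than the real data, and the names df and clean are illustrative.

```python
import pandas as pd

# Made-up rows; None marks a missing value (pandas treats it as NaN).
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Population": [1000, 2000, 3000, 4000],
    "Climate": ["1", None, "2", "3"],
    "Literacy (%)": ["99,0", "85,5", None, "70,1"],
})

print(len(df))  # number of rows before cleaning

# Drop every row with at least one missing value, then renumber the index
clean = df.dropna().reset_index(drop=True)

print(len(clean))  # number of rows after cleaning
```

Comparing the two lengths shows how many rows were lost to missing values.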

We deleted the rows that have a NaN value in them, and we can now observe that the dataset shrank from 227 to 179 rows. Next, our job is to look at the data types of the columns.

Here we can observe that, apart from Population, Area and GDP, all the attributes are of datatype object. Moreover, by looking at the data, we can also observe that the values contain ‘,’ and ‘.’ characters. So now we will change the datatype as well as the data.

Here a is a dataframe that contains the attributes without ‘,’ and ‘.’, with datatypes float and int. After that we merge both dataframes and get a final dataset containing 20 columns, in which only Country and Region are of datatype object and the rest are float or integer. This completes the process of cleaning the data and making it useful for analysis.
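The conversion could look roughly like the sketch below. It assumes the ‘,’ in the text columns is a decimal separator (so "99,0" means 99.0), which may not match exactly what was done here. The values are made up, and the names text_cols, to_fix and cleaned are my own; only a follows the name used above.

```python
import pandas as pd

# Made-up rows; the last two columns are read as text because of the ','.
df = pd.DataFrame({
    "Country": ["A", "B"],
    "Region": ["X", "Y"],
    "Population": [1000, 2000],
    "Literacy (%)": ["99,0", "85,5"],
    "Birthrate": ["12,14", "22,46"],
})

# Columns that should stay as text
text_cols = ["Country", "Region"]

# Every other non-numeric column needs converting
to_fix = [
    c for c in df.columns
    if c not in text_cols and not pd.api.types.is_numeric_dtype(df[c])
]

# a holds the converted columns: swap ',' for '.' and cast to float
a = df[to_fix].apply(lambda s: s.str.replace(",", ".", regex=False).astype(float))

# Put the converted columns back alongside the untouched ones,
# keeping the original column order
cleaned = pd.concat([df.drop(columns=to_fix), a], axis=1)[df.columns]

print(cleaned.dtypes)
```

If the comma really is a decimal separator throughout the file, passing decimal="," to pd.read_csv when loading would avoid this step altogether.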

Now that the data is clean, we will start asking questions of it.

Asking Questions and Communicating Results

Which are the most populated countries of the world?

From the above bar graph, we can observe that China is the most populated country, followed by India, the second most populated country in the world.
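A ranking like this can be produced with nlargest. The sketch below uses made-up populations for placeholder countries, and the variable name top is my own; the same pattern would apply to the area, coastline and GDP questions by swapping the column name.

```python
import pandas as pd

# Made-up populations for placeholder countries
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Population": [500, 1500, 300, 900],
})

# Keep the 3 rows with the largest Population, largest first
top = df.nlargest(3, "Population").set_index("Country")["Population"]

print(top)

# With matplotlib installed, top.plot.bar() would draw the bar graph.
```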

Which country has the biggest area?

From the above graph, we can observe that China and the United States have the biggest areas among all, followed by Brazil and Australia.

Which country has the largest coastline ratio?

So, from the above graph it is clear that Micronesia Fed. St. has the largest coastline-to-area ratio, followed by other countries. This is consistent with Micronesia being an island nation.

Which are the top countries with the highest GDP?

The countries with the highest GDP per capita are:

United States, Norway, Bermuda, Cayman Islands, Switzerland, Denmark and Iceland.

Which are the countries that are 100% literate?

The countries that are 100% literate are Australia, Denmark, Finland, Liechtenstein and Norway.
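Such a list comes from a simple boolean filter. A small sketch with made-up literacy values, assuming the Literacy (%) column has already been converted to float; the variable name fully_literate is my own.

```python
import pandas as pd

# Made-up literacy percentages for placeholder countries
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Literacy (%)": [100.0, 99.9, 100.0, 82.3],
})

# Select the Country values of rows where literacy is exactly 100
fully_literate = df.loc[df["Literacy (%)"] == 100, "Country"].tolist()

print(fully_literate)
```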

Is there any correlation between birthrate and deathrate?

From the above heatmap, we can observe a correlation of 0.45 between birthrate and deathrate, which indicates a moderate positive relationship.
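For reference, the number in a heatmap cell like this is typically the Pearson correlation coefficient. The sketch below computes it two ways on made-up values (not the real data): with pandas' Series.corr and directly from the definition. The names r and r_manual are illustrative.

```python
import pandas as pd

# Made-up rates, only to illustrate the calculation
df = pd.DataFrame({
    "Birthrate": [10.0, 20.0, 30.0, 40.0],
    "Deathrate": [5.0, 9.0, 8.0, 12.0],
})

x, y = df["Birthrate"], df["Deathrate"]

# Pearson correlation as computed by pandas (the default method)
r = x.corr(y)

# The same quantity from the definition:
# sum of products of deviations / sqrt(product of sums of squared deviations)
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / ((dx ** 2).sum() * (dy ** 2).sum()) ** 0.5

print(r, r_manual)
```

The coefficient always lies between -1 and 1. Values near 0 mean little linear relationship, and the sign gives the direction.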

Is there any correlation between climate, crops and arable land?

From the above heatmap, we can observe that crops are not noticeably correlated with either how arable the land is or the climate. On the other hand, how arable the land is shows some relationship with the climate, with a correlation of 0.38.

How much does GDP depend on industry and service?

From the above heatmap, we can observe that GDP is not strongly correlated with industry, but it is moderately correlated with service, at 0.54.
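When several columns are compared at once, the heatmap is drawn from a correlation matrix. A sketch on made-up values, assuming DataFrame.corr is used to build it; the names cols, corr_matrix and gdp_corr are my own.

```python
import pandas as pd

# Made-up values for four placeholder countries
df = pd.DataFrame({
    "GDP ($ per capita)": [1000.0, 5000.0, 20000.0, 30000.0],
    "Industry": [0.20, 0.30, 0.25, 0.28],
    "Service": [0.40, 0.50, 0.70, 0.80],
})

cols = ["GDP ($ per capita)", "Industry", "Service"]

# Pairwise Pearson correlations between the chosen columns
corr_matrix = df[cols].corr()

# Read off how each sector correlates with GDP
gdp_corr = corr_matrix["GDP ($ per capita)"].drop("GDP ($ per capita)")

print(gdp_corr)

# With seaborn installed, seaborn.heatmap(corr_matrix, annot=True)
# would draw the matrix as a heatmap.
```

Note that a correlation only describes how two columns move together; it does not by itself show that one depends on the other.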

How does birthrate vary?

The above plot shows how birthrate varies.

How does deathrate vary?

The above plot shows how deathrate varies.
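Alongside a plot, the spread of a column can also be summarised numerically. The sketch below does this for made-up birthrate values; the bin edges are arbitrary choices of mine, and the same calls would work for deathrate.

```python
import pandas as pd

# Made-up birthrates, only to illustrate the summary
birthrate = pd.Series([9.5, 12.1, 18.7, 24.3, 35.0, 41.2], name="Birthrate")

# Count, mean, standard deviation, min, quartiles and max
summary = birthrate.describe()
print(summary)

# Count how many values fall into each range, like the bars of a histogram
bins = pd.cut(birthrate, bins=[0, 15, 30, 45]).value_counts().sort_index()
print(bins)
```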

Conclusion

After going through the whole data analysis process, I can observe that the United States is large in both area and GDP, but less populated. China is large in area as well as very heavily populated. Australia, though it is smaller than China and the United States and has nowhere near China's population, is 100% literate. So, on that measure, I can conclude that Australia is a better country. Hence we can see that data analysis makes it easy for us to examine the data and draw conclusions from it.

Thank You
