Published in Analytics Vidhya

EDA on The USA Unemployment Rate Dataset

Picture from pexels-nathan-cowley

Unemployment refers to people who are employable and actively looking for a job but unable to find one. Some broader measures also count people who are underemployed. The unemployment rate is measured by dividing the number of unemployed people by the total number of people in the workforce. Unemployment usually serves as one of the indicators of a country’s economic status. In the United States, a person is defined as unemployed if they are jobless, have looked for work in the last four weeks, and are available for work.
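As a quick illustration of that ratio, here is a minimal sketch in Python. The function name `unemployment_rate` and the figures passed to it are made up for the example; they do not come from the dataset.

```python
def unemployment_rate(unemployed: int, labor_force: int) -> float:
    """Return the unemployment rate as a percentage of the labor force."""
    if labor_force <= 0:
        raise ValueError("labor_force must be positive")
    return 100.0 * unemployed / labor_force


# Hypothetical figures, chosen only to illustrate the arithmetic.
rate = unemployment_rate(unemployed=5_000, labor_force=100_000)
print(f"{rate:.1f}%")  # 5.0%
```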

For this analysis we use a dataset obtained from Kaggle. It contains the unemployment rate (in percent) for counties in 47 US states, recorded monthly from 1990 to 2016.

In this analysis, we look at the trend in the US unemployment rate from 1990 to 2016, which states had the highest unemployment rate and why, which counties had the highest unemployment, and which month of the year had the highest unemployment rate and why.

Now to the analysis.

1. Import the Libraries

NumPy is used for mathematical calculations in Python, Pandas is used for data manipulation, while Matplotlib and Seaborn are used for data visualization.
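A minimal sketch of the imports, assuming all four libraries are installed and using their conventional aliases:

```python
import numpy as np               # numerical calculations
import pandas as pd              # data loading and manipulation
import matplotlib.pyplot as plt  # base plotting library
import seaborn as sns            # statistical plots built on Matplotlib
```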

2. Import the dataset and inspect it
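The loading step might look like the sketch below. The column names (`Year`, `Month`, `State`, `County`, `Rate`) and the file name in the comment are assumptions about the Kaggle file, and the rows are invented stand-ins so that the snippet runs without the download.

```python
import io

import pandas as pd

# Invented stand-in rows; the real CSV from Kaggle is much larger.
csv_text = """Year,Month,State,County,Rate
1990,January,Texas,Harris County,5.9
2000,March,Texas,Travis County,3.1
2010,March,California,Imperial County,28.0
2016,June,Delaware,Kent County,4.4
"""

# With the downloaded file this would be something like
# pd.read_csv("output.csv")  (the file name is an assumption).
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the first few rows.
print(df.head())
```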

3. Explore the Data

We continue by exploring the data: we find the shape of the dataset (the number of rows and columns) and the names of the columns, and use the describe() function to get more details about the dataset.
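These calls can be sketched as follows. The small frame is an invented stand-in with assumed column names, and `include="all"` is our guess at how the summary of the text columns (unique values, most frequent value) was obtained.

```python
import pandas as pd

# Invented stand-in rows with the assumed column names.
df = pd.DataFrame({
    "Year": [1990, 1990, 2000, 2010, 2010, 2016],
    "Month": ["January", "March", "March", "March", "January", "June"],
    "State": ["Texas", "Texas", "Texas", "California", "California", "Delaware"],
    "County": ["Harris County", "Harris County", "Travis County",
               "Imperial County", "Los Angeles County", "Kent County"],
    "Rate": [5.9, 5.5, 3.1, 28.0, 12.5, 4.4],
})

print(df.shape)             # (6, 5)
print(df.columns.tolist())  # ['Year', 'Month', 'State', 'County', 'Rate']

# include="all" summarises numeric and text columns together:
# min/max for Year and Rate, unique/top/freq for Month, State and County.
summary = df.describe(include="all")
print(summary)
```

On the full dataset, the same calls give the figures discussed next.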

From the results, we see that:

  1. The dataset has 885548 rows and 5 columns.
  2. From the results of the describe function, the minimum value of the year is 1990 and the maximum is 2016. That is, our dataset covers 1990–2016.
  3. For the Months category, there are 12 months in total and the most frequently occurring is March.
  4. The same goes for the States category. The dataset contains 47 unique states, even though the USA has 50 states. This means that 3 states are missing from the dataset. Also, the most frequently occurring state is Texas.
  5. A county is a territorial division of some countries that forms the chief unit of local administration. It is an administrative division within a state or country. The dataset has 1752 unique counties.
  6. Lastly, the highest unemployment rate is about 54% and the lowest is zero (meaning that some counties had an unemployment rate of zero).

Moving forward, to check which states are not included in the dataset, we list the unique values of the States column and find that the missing states are Alaska, Florida and Georgia.
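One way to do this check without reading through the list by eye is a set difference against all 50 state names. This is a sketch: the helper `missing_states`, the column name `State`, and the assumption that states are stored by full name are ours, and the demo frame is constructed for the example.

```python
import pandas as pd

ALL_STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def missing_states(df: pd.DataFrame, column: str = "State") -> list:
    """Return the state names that never appear in df[column], sorted."""
    present = set(df[column].unique())
    return sorted(set(ALL_STATES) - present)


# Demo frame built to mimic the situation in the article:
# every state except three appears once.
left_out = {"Alaska", "Florida", "Georgia"}
demo = pd.DataFrame({"State": [s for s in ALL_STATES if s not in left_out]})

print(missing_states(demo))  # ['Alaska', 'Florida', 'Georgia']
```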

To get the state with the highest count, we count the rows for each state.

State_row count

We notice that Texas has the highest frequency and Delaware has the lowest.
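The row count per state can be sketched with `value_counts()`. The frame here is an invented stand-in, and the column name `State` is an assumption.

```python
import pandas as pd

# Invented stand-in: only the State column matters for this step.
df = pd.DataFrame({
    "State": ["Texas", "Texas", "Texas", "California", "California", "Delaware"],
})

# Number of rows per state, sorted from most to least frequent.
state_counts = df["State"].value_counts()
print(state_counts)

# State with the most rows and state with the fewest rows.
print(state_counts.idxmax(), state_counts.idxmin())
```

A state's row count reflects how many counties it has and how many months are recorded for them, not how high its unemployment is.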

To see how the unemployment rate changes by year, we first group the dataset by year and compute the average unemployment rate for each year.
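A minimal sketch of that grouping, using invented rates and the assumed column names `Year` and `Rate`:

```python
import pandas as pd

# Invented stand-in values, two observations per year.
df = pd.DataFrame({
    "Year": [1990, 1990, 2000, 2000, 2010, 2010],
    "Rate": [6.0, 5.0, 4.0, 3.0, 10.0, 9.0],
})

# Average unemployment rate for each year.
yearly = df.groupby("Year")["Rate"].mean()
print(yearly)

# Years with the lowest and highest average rate in this sample.
print(yearly.idxmin(), yearly.idxmax())
```

With Matplotlib installed, `yearly.plot()` would draw the yearly averages as a line chart.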

Notice that there was a drop in the unemployment rate in 2000 and a spike in 2010. The drop in 2000 reportedly occurred when, according to a government report, worker-starved companies raised wages and went on a hiring spree that created 340,000 new jobs. According to The Wall Street Journal in 2010, the unemployment rate rose so much because salaries increased during that time and many more people began to look for jobs.

Looking more closely at the states in 2010, we found that California had the highest unemployment rate. During that period the state’s unemployment rate was nearly 3 percentage points higher than the national rate: in August 2010, California’s rate was 12.4% compared to the national 9.6%. The state experienced massive job losses because of the decline in construction, losses larger than those experienced nationally, which explains why its unemployment rate and job losses were well ahead of the national average. Over the whole dataset, shown in figure 2, Arizona has the highest unemployment rate, and this may be due to its high population.

figure 1
figure 2
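The two state rankings (one year only, then the whole period) could be computed roughly as below. The rates are invented and the column names are assumptions. Note that a plain mean weights every county-month equally, so it is not the same as an official state rate.

```python
import pandas as pd

# Invented stand-in rows.
df = pd.DataFrame({
    "Year": [2010, 2010, 2010, 2010, 2000, 2000],
    "State": ["California", "California", "Texas", "Arizona", "Arizona", "Texas"],
    "Rate": [12.0, 13.0, 8.0, 10.0, 14.0, 4.0],
})

# Average rate per state in 2010 only, highest first.
by_state_2010 = (
    df[df["Year"] == 2010]
    .groupby("State")["Rate"]
    .mean()
    .sort_values(ascending=False)
)
print(by_state_2010)

# Average rate per state over every year in the data, highest first.
by_state_all = df.groupby("State")["Rate"].mean().sort_values(ascending=False)
print(by_state_all)
```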

Moving forward, visualizing the unemployment rate per month, we can infer that unemployment increases in the first month of the year, January. According to research, it is the month with the most firings and layoffs.
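The monthly averages behind such a plot can be sketched as follows. We assume the `Month` column holds full English month names; the rates are invented. Reindexing by calendar order keeps the months from being sorted alphabetically.

```python
import pandas as pd

MONTH_ORDER = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Invented stand-in rows covering only three months.
df = pd.DataFrame({
    "Month": ["January", "January", "June", "June", "December"],
    "Rate": [8.0, 7.0, 5.0, 6.0, 6.5],
})

# Average rate per month, put in calendar order; months that are
# absent from the sample become NaN after reindexing and are dropped.
monthly = df.groupby("Month")["Rate"].mean().reindex(MONTH_ORDER).dropna()
print(monthly)

# Month with the highest average rate in this sample.
print(monthly.idxmax())
```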

My complete code is available on my GitHub page. I trust you learnt a thing or two about how to analyze this dataset and draw inferences from it. Share your comments and don’t forget to leave some claps too.


