Life was about as normal as it could be until December 2019, when COVID-19 first made the news out of Wuhan, China. The outbreak began there, with people experiencing shortness of breath and fever.
I remember it was my freshman year in the dorms at CU Boulder. Around the holidays, just before winter break, a lot of people in the dorms were getting sick. I told myself it was just that time of year: cold and flu season. But from a few stories I heard, when people went to the doctor they were told it was not strep, the common cold, or the flu. Nurses and doctors didn’t know what it was. So I wonder: was it something we would soon call COVID-19?
After a very nice and relaxing winter break, I started my second semester at CU. In February 2020, my roommate told me about a new, rapidly spreading virus that was trending on Twitter. A cruise ship was trying to dock in San Francisco, but no port wanted it because passengers aboard were infected with COVID-19. After I heard about COVID-19 and the outbreaks on cruise ships, the next couple of weeks at school felt off and strange. It got harder and harder to focus in my classes, fewer students showed up, and professors were talking about not ‘if we go remote’ but ‘when we go remote’. Everyone sensed that we would soon be remote and sent home; we were just waiting for the school to announce it. Sure enough, on March 11, 2020, CU Boulder sent out an email announcing that remote teaching and learning would begin on Monday, March 16, 2020.
I imagine that holiday travel, both domestic and international, by train, cruise ship, plane, car, and bus, is how the COVID-19 virus entered the United States, and other countries as well. This was the beginning of the global COVID-19 pandemic. For the U.S., I’m curious which states were hit the hardest and which were hit the least. I’m also curious how the numbers changed from 2020 to 2021. Essentially, which states had the biggest increase in COVID-19 cases, and which states had the smallest difference?
I will compare COVID-19 related deaths in 2020 and 2021 and look at how much they increased or decreased. My baseline (control group) is 2020 because those are the numbers from the start of the pandemic. I want to know how each state changed from 2020 to 2021. I’m going to focus strictly on the ‘COVID-19, Underlying Cause’ column for deaths, because I only want to know which states had the most and the fewest deaths where COVID-19 was the ultimate cause of death. My first hypothesis is that New York, California, Illinois, Washington, D.C., Florida, and Texas are the states (in no particular order) that accounted for the highest number of deaths in 2020. I will test this using the 2020 data from the CDC Deaths csv file. My second hypothesis is that California, Texas, and Florida will be the states with the largest increases in COVID-19 deaths from 2020 to 2021.
First, I filtered the CDC Deaths DataFrame to show only the data for 2020 and 2021. Since I only want to know which states had the most COVID-19 deaths, I focused on just the ‘State’ and ‘COVID-19, Underlying Cause’ columns. I then performed a groupby (on ‘State’) and an aggregate function (on ‘COVID-19, Underlying Cause’) using sum, because I just wanted the total deaths per state for 2020 and for 2021.
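The filter, groupby, and sum steps above can be sketched roughly as follows. This is only an illustration under assumptions: the ‘Year’ column name is a guess at how the CDC file marks the year, the state names are placeholders, and the numbers are made up rather than taken from the real data.

```python
import pandas as pd

CAUSE = "COVID-19, Underlying Cause"

# Made-up stand-in for the CDC Deaths csv (hypothetical values and a
# hypothetical 'Year' column; the real file may be laid out differently).
cdc_deaths = pd.DataFrame({
    "Year": [2019, 2020, 2020, 2020, 2021, 2021, 2021],
    "State": ["A", "A", "A", "B", "A", "B", "B"],
    CAUSE: [0, 10, 5, 7, 20, 3, 4],
})


def deaths_by_state(frame, year):
    """Total COVID-19 underlying-cause deaths per state for one year."""
    # Keep only the rows for the requested year and the two columns of interest.
    one_year = frame.loc[frame["Year"] == year, ["State", CAUSE]]
    # Sum deaths per state, then list the states from most to fewest deaths.
    return (
        one_year.groupby("State", as_index=False)
        .agg({CAUSE: "sum"})
        .sort_values(CAUSE, ascending=False)
        .reset_index(drop=True)
    )


deaths_2020 = deaths_by_state(cdc_deaths, 2020)
deaths_2021 = deaths_by_state(cdc_deaths, 2021)
print(deaths_2020)
```

Sorting from highest to lowest makes it easy to read off the top and bottom states with `head()` and `tail()`.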
My first hypothesis was partially right. California was #1, Texas #2, and Florida #3 for the highest number of COVID-19 deaths. However, Illinois ranked #6, with about half the number of deaths of California and Texas. I expected California, Texas, and Florida to have high numbers because those states have very large populations. New York ranked #7, which I found a little surprising. I thought New York would have a higher number, but I was thinking of New York City specifically when I first made my hypothesis. I was thinking the same way about Washington, D.C., which is actually in the bottom 10 for COVID-19 deaths.
Then, I renamed the columns so I could later place the ‘2021 Deaths’ numbers next to the ‘2020 Deaths’ numbers in one DataFrame. After that, I added another column called ‘Difference 2020–2021’ to the DataFrame containing both the 2020 and 2021 data.
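A minimal sketch of the rename, combine, and difference steps is below. The per-state totals are invented placeholders, and using `merge` on ‘State’ is one possible way to put the two years side by side, not necessarily how it was done for this post. With the default inner join, a state missing from either year would be dropped.

```python
import pandas as pd

CAUSE = "COVID-19, Underlying Cause"
DIFF = "Difference 2020–2021"

# Made-up per-state totals for each year (placeholder states and values).
totals_2020 = pd.DataFrame({"State": ["A", "B", "C"], CAUSE: [100, 50, 30]})
totals_2021 = pd.DataFrame({"State": ["A", "B", "C"], CAUSE: [160, 20, 30]})

# Give each year's total its own column name so the two can sit side by side.
renamed_2020 = totals_2020.rename(columns={CAUSE: "2020 Deaths"})
renamed_2021 = totals_2021.rename(columns={CAUSE: "2021 Deaths"})

# Line the two years up by state.
combined = renamed_2020.merge(renamed_2021, on="State")

# Positive values mean more deaths in 2021 than in 2020.
combined[DIFF] = combined["2021 Deaths"] - combined["2020 Deaths"]

# Largest increase first, largest decrease last.
ranked = combined.sort_values(DIFF, ascending=False).reset_index(drop=True)
print(ranked)
```

Subtracting 2020 from 2021 keeps the sign meaningful: increases are positive and decreases are negative.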
My second hypothesis, though, was correct. California, Texas, and Florida had the largest increases in COVID-19 deaths from 2020 to 2021.
For my visualization, I tried combining my ‘2020 Deaths’ and ‘2021 Deaths’ columns into a single graph showing the difference in deaths for each state. Instead, I ended up with two graphs, one per year, showing how the number of deaths differs between states.
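One possible way to get both years into a single chart is a grouped bar chart drawn straight from the combined DataFrame. This is a sketch rather than the code behind the graphs in this post: it assumes matplotlib is installed, and the states and values are invented placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import pandas as pd

# Made-up combined table (placeholder states and values).
combined = pd.DataFrame({
    "State": ["A", "B", "C"],
    "2020 Deaths": [100, 50, 30],
    "2021 Deaths": [160, 20, 30],
})

# One group of bars per state, one bar per year within each group.
ax = combined.set_index("State")[["2020 Deaths", "2021 Deaths"]].plot.bar(
    figsize=(8, 4)
)
ax.set_xlabel("State")
ax.set_ylabel("COVID-19 deaths")
ax.set_title("COVID-19 deaths by state, 2020 vs. 2021")
ax.figure.tight_layout()
```

Calling `ax.figure.savefig(...)` with a file name would write the chart to disk. With all of the states on one axis the labels may get crowded, so a wider figure or a horizontal version (`plot.barh`) might read better.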
From my visualizations you can see which states had bigger increases or decreases in COVID-19 deaths from 2020 to 2021. For example, Texas had the largest increase in deaths from 2020 to 2021. Florida’s deaths also increased, by 15,140, from 19,705 (2020) to 34,845 (2021). New Jersey’s deaths decreased by 8,613, from 16,806 (2020) to 8,193 (2021), dropping from over 15,000 to under 10,000. Colorado’s deaths increased by 603 from 4,470 (2020) to 5,703 (2021). Puerto Rico stayed the same at 1,373 deaths in both years, and Vermont stayed at the bottom, decreasing from 96 deaths in 2020 to 84 deaths in 2021. These visuals partially support my first hypothesis, with CA, TX, and FL being the states with the most deaths in 2020, and fully support my second hypothesis that these three states had the largest increases in deaths.