New Year’s Eve Resolutions 2015

Photo by Sincerely Media on Unsplash

Phase 1: Ask

As part of my resolution to complete at least ten projects and five case studies this year, I thought the best way to begin would be a project on New Year’s Eve resolution tweets from 2015. Some of the questions guiding this project are:

  1. What is the most popular resolution category? Least popular?
  2. Which resolution category was retweeted the most? Least?
  3. What was the most popular hour of the day to tweet? How many resolutions were tweeted at those times and in total?
  4. Which U.S. state tweeted the highest number of NYE resolutions?

Purpose

By answering these questions, I hope to find some interesting insights into the mindset of Americans on NYE 2015.

Phase 2: Prepare

Data location

Maven Analytics’ Data Playground is the source of the data used for this project.

Data organisation

The data set contains New Year’s Eve resolution tweets from 2015. Each record represents a single tweet and contains information about the tweet’s date and time, geographic location, original text, and resolution category.

Data quality

Data quality was assessed using the ROCCC criteria to determine the credibility and level of bias of the data. ROCCC is an acronym that stands for reliable, original, comprehensive, current and cited. The results of the assessment are as follows:

  • Reliable: Initial exploration of the data revealed the following reliability issues:
      • Some fields are redundant and won’t be used for the analysis.
      • The retweet count has a lot of NULL values.
  • Original: The data is third-party data from data.world.
  • Comprehensive: The data is complete, and its content is relevant to the analysis.
  • Current: The data is not current. It was created six years ago by CrowdFlower.
  • Cited: The data is cited by Maven Analytics.
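The NULL check on the retweet count mentioned above can be sketched roughly as follows. This is a minimal illustration and not the SQL from my GitHub: the table name `tweets`, the column names and the toy rows are all assumptions made up for the example, and Python's built-in `sqlite3` module is used only so the query can run anywhere.

```python
import sqlite3

# Made-up rows for illustration only; they are not taken from the real data set.
# Columns (assumed names): tweet_id, resolution_category, retweet_count
TOY_ROWS = [
    (1, "Personal Growth", 3),
    (2, "Health & Fitness", None),
    (3, "Humor", None),
    (4, "Career", 0),
]


def null_retweet_summary(rows):
    """Return (total_rows, rows_with_null_retweet_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets ("
        "tweet_id INTEGER, resolution_category TEXT, retweet_count INTEGER)"
    )
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", rows)
    # COUNT(*) counts every row; COUNT(column) skips NULLs,
    # so the difference is the number of NULL retweet counts.
    total, nulls = conn.execute(
        "SELECT COUNT(*), COUNT(*) - COUNT(retweet_count) FROM tweets"
    ).fetchone()
    conn.close()
    return total, nulls


if __name__ == "__main__":
    total, nulls = null_retweet_summary(TOY_ROWS)
    print(f"{nulls} of {total} rows have a NULL retweet count")
```

Note that a retweet count of 0 is a real value and is not counted as NULL here, which matters when deciding how to treat the missing values.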

Data ethics

Data integrity

Most of the data is complete, accurate and consistent, with minor omissions.

Data relevance

The data contains information relevant to completing the project.

Phase 3: Processing

The guiding questions informed all changes to the data, and these changes were tracked in the change log.

Phase 4: Analysis

This section contains the results from the exploratory data analysis. The SQL code used to perform the analysis can be found on my GitHub.

Discoveries and Surprises

  1. The most popular resolution category was personal growth, and the least popular was philanthropic.
  2. The most retweeted resolution category was personal growth, and the least retweeted was time management/organisation.
  3. The most popular time of day to tweet was 9:00 am.
  4. Most tweets came from California.
  5. Females tweeted about NYE resolutions more than males.
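For readers curious how rankings like the ones above can be produced, here is a hedged sketch of a per-category count and retweet total. It is not the query from my GitHub: the table name `tweets`, the column names and the toy rows are assumptions invented for the example, and the query is run through Python's built-in `sqlite3` module.

```python
import sqlite3

# Made-up rows for illustration only; they are not taken from the real data set.
# Columns (assumed names): resolution_category, retweet_count
TOY_TWEETS = [
    ("Personal Growth", 2),
    ("Personal Growth", None),
    ("Health & Fitness", 1),
    ("Philanthropic", None),
]


def category_summary(rows):
    """Return (category, tweet_count, total_retweets), most tweeted first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets (resolution_category TEXT, retweet_count INTEGER)"
    )
    conn.executemany("INSERT INTO tweets VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT resolution_category,
               COUNT(*) AS tweet_count,
               -- SUM ignores NULLs; COALESCE turns an all-NULL group into 0
               COALESCE(SUM(retweet_count), 0) AS total_retweets
        FROM tweets
        GROUP BY resolution_category
        ORDER BY tweet_count DESC, resolution_category ASC
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for category, tweet_count, total_retweets in category_summary(TOY_TWEETS):
        print(category, tweet_count, total_retweets)
```

The first and last rows of the result give the most and least popular categories. Sorting by `total_retweets` instead would answer the retweet question. Treating NULL retweet counts as zero is a design choice in this sketch, not something the data set dictates.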

Trends and Patterns

  1. Most tweets were made between the early morning and noon of New Year’s Eve.
  2. The majority of the tweets were posted during the week. People tweeted less about resolutions on the weekend.
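The hour-of-day and weekday patterns above can be explored with a query along these lines. This is only a sketch under stated assumptions: the table `tweets`, the column `tweet_created` and the timestamps are invented for the example, and the timestamps are assumed to be stored as ISO text ("YYYY-MM-DD HH:MM:SS") so that SQLite's `strftime` can parse them. The real data may need its dates converted first.

```python
import sqlite3

# Made-up timestamps for illustration only; not rows from the real data set.
TOY_TIMESTAMPS = [
    ("2014-12-31 09:05:00",),  # a Wednesday
    ("2014-12-31 09:40:00",),
    ("2014-12-31 11:15:00",),
    ("2015-01-03 20:30:00",),  # a Saturday
]


def load(rows):
    """Create an in-memory table holding the given timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (tweet_created TEXT)")
    conn.executemany("INSERT INTO tweets VALUES (?)", rows)
    return conn


def tweets_by_hour(conn):
    """Return (hour, tweet_count) pairs, busiest hour first."""
    return conn.execute(
        """
        SELECT CAST(strftime('%H', tweet_created) AS INTEGER) AS tweet_hour,
               COUNT(*) AS tweet_count
        FROM tweets
        GROUP BY tweet_hour
        ORDER BY tweet_count DESC, tweet_hour ASC
        """
    ).fetchall()


def tweets_by_day_type(conn):
    """Return tweet counts split into 'weekday' and 'weekend'."""
    # strftime('%w') yields '0' (Sunday) through '6' (Saturday).
    return conn.execute(
        """
        SELECT CASE WHEN strftime('%w', tweet_created) IN ('0', '6')
                    THEN 'weekend' ELSE 'weekday' END AS day_type,
               COUNT(*) AS tweet_count
        FROM tweets
        GROUP BY day_type
        ORDER BY day_type ASC
        """
    ).fetchall()


if __name__ == "__main__":
    conn = load(TOY_TIMESTAMPS)
    print(tweets_by_hour(conn))
    print(tweets_by_day_type(conn))
    conn.close()
```

Other database engines use different date functions (for example `EXTRACT` or `DATEPART`), so the `strftime` calls would need adapting outside SQLite.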

Phase 5: Sharing

The interactive dashboard developed for this project can be found on my Tableau profile. An image of the dashboard is shown below.

New Year’s Eve Resolution Tweets (2015)

Phase 6: Conclusion

Though the data set isn’t current, it offers a nice glimpse into the mindset of the American people heading into 2015. I found it interesting how rarely people tweeted on the weekends compared to weekdays. Maybe people use Twitter to vent during the week and use the weekend to relax.

If you’ve made it this far, thank you for taking the time. I appreciate it, and if you’d like to collaborate or just connect, don’t hesitate to reach out!

A budding data enthusiast and explorer, looking to grow and collaborate! You can check out my portfolio here: https://noelogbuagu.github.io/

Obinna ogbuagu
