New Year’s Eve Resolutions 2015
Phase 1: Ask
As part of my resolution to complete at least ten projects and five case studies this year, I thought the best way to begin would be a project on New Year’s Eve resolution tweets from 2015. Some of the questions guiding this project are:
- What is the most popular resolution category? Least popular?
- Which resolution category was retweeted the most? Least?
- What was the most popular hour of the day to tweet? How many resolutions were tweeted at those times and in total?
- Which U.S. state tweeted the highest number of NYE resolutions?
By answering these questions, I hope to find some interesting insights into the mindset of Americans on NYE 2015.
Phase 2: Prepare
The data used for this project comes from the Maven Analytics Data Playground.
The dataset contains New Year’s Eve resolution tweets for 2015. Each record represents a single tweet and contains information about the tweet’s date and time, geographic location, original text, and resolution category.
Data quality was assessed using the ROCCC criteria, together with a check of the data’s credibility and level of bias. ROCCC is an acronym that stands for reliable, original, comprehensive, current and cited. The results of the assessment are as follows:
- Reliable: Initial exploration of the data revealed two reliability issues: some redundant fields won’t be used for the analysis, and the retweet count has a lot of NULL values.
- Original: The data is third-party data from data.world.
- Comprehensive: The data is complete, and its content is relevant to the analysis.
- Current: The data is not current. It was created six years ago by CrowdFlower.
- Cited: The data is cited by Maven Analytics.
- Licensing: The data is made available under the Creative Commons Attribution 4.0 International License.
- Privacy: The data is anonymised. No names or addresses can link the data to the Twitter account holders.
- Security: The data is secured by data.world.
- Accessibility: Under the Creative Commons Attribution 4.0 International License, the data is open and accessible.
The majority of the data is complete, accurate and consistent with minor omissions.
The data contains information relevant to completing the project.
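The NULL retweet counts noted under reliability can be measured before deciding how to handle them. The snippet below is a minimal sketch of such a check, run through Python's built-in sqlite3 module. The table name `tweets`, the column names, and the four rows are assumptions made up for illustration; they are not the real dataset or its results.

```python
import sqlite3

# Toy stand-in for the tweets table. Table name, column names, and rows are
# invented for illustration and are not taken from the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER, tweet_category TEXT, retweet_count INTEGER)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        (1, "Personal Growth", 3),
        (2, "Humor", None),
        (3, "Career", None),
        (4, "Health & Fitness", 0),
    ],
)

# COUNT(*) counts every row, while COUNT(column) skips NULLs,
# so the difference is the number of NULL retweet counts.
total_rows, null_retweets = conn.execute(
    "SELECT COUNT(*), COUNT(*) - COUNT(retweet_count) FROM tweets"
).fetchone()

null_share = null_retweets / total_rows
print(f"{null_retweets} of {total_rows} rows have no retweet count ({null_share:.0%})")
```

A check like this makes it easier to judge whether NULLs can reasonably be treated as zero retweets or whether the retweet findings should be read with caution.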
Phase 3: Processing
The guiding questions informed all changes to the data, and these changes were tracked in the change log.
Phase 4: Analysis
This section contains the results from the exploratory data analysis. The SQL code used to perform the analysis can be found on my GitHub.
Discoveries and Surprises
- The most popular tweet category is personal growth, and the least popular is philanthropic.
- The most retweeted tweet category was personal growth, and the least retweeted was time management/organisation.
- The most popular time of day to tweet was at 9:00 am.
- Most tweets were tweeted from California.
- Females tweeted about NYE resolutions more than males.
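Findings like the ones above usually come from simple GROUP BY queries. The sketch below shows the general shape such queries might take, again using Python's sqlite3 module. It is not the SQL from my GitHub: the column names (`tweet_created`, `tweet_category`, `tweet_state`, `retweet_count`), the ISO timestamp format, and the five rows are assumptions for illustration, and NULL retweet counts are treated as zero here, which is itself an assumption.

```python
import sqlite3

# Toy stand-in for the tweets table. Column names and rows are invented
# for illustration and are not taken from the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "tweet_created TEXT, tweet_category TEXT, tweet_state TEXT, retweet_count INTEGER)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        ("2014-12-31 09:05:00", "Personal Growth", "CA", 2),
        ("2014-12-31 09:40:00", "Personal Growth", "CA", None),
        ("2014-12-31 11:15:00", "Humor", "NY", 1),
        ("2015-01-01 09:30:00", "Philanthropic", "TX", None),
        ("2014-12-31 20:00:00", "Personal Growth", "CA", 0),
    ],
)

# Tweets per category, most popular first (ties broken alphabetically).
category_counts = conn.execute(
    "SELECT tweet_category, COUNT(*) AS n_tweets FROM tweets "
    "GROUP BY tweet_category ORDER BY n_tweets DESC, tweet_category"
).fetchall()

# Retweets per category; COALESCE treats a NULL retweet count as 0 (an assumption).
retweets_by_category = conn.execute(
    "SELECT tweet_category, SUM(COALESCE(retweet_count, 0)) AS retweets FROM tweets "
    "GROUP BY tweet_category ORDER BY retweets DESC, tweet_category"
).fetchall()

# Busiest hour of the day: strftime('%H', ...) extracts the hour from an ISO timestamp.
busiest_hour = conn.execute(
    "SELECT CAST(strftime('%H', tweet_created) AS INTEGER) AS tweet_hour, "
    "COUNT(*) AS n_tweets FROM tweets "
    "GROUP BY tweet_hour ORDER BY n_tweets DESC, tweet_hour LIMIT 1"
).fetchone()

# State with the most tweets.
top_state = conn.execute(
    "SELECT tweet_state, COUNT(*) AS n_tweets FROM tweets "
    "GROUP BY tweet_state ORDER BY n_tweets DESC, tweet_state LIMIT 1"
).fetchone()

print(category_counts, retweets_by_category, busiest_hour, top_state)
```

The first and last rows of `category_counts` and `retweets_by_category` give the most and least popular categories, and the same pattern extends to other columns such as gender.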
Trends and Patterns
- Most tweets were made between the early morning and noon of New Year’s Eve.
- The majority of the tweets were posted during the week. People tweeted less about resolutions on the weekend.
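The weekday versus weekend comparison can be sketched in the same way. The example below is a hedged illustration, not the original query: the `tweet_created` column, its ISO format, and the five rows are assumptions. It also reports tweets per distinct day, since a week has five weekdays and only two weekend days, so raw totals alone can overstate the gap.

```python
import sqlite3

# Toy stand-in for the tweets table. Column name and rows are invented
# for illustration and are not taken from the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (tweet_created TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?)",
    [
        ("2014-12-31 09:05:00",),  # Wednesday
        ("2014-12-31 13:20:00",),  # Wednesday
        ("2015-01-01 10:00:00",),  # Thursday
        ("2015-01-02 08:45:00",),  # Friday
        ("2015-01-03 11:30:00",),  # Saturday
    ],
)

# strftime('%w', ...) returns the day of week as text: '0' is Sunday, '6' is Saturday.
rows = conn.execute(
    "SELECT CASE WHEN strftime('%w', tweet_created) IN ('0', '6') "
    "THEN 'weekend' ELSE 'weekday' END AS day_type, "
    "COUNT(*) AS n_tweets, "
    "COUNT(*) * 1.0 / COUNT(DISTINCT date(tweet_created)) AS tweets_per_day "
    "FROM tweets GROUP BY day_type ORDER BY day_type"
).fetchall()

# Map each day type to (total tweets, average tweets per distinct day).
summary = {day_type: (n_tweets, per_day) for day_type, n_tweets, per_day in rows}
print(summary)
```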
Phase 5: Sharing
The interactive dashboard developed for this project can be found on my Tableau profile. An image of the dashboard can also be found below.
Phase 6: Conclusion
Though the data set isn’t current, it offers a nice glimpse into the mindset of the American people heading into 2015. I found it interesting how rarely people tweeted on the weekends compared to weekdays. Maybe Twitter really just helps people vent during the week, and they use the weekend to relax.
If you’ve made it this far, thank you for taking the time. I appreciate it, and if you’d like to collaborate or just connect, don’t hesitate to reach out!