Birappa Goudanavar
Jul 1, 2021

5 Data Analytics projects for beginners

As an aspiring data analyst, you’ll want to demonstrate a few key skills in your portfolio. These project ideas reflect tasks that are fundamental to many data analyst roles.

1. Web scraping

While you’ll find no shortage of excellent (and free) public data sets on the internet, you might want to show prospective employers that you’re able to find and scrape your own data as well. Plus, knowing how to scrape web data means you can find and use data sets that match your interests, regardless of whether or not they’ve already been compiled.
If you know some Python, you can use tools like Beautiful Soup or Scrapy to crawl the web for interesting data. If you don’t know how to code, don’t worry. You’ll also find several tools that automate the process (many offer a free trial), like Octoparse or ParseHub.
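If you take the Python route, the heart of a scraper is parsing HTML and pulling out the elements you care about. Here is a minimal sketch of that step, not a full scraper. To keep it runnable without installing anything, it uses Python’s built-in html.parser module in place of Beautiful Soup or Scrapy, and it parses a small made-up HTML snippet instead of fetching a real page. The SAMPLE_HTML content, the job-title class name, and the extract_job_titles helper are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Made-up HTML standing in for a downloaded job-listings page
# (an assumption for this sketch, not a real site's markup).
SAMPLE_HTML = """
<ul>
  <li><h2 class="job-title">Data Analyst</h2><span class="city">Austin</span></li>
  <li><h2 class="job-title">Junior Data Analyst</h2><span class="city">Denver</span></li>
  <li><h2 class="job-title">BI Analyst</h2><span class="city">Boston</span></li>
</ul>
"""


class JobTitleParser(HTMLParser):
    """Collects the text inside every <h2 class="job-title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "h2" and ("class", "job-title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a matching <h2>.
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def extract_job_titles(html):
    """Hypothetical helper: return all job titles found in the HTML string."""
    parser = JobTitleParser()
    parser.feed(html)
    return parser.titles


titles = extract_job_titles(SAMPLE_HTML)
print(titles)
```

In a real project you would download the page first and would likely reach for Beautiful Soup, which makes this kind of element lookup much shorter.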
If you’re unsure where to start, here are some websites with interesting data options to inspire your project:

Reddit

Wikipedia

Job portals

Tip: Anytime you’re scraping data from the internet, remember to respect and abide by each website’s terms of service. Limit your scraping activities so as not to overwhelm a company’s servers, and always cite your sources when you present your data findings in your portfolio.

Example web scraping project: Todd W. Schneider of Wedding Crunchers scraped some 60,000 New York Times wedding announcements published from 1981 to 2016 to measure the frequency of specific phrases.

2. Data cleaning

A significant part of your role as a data analyst is cleaning data to make it ready to analyze. Data cleaning (also called data scrubbing) is the process of removing incorrect and duplicate data, handling any gaps (missing values) in the data, and making sure the formatting of data is consistent.
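To make those cleaning steps concrete, here is a minimal sketch using only Python’s standard library. The sample data, the column names, and the clean_rows helper are made up for illustration, and the rule that a negative age is incorrect is an assumption about this toy data set. In practice many analysts use a library such as pandas for the same steps.

```python
import csv
import io

# Made-up "dirty" data (an assumption for this sketch): a duplicate row,
# inconsistent capitalization and whitespace, a missing age, and an impossible age.
RAW_CSV = """name,city,age
Ana, austin ,34
ana,Austin,34
Ben,DENVER,
Cara,Boston,-5
Dev,boston,41
"""


def clean_rows(raw_csv):
    """Hypothetical helper: return cleaned rows as a list of dictionaries."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Make formatting consistent: trim whitespace and use title case.
        name = row["name"].strip().title()
        city = row["city"].strip().title()

        # Handle missing values: an empty age becomes None.
        age_text = row["age"].strip()
        age = int(age_text) if age_text else None

        # Remove incorrect data: a negative age cannot be right.
        if age is not None and age < 0:
            continue

        # Remove duplicates, compared after the formatting fixes above.
        key = (name, city, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": age})
    return cleaned


cleaned = clean_rows(RAW_CSV)
for row in cleaned:
    print(row)
```

Note that the duplicate only becomes visible after the formatting is normalized, which is why the order of the steps matters.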
As you look for a data set to practice cleaning, look for one that includes multiple files gathered from multiple sources without much curation. Some sites where you can find “dirty” data sets to work with include:

CDC Wonder

Data.gov

World Bank

Data.world

/r/datasets

3. Exploratory data analysis (EDA)

Data analysis is all about answering questions with data. Exploratory data analysis, or EDA for short, helps you explore what questions to ask. This could be done separately from or in conjunction with data cleaning. Either way, you’ll want to accomplish the following during these early investigations:

Ask lots of questions about the data.

Discover the underlying structure of the data.

Look for trends, patterns, and anomalies in the data.

Test hypotheses and validate assumptions about the data.

Think about what problems you could potentially solve with the data.
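A first pass at the structure and anomalies of a data set often starts with summary statistics. The sketch below is one simple way to do that with Python’s built-in statistics module. The temperature readings are made up, the summarize and find_outliers helpers are hypothetical, and the cutoff of two standard deviations is an arbitrary choice for illustration, not a standard rule.

```python
import statistics

# Made-up daily high temperatures in degrees Celsius (an assumption for this sketch).
daily_highs = [21.0, 22.5, 23.0, 21.5, 22.0, 35.0, 22.5, 21.0, 23.5, 22.0]


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def find_outliers(values, threshold=2.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]


summary = summarize(daily_highs)
outliers = find_outliers(daily_highs)
print(summary)
print("Possible anomalies:", outliers)
```

Here the 35.0 reading stands out from the rest, which raises exactly the kind of question EDA is meant to surface: is it a heat wave or a data entry error?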

10 free public datasets for EDA

An EDA project is an excellent time to take advantage of the wealth of public datasets available online. Here are 10 fun and free datasets to get you started in your explorations.
1. National Centers for Environmental Information: Dig into the world’s largest provider of weather and climate data.
2. World Happiness Report 2021: What makes the world’s happiest countries so happy? 
3. NASA: If you’re interested in space and earth science, see what you can find among the tens of thousands of public datasets made available by NASA.
4. US Census: Learn more about the people and economy of the United States with the latest census data from 2020.
5. FBI Crime Data Explorer (CDE): Explore crime data collected by more than 18,000 law enforcement agencies.
6. World Health Organization COVID-19 Dashboard: Track the latest coronavirus numbers by country or WHO region.
7. Latest Netflix Data: This Kaggle dataset (updated in April 2021) includes movie data broken down into 26 attributes.
8. Google Books Ngram: Download the raw data from the Google Books Ngram to explore phrase trends in books published from 1960 to 2015.
9. NYC Open Data: Discover New York City through its many publicly available datasets on topics ranging from the Central Park squirrel population to motor vehicle collisions.
10. Yelp Open Dataset: See what you can find while exploring this collection of Yelp user reviews, check-ins, and business attributes.

4. Sentiment analysis

Sentiment analysis, typically performed on textual data, is a technique in natural language processing (NLP) for determining whether data is neutral, positive, or negative. It may also be used to detect a particular emotion based on a list of words and their corresponding emotions (known as a lexicon). 
This type of analysis works well with public review sites and social media platforms, where people are likely to offer public opinions on various subjects.
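The lexicon idea can be sketched in a few lines of Python. This is a deliberately simplified illustration: the LEXICON dictionary is a tiny hand-made word list, the classify_sentiment helper is hypothetical, and the sample reviews are invented. Real lexicons contain thousands of scored words, and this approach ignores negation, so "not good" would still count as positive.

```python
import re

# Tiny hand-made lexicon (an assumption for this sketch).
# Positive words score 1 and negative words score -1.
LEXICON = {
    "great": 1, "love": 1, "excellent": 1, "good": 1,
    "bad": -1, "terrible": -1, "hate": -1, "broken": -1,
}


def classify_sentiment(text):
    """Label text by summing the lexicon scores of its words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented example reviews.
reviews = [
    "I love this blender, it works great!",
    "Terrible battery and the screen arrived broken.",
    "The package arrived on Tuesday.",
]
labels = [classify_sentiment(review) for review in reviews]
for review, label in zip(reviews, labels):
    print(label, "-", review)
```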
To start exploring how people feel about a certain topic, try sites like:

Amazon (product reviews)

Rotten Tomatoes (movie reviews)

Facebook

Twitter

News sites

5. Data visualization

Humans are visual creatures. This makes data visualization a powerful tool for transforming data into a compelling story to encourage action. Great visualizations are not only fun to create, but they can also make your portfolio look beautiful.

Birappa Goudanavar

Data Engineer at Alcon | Freelance Data Guru, Helping You Unlock Insights