Identifying Good Data Sources

Stacie Kipruto
2 min read · Jun 15, 2022


Photo by Artem Maltsev on Unsplash

As I continue my data science journey, I wanted a firmer grasp of the fundamentals, so I enrolled in the Google Data Analytics Professional Certificate.

I attended a datathon a week ago, and my teammates and I had a hard time acquiring data ethically. Most of the data sets came from third-party sources, while the rest came from social media, which is not a reliable data source. It is extremely important to look for good data, because bad data can have long-lasting impacts on business decisions and on how processes run. A good solution starts with avoiding bad data.

Tips to identify a good data source.

The third course of the Google Data Analytics Professional Certificate focuses on preparing data for exploration. The course uses the ROCCC system to identify good data sources.

R — Reliable: Is the data source reliable? Has it been vetted and proven fit for use?

O — Original: Is the data from a second or third party source? Are you able to validate the data with the original source?

C — Comprehensive: Does the data source contain all the critical information needed to answer the question or find a solution?

C — Current: Does the data decrease in usefulness as time goes by?

C — Cited: Who created the data? Is it part of a credible organization? When was the data last refreshed?
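To make the checklist easier to apply, here is a minimal Python sketch that records a yes/no answer for each ROCCC question and reports which criteria a source fails. The class name `RocccCheck`, its field names, and the example source are my own illustration and are not part of the course material.

```python
from dataclasses import dataclass, fields


@dataclass
class RocccCheck:
    """Yes/no answers to the five ROCCC questions for one data source."""
    reliable: bool       # vetted and proven fit for use
    original: bool       # can be validated against the original source
    comprehensive: bool  # contains the critical information needed
    current: bool        # recent enough to still be useful
    cited: bool          # creator and last refresh date are known

    def failed(self) -> list[str]:
        """Return the names of the criteria this source does not meet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passes(self) -> bool:
        """A source ROCCCs only if it meets all five criteria."""
        return not self.failed()


# Hypothetical example: a data set scraped from social media.
social_media = RocccCheck(
    reliable=False, original=False, comprehensive=True, current=True, cited=False
)
print(social_media.passes())  # False
print(social_media.failed())  # ['reliable', 'original', 'cited']
```

Treating every criterion as required is a deliberate simplification. In practice you might accept a source that misses one criterion if you can compensate for it, for example by validating a third-party data set against its original source.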

In a nutshell, good data should ROCCC! The life of an analyst involves asking a lot of questions, and going forward this acronym will help me choose the data for all my analyses.
