A Mortal’s Guide to Data Science Life Cycle

Charmie Ranodya
Published in Nerd For Tech
7 min read · Apr 14, 2023

Explained to a five-year-old kid.

Photo by Robo Wunderkind on Unsplash

Welcome back to the series “A Mortal’s Guide to Data Science.” In the previous article, we learned what data science is and why it is important. If you haven’t read it yet, you can find it here:

https://medium.com/sliitwif/a-mortals-guide-to-data-science-139407487414

In this article, we will learn how data science works and what steps are involved in a data science project. We will use simple examples and fun analogies to make it easy and enjoyable for kids (and adults) of all ages.

So, let’s get started!

What is a Data Science Project?

Photo by Tingey Injury Law Firm on Unsplash

A data science project is a way of using computers and math to answer questions and make decisions based on data. Data is any kind of information that can be measured, such as numbers, words, pictures, sounds, or even emojis. 🤓

For example, you might want to know how many books you can read in a year, which ice cream flavor is most popular among your friends, or how to make your own video game. These are all examples of questions that data science can help you with.

But how do you go from a question to an answer using data? That’s where the data science life cycle comes in.

What is the Data Science Life Cycle?

Photo by Firmbee.com on Unsplash

The data science life cycle is a process that involves many steps and skills. You can think of it as a life cycle, like the life cycle of a butterfly. A butterfly starts as an egg, then becomes a caterpillar, then a chrysalis, and finally a beautiful butterfly. Similarly, data science starts with a question and goes through different stages until it reaches a solution.

The data science life cycle has six stages:

1. Problem Definition

2. Data Collection

3. Data Processing

4. Data Analysis

5. Data Visualization

6. Data Interpretation

These are not linear stages. You don’t have to follow them in order or finish one before starting another. You can go back and forth between them as needed. The idea is to be flexible and agile and adapt to the problem and the data.

Let’s see what each stage does and how it works.

Stage 1: Problem Definition

Photo by Emily Morter on Unsplash

The first stage of data science is to define the problem that you want to solve or the question that you want to answer. This is the most important stage because it sets the direction and goal for the rest of the project.

To define the problem, you need to do some research and talk to people who know about it. You also need to think about why the problem is important and what value it has for you or others. You need to identify the risks and challenges that might come up along the way. And you need to plan how you will approach the problem and what resources you will need.

For example, if you want to know how many books you can read in a year, you might ask yourself:

- Why do I want to know this?

- How will this help me or others?

- What are some possible difficulties or obstacles?

- How will I measure my reading progress?

- What kind of books do I like to read?

- Where can I find books to read?

- How much time do I have to read?

By answering these questions, you will have a clear idea of what your problem is and how you will solve it.

Stage 2: Data Collection

Photo by Mika Baumeister on Unsplash

The second stage of data science is to collect the data that you need to solve the problem or answer the question. Data can come from many sources, such as books, websites, surveys, interviews, observations, experiments, or sensors. You need to choose the best way to get the data that is relevant and reliable for your problem.

For example, if you want to know how many books you can read in a year, you might collect data by:

- Keeping track of the books that you read every month and writing down their titles and pages.

- Asking your friends or family members how many books they read in a year.

- Searching online for statistics or reports on reading habits.

- Visiting a library or a bookstore and browsing their collections.

By collecting data from different sources, you will have more information and evidence to support your answer.
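
If you like computers, the first idea in the list above (keeping track of the books you read) could be sketched in Python roughly like this. It is only a small illustration: the book titles, page counts, and the names `BookRecord`, `reading_log`, and `log_book` are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class BookRecord:
    title: str   # made-up titles, for illustration only
    pages: int   # number of pages in the book
    month: str   # month in which you finished the book


# The reading log starts empty and grows as you finish books.
reading_log = []


def log_book(title, pages, month):
    """Add one finished book to the reading log."""
    reading_log.append(BookRecord(title, pages, month))


log_book("The Sleepy Dragon", 120, "January")
log_book("Robots in the Garden", 200, "January")
log_book("A Trip to the Moon", 95, "February")

print(len(reading_log), "books logged so far")
```

A notebook and a pencil work just as well. The point is to write down the same details for every book, so that the later stages have something consistent to work with.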

Stage 3: Data Processing

Photo by Markus Spiske on Unsplash

The third stage of data science is to process the data that you have collected. Processing means cleaning, organizing, transforming, and summarizing the data so that it is easier to work with and understand. You might use tools like spreadsheets, charts, graphs, tables, or maps to help you with this stage.

For example, if you have collected data on the books that you read every month, you might process the data by:

- Sorting the books by genre, author, or publication date

- Counting how many pages each book has and how long it took you to read it

- Calculating the average reading time per book and per month

- Finding the minimum and maximum reading time and book length

- Comparing your data with the data from other sources

By processing the data, you will have a better overview and understanding of your reading habits and patterns.
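
Here is a small sketch of what some of these processing steps might look like in Python, using only tools that come with the language. The books and numbers are invented sample data, and a real reading log would probably look different.

```python
from statistics import mean

# Made-up sample data: (title, genre, pages, days it took to read)
books = [
    ("The Sleepy Dragon", "fantasy", 120, 4),
    ("Robots in the Garden", "science fiction", 200, 7),
    ("A Trip to the Moon", "science", 95, 3),
    ("The Lost Kitten", "fantasy", 150, 6),
]

# Organize: sort the books by genre first, then by title.
sorted_books = sorted(books, key=lambda book: (book[1], book[0]))

# Summarize: pull out the numbers and compute a few simple statistics.
pages = [book[2] for book in books]
days = [book[3] for book in books]

average_days = mean(days)    # average reading time per book
shortest_book = min(pages)   # fewest pages
longest_book = max(pages)    # most pages

print("Average days per book:", average_days)
print("Shortest book:", shortest_book, "pages")
print("Longest book:", longest_book, "pages")
```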

Stage 4: Data Analysis

Photo by UX Indonesia on Unsplash

The fourth stage of data science is to analyze the data that you have processed. Analysis means finding patterns, trends, relationships, and insights in the data that can help you solve the problem or answer the question. You might use tools like statistics, math formulas, algorithms, or machine learning models to help you with this stage.

For example, if you have analyzed the data on the books that you read every month, you might find out:

- Which genre is your favorite, and which one is your least favorite?

- Which month was your most productive and which one was your least productive?

- How does your reading speed change over time and depending on the book length?

- How do your reading habits compare with those of other people?

- What factors influence your reading choices and preferences?

By analyzing the data, you will discover new and interesting facts about yourself and the books you read.
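
A few of these questions can be answered with very simple counting. The sketch below is one possible way to do it in Python. The data is made up, and "most productive" is assumed here to mean "most pages read", which is only one of several reasonable definitions.

```python
from collections import Counter, defaultdict

# Made-up sample data: (genre, month finished, pages, days to read)
books = [
    ("fantasy", "January", 120, 4),
    ("science fiction", "January", 200, 8),
    ("science", "February", 90, 3),
    ("fantasy", "March", 150, 6),
    ("fantasy", "March", 100, 5),
]

# Which genre shows up most often in the log?
genre_counts = Counter(genre for genre, _, _, _ in books)
favorite_genre, favorite_count = genre_counts.most_common(1)[0]

# Which month had the most pages read? (our assumed meaning of "productive")
pages_per_month = defaultdict(int)
for _, month, pages, _ in books:
    pages_per_month[month] += pages
most_productive_month = max(pages_per_month, key=pages_per_month.get)

# Reading speed, in pages per day, for each book.
speeds = [pages / days for _, _, pages, days in books]

print("Favorite genre:", favorite_genre)
print("Most productive month:", most_productive_month)
print("Pages per day for each book:", speeds)
```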

Stage 5: Data Visualization

Photo by Luke Chesser on Unsplash

The fifth stage of data science is to visualize the data that you have analyzed. Visualization means creating pictures or graphics that show the data and the insights that you have found. You might use tools like charts, graphs, maps, diagrams, or animations to help you with this stage.

For example, if you want to visualize the data on the books that you read every month, you might create:

- A pie chart that shows the percentage of each genre

- A line graph that shows the number of pages and reading time per month

- A scatter plot that shows the relationship between reading speed and book length

- A word cloud that shows the most common words in the titles of the books

- A collage that shows the covers of the books

By visualizing the data, you will make it more attractive and engaging for yourself and others.
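
Pie charts and line graphs are usually drawn with a spreadsheet or a plotting library. As a simpler stand-in, the sketch below draws a bar chart out of plain text characters, using only built-in Python. The numbers are made up, and `text_bar_chart` is a hypothetical helper written just for this example.

```python
# Made-up sample data: pages read in each month
pages_per_month = {"January": 320, "February": 90, "March": 250}


def text_bar_chart(data, pages_per_block=20):
    """Return one text line per label, with one '#' for every `pages_per_block` pages."""
    width = max(len(label) for label in data)  # pad labels so the bars line up
    lines = []
    for label, value in data.items():
        bar = "#" * (value // pages_per_block)
        lines.append(f"{label.ljust(width)} | {bar} {value}")
    return lines


chart = text_bar_chart(pages_per_month)
print("\n".join(chart))
```

Even a chart this simple makes it easy to spot the longest and shortest bars at a glance, which is the whole idea of visualization.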

Stage 6: Data Interpretation

Photo by NEW DATA SERVICES on Unsplash

The sixth and final stage of data science is to interpret the data that you have visualized. Interpretation means explaining what the data and the insights mean and how they relate to the problem or the question. You might use tools like reports, presentations, stories, or recommendations to help you with this stage.

For example, if you want to interpret the data on the books that you read every month, you might:

- Write a report that summarizes your findings and answers your question.

- Make a presentation that highlights your findings and shares them with others.

- Tell a story that showcases your findings and makes them memorable.

- Give a recommendation that suggests how to improve your reading or choose better books.

By interpreting the data, you will communicate your results and conclusions clearly and effectively.
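
To close the loop on the original question (how many books can you read in a year?), here is a tiny sketch that turns a few months of made-up data into a one-sentence report. It assumes the rest of the year will look like the months tracked so far, which is a simplification and may not hold in real life.

```python
# Made-up sample data: books finished in each tracked month
books_per_month = {"January": 2, "February": 1, "March": 2}

months_tracked = len(books_per_month)
books_so_far = sum(books_per_month.values())

# Simple projection: assume every month of the year matches the average so far.
average_per_month = books_so_far / months_tracked
estimated_books_per_year = round(average_per_month * 12)

report = (
    f"I read {books_so_far} books in {months_tracked} months. "
    f"At this pace, I could read about {estimated_books_per_year} books in a year."
)
print(report)
```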

Congratulations! You have completed the data science life cycle and solved your problem or answered your question using data. But remember, data science is not just one thing. It is a process that involves many steps and skills. And it is also a cycle that never ends. You can always learn more from your data and improve your solution or answer. You can also use your data to ask new questions and solve new problems. Data science is like a puzzle that never gets boring. It is always fun and interesting.

So, what are you waiting for? Start your own data science project today and see what you can discover!

Stay tuned for the next article in this series, where we will dive deeper into the data science life cycle and look at some real-world examples of data science projects and how they make a difference in our lives.
