The Meaning of Data Science Through a Rookie’s Eyes

A year ago, like many others, I got interested in this hot field called Data Science. I thought it was cool and it solved problems, so it seemed best to jump on the bandwagon, plus I have plans to build AI and Machine Learning applications. But right from the start I got lost in the unstructured world of Data Science. So many things to learn, so many skills to master. Where should I go? How should I start? What is Data Science anyway? How is it practiced, and what do I really need to thrive in it, without beating about the bush? I didn’t have these answers a year ago. Now I have the gist of what I have to do. I have just started the journey, and I would like to share what I understand Data Science to be.

So What Is Data Science Anyway?

It seems the world is generating far more data than ever before, and the amount keeps growing. Big data is everywhere, and big data seems to come with big questions. Some time ago, statisticians were the ones who dealt with data: they analyzed it and made meaning out of it. But now the amount of data generated in any given period is growing fast. When I say fast, I mean blazing fast. Barry Allen kinda fast.

This means statistics can’t do the job alone. It needs teammates. When dealing with large data, computer science comes into play. Programming sets in. We need to write programs that help us retrieve and digest big data and automate some of the things we do. Then we also need to store this data. Storing and accessing big data comes with its own challenges, so we need special database systems and techniques to hold and manage it.

Also, we know computers are faster at processing information than our minds. They don’t get tired; they are computing beasts. This means we can use their strengths to help us answer many of the big questions out there. That is where a field called Machine Learning comes in. When we apply Machine Learning techniques, we delegate part of the “thinking” to the machine, because the machine can learn from data quickly, find patterns really fast, and give us, in a much shorter time, answers we would otherwise spend hours, maybe years or centuries, looking for. This is a big plus for us.

Humans are visual creatures. This is why, most of the time, answers generated from big data are presented visually, to help us see the whole picture clearly. That is why Data Visualization also plays a big part in the field of Data Science.

So far, I have been talking about big data and how we can process it effectively and efficiently to generate knowledge that solves problems around us. Well, that’s it: Data Science is the field that deals with exactly that.

The Data Science Process

Data Science, as its name suggests, uses a scientific approach to solving problems. When starting a data science project, whether at work or as a hobby, there is a standard process that can help your workflow. However, this process can be iterative and can be customized to suit your needs. The stages are:

1. Identify Your Goal

This is where you figure out why you are taking up the project and what your end game is. Knowing your end game will guide you toward a better solution.

2. Gather Your Data

This is where you retrieve data you are going to use to find the knowledge you need. You can use existing data, or mine your own data from the sources available to you.

3. Prepare Your Data

Sometimes the data you gather doesn’t come in an easy-to-work-with form. There can be errors and inconsistencies in the data. This is the part where you clean it all up. This is a very important stage in the process because, if you don’t get it right, it can lead you in the wrong direction or to the wrong answer.

4. Explore Your Data

One interesting stage. In this part you apply statistics to get a deep understanding of your data and build simple models. It’s all about understanding the data and finding patterns here.

5. Data Modeling

Now you select the technique you want to use to find answers. It can be statistics, machine learning, or something else. Normally, it is a combination of these. You build data models and use them to find solutions that achieve your goal. It’s all about finding relationships between the variables in the data.

6. Presentation/Automation

This is where you visualize the data. You present it in human-readable form. And sometimes you automate the process, if you find yourself repeating the same steps to solve similar problems.
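
To make stages 3 to 5 a bit more concrete, here is a tiny sketch in plain Python. Everything in it is made up for illustration: the hours-studied and exam-score records, and the function names prepare, explore and fit_line, are my own assumptions and not part of any standard toolkit. It cleans a handful of messy rows, summarizes them, and fits a simple straight line. A real project would probably lean on libraries built for this, but the flow is the same.

```python
from statistics import mean, median

# Hypothetical raw records (hours_studied, exam_score), stored as text.
# Some rows are deliberately broken to give the cleaning step work to do.
RAW_ROWS = [
    ("1", "52"), ("2", "55"), ("", "61"), ("4", "64"),
    ("5", "abc"), ("6", "73"), ("8", "81"),
]


def prepare(rows):
    """Stage 3: keep only rows where both values parse as numbers."""
    clean = []
    for hours, score in rows:
        try:
            clean.append((float(hours), float(score)))
        except ValueError:
            # Missing or non-numeric value: skip the row.
            continue
    return clean


def explore(rows):
    """Stage 4: a few summary statistics to get a feel for the data."""
    scores = [score for _, score in rows]
    return {
        "count": len(rows),
        "mean_score": mean(scores),
        "median_score": median(scores),
        "min_score": min(scores),
        "max_score": max(scores),
    }


def fit_line(rows):
    """Stage 5: least-squares line, score = slope * hours + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in rows)
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    data = prepare(RAW_ROWS)
    print(explore(data))
    slope, intercept = fit_line(data)
    # Use the fitted line to estimate a score for 7 hours of study.
    print(slope * 7 + intercept)
```

The point of the sketch is the order of the steps, not the model itself: the two broken rows are dropped before any statistics are computed, which is exactly why the preparation stage matters so much.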

Well, My Journey Has Just Started

I hope to nail it by year end and write a pro’s version of this post. Wish me luck!! I will be writing more on Data Science as I go through my journey. Thank you for your time.