What is Data Science?


The basic concept with a simple explanation.

Data science is one of the hottest topics in the world right now. Data is dominating the world; that is the best way I can put it to emphasize the importance of data science. But it is easy to pick up a wrong idea of data science from the many sources out there. Some say it’s merely statistics and nothing else, but things are a bit different. In this article I discuss data science with the simplest explanation possible.

Mr. X likes to wake up late in the morning during the holidays. Friday and Saturday are his weekly days off. He mostly spends his free time with his family, and sometimes he likes to go on a trip with them. On working days, he is careful to reach his office on time. He arrives at the office at 9:00 a.m. sharp and leaves after 5:00 p.m. Sometimes he goes out at lunchtime, but he does not go too far.

Now, what if I ask you where Mr. X will be next Monday at 4 p.m.? The answer won’t be too difficult. “He’ll be at his office,” you’d say, and yes, you’re almost certainly correct. But how could you say that he will be in the office? Do you know magic? Or you could say that you already have some information above from which you can make predictions about Mr. X. Exactly! You’ve just analyzed the data about Mr. X to predict where he will be on Monday at 4 p.m.

You have a bunch of information about Mr. X. When you were asked the question, your brain used this information, automatically weighed the possible outcomes, and finally came up with the most likely answer. This can be called “data analysis”. Data science is about more than this one answer, but it is a simple example to lead us deeper into the topic.
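The reasoning about Mr. X can be sketched in a few lines of Python. This is only an illustration: the observation log, the location labels, and the function name `predict_location` are all assumptions made up for this example, and "pick the most frequent past location" is just one very simple way to turn past data into a prediction.

```python
from collections import Counter

# Hypothetical observation log for Mr. X: (weekday, hour, location).
# The entries are invented for illustration and follow the routine described above.
observations = [
    ("Monday", 16, "office"),
    ("Monday", 16, "office"),
    ("Monday", 13, "lunch outside"),
    ("Tuesday", 16, "office"),
    ("Friday", 16, "home"),
    ("Saturday", 16, "trip with family"),
    ("Monday", 16, "office"),
]


def predict_location(log, weekday, hour):
    """Return the most frequently observed location for the given slot, or None."""
    matches = Counter(
        place for day, h, place in log if day == weekday and h == hour
    )
    if not matches:
        return None  # no past data for this slot, so no prediction
    return matches.most_common(1)[0][0]


print(predict_location(observations, "Monday", 16))
```

With this made-up log, the Monday 4 p.m. slot has only ever been observed at the office, so that is the prediction. For a slot with no observations the function returns `None` instead of guessing.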

The process of data science is not as simple as this. It requires combined knowledge from many fields, such as mathematics, statistics, computer science/IT, and business studies. These fields provide many methods and tools to extract information from existing raw data. A data scientist works with those methods, applies them to a dataset, and sometimes improves them or finds a new one. Thus the role of a data scientist is not just to analyze a dataset but also to improve existing methods and find new ones.

The figure contains a very basic workflow of a data scientist. Enough talk; now let's go deep into each step of the data scientist's workflow.

1. Data Collection:

This is the first step of a data scientist's workflow, and it’s very easy to understand why: if we do not have any data, we simply cannot analyze anything. So first we have to collect some data. In this process, we must make sure that the collected data is true and accurate, and that it is collected with reliable methods from reliable sources. Unreliable or inconsistent data sometimes leads to a wrong result or conclusion.

One of the most important parts of this first step is also to ask questions. Yes, asking questions is very important in data analysis and in data science. You have the data, and your questions will guide you through the next steps of the process.

2. Data Cleaning:

Once data has been collected according to the questions being asked of it, the next step is data cleaning. Data cleaning covers the techniques that prepare data for analysis. The quality and type of the data can determine the method of analysis. Sometimes changes have to be made to the data, such as removing or modifying incorrect or irrelevant entries.

Data cleaning is considered one of the basic tools of data science. It includes fixing spelling and syntax errors, standardizing data sets, handling missing data, and identifying duplicate values.
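To make these cleaning tasks concrete, here is a minimal sketch using only the Python standard library. The records, the field names, and the choice to fill a missing age with the median are assumptions for illustration; real projects often use a library such as pandas and pick a fill strategy that suits the data.

```python
from statistics import median

# Hypothetical raw records; names and values are made up for illustration.
raw_records = [
    {"name": " Alice ", "city": "dhaka", "age": 30},
    {"name": "Bob", "city": "Dhaka", "age": None},    # missing age
    {"name": "alice", "city": "DHAKA", "age": 30},     # duplicate of the first row
    {"name": "Carol", "city": "Khulna ", "age": 40},
]


def clean(records):
    """Standardize text fields, drop duplicates, and fill missing ages with the median."""
    standardized = []
    seen = set()
    for row in records:
        # Standardize: trim whitespace and use consistent capitalization.
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        key = (name, city, row["age"])
        if key in seen:
            continue  # identical row after standardizing, so treat it as a duplicate
        seen.add(key)
        standardized.append({"name": name, "city": city, "age": row["age"]})

    # Fill missing ages with the median of the known ages (one possible choice).
    known_ages = [r["age"] for r in standardized if r["age"] is not None]
    fill_value = median(known_ages) if known_ages else None
    for r in standardized:
        if r["age"] is None:
            r["age"] = fill_value
    return standardized


cleaned = clean(raw_records)
for row in cleaned:
    print(row)
```

Note that the duplicate only becomes visible after the text fields are standardized, which is why the order of the cleaning steps matters.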

3. Data Visualization:

Data visualization helps us get a better understanding of the data. The analysis very often depends on how the data behaves, and visualization makes that behavior easy to see.

Formally, data visualization transforms the data into graphs or maps, giving a large amount of information a visual context that is easier to understand and from which useful information is easier to extract.

Sometimes it’s really hard to get information from raw data. To overcome this kind of complex situation, data visualization is used to get something out of it. Many companies now use machine learning models for many sorts of analysis. Before and after running machine learning algorithms, data visualization gives a very useful way to understand how the data is behaving and how well the model performs.
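As a small illustration of how a picture can reveal what a raw list hides, the sketch below draws a text histogram with only the Python standard library. The ratings are invented, and `text_histogram` is a hypothetical helper; in practice a plotting library would normally be used instead of text bars.

```python
from collections import Counter

# Hypothetical raw data: customer ratings on a 1-5 scale (invented for illustration).
ratings = [5, 4, 4, 3, 5, 5, 2, 4, 5, 1, 4, 5]


def text_histogram(values):
    """Return one line per distinct value, with a bar of '#' as long as its count."""
    counts = Counter(values)
    return [f"{value} | {'#' * counts[value]}" for value in sorted(counts)]


for line in text_histogram(ratings):
    print(line)
```

Reading the raw list, it is not obvious how the ratings are distributed; in the bars it is immediately visible that most of them sit at 4 and 5.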

4. Data Analysis:

This is one of the most vital steps of the whole process. Analyzing the data provides results, and further steps to solve a problem are taken based on them. Suppose a company has a dataset on its services, from which its profit is measured. By analyzing the data, it is possible to find the causes of losses and a way to minimize them. This is how data analysis tells us how to go forward and what to do in the next steps.

Data analysis includes mathematical and statistical analysis. After a complete analysis, we arrive at a conclusion and proper answers to the questions asked of the data in the first step of the process.
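The company example can be sketched as a tiny analysis. Everything here is assumed for illustration: the service names, the figures, and the helper functions `profit_by_service` and `loss_making`. It only shows the idea of aggregating the data to locate where a loss comes from.

```python
from collections import defaultdict

# Hypothetical transactions: (service, revenue, cost). All figures are invented.
transactions = [
    ("delivery", 120.0, 150.0),
    ("delivery", 100.0, 130.0),
    ("support", 80.0, 50.0),
    ("hosting", 200.0, 120.0),
    ("support", 60.0, 40.0),
]


def profit_by_service(rows):
    """Sum revenue minus cost for each service."""
    totals = defaultdict(float)
    for service, revenue, cost in rows:
        totals[service] += revenue - cost
    return dict(totals)


def loss_making(totals):
    """Return (service, profit) pairs with negative profit, worst first."""
    losers = [(name, profit) for name, profit in totals.items() if profit < 0]
    return sorted(losers, key=lambda item: item[1])


totals = profit_by_service(transactions)
print(totals)
print(loss_making(totals))
```

In this made-up data the only loss-making service is delivery, so that is where the next step of the investigation would go.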

5. Report and Develop:

A complete analysis leads us to the conclusion to the question that was asked. The last stage of the process is to make a report that lets people engage with the analysis without knowing every step, using only the conclusions drawn from it. Reporting may also include developing data products so that anyone can easily access the results.

Required Skills:

A lot has been said about data science so far, and I have kept it as simple as I could. By now I think you have at least a basic idea of data science. It’s time to answer another very important question: what skills are required to be a data scientist? I’m sure you are wondering about this one. We will discuss it briefly in this part of the article.

1. Programming Skills:

As a data scientist, you will commonly encounter a lot of coding. Programming skill is very useful, and in fact mandatory, if you want to be a data scientist. The common question at this stage is “R or Python?”. Well, this debate will go on, I think, but learning both will be more effective. There are many other languages too, but R and Python are two of the most used.

2. Data Visualization:

Some will say this is not that important, but for me data visualization is one of the most important skills. Getting a visual picture of the data helps us understand it properly. How the variables behave, which variables the outcome depends on the most, what causes a negative outcome: these sorts of questions are easy to answer meaningfully through visualization. It helps us perform the analysis more accurately.

3. Mathematics and Statistics:

Fundamental knowledge of mathematics and statistics is required to perform analysis. This includes calculus, linear algebra, regression analysis, modeling, clustering, dimension reduction, inferential statistics, and more. All of these topics are used to conduct analysis and produce useful results.
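As one example of how these topics turn into working analysis, here is a minimal sketch of simple linear regression fitted by ordinary least squares, written in plain Python. The data (hours studied and exam scores) and the function name `fit_line` are assumptions for illustration; the numbers were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied and exam scores, constructed as 50 + 2 * hours.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)

# Use the fitted line to predict the score for 6 hours of study.
print(slope * 6 + intercept)
```

Real data never sits exactly on a line, and that is where inferential statistics comes in, to judge how much the fitted slope can be trusted.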

4. Machine Learning:

Machine learning models are very useful for making predictions from the analyzed data. Throughout the process, keeping things simple is the key, and machine learning is no exception. The simplest models or methods that work are used to forecast or predict a near-future outcome. It is even possible to make long-term predictions, depending on the quality of the data.

Data Scientist vs Data Analyst:

We often confuse the two terms “Data Scientist” and “Data Analyst”. What is the role of a data scientist? What is the difference between the two?

Assume that you have a dataset with some information on different kinds of chocolates, and you want to find the most popular category. This is the job of a data analyst. Let's change the question a little bit: now you want to find out why that category is the most popular. That is the job of a data scientist. This is the fundamental difference between data scientists and data analysts.
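The two questions can be sketched side by side. The sales records, the prices, and the function names are invented for illustration. The second function only surfaces one candidate explanation (price); it does not prove why a category is popular, which would take more data and more careful analysis.

```python
from collections import Counter, defaultdict

# Hypothetical sales records: (category, price paid). All values are invented.
sales = [
    ("dark", 3.0),
    ("milk", 1.5),
    ("milk", 1.6),
    ("white", 2.5),
    ("milk", 1.4),
    ("dark", 3.2),
]


def most_popular(records):
    """The analyst's question: which category sold most often?"""
    counts = Counter(category for category, _ in records)
    return counts.most_common(1)[0][0]


def average_price_by_category(records):
    """A first step toward the 'why': compare average prices across categories."""
    prices = defaultdict(list)
    for category, price in records:
        prices[category].append(price)
    return {category: sum(p) / len(p) for category, p in prices.items()}


print(most_popular(sales))
print(average_price_by_category(sales))
```

Here the most popular category also happens to be the cheapest on average, which suggests a hypothesis worth testing rather than a conclusion.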

Conclusion:

Data science is one of the most interesting things to learn. From a simple dataset, you can find information that can be really useful for your study. Data science is not easy to learn, though. You have to be really dedicated and must spend time on it. Good knowledge of a programming language and some theory does not by itself make you a very good data scientist. Some researchers spend their whole lives researching a single algorithm, so the life of a data scientist is not that easy. But many fields and opportunities are opening up, and data science is growing into an essential skill, so some may wish to learn something about it to boost their skill levels.


Data Science, ML, Image processing. Good hands in R, MATLAB, Python, SPSS, C/Cpp. Always free to connect : https://www.linkedin.com/in/aashiq-reza-2030b516a/