Analysis for everything?
Recently I was intrigued by the term ‘business intelligence’. My first reaction was, what type of intelligence is this now? But the more I read about it, the more curious I became. In simple words, BI means taking data from the past, comparing it with the present, churning the numbers, and making better-informed decisions for the future. Sounds cool, right? It is closely related to data science, and surprisingly it existed in companies long before ‘data science’ became a popular term! Well, BI is a whole other domain to cover and learn. I just wanted to get you excited for my article about data science.
I was very excited when I enrolled in the data analytics course this semester, as this topic has always fascinated me in one way or another. Doing the ‘Introduction to Data Science’ course on the Cognitive AI platform left me with questions like “What is data science?”, “What skills do I need to get into this domain?”, “What tools and languages are used?” and many more! Data science is a field where we use statistical and logical measures, methods, and algorithms to analyze a given set of data and draw the required conclusions from it.
The term itself sounds so cool to me! The related job title ‘data scientist’ is widely credited to DJ Patil, an American mathematician and computer scientist, together with Jeff Hammerbacher.
Data is a collection of facts about an object: numbers, measurements from sensors, observations from an experiment, or even conclusions drawn over the years. So we just analyze it and we are done, right? Well, getting to the right analysis takes a series of steps. The first is Data Requirements Specification. As trivial as it might sound, targeting the problem and identifying the relevant factors is definitely a big task. Take stock market analysis as an example. As soon as I mention it, lots of uncertain factors probably come to mind. So we have to target a specific stock and carefully pick the factors to study, such as the current trading price, demand and supply, or unexpected events like a lawsuit, a split, or a merger. This is yet another interesting area to look into.
The second step of data analysis is Data Collection. This step also feels trivial, but collecting enough data is a task in itself. Some 10–15 years back, when companies had to make decisions about a product’s performance, they had to rely on data in papers, books, registers, etc. Now the number of Internet users has grown over the years, and so has the amount of data produced. The goal is to collect the right data on the variables targeted in the previous step. Data is collected from various sources, so it is often unstructured and may contain redundant or irrelevant information, which makes the next step important: Data Processing.
Data Processing means structuring the data as required for further analysis: placing it into rows and columns for ease of access, checking for redundant information or conflicting values for the same variable, etc. This step paves the way for the next one, Data Cleaning.
The processed data might still be incomplete or contain errors. Data cleaning methods correct these errors and prevent further ones. While I was learning to clean a given dataset, I found that a table is hard to use when it has blank or null values. A missing value means the data for that factor at that time is unavailable or unreliable, and it can introduce errors while training a model. Hence it is common to either remove the rows with null values or replace the nulls with a number such as the column average, a threshold, or simply ‘0’.
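To make the processing and cleaning steps a bit more concrete, here is a minimal sketch in plain Python. The table, the column names (`day`, `price`) and the helper functions are all made up by me for illustration, and in practice you would probably reach for a library like pandas. It removes an exact duplicate row and then shows the three options mentioned above: dropping rows with nulls, filling with the average, and filling with 0.

```python
from statistics import mean

# Hypothetical table: each dict is one row, None marks a missing value.
rows = [
    {"day": 1, "price": 10.0},
    {"day": 2, "price": None},
    {"day": 3, "price": 14.0},
    {"day": 3, "price": 14.0},  # exact duplicate of the previous row
]

def drop_duplicates(records):
    """Keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def drop_missing(records, column):
    """Remove every row whose value in `column` is missing."""
    return [r for r in records if r[column] is not None]

def fill_missing(records, column, value=None):
    """Replace missing values with `value`, or with the column mean if `value` is None."""
    known = [r[column] for r in records if r[column] is not None]
    fill = mean(known) if value is None else value
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]

deduped = drop_duplicates(rows)                          # processing: redundancy removed
dropped = drop_missing(deduped, "price")                 # option 1: drop rows with nulls
mean_filled = fill_missing(deduped, "price")             # option 2: fill with the average
zero_filled = fill_missing(deduped, "price", value=0.0)  # option 3: fill with 0
```

Which option is right depends on the data. Dropping rows loses information, while filling with 0 can distort averages, so it is worth trying each and seeing how the later analysis changes.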
The final step is our topic of interest, Data Analysis. In this step we can apply statistical methods like regression analysis and correlation to identify how the targeted variables and our output are related. The data I worked on was handwriting data for certain letters like ‘m’, ‘v’, etc. I could make sense of this data only when I visualized it with plots, graphs, etc., so data visualization methods also help a lot in analysis. If the output is biased or haphazard, we need to iterate through these steps again until we get a clear, unbiased result.
Well, that was just one aspect of data science. Wikipedia says that:
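As a rough illustration of correlation and regression, here is a small sketch in plain Python. The numbers (`hours`, `score`) are invented by me and are not the handwriting data mentioned above, and the function names are my own. It computes the Pearson correlation coefficient between two variables and fits a straight line through them with ordinary least squares.

```python
from math import sqrt
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Made-up example data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson(hours, score)
slope, intercept = fit_line(hours, score)
print(f"correlation: {r:.3f}")
print(f"fitted line: score = {slope:.2f} * hours + {intercept:.2f}")
```

A correlation close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. Keep in mind that correlation alone does not tell us that one variable causes the other.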
It is a “concept to unify statistics, data analysis, machine learning, domain knowledge” in order to “understand and analyze the actual phenomena” with data.
Jim Gray, a Turing Award winner (a prize given by the Association for Computing Machinery), claimed data science to be the “fourth paradigm” of science and asserted that “everything about science is changing because of the impact of the IT and data deluge”. In conclusion, I would like to mention that big data is gaining a lot of attention and becoming a vital tool for industries these days. So even though we might come from totally different career backgrounds, we should have a little insight into data science and how it can help us make the products and services we provide better!