What are the differences between data analysis, statistics, and machine learning?

Published in ailys · 7 min read · Jan 24, 2022

These terms emerged from the current trends of the fourth industrial revolution. What do they mean in the first place? Setting the abstract definitions aside and getting straight to the point, how are they actually used in business?

Putting the complexities aside, here are the questions a business typically asks when building an effective strategy.

1. We would like to closely examine the current market situation or trends and create suitable strategies.
2. We would like to forecast how our strategies will play out in the future.
3. If the predicted results aren’t great, we want to revise the strategies so that they produce the best possible effect.

The terms big data, statistics, and machine learning come from the methods devised to answer these questions. Here are the details.

● Current market status and trends — data analysis

As you are probably well aware, analyzing big data is the fastest way to examine market trends. We can readily observe trends in the keywords, conversations, and opinions shared on all sorts of portal services. Data analysis also makes it possible to identify demand for different companies’ products and patterns in customer behavior. This is precisely why companies collect data from websites, CRM systems, mobile apps, IoT devices, social media, purchase records, surveys, and logs.

Strictly speaking, data analysis uses past data rather than current data. Even when data is collected in real time, it is already outdated by the time it is analyzed. In addition, analyzing trends from past to present requires that enough time has passed for the data to accumulate.

Typically, the more data we collect, the easier it becomes to track diverse customer opinions and broad market trends. Five trending keywords are not enough to identify the major issues among people in their thirties. However, because it is practically impossible to survey all 50 million people, we need a method for drawing conclusions that represent the whole population.

● Make estimation of the majority from a limited sample — Statistical technique

This is where statistical techniques come in. For example, consider an exit poll in a national election where only 10,000 people could be sampled to make a prediction. Can we guarantee that the result from those 10,000 participants will be identical to the result from all 20 million voters?

The results are bound to differ slightly. However, if we can narrow down the margin of error, the sample still gives us a useful ‘hint’. Statistical estimation therefore uses a confidence interval to quantify how large the error could be when generalizing to the whole population. The calculation is based on a sample statistic, such as the sample mean, obtained from a small group of subjects. The main point is to estimate a value for the whole population from a small sample.
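To make the exit-poll reasoning concrete, here is a minimal sketch of a 95% confidence interval for a vote share, using the normal approximation and assuming a simple random sample. Only the sample size of 10,000 comes from the example above; the 52% observed share is a made-up figure, and the function name `proportion_confidence_interval` is our own.

```python
import math

def proportion_confidence_interval(p_hat, n, z=1.96):
    """Return (low, high) for a sample proportion, normal approximation.

    z=1.96 corresponds to a 95% confidence level.
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical exit poll: 10,000 respondents, 52% report voting for candidate A.
sample_size = 10_000
observed_share = 0.52  # assumed value, not real polling data

low, high = proportion_confidence_interval(observed_share, sample_size)
print(f"95% interval for candidate A: {low:.3f} to {high:.3f}")
```

With 10,000 respondents the margin works out to roughly one percentage point on either side. Real exit polls use more elaborate sampling designs, so treat this only as an illustration of the idea.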

https://www.omniconvert.com/what-is/sample-size/

The situation is different, however, when creating a corporate business strategy, because what matters most there is predicting how clients will react to the company’s rules or strategies. A company does not just want the average value of a certain group; it wants to foresee its clients’ reactions.

Needs vary by industry and department. Insurance companies, for example, are likely to ask whose policies should be underwritten, which client groups are likely to cancel their contracts, and how much the insured amount should be. Marketing departments commonly ask which clients have high response rates and which events would generate a high response rate. Logistics companies want to know the conditions that lead to good sales and how much stock to hold to prepare for unexpected disruptions. Finally, in manufacturing, the major issue is whether signal data from the production process can reveal defects.

When the volume of data is exceptionally large and its content is too complex for a human to interpret, machine learning is very useful for evaluating the data and producing an estimate of the target. Machine learning is also used to predict how the market and the clients will react once a corporate strategy is implemented.

In short, statistics aims to estimate a large population from a small sample, while machine learning is used to predict the future from the past.

● Predict the future from past data - Machine learning

How does machine learning operate to forecast the future? Comparing human learning with machine learning clarifies the concept.

The ultimate reason people study is to get a good result on a test. It helps to practice as many problems as possible so that you can solve the given ones without panicking. The key is to practice with problems that reflect the latest trends: reviewing the 2001 SAT to prepare for the 2021 one would not be the best idea, since a 20-year gap surely means many changes to the exam. That’s why it’s essential in ‘human learning’ to select workbooks after thoroughly reviewing the range of possible exam questions.

In that case, is it fair to say that the student who solves the most practice questions is the best performer? It doesn’t hurt to work on as many problems as possible, but that doesn’t always translate into the highest scores, as you may know from experience.

“Does getting the best grades mean having solved the most practice problems?”

Who were the ones with the best grades in school? Let’s take a walk down memory lane. Even though they worked on the same problems as the rest of us, they somehow knew how to draw an insight from one question and apply it to other, more challenging ones. They didn’t merely memorize the solution method and the answer; through the process they absorbed the method so they could use the idea again later. Simply put, it is a matter of learning, not storing. In the same way, smart students do not panic when they see new problems but apply their existing knowledge. They have already absorbed and mastered the solving algorithms.

“Smart students are those who have mastered lots of solving methods”

Let us map this concept of human learning onto machine learning. Just as a student acquires solving methods from past exams and then applies them to the final exam, the computer finds the optimal algorithm from data and then uses a model containing that algorithm to make future predictions. It’s similar to drawing a trend curve in Excel: once it is drawn, you can get the estimated value of y for any value of x. The model is what carries out those predictions. The practical difference is that a trend curve in Excel considers only a few variables, while machine learning can work with thousands of them to build a more advanced model.
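The trend-curve analogy can be sketched in a few lines. This is a minimal illustration with a single variable and a straight line fitted by ordinary least squares; the monthly sales figures are made up, and the helper names `fit_trend_line` and `predict` are our own.

```python
def fit_trend_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical past data: month number vs. units sold (made-up numbers).
months = [1, 2, 3, 4, 5, 6]
units_sold = [110, 125, 128, 144, 150, 163]

model = fit_trend_line(months, units_sold)  # "learning" from past data
forecast = predict(model, 7)                # applying it to an unseen month
print(f"Forecast for month 7: {forecast:.1f}")
```

The fitted model never saw month 7, yet it can still produce an estimate, which is the difference between learning a method and storing past answers. A machine learning model follows the same fit-then-predict pattern with far more variables.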

Human learning and machine learning operate in similar ways. Just as humans learn methods for solving problems, machine learning learns algorithms from data and uses them to predict the future.

If so, is machine learning only about predicting the future? Business departments not only predict the future but also want to closely examine the results of the strategies they execute. This is precisely where adaptive intelligence comes into view: a concept that goes beyond machine learning.

● Optimize strategies based on future predictions — Next autoML, Adaptive Intelligence

Let’s get straight into an example. Say we developed a model from past data using the variables gender, age, credit grade, and contract duration. With it, we could predict the probability of signing a contract with a given client. We could then form a cluster of clients with high contract rates and determine their common characteristics. This is called ‘Supervised Clustering’.

But what should we do to see how the rates change if we add another feature besides the ones already considered? For instance, if a new variable, ‘talk time (call duration) with the client’, is introduced, how can we measure its effect on the contract rate?

DAVinCI LABS achieves this with a feature called ‘Rule Optimization’. In one example, for clients with more than 15 minutes of talk time, the contract rate rose from 18.2% to 62.1%. In this way, we can add diverse variables and monitor or adjust their values to see up close how the target responds.
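The kind of comparison described above can be sketched as follows. This is not the DAVinCI LABS implementation, which the article does not describe; it only illustrates comparing a target rate with and without a rule on talk time. The client records, field names, and the helpers `contract_rate` and `rate_under_rule` are all hypothetical.

```python
def contract_rate(clients):
    """Share of clients who signed a contract (0.0 if the list is empty)."""
    if not clients:
        return 0.0
    return sum(c["contracted"] for c in clients) / len(clients)

def rate_under_rule(clients, min_talk_minutes):
    """Contract rate among clients whose talk time exceeds the threshold."""
    matching = [c for c in clients if c["talk_minutes"] > min_talk_minutes]
    return contract_rate(matching)

# Hypothetical client records (made-up data for illustration only).
clients = [
    {"talk_minutes": 4, "contracted": False},
    {"talk_minutes": 22, "contracted": True},
    {"talk_minutes": 9, "contracted": False},
    {"talk_minutes": 18, "contracted": True},
    {"talk_minutes": 31, "contracted": False},
    {"talk_minutes": 6, "contracted": True},
    {"talk_minutes": 12, "contracted": False},
    {"talk_minutes": 16, "contracted": True},
]

baseline = contract_rate(clients)
print(f"All clients: {baseline:.1%}")

# Try several thresholds to see how the target rate responds to the rule.
for threshold in (5, 10, 15, 20):
    rate = rate_under_rule(clients, threshold)
    print(f"Talk time > {threshold} min: {rate:.1%}")
```

A difference in rates like this shows an association in past data, not proof that longer calls cause more contracts, so any rule found this way is best checked against new data.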

We view this kind of business strategy optimization as the direction in which machine learning should advance. Adaptive intelligence lays the foundation here, driving strategic optimization rather than stopping at simple future predictions.

What are the differences between data analysis, statistics, and machine learning?

● Current market status and trends — data analysis

● Make estimation of the majority from a limited sample — Statistical technique

● Predict the future from past data - Machine learning

● Optimize strategies based on future predictions — Next autoML, Adaptive Intelligence

Written by Lewi Kim