Learn Data Mining by Applying it on Excel

Albert Assaad
Analytics Vidhya
Published in
6 min readNov 25, 2020

In this series, I will be presenting different Machine Learning Methods by applying them to Excel to make them easier to understand…

Of course, when we say data mining which means those methods are applied to Big Data by using a computer program developed for example by Python or R language that reads huge data stored in several databases, we might apply those concepts by coding those ideas in a program in future posts.

This series is meant for beginners in data mining with medium skills in excel.

Our first method to be discussed is Linear Regression:

Linear Regression is used for prediction by training real data that we already have, get the formula, and use it to foretell results.

Linear regression means that the relationship between the target (dependent variable) and the attribute(s) (independent variables) is linear.

There can be one or more independent variable; If we have 1 independent variable the formula to get the dependent variable y=mx +b, nevertheless when we have more than one independent variable the formula will be y=m1x1 + m2x2 + … + mnxn + b (don’t worry everything will be clear briefly when we start our examples)

1- Example with one independent variable:

Let say that a grocery store manager decides to predict cold water bottle sales based on the weather forecast, he already gathered some data that relate weekly averaged daily high temperatures to the cold water bottle sales:

The above data must be entered into an excel sheet please download and open it from here.

When we have one independent variable like in this example there is one easy way to get the formula by creating a chart right-click on its value and select Add Trendline then choose Linear and tick the Display Equation on chart checkbox.

Now you will see a line drawn in the chart have the equation in it:

This is the easiest there is another function even easier called Linest that will mention it later in the document but now we will go to the hard part (this is the purpose of this series to understand the process)by calculating the formula manually using excel functions and cells.

Move the chart far to the right and add the following descriptions and formulas to the corresponding cells:

  • Cell C1 add the text: sum(x)
  • Cell D1 add the text: sum(y)
  • Cell C4 add the text: sum(xy)
  • Cell D4 add the text: sum(x ^ 2)
  • Cell C7 add the text: n=
  • Cell C8 add the text: m=
  • Cell C9 add the text: b=
  • Cell C2 add the formula: =Sum(A:A)
  • Cell D2 add the formula: =Sum(B:B)
  • Cell C5 add the formula: =SumProduct(A:A,B:B)
  • Cell D5 add the formula: =SumProduct(A:A,A:A)
  • Cell D7 add the formula: =Count(A:A)
  • Cell D8 add the formula: =(D7*C5-C2*D2)/(D7*D5-C2 ^2)
  • Cell D9 add the formula: = (D5*D2-C2*C5)/(D7*D5-C2 ^2)
Formula of m
formula of b

You realize that the result in Cell D8 and D9 are the same that we had in the graph…

Now we need to predict the revenue next to each temperature in Cell C12, C13, C14 enter in C12 the following formula and autofill till C14: =D$8*C12 + D$9(which apply our formula y= m*x + b) and here are the predicted values based on our data.

You can find this exercise final result by clicking this link.

Solving the same problem by using the array function Linest :

We’ll gonna delete all the results we made before in our previous exercise and we’ll keep the values in Columns A and B.

Now follow the following steps in order to get m and b using the LINEST function:

1- In cell C1 write m and in D1 write b

2- select both C2 and D2

3- write in the formula bar: =LINEST(B2:B9,A2:A9,TRUE,TRUE) then press ctrl+shift+ enter

4- C2 and D2 will be filled by their values as like we obtained before now the variables of the formula is completed get the count of the temperatures(in our case 8) and proceed with the prediction formula y=mx +b

We can see that the formula has been surrounded automatically by {} the Linest function will only work on the same Row.

2- Example with 2 or more independent value:

Let's say that this same grocery store manager needs to be more accurate and has assembled in addition to his data the average number of customers and the number of holidays so now he has 3 independent values temperature and no. of customers and number of holidays so the equation will be y= m1x1+m2x2+m3x3+b.

We’re gonna solve it by using the LINEST function, use the following instructions you can follow up with me by downloading the excel click this link:

1- In E1 insert the text Predicted

2- In F1 insert the text Error

3- In A11 insert the text Holidays and In B11 insert 1

4- In A12 insert the text Avg number of Customers and B12 insert 2

5- In A13 insert the text Temperature and in B13 insert 3

6-In A14 insert the text Y intercept and in B14 insert 4

7-In E11 insert the text sum of errors

8- In C11 insert the following equation =INDEX(LINEST(D$2:D$9,A$2:C$9,TRUE,TRUE),1,B11) and drag it till C14

The Index will get the calculation of Linest from the 4 Columns selected, Index 1 will give us the value of Holidays, Index 2 will give us the value of Avg number of Customers, Index 3 will give us the value of Temperature and Index of 4 will return the Value of Y-intercept which is b.

following results must appear:

9-Now based on the above values we will calculate our prediction values, In E2 enter the following equation: =A2*D$13+B2*D$12+C2*D$11+D$14 and drag and drop it till E9

10-In F2 enter the following equation and drag it till F9: =POWER(D2-E2,2)

11- In F11 enter the following equation:=SUM(F2:F9) this will calculate the sum of errors, you can always try the values from D11 till D14 in order to have a less amount of error so we will have a better prediction:

You can always check the final result by downloading this link, that’s all for this session I will be soon publishing the next section.

Thank you for reading!!

Learn Data Mining by applying it on Excel(Part 2) ==>

--

--

Albert Assaad
Analytics Vidhya

Multi-platform developer (Android,IOS,Dynamics Nav,C#.net,Vb.net,Python) interested in everything technology.