These statistical tests allow researchers to make inferences because they can show whether an observed pattern is due to intervention or chance. There is a wide range of statistical tests. The decision of which statistical test to use depends on the research design, the distribution of the data, and the type of variable.

In general, if the data is normally distributed, parametric tests should be used. If the data is non-normal, non-parametric tests should be used. Below is a list of just a few common statistical tests and their uses.


These tests look for an association between variables.

1. Pearson Correlation

Tests for…

Data Pipeline

Data volumes have increased substantially over the years, as a result of that business needs to work with massive amounts of data. In this case we need data pipeline to handle data flow efficiently, because handle storage, analysis and visualization of data in same system is not a good idea.

Moving data between systems requires many steps: data mining, storage in cloud systems, reformating or merging with other data sources like so.

A data pipeline is the sum of all these steps, it’s job is to automate all these steps and make that these steps all happen reliably to all…

Simple, logistic regression algorithm is used for solving the binary classification problem. Logistic regression can be used for lots of real world application like Email spam or not, Online transaction fraud or not fraud, Tumor Malignant or Benign.

The outcome of logistic regression is dichotomous in nature. Dichotomous means there are only two possible classes.

Linear Regression Equation:

Multiple linear regression can model the relationship between two or more features and response by fitting a linear equation to observed data.

Simple vs Multiple Linear Regression

Simple linear regression have one dependent and one independent variable, but in multiple linear regression the dependent variable is one but there may be two or more independent variables. The steps to perform simple and multiple linear regression are almost the same, the difference lies in the Evaluation. By using evaluation we can find out which factor has the highest impact on the predicted output.

Y= b0 + b1*x1 + b2*x2 + b3*x3 +…… bn*xn Y = Dependent…

It is a powerful field in statistics and machine learning that helps to find the relationship between two or more variables of interest. While there are multiple regression analytics methods are available, at their core they all examine the influence of one or more independent variables on a dependent variable.

Linear Regression
Linear regression helps to model the relationship between two variables by fitting a linear equation to observed data. The two variables are Independent variable(X) and dependent variable(Y).

Regression Model Yi = b0 + b1Xi + ei

Yi — -> Dependent variable for observation i

Xi — ->…

Abin Joy

Secretly loves the story behind the data. :)

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store