Throughout the ages, mathematics has helped us to better understand the world, and likewise, we used the world to better understand math. The discipline of mathematics is built on human experiences and common ideas developed over thousands of years. Some new mathematical discoveries have changed the world, in the sense that by the use of math, we are able to explain more phenomenons and connections that appear in this world.

In this article, I will show 10 of the most important and beautiful mathematical equations. …

**Scikit-learn (sklearn) **is a powerful open source **machine learning library **built on top of the Python programming language. This library contains a lot of efficient tools for machine learning and statistical modeling, including various classification, regression, and clustering algorithms.

In this article, I will show 6 tricks regarding the scikit-learn library to make certain programming practices a bit easier.

To generate random ‘dummy’ data, we can make use of the `make_classification()`

function in case of **classification data**, and `make_regression()`

function in case of **regression data**. …

**Linear algebra** is a field of mathematics that is widely used in various disciplines. The field of **data science** also leans on many different applications of linear algebra. This does not mean that every data scientist needs to have an extraordinary mathematical background, since the amount of math you will be dealing with depends a lot on your role. However, a good understanding of linear algebra really enhances the understanding of many **machine learning algorithms**. Foremost, to really understand **deep learning algorithms**, linear algebra is essential. …

The process of differentiating categorical data using predictive techniques is called **classification**. One of the most widely used classification techniques is the **logistic regression**. For the theoretical foundation of the logistic regression, please see my previous article.

In this article, we are going to apply the logistic regression to a **binary classification** problem, making use of the **scikit-learn (sklearn) **package available in the Python programming language.

We will use the Titanic dataset (available on Kaggle), where the goal is to predict survival on the Titanic. …

The process of differentiating categorical data using predictive techniques is called **classification**. On the basis of training data, consisting of observations whose category membership is known, the **classifier **(the algorithm that implements classification)** **should learn, on the basis of **explanatory variables **(features), to which category new observations belong.

In this article, we consider the **logistic regression**, which is one of the most fundamental and widely used classification methods.

The logistic regression models a **binary categorical dependent variable **Y, which can take on two possible values; “0” or “1”. These two values represent the two categories to which the observations could…

**Pandas** is a powerful open source data analysis and manipulation tool, built on top of the Python programming language. The **Series** and **DataFrame** data structures in the pandas library are almost inevitable to use when performing tasks related to data analysis in Python. In many cases, when making use of these data structures, we want to access only a selected group of rows and columns (also known as **subset selection**). To perform this task, we can make use of the `.loc`

and `.iloc`

methods. What exactly is the difference between these two methods, and how do we use them? …

**Pandas** is a powerful open source data analysis and manipulation tool, built on top of the Python programming language. In this article, I will show 10 tricks regarding the pandas **DataFrame** to make certain programming practices a bit easier.

Of course, before we can use pandas, we have to import it by using the following command:

`import pandas as pd`

`countries = pd.DataFrame({`

'country': ['United States', 'The Netherlands', 'Spain', 'Mexico', 'Australia'],

'capital': ['Washington D.C.', 'Amsterdam', 'Madrid', 'Mexico City', 'Canberra'],

'continent': ['North America', 'Europe', 'Europe', 'North America', 'Australia'],

'language': ['English', 'Dutch', 'Spanish', 'Spanish', 'English']})

**Data visualization** is the presentation of data in an accessible manner through visual tools like graphs or charts. These visualizations aid the process of communicating insights and relationships within the data, and are an essential part of data analysis. In this article, we treat **Matplotlib**, which is the most popular data visualization library within the Python programming language.

**Preliminaries****Scatter plots****Bar charts****Histograms****Boxplots**

Matplotlib is a very well documented package. To make the plotting easier, we make use of the **pyplot** module, that makes Matplotlib work like MATLAB. Essentially, all its functioning can be found HERE. …

**Regression analysis** is an important statistical method for the analysis of data. By applying regression analysis, we are able to examine the relationship between a dependent variable and one or more independent variables. In this article, I am going to introduce the most common form of regression analysis, which is the **linear regression**. As the name suggests, this type of regression is a **linear **approach to modeling the relationship between the variables of interest.

Linear regression is used to study the linear relationship between a **dependent variable** (y) and one or more **independent variables** (**X**). The linearity of the relationship…

Graduate student in Econometrics | Data Science | Machine Learning from The Netherlands.