A selection of the most important and beautiful mathematical equations

Picture by Antoine Dautry on Unsplash

Throughout the ages, mathematics has helped us better understand the world, and likewise, we have used the world to better understand mathematics. The discipline is built on human experiences and common ideas developed over thousands of years. Some mathematical discoveries have changed the world, in the sense that mathematics lets us explain more of the phenomena and connections we observe in it.

In this article, I will show 10 of the most important and beautiful mathematical equations. …


Tricks to improve your machine learning models in Python with scikit-learn (sklearn)

Scikit-learn (sklearn) is a powerful open source machine learning library built on top of the Python programming language. This library contains a lot of efficient tools for machine learning and statistical modeling, including various classification, regression, and clustering algorithms.

In this article, I will show 6 tricks regarding the scikit-learn library to make certain programming practices a bit easier.

1. Generate random dummy data

To generate random ‘dummy’ data, we can use the make_classification() function for classification data and the make_regression() function for regression data. …
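As a minimal sketch of the two functions named above, the snippet below generates one classification set and one regression set. The sample counts, feature counts, noise level, and random seed are illustrative assumptions, not values taken from the article.

```python
from sklearn.datasets import make_classification, make_regression

# Classification data: 200 samples, 5 features, 2 classes.
# All sizes and the seed are illustrative choices.
X_clf, y_clf = make_classification(
    n_samples=200,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=2,
    random_state=42,
)

# Regression data: 200 samples, 5 features, Gaussian noise added to the target.
X_reg, y_reg = make_regression(
    n_samples=200,
    n_features=5,
    noise=10.0,
    random_state=42,
)

print(X_clf.shape, y_clf.shape)
print(X_reg.shape, y_reg.shape)
```

Fixing random_state makes the generated data reproducible, which is convenient when comparing models on the same dummy set.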


Picture by Michael Dziedzic on Unsplash

Linear algebra is behind the powerful machine learning algorithms we are so familiar with

Linear algebra is a field of mathematics that is widely used in various disciplines. The field of data science also leans on many different applications of linear algebra. This does not mean that every data scientist needs an extraordinary mathematical background, since the amount of math you will deal with depends a lot on your role. However, a good understanding of linear algebra makes many machine learning algorithms much easier to understand. Above all, linear algebra is essential for really understanding deep learning algorithms. …


Learn how to apply logistic regression to binary classification using the scikit-learn package in Python

Photo by Pietro Jeng on Unsplash

The process of differentiating categorical data using predictive techniques is called classification. One of the most widely used classification techniques is logistic regression. For the theoretical foundation of logistic regression, please see my previous article.

In this article, we are going to apply logistic regression to a binary classification problem, using the scikit-learn (sklearn) package available in the Python programming language.

Titanic Dataset

We will use the Titanic dataset (available on Kaggle), where the goal is to predict survival on the Titanic. …
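The general workflow can be sketched as follows. This is not the article's Titanic code: the excerpt does not show the dataset's columns, so the sketch assumes a synthetic stand-in dataset from make_classification, and the split ratio, seed, and max_iter value are likewise illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a binary classification dataset (NOT the Titanic data).
X, y = make_classification(
    n_samples=500,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    random_state=0,
)

# Hold out 25% of the observations to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the logistic regression on the training part only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predicted class labels and predicted probabilities of class 1.
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# Fraction of correctly classified test observations.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

With a real dataset, the same steps apply after the features have been cleaned and encoded numerically.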


Get to know one of the most widely used classification techniques

The process of differentiating categorical data using predictive techniques is called classification. Using training data, consisting of observations whose category membership is known, the classifier (the algorithm that implements classification) learns from the explanatory variables (features) to which category new observations belong.

Binary Classification Procedure

In this article, we consider logistic regression, which is one of the most fundamental and widely used classification methods.

Logistic Regression — What are we modeling?

Logistic regression models a binary categorical dependent variable Y, which can take on two possible values: “0” or “1”. These two values represent the two categories to which the observations could…
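To make the modeling idea concrete, here is a small sketch of the standard logistic (sigmoid) function, which turns a linear combination of the features into a probability that Y equals 1. The coefficient values beta0 and beta1, the helper names, and the 0.5 decision threshold are hypothetical choices for illustration only.

```python
import math


def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, beta0, beta1):
    """Probability that Y = 1 for one feature value x (hypothetical helper)."""
    return sigmoid(beta0 + beta1 * x)


# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.0, 2.0

for x in (-2.0, 0.5, 3.0):
    p = predict_proba(x, beta0, beta1)
    label = 1 if p >= 0.5 else 0  # assumed 0.5 decision threshold
    print(f"x={x:+.1f}  P(Y=1)={p:.3f}  predicted class={label}")
```

Because the sigmoid output always lies between 0 and 1, it can be read as a probability, and a threshold then converts it into one of the two categories.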


Selecting subsets of Pandas Series and DataFrames using .loc and .iloc

Pandas is a powerful open source data analysis and manipulation tool, built on top of the Python programming language. The Series and DataFrame data structures in the pandas library are almost unavoidable when performing data analysis tasks in Python. In many cases, when working with these data structures, we want to access only a selected group of rows and columns (also known as subset selection). To perform this task, we can use the .loc and .iloc indexers. What exactly is the difference between the two, and how do we use them? …
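In short, .loc selects by label and .iloc selects by integer position. The sketch below illustrates the difference on a small DataFrame; the country-code index and the variable names are assumptions made for this example.

```python
import pandas as pd

# Small frame with string row labels (the index values are illustrative).
df = pd.DataFrame(
    {
        "capital": ["Amsterdam", "Madrid", "Canberra"],
        "language": ["Dutch", "Spanish", "English"],
    },
    index=["NL", "ES", "AU"],
)

# .loc uses row and column LABELS.
by_label = df.loc["ES", "capital"]

# .iloc uses integer POSITIONS (row 1, column 0 is the same cell).
by_position = df.iloc[1, 0]

# Label slices include the end label: rows "NL" and "ES".
label_slice = df.loc["NL":"ES", ["capital"]]

# Position slices exclude the end position: only row 0.
position_slice = df.iloc[0:1, [0]]

print(by_label, by_position)
print(label_slice)
print(position_slice)
```

The inclusive end of label slices versus the exclusive end of position slices is a common source of off-by-one surprises.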


10 basic tricks to make your pandas life a bit easier

Pandas is a powerful open source data analysis and manipulation tool, built on top of the Python programming language. In this article, I will show 10 tricks regarding the pandas DataFrame to make certain programming practices a bit easier.

Of course, before we can use pandas, we have to import it by using the following command:

import pandas as pd

1. Select multiple rows and columns using .loc

countries = pd.DataFrame({
    'country': ['United States', 'The Netherlands', 'Spain', 'Mexico', 'Australia'],
    'capital': ['Washington D.C.', 'Amsterdam', 'Madrid', 'Mexico City', 'Canberra'],
    'continent': ['North America', 'Europe', 'Europe', 'North America', 'Australia'],
    'language': ['English', 'Dutch', 'Spanish', 'Spanish', 'English'],
})


Explaining the basics and essential functions of the Matplotlib library through examples with code

Data visualization is the presentation of data in an accessible manner through visual tools like graphs or charts. These visualizations help communicate insights and relationships within the data, and are an essential part of data analysis. In this article, we cover Matplotlib, one of the most popular data visualization libraries for the Python programming language.

Contents

  1. Preliminaries
  2. Scatter plots
  3. Bar charts
  4. Histograms
  5. Boxplots

1. Preliminaries

Matplotlib is a very well-documented package. To make plotting easier, we use the pyplot module, which gives Matplotlib a MATLAB-like interface. Essentially, all of its functionality is described in the official documentation. …
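As a minimal sketch of the pyplot style of plotting, the snippet below draws a scatter plot with a few function calls that each act on the current figure. The data values, labels, and output file name are made up for illustration, and the Agg backend is selected only so the script also runs on machines without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt

# Made-up data points, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# Each pyplot call below modifies the current figure and axes.
plt.figure(figsize=(6, 4))
plt.scatter(x, y, color="tab:blue", label="observations")
plt.xlabel("x")
plt.ylabel("y")
plt.title("A minimal scatter plot")
plt.legend()

# Write the figure to disk (the file name is an arbitrary choice).
plt.savefig("scatter.png", dpi=100)
```

In an interactive session, plt.show() would typically replace the savefig call.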


Introduction to the core concepts of simple linear regression and OLS estimation

Background

Regression analysis is an important statistical method for the analysis of data. By applying regression analysis, we are able to examine the relationship between a dependent variable and one or more independent variables. In this article, I am going to introduce the most common form of regression analysis: linear regression. As the name suggests, this type of regression is a linear approach to modeling the relationship between the variables of interest.

Method

Linear regression is used to study the linear relationship between a dependent variable (y) and one or more independent variables (X). The linearity of the relationship…
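For the simple case with one independent variable, OLS estimation has a well-known closed form: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept follows from the means. The sketch below applies it with NumPy; the function name fit_simple_ols and the noise-free example data are assumptions for illustration.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the OLS line for one regressor (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared deviations of x.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the fitted line through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Noise-free example data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

intercept, slope = fit_simple_ols(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

With real, noisy data the estimates will not match the true line exactly, but the same formulas give the line that minimizes the sum of squared residuals.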

Maurizio Sluijmers

Graduate student in Econometrics | Data Science | Machine Learning from The Netherlands.
