If you made a down payment on your mortgage, you most likely have an amortizing loan.
An amortizing loan is repaid with a fixed monthly payment, so that by the end of the loan term you have paid off both the debt and the interest.
Each monthly payment consists of an interest portion and a principal portion. The interest portion covers the interest accrued that month, while the principal portion reduces your actual debt.
Notably, as your debt goes down, the interest portion of the following month's payment decreases and the principal portion increases. …
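A minimal sketch of the schedule described above, assuming a fixed annual rate compounded monthly and the standard fixed-payment formula M = P·r·(1+r)^n / ((1+r)^n - 1), where P is the loan amount, r the monthly rate, and n the number of monthly payments. The function names and the $200,000 / 6% / 30-year example are hypothetical illustrations, not figures from the post.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment for a fully amortizing loan (monthly compounding assumed)."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of monthly payments
    if r == 0:
        return principal / n      # no interest: just split the principal evenly
    return principal * r * (1 + r) ** n / ((1 + r) ** n - 1)


def amortization_schedule(principal, annual_rate, years):
    """Return (month, payment, interest, principal_part, remaining_balance) rows."""
    payment = monthly_payment(principal, annual_rate, years)
    r = annual_rate / 12
    balance = principal
    rows = []
    for month in range(1, years * 12 + 1):
        interest = balance * r              # interest accrued on the current balance
        principal_part = payment - interest # the rest of the payment reduces the debt
        balance -= principal_part
        rows.append((month, payment, interest, principal_part, balance))
    return rows


# Hypothetical example: $200,000 borrowed at 6% per year for 30 years.
schedule = amortization_schedule(200_000, 0.06, 30)
for month, payment, interest, principal_part, balance in schedule[:2]:
    print(f"month {month}: payment={payment:.2f} interest={interest:.2f} "
          f"principal={principal_part:.2f} balance={balance:.2f}")
```

Printing the first rows shows the pattern from the paragraph above: the payment stays constant, the interest portion shrinks month by month, and the principal portion grows until the balance reaches zero.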
One of the most joyful activities in analytics is working with beautiful visualizations. With the Variable Factor Map, you can explain Principal Component Analysis with ease. A picture is worth a thousand words.
Principal Component Analysis (PCA) is a dimensionality-reduction method for large data sets. It transforms many features into a much smaller number of new features while retaining most of the information and variability of the original data.
If we keep two new features (principal components), we can put them in a 2D plot and visualize how each original feature factors into each new component…
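A minimal NumPy sketch of the numbers behind a variable factor map, assuming PCA on standardized features (a common choice for this plot). Each original feature is drawn at its correlations with the first two principal components, so every point lands inside the unit circle. The function name and the random toy data are hypothetical; scikit-learn's `PCA` class would give the same components up to sign.

```python
import numpy as np


def pca_factor_map(X, n_components=2):
    """Return explained-variance ratios and feature-to-component correlations."""
    X = np.asarray(X, dtype=float)
    # Standardize each feature so PCA works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The principal axes are the right singular vectors of the standardized data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T   # projections onto the retained components
    # Coordinates on the factor map: correlation of feature j with component k.
    corr = np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1]
                      for k in range(n_components)]
                     for j in range(Z.shape[1])])
    return explained[:n_components], corr


# Hypothetical toy data: feature 1 is strongly driven by feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] += 2 * X[:, 0]
ratios, corr = pca_factor_map(X)
print(ratios)
print(corr)   # row j = (x, y) position of feature j on the factor map
```

Features that point in the same direction on the map are positively correlated, and a feature close to the unit circle is well represented by the two plotted components.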
The Monty Hall Problem is a famous probability puzzle. It is named after Monty Hall, the host of the television game show “Let’s Make a Deal”. The brain teaser loosely replicates the game show concept, and it goes like this:
There are 3 doors. You choose a door and win whatever is behind it. One door hides a car; each of the remaining doors hides a goat. First, you are asked to pick one of the doors. Next, Monty, who knows what’s behind each door, opens one of the two doors you…
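A quick Monte Carlo sketch of the game, assuming the standard rules of the puzzle: Monty always opens a door you did not pick that hides a goat, and then always offers you the chance to switch. The function names are hypothetical.

```python
import random


def play(switch, rng):
    """Play one round; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won with a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("switch:", win_rate(True))
print("stay:  ", win_rate(False))
```

The simulated rates settle near 2/3 for switching and 1/3 for staying: your first pick is right only one time in three, and when it is wrong, switching always lands on the car.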
When talking about decision trees, I always imagine the list of questions I would ask my girlfriend when she does not know what she wants for dinner: Do you want noodles? How much do you want to spend? Asian or Western? Healthy or junk food?
Making a list of questions to narrow down the options is essentially the idea behind decision trees. More formally, a decision tree is an algorithm that partitions observations into groups of similar data points based on their features.
A decision tree is a supervised learning model with a tree-like…
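To make "partitioning observations into similar groups" concrete, here is a minimal sketch of how one question (one split) can be chosen, assuming the Gini impurity criterion used by CART-style trees. A full tree would apply `best_split` recursively to each side. The dinner data and helper names are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y):
    """Return (feature index, threshold, weighted Gini) of the best binary split.

    Each candidate question has the form "is feature j <= t?".
    """
    n = len(y)
    best = (None, None, gini(y))   # no split at all is the baseline
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue           # a question must actually divide the data
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical dinner data: [price in dollars, is_asian (1/0)].
X = [[10, 1], [12, 1], [30, 0], [35, 0]]
y = ["noodles", "noodles", "steak", "steak"]
print(best_split(X, y))
```

Here the first question found is "price <= 12?", which separates the two dinners perfectly (weighted Gini of 0), just like narrowing down options by budget.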
Natural language processing (NLP) is an interesting field because disambiguating an input sentence into a representation a machine can work with is thought-provoking. Take a look at Groucho Marx’s famous joke:
One morning I shot an elephant in my pajamas. How he got into my pajamas I’ll never know.
A human can recognize several interpretations of this sentence (was the speaker wearing the pajamas, or was the elephant?), but it is very hard for a computer to tell them apart.
Nonetheless, the learning curve of NLP is not that steep, and the subject can be captivating. In this project, I will introduce natural language processing at an introductory level with the Beatles…
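As a minimal illustration of where an introductory NLP pipeline usually starts, the sketch below tokenizes the Groucho Marx sentence with a regular expression and counts word frequencies with Python's standard library. The regex is a simplifying assumption (dedicated tokenizers such as NLTK's handle punctuation and contractions more carefully), and the helper name is hypothetical. Note that this bag-of-words view discards word order, which is exactly the information needed to tell the two readings of the joke apart.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


sentence = ("One morning I shot an elephant in my pajamas. "
            "How he got into my pajamas I'll never know.")
tokens = tokenize(sentence)
counts = Counter(tokens)
print(tokens)
print(counts.most_common(3))
```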
A data science team is at the core of many big companies. A successful data scientist needs a head for business strategy, must be able to make discoveries and form a vision through data, and must convince stakeholders through communication and visualization. However, the amount of data nowadays is increasing exponentially, which makes data science work more sophisticated. Every company is starting to understand the importance of data and collects as much of it as it can. Data sizes have grown from a few gigabytes to several petabytes. Cloud storage and computing are becoming more popular, and end-user organizations are adopting cloud storage solutions as their primary storage option. Therefore…
In the job search process, I found that many companies in my area were looking for data scientists with knowledge of oil and gas. My background is mostly in mathematics, so I decided to go on an adventure along the oil pipeline.
One thing I noticed is that oil and gas is a big industry with old infrastructure. In 2014, Inside Energy reported that 45% of the U.S. crude oil pipeline network was more than 50 years old. Some pipelines laid in or before the 1920s are still in operation. …
The Receiver Operating Characteristic (ROC) curve illustrates how well a binary classifier separates the two classes by plotting the true-positive rate against the false-positive rate across all classification thresholds.
The Area Under the Curve (AUC) is the area under the ROC curve. It ranges from 0 to 1, where 1 means a perfect classifier and 0.5 corresponds to random guessing.
Why is understanding the ROC curve and AUC important for a data scientist? To answer this question, let’s take a look at the breast cancer data set below:
from sklearn.datasets import load_breast_cancer
import pandas as pd
data = load_breast_cancer()  # breast cancer data set bundled with scikit-learn
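To show what the ROC curve and AUC definitions above actually compute, here is a plain-Python sketch: sort predictions by score, sweep the threshold from high to low, record a (false-positive rate, true-positive rate) point at each distinct score, and integrate with the trapezoid rule. The function names and the four toy predictions are hypothetical, not taken from the breast cancer data; for real work, scikit-learn provides `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score`.

```python
def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) lists from (0, 0) to (1, 1), one point per distinct score."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    # Highest scores first: lowering the threshold admits one prediction at a time.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (s, y) in enumerate(pairs):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last prediction sharing this score (handles ties).
        if i == len(pairs) - 1 or pairs[i + 1][0] != s:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Hypothetical toy predictions: true labels and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve_points(y_true, scores)
print(fpr, tpr, auc(fpr, tpr))
```

The AUC also has a useful interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.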
The most fascinating thing about generative deep learning, such as autoencoders, is that the machine can teach itself to be creative. The algorithm loosely mimics the way humans learn and innovate. When first encountering a new concept, one needs to read, listen, memorize what is important, and then practice. The more training one undergoes, the more cohesive and logical one's creativity becomes. Creating new things based on previous knowledge is exactly what an autoencoder is capable of!
In this blog post, I will share my understanding of the autoencoder (AE): what it does and why it is useful. I will first introduce the…
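Before any framework code, the core autoencoder idea fits in a short NumPy sketch: an encoder squeezes each input into a smaller code, a decoder reconstructs the input from that code, and both are trained to minimize reconstruction error. As a simplifying assumption, this version is linear (no activation functions) and trained with plain gradient descent; practical autoencoders use nonlinear layers in a framework such as Keras or PyTorch. A linear autoencoder like this one learns the same subspace as PCA. The function name, hyperparameters, and toy data are hypothetical.

```python
import numpy as np


def train_linear_autoencoder(X, k, lr=0.01, steps=3000, seed=0):
    """Fit encoder W1 (d x k) and decoder W2 (k x d) by gradient descent
    on the mean squared reconstruction error ||X W1 W2 - X||^2 / n."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = 0.1 * rng.normal(size=(d, k))   # small random initial weights
    W2 = 0.1 * rng.normal(size=(k, d))
    losses = []
    for _ in range(steps):
        H = X @ W1                        # encode: n x k bottleneck codes
        E = H @ W2 - X                    # decode, then take the reconstruction error
        losses.append(np.sum(E**2) / n)
        grad_W2 = 2.0 / n * H.T @ E       # gradients of the loss for both weight matrices
        grad_W1 = 2.0 / n * X.T @ (E @ W2.T)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2, losses


# Hypothetical toy data: 5 observed features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
W1, W2, losses = train_linear_autoencoder(X, k=2)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because the data really has only two underlying factors, a 2-dimensional code is enough to reconstruct it almost perfectly; with a bottleneck smaller than the true structure, the loss would plateau higher.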
In this notebook, I will introduce different approaches to encoding categorical data, written in simple language with a little math. These are 12 basic encoding schemes that you should put in your toolkit. For convenience, I will also provide the scikit-learn version or the library corresponding to each method.
By the end of this post, I hope you will have a better idea of how to deal with categorical data. You can find my code here.
There are three main routes to encoding string data:
Classic Encoders: well known…
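As a taste of the classic encoders, here is a plain-Python sketch of the two best-known schemes: ordinal encoding (one integer per category) and one-hot encoding (one 0/1 indicator column per category). Ordering categories alphabetically is an assumption made for reproducibility, and the helper names and color data are hypothetical; scikit-learn implements both as `OrdinalEncoder` and `OneHotEncoder` in `sklearn.preprocessing`.

```python
def ordinal_encode(values):
    """Map each distinct category to an integer, in alphabetical order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Return one 0/1 indicator column per distinct category."""
    codes, categories = ordinal_encode(values)
    rows = [[1 if code == i else 0 for i in range(len(categories))] for code in codes]
    return rows, categories


colors = ["red", "green", "blue", "green"]   # hypothetical categorical column
print(ordinal_encode(colors))
print(one_hot_encode(colors))
```

Ordinal encoding keeps a single column but implies an order the categories may not have; one-hot encoding avoids that at the cost of one column per category, which grows quickly for high-cardinality features.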