Jul 21
Support Vector Machine (SVM) in Python
Introduction: Support vector machine (SVM) is a supervised model used for binary classification problems. In particular, SVM projects data to a higher dimension, finds the optimal hyperplane that maximizes the soft margin, and uses that hyperplane as a threshold to classify new data points. I know that’s a lot. In…
Svm · 7 min read

Jul 6
XGBoost (Classification) in Python
Introduction: In the previous articles, we introduced the decision tree, compared the decision tree with random forest, compared random forest with AdaBoost, and compared AdaBoost with gradient boosting. It has been quite a journey. Unfortunately, everything has an end. This article will end the tree algorithm series. In particular, we’ll first look…
Xgboost · 11 min read

Jul 3
AdaBoost vs. Gradient boosting (Classification) in Python
Introduction: In the previous article, we talked about how to use AdaBoost to build a boosting ensemble model. Generally speaking, the weight of each sample is modified during each iteration to reduce the prediction error, and each tree carries a different weight when making the final classification. AdaBoost is the…
Gradient Boosting · 11 min read

Jun 30
Random forest vs. AdaBoost in Python
Introduction: In the previous article, we talked about how to use bagging to build a random forest. To recap, a random forest combines multiple independent decision trees and makes the final classification based on the majority vote. However, we can do more: we can build a set of dependent decision…
Adaboost · 9 min read
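
The majority-vote step from the recap above can be sketched in a few lines. This is only an illustration, not the article's code: the function name majority_vote and the example labels are assumptions, and the per-tree predictions are taken as given rather than produced by real trees.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees.

    If several labels tie, the one that appears first in the input wins
    (Counter.most_common keeps first-seen order for equal counts).
    """
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Hypothetical predictions from three independent trees for one sample.
votes = ["survived", "died", "survived"]
print(majority_vote(votes))
```

AdaBoost differs at exactly this step: instead of one vote per tree, each tree's vote is weighted.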

Jun 29
Decision tree vs. Random forest in Python
Introduction: In the previous article, we talked about the decision tree in terms of its structure and parameters, as well as an example using the Titanic dataset. However, a decision tree is essentially a weak learner since the accuracy of a tree model is generally low (i.e., due to limited tree depth and…
Random Forest · 7 min read

Jun 27
Why and How to normalize a relational database (with simple example)?
Introduction: Technically speaking, we do not normalize a database; we normalize the tables in the database. Specifically, normalization is a way to organize data in a table to reduce redundancy and eliminate anomalies, and it consists of multiple normal forms (e.g., 1NF, the first normal form). Each normal form has a…
Normalization · 10 min read

Published in CodeX · Jun 24
Basic SQL statements you need to know (GROUP BY/HAVING/INNER JOIN)
Introduction: In the previous article, we talked about the very basic SQL statements. In this article, we’ll focus on slightly more complex but very powerful ones: GROUP BY, HAVING, and INNER JOIN.
SQL statements: In the following demonstration, we’ll use the same table as an example. The…
Sql · 8 min read
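
As a rough illustration of how the three statements named above fit together, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the rows are hypothetical; the article's own example table is not shown in this excerpt.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 5.0);
    """
)

# INNER JOIN pairs each order with its customer, GROUP BY collapses the
# rows per customer, and HAVING filters on the aggregated total.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    HAVING SUM(o.amount) > 20
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

Note that HAVING filters groups after aggregation, whereas WHERE filters individual rows before it.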

Published in CodeX · Jun 14
Basic SQL statements you need to know (SELECT/DISTINCT/ORDER BY/LIMIT/WHERE)
Introduction: In this article, we’ll go through basic SQL statements that are frequently used (along with some tips), including SELECT, ORDER BY, LIMIT, DISTINCT, and WHERE. In the next article, we’ll go through GROUP BY, HAVING, and INNER JOIN. …
Sql · 10 min read

May 21
Evaluation of regression models
Introduction: In the previous article, we talked about how to use a confusion matrix to evaluate classification models (e.g., accuracy, precision, and recall). However, a regression model predicts numbers, not labels. Thus, we have to apply different methods to evaluate the model’s performance. …
Regression Modeling · 3 min read
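
The excerpt above does not say which methods the article uses, so the sketch below simply assumes four commonly used regression metrics: MAE, MSE, RMSE, and R². The function name regression_metrics and the sample numbers are illustrative only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 for paired true/predicted values.

    Assumes y_true is non-empty and not constant (otherwise R^2 is
    undefined and this raises ZeroDivisionError).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up targets and predictions.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```

Unlike accuracy, these metrics measure how far the predictions are from the true values rather than how often they match exactly.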

May 4
Association mining — Leverage in Python
Introduction: In the previous article, we looked at how to use lift to measure the correlation between item sets. In fact, leverage also measures the correlation between item sets, except it uses a slightly different formula.
Leverage: Leverage measures the correlation between item sets by comparing the support of item…
Association Rule · 3 min read
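
Since the excerpt is cut off before the formula, the sketch below assumes the standard definition, leverage(A, B) = support(A and B together) - support(A) * support(B). The helper names and the toy transactions are hypothetical, not taken from the article.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def leverage(transactions, a, b):
    """Observed joint support minus the support expected under independence."""
    joint = support(transactions, set(a) | set(b))
    return joint - support(transactions, a) * support(transactions, b)


# Toy shopping baskets.
baskets = [
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"milk", "bread"},
]
print(leverage(baskets, {"milk"}, {"bread"}))
```

A leverage of 0 means the two item sets co-occur exactly as often as independence would predict; positive values indicate they appear together more often than expected, negative values less often.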