SHAP Value Dilution: How XGBoost Feature Sampling Misleads
Interpreting machine learning models is critical in high-stakes applications like finance, healthcare, and fraud detection. SHAP (SHapley…
Feb 23

Demystifying Gibbs Sampling: Ergodicity, Detailed Balance, and Convergence
Markov Chain Monte Carlo (MCMC) methods, such as Gibbs sampling, are widely used in Bayesian inference and probabilistic modeling. However…
Feb 28

Feature Importance Dilution in XGBoost: Just Like SHAP or Different?
Gain-based feature importance is another popular metric used in machine learning models for both variable reduction and model…
Feb 8

No Hype, Just Results: DeepSeek Educated Me Better on Correlation and VIF
In the world of data science, having too many variables can be both a blessing and a curse. Variable reduction helps to improve model…
Jan 28

Causal Inference 102 EP04: Implementation of Inverse Probability Weighting
In this post, I will demonstrate how to implement Inverse Probability Weighting (IPW) from scratch. For simplicity, I use the well-known…
Jan 6, 2023

Forecasting 101 EP08: Cointegration
In this post, we are going to talk about cointegration. It is a concept closely related to the unit root process in a multivariate model…
Nov 24, 2022

Forecasting 101 EP07: Multivariate Models
In this series, we have spent a lot of time talking about ARIMA models, which are essentially univariate models. In this post, we discuss…
Nov 24, 2022

Forecasting 101 EP06: Model Identification and Evaluation
In this post, we will talk about how to identify models and evaluate them. The full notebook can be found here. Let’s first take a look…
Nov 8, 2022