Setting-Up Kaggle Environment in few Lines
Hey there! I have just started writing blog posts about the tricks I would like to share on data science and machine learning. I will…
Oct 16, 2022
Leveraging Distributional Statistics in Anomaly Detection
Anomaly detection is the process of identifying rare or unusual patterns in data that do not conform to expected behavior. Applications…
Nov 7
Standardize and Correct Your Text, Fast!
When you want to use text data in your projects, there may be some points where you encounter difficulties. Let’s say you have noisy…
Oct 1
My Competition Summary: ISIC 2024
A competition has just ended! ISIC 2024 - Skin Cancer Detection with 3D-TBP challenged us to develop advanced image-based algorithms capable…
Sep 13
Quadratic Weighted Kappa (QWK) Metric and How to Optimize It
Accuracy, precision, recall, and F1 score are commonly used metrics for most classification problems. But some specific scenarios…
Jul 19
Nested Cross-Validation Against Overfitting
In machine learning tasks, we check our models with a validation set so that they do not overfit. In fact, we use the cross-validation…
Dec 4, 2022
Speeding up I/O: Parquet and Feather
Some of our problems consist of data we read from local storage. Read-process-write operations can be comfortable in relatively small…
Nov 27, 2022
F-Beta: Weighting Precision and Recall
We use some standard metrics and evaluation functions to gain insight into the robustness and reliability of our classifier models. The…
Nov 15, 2022
Adversarial Validation: a Sanity Checker and an Exploiter
Ideally, we would expect our training and test data to come from similar distributions. However, the opposite can happen in some real-life…
Oct 30, 2022
Stratification on Regression Problems
Hi! In this article I will walk through an example of how to generate splits for regression problems while preserving the…
Oct 23, 2022