We have seen previously that LightGBM was extremely fast, much faster than xgboost with default settings in R. Recently, a switch from float to double (in prediction-related functions) was made in LightGBM to fix a prediction bug. Now that…
This post is about benchmarking LightGBM and xgboost (exact method) on a customized Bosch data set. I saw xgboost run about 10 times slower than LightGBM during the Bosch competition, but now we are back with some numbers to…
Laurae: This post is about a new feature of xgboost: the histogram tree growing method. Currently, it throws an error in R but works in Python (?). You can find the pull request #1940 here. I’ll get benchmarks on my customized Bosch data set when…
Laurae: the topic post can be found on Kaggle.
Laurae wrote:
(until we all read: “(not available until the second stage of the competition)”)
Laurae: This post is about xgboost’s gblinear and its parameters. Elastic Net? Generalized Linear Model? Gradient Descent? Coordinate Descent?… The post was originally at Kaggle.
Laurae: This post is about which Intel CPU to look for if you want a powerful server/workstation. This is an insight from the “IT background” I supposedly do not even have (I hold over 40 IT certifications, duh). If you do not know what to look for, ask on Reddit on…
Non-Kaggle post about the impact of Virtualized CPU cores / Sockets on Machine Learning / Optimization problems, specifically on xgboost and VMware (Linux host, Windows client). I found this…
Laurae: This post is about plotting data to maximize readability, so you can quickly read multivariate data against a single label. Obviously, if there are interactions, they will be harder to notice and you would go with regression…