Feature Engineering; Fast Randomized SVD; Visualizing Percentage Change

Weekly Reading List #4


Issue #4: 2018/05/07 to 2018/05/13

This is an experimental series in which I briefly introduce the interesting data science material I read, watched, or listened to during the week. Please give this post some claps if you’d like this series to continue.

Feature Engineering

Someone brought up this set of slides by Dmitry Larko when talking about “weight of evidence” encoding in the Kaggle TalkingData AdTracking Fraud Detection Challenge:

Link to the Slides (PDF)

It’s a very good resource, especially when you run out of feature engineering ideas.
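Since “weight of evidence” encoding came up, here is a minimal sketch of how it can be computed for a binary target. This is my own illustration, not taken from the slides; the column names, toy data, and smoothing constant are all assumptions.

```python
import numpy as np
import pandas as pd

def woe_encode(df, cat_col, target_col, eps=0.5):
    """Weight-of-evidence encoding for a binary target.

    WoE(c) = ln( P(cat = c | target = 1) / P(cat = c | target = 0) ),
    with additive smoothing `eps` so empty cells don't divide by zero.
    """
    grouped = df.groupby(cat_col)[target_col].agg(["sum", "count"])
    events = grouped["sum"] + eps                          # positives per category
    non_events = grouped["count"] - grouped["sum"] + eps   # negatives per category
    woe = np.log((events / events.sum()) / (non_events / non_events.sum()))
    return df[cat_col].map(woe)

# Hypothetical toy data: a categorical feature with a binary fraud label.
df = pd.DataFrame({
    "channel":  ["a", "a", "a", "b", "b", "b"],
    "is_fraud": [1,   1,   0,   0,   0,   1],
})
df["channel_woe"] = woe_encode(df, "channel", "is_fraud")
```

Categories where the positive class is over-represented get a positive WoE, under-represented ones a negative WoE, so the encoding carries target information into a single numeric column.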

Using Randomness to Make Code Much Faster by Rachel Thomas

I ran into this video of a talk given a while ago. Rachel Thomas used fast randomized SVD as an example of how adding randomness can greatly improve the performance of a program that does not require full precision. It also motivated me to refresh my knowledge of SVD and linear algebra in general.
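As a rough sketch of the idea (not the exact implementation from the talk), randomized SVD projects the matrix onto a small random subspace, then runs a cheap exact SVD on the much smaller projected matrix. The parameter choices below are illustrative defaults:

```python
import numpy as np

def randomized_svd(A, k, n_oversamples=10, n_iter=2, seed=0):
    """Rank-k randomized SVD sketch (in the style of Halko et al.)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # 1. Sample the range of A with a random Gaussian test matrix.
    Omega = rng.standard_normal((n, k + n_oversamples))
    Y = A @ Omega
    # 2. A few power iterations sharpen slowly decaying spectra.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    # 3. Orthonormal basis for the sampled range.
    Q, _ = np.linalg.qr(Y)
    # 4. Exact SVD on the small (k + oversamples) x n projection.
    B = Q.T @ A
    Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uh
    return U[:, :k], s[:k], Vt[:k]

# On an exactly rank-5 matrix, the rank-5 randomized SVD is near-exact.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, k=5)
```

The expensive step is the exact SVD, and here it runs on a matrix with only `k + n_oversamples` rows instead of `m`, which is where the speedup comes from.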

Mike Bostock on Visualizing Percentage Change

Here’s one way to understand the logic: if a quantity drops by 50%, returning to the previous level requires a 100% increase, not a 50% increase. In that sense, -50% is as big as +100%.

Or, as Mr. Bostock put it, comparing +200% and -200% makes no sense when counting people, since a count cannot fall by more than 100%.
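One way to make this symmetry concrete (my own sketch, not code from Bostock’s post) is to compare log ratios, which treat a halving and a doubling as equal and opposite:

```python
import math

def log_change(new, old):
    """Symmetric change measure: the log of the ratio new/old."""
    return math.log(new / old)

# A 50% drop and a 100% rise have equal magnitude on a log scale.
down = log_change(50, 100)   # quantity halved  -> ln(0.5)
up = log_change(200, 100)    # quantity doubled -> ln(2.0)
```

This is why a log-scaled axis is a natural choice when visualizing percentage change: equal distances correspond to equal multiplicative changes.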

Fast.ai 2018 DL Course Notes by Hiromi Suenaga

I’ve started a speed run through the new Fast.ai course (Part 1, based on PyTorch) this week. Coincidentally, Part 2 was also officially launched on May 7th, so the timing couldn’t be better.

Hiromi Suenaga has shared her detailed course notes. It’s a bit hard to navigate between lessons, so here’s a quick list:

Part 1

Part 2