5 useful Python packages from Kaggle’s kernels you didn’t know existed
Boost your efficiency, speed, and understanding of your models by using them during competitions
Kaggle’s kernels are great: they offer computing power so that everybody can take part in Data Science competitions. Do you use them? If so, are you sure you are using their full potential?
In this article, I will show you 5 interesting but not widely known Python packages that are available in Kaggle’s kernels (as of Aug 21st, 2018). Let’s start the show!
bcolz
Columnar and compressed data containers by Blosc.
Category: Data Storage
Why bother? Columnar, compressed storage saves a lot of memory and speeds up computation. It also allows faster I/O operations.
Cool stuff example: Reduction of Pandas DataFrame memory usage with bcolz:
Arrow
Better dates and times for Python by Chris Smith.
Category: Standard Library Enhancements
Why bother? One import for all date and time management.
Cool stuff example: You can get a humanized time interval in English:
And in a less popular language:
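Both cases can be sketched as follows. The fixed timestamp and the choice of Polish ("pl") as the second locale are my assumptions for illustration; any locale supported by Arrow works the same way.

```python
import arrow

# A fixed reference point keeps the result reproducible.
now = arrow.get("2018-08-21T12:00:00+00:00")
earlier = now.shift(hours=-3)

# Humanized interval relative to `now`, in English (the default locale).
english = earlier.humanize(now)
print(english)

# The same interval in another locale ("pl" is just an example choice).
polish = earlier.humanize(now, locale="pl")
print(polish)
```

Calling humanize() without arguments compares against the current UTC time instead of a fixed reference.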
langdetect
Port of Google’s language-detection library to Python by Mimino666
Category: Natural Language Processing
Why bother? Simple and fast classification of a text’s language.
Cool stuff example: Language detection with probability:
Bottleneck
Bottleneck is a collection of fast NumPy array functions written in C by kwgoodman.
Category: Data Manipulation
Why bother? You can simply use Bottleneck functions in place of their NumPy counterparts to get faster computation.
Cool stuff example: I cannot find anything cooler than what the author did. You just call bn.bench() and get all the speed comparisons between Bottleneck and NumPy.
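For a quick feel of the difference without running the full built-in benchmark, here is a minimal sketch that times a single function, nanmean, against its NumPy counterpart. The array size and the share of NaN values are arbitrary choices of mine, and the timings will vary between machines.

```python
import timeit

import bottleneck as bn
import numpy as np

# Random test array with roughly 10% of the values replaced by NaN.
rng = np.random.default_rng(0)
a = rng.random((1000, 1000))
a[a < 0.1] = np.nan

# Both libraries should agree on the result.
np_result = np.nanmean(a)
bn_result = bn.nanmean(a)

# Time 20 calls of each implementation.
t_np = timeit.timeit(lambda: np.nanmean(a), number=20)
t_bn = timeit.timeit(lambda: bn.nanmean(a), number=20)
print(f"NumPy: {t_np:.4f}s  Bottleneck: {t_bn:.4f}s")

# The full comparison table from the package itself:
# bn.bench()
```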
ELI5
ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions by TeamHG-Memex.
Category: Visualization
Why bother? The package helps you to visualize the way your model works. It may help you debug your models more efficiently.
Cool stuff example: Visualization of a programming language classifier inference on one code sample.
This is the end of the list. Do you know any other useful packages that barely anybody knows about? Let me know in the comment section!
EDIT: There is Part 2 of the article here.