5 useful Python packages from Kaggle’s kernels you didn’t know existed

Boost your efficiency, speed, and understanding of your models by using them during competitions

Piotr Gabrys
Published in LogicAI
3 min read · Aug 21, 2018

Kaggle’s kernels are great: they give everybody the computing power to take part in data science competitions. Do you use them? If so, are you sure you are using their full potential?

In this article, I will show you 5 interesting but little-known Python packages that are available in Kaggle’s kernels (as of Aug 21st, 2018). Let’s start the show!

bcolz

Columnar and compressed data containers by Blosc.

Category: Data Storage

Why bother? Columnar, compressed storage can save a lot of memory and speed up column-wise operations. It also allows faster I/O operations.

Cool stuff example: Reducing the memory usage of a pandas DataFrame with bcolz.

repo: https://github.com/Blosc/bcolz

docs: http://bcolz.blosc.org/en/latest/

Arrow

Better dates and times for Python by Chris Smith.

Category: Standard Library Enhancements

Why bother? One import for all date and time management.

Cool stuff example: You can get a humanized time interval in English, and in a less popular language as well.
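A small sketch of what this can look like. The fixed reference time and the choice of Polish as the "less popular language" are my assumptions for illustration; any locale that Arrow ships with should work the same way:

```python
import arrow

# A fixed reference time keeps the example reproducible.
now = arrow.get("2018-08-21T12:00:00")
an_hour_earlier = now.shift(hours=-1)

# humanize() describes one moment relative to another in plain words.
english = an_hour_earlier.humanize(now)
polish = an_hour_earlier.humanize(now, locale="pl")

print(english)  # an hour ago
print(polish)   # the same interval, phrased in Polish
```

Calling humanize() without an argument compares against the current time, which is what you usually want in a notebook.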

repo: https://github.com/crsmithdev/arrow

docs: https://arrow.readthedocs.io/en/latest/

langdetect

Port of Google’s language-detection library to Python by Mimino666.

Category: Natural Language Processing

Why bother? Simple and fast identification of a text’s language.

Cool stuff example: Language detection with probabilities.

repo: https://github.com/Mimino666/langdetect

Bottleneck

Bottleneck is a collection of fast NumPy array functions written in C by kwgoodman.

Category: Data Manipulation

Why bother? You can often swap in Bottleneck functions for their NumPy counterparts to get faster computation, especially on arrays that contain NaNs.

Cool stuff example: I cannot find anything cooler than what the author did. You just call bn.bench() (after import bottleneck as bn) and get all the speed comparisons against NumPy.
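To give a feel for the drop-in usage, here is a short sketch. The array shape and the share of NaNs are arbitrary choices of mine, and I left the benchmark call commented out because it takes a while to run:

```python
import numpy as np
import bottleneck as bn

# Hypothetical sample data: a 1000 x 1000 array with about 10% missing values.
rng = np.random.RandomState(0)
a = rng.rand(1000, 1000)
a[a < 0.1] = np.nan

# Same call shape as NumPy, just a different module prefix.
np_result = np.nanmean(a, axis=0)
bn_result = bn.nanmean(a, axis=0)
print(np.allclose(np_result, bn_result))

# Bottleneck also offers moving-window functions that NumPy lacks.
window_mean = bn.move_mean(a[:, 0], window=5, min_count=1)
print(window_mean[:5])

# Uncomment to print the package's own speed comparison table:
# bn.bench()
```

How big the speedup is depends on the array size, the dtype and the axis, so it is worth timing the functions on your own data.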

repo: https://github.com/kwgoodman/bottleneck

docs: https://kwgoodman.github.io/bottleneck-doc/

ELI5

ELI5, by TeamHG-Memex, is a Python package that helps to debug machine learning classifiers and explain their predictions.

Category: Visualization

Why bother? The package helps you to visualize the way your model works. It may help you debug your models more efficiently.

Cool stuff example: Visualization of a programming language classifier inference on one code sample.

repo: https://github.com/TeamHG-Memex/eli5

docs: https://eli5.readthedocs.io/en/latest/

This is the end of the list. Do you know any other useful packages that barely anybody knows about? Let me know in the comment section!

EDIT: There is Part 2 of the article here.

Piotr Gabrys
LogicAI

Data Scientist, traveler, books lover, husband and parent. Tries to understand the world but constantly fails.