5 useful Python packages from Kaggle’s kernels you didn’t know existed (Part 2)

Boost your efficiency, speed, and understanding of your models by using them during competitions

Kaggle’s kernels are great: they offer computing power so that everybody can take part in Data Science competitions. Do you use them? If so, are you sure that you are using their full potential?

In this Part 2 article, I will show you 5 interesting but not widely known Python packages that are available in Kaggle’s kernels (as of Sep 11th, 2018).

You may also be interested in my first article about Python packages, available here.


pyahocorasick

Python module (C extension and plain Python) implementing the Aho-Corasick algorithm, by Wojciech Muła.

Category: Natural Language Processing

Why bother? Fast multi-pattern string search.

Cool stuff example: I searched for all occurrences of US city names in the scikit-learn 20newsgroups articles:

The Aho-Corasick algorithm performed the search over 500x faster than the naive implementation.

NOTE: The Automaton trie has to be built before searching. For a long list of patterns, this takes some time. The good news is that it can be pickled and saved for future use.

repo: https://github.com/WojciechMula/pyahocorasick

docs: https://pyahocorasick.readthedocs.io/en/latest/


Pandas-profiling

Create HTML profiling reports from pandas DataFrame objects.

Category: Data Exploration

Why bother? Data exploration in just one line of code. It can save a lot of time and effort. The report is interactive and looks great.

Cool stuff example: I have nothing to add to these outputs:

Dataset summary:

Single variable exploration:

repo: https://github.com/pandas-profiling/pandas-profiling

full analysis example: http://nbviewer.jupyter.org/github/JosPolfliet/pandas-profiling/blob/master/examples/meteorites.ipynb


funcy

A collection of fancy functional tools focused on practicality, by Suor.

Category: Standard Library Enhancements

Why bother? A big collection of useful functions that save time and add clarity to your code.

Cool stuff example: Merging dictionaries in one line of code:

And a pairs generator:

There are plenty of functions like the ones above. I strongly recommend checking the docs.

repo: https://github.com/Suor/funcy

docs: https://funcy.readthedocs.io/en/stable/


fancyimpute

A variety of matrix completion and imputation algorithms implemented in Python by iskandr.

Category: Data Preparation

Why bother? Easy access to 8 imputation algorithms.

Cool stuff example: Imputation with MICE (Multiple Imputation by Chained Equations):
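A minimal sketch of chained-equation imputation. Note the swap: it uses scikit-learn's `IterativeImputer` as a stand-in, because the class and method names that fancyimpute exposes for MICE have changed between releases. It illustrates the technique rather than the fancyimpute API. The toy data and the 10% missing rate are assumptions for illustration:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)

# Hypothetical data: the third column depends on the first two.
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

# Hide roughly 10% of the entries.
missing_mask = rng.rand(*X.shape) < 0.1
X_incomplete = X.copy()
X_incomplete[missing_mask] = np.nan

# Each feature with missing values is regressed on the others, in rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X_incomplete)

print("mean absolute error on hidden entries:",
      np.abs(X_filled[missing_mask] - X[missing_mask]).mean())
```

Observed values pass through unchanged; only the hidden entries are filled in.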

repo: https://github.com/iskandr/fancyimpute


missingno

Missing data visualization module for Python by ResidentMario.

Category: Data Exploration

Why bother? Fast missing values visualization.

Cool stuff example: Quick data completeness overview:

Everything should be made as simple as possible, but not simpler. Albert Einstein

repo: https://github.com/ResidentMario/missingno


Have you already used these packages? Are you planning to? What packages should be covered in Part 3? Let me know in the comments or just clap if you liked the article!