3 Python Packages that make Data Science Simple
Mito is a spreadsheet front-end for Python. You can call Mito into your Jupyter Notebook and each edit you make in the front-end will generate the equivalent Python.
Here is a video demo:
To install Mito, use these three commands:
python -m pip install mitoinstaller
python -m mitoinstaller install
python -m jupyter lab
Here is a link to the full install instructions.
Mito is a great package for slicing and dicing your data. It lets users create interactive pivot tables and graphs with just a few clicks.
Mito pivot tables are a great way to see relationships between different variables and group the data in a way that makes insights more apparent.
You can configure a Mito pivot table by selecting the Pivot button from the toolbar and then choosing your rows, columns, values and aggregation types.
Each edit in Mito generates the equivalent Python in the code cell below. It is a much faster way of producing code than constantly heading to Stack Overflow to find the correct syntax.
The pivot table above generates this code and auto-comments it as well!
Mito does not just generate the code for pivot tables. In Mito, you can merge datasets, filter, sort, use functions, look at summary statistics, and more — and Mito will generate the equivalent Python for each of these edits.
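For a sense of the kind of pandas code these edits correspond to, here is a minimal hand-written sketch. It is not Mito's actual output. The dataframe, the column names (`region`, `product`, `revenue`), and the 150 threshold are all made up for illustration.

```python
import pandas as pd

# Hypothetical sales data; the column names are illustrative only
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100, 150, 200, 50, 300],
})

# Pivot: rows = region, columns = product, values = revenue, aggregation = sum
pivot = pd.pivot_table(
    df,
    index="region",
    columns="product",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)

# Filter and sort: keep rows with revenue of at least 150, largest first
big_sales = df[df["revenue"] >= 150].sort_values("revenue", ascending=False)

print(pivot)
print(big_sales)
```

Writing this by hand means remembering the `index`, `columns`, `values`, and `aggfunc` arguments, which is the lookup work a point-and-click front-end saves.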
Mito also lets users generate interactive Plotly charts without writing any code. Plotly is a popular Python graphing package.
To create a Plotly chart, all the user has to do is click the graph button and select their axes.
Here is Mito’s full documentation.
Pandas Profiling builds on the df.describe() method from Pandas, expanding it into a much more detailed summary of a dataframe that is quick to produce.
Pandas Profiling is a great tool for exploratory data analysis.
You can install the package locally by running these commands in a notebook cell:
import sys
!{sys.executable} -m pip install -U pandas-profiling[notebook]
!jupyter nbextension enable --py widgetsnbextension
Pandas Profiling provides advanced summary statistics and information for a dataset without your having to write much code at all.
Here is the full description of the Pandas Profiling functionality, as described on the documentation website:
Two really powerful features are the reports on Missing Values and Descriptive Statistics. When analyzing a new dataset, handling missing values can be a pain. Pandas Profiling makes this process much easier. The descriptive statistics are great for understanding the dataset more in depth before proceeding in your analysis.
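As a baseline, these are the plain pandas calls for descriptive statistics and missing values that a profiling report expands on. This is a hand-written sketch with made-up data (the `age` and `city` columns are hypothetical), not Pandas Profiling output.

```python
import pandas as pd

# Hypothetical dataset with one missing value in each column
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "city": ["Boston", None, "Austin", "Boston"],
})

# Descriptive statistics for numeric and non-numeric columns alike
summary = df.describe(include="all")

# Missing values per column, as a count and as a share of rows
missing = df.isna().sum()
missing_share = df.isna().mean()

print(summary)
print(missing)
print(missing_share)
```

With the package installed, the report itself is typically created with `ProfileReport(df)` from `pandas_profiling`, which gathers this kind of information into a single view.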
Lux is a great package for visualizing data. Getting the code exactly right to make the chart you want can be a large time sink. Lux recommends graphs for you, and you can select one with the click of a button.
Lux can be applied to any dataframe and will automatically suggest graphs that the user can choose from.
Users can also set an intent, passing in the columns they are interested in exploring, and Lux will automatically suggest graphs for those columns.
Lux can be installed with a single pip command:
pip install lux-api
I hope these packages are helpful. If you have any comments or questions, please leave a reply — or reach out to me: jake@sagacollab.com :)