3 Python Tools Every Data Scientist Should Use

Jake from Mito
Published in trymito
3 min read · Sep 20, 2021
1. Mito

Mito is a free Python package that lets you call a spreadsheet interface into your Jupyter environment. Every edit you make in the spreadsheet generates the equivalent Python in the code cell below.

Mito is great for Python users who want to generate their syntax more quickly, without needing to go to Stack Overflow or Google. It is also used by Excel users who want to transition their skills to Python.

To install Mito, run these commands:

python -m pip install mitoinstaller
python -m mitoinstaller install

Then open JupyterLab, and this code should appear:

import mitosheet
mitosheet.sheet()

Just run that code cell to render the Mitosheet.

Mito has lots of great functionality for exploratory data analysis and data cleaning, including:

  • Generating graphs and the equivalent code
  • Creating pivot tables
  • Merging datasets together
  • Using spreadsheet formulas
  • Filtering and sorting datasets
  • Looking at summary statistics
  • Filling null values
  • and much more!
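
Several of the operations listed above map onto ordinary pandas calls. The following is a hand-written sketch, not actual Mito output, of roughly what the equivalent Python can look like. The sample data, column names, and variable names are made up for illustration.

```python
import pandas as pd

# Made-up sample data, for illustration only.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "product": ["A", "A", "B", "B"],
    "units": [10.0, None, 7.0, 3.0],
})
managers = pd.DataFrame({
    "region": ["East", "West"],
    "manager": ["Ana", "Ben"],
})

# Filling null values: replace missing unit counts with 0.
sales["units"] = sales["units"].fillna(0)

# Filtering and sorting: keep rows with units > 0, largest first.
filtered = sales[sales["units"] > 0].sort_values("units", ascending=False)

# Creating a pivot table: total units per region.
pivot = sales.pivot_table(index="region", values="units", aggfunc="sum")

# Merging datasets together: attach each region's manager to every sale.
merged = sales.merge(managers, on="region", how="left")

# Looking at summary statistics for one column.
summary = sales["units"].describe()
```

Each step is a single pandas call, which is the kind of syntax a spreadsheet edit can save you from looking up.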

Here is the full documentation.

2. Streamlit

Streamlit recognizes that data science work is meant to be shared and understood by people outside of your data science team. Streamlit makes it possible to turn a Python script into a user-friendly data app.

Here is their website: https://streamlit.io/

Streamlit makes it simple to add interactive widgets to any analysis.

To install Streamlit and launch its built-in demo app, run these commands:

pip install streamlit
streamlit hello

Streamlit also makes it easy to deploy your apps. Figuring out deployment is often the most tedious part of creating a data app. With Streamlit, all you have to do is select your repository, branch, and main file path, and Streamlit will deploy it.

Data science teams often work in isolation. Streamlit lets the team focus on their analysis while still sharing it effectively, without spending a lot of time on the sharing process.

3. Lux

Lux is a great tool for data visualization and exploratory data analysis. Lux will take any data frame and automatically recommend visualizations that help you explore and share the data. You do not need to write any of the visualization code yourself.

To import Lux:

import lux
import pandas as pd

Below you can see how Lux provides visualization options for any data frame. All you need to do is select the chart you want. No coding required. This is a huge time saver, as getting the exact syntax right in a package like matplotlib or seaborn can be time-consuming.

https://github.com/lux-org/lux

Lux allows you to export your visualizations as well, making the sharing process smooth and simple. You can export the visualizations to HTML, or you can convert them into the equivalent matplotlib code, so you can edit them further.

I hope you found these packages helpful and that they save you time in your analysis :)
