3 Python Packages Making Data Science Easier

Jake from Mito
Published in trymito
3 min read, Nov 15, 2021

1. Mito

One drawback of using Python for data science is that it is not very visual: you have to keep a mental picture of the state of your data or your model. Mito provides a visual spreadsheet for data science. You can call the Mitosheet in your Python environment, and each edit you make generates the equivalent Python in the code cell below.

Mito makes it easy to extract information from your data in Python, and it can also help you move repetitive spreadsheet tasks into Python.

To install Mito, run these commands:

python -m pip install mitoinstaller
python -m mitoinstaller install

Then open JupyterLab and call the Mitosheet:

import mitosheet
mitosheet.sheet()

Here are the full install instructions.

One of the most popular features in Mito is the pivot table. Here is some example pivot table code generated by Mito:

# Pivoted Airport_Pets_csv into df2
unused_columns = Airport_Pets_csv.columns.difference(set(['Zip']).union(set(['Division'])).union(set({'Zip'})))
tmp_df = Airport_Pets_csv.drop(unused_columns, axis=1)
pivot_table = tmp_df.pivot_table(
    index=['Zip'],
    columns=['Division'],
    values=['Zip'],
    aggfunc={'Zip': ['median']}
)

Notice that the code is auto-documented.
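
If you want to try a similar pivot without the original dataset, here is a minimal pandas sketch. The dataframe below is an invented stand-in for Airport_Pets_csv, and the `Pets` column is a hypothetical value column added so there is something meaningful to aggregate; only the `Zip` and `Division` names come from the generated code above.

```python
import pandas as pd

# Hypothetical stand-in for Airport_Pets_csv; every value here is invented.
airport_pets = pd.DataFrame({
    "Zip": [10001, 10001, 94105, 94105, 60601],
    "Division": ["East", "East", "West", "West", "Central"],
    "Pets": [3, 5, 2, 8, 4],  # assumed column, not in the original example
})

# Median of Pets for each zip code, with one column per division.
pivot = airport_pets.pivot_table(
    index="Zip",
    columns="Division",
    values="Pets",
    aggfunc="median",
)

# Turn the Zip index back into an ordinary column.
summary = pivot.reset_index()
print(summary)
```

Zip and division combinations that have no rows show up as NaN in the result.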

Mito also has features such as:

  • graphing
  • merging
  • filling null values
  • summary stats
  • filtering
  • adding and deleting columns
  • spreadsheet formulas
  • and more!

The graphing features in Mito are great for quickly visualizing your data and generating the equivalent code. Mito uses the Plotly graphing package.

Here is the Mito website.

2. Bokeh

To start using Bokeh:

import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

Bokeh is an amazing package for generating interactive charts. Chart generation is a part of data science that many users struggle with, even advanced ones. Bokeh not only makes chart generation easier, it also lets you build interactive charts that are much more valuable for sharing and presentations. Interactivity in data analysis is becoming more and more important as organizations put a larger emphasis on keeping business users apprised of data trends.
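
As a rough illustration of what an interactive chart looks like in code, here is a minimal sketch that plots a sine wave with pan, zoom, and hover tooltips. The data, the function name `make_sine_plot`, and the choice of tools are all assumptions made for this example, not something taken from the Bokeh demos.

```python
import numpy as np
from bokeh.plotting import figure


def make_sine_plot(n_points=200):
    """Build an interactive line chart of sin(x); the data is illustrative only."""
    x = np.linspace(0, 4 * np.pi, n_points)
    y = np.sin(x)

    plot = figure(
        title="sin(x)",
        x_axis_label="x",
        y_axis_label="sin(x)",
        tools="pan,wheel_zoom,box_zoom,reset",
        tooltips=[("x", "$x"), ("y", "$y")],  # adds a hover tool showing coordinates
    )
    plot.line(x, y, legend_label="sin(x)", line_width=2)
    return plot


plot = make_sine_plot()
# In a notebook, render it with:
#   from bokeh.io import output_notebook, show
#   output_notebook()
#   show(plot)
```

The figure is built inside a function and rendered separately, so the same chart can be shown in a notebook or saved to an HTML file for sharing.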

Here are some of the options they offer:

https://demo.bokeh.org/

Here is the full documentation.

Here is a demo video from the Data Professor that walks through great graphing packages like Bokeh:

3. PyCaret Classification Package

To install PyCaret:

# install the full version of pycaret
pip install pycaret[full]

Classification work is an integral part of data science, but it is something that many users struggle with. PyCaret Classification provides a low-code environment for this work, which makes the workflow much more accessible.

Below you can see how the package provides interactive buttons for choosing the plot types used to represent your classification.

https://www.youtube.com/watch?v=2xAgLKUN6Xs&t=105s

PyCaret makes training models incredibly easy. One of the best features of the package is its awesome documentation. Here is the link to the full documentation.

PyCaret also provides functions for deploying models. Here is some example code for model deployment, taken from the PyCaret deployment documentation:

# Importing dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# create a model
lr = create_model('lr')

# finalize a model
final_lr = finalize_model(lr)

# Deploy a model
deploy_model(final_lr, model_name = 'lr_aws', platform = 'aws', authentication = { 'bucket' : 'pycaret-test' })

I hope these packages are helpful and make data science easier for you!
