4 Python Packages for Better Data Science

Jake from Mito
Published in trymito · 3 min read · May 18, 2022

1. Mito

Mito is a spreadsheet front end for Python. You can open Mito inside a Jupyter notebook, and each edit you make in the spreadsheet generates the equivalent Python code.

To install Mito, use these commands:

python -m pip install mitoinstaller
python -m mitoinstaller install

Then to open the Mitosheet interface:

import mitosheet
mitosheet.sheet()

Here is a link to the full install instructions.

You can configure a Mito pivot table by selecting the Pivot button from the toolbar and then choosing your rows, columns, values and aggregation types.

docs.trymito.io

Each edit in Mito generates the equivalent Python in the code cell below. It is a much faster way of producing code than constantly heading to Stack Overflow to find the correct syntax.

The pivot table above generates this code and auto-comments it as well!
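The generated code is not reproduced here. To give a rough idea of what pivot table code looks like in pandas, here is a minimal hand-written sketch. The DataFrame, its column names (region, product, sales), and the sum aggregation are made up for illustration, and this is not Mito's actual output.

```python
import pandas as pd

# Hypothetical sales data; the column names are illustrative only
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "sales": [100, 150, 200, 50, 300],
})

# Rows = region, columns = product, values = sales, aggregation = sum
pivot = pd.pivot_table(
    df,
    index="region",
    columns="product",
    values="sales",
    aggfunc="sum",
)

print(pivot)
```

The index, columns, values, and aggfunc arguments correspond to the rows, columns, values, and aggregation type you pick in the pivot menu.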

Mito does not just generate the code for pivot tables. In Mito, you can merge datasets, filter, sort, use functions, look at summary statistics, and more, and Mito will generate the equivalent Python for each of these edits.
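For a sense of what those operations look like when written by hand in pandas, here is a short sketch that merges, filters, and sorts. The two tables and their columns are hypothetical, and the code Mito generates for the same steps may be structured differently.

```python
import pandas as pd

# Hypothetical tables; names and values are illustrative only
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [20.0, 35.5, 12.0, 80.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
})

# Merge: attach the customer name to each order
merged = orders.merge(customers, on="customer_id", how="left")

# Filter: keep orders of at least 20
filtered = merged[merged["amount"] >= 20]

# Sort: largest orders first, with a fresh 0..n-1 index
result = filtered.sort_values("amount", ascending=False).reset_index(drop=True)

print(result)
```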

To create a Plotly chart, all the user has to do is click the graph button and select their axes.

Here is Mito’s full documentation.

2. Plotly

Plotly is a graphing library built around interactive charts, and it is one of the quickest ways to make them in Python. Packages like matplotlib and seaborn are intuitive as well, but their charts are static by default, so they lack the built-in interactivity that makes Plotly so strong.

To install Plotly, run this command:

pip install plotly==5.2.1

Plotly offers a wide range of interactive charts. Even a simple bar chart like the one below comes with hover tooltips, zooming, and panning built in.

import plotly.graph_objects as go
fig = go.Figure(data=go.Bar(y=[2, 3, 1]))
fig.show()

Source: https://plotly.com/python/

Plotly also supports more advanced, dynamic charts:

https://plotly.com/python/

3. TensorFlow

TensorFlow is an open-source machine learning package originally developed by Google. It has made machine learning in Python much more accessible, and it continues to do so with each new release.

To import the package, run these commands:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Here is an example of a simple module you can define and call with TensorFlow:

class SimpleModule(tf.Module):
    def __init__(self, name=None):
        super().__init__(name=name)
        # Variables are trainable by default
        self.a_variable = tf.Variable(5.0, name="train_me")
        # trainable=False excludes this variable from training
        self.non_trainable_variable = tf.Variable(5.0, trainable=False, name="do_not_train_me")

    def __call__(self, x):
        return self.a_variable * x + self.non_trainable_variable

simple_module = SimpleModule(name="simple")
simple_module(tf.constant(5.0))  # 5.0 * 5.0 + 5.0 = 30.0

Here is the full documentation and datasets for this model.

TensorFlow also makes it easy to build neural networks. Here is a screenshot from its neural networks tutorial.
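To make that concrete, here is a minimal sketch of a small neural network built with the Keras API. The layer sizes (784 inputs, 128 hidden units, 10 outputs) are assumptions chosen to resemble a typical image-classification setup, not values taken from the tutorial, and the model is only run on dummy data.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A small fully connected classifier: 784 inputs -> 128 hidden units -> 10 outputs
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10),  # raw scores (logits), one per class
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Forward pass on a dummy batch of two all-zero inputs
logits = model(tf.zeros((2, 784)))
print(logits.shape)
```

With real data, training is a call to model.fit on your inputs and integer labels.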

Here is a link to the full guide.

4. Selenium

Web scraping can be an integral part of some data science workflows. Selenium, a browser automation package, makes this process much easier.

To install the package, run this command:

pip install selenium

Using Selenium, you choose the page to scrape by loading it in a browser session. Here, driver is a WebDriver instance that you have already created:

driver.get("URL")

From here, you can use the different locator strategies that the package provides to scrape the data you want. Here is a link to the full documentation.

I hope you found these packages helpful :)
