Elevate Notebooks into Shareable Data Apps

Introducing Zero-True, the new notebook for collaborative machine learning and data science.

Red Giuliano
Zero-True
Oct 24, 2023


Table of Contents

· The Beginning
· Jupyter Notebooks vs Zero-True Notebooks
· How to Try Out Zero-True
· Looking Ahead

The Beginning

A screenshot of our upcoming cloud platform, image by author

We have officially open sourced the pre-release version of the Zero-True notebook: a new computational notebook that helps you share your insights faster. With a built-in Python, SQL, and rich text editor, reactive cell updates, and a built-in UI library, Zero-True makes working with data a breeze.

Background Story

In 2019, I landed my first job as a data scientist at a Fortune 100 company with a multi-petabyte scale dataset. I was working closely with a team of 6 experts on projects that involved several external data sources, thousands of lines of existing code, complex business requirements and dozens of poorly-documented SQL stored procedures.

Progress was slow. Much of our work was experimental and involved exploring new data sources and algorithms. Like many other data teams, we gravitated towards Jupyter notebooks to manage our workflows.

Logo for the Jupyter Foundation, “Jupyter” and the Jupyter logos are trademarks or registered trademarks of NumFOCUS, used by Red Giuliano with permission.

However, anytime we had an idea that could drive value for the business, implementing it was always easier said than done. I couldn’t help but feel like the biggest bottlenecks were always the same: communication with other stakeholders and collaboration among data scientists.

I left the company to start a new venture with my co-founder Carson, who is more experienced in traditional software engineering best practices. I was still very much a Jupyter notebook evangelist, but despite its numerous advantages, we quickly realized that Jupyter notebooks were slowing us down. We believed we’d stumbled on a real problem and that we could build a better solution, so we pivoted to create Zero-True.

But how exactly were Jupyter notebooks failing us?

Jupyter Notebooks vs Zero-True Notebooks

Zero-True, image by author

Let me start by saying that I have a deep admiration for the Jupyter project, and any criticisms I have come from a place of love. I’ve spent literally thousands of hours using the software, but I am not the first to voice my frustrations with the tool. If you want an in-depth explanation I recommend watching the talk that was given at JupyterCon titled “I don’t like notebooks”¹ by Joel Grus (or the shorter summary by Towards Data Science)². However, here’s my hot take and what Zero-True is doing differently:

Problem 1:

  • Sharing notebooks with non-technical stakeholders is very time-consuming. It usually involves generating a CSV file and screenshots, then writing an email with an attached PowerPoint and spreadsheet. All of this culminates in a meeting with a screen-sharing session. Putting it together takes a lot of time, and it’s hard to present your solutions this way because anyone who wants to see the demo again has to rewatch a screen recording. It’s no wonder that only 13% of data science projects make it to production³.
Zero True vs Current Sharing Stack, image by author

Solution:

Zero-True includes a built-in UI library with an intuitive syntax, so you can create and publish beautiful interactive visualizations based on your work. This lets you communicate the value of your analysis much more easily, without wasting time creating PNGs, PDFs, decks and spreadsheets. By actually interacting with your code, non-technical stakeholders and other data scientists are much more likely to understand and retain what you show them. Some studies by CMU suggest that learning interactively like this can be up to 6 times more effective than reading or watching lectures⁴.

Problem 2:

  • The “Data Science Death Spiral”. This spiral is a sort of catch-22 you can wind up in when going from a great idea that you validate in a notebook to a production dashboard or service. Since it’s hard to communicate the value of certain insights in a static format, it’s hard to convince managers to dedicate the resources to carrying forward a project.
The Data Science Death Spiral, image by author

Solution:

By making it so easy to go from notebook to a more shareable, interactive format, Zero-True aims to help you avoid the death spiral so that you can help drive your company’s AI strategy from the ground up.

Problem 3:

  • Most notebooks don’t run out of the box because of dependency issues (both data and package dependencies) and notebook state. These issues can be subtle and hard to debug, and it’s not uncommon to spend a few hours getting an unfamiliar notebook to run (if you don’t give up first). In one study of this problem, the researchers were only able to use ~600 notebook sessions out of more than 50k scraped from various GitHub repos⁵.
No module named pandas? Source: https://stackoverflow.com/questions/65870276/import-pandas-is-not-working-in-jupyter-notebook

Solution:

Zero-True helps with this by adding an integrated SQL editor, powered by DuckDB, to your notebook, so you can version queries directly in your notebook instead of storing them in a database. If you are using Zero-True cloud (coming soon) to develop, we additionally package your dependencies along with your notebook using Docker so that you can start working immediately without having to set up a virtual environment on your machine.

Problem 4:

  • Once you do get approval to migrate your notebook, the process can take weeks. This is mainly due to the previous issue. A migration usually involves handing over a notebook to someone with more web app development experience who then has issues running your notebook either with state management, or data/package dependencies.
The Data Science Feedback Loop, image by author

Solution:

With Zero-True, you can skip this headache by publishing your notebook directly as an app in pure Python, without needing any JavaScript, CSS, HTML or the like.

How to Try Out Zero-True

To check out Zero-True for yourself, simply navigate to the directory where you want to start your first Zero-True project and run the following commands:

#install the package
pip install zero-true

#run your first notebook
zero-true notebook

A file will get created in that directory with the contents of your notebook, and a link will pop up in your terminal. Navigate to “localhost:1326” in your browser and you will see your Zero-True notebook! Our notebook comes packed with features, such as:

  • Reactive cell updates: We use static analysis tools to determine which notebook cells need to be updated when you make a change. This allows users to immediately find and fix breaking changes when they occur, so they can share notebooks quickly and confidently. No more clicking “Restart Kernel and Clear Output” just to make sure nothing is breaking.
Zero-True has built-in reactive cell updates, image by author

To reproduce this example go ahead and run:

Cell 1 (code cell):

a = []
print(a)

Cell 2 (code cell):

a.append(1)
print(a)

You can go ahead and change the value of a in the first cell and watch the output of the second cell automagically update. If you change any variable names, all downstream cells are rerun so that you can catch the error right away.
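For readers curious how this kind of reactivity can work in principle, here is a minimal sketch of the general idea. It is not Zero-True's actual implementation: it assumes a simplified model where each cell is just a string of Python source, and it uses the standard library `ast` module to find which names a cell assigns and which it reads. The helper names `names_defined_and_used` and `downstream_cells` are hypothetical.

```python
import ast


def names_defined_and_used(source):
    """Return the names a cell assigns (defined) and the names it reads (used)."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used


def downstream_cells(cells, changed):
    """List the cell ids that should rerun after the cell `changed` is edited."""
    info = {cid: names_defined_and_used(src) for cid, src in cells.items()}
    stale, frontier, seen = [], [changed], {changed}
    while frontier:
        current = frontier.pop(0)
        defined = info[current][0]
        for cid, (_, used) in info.items():
            # A cell is stale if it reads a name the current cell assigns.
            if cid not in seen and defined & used:
                seen.add(cid)
                stale.append(cid)
                frontier.append(cid)
    return stale


# Hypothetical notebook: the two cells from the example plus an unrelated one.
cells = {
    "cell_1": "a = []\nprint(a)",
    "cell_2": "a.append(1)\nprint(a)",
    "cell_3": "b = 'unrelated'",
}
print(downstream_cells(cells, "cell_1"))  # ['cell_2']
```

Editing cell 1 marks only cell 2 for a rerun, because cell 2 reads a while cell 3 does not. A real notebook has to handle far more (imports, function scopes, mutation through attributes), so treat this purely as an illustration.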

  • No hidden state: one of the most common issues with Jupyter notebooks is having to restart the kernel so that the notebook’s state is cleared and you can be sure that there are no variables floating around from downstream cells. Zero-True eliminates hidden state from your notebook so that you never get a different result when running the same code cell twice.
Notice how jupyter output changes for the same code, image by author

To reproduce this example, simply run a Jupyter notebook with the same code as in the example above. Notice how every time you rerun cell 2 in the Zero-True notebook you get the same answer, while the Jupyter notebook keeps appending to the list it has in memory.
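The difference can also be simulated in plain Python. The sketch below is only an illustration under simple assumptions, not how either tool is built: the two cells are strings run with `exec`, a single long-lived dictionary stands in for a Jupyter kernel's memory, and a fresh dictionary per rerun stands in for a notebook without hidden state. The function names are hypothetical.

```python
cell_1 = "a = []"
cell_2 = "a.append(1)\nresult = list(a)"


def run_shared_namespace(times):
    """Kernel-style: one long-lived namespace, cell 2 rerun in place."""
    namespace = {}
    exec(cell_1, namespace)
    outputs = []
    for _ in range(times):
        exec(cell_2, namespace)  # `a` keeps growing between reruns
        outputs.append(namespace["result"])
    return outputs


def run_from_clean_state(times):
    """Stateless-style: every rerun of cell 2 replays its upstream cell first."""
    outputs = []
    for _ in range(times):
        namespace = {}
        exec(cell_1, namespace)
        exec(cell_2, namespace)
        outputs.append(namespace["result"])
    return outputs


print(run_shared_namespace(3))  # [[1], [1, 1], [1, 1, 1]]
print(run_from_clean_state(3))  # [[1], [1], [1]]
```

With the shared namespace, the same cell produces a different result on every rerun; replaying from a clean state gives the same answer each time.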

  • Integrated Frontend UI Library: A built-in modern frontend UI library allows you to create complicated but fast dashboards directly in your notebook. Gone are the days of saving plots and taking screenshots: create rich interactive experiences to let stakeholders see the impact of your work.
A Zero-True notebook with an interface to the DALL-E API, image by author

To reproduce this example, you need an OpenAI account and an API key (see the OpenAI developer docs). Here is the code for the notebook:

import openai
import zero_true as zt

# Set your API key as an environment variable (OPENAI_API_KEY) so it is not
# exposed in the notebook; see the OpenAI docs for more information.
# Note: openai.Image.create is the interface of the pre-1.0 openai package.

text_input = zt.TextInput(id='txt', label="Enter your prompt")
button = zt.Button(id='btn', text="Generate Image")

if button.value == True:
    # The button was clicked: request one image and add it to the layout
    response = openai.Image.create(prompt=text_input.value, n=1, size="1024x1024")
    image_url = response['data'][0]['url']
    image = zt.Image(id='image', src=image_url)
    layout = ['txt', 'btn', 'image']
else:
    # No click yet: show only the prompt box and the button
    layout = ['txt', 'btn']

card = zt.Card(id='card', cardChildren=layout, width=500, location='center')

As you can see, the syntax is very concise. In only about a dozen lines of code, we were able to create an interface to the API.

  • Seamless sharing: Data Scientists are not dev-ops specialists… setting up an app that doesn’t crash with a lot of traffic has nothing to do with the best algorithms for your use case. Publish web apps from your notebooks with one click on the Zero-True cloud (coming soon). We take care of all of the deployment details so you can sit back and just watch your app go live.
Mockups from our upcoming cloud version, image by author

To see what your notebook would look like if you shared it, simply quit out of your notebook and type the command below in your terminal:

zero-true app

Navigate to “localhost:2613” and you’ll see what your app looks like! If you want to see an example deployed in our cloud, check out our unofficial NYC citybike-leaderboard! Zero-True is designed to be an intuitive platform for anyone familiar with notebooks.

Looking Ahead

When Carson and I started, neither of us had any front-end or dev-ops experience (or any experience starting a company), so it’s been a journey to get to where we are now. I’m excited to share that Zero-True is now at a place where we can let the wider community test it out for themselves.

We are proudly open sourcing our product so that anyone can try it out, give feedback or contribute. You can easily install Zero-True by running `pip install zero-true`. Then, to get started simply run `zero-true notebook` — you can explore a new dataset or model and create a dashboard to showcase your insights.

For more detailed information and examples, check out our docs. You can also find us on GitHub at https://github.com/Zero-True/zero-true where you can open an issue if you run into any problems or give us a star if you don’t! We are excited to see what people are able to build using this tool, so please make sure you share your work with @ZeroTrueML on Twitter/X.

References

[1] Joel Grus (October 10 2018). I don’t like notebooks. https://www.youtube.com/watch?v=7jiPeIFXb6U&ab_channel=O%27Reilly

[2] Joel Grus (August 21 2019). The case against the jupyter notebook (TDS Podcast - Clip). https://www.youtube.com/watch?v=1ISrRp6n2Tg&ab_channel=TowardsDataScience

[3] Ari Joury, PhD (November 8th 2020). Why 90 percent of all machine learning models never make it into production. https://towardsdatascience.com/why-90-percent-of-all-machine-learning-models-never-make-it-into-production-ce7e250d5a4a

[4] Kenneth R. Koedinger, Jihee Kim, Julianna Zhuxin Jia, Elizabeth A. McLaughlin, and Norman L. Bier. 2015. Learning is Not a Spectator Sport: Doing is Better than Watching for Learning from a MOOC. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale (L@S ‘15). Association for Computing Machinery, New York, NY, USA, 111–120. https://doi.org/10.1145/2724660.2724681

[5] Macke, Stephen and Gong, Hongpu and Lee, Doris Jung-Lin and Head, Andrew and Xin, Doris and Parameswaran, Aditya (2021). Fine-grained lineage for safer notebook interactions. Published in: Proceedings of the VLDB Endowment, Volume 14, Number 6, Pages 1093–1101.
