Notebooks have become the go-to tool for data scientists in both academia and industry, but they have also introduced a new set of challenges. Many conference talks and white papers have described the drawbacks. Notebooks don’t support collaboration, hinder reproducibility, encourage sloppy coding practices, don’t play well with the rest of your stack…you get the idea.
At Deepnote, we recognize the power of notebooks, but we see a lot of areas in which they can be improved. We want to create an interface that makes data scientists more productive and helps them reason and collaborate. We are building on top of the familiar Jupyter experience, but also making some significant design changes. In this article, we will discuss how Deepnote addresses the following four key challenges:
- Environment setup and management
- Code assistance
- Understanding data quickly
- Maximizing your speed
Challenge 1 — Environment set up and management
The potential of data science teams is often limited by the engineering support that’s available to them. Managing Python packages, researching connectors to disparate data sources, and maintaining data pipelines are particularly difficult. The learning curve is steep and the experience is often exasperating. Setting up a development environment and keeping it consistent takes up valuable time, and it can end up being an expensive and frustrating task.
At Deepnote, we aim to reduce this overhead so that data scientists can focus on what they do best, which is why we abstracted all this complexity away. Deepnote is built for the browser and is platform-agnostic. No pre-installs are needed: you can simply sign in, create a new notebook, and get to work while Deepnote handles the rest (you can try it here).
All hardware is remotely managed, and every hardware tier is connected to the internet and can execute long-running tasks. Your projects in Deepnote are always available, and the hardware is up and running in just a few seconds. There is no need to spend time picking the right Linux version or managing multiple Python environments. If you need to upgrade your hardware, you can do so in just one click. We believe that by enabling a frictionless setup, we can move the data science community closer to reproducibility. Deepnote supports reproducibility in two major ways.
First, Deepnote makes dependency management easy. When you pip-install a package in a notebook cell, we prompt you to move it into requirements.txt and pin the specific version of the package you used. Second, you can create Teams in Deepnote, which allows you to share datasets, integrations, projects, and environment configurations. This way, when your colleague shares a project with you, it includes the environment it runs in, not just the .ipynb. To learn more, take a look at my previous article discussing how Deepnote fosters collaboration in notebooks.
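To make the idea of pinning concrete, here is a minimal sketch of how a package installed in the current environment could be turned into a versioned requirements.txt entry. This is not Deepnote's implementation; the helper names `pinned_requirement` and `add_requirement` are hypothetical, and the sketch only handles simple `name` and `name==version` lines.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def pinned_requirement(package: str) -> str:
    """Return a 'name==version' line for a package installed in this environment."""
    try:
        return f"{package}=={version(package)}"
    except PackageNotFoundError:
        # Fall back to an unpinned entry when the package is not installed.
        return package


def add_requirement(existing_lines: list[str], package: str) -> list[str]:
    """Return requirements lines with `package` pinned, replacing any older entry."""

    def name_of(line: str) -> str:
        # Crude name extraction: enough for 'name' and 'name==x.y' lines only.
        return line.split("==")[0].strip().lower()

    kept = [line for line in existing_lines if name_of(line) != package.lower()]
    return kept + [pinned_requirement(package)]
```

Calling `add_requirement(lines, "pandas")` on the current contents of requirements.txt would replace any older pandas entry with the version that is actually installed, which is the property that makes a shared project reproducible on a colleague's machine.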
Challenge 2 — Code assistance
Although writing and managing code is a fundamental activity in the computational notebook paradigm, a lack of code intelligence in notebooks can make the experience difficult. Code editors and IDEs used by software engineers are not the right tools for the job either. As a data scientist, you most likely look up function and class names by keeping another browser window open to search for help, or you get the job done by switching between software IDEs and notebooks.
Whether you are transforming your data, exploring, or building ML models, Deepnote helps with advanced code assistance. An IDE-style autocomplete system lets you work faster, and configurable linting tools point out bugs before they break your long training jobs.
Challenge 3 — Understanding data quickly
Discovering patterns in data takes up a lot of time before we’re able to start using those insights and building out models. Initial data exploration lacks immediacy and often ends up being a never-ending cycle of “copy-paste and tweaking bits of code made worse by feedback latency and kernel crashes”.
Deepnote has a built-in variable explorer so that you can instantly review the contents of your variables without having to print them. It shows additional information, including a histogram for each column of a data frame, so that you can quickly get an overview of its current state. Discovering patterns is also made easier with the help of interactive plots.
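For a sense of what a per-column overview involves, here is a rough sketch that summarizes a small table and buckets numeric columns into a histogram. It is an illustration only, not how Deepnote's variable explorer works: the table is assumed to be a plain dict of lists rather than a real data frame, and `histogram` and `summarize` are hypothetical helpers.

```python
from collections import Counter


def histogram(values, bins=5):
    """Count numeric values into `bins` equal-width buckets."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [len(values)]  # all values identical: a single bucket
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bucket.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


def summarize(table, bins=5):
    """Summarize each column of a dict-of-lists table (None marks a missing value)."""
    summary = {}
    for name, column in table.items():
        present = [v for v in column if v is not None]
        info = {"count": len(present), "missing": len(column) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            info["min"], info["max"] = min(present), max(present)
            info["histogram"] = histogram(present, bins)
        else:
            # Non-numeric column: report the most frequent values instead.
            info["top"] = Counter(present).most_common(3)
        summary[name] = info
    return summary
```

Running `summarize({"age": [10, 20, None, 40], "city": ["a", "b", "a", None]}, bins=3)` gives the count, the number of missing values, and either a histogram or the most common values for each column, which is the kind of overview you would otherwise piece together with several print statements.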
Challenge 4 — Maximizing your speed
As data scientists, we need interfaces that help us explore data efficiently, prototype quickly, and move towards actionable insights. With Deepnote, we’ve introduced a bunch of features that save you time and help you iterate on your experiments faster:
- Get a browser notification when a cell finishes executing.
- See when a cell was last executed and how long it took.
- If you’re executing a cell for the second time or more, Deepnote displays a progress bar showing you how far along in execution the cell is.
- Deepnote marks successfully executed cells with a green checkmark. If you change the code without re-executing the cell, the checkmark disappears. This way, you know when a cell’s output is up-to-date and avoid struggling with hidden state.
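The execution metadata in the list above can be sketched in a few lines. The following is a simplified model, not Deepnote's implementation: `CellStatus` and `fingerprint` are hypothetical names, staleness is detected by comparing a hash of the source against the hash recorded at the last successful run, and the progress estimate simply assumes the cell will take as long as it did last time.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Optional


def fingerprint(source: str) -> str:
    """Hash the cell source so that edits can be detected cheaply."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass
class CellStatus:
    source: str
    executed_hash: Optional[str] = None    # hash of the source at the last successful run
    last_run_at: Optional[float] = None    # wall-clock start time of the last run
    last_duration: Optional[float] = None  # seconds the last run took

    def run(self, namespace: dict) -> None:
        """Execute the cell; the status is only updated if no exception is raised."""
        started = time.time()
        clock = time.perf_counter()
        exec(self.source, namespace)
        self.last_duration = time.perf_counter() - clock
        self.last_run_at = started
        self.executed_hash = fingerprint(self.source)

    @property
    def up_to_date(self) -> bool:
        """True while the source is unchanged since the last successful run."""
        return self.executed_hash == fingerprint(self.source)

    def estimated_progress(self, elapsed: float) -> Optional[float]:
        """Fraction complete, assuming this run takes as long as the previous one."""
        if not self.last_duration:
            return None  # no previous run to base an estimate on
        return min(elapsed / self.last_duration, 1.0)
```

In this model, editing `cell.source` after a run flips `up_to_date` to `False` until the cell is executed again, and `estimated_progress` returns `None` on the first run, which mirrors why a progress bar can only appear from the second execution onward.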
We’ve also introduced a powerful Command palette, as well as shortcuts that provide quick access to all your files and the most popular actions. Simply press Ctrl+P (or ⌘+P if you’re on Mac) and start typing to switch to another file, open a terminal, or execute an action.
Like what you see? We’ve recently opened Deepnote for public beta, so you can try it out for yourself.
This post is Part II in a series on how Deepnote tackles the common challenges of data science notebooks. Check out the articles published so far:
- Part I: Embracing collaboration in Jupyter notebooks
- Part II: Hacking your productivity in data science notebooks (this post)
There are more ways to learn from Deepnote and we’re always happy to share: