Can Marimo replace Jupyter notebooks?
I recently heard about Marimo, a reactive Python notebook environment.
What’s wrong with Jupyter?
I have been using Jupyter notebooks a lot in my work, and I enjoy how easy they make exploration, especially for research, data science and teaching. However, I acknowledge their weaknesses (for an overview of Jupyter’s fundamental flaws, see Joel Grus’s “I don’t like jupyter notebooks”, and for completeness, Jeremy Howard’s “I like Jupyter Notebooks”):
- They do not lend themselves to modularity or scaling.
- They can be difficult to debug, and it is hard to keep track of the state of the program as cells get entangled in non-sequential execution.
- Jupyter notebooks are notoriously difficult to version control: .ipynb files are JSON that stores outputs, embedded media and metadata alongside the code, so diffs are noisy.
- They can be slow to load and run, and can consume a lot of memory.
What is Marimo?
In my view, Marimo differs from Jupyter in two main ways: it is reactive, and it works with .py files.
Marimo is reactive
This basically means that changing a variable in one cell causes all cells that use that variable (anywhere in the notebook) to re-run. This is a big deal, and it enables native interactivity and reproducibility.
- Interactivity means that changing UI elements such as sliders immediately affects code and results. This is possible in Jupyter too, but not natively: it requires interactive widgets or callbacks.
- Reproducibility means that re-running the code in the notebook yields the same results, since the execution order in Marimo is determined by the dependencies between cells rather than by the order in which you happened to run them.
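To make the idea of reactivity concrete, here is a toy sketch in plain Python. It is not Marimo's implementation: the helper names (names_defined_and_used, cells_to_rerun) and the example cells are made up for illustration, and it only looks at simple variable names. It parses each cell, records which names the cell assigns and which it reads, and then works out which cells become stale when one cell changes.

```python
import ast


def names_defined_and_used(source: str) -> tuple[set[str], set[str]]:
    """Return (names assigned, names read but not assigned) for one cell."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used - defined


def cells_to_rerun(cells: dict[str, str], changed: str) -> set[str]:
    """Cells that depend, directly or transitively, on the changed cell.

    Toy version: returns an unordered set. A real reactive runtime would
    also sort these cells so that each runs after the cells it depends on.
    """
    info = {name: names_defined_and_used(src) for name, src in cells.items()}
    stale = set()
    frontier = [changed]
    while frontier:
        defined, _ = info[frontier.pop()]
        for name, (_, used) in info.items():
            if name != changed and name not in stale and defined & used:
                stale.add(name)
                frontier.append(name)
    return stale


# Hypothetical notebook: cell name -> cell source.
cells = {
    "a": "x = 2",
    "b": "y = x * 10",
    "c": "z = y + x",
    "d": "w = 5",
}

print(sorted(cells_to_rerun(cells, "a")))  # → ['b', 'c']
```

Editing cell "a" marks "b" and "c" as stale, while "d" is left alone because it never reads x.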
Marimo saves Python files (.py) instead of notebook files (.ipynb).
This makes version control easy, as the files are pure code. It also scales better, since the notebook you are working on is already pure Python and can be part of a larger project. The obvious drawback is that the notebook's outputs are not saved, which can be painful if outputs are slow to produce or if the notebook is designed as a report for people who don’t necessarily want to run the code again.
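The version control point can be illustrated with a small sketch using only the standard library. The notebook_json helper below builds a stripped-down, hypothetical .ipynb-style document (a real notebook has more fields), and changed_lines counts the lines a text diff would report. Re-running the same code with a different execution count and output changes the JSON text, while the plain source file does not change at all.

```python
import difflib
import json


def notebook_json(execution_count: int, output_text: str) -> str:
    """Build a simplified .ipynb-like document as text (not a full, valid notebook)."""
    doc = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "source": ["print(total)"],
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": [output_text]}
                ],
            }
        ],
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    return json.dumps(doc, indent=1)


def changed_lines(before: str, after: str) -> int:
    """Count added or removed lines in a unified diff of two texts."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# The code is identical in both runs; only the run counter and the output differ.
source_py = "print(total)\n"
ipynb_changes = changed_lines(notebook_json(1, "10\n"), notebook_json(7, "12\n"))
py_changes = changed_lines(source_py, source_py)

print("changed lines in the .ipynb-style file:", ipynb_changes)
print("changed lines in the .py file:", py_changes)
```

The same property is what costs you the saved outputs: nothing in the .py file records the results of a run.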
Marimo test drive
I decided to take Marimo for a test drive. Here are my thoughts.
Installation
I’m using Mamba as my environment manager, and I could easily install Marimo by typing
mamba install marimo
However, note that as of the time of writing, installing Marimo with Conda/Mamba does not allow using Copilot in Marimo. Copilot is supported only with a pip installation, i.e. pip install marimo.
First Impressions
Being a good boy, I first ran the introductory tutorial,
marimo tutorial intro
As a Jupyter user, I felt familiar when a browser window opened with the intro tutorial notebook. The tutorial gives a good introduction to the basic features mentioned above (reactivity, interactivity, etc.).
I immediately observed a design improvement: the output of a cell is shown above its code, not below it as in Jupyter. This makes more sense when using the notebook for presentation purposes, and it follows the philosophy of seeing first (the result) and understanding later (by reading the code), which is more natural for human learning.
I also noted the option to easily make a web app out of a notebook by running marimo run notebook_name.py from the command line. This shows the interactive parts of the notebook (UI widgets and outputs), hides the Python code and makes it uneditable. So far I’ve been using the Panel library for building interactive web apps, and I think Marimo may allow an easier, more natural implementation. This alone can be a reason to use it, at least for projects whose main output is a web app (such as dashboards and other interactive views of data).
Some general notes from trying the tutorial:
- Keyboard shortcuts are different. For example, in Jupyter I often use the Esc+a and Esc+b shortcuts to create a new cell above or below the current cell. In Marimo it's Command+Shift+o and Command+Shift+p, and they work from within the active edit of the cell, in contrast to Jupyter. Confusing.
- The tab completion I’m so used to and rely on in Jupyter works differently. The tab key is only used for indentation as far as I could see. Completion of object attributes (options that come after writing a '.' after an object's name) happens automatically but with a significant delay. In Jupyter they appear after pressing 'Tab', which is more convenient because I control when they appear. When the context menu appeared I could only use the down arrow key to navigate the menu items, but that may be a local problem.
- One of the most helpful features of tab completion in Jupyter (for me) is file path completion. This, unfortunately, didn’t seem to work in Marimo.
- The shift-tab functionality of Jupyter, showing the function definition when the cursor is inside the function call brackets, is a must for me. In Marimo, similar to the tab completion, it is automatic. Function docstrings and attributes appear automatically when the cursor is at the end of a command, or when hovering over a keyword with the mouse. Slightly different from Jupyter, maybe even better. I think I can get used to it.
- matplotlib plots work quite well out of the box, although the figure size on the screen is different, so if you need identical results you might need to change the figsize parameter.
- The “hide code” functionality (command+h) is an easy (and great!) way to make the notebook clearer for occasional readers and my future self.
- The screen appeared to be narrow and smallish, with wide margins that didn’t use all my screen space. This can easily be solved by setting “full width” to on in the settings menu (it solves the matplotlib figure size issue above too!).
Migrating a notebook from Jupyter
After playing a bit with the tutorials, I decided to jump in at the deep end and load a notebook from a recent project I worked on. It started out well, but after running a few cells I got a novel error unique to Marimo:
The variable 'x' was defined by another cell
This does not happen in Jupyter of course, and is a direct result of the reactivity mentioned above. This is actually desired behavior, as it forces me to write unambiguous code. And yet, it felt a bit like my mum just told me I can't watch TV until I clean my room: necessary but annoying. Marimo won't let you define the same variable in more than one cell, so even perfectly ordinary Python code breaks once it is split into cells that assign to the same name.
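A rough way to see what triggers this error is to look for names that more than one cell assigns. The sketch below is a toy check, not Marimo's own logic: the function names and the two example notebooks are hypothetical, and it only considers plain assignments (not imports or function definitions).

```python
import ast
from collections import defaultdict


def assigned_names(source: str) -> set[str]:
    """Names bound by assignment anywhere in one cell's source."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }


def multiply_defined(cells: dict[str, str]) -> dict[str, list[str]]:
    """Map each name assigned in more than one cell to the cells assigning it."""
    owners = defaultdict(list)
    for cell_name, source in cells.items():
        for name in sorted(assigned_names(source)):
            owners[name].append(cell_name)
    return {name: where for name, where in owners.items() if len(where) > 1}


# Typical Jupyter habit: a later cell rebinds x.
jupyter_style = {
    "load": "x = 10",
    "scale": "x = x * 2",
}

# The same logic after giving the second value its own name.
renamed = {
    "load": "x = 10",
    "scale": "x_doubled = x * 2",
}

print(multiply_defined(jupyter_style))  # → {'x': ['load', 'scale']}
print(multiply_defined(renamed))        # → {}
```

Giving the second value its own name is the same fix as adding suffixes to the clashing variables.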
In Jupyter I often plot something by calling the plot command on the last line of a cell. Here it created 2 identical plots, probably because the plot function returns the plot and also draws it on the screen when it is on the last line of the cell. To prevent this duplication I stored the plot result in a variable (plot_a = df.plot()) and called plt.show() to display the plot.
After adding suffixes to some of my variable names to prevent the double declaration error, I discovered that the find-replace command is activated with command+f.
Intermediate conclusion
This concludes my initial experiment, which was just to check that I can get the basic functionality I’m used to with Jupyter. I get a slightly cozy feeling in my gut knowing that the code I produced will run reproducibly regardless of the order in which I executed the cells in the notebook. This is the opposite of the slightly uneasy gut feeling I get after saving a Jupyter notebook with lots of cell twisting and entanglement.
It also feels nice to know I can use git to version control this code, without requiring the special library that helps me get ipynb files to behave nicely with git (nbdev, I encourage you to check it out if you are working with Jupyter!).
I know that I merely scratched the surface of Marimo's extra features in this episode, and it is not fair to conclude the comparison with Jupyter just yet. But what I saw is enough to encourage me to explore further and deeper. Next I’d like to build an interactive web app using Marimo, but this will be in a different post.