Published in Nerd For Tech

You need to be careful about what you print in Jupyter Notebooks

Today, I was working on a dataset in the Jupyter Notebook application.

I was trying to convert the dataset from one format to another. The dataset was relatively large, about 2 gigabytes of text.

While parsing, I carelessly printed the parsed values, which meant printing a significant part of that 2 gigabytes of text.

The result was a notebook file of about 200 megabytes. I guess that because it contained millions of lines of text output, the browser could no longer open the notebook. That meant I could not even copy and paste my code to recover my work.

I didn’t want that. I wanted my work back, so I tried a few things.

Things I Tried

1- Cell -> All Output -> Clear

I tried to clear all cell outputs, and then save the file.

It did not work, even though I managed to press “Clear” and then hit Ctrl+S before the webpage died. I tried this multiple times, without success.

2- Deleting Lines Manually

I knew that Atom handled large text files better than the default text editor, so I tried editing the .ipynb just as I would edit a .txt file, deleting the unnecessary parts.

Atom crashed whenever I scrolled more than a few thousand lines into the .ipynb file. I did not know how many lines it contained, but I checked the difference in file size.

Deleting a few thousand lines reduced the 200 megabyte file by only 0.5 megabytes. So Atom was not a good solution, since it crashed once it had loaded more than about 0.5 megabytes of lines.
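To find out how large the file really is without opening it in an editor, both the size and the line count can be read from Python. This is a minimal sketch; the function name `file_stats` is only illustrative and is not something I used at the time.

```python
import os


def file_stats(path):
    """Return (size in megabytes, number of lines) without loading the whole file."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    line_count = 0
    # Iterate line by line in binary mode so memory use stays small,
    # even for a file with millions of lines.
    with open(path, "rb") as f:
        for _ in f:
            line_count += 1
    return size_mb, line_count
```

Calling `file_stats("notebook.ipynb")` (a placeholder file name) gives a quick idea of how many lines would have to go before an editor can cope with the file.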

3- Deleting Lines with Python

This could probably have been done much better, but I was confident that this approach would solve my case, so I tried the following:

  • First, I created a backup of the corrupted file, in case I did anything wrong.
  • I read the .ipynb file from Python, then wrote only a few lines back.
  • To be clear, I did not open the .ipynb file in Jupyter. I opened it in Python as a readable and writable file to modify it, just as you would open a .txt file.
  • I wrote the first and last lines back to the file. I did not write back the millions of lines in between.
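The steps above can be sketched roughly as follows. This is not the exact script I ran: the function name `keep_head_and_tail`, the `.bak` suffix, and the default counts of 2000 lines are all assumptions chosen for illustration.

```python
import shutil
from collections import deque


def keep_head_and_tail(path, head=2000, tail=2000):
    """Back up `path`, then rewrite it with only its first and last lines."""
    backup_path = path + ".bak"
    shutil.copy2(path, backup_path)  # take the backup before touching anything

    # Stream from the backup so the huge file is never held in memory at once.
    with open(backup_path, "r", encoding="utf-8") as src:
        with open(path, "w", encoding="utf-8") as dst:
            last_lines = deque(maxlen=tail)  # remembers only the newest `tail` lines
            for i, line in enumerate(src):
                if i < head:
                    dst.write(line)
                else:
                    last_lines.append(line)
            dst.writelines(last_lines)
    return backup_path
```

The trimmed file will most likely no longer be valid JSON, so it is meant to be opened in a text editor to copy the code out, not in Jupyter.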

After this, I could open the .ipynb in Atom again, since the file size was reduced to 2 megabytes.

There, I found the relevant cells containing the code, copied the code, and recovered my work.
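If the notebook file is still valid JSON (that is, it has not been trimmed line by line), the code can also be pulled out without any editor. The sketch below assumes the usual notebook layout with a top-level "cells" list, where each cell has a "cell_type" and a "source"; the function name `extract_code_cells` is hypothetical.

```python
import json


def extract_code_cells(notebook_path):
    """Return the source of every code cell as a list of strings."""
    with open(notebook_path, "r", encoding="utf-8") as f:
        notebook = json.load(f)

    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # "source" may be stored as one string or as a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            sources.append(source)
    return sources
```

Joining the returned strings with blank lines and writing them to a .py file gives back the code without any of the printed output.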

This is a basic issue, but it seemed interesting to me. It turns out you need to be careful about what you print in Jupyter Notebooks.
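One way to be careful is to print only a small, truncated sample of the data instead of everything. The helper below is a sketch of that idea; the name `preview` and its default limits are assumptions, not part of my original code.

```python
from itertools import islice


def preview(records, limit=5, max_chars=200):
    """Print at most `limit` records, each cut off after `max_chars` characters."""
    shown = 0
    for record in islice(records, limit):
        text = str(record)
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        print(text)
        shown += 1
    return shown  # number of records actually printed
```

Calling `preview(parsed_values)` inside the parsing loop's cell, where `parsed_values` stands for whatever is being parsed, keeps the cell output to a few short lines no matter how large the dataset is.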

Note: Modifying an .ipynb as you would a .txt file corrupts the .ipynb if you don’t do it carefully. So back up your notebook before you modify it this way.
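A less fragile alternative to line-based editing, which I did not try at the time, is to treat the notebook as JSON and empty the outputs of each code cell. This sketch assumes the file is still valid JSON in the usual layout (code cells carrying "outputs" and "execution_count" fields) and that it fits in memory; the function name `clear_outputs` and the `.bak` suffix are illustrative.

```python
import json
import shutil


def clear_outputs(notebook_path):
    """Back up the notebook, then rewrite it with all code cell outputs removed."""
    backup_path = notebook_path + ".bak"
    shutil.copy2(notebook_path, backup_path)  # keep the original, just in case

    with open(notebook_path, "r", encoding="utf-8") as f:
        notebook = json.load(f)

    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop the printed output
            cell["execution_count"] = None  # written to the file as null

    with open(notebook_path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
    return backup_path
```

Because the structure of the file is preserved, the result should open in Jupyter again with the code intact and the output gone.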

Ahmet Melek

Machine Learning Engineer at Primer AI | Sharing thoughts on computers and other geeky stuff
