Published in Nerd For Tech

You need to be careful about what you print in Jupyter Notebooks

Today, I was working on a dataset in the Jupyter Notebook application.

I was trying to parse the dataset from one format to another. The dataset was relatively large, about 2 Gigabytes of text.

While parsing, I carelessly printed the parsed data values, which resulted in printing a significant part of the 2 Gigabytes of text.

This left me with a notebook 200 Megabytes in size. I guess that since it contained millions of lines of text output, the browser was not able to open the Jupyter Notebook anymore. That meant I could not even copy and paste my code to recover my work.
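One way to avoid this in the first place is to print only a small preview of the parsed values. The snippet below is a minimal sketch, assuming the parsed values come from some iterable; `preview` and `limit` are hypothetical names used for illustration, not something from my original notebook.

```python
from itertools import islice


def preview(values, limit=5):
    """Print at most `limit` items from any iterable.

    Returns the number of items that were actually printed.
    `preview` and `limit` are hypothetical names for this sketch.
    """
    shown = 0
    # islice stops after `limit` items, so a huge generator is never exhausted.
    for value in islice(values, limit):
        print(value)
        shown += 1
    return shown
```

Calling `preview(parsed_values)` instead of `print(parsed_values)` keeps the cell output to a handful of lines, no matter how large the dataset is.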

I didn’t want that. I wanted my work back, so I tried a few things.

Things I Have Tried

1- Cell -> All Output -> Clear

I tried to clear all cell outputs, and then save the file.

It did not work, even though I managed to press “Clear” and then hit Ctrl+S before the webpage died. I tried this multiple times, but there was no hope.

2- Deleting Lines Manually

I knew that Atom handled large text files better than the default text editor, so I tried editing the .ipynb just like we would edit a .txt file, to delete the unnecessary parts.

Atom crashed whenever I scrolled more than a few thousand lines into the .ipynb file. I did not know how many lines it contained, but I checked the difference in file size.

Deleting a few thousand lines removed only 0.5 Megabytes from the 200 Megabyte file. So Atom was not a good solution, since it crashed once it had loaded more than 0.5 Megabytes of lines.
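Instead of guessing from an editor, the file size and line count can be measured without loading the whole file into memory. This is a minimal sketch; `file_stats` and `chunk_size` are hypothetical names, and it simply counts newline bytes while reading the file in chunks.

```python
import os


def file_stats(path, chunk_size=1024 * 1024):
    """Return (size_in_megabytes, newline_count) for the file at `path`.

    The file is read in fixed-size binary chunks, so memory use stays
    small even for very large files. Names here are hypothetical.
    """
    newline_count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            newline_count += chunk.count(b"\n")
    # Decimal megabytes: 1 Megabyte = 1,000,000 bytes.
    size_mb = os.path.getsize(path) / 1_000_000
    return size_mb, newline_count
```

Running this before and after a manual deletion shows how much of the file is still left to clean up.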

3- Deleting Lines with Python

I think this could have been done much better, but I was confident that this approach would solve my case, so I tried the following:

  • First, I created a backup of the corrupted file, in case I did anything wrong.
  • I read the .ipynb file from Python, then wrote only a few lines back.
  • To be clear, I did not open the .ipynb file in Jupyter. I opened it as a readable and writable file in Python to modify it, just like you would open a .txt file in Python.
  • I wrote the first and last lines back to the file. I did not write back the millions of other lines in between.
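The steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not the exact code I ran: `keep_head_and_tail`, `keep_head`, `keep_tail`, the default of 200 lines, and the `.bak` suffix are all hypothetical choices for illustration.

```python
import shutil
from collections import deque


def keep_head_and_tail(path, keep_head=200, keep_tail=200):
    """Keep only the first and last lines of a text file.

    A backup copy is written to `path + ".bak"` before anything is changed.
    All names and defaults are hypothetical choices for this sketch.
    """
    backup_path = path + ".bak"
    shutil.copy2(path, backup_path)  # back up first, in case something goes wrong

    head = []
    tail = deque(maxlen=keep_tail)  # remembers only the most recent lines
    # Stream the backup line by line, so the huge middle part is never held in memory.
    with open(backup_path, "r", encoding="utf-8", errors="replace") as src:
        for index, line in enumerate(src):
            if index < keep_head:
                head.append(line)
            else:
                tail.append(line)

    # Overwrite the original with just the first and last lines.
    with open(path, "w", encoding="utf-8") as dst:
        dst.writelines(head)
        dst.writelines(tail)
    return backup_path
```

The result is usually no longer a valid notebook, since the lines in the middle are cut without regard to the file's structure. It is only meant to make the file small enough to open in a text editor and copy the code out.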

This let me open the .ipynb in Atom again, since the file size was reduced to 2 Megabytes.

There, I found the relevant cells containing the code, copied the code, and my work was recovered.

This is a basic issue, but it seemed interesting to me. Turns out you need to be careful about what you print in Jupyter Notebooks.

Note: Modifying an .ipynb like you would modify a .txt file corrupts the .ipynb if you don’t do it carefully. So make a backup of your notebook before you modify it like a .txt file, if you decide to.
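A more careful alternative relies on the fact that an .ipynb file is a JSON document. The sketch below is not what I did; it assumes the notebook uses the version 4 format, where the top level has a "cells" list and each code cell has "outputs" and "execution_count" fields. `strip_outputs` is a hypothetical name, and loading a 200 Megabyte file this way needs considerably more memory than the file size itself.

```python
import json
import shutil


def strip_outputs(path):
    """Remove all code cell outputs from a notebook, keeping the code itself.

    Assumes the nbformat 4 layout (a top-level "cells" list). A backup is
    written to `path + ".bak"` first. `strip_outputs` is a hypothetical name.
    """
    shutil.copy2(path, path + ".bak")  # back up before modifying anything

    with open(path, "r", encoding="utf-8") as f:
        notebook = json.load(f)

    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop the printed output
            cell["execution_count"] = None  # reset the In [n] counter

    # Write the notebook back as JSON, so the file stays structurally valid.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
    return notebook
```

Because the file is parsed and written back as JSON, the notebook keeps its structure and only the outputs are removed.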

Ahmet Melek

Machine Learning Engineer at Primer AI | Sharing thoughts on computers and other geeky stuff