Rolling in the deep with Deepnote

Kush Ojha · Published in GDSC VIT Vellore · 8 min read · Jan 8, 2022

After changing my operating system to Linux, I found it rather clunky to set up Anaconda and run Jupyter Notebooks. This made me look for a cloud-based notebook platform for my projects. This is where I discovered Deepnote, a notebook that brings teams together to explore, analyze and present data from start to finish.

Deepnote, unlike the better-known JupyterLab, Colab, and Kaggle Notebooks, lets you collaborate on Python notebooks in real time, just as you would in Google Docs. Your coworkers can improve your code or leave comments, helping you communicate better with your team.

You can try out Deepnote for yourself at this link.

Getting started with a Deepnote Project

You can register on Deepnote using your GitHub or Google account. To make a new project, just click the new project button and get started. Each project runs in its own Docker container where you and your collaborators can work together.

Deepnote comes with intuitive templates that help you get up to speed quickly with your analysis, but I will be creating my project from scratch. I am going to try to predict players' overall ratings using the FIFA 20 dataset to showcase the ease of use and power of Deepnote notebooks.

You can also import your existing work from

  • A Jupyter notebook (or .ipynb file) from your system
  • GitHub repositories
  • Files uploaded to Google Drive through Colab.

Setting up your environment

If your project demands modules which are not readily hosted on Deepnote, you can run terminal commands by prefixing them with '!', and Deepnote will give you an option to add those dependencies directly to your requirements.txt file.
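For example, installing a package from a notebook cell looks like this (xgboost is just a stand-in for whatever module you need):

```python
# Prefixing a cell command with '!' runs it in the shell; Deepnote will
# then prompt to pin the package in requirements.txt
# (xgboost is only an example dependency)
!pip install xgboost
```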

You can also add your dependencies to your project's Dockerfile or append them manually to the requirements.txt file, and they will be installed automatically every time you open your project.
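A minimal requirements.txt for this kind of analysis might look like the following (the exact packages and versions here are my assumption, not the project's original file):

```
pandas
scikit-learn
matplotlib
seaborn
```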

The first step after installing and importing essential modules in any data analysis is uploading or connecting to our data. Deepnote comes with nifty integrations for data-storage platforms like Amazon S3, BigQuery, Google Drive, and Snowflake, as well as popular databases like MongoDB and PostgreSQL, and much more. I will be uploading the dataset from my system via drag and drop, but integrating your data is just as easy: a few clicks and you are done.

Reading your data

Instead of the typical pd.read_csv('/path'), we can use SQL to read our .csv file and store the result in a variable at the same time. Deepnote integrates SQL with Python seamlessly in its notebooks. This makes our code more readable, since SQL is much more human-readable and easier to interact with than the equivalent Python. This is done by writing our SQL query in a SQL block, one of many block types Deepnote provides, some of which we will explore later.
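Inside Deepnote, the SQL block does this natively and assigns the query result to a DataFrame variable. Outside Deepnote, you can approximate the same pattern with duckdb (the file name below follows the Kaggle FIFA 20 dataset and is an assumption):

```python
# Rough stand-in for a Deepnote SQL block: query the CSV with SQL and
# get the result back as a pandas DataFrame
import duckdb

df = duckdb.query("""
    SELECT *
    FROM 'players_20.csv'
""").to_df()
```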

We are presented with a beautiful database view showing the distribution of the numerical values in each column.

After selecting some useful columns, we move on to analyzing players based on the various statistics they possess. Deepnote has so-called "input blocks": drop-downs, sliders, text boxes, and more. I will be using a drop-down input to choose a player's country and a slider to set a minimum overall rating, so I can see all players from the chosen country above that rating.
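Each input block binds its value to a plain Python variable, so the filtering itself is one line of pandas (country and min_overall are hypothetical variable names for the drop-down and slider):

```python
# 'country' and 'min_overall' would be set by Deepnote input blocks;
# hard-coded here so the sketch runs standalone
country = "Brazil"
min_overall = 85

# Filter the DataFrame using the input-block values
filtered = df[(df["nationality"] == country) & (df["overall"] > min_overall)]
filtered
```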

I got all Brazilian players with an overall stat over 85. I can choose any country and overall value and get the required DataFrame view for the same without typing or changing anything!

Exploring attributes for our model

Visualization made easy

Deepnote comes with Chart blocks that help you come up with quick visualizations for your data. All you have to do is select the DataFrame you want to analyze, choose your X and Y axes (optionally, a colour as a third attribute) and the type of chart you wish to see, and voila, your data analysis has become much easier.

How many lines of code did these beautiful visualizations take? Zero!

This reduces your work and lets you spend more time exploring your data and less time translating thought to code. I have made many such beautiful visualizations using this feature, along with other assorted charts, which you can check out in the notebook.

Picking attributes for our models

There are over 100 columns (104 to be exact) in the dataset, and surely not all of them influence the overall stat of a player. We must choose the columns that most plausibly influence a player's overall stat. Let's look at all the columns first and choose the ones we wish to use for further analysis and preprocessing. My teammate, Chanakya Vivek Kapoor, knows some tricks to help me with this. Thanks to Deepnote's real-time collaboration feature (example shown below), he can hop right into the notebook and help me out.

Live Collaboration

You and your teammates can work on the same notebook concurrently and leave comments to communicate more effectively, just like Chanakya and I did. All you need to do is add them as a collaborator to your project, either by sharing the link with them or by entering the email they used to log in to Deepnote.

Code embedding

Deepnote lets you share and embed your code blocks in your blog or on your website, provided it allows embedding. Just select the block you wish to share, click the share option, and enable sharing to get a link or HTML code for the embed.

I will embed the correlation matrix for the attributes I chose and try to remove the attributes which do not correlate much with our target value, i.e. overall. I will be adding more code snippets as embeds because they give a clearer picture of how the code works, and you do not have to refer back to the notebook again and again. You can hover over the chart to see tooltips for this heatmap!
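For reference, here is a minimal sketch of such a correlation heatmap ('features' is a hypothetical subset of the chosen attribute columns; the notebook's version is interactive):

```python
# Compute pairwise correlations and draw them as an annotated heatmap
import matplotlib.pyplot as plt
import seaborn as sns

features = ["age", "height_cm", "weight_kg", "pace", "shooting",
            "passing", "dribbling", "overall"]
corr = df[features].corr()

sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
```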

Preprocessing, Training, and Evaluation

Looks like height_cm, weight_kg, weak_foot, and pace do not really contribute to the overall rating of a player. Let's drop them to make it easier for the model to learn and predict overall. This can be done with a simple pandas command.
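The drop itself is a one-liner (column names follow the Kaggle FIFA 20 dataset):

```python
# Remove the weakly correlated attributes before training
df = df.drop(columns=["height_cm", "weight_kg", "weak_foot", "pace"])
```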

Now it is time to divide the data into attributes and target values and create a train-test-split to evaluate our models.
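A sketch of that split, assuming overall is the target column and an 80/20 split (the split ratio is my assumption):

```python
from sklearn.model_selection import train_test_split

# Separate the attributes from the target value
X = df.drop(columns=["overall"])
y = df["overall"]

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```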

Everything is set and we can now give this data to different models and evaluate which one performs the best at predicting the overall stat of a player in FIFA. I will be using Linear Regression, SVM Regressor, and Random Forest to achieve this and evaluate all of them using a simple function to calculate different accuracy metrics for these models.
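Here is a sketch of that comparison, assuming scikit-learn implementations and R² plus mean absolute error as the metrics (the notebook's exact evaluation function may differ):

```python
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error

models = {
    "Linear Regression": LinearRegression(),
    "SVM Regressor": SVR(),
    "Random Forest": RandomForestRegressor(random_state=42),
}

# Fit each model and report metrics on the held-out test set
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(f"{name}: R2 = {r2_score(y_test, preds):.3f}, "
          f"MAE = {mean_absolute_error(y_test, preds):.2f}")
```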

Time to look at the results. May the best algorithm emerge victorious!

Would you look at that: an R² score of over 0.99 for the Random Forest Regressor. Could it be that I am a Machine Learning God who can make highly accurate models from rather few attributes and so little data, or is my model over-fitting? It is probably over-fitting… time to prune those trees! I will decrease the tree depth to 3, which should help reduce the over-fitting.
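Capping the depth is a one-parameter change (max_depth is scikit-learn's knob for this; the rest mirrors the earlier setup):

```python
# Shallower trees generalize better at the cost of some training accuracy
pruned_rf = RandomForestRegressor(max_depth=3, random_state=42)
pruned_rf.fit(X_train, y_train)
print(f"R2 = {r2_score(y_test, pruned_rf.predict(X_test)):.3f}")
```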

Now that looks like some sane accuracy scores! Let us look at one of the trees we made using the Random Forest Regressor.
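scikit-learn can render a single estimator out of the forest, which is roughly what the notebook shows (a sketch, assuming the pruned model from above):

```python
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Pull one decision tree out of the forest and draw it
plt.figure(figsize=(20, 8))
plot_tree(pruned_rf.estimators_[0], feature_names=list(X.columns), filled=True)
plt.show()
```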

Publishing your notebook

Now that we have analyzed the dataset and chosen a model to predict the overall value of a player, it is time to share our findings with our team. You can publish your notebook by appifying it so that our team sees only the juicy bits full of relevant information, not the tedious data cleaning. To publish the notebook, either click the share button at the top right and choose the publishing editor in the drop-down menu, or go directly to the publishing editor by clicking the paper-plane logo in the side panel.

We can choose whether we want our published notebook to have an article layout or a dashboard layout. I prefer the article layout, so let’s go with that.

The right side shows what your published notebook will look like and which blocks it will contain. We do not need to show blocks that contain import statements or do not present any data visualization, so I will remove them by simply clicking the delete button. We can abstract away the code for our custom-made visualizations by hiding the code for those blocks while leaving the outputs as they are.

You can interact with the appified notebook here or you can view the notebook source code using the button below:

Conclusion

Deepnote knocks it out of the park. It is chock-full of features that go above and beyond traditional notebooks. Here are my personal favourites:

  • Real-time collaboration and comments
  • Database integrations
  • SQL code blocks
  • No-code visualizations
  • Ease of setting up your environment with required dependencies

I keep getting pulled back to Deepnote for data analysis. Let me know in the comments what you think of Deepnote after trying it out. Do check out this session from DevJams21 to see a live demo of an end-to-end data science workflow in Deepnote.
