The perfect way to run your Jupyter Notebooks in the cloud
Keep your data scientists working on the value-adding tasks, and get past the infrastructure fuss with a click.
Data science has been the golden nugget of recent years. Producing results swiftly, in a matter of days, is what appeals to people in the various business lines. The old days of lengthy quarterly deployment cycles seemed to be gone. Months of analysis and modeling seemed to be banished to ancient history.
Many data science teams are past the experimental phase and have reached the point where the business expects stable, repeatable results and wants to see a return on investment.
Welcome to the challenges of the software development and delivery cycle (version management, team collaboration, documentation, …), architecture, infrastructure and scalability. Early successful data scientists were jacks of all trades, spending almost 80% of their time on setting up servers, scaling them, anticipating compute and storage needs, and trying to keep versions of models aligned. That left only 20% of their time for real data science: building models and interpreting results.
Cloud as part of the solution
The scalability and flexibility of compute and storage have been partially solved by the well-known cloud platforms such as AWS, GCP (Google Cloud Platform) and Azure.
Yet data scientists then need to master one of these platforms, if not all of them, and focus on the infrastructure part of these platforms.
Moreover, these platforms are IaaS (Infrastructure as a Service) or PaaS (Platform as a Service) solutions, not integrated solutions that allow you to kick off your data science project from the model perspective.
What if there was a solution that combined notebooks, infrastructure abstraction and version management in one platform?
Jupyter Notebooks in the Cloud
Notebooks are a common way to develop and document models within data science teams, and Jupyter Notebooks are especially popular. They provide users with a neat interface and are easy to use. Basically, they let users focus on just what they want to do: write code and execute it right away. Nonetheless, there are limitations. Jupyter Notebooks that run on your local machine depend entirely on your computer’s CPU, GPU and RAM. Most laptops are more than enough for the basic tasks you encounter when starting out in data science.
Once you start doing machine learning, especially deep learning, on your local machine, you might hit barriers sooner than you think.
Furthermore, when you work on group projects, you need to find a way to share your Jupyter Notebooks within your team. While using GitHub enables you to share your code, it does require you to know how to use GitHub and it does not enable you to just share a link to give your team members the ability to see and edit your work. As a side note, GitHub also requires all of your team members to remember to commit their newest versions to GitHub, which can easily be forgotten if you have never worked on data science group projects before.
Therefore, your team either needs to know how to work with one of the cloud service providers and GitHub or you will face delayed group projects and very hot laptops.
Saturn Cloud allows you to quickly spin up Jupyter Notebooks in the cloud and scale them according to your needs. Basically, it lets you run your Jupyter Notebook on a VM inside AWS, Azure, or GCP without you having to know how to set up and use these services. It also has a few very nice features that distinguish it from other offerings out there, such as the ability to specify a conda environment, requirements.txt, or Docker image in order to standardize environments across all team members. You can also share your Jupyter Notebooks with the public or with team members using links. This eliminates the need to understand how to work with GitHub for basic data science projects. If you do know how to use GitHub, it still offers a fast and convenient way of testing and developing code with others. As a result, (aspiring) data scientists can focus on data science instead of DevOps and finish their projects more quickly than they would have been able to otherwise.
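To make the idea of a standardized environment a bit more concrete, here is a minimal sketch of what "everyone has the same package versions" means in practice. It is not part of Saturn Cloud: the pinned versions and the helper names (`parse_requirements`, `find_mismatches`) are hypothetical and only illustrate how a shared requirements.txt can be compared against what is actually installed.

```python
from importlib import metadata

# Hypothetical pins, as they might appear in a team's shared requirements.txt.
REQUIREMENTS = """
pandas==1.0.3
scikit-learn==0.22.2
"""


def parse_requirements(text):
    """Return {package: version} for simple 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        if sep:  # ignore lines that are not exact pins
            pins[name.strip().lower()] = version.strip()
    return pins


def _installed_version(name):
    """Look up the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, installed_version=_installed_version):
    """Return {package: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # The result depends on the machine this runs on.
    print(find_mismatches(parse_requirements(REQUIREMENTS)))
```

An empty result means the local environment matches the pins. Anything else is exactly the kind of drift between team members that a shared environment definition is meant to prevent.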
Besides that, Saturn Cloud enables you to deploy a Spark or Dask cluster with just one click. This makes very expensive computations easier to deal with by putting distributed computing within easy reach. Saturn Cloud also automates version control for you, obviating potential issues arising from team members not committing their newest versions.
Let us quickly show you how to spin up a Jupyter Notebook using Saturn Cloud, to give you a feel for how simple it all is.
First, you name your notebook and define the VM (virtual machine) you would like to use. All you need to do is specify the disk space and RAM. By default, Saturn Cloud automatically terminates your VM after 10 minutes of inactivity. You can, however, change that to whatever time period you would like. Using the advanced options, you can define the environment you would like your team to use. After hitting create, your Jupyter Notebook in the cloud will be launched.
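For readers new to distributed computing, the underlying pattern is simple: split the data into chunks, let several workers process the chunks independently, then combine the partial results. The sketch below illustrates that pattern with a thread pool on a single machine as a stand-in. It does not use Dask, Spark, or any Saturn Cloud API, and the function names are made up for this example. A real cluster would spread the chunks across several machines instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Split a list into n_chunks consecutive pieces of nearly equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The work each worker does on its own piece of the data."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, n_workers=4):
    """Hand the chunks to the workers, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, split(data, n_workers)))
    return sum(partials)


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(1_000_000))))
```

The point of the sketch is the shape of the computation, not its speed: Python threads on one machine do not speed up CPU-bound work like this, whereas workers on separate machines can.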
Once it is up and running, you will see the following:
By clicking on ‘Go To Jupyter Notebook’, you can access your Jupyter Notebook in the cloud and start coding. And that is basically all you need to do for most simple tasks. If you would like to use more advanced options, such as deploying a Spark/Dask cluster or using GPUs for your workloads, you can do so within your existing Jupyter Notebook by clicking on the blue button above and customizing your VM(s) as follows:
Why would you want to add a GPU (Graphics Processing Unit) to your VM? To put it very simply: think of the algorithm you want to train (e.g. a neural network) as a series of mathematical calculations, many of which do not depend on each other. With a GPU, you are essentially doing a large number of those calculations at the same time, whereas with a CPU you would largely do them one after another. That is, in essence, why GPUs are the better choice for expensive computations, especially in machine learning.
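To see why these calculations lend themselves to a GPU, consider one layer of a neural network. The sketch below is plain Python with made-up function names and runs on a CPU. It only demonstrates the property that matters: every neuron's output is computed from the same inputs and none of them needs another neuron's result, so the order of computation is irrelevant.

```python
def neuron_output(weights, inputs, bias):
    """One neuron: a weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def layer_sequential(weight_rows, inputs, biases):
    """CPU-style: compute the neurons one after another."""
    outputs = []
    for weights, bias in zip(weight_rows, biases):
        outputs.append(neuron_output(weights, inputs, bias))
    return outputs


def layer_any_order(weight_rows, inputs, biases, order):
    """Compute the same neurons in an arbitrary order.

    No neuron needs another neuron's result, so the order does not matter.
    That independence is what allows a GPU to work on many of them at once.
    """
    outputs = [None] * len(weight_rows)
    for i in order:
        outputs[i] = neuron_output(weight_rows[i], inputs, biases[i])
    return outputs


if __name__ == "__main__":
    weight_rows = [[1, 2], [3, 4], [5, 6]]
    inputs = [1, 1]
    biases = [0, 1, 2]
    print(layer_sequential(weight_rows, inputs, biases))
    print(layer_any_order(weight_rows, inputs, biases, order=[2, 0, 1]))
```

Both calls produce the same list. A real layer has thousands of such neurons, and a GPU has enough parallel units to evaluate many of them in the same step.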
As you can see, getting started with Jupyter Notebooks in the cloud is very intuitive using Saturn Cloud. Once your notebook is running, you can also easily share it from within the notebook with the public or just your team members.
To see this in action, you can visit the notebook visualizing data scientists’ compensation:
View the “Data Scientists Compensation” notebook on Saturn Cloud
DevOps can be really difficult when you are trying to get data science group projects off the ground. Hosting Jupyter Notebooks with Saturn Cloud, which also takes care of versioning and lets you scale in or out as needed, can tremendously simplify your life, shorten your time to market, and reduce both cost and the need for expert cloud skills.
Our point of view: the ability to quickly and easily share notebooks is just amazing.