Python in the cloud changed how I do science and it can do the same for you

Jordan Landers
Published in CyberPaleo
4 min read · Jan 31, 2023

Full disclosure: I am an active Python user. Python has become the default language for analysis and visualization in the climate modeling community, and its adoption is still growing. There is a growing array of tools specifically designed for analyzing climate data (including paleoclimate!), and generous support for working with many types and sizes of datasets. If you’re curious about “Why Python?” check out this article from Nature.

The olden days when one could do cutting-edge paleo-science in Excel are long gone. Increasingly, the analyses required demand a programming language like R, Python, or Matlab. To each their own, but I am partial to Python (as disclosed above!). In addition, the most exciting paleo-science is happening at the interfaces between data types, or where observations are fused with the output of numerical models. If you’ve ever dealt with these climate model behemoths, you know that they spit out data in gargantuan amounts, which would burn a spreadsheet to a crisp. Programming languages like Python handle those volumes much more gracefully, but the barrier to entry into a code-driven analysis ecosystem, and the prospect of wrangling those large data files, can trigger feelings of vertigo.

I can absolutely relate. Ask 10 people how they set up their computers to write and run code, and you’ll get 11 answers…at least 11 answers. Getting to “the starting line” (as it were) can feel overwhelming, a feeling that is exacerbated by the looming question: will it even be worth it?

Enter: The Cloud.

Doing work in the cloud means logging into some kind of online system (e.g. a JupyterHub, Google’s Colab, Deepnote) and running code on machines housed in a server farm, with data that is accessible via URL or has been uploaded to the system.
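
To make that concrete, here is a minimal sketch of what a first cell in a cloud notebook might look like. The URL is a placeholder for illustration, and I’m assuming pandas is among the pre-installed packages:

    import pandas as pd

    # In a cloud notebook, "loading data" often just means pointing at a URL.
    # This address is a placeholder; substitute a real CSV of your choosing.
    url = "https://example.org/some_proxy_record.csv"
    df = pd.read_csv(url)

    # Quick sanity check on what arrived.
    print(df.head())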

I have historically had an (arguably irrational) attachment to working locally, but with the dawning of 2023, I have been actively reconsidering my nearly decade-old approach, and I am now firmly convinced that I should be doing much less science locally and much more in the cloud.

Here’s my thinking:

  • Ready-made computing ecosystem: The entire coding ecosystem comes ready-made. Python is configured, Jupyter Notebook is set up, and packages are installed (complete with dependencies)! Most new users are likely to find that the pre-installed packages (tools that extend the functionality of a coding language for specific tasks) will cover their analysis and visualization needs. If something is missing, you can either install it yourself in your JupyterHub session (see the first sketch after this list) or contact the administrator of the JupyterHub to request that it be added to the pre-installed packages. One of the great things about working in the cloud is that a hub administrator makes sure the pre-installed packages are all configured to work together smoothly.
  • Node sharing is caring: You can scale compute resources to meet your needs. When you spin up a JupyterHub node, you get to choose its size, so you have the hardware when you need it; when it’s not in use, it returns to the shared pool, a much more efficient way to allocate resources.
  • Keep your local disk clear(er): You don’t have to host data locally on your computer. With the help of resources like the lipdverse and Pangeo-Forge, you can point to locations where data can be accessed and use it without storing gigabyte upon gigabyte of model output on your machine (a second sketch after this list shows what this looks like).
  • Easier collaboration: Working with a common environment improves the odds of smooth interoperability. Not only can packages conflict with each other, but collaborators can run into roadblocks when code is written and run in different environments. With the hub, you can feel confident that the environment you wrote in is the same one your collaborator will read (and rerun/tinker) in. So smooth…
  • Free up CPU: Running code in the cloud won’t slow down other tasks on your computer, and other tasks won’t slow down the code you’re running. (I don’t recommend running code locally while photo-editing software is open, for instance.)
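
On the package point above, installing something extra from inside a notebook session is typically a one-liner. This is a sketch, not hub-specific instructions: the package named here is just an example, and whether a user-level install persists between sessions depends on how your hub is configured.

    # Run in a notebook cell on the hub: the IPython %pip magic installs a
    # package into the environment behind the current kernel.
    # "pyleoclim" is only an example; substitute the package you actually need.
    %pip install pyleoclim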

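On the data point, here is a rough sketch of what “pointing to” cloud-hosted model output can look like with xarray. The store path and variable name are hypothetical, and I’m assuming xarray, zarr, and an appropriate fsspec backend (e.g. gcsfs) are installed; the key idea is that the dataset opens lazily, so the gigabytes stay in the cloud until you actually compute something.

    import xarray as xr

    # Hypothetical path to a Pangeo-Forge-style zarr store in cloud object storage.
    store = "gs://some-bucket/some-climate-model-output.zarr"

    # Opening is lazy: metadata is read, but the data itself stays remote.
    ds = xr.open_dataset(store, engine="zarr")

    # Only this reduction pulls (chunks of) the data over the network.
    # "tas" (near-surface air temperature) is a stand-in variable name.
    global_mean = ds["tas"].mean(dim=("lat", "lon"))
    print(global_mean)
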
Bonus: With our JupyterHub specifically, you get a nice array of demo notebooks and templates to spark inspiration and expedite your efforts!

Jordan Landers
CyberPaleo

Earth Science Graduate Student at the University of Southern California - data science, paleoclimate