Deploying a cloud-only development environment

sptkl
Published in NYC Planning Tech
3 min read · Mar 18, 2020

Data Engineering at the NYC Department of City Planning has been cloud-forward from the beginning. We were one of the first teams to run our data production pipelines on DigitalOcean, which provides convenience and speed. However, the code base for these data pipelines was developed locally. Every time we pushed new code, we had to apply small tweaks here and there to make sure everything ran. This process quickly became tedious, and with large datasets it could also be time consuming.

We therefore decided to move to a cloud-only development environment, which keeps our development environment consistent with our production environment. Going cloud-only also gives us high networking performance, which greatly reduces data I/O time and allows faster ETL development iteration.

Our first interaction with a cloud-only development environment was through the VSCode remote development extension. Even though this extension is still in the preview phase, it has proved to be the single most useful extension for Data Engineering. The extension is easy to install, and we only need a simple SSH config to connect to different servers. Instead of accessing servers through a terminal and applying fixes with nano, we get a seamless development interface in VSCode that makes editing and running code a much more enjoyable experience.
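As a rough illustration of what such an SSH config can look like, here is a minimal Python sketch that renders one Host entry for ~/.ssh/config. The alias, IP address, user, and key path are hypothetical placeholders, not our actual servers; Host, HostName, User, and IdentityFile are standard ssh_config keywords.

```python
def ssh_config_entry(alias: str, hostname: str, user: str, identity_file: str) -> str:
    """Render one Host block in the format used by ~/.ssh/config."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only.
entry = ssh_config_entry(
    alias="de-dev",
    hostname="203.0.113.10",  # placeholder address from the documentation range
    user="coder",
    identity_file="~/.ssh/id_rsa",
)
print(entry)
```

Once an entry like this is in the SSH config file, the alias should appear as a connection target in the remote development extension.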

Remote Development Extension

The VSCode remote extension quickly became insufficient as our team grew and as Data Engineering started to transfer data products to PC-based teams in the Agency. It was clear that we had to remove every local dependency, including VSCode, and that we needed a solution that works anywhere without any prior configuration. This is where we came across code-server, built and maintained by Coder, a company that provides cloud IDE as a service and distributes open source docker images for their software components.

Check out the code-server GitHub repo

Data Engineering mostly runs on a PostgreSQL + Python stack, so having a browser-based GUI for SQL and Python is just as important as having an editor. For this purpose, we rely on the Jupyter docker-stacks as a notebook environment and pgAdmin 4 for database management. pgAdmin 4 is our go-to GUI for Postgres because it has mapping support for spatial data.

All these web applications are containerized and can be easily deployed using docker-compose. Note that we mount the docker socket into the containers to get docker-in-docker functionality, which gives users full docker access through the Jupyter terminal and the code-server terminal.
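To make the socket mount concrete, here is a minimal sketch (not our actual compose file) that builds a compose definition as a Python dictionary and prints it as JSON, which docker-compose can read because JSON is valid YAML. The image names are the public upstream images standing in for our custom ones, and the host ports and pgAdmin credentials are placeholder assumptions.

```python
import json

# Path of the Docker daemon socket on a typical Linux host.
DOCKER_SOCKET = "/var/run/docker.sock"


def build_compose() -> dict:
    """Build a compose definition with the docker socket mounted into the IDE containers."""
    socket_mount = f"{DOCKER_SOCKET}:{DOCKER_SOCKET}"
    return {
        "version": "3",
        "services": {
            "code-server": {
                "image": "codercom/code-server",  # stand-in for a custom image
                "ports": ["8080:8080"],
                "volumes": [socket_mount],
            },
            "jupyter": {
                "image": "jupyter/base-notebook",  # stand-in for a custom image
                "ports": ["8888:8888"],
                "volumes": [socket_mount],
            },
            "pgadmin": {
                "image": "dpage/pgadmin4",
                "ports": ["5050:80"],  # host port 5050 is an arbitrary choice
                "environment": {
                    # Placeholder credentials; replace before any real use.
                    "PGADMIN_DEFAULT_EMAIL": "user@example.com",
                    "PGADMIN_DEFAULT_PASSWORD": "change-me",
                },
            },
        },
    }


compose = build_compose()
compose_json = json.dumps(compose, indent=2)
print(compose_json)
```

The socket mount alone only exposes the host's docker daemon. The container also needs a docker client installed before the terminal can actually run docker commands against it.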

We use our own custom images for code-server and Jupyter because we want to streamline the installation of some dependencies crucial to our development, such as docker-cli, psql-client-common, postgis, gdal, and ipyleaflet.

Since we want users to be able to edit and develop the same set of files in Jupyter, code-server and VSCode remote development, we created volumes for the project folder in code-server (/home/coder/project) and the work folder in Jupyter (/home/jovyan/work). Both are mapped to the /home/<username>/project folder on the bare-metal server.

This ensures that any edit made in /home/$USERNAME/project on the server shows up directly in the project folder in code-server and the work folder in Jupyter, and vice versa.
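The shared-folder idea can be sketched in a few lines of Python. The function below is a hypothetical helper, not part of our deployment, and the username is a placeholder; it only shows that both containers bind the same host directory, using the host:container form that docker-compose expects in a volumes list.

```python
def shared_project_mounts(username: str) -> dict:
    """Map one host project folder into both the code-server and Jupyter containers."""
    host_dir = f"/home/{username}/project"
    return {
        "code-server": f"{host_dir}:/home/coder/project",
        "jupyter": f"{host_dir}:/home/jovyan/work",
    }


# "alice" is a placeholder username.
mounts = shared_project_mounts("alice")
for service, mount in mounts.items():
    print(f"{service}: {mount}")
```

Because these are bind mounts of a single directory rather than copies, nothing is synchronized in the background: each application reads and writes the same files on the server's disk.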

Using nginx as a reverse proxy and certbot for SSL certificates, we can easily serve code-server, Jupyter and pgAdmin over HTTPS. Note that Jupyter and code-server need special nginx configuration because their terminals communicate over WebSocket.
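For reference, the WebSocket part of such a configuration usually comes down to forwarding the Upgrade and Connection headers over HTTP/1.1. The Python sketch below renders an nginx server block of that kind. The domain and upstream port are hypothetical, and the certificate paths assume certbot's default /etc/letsencrypt/live/<domain>/ layout; our production configuration may differ.

```python
def nginx_server_block(domain: str, upstream_port: int) -> str:
    """Render an HTTPS server block that proxies to a local app and allows WebSocket upgrades."""
    lines = [
        "server {",
        "    listen 443 ssl;",
        f"    server_name {domain};",
        # Assumes certbot's default certificate location.
        f"    ssl_certificate /etc/letsencrypt/live/{domain}/fullchain.pem;",
        f"    ssl_certificate_key /etc/letsencrypt/live/{domain}/privkey.pem;",
        "",
        "    location / {",
        f"        proxy_pass http://127.0.0.1:{upstream_port};",
        "        proxy_http_version 1.1;",
        "        proxy_set_header Host $host;",
        # These two headers let the WebSocket handshake pass through the proxy.
        "        proxy_set_header Upgrade $http_upgrade;",
        '        proxy_set_header Connection "upgrade";',
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical domain; 8080 matches the port used for code-server in many setups.
config = nginx_server_block("code.example.com", 8080)
print(config)
```

Without the two upgrade headers, the editor and notebook pages can still load while the embedded terminals fail to connect, which is the symptom to look for when this part is missing.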

These cloud-only, browser-based development environments proved to be extremely helpful for onboarding new members. Instead of going through complicated and time-consuming system configuration, we just deploy a dedicated cloud development environment for each member of the team, and they are immediately ready to contribute code that can flow into production without any hiccups.

These applications also help bridge the gap between PC-based teams and cloud-based teams. PC users, who often run into difficult installation issues, can now open these applications in Chrome, create desktop shortcuts, and use them as if they were installed locally.
