Introduction to K8s at D11s (w/o yaml)
At Descartes Labs we currently ingest about 25k images every day; that is, we download, recompress and store (in Google Cloud Storage) about 10 TB of raw satellite data daily. This allows us to provide a single, unified interface to many different datasets including:
- Landsat (4, 5, 7, 8)
- MODIS (Terra, Aqua)
- Sentinel (1, 2, 3)
[Figure: interactive plot.ly graph, "Count processed (23747790) from 08/09/2015 to 08/09/2017…"]
As of 2017-08-09 we have an archive of 23,747,790 images, representing 811.7 TB of compressed jp2 or approximately 8 PB of uncompressed data. Compression varies by data product and individual scene, but we get about a factor of 10 on average. This data is stored in Google Cloud Storage and accessed internally using our FUSE implementation, with which we have demonstrated an “…aggregate read bandwidth of 230 gigabytes per second using 512 Google Compute Engine (GCE) nodes” (https://arxiv.org/abs/1702.03935).
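As a quick sanity check on those figures, the arithmetic can be reproduced in a few lines. Decimal units (1 TB = 10^12 bytes) are an assumption here, and the per-image size is only a derived average, not a number from our archive statistics.

```python
# Figures quoted in the text (as of 2017-08-09); decimal units are assumed.
TB = 10**12
PB = 10**15

image_count = 23_747_790
compressed_bytes = 811.7 * TB
uncompressed_bytes = 8 * PB  # "approximately 8 PB"

# Ratio of uncompressed to compressed archive size.
compression_factor = uncompressed_bytes / compressed_bytes

# Derived average size of one compressed image, in megabytes.
mean_compressed_mb = compressed_bytes / image_count / 10**6

print(f"compression factor: {compression_factor:.1f}x")
print(f"mean compressed image size: {mean_compressed_mb:.1f} MB")
```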
OK, great! We have all this data, now how do we access it? Well, Google Cloud Storage is essentially an infinitely scalable key/value store, so if you know the exact key, you can go right to the data. So, we just need some way of finding the keys for the images we want to retrieve. To solve this problem we will introduce the notion of a “sidecar” service: one that does not store the actual data, just the metadata.
This metadata will include both spatial, “Where is it?” and temporal information, “When was it acquired? When was it published? When did we process it?”. Further, since we want to support a mixture of public and private data, access control is an important feature. Finally, since we can’t be sure a priori what attributes will be available for all products, we want the system to be easily extensible.
PostGIS is ideally suited for this problem. We will make one small table for our imagery products (with a few tens of entries) and then one large table for all the metadata (with tens of millions of rows). We can use a GiST index on our geometry field for efficient spatial queries and regular B-Tree indexes on our temporal fields; the acquired, published and processed dates.
Additionally, we can create a “groups” array column (for access control) with a GIN index and a “data” JSONB column as the catch-all field (for any remaining unstructured data). A complete reference implementation of this schema (as a Django application) is available here. “Runcible” is a simple, yet effective solution for spatio-temporal indexed search, which we have used internally at Descartes Labs for ~18 months, serving over 100 million API requests in that time.
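To make the shape of a metadata row concrete, here is a minimal in-memory sketch in plain Python. It is not the Runcible schema: the field names, the bounding-box stand-in for a real geometry, and the `matches` helper are all assumptions for illustration. In the real system PostGIS answers these filters using the GiST, B-Tree and GIN indexes described above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Set, Tuple

# (min_x, min_y, max_x, max_y): a bounding box standing in for a PostGIS geometry.
BBox = Tuple[float, float, float, float]


@dataclass
class ImageMetadata:
    """Hypothetical metadata row; field names are illustrative only."""
    key: str                 # object key in Google Cloud Storage
    product: str             # reference into the small products table
    geometry: BBox           # would be a GiST-indexed geometry column
    acquired: datetime       # B-Tree indexed in the database
    published: datetime      # B-Tree indexed in the database
    processed: datetime      # B-Tree indexed in the database
    groups: List[str] = field(default_factory=list)      # GIN-indexed array
    data: Dict[str, Any] = field(default_factory=dict)   # JSONB catch-all


def intersects(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matches(row: ImageMetadata, area: BBox, start: datetime,
            end: datetime, user_groups: Set[str]) -> bool:
    """Spatial, temporal and access-control filter for a single row."""
    in_area = intersects(row.geometry, area)
    in_time = start <= row.acquired < end
    allowed = bool(user_groups.intersection(row.groups))
    return in_area and in_time and allowed
```

A row is returned only if it overlaps the query area, was acquired in the requested window, and shares at least one group with the caller.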
Django is a high-level Python web application framework. In this case, it provides a mechanism for mapping from Python objects to database relations and operations (via filters) as well as a URL routing mechanism to provide “JSON over HTTP” web services. Think “RESTful”, but without the higher level of rigor with respect to resource definitions. Note that we also bend the definitions of the HTTP verbs. For example, we sometimes use a POST where a GET request would be more appropriate, so that we can send arbitrary geometries over the network to be used in spatial queries.
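The reason for bending the verbs is practical: a detailed polygon does not fit comfortably in a query string, but it fits fine in a request body. The sketch below builds (but does not send) such a request with the standard library. The `/search` path and the payload field names are assumptions for illustration, not the documented Runcible API.

```python
import json
from urllib.request import Request


def build_search_request(base_url: str, geometry: dict, product: str) -> Request:
    """Build a POST request carrying a GeoJSON geometry in the body.

    The endpoint path and payload keys are hypothetical.
    """
    body = json.dumps({"product": product, "geometry": geometry}).encode("utf-8")
    return Request(
        url=f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A GeoJSON polygon; real query shapes can have thousands of vertices.
polygon = {
    "type": "Polygon",
    "coordinates": [[[-106.0, 35.0], [-105.0, 35.0], [-105.0, 36.0],
                     [-106.0, 36.0], [-106.0, 35.0]]],
}
request = build_search_request("http://127.0.0.1:8000", polygon, "landsat")
print(request.get_method(), request.full_url, len(request.data), "bytes")
```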
Runcible provides three basic operations over our metadata. The “summary” operation allows you to discover what is available for a particular spatio-temporal query before you actually retrieve the data. “Search” performs the actual search (with paging) and “get” returns the metadata for a particular key.
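The three operations can be sketched over a plain list of records. This is only an illustration of the semantics (count first, then page, then fetch by key); the function signatures, the record layout and the shape of the summary result are assumptions, and the real service delegates all of this to database queries.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def summary(records: List[Record], matches: Predicate) -> Dict[str, Any]:
    """Report what is available for a query without returning the rows."""
    hits = [r for r in records if matches(r)]
    return {"count": len(hits),
            "products": dict(Counter(r["product"] for r in hits))}


def search(records: List[Record], matches: Predicate,
           offset: int = 0, limit: int = 100) -> List[Record]:
    """Return one page of matching rows."""
    hits = [r for r in records if matches(r)]
    return hits[offset:offset + limit]


def get(records: List[Record], key: str) -> Optional[Record]:
    """Return the metadata for a single key, or None if it is unknown."""
    return next((r for r in records if r["key"] == key), None)
```

A client would typically call `summary` to learn the total count, then call `search` with increasing offsets until that many rows have been fetched.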
Runcible is a simple WSGI (web server gateway interface) application that can be easily containerized using Docker and run using Kubernetes. One overarching design goal is to make the application as simple as possible (but no simpler). In particular, authentication, caching and load balancing will be off-loaded to other parts of the system.
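For readers who have not met WSGI before, the whole interface is a single callable that takes the request environment and a `start_response` function. The sketch below is a toy application, not Runcible itself (which is a Django project); the response payload is made up for illustration.

```python
import json


def application(environ, start_response):
    """Minimal WSGI callable: route on PATH_INFO and return JSON bytes."""
    if environ.get("PATH_INFO") == "/products":
        status, payload = "200 OK", {"products": []}  # placeholder payload
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI responses are iterables of byte strings.
    return [body]
```

Because the application holds no authentication, caching or load-balancing logic, any WSGI server can host it; with gunicorn, a module named `app.py` containing this function could be served with `gunicorn app:application`.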
In order to follow along at home, you will need to create a Kubernetes cluster somewhere. We use (and can recommend) the Google Cloud Platform, which makes it point-and-click easy (you might even be able to do it for free). You will also need the kubectl utility (most easily installed using the Google Cloud SDK).
In order to deploy our service we first build a container. The container is configured to run the application using gunicorn, a python based WSGI server. You can tag the container during or after the build process. Here I am tagging the container using a public Docker Hub tag during the build process. Just change the tag to reference your own container registry.
$ docker build -t aliasmrchips/runcible:latest .
At this point you can test the container locally using Docker. Note that the service uses dj-database-url to externalize the database settings so we need to pass that in as an environment variable in addition to exposing the service port.
$ docker run --publish 8000:8000 --env="DATABASE_URL=..." aliasmrchips/runcible:latest
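The idea behind externalizing the settings is that a single URL carries everything Django needs to connect. The function below is a rough standard-library approximation of that mapping, not the dj-database-url implementation; the example URL, credentials and host are made up.

```python
from urllib.parse import unquote, urlsplit

# URL scheme -> Django database backend (a small, illustrative subset).
ENGINES = {
    "postgres": "django.db.backends.postgresql",
    "postgis": "django.contrib.gis.db.backends.postgis",
}


def parse_database_url(url: str) -> dict:
    """Turn a DATABASE_URL string into a Django-style settings dict."""
    parts = urlsplit(url)
    return {
        "ENGINE": ENGINES[parts.scheme],
        "NAME": unquote(parts.path.lstrip("/")),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": parts.port or "",
    }


# Hypothetical connection string, as it might be passed via --env.
settings = parse_database_url("postgis://runcible:secret@db.example.com:5432/metadata")
print(settings["ENGINE"], settings["HOST"], settings["PORT"])
```

Keeping the connection string in the environment means the same image can run locally and on the cluster with only the variable changed.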
To test, a simple curl http://127.0.0.1:8000/products should suffice.
In order to run on our cluster, we need to first push the container to a registry (Docker Hub or Google Container Registry, for example).
$ docker push aliasmrchips/runcible:latest
Finally, we can run the container using Kubernetes. The DATABASE_URL can be the same as above (depending on where your database is hosted). If you are using Google Cloud SQL (also recommended), you can use the excellent GCP SQL Proxy Helm Chart to create a Kubernetes service for the database connection. Note that these commands will work as is (even if you didn’t build your own container), since the referenced image is actually available on Docker Hub. The service won’t be functional until you create a database, though.
$ kubectl run runcible --image=aliasmrchips/runcible:latest --env="DATABASE_URL=..." --port=8000
$ kubectl expose deployment runcible --port=80 --target-port=8000 --type=LoadBalancer
$ kubectl autoscale deployment runcible --min=2 --max=20
The first command will create a deployment on the Kubernetes cluster. By default the deployment will create a replica set with a single pod using the specified image. The second command will create a service on the cluster that can be used to route requests to the application running on the deployment’s pods. The type=LoadBalancer argument directs the cluster to create an external IP address.
This is different from what we actually do in practice at Descartes Labs; we use type=ClusterIP (which is the default) and route requests internal to the cluster from our gateway proxy infrastructure. Finally, we could manually scale the deployment to create more pods (or release resources). However, we don’t want to do this by hand, so instead we create a horizontal pod autoscaler (HPA) using the autoscale command that will create and delete pods based on load.
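The autoscaler's core decision follows the rule given in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured bounds. Here is a small sketch of that rule. The 80% CPU target is an assumption (the autoscale command above does not set one explicitly), and the real controller has more logic than this, such as stabilization windows and handling of pods that are not ready.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float = 80.0,  # assumed target, in percent
                     min_replicas: int = 2,
                     max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale in proportion to metric / target."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the --min / --max bounds given to the autoscale command.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, 200.0))   # heavy load on the minimum configuration
print(desired_replicas(20, 10.0))   # load has gone away
```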
To test the scaling of our system, we will create an example “User” that will leverage the Descartes Labs Python client, configured to use the metadata service we set up. This artificial user will select a random shape, date range and product, get the summary (i.e., how many results are available) and then page through the results (100 at a time); wait 1 to 10 seconds and then repeat. The load test itself will be deployed to the same cluster (or preferably a secondary test cluster) and run in the same way as the service. Here, however, we will manually scale the load generator to create ~1000 such users and then run this configuration for an hour or so.
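One iteration of that artificial user looks roughly like the sketch below. The `FakeMetadataClient` class is a hypothetical stand-in so the loop can run without a service; it is not the Descartes Labs client API, and the shape, date and product choices are placeholders.

```python
import random
from typing import Callable, List

SHAPES = ["shape-a", "shape-b", "shape-c"]           # placeholder query shapes
DATE_RANGES = [("2016-01-01", "2016-07-01"), ("2017-01-01", "2017-07-01")]
PRODUCTS = ["landsat", "modis", "sentinel"]


class FakeMetadataClient:
    """Hypothetical stand-in for a metadata client; counts the calls it receives."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.calls = 0

    def summary(self, shape, dates, product) -> int:
        self.calls += 1
        return self.total

    def search(self, shape, dates, product, offset: int, limit: int) -> List[int]:
        self.calls += 1
        return list(range(offset, min(offset + limit, self.total)))


def run_user_once(client, rng: random.Random,
                  sleep: Callable[[float], None], page_size: int = 100) -> int:
    """Pick a random query, get the summary, page through results, then wait."""
    query = (rng.choice(SHAPES), rng.choice(DATE_RANGES), rng.choice(PRODUCTS))
    count = client.summary(*query)
    fetched = 0
    while fetched < count:
        page = client.search(*query, offset=fetched, limit=page_size)
        if not page:
            break  # the service returned fewer rows than the summary promised
        fetched += len(page)
    sleep(rng.uniform(1, 10))  # think time of 1 to 10 seconds before repeating
    return fetched
```

In a real load generator `sleep` would be `time.sleep` and the function would run in a loop; passing it in as a parameter keeps the sketch quick to exercise.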
This load results in 20 runcible pods (our max) servicing ~4000 summaries per minute and ~2000 searches per minute (not all random input combinations return actual results). On top of that, we can create a “burst” user that does 1000 random requests with no delay. This takes ~1 minute to complete, under load or not. When the load test is complete, the cluster will scale back down to our minimum configuration of just two pods.
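To put those rates in per-pod terms, a little arithmetic helps. This treats each summary and each search as one request and assumes load is spread evenly across pods, which is a simplification.

```python
# Figures quoted in the text.
pods = 20
users = 1000
summaries_per_min = 4000
searches_per_min = 2000
burst_requests = 1000
burst_seconds = 60  # "~1 minute"

requests_per_min = summaries_per_min + searches_per_min

# Average steady-state load on each pod, in requests per second.
per_pod_per_sec = requests_per_min / pods / 60

# How often each simulated user completes a query cycle.
cycles_per_user_per_min = summaries_per_min / users

# Throughput seen by the single "burst" user.
burst_per_sec = burst_requests / burst_seconds

print(f"{per_pod_per_sec:.1f} requests/s per pod")
print(f"{cycles_per_user_per_min:.1f} query cycles per user per minute")
print(f"{burst_per_sec:.1f} requests/s for the burst user")
```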
To clean up, just delete the deployment (and the service)… or, if you are done with Kubernetes, just delete the cluster (how cool is that?). You can always make a new one when you need it.
This is a written version of a talk from FOSS4G 2017. Here are the slides…