Scaling the Po.et Infrastructure
Building a decentralised application that is scalable and quick to turn around is no simple task. Amongst the business objectives of the application, there are various technical objectives that need to be met.
When planning the move to mainnet, there were several core pieces the Po.et team needed to plan for and get in place. The ability to deploy a platform that could address all of the business and technical objectives was paramount to the product we were shipping. Addressing all the business and technical objectives, while providing a great developer experience for the team and reliable service for our users was priority number one.
Beyond that, there were several things we needed Po.et to be; Not only did we need our infrastructure to be fault tolerant (and highly available), but it was also important for it to run efficiently without over provisioning resources. Moving fast and shipping multiple iterations of our code per day (or hour) was key and it was paramount that our infrastructure was nailed down from a security point of view. But more imperative than all of this, we needed to ensure we could move with confidence, and without breaking any functionality of our platform.
It’s no secret that we have a lot of moving parts to our platform, and microservices provide us a way to ship code fast and securely — specifically, we decided to deploy a microservices architecture running on the Docker, orchestrated with Kubernetes (K8s) and Helm.
Operating Kubernetes clusters allows us to run, monitor, and scale each individual component that makes up the Po.et protocol in an elegant manner. This allows the engineering team to focus on writing solid, tested code, with confidence that our continuous integration and continuous delivery pipelines will handle the deployment iterations to our environments.
We run multiple environments of our Kubernetes clusters, which are identical in specification. Kubernetes (commonly stylized as K8s) is an open-source container-orchestration system for automating deployment, scaling and management of containerized applications. Our quality assurance (QA) environment is configured identically to our production environment. To achieve high availability, we run a multi-master, and multi-node K8s cluster, spanning multiple availability zones (AZ). This helps us mitigate downtime should an AZ become unresponsive or suffer severe outages. Anyone who is familiar with the Fallacies of Cloud Computing should be aware that cloud networks are not always reliable. Networks do fail. It’s not a matter of if, but rather when. So designing cloud infrastructure with failure in mind is imperative in today’s world, especially when tolerable up-time is expected to be no less than 99.999%.
All of Po.et’s Kubernetes nodes in private subnets, inside a Virtual Private Cloud (VPC) on Amazon Web Services. We then have public/utility subnets which are in the public zone. This subnet allocation allows us to expose certain services publicly to the world, such as the Po.et API. We can also secure and manage all the internal services inside the VPC in an easier manner.
Our microservices are run on K8s nodes, and in some cases we provision single-tenanted nodes in the cluster for services that run sensitive workloads, such as Hashicorp’s Vault. Kubernetes provides a great mechanism for this, using taints, tolerations, and node affinity. Implementing these constraints allows us to restrict to which nodes certain pods can be scheduled.
Po.et chose Helm to manage our Kubernetes applications and deployments. Helm charts help us define, install, and upgrade even the most complex K8s application. The biggest benefit of using Helm is the templating aspect of Helm charts. We can define one chart for a service, and then parse in varying configurations tailored for each environment. This allows us to keep the parity between services deployed to multiple environments even more tightly bound, with very minimal deviation between them.
By keeping the gap between development and production small we can achieve seamless continuous deployment of our services.
- Making the time gap small: a developer may write code and have it deployed hours or even just minutes later.
- Making the personnel gap small: developers who wrote code are closely involved in deploying it and watching its behaviour in production.
- Making the tools gap small: keep development and production as similar as possible.
Our QA and PROD environments seldom drift far apart in terms of versions of software running:
$ helm list
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
frost-api 129 Tue Dec 4 17:41:19 2018 DEPLOYED poet-frost-api-1.2.0 1.24.8 default
node-regtest 6 Mon Dec 10 21:55:55 2018 DEPLOYED poet-node-1.2.4 2.14.1 default
node-testnet 95 Mon Dec 10 21:56:30 2018 DEPLOYED poet-node-1.2.4 2.14.1 default
$ helm list
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
frost-api 7 Thu Nov 29 15:46:58 2018 DEPLOYED poet-frost-api-1.2.0 1.24.8 default
node-mainnet 21 Mon Dec 10 21:55:22 2018 DEPLOYED poet-node-1.2.4 2.14.1 default
node-testnet 21 Mon Dec 10 21:56:24 2018 DEPLOYED poet-node-1.2.4 2.14.1 default
The Twelve-Factor App
The twelve-factor app is a methodology for building software-as-a-service apps. The fundamental concepts of the 12Factor are:
- Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
- Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
- Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
- Minimize divergence between development and production, enabling continuous deployment for maximum agility;
- And can scale up without significant changes to tooling, architecture, or development practices.
To achieve this, the 12Factor App is divided into 12 Factors (Chapters):
- Codebase — One codebase tracked in revision control, many deploys
- Dependencies — Explicitly declare and isolate dependencies
- Config — Store config in the environment
- Backing services — Treat backing services as attached resources
- Build, release, run — Strictly separate build and run stages
- Processes — Execute the app as one or more stateless processes
- Port binding — Export services via port binding
- Concurrency — Scale out via the process model
- Disposability — Maximize robustness with fast startup and graceful shutdown
- Dev/prod parity — Keep development, staging, and production as similar as possible
- Logs — Treat logs as event streams
- Admin processes — Run admin/management tasks as one-off processes
The Po.et engineering team rigorously implements the Twelve-Factor App. By extracting configuration from the code base and storing it in the environment, we can can run multiple variations of the same code, with different configuration. We are almost completely 12Factor compliant, barring some statefullness that exists in the Poet node. There is always room for improvement, and this will be something that we will get right in the near future.
We have a very a modest Kubernetes cluster running our production work loads. All of our services (including our Network Operations Services such as Prometheus, Grafana, Fluentd) run on our cluster nodes. We partitioned nodes into various instance groups, (Core Services, Monitoring, Vault, etc), and have tailored each instance group to meet the requirements of the pods that it will ultimately be running. This allows us to configure the compute resources more effectively depending on what services are going to run on them. Some services are more CPU intensive, some are more memory hungry. By separating the instance groups in this manner, we can determine the best machine specs for the workloads.
All instances are deployed using auto-scaling groups and launch configurations, with a set minimum and maximum capacity. This simple configuration maintains our cluster fleet at a desired capacity, allowing room for scale out events to occur at the compute layer.
In conjunction with horizontal pod autoscaling (HPA) we have the ability to scale out the number of pods in a replication controller, deployment or replica set based on CPU and memory utilization. This will ensure that we utilize the existing compute resources available before needing to scale out the compute layer.
The Cluster Autoscaler is a standalone program that adjusts the size of a Kubernetes cluster to meet the current needs. Cluster Autoscaler increases the size of the cluster when there are pods that failed to schedule on any of the current nodes due to insufficient resources.
Horizontal pod auto-scaling combined the Cluster Autoscaler handles our scale out requirements very well for now, allowing us to run our cluster efficiently and scale out/in autonomously to meet the resource utilization requirements of our services. In other words, our services will scale out automatically to handle more requests, and our cluster size will scale out providing more compute resources, essentially giving us unlimited capabilities in terms of scaling.
The Network Operations stack Po.et runs on is comprised mostly of Open Source projects such Prometheus, Grafana, Fluentd and Elasticsearch.
Prometheus is an open-source systems monitoring and alerting toolkit. It is extremely extendable, and you can configure it to scrap pretty much any metrics you want with only a small amount of initial effort. The concept is to export your metrics via an endpoint, and then configure Prometheus jobs to scrape these endpoints for the metrics you are exporting. Using this approach, we export node metrics (CPU, Disk Usage, Memory etc), Kubernetes related metrics (Cluster wide, as well service / pod specific). All our metrics are then graphed in Grafana, to provide visualisation of the vast amount of data we have. We have dashboards for literally every component of our infrastructure, from up-time metrics, EC2 instance metrics, to RabbitMQ message queue metrics.
It is of high importance to base decisions around our platform and infrastructure on relevant metrics. With quantifiable metrics driving a lot of the decision making, we can monitor and track our improvements going forward.
For our log aggregation and analysis, we using the EFK stack, which is Elasticsearch, Fluentd and Kibana. Fluentd performs a crucial part of logging in a container based deployment. Our applications generally log to stdout, and Fluentd collects this data, and ships it off to our Elasticsearch instances for ingestion and analysis. Kibana is a great tool for visualising data gathered from our logs. We specifically use Kibana for visualisation our performance by analysing response codes in our logs. We also use it of SIEM events in our AWS account, to account for all API calls made to our AWS accounts. We can view all operations that were made, and implement threat intelligence by monitoring the types of events, the locations these requests were from, and the frequency of these events.
Ahead of mainnet, we were mindful of ensuring the Po.et protocol was put through its paces; this means we were sure to put our cluster through a rigorous load testing exercise prior to the launch. We were able to put together together a test environment that would be as close to what we would expect other node operators to run. This allowed us to benchmark our platform without any scale out events occurring to identify what our baseline utilisation will be. Load testing in a statically defined set of services allowed us to also identify shortcomings of some of our services’ ability to scale. With these quantifiable metrics on hand we optimised our code base and infrastructure to reduce some of these limitations.
However, when our load testing yielded an overall success in the current infrastructure, we were able to verify that we could create 1,000 user accounts concurrently on the infrastructure with no scaling enabled.
Running a load test on retrieving works through the API also proved to handle 1,000 concurrent users retrieving works, with 0% failed requests over the 5 minute test period. We can also comfortably handle POST’ing 7,000 works per hour on infrastructure without the need for scaling out.
The results of our testing bode well for the future-proofing and scalability of our infrastructure. We can simply enable HPA and Cluster Auto-scaling to meet our needs accordingly, without any human intervention, allowing our engineering team to focus on developing new features and shipping code more frequently.
We are pleased with the current performance of the infrastructure and are confident that we will be able to scale out to handle the anticipated workload as the Po.et Network grows. All of this is able to be achieved while still being able to ship fast and deliver results — important and key to our mission. We invite you to keep hashing to mainnet and stay tuned for what our team has in store.
- Wesley Charles Blake | Sr DevOps @Po.et