Application delivery to AWS with Berlioz. Introduction. Part 1.

Delivering applications to the cloud is hard. It involves multiple stages: configuring the necessary resources and rules, deploying services, providing the means for services to communicate, monitoring them, and doing all of this continuously as the code and requirements change. Today there are tools that cover each individual stage. But because those stages are not completely independent, a lot of tedious manual work is needed to pass information forward to later stages and dependency details back to earlier ones. This way of working causes a lot of inefficiency in the process and unnecessary tension across teams.

Berlioz looks at the application delivery process not as individual stages, but as a whole. By doing so, it lets services be described in an application-centric way, bypassing a lot of logistics and making the overall process less error-prone. Berlioz delivers and runs applications on AWS. It supports containerized microservices as well as monolithic legacy VM workloads, whether they are stateless or stateful. Berlioz natively provides service discovery and service mesh features directly to services, without proxy agents along the data path. It allows application delivery to one or multiple staging environments, in one or multiple AWS accounts. Also, to improve development team productivity, Berlioz allows applications to be deployed to the local workstation as if they were running in the AWS cloud, even if they use AWS native resources like DynamoDB, Kinesis, etc. All of this is available with one command. No prior provisioning or configuration of the AWS account is needed. No scripts or configurations containing infrastructure-related information such as firewall rules, security profiles (like AWS IAM roles and policies), or IP settings (routing, gateways, subnets, hardcoded IPs) are used.

This is made possible by making Berlioz an intent-driven rather than a configuration-driven system. It accepts declarative application and service definitions rather than infrastructure configuration templates. The developer of a service declares the resources that the service provides and the resources that it consumes. If you compare Berlioz definitions with the equivalent definitions in any of today’s modern provisioning and deployment systems, you will immediately notice a difference of thousands of lines, even for a small sample application. But the problem that Berlioz solves is not just making the file shorter. In the sections below I will describe seven problem domains and how Berlioz solves them.

The first problem is that configuration is unmanageable. Configuration is a direct projection of the original intent onto the current environment plane. Any change to the intent or the environment makes the configuration invalid. That’s why with today’s systems it is not trivial to answer simple questions like “Do I need this firewall rule?”, “Is this cloud resource used or needed, and if so, who is using it?”, or “Can this IAM role or policy be removed?”

Instead, Berlioz keeps the intent as code. During the delivery stage, it combines the intent with the environment and policy constraints and produces the actual configuration. Every resource can be traced back to the intent. When the intent changes, resources that can no longer be traced back to it are automatically removed. This ensures that there are no stale resources or configurations in the actual deployment.
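To make the idea of tracing resources back to intent concrete, here is a minimal Python sketch of such a reconciliation step. It is not Berlioz's implementation or definition format: the `Resource` type, the `render` and `reconcile` functions, and the shape of the intent dictionaries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str    # e.g. "task" or "firewall-rule"
    name: str
    origin: str  # the service declaration this resource traces back to


def render(intent, environment):
    """Project the intent onto one environment, producing concrete resources."""
    resources = set()
    for service, spec in intent.items():
        prefix = f"{environment}-{service}"
        resources.add(Resource("task", prefix, service))
        for consumed in spec.get("consumes", []):
            rule_name = f"{prefix}-to-{consumed}"
            resources.add(Resource("firewall-rule", rule_name, service))
    return resources


def reconcile(desired, actual):
    """Return (to_create, to_remove); anything not in `desired` is stale."""
    return desired - actual, actual - desired


# Hypothetical intent, before and after the "db" service is dropped.
intent_v1 = {"web": {"consumes": ["app"]}, "app": {"consumes": ["db"]}, "db": {}}
intent_v2 = {"web": {"consumes": ["app"]}, "app": {}}

deployed = render(intent_v1, "test")
create, remove = reconcile(render(intent_v2, "test"), deployed)
print(sorted(r.name for r in remove))  # the "db" task and the app-to-db rule
```

Because the configuration is always regenerated from the intent, nobody has to decide by hand whether a given firewall rule is still needed: a rule that no declaration produces simply shows up in the removal set.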

The second problem is that configuration is unusable by the developer. When the configuration is convoluted and full of infrastructure specifics, it is often not well understood by the developer who owns the service, and is often managed by different teams such as ops or devops. This creates unnecessary dependencies across teams. With a high overhead cost attached to creating a new microservice, existing microservices tend to grow in size. Eventually, an application that was once broken down from a monolith into microservices becomes, over time, a set of monoliths once again.

Since Berlioz accepts only the original intent and not the concrete configuration details, defining a new service becomes a trivial task. Also, since the definition accepts only the intent declaration, the notion of conflicting or incorrect configuration simply does not exist. A typical microservice that exposes an HTTP endpoint and uses a database takes only 9 lines in a Berlioz YAML definition.

The third problem is related to service deployment. Modern tools separate the provisioning and deployment stages. They use one set of tools to provision and, once that is done, another set of tools to deploy. This creates a few problems. First, there is a disconnect between the provisioning and deployment stages, and mitigating it requires a large chunk of configuration to be provided to both. This creates redundancy and is error-prone. Second, when configuration and code changes are made in the same version, the deployment process gets complicated. Performing true canary or blue/green deployments when both config and code change at the same time is a much bigger challenge. This is often solved by introducing custom provisioning scripts for version transitions, which is ugly.

Berlioz looks at the provisioning and deployment stages as one. It takes the service intent declaration and combines it with dynamic policies that control replica counts, versioning, and a few minor pieces of metadata. The resulting rules tell Berlioz exactly how to construct resources and deploy services. This lets Berlioz transition easily from one version to another.
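As a rough illustration of how a policy covering replica counts and versioning could drive a version transition, the Python sketch below splits a replica count across versions according to policy weights. The `plan_replicas` function, the weight format, and the transition steps are assumptions made for this example, not Berlioz's actual policy model.

```python
def plan_replicas(total, weights):
    """Split `total` replicas across versions in proportion to `weights`."""
    scale = sum(weights.values())
    exact = {version: total * w / scale for version, w in weights.items()}
    plan = {version: int(share) for version, share in exact.items()}
    # Hand leftover replicas to the versions with the largest remainders.
    leftover = total - sum(plan.values())
    by_remainder = sorted(exact, key=lambda v: exact[v] - plan[v], reverse=True)
    for version in by_remainder[:leftover]:
        plan[version] += 1
    return plan


# Hypothetical policy steps for a canary-style move from v1 to v2.
transition = [
    {"v1": 100},
    {"v1": 90, "v2": 10},
    {"v1": 50, "v2": 50},
    {"v2": 100},
]

for weights in transition:
    print(plan_replicas(10, weights))
```

The point of the sketch is that the transition is expressed as a sequence of policy values applied to the same intent, rather than as a custom provisioning script written for each release.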

The fourth problem is service discovery. I think you might have already guessed where I am going. As of today, this requires an extra set of tools and clusters to configure and maintain. Again, this causes redundancy and inefficiency.

Berlioz looks at service discovery differently. The service intent declaration already states that some ServiceA is going to communicate with ServiceB, and Berlioz uses this to create the firewall rules necessary to make that happen. It also uses the same intent declaration to feed the peers of ServiceB directly to ServiceA. Berlioz will use the same mechanism to provide discovery of cloud-native resources (DynamoDB, Aurora, Kinesis, etc.).
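The Python sketch below shows how one declaration can feed both firewall rules and peer lists. The dictionary layout, the service names, the addresses, and the `firewall_rules` and `peers_for` helpers are hypothetical and only illustrate the idea; they do not reflect Berlioz's real definition format or API.

```python
# Hypothetical intent: what each service provides and what it consumes.
intent = {
    "web": {"provides": {"http": 80}, "consumes": ["app"]},
    "app": {"provides": {"api": 4000}, "consumes": []},
}

# Hypothetical deployment state: where the replicas of each service run.
deployment = {
    "web": ["10.0.1.5"],
    "app": ["10.0.2.7", "10.0.2.8"],
}


def firewall_rules(intent):
    """Allow traffic only along declared consumer -> provider edges."""
    rules = []
    for consumer, spec in intent.items():
        for provider in spec["consumes"]:
            for port in intent[provider]["provides"].values():
                rules.append((consumer, provider, port))
    return rules


def peers_for(service, intent, deployment):
    """List the endpoints of every provider that `service` declared."""
    peers = {}
    for provider in intent[service]["consumes"]:
        ports = intent[provider]["provides"].values()
        peers[provider] = [
            (ip, port) for ip in deployment[provider] for port in ports
        ]
    return peers


print(firewall_rules(intent))
print(peers_for("web", intent, deployment))
```

Both outputs come from the same `consumes` entry, so the firewall rules and the discovery data cannot drift apart the way two separately maintained configurations can.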

The fifth problem is the performance aspect of service discovery and service mesh functionality. Today’s systems provide this functionality either through DNS or DNS-like clusters, which respond to lookup requests with a resolved IP address and port, or by implementing a proxy layer. This introduces additional latency and a potential bottleneck, consumes unnecessary compute resources, and creates the headache of maintaining and scaling those clusters as well.

Berlioz believes that for services that want to communicate with each other, the data path should be direct, with no additional processes in between. Again, Berlioz takes the service intent along with the current deployment information and provides the peer information directly into the microservice’s memory address space. This allows service peers to be resolved within the process, without a network lookup, and any service-to-service communication to go directly to the destination with no extra hop. It also enables retries with backoff logic, client-side load balancing (server-side load balancing is also supported), tracing, etc. This is achieved using a fairly trivial, open-source client-side SDK. As a side bonus, the SDK provides service mesh features towards AWS native resources as well, just as if they were regular microservices. Service mesh features can be controlled and turned on or off dynamically using policies.
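For readers who want to picture what in-process peer resolution with client-side load balancing and retries looks like, here is a small Python sketch. It is not the Berlioz SDK: the `PeerClient` class, its parameters, and the `fake_request` function are hypothetical, and the sketch uses simple round-robin selection with exponential backoff as one possible strategy.

```python
import itertools
import time


class PeerClient:
    """Round-robin over in-memory peers, retrying with exponential backoff."""

    def __init__(self, peers, retries=3, base_delay=0.05, sleep=time.sleep):
        if not peers:
            raise ValueError("at least one peer is required")
        self._cycle = itertools.cycle(list(peers))
        self._retries = retries
        self._base_delay = base_delay
        self._sleep = sleep  # injectable so the example runs without waiting

    def call(self, request):
        last_error = None
        for attempt in range(self._retries):
            peer = next(self._cycle)  # resolved in-process, no lookup hop
            try:
                return request(peer)
            except ConnectionError as error:
                last_error = error
                if attempt < self._retries - 1:
                    self._sleep(self._base_delay * 2 ** attempt)
        raise last_error


def fake_request(peer):
    """Stand-in for a real network call; the first peer is always down."""
    if peer == ("10.0.2.7", 4000):
        raise ConnectionError(f"{peer} is down")
    return f"200 OK from {peer[0]}"


delays = []  # record the backoff delays instead of actually sleeping
client = PeerClient([("10.0.2.7", 4000), ("10.0.2.8", 4000)], sleep=delays.append)
result = client.call(fake_request)
print(result, delays)
```

The failed attempt against the first peer is retried against the second one after a short delay, and the request goes straight from caller to destination. A real SDK would also need to refresh the peer list as deployments change, which this sketch leaves out.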

The sixth problem: it works in test but fails in production. When setting up a specific cloud configuration, it is easy to make things work in one staging environment (for example, test) but have them fail in another (for example, production). A finely defined blue/green deployment process that can handle config changes can avoid the outage, but it still puts stress on the teams.

Multiple staging environments are at the core of Berlioz. It does not differentiate between dev, test, prod, etc. environments. This allows identically behaving environments within a single or multiple AWS accounts. Of course, a few things like scale parameters or public domain naming can be customized per staging environment. But other than that, deployments produce identical results. In Berlioz, this is made possible with just a single command to deliver (provision and deploy) the entire application.
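One way to picture "identical except for a few parameters" is a shared base definition plus a small, restricted set of per-environment overrides. The Python sketch below assumes hypothetical setting names (`replicas`, `domain`, `port`) and a hypothetical `settings_for` helper; it is an illustration of the principle, not how Berlioz stores its settings.

```python
# Only these keys may differ between staging environments (an assumption).
ALLOWED_OVERRIDES = {"replicas", "domain"}

base = {"replicas": 1, "domain": None, "port": 80}

overrides = {
    "test": {"replicas": 2},
    "prod": {"replicas": 10, "domain": "example.com"},
}


def settings_for(environment, base, overrides):
    """Merge the per-environment overrides into the shared base definition."""
    custom = overrides.get(environment, {})
    unknown = set(custom) - ALLOWED_OVERRIDES
    if unknown:
        raise ValueError(f"not customizable per environment: {sorted(unknown)}")
    return {**base, **custom}


for environment in ("dev", "test", "prod"):
    print(environment, settings_for(environment, base, overrides))
```

Everything outside the allowed keys is the same in every environment, which is what keeps a deployment that works in test from behaving differently in production.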

The seventh problem: it works on my workstation but fails in test. In most companies, developers either run their services manually or use custom scripts to run services locally during development. This creates a disconnect between what happens in test/production and what happens on the developer’s workstation. A few companies succeed in building such tools and scripts, but at the cost of time and focus spent maintaining them. Those tools run completely in parallel with, and are redundant to, the production configuration. Berlioz provides two solutions to this problem.

Solution A. A developer can deploy code changes to their own staging environment in AWS. It works in complete isolation, and the developer can play with scale parameters, etc. The initial deployment takes 3–5 minutes, mostly to spin up the instances. For subsequent code changes, deployment takes just a couple of minutes, most of which is spent pushing the images to AWS. Unless large scale parameters are used, the AWS charges would be minimal or even zero as part of the free tier. This is already a very good solution, considering that there is no tool or script to maintain and that the environment is the same as the one used in test/production. But Berlioz can do something even better.

Solution B. Time is precious, and taking a coffee break to run the code is too much, whether it’s a single program or a set of microservices communicating with each other. Berlioz allows a developer to deploy application services on a local workstation, makes them available through the integrated service discovery, provisions all AWS native resources (like DynamoDB, Kinesis, etc.) in AWS, and makes them available to the locally deployed services. All service mesh features that are present for cloud deployments are available locally as well. This makes single-command local deployments up and running within seconds, saving developers’ precious time and concentration and improving overall productivity.

This was the overview of Berlioz. For those who scrolled all the way to the last paragraph: it’s a service that combines application provisioning, deployment, orchestration, and service mesh in AWS. Some components of Berlioz are open source and available on GitHub.

I invite you to try out Berlioz. It comes with multiple sample projects, ranging from a simple web server to complex multi-tiered projects. To get started and for more technical details, visit https://github.com/berlioz-the/berlioz