Joyent CoPilot: Bringing Application Awareness to Cloud Infrastructure. Part IV.

7 min read · Oct 25, 2017

In the previous episodes

In Part I you were quickly introduced to the goal of the project as well as some of the technological concepts necessary to understand CoPilot. Part II covered major conceptual and structural changes, and Part III focused on the sign-up process, the creation of a deployment group and the deployment of services.

In this episode, we’ll be breaking down some of the major CoPilot components, learning how services can be assessed from the perspective of performance and application architecture and how they can be scaled.

Breakdown of the major abstractions

With deployment groups, services and instances being the most prominent structural concepts, it is worth quickly talking about what constitutes them and how they relate to each other.

As you already know, a deployment group can be described as a project: a collection of services that act either as an application or a significant chunk of it, together with the infrastructure resources underneath it. But that sounds a bit abstract. Let’s talk about how it all looks in practice. Bear in mind that the following breakdown is still under development: the more we learn about the needs of users, and the more synergetic the work of the product and Joyent engineering teams becomes, the more knowledge and confidence we gain in terms of the structure and features.

Deployment group

In practice, a deployment group (aka project) consists of several sub-objects and the tools supporting them.

Services: constructs that represent containers running the same software. Services are made up of numerous subordinate objects, but for now all that matters is that a deployment group empowers users to assess all of their services at once, providing a broader context.
Instances: all of the containers that constitute the entire project.
Manifest: the centrepiece of the deployment group, which defines all of the services.
Overview: an amalgamation of the activity feed and metrics that empowers users to correlate changes in the application’s performance with events.
Alerts: a tool closely related to metrics that informs users of changes in the performance of services.
Versioning: a tool that enables users to replace a manifest file with one of its previous versions. This can result in changes to the application’s constitution, architecture and configuration. I will refer you to my colleague for more details about this feature.
Networks: this sub-section of a deployment group enables users to configure the underlying infrastructure (VLANs, IP networks and firewalls).
People: in future releases, this component will be responsible for managing users and their roles within a deployment group.
Settings: a collection of features related to billing and other, as yet uncategorised, activities.

Service

A service shares quite a few similarities in terms of object composition with a deployment group.

Activity feed: a chronological list of events that have occurred within the service
Instances: a list of instances that constitute a particular service
Metrics: in-depth aggregated performance measurements
Networks: networking and security configurations of a particular service
Tags and metadata
Service manifest: the portion of the manifest defining a particular service

Before we move forward, it is worth mentioning that several infrastructure objects (such as storage) and parameters are missing from this inventory. As we move forward and improve our understanding of the product, these pieces will find their place within CoPilot.
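To make the relationship between these abstractions a little more tangible, here is a minimal sketch in Python. It is not CoPilot's actual data model: the class names, field names and the example services ("nginx", "mysql") are all assumptions chosen purely for illustration. It only shows how a deployment group can contain services, how services contain instances, and how the group-level list of instances can be derived from the services.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    """A single container belonging to a service (hypothetical model)."""
    instance_id: str


@dataclass
class Service:
    """Containers running the same software, plus per-service data."""
    name: str
    instances: List[Instance] = field(default_factory=list)
    # The portion of the manifest that defines this service.
    manifest_section: Dict[str, object] = field(default_factory=dict)
    tags: Dict[str, str] = field(default_factory=dict)
    activity_feed: List[str] = field(default_factory=list)


@dataclass
class DeploymentGroup:
    """A project: a collection of services defined by one manifest."""
    name: str
    services: Dict[str, Service] = field(default_factory=dict)

    def all_instances(self) -> List[Instance]:
        # The group-level "Instances" list is simply every instance
        # of every service in the group.
        return [i for s in self.services.values() for i in s.instances]


# Example data (invented for illustration).
group = DeploymentGroup(name="blog")
group.services["nginx"] = Service("nginx", [Instance("nginx-1"), Instance("nginx-2")])
group.services["mysql"] = Service("mysql", [Instance("mysql-1")])

instance_ids = [i.instance_id for i in group.all_instances()]
print(instance_ids)
```

Objects such as networks, alerts and versioning are left out of the sketch on purpose, to keep the core hierarchy easy to read.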

Understanding the application’s architecture

Application management is a complex activity and involves a large volume of multi-faceted data. On top of that, each user group prioritises different aspects of that data. For the sake of simplicity, we can differentiate 3 broad groups of interest:

Understanding the relations between services (the architecture)
Observing immediate changes within services and understanding reasons behind them as well as acting upon them
Observing long-term changes or changes that have occurred in the past, understanding reasons behind them and acting upon them

For the moment, let’s focus on the architecture. Being able to understand how services relate to each other can be crucial in tracing the sources of reduced performance or failures. It can also facilitate further development of an application by providing engineers with a clearer understanding of the software’s structure and what shape it can take further down the road.
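To illustrate why knowing the relations between services helps with tracing problems, here is a small Python sketch. It assumes the connections between services are available as a simple mapping from a service to the services it connects to. That mapping, the service names and the `dependents_of` function are all hypothetical; this is not CoPilot's manifest format or API. Given a service that is misbehaving, the function walks the connections backwards to find every service that could be affected.

```python
from typing import Dict, List, Set

# Hypothetical connections: service name -> services it connects to.
service_links: Dict[str, List[str]] = {
    "nginx": ["wordpress"],
    "wordpress": ["mysql", "memcached"],
    "mysql": [],
    "memcached": [],
}


def dependents_of(service: str, links: Dict[str, List[str]]) -> Set[str]:
    """Return every service that depends on `service`, directly or indirectly."""
    affected: Set[str] = set()
    frontier = [service]
    while frontier:
        current = frontier.pop()
        for name, targets in links.items():
            # `name` connects to `current`, so a problem in `current`
            # may surface in `name` as well.
            if current in targets and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


print(sorted(dependents_of("mysql", service_links)))  # ['nginx', 'wordpress']
```

In this made-up application, a slow database would be worth checking first if both the web server and the CMS start to degrade, which is the kind of reasoning a view of the architecture is meant to support.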

To facilitate understanding of the application’s structure, we have designed the deployment group’s topology view. The topology view is a visual representation of the services and the connections between them, as defined in the manifest. Furthermore, we’ve included a simplified preview of metrics and the ability to execute actions on each service. A video walk-through of this feature will be shown in a bit, but for now, here’s a little preview:

Contextualised overview of services

The topology view proved to be useful in understanding the structure of an application; however, it has limited capacity to reflect ongoing changes. It was clear that there was a need for a more granular overview of services: a solution that would empower users to understand the state of their services without losing the broader context of the overall application.

This was made possible by designing an alternative to the topology view: a list view of services in which each service is represented by a content-dense component called a service card.

A service card consists of several items:

Name of the service
Number of instances. This element is meant to represent both the desired number of service instances (defined by the user) and their actual number. Users scale services up or down, defining what number of instances is necessary for optimal performance of a service and application. Instances can become unresponsive or fail, resulting in fewer instances than desired. It is crucial to be aware of these changes.
Data center. This component explicitly informs users of the geographical location of the host hardware.
Actions menu. This menu enables users to perform service-level actions such as stopping, starting, restarting, reprovisioning, updating, scaling and changing the version of a service.
Service’s health. First of all, it is worth pointing out that ‘health’ is an ambiguous topic and its definition can be delegated to whoever is managing the infrastructure and the application. In other words: “if you know what you are running, you should know what healthy or unhealthy means”. However, Joyent is working on enriching Triton and CoPilot with some awareness of health. For the time being, it is important to understand that a container’s ability to run (being healthy from the perspective of the infrastructure) and its ability to receive requests (being healthy from the perspective of the application) are two separate things. Hence, for a service to be deemed healthy, it might need to meet several conditions: a container needs to come to life and run, be discoverable, and be able to receive requests and interact with other containers.
Service’s task queue. Actions such as scaling, upgrading, stopping, starting and reprovisioning performed on services can take a considerable amount of time to complete and need to be communicated to users clearly. This element serves as an indicator of these actions, with more granular information available in the deployment group’s task queue (we will hopefully get to talk about this in future articles).
Metrics. Metrics are a whole topic in themselves (and will be covered shortly). All that matters for now is that, within the list of services, each service card has 3 slots dedicated to performance metrics. Users can choose which metrics they are most interested in, which provides a degree of configuration flexibility.
Differentiated groups of instances. Without going into too much depth, there are certain scenarios in which some of the service instances need to be differentiated from the others. For example, let’s say that one of the application’s services is a database. A database can have two types of instances: primary and secondary. The primary instance is the one through which other services access the database, whereas secondary instances provide a backup to the primary in case it fails. Bear in mind that there are many other reasons why a service might have different types of instances. What is important for now is that different types of instances can consume the resources allocated to them differently, hence a service card has to be able to represent this differentiation.

Service card differentiating groups of instances
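Two of the service card items described above, the desired versus actual number of instances and the two perspectives on health, can be sketched in a few lines of Python. This is an illustration under assumptions, not how CoPilot or Triton compute anything: the class names, the three boolean flags and the summary format are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceState:
    """Hypothetical per-instance state used only for this sketch."""
    name: str
    running: bool           # healthy from the infrastructure's perspective
    discoverable: bool      # other containers can find it
    accepts_requests: bool  # healthy from the application's perspective

    def is_healthy(self) -> bool:
        # An instance counts as healthy only if every condition holds.
        return self.running and self.discoverable and self.accepts_requests


@dataclass
class ServiceCard:
    service_name: str
    desired_instances: int  # the number defined by the user
    instances: List[InstanceState]

    def actual_instances(self) -> int:
        return sum(1 for i in self.instances if i.running)

    def summary(self) -> str:
        healthy = sum(1 for i in self.instances if i.is_healthy())
        return (f"{self.service_name}: "
                f"{self.actual_instances()}/{self.desired_instances} running, "
                f"{healthy} healthy")


card = ServiceCard(
    service_name="api",
    desired_instances=3,
    instances=[
        InstanceState("api-1", running=True, discoverable=True, accepts_requests=True),
        # Running, but not yet able to serve traffic.
        InstanceState("api-2", running=True, discoverable=True, accepts_requests=False),
    ],
)
print(card.summary())  # api: 2/3 running, 1 healthy
```

The example shows why the two notions need to be kept apart: two containers are up, but only one of them is actually doing useful work, and the service is still one instance short of what the user asked for.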

Scaling services

We briefly covered the process of deploying services and monitoring them. It’s about time to talk about scaling. Scaling is the process of increasing or decreasing the number of service instances and is related to optimising the application’s performance.

Optimal performance means keeping resources from being under-used (which would mean wasting money) or being depleted (resulting in slower execution of tasks, unresponsiveness or complete failure). Since every instance of a service has a limited amount of resources allocated to it, the number of instances can be increased, thus distributing a higher load between them. Likewise, if resources are being under-used, the number of instances can be reduced.
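The reasoning above boils down to simple arithmetic, which the following Python sketch spells out. It is a back-of-the-envelope illustration only: the function name, the idea of a single "load" figure, and the target utilisation of 75% are assumptions made for the example, not something CoPilot prescribes.

```python
import math


def instances_needed(total_load: float,
                     capacity_per_instance: float,
                     target_utilisation: float = 0.75) -> int:
    """Estimate how many instances keep each one near the target utilisation.

    All three parameters are hypothetical inputs: `total_load` and
    `capacity_per_instance` must use the same unit (e.g. requests per second).
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    if not 0 < target_utilisation <= 1:
        raise ValueError("target_utilisation must be in (0, 1]")
    usable_capacity = capacity_per_instance * target_utilisation
    # Always keep at least one instance, even under negligible load.
    return max(1, math.ceil(total_load / usable_capacity))


# Each instance handles 200 req/s, and we aim to use 75% of that (150 req/s).
print(instances_needed(900, 200))  # 6 -> scale up if fewer are running
print(instances_needed(100, 200))  # 1 -> scale down if more are running
```

Leaving some headroom (here 25%) is one common way to avoid the "depleted" end of the spectrum without paying for too many idle instances. The right figure depends entirely on the application.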

Check out the walk-through video of the scaling process:

Phew, that was a lot of information to take in. But you pulled through — high-five! In the next part we will be talking more about metrics and monitoring, so stick around.