Joyent CoPilot: Bringing Application Awareness to Cloud Infrastructure. Part II.
What have we learned so far?
Previously, on “Knight Rider”… In Part I we covered Joyent’s background, our collective goal to build an experimental application management platform, and the technology behind it. It’s about time to talk about the actual design work.
A brief musing on the process
The benefit of working closely with technology pioneers like Joyent is accepting that the goal and success criteria of the project are ever-emerging, and embracing its free-form nature as part and parcel of the innovation process.
This means that the first few months of the project were dedicated to understanding Triton as it is, imagining where it could go next and prioritising what would make it successful in the hands of Joyent’s clients. That said, the practices most crucial to keeping the ball rolling were continuous collaboration with Joyent and ruthless prioritisation.
Doing some homework
To figure out what direction the design work should take, we had to understand what tools Joyent currently provides and how they relate to the products and services of other companies. Understanding Joyent’s current and potential future users was another piece of the puzzle, one that would eventually lead us to a better understanding of our short- and long-term priorities.
Users
Let’s start with people who use Joyent’s products or are expected to be drawn to using them. Speciality and experience are two significant factors distinguishing different users.
Even though speciality can be (and usually is) quite fluid, with more senior people having had an opportunity to work in different capacities, certain skills can be a user’s primary expertise: software development, product management and operational tasks (to name a few).
Experience in this context is defined by the magnitude and complexity of the projects a user has been engaged with and by their knowledge of various technologies and systems. For example, a junior hobbyist developer might have very limited knowledge of developing an app and running it on an all-in-one platform like Heroku, whereas a senior technological officer might have experience of building and managing extremely complex applications and using products like AWS, Kubernetes, Datadog, Mesos, Terraform etc.
It is also worth mentioning that how “hands-on” a user’s role is significantly contributes to their perception of different tools and willingness to adopt them.
To keep the scope manageable, insights from Joyent and conversations with their users (as well as users of other products) helped us define and focus on four user groups: full-stack developers, backend developers, senior technological officers and accounting/security auditors. Even though each of these groups can be very diverse in itself, we identified and focused on certain traits that would allow us both to contain the design process and to meet a reasonably wide range of needs and expectations.
For example, senior back-end developers might prefer working with a command-line interface (CLI) rather than a graphical user interface (GUI), as they consider the latter limiting, obscuring information that is more accessible in a CLI, or unaccommodating to their style of work. These users require clarity around understanding an application’s operations and performance, as well as around tracing the sources of failures or any other changes within an application. Ease of service scaling (among other operations) is imperative.
Mid-level full-stack developers, with a bit less experience or wider, rather than deeper, knowledge, might use the GUI to learn about the product and then move to the CLI. Those with a less established style of work are more willing to be challenged and to adopt new workflows.
Chief technology officers and other folks working in DevOps (development and operations) tend to be knowledgeable about a wide variety of sophisticated systems. They are usually comfortable around a CLI, yet might prefer to use a GUI as long as it adds ease and comfort and contributes to the efficiency of their work. These users also deal with project management, billing and defining other users’ access to resources.
Auditors and data analysts have little to no involvement in development and are responsible for auditing data usage and billing. These users need a clear understanding of costs and billing, and the ability to easily manage payments and raise questions about data consumption.
Bear in mind that these groups are not final and do not reflect the entirety of Joyent’s current and potential clients. However, you have to start somewhere, right?
Narrowing down the user groups still left us with a broad spectrum of needs and expectations, which required finding a sound balance between functional versatility, affordance, ease of use and overall sophistication. Understanding Joyent’s position within the market and their relation to other companies and products helped us identify what would create opportunities for the new product to gain traction (oh wow, my ‘tech-jargon’ sense is tingling). As our understanding of users improved, we started studying Triton’s current platform.
Assessing Triton
Assessing Triton made us realise that even though Triton was technologically very versatile, interfacing with it lacked flexibility. Utilising it to the fullest was possible only by working in the CLI, leaving the GUI secondary if not obsolete.
However, the biggest challenge for us was the fact that whereas Triton is really good at being a cloud infrastructure service provider, application management isn’t part of its core features. But what does that mean, you might ask? It means that Triton provides users with computing, networking and storage resources on which applications can be built and operated; however, it’s up to the user to construct and maintain the mental model of an application, provision instances and associate them with the components of a specific piece of software. ContainerPilot does provide service-oriented architectures with application orchestration, but it’s far from holistic application management. So what would a holistic model look like? Worry not, my extremely patient friends, you will find out.
Introducing services
Creating and integrating application management features with Triton meant a considerable conceptual and technological leap that would require the introduction of several new levels of abstraction. Let’s begin with the most pivotal one — service abstraction.
As you know from before, an application consists of discrete services — pieces of software dedicated to performing specific tasks — that are containerised and can be scaled up or down, resulting in a higher or lower number of instances. Unfortunately, at the time Triton lacked awareness of an application’s architecture — you could create and manage an individual container, but if you had multiple instances of the same application component, Triton would not be able to represent them as a single ‘service’ or as a part of a bigger whole, leaving application orchestration up to the user.
Let’s return to our previous blog website example. Imagine you have increased traffic to your blog, so you scale up one of your services (for example Nginx) and get 10 instances of Nginx running. Without application management features, you’d be able to examine each of those 10 instances individually (their properties, performance indicators and so on), but nothing would help you gain insight into the overall performance of the Nginx service or its relation to the whole application. No aggregation of performance, no means of abstracting all of that information about 10 Nginx instances into one object and no capacity for executing actions on a service. Measurements of performance and actions would be possible only on a per-instance basis.
Combining multiple instances into one ‘service’ object and being able to execute actions on it — such as provisioning, stopping, starting, scaling — would reduce complexity around application management, making operational tasks more efficient and the application’s structure more intelligible.
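To make this concrete, here is a purely illustrative sketch of what such a ‘service’ object might aggregate. The field names are our invention for the sake of the example, not an actual Triton or CoPilot API shape:

```yaml
# Hypothetical 'service' object (field names are illustrative only)
service: nginx
deployment_group: blog
instances: 10            # the individual containers, rolled up into one object
state: running
metrics:
  cpu_avg: "37%"         # aggregated across all 10 instances
  memory_total: "1.2 GB" # summed across all 10 instances
actions:                 # executed once, applied to the service as a whole
  - start
  - stop
  - scale
  - reprovision
```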
Deployment groups
With service abstraction now one of the top priorities, the next question for us was: what if you want to work on multiple applications and manage them separately? Or what if you have a large application that you want to split up into smaller chunks, with each of those chunks made up of multiple services? Furthermore, what if you want multiple deployment environments for developing, testing, previewing and launching the application to the public? That’s a hell of a lot of questions, isn’t it?
Being able to create separate groups (or, as we’ve dubbed them, deployment groups) of services and related resources (such as VLANs, IP networks, storage, billing etc.) seemed like the next logical step towards effective application management. On top of that, it would also make versioning possible (learn more about versioning from my colleague Alex). The foundation for this new type of abstraction was already in place — as we learned before, a docker-compose.yml file can be used to define multi-service applications (docker-compose.yml = deployment group). To provide a familiar metaphor, think of these groups of services and resources as projects.
It feels like it’s time to return to our good old blog example. Let’s say that the blog website and the infrastructure it is running on make up one project. You might want to have a second project — it will have exactly the same application and exactly the same application configuration, but scaled down (fewer instances), since you are planning only to run tests with it. You might also want to work on a completely different, unrelated application — say, an online store — and that will also be a separate project.
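For illustration, the blog project’s deployment group could be described by a docker-compose.yml along these lines (the exact services and images here are assumptions on our part, not the actual file from Part I):

```yaml
# Illustrative docker-compose.yml for the 'blog' deployment group
version: '2'
services:
  nginx:        # the load balancer from our example, scalable per project
    image: nginx:latest
    ports:
      - "80:80"
  wordpress:    # the blog application itself
    image: wordpress:latest
    environment:
      WORDPRESS_DB_HOST: mysql
  mysql:        # the backing database
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: example
```

The test project would simply reuse the same file with fewer instances of each service, while the online store would have a docker-compose.yml of its own.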
Another important benefit of utilising deployment groups is the ability to implement different deployment techniques such as blue-green or canary deployments. (I will courteously cut a corner here and refer you to Google.)
How are you doing? Pulse, pupil dilation still check out as adequate? Good, let’s move on to another major area of interest.
Role-based access control
Another major area of interest for us was user authentication — verification of a user’s identity — and authorisation — specification of a user’s access to resources and permissions to execute certain actions.
Triton’s authorisation system was loosely based on the NIST role-based access control (RBAC) model. In a nutshell, this approach to authorisation deals with the concept of roles. A user can have multiple roles, with each role describing what actions are permitted and what resources are accessible to that user. The sets of actions that can be associated with a role are called policies. It is important to note that policies are not directly assigned to a user; they can only be assigned to a role, which in turn gets assigned to a user’s account.
Now let’s talk a bit about the concept of a user. A user is a billable holder of an identity and can own certain resources. Ownership in this context means complete access to a resource and the capacity to execute any action on it. Apart from users, there are also sub-users. Sub-users are identity holders that do not own any resources and require permissions to be granted to them in order to perform any kind of action. In practice (and in Joyent’s reality) this means that a sub-user is created by a user and has no autonomy. If a user is deleted, so are their sub-users. A sub-user account cannot have access to resources belonging to two different users’ accounts. This creates a considerable obstacle for clients who want easy access to all of the projects they are part of, no matter whom they belong to, since it would require switching (logging in and out) between multiple accounts.
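As a purely illustrative sketch (this is not Triton’s actual RBAC syntax), the relationships between policies, roles and users look roughly like this:

```yaml
# Illustrative RBAC relationships (not real Triton configuration syntax)
policies:
  instance-reader:
    rules:
      - "can list and inspect instances"  # policies describe permitted actions
roles:
  operator:
    policies:
      - instance-reader  # policies attach to roles, never to users directly
users:
  - login: alice
    roles:
      - operator         # roles, in turn, attach to user accounts
```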
Organisations
To enable Joyent’s clients to collaborate with the owners of multiple projects without the need for multiple accounts, Triton required yet another new abstraction — organisations.
An organisation is an entity similar to a user — it is billable and can own resources. However, it does not authenticate on its own. Instead, it requires at least one member, who is responsible for its administration. Following this approach, a user can have just one account and be a part of multiple organisations and projects, within which the user’s access to resources and permitted actions are defined by their assigned roles.
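Extending the same illustrative sketch, an organisation might sit alongside users like this (again, a hypothetical shape, not an actual API):

```yaml
# Hypothetical organisation entity (illustrative shape only)
organisations:
  - name: acme-blogging
    owns:
      - blog           # deployment groups and other billable resources
      - online-store
    members:
      - login: alice
        roles: [administrator]  # at least one member administers the organisation
      - login: bob
        roles: [operator]
```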
When push comes to shove: technological implications and CoPilot
With the new abstractions in place, it became clear that the majority of the product design would need to be done from scratch, divorcing it from Triton’s current platform, which focuses on infrastructure management, and tailoring the new one for application management. The introduction of new abstractions (or objects, as we call them) like services, projects and organisations meant a significant impact not only on the behaviour and taxonomy of the product, but also on the underlying technology. As the product design team was summarising the initial research findings, Joyent’s engineers were preparing for the technological changes required to bring application management and improved RBAC features to Triton, culminating in project Mariposa.
The outcome of this process was the decision to design and build an experimental stand-alone product that could be launched and iterated on without waiting for the necessary technological changes to be implemented in Triton. Working on this stand-alone product, dubbed CoPilot, in parallel with Triton would enable us to potentially merge the two in the future.
What a twist, huh? Almost pulled ‘Empire Strikes Back’ on you.
Before we jump into specific areas of CoPilot and talk about how the newly defined abstractions manifested as tangible features, let’s quickly discuss how the relations between the components of the product had to change (compared to the old Triton model) and be enriched with new ones to make the shift from infrastructure to application management.
The new structural model
Let’s start by quickly covering Triton’s original structure. Without the ability to group related resources together, there is little to no relation between the major objects such as instances, networks and storage — hierarchically they are all on the same level. This minimalistic approach to structure is not sufficient for application management.
In the new model, the most integral object is a service, and the entire new structure of CoPilot is based around it, making it a bit more hierarchical.
First of all, at the top of the new hierarchy, we have users and organisations. Both users and organisations can have resources allocated to them. A user can be a member of multiple organisations.
The second level is represented by deployment groups — collections of services defined by a docker-compose.yml file (which could be a full application or a piece of a larger one) and the resources associated with them. Deployment groups can also be utilised to differentiate deployment environments.
The third level — services and their instances, networks, storage and other aspects of an application.
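Sketched as an outline (the labels below are ours, purely for illustration), the new hierarchy reads:

```yaml
# The new structural model as a hierarchy (illustrative)
organisation-or-user:      # level 1: billable, owns resources
  deployment-groups:
    blog:                  # level 2: defined by a docker-compose.yml
      services:            # level 3: services and everything attached to them
        nginx:
          instances: [nginx_01, nginx_02, nginx_03]
      networks: [public, private]
      storage: [volume-1]
```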
To better understand the relations between the components of the new product, we created an object map of it. An object map, in the context of this project, allowed us to inventory the product’s components and the actions that can be performed on them, uncover whether anything was missing, understand the relations between these components, and group and prioritise them.
Alpha, beta and beyond
With all the major components in place, we had to identify the minimal toolkit necessary for application management: what components and features would form the foundation of the product (its alpha version), where we would take it next (beta) and what even longer-term development would look like. We also had to make the GUI effective and compelling enough to be used in parallel with the CLI or as an alternative to it.
After a rigorous assessment, the initial scope was narrowed down to:
- Navigation
- User authentication
- Creation of a deployment group and deployment of an application to the cloud with a docker-compose.yml file (alternatives to this will be discussed further down)
- Creating and maintaining a clear understanding of an application’s structure
- Monitoring of an application, its services and their instances
- Scaling and other service-level actions
- Task queues
Or to put it simply, the first version of the product was meant to enable users to sign up, create a deployment group, deploy services, monitor their performance and scale when needed.
To keep this as short as possible (which, by this point, sounds pretty ironic), only the most prominent aspects of CoPilot will be covered in the further parts. Deployment groups and services are complex objects, composed of multiple pieces of functionality. Even though this essay focuses mostly on service monitoring and scaling (among other actions), bear in mind that managing them involves things like versioning, security configuration, user management, networks and storage, billing and much more.
- “Roads? Where we’re going, we don’t need roads!” — Dr. Emmett Brown, Back to the Future.
Ok, folks, we are about to start digging into CoPilot — set your phasers to “wow” and follow me into Part III.