Last week we attended the Velocity Conference in San Jose, CA. Velocity focuses on distributed systems and DevOps practices and tools. During the conference we heard three main themes: 1) reliability/resiliency, 2) observability, and 3) distributed system security.
Reliability/resiliency: The conference kicked off with Astrid Atkinson of Google defining the terms and suggesting best practices. She stated that “reliability is the quality of being trustworthy or of performing consistently well,” while resiliency is the ability to recover quickly from issues. In distributed systems, DevOps teams have to assume that things will fail, so they should make every effort to minimize the failure surface. To achieve five nines of availability (roughly 5.26 minutes of downtime a year), a company needs monitoring, incident response, post-mortem/root-cause analysis, testing and release procedures, and capacity planning. Importantly, a system can be more resilient than any of its individual parts, because it can use redundancy, self-healing, active-active deployments, and failover to prevent outages. It is therefore imperative to take a systems-level view of the environment.
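The five-nines figure follows from simple arithmetic, and the same arithmetic shows why redundancy lets a system be more available than any single part. Below is a minimal Python sketch. It assumes a 365.25-day year and, for the redundancy case, that replicas fail independently, which is an idealization because real outages are often correlated. The helper names are our own.

```python
# Minutes in an average year, using 365.25 days to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def annual_downtime_minutes(availability: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(component: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies when any one copy suffices.

    Assumes failures are independent: the system is down only when every
    replica is down at the same time.
    """
    return 1.0 - (1.0 - component) ** replicas


if __name__ == "__main__":
    # Five nines allows about 5.26 minutes of downtime per year.
    print(f"five nines: {annual_downtime_minutes(0.99999):.2f} min/year")  # five nines: 5.26 min/year
    # Two independent 99.9% replicas behind failover approach six nines.
    print(f"two 99.9% replicas: {parallel_availability(0.999, 2):.6f}")  # two 99.9% replicas: 0.999999
```

The second result is the formal version of the point above: two components that are each down about 8.8 hours a year can, if their failures are truly independent, combine into a system that is down well under a minute a year.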
Drilling down on testing, we see more businesses testing in production. As Martin Woodward of Microsoft joked, “We all test in production because we see if our code works or not.” We believe systems are becoming more complex due to containers and functions, and they are increasingly distributed across availability zones, regions, and clouds. To maintain resiliency, teams can test in production through canary and blue/green deployments, feature flagging (LaunchDarkly), and chaos engineering (Gremlin). Chaos engineering, which deliberately injects failures into a running system, is a particularly powerful way to test environments for resiliency before real failures occur. It helps teams reduce outages and middle-of-the-night fire drills.
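Canary releases and feature flags often rest on the same mechanism: deterministically assigning each user to a bucket and enabling the new code path for only a small percentage of buckets. The Python sketch below illustrates that idea under our own assumptions. It is not LaunchDarkly's SDK or API, and `rollout_bucket` and `is_enabled` are hypothetical names.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo 100.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the canary percentage for the flag."""
    return rollout_bucket(flag_name, user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    # Expose a hypothetical "new-checkout" flag to roughly 5% of users.
    canary = [u for u in users if is_enabled("new-checkout", u, 5)]
    print(f"{len(canary)} of {len(users)} users are in the canary")
```

Hashing the flag name together with the user ID keeps each user's assignment stable across requests. Because a user is enabled whenever their bucket falls below the percentage, raising the rollout from 5% to 20% only adds users to the canary and never reshuffles the existing ones.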
Observability: Observability has several components: log aggregation/analytics, time-series databases/analytics, distributed system tracing (Lightstep), monitoring, and alerting/visualization. Speakers repeatedly noted that DevOps teams should measure both the system as a whole and its individual components to determine whether an application is working. Tamar Bercovici of Box underscored the importance of measuring the right metrics. For example, to determine the health of a database, teams should execute simple queries and measure the results rather than rely on CPU load, lock timeouts, and error rates, which are secondary signals. Observability is only as good as the relevance and quality of the measured outputs.
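The database example can be sketched as a synthetic probe: run a simple query, then judge health by whether it succeeds and how long it takes. In the Python sketch below, SQLite from the standard library stands in for a production database. The name `probe_database`, the `SELECT 1` probe, and the 100 ms threshold are illustrative assumptions, not Box's implementation.

```python
import sqlite3
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    healthy: bool
    latency_ms: float
    error: Optional[str] = None


def probe_database(conn: sqlite3.Connection, query: str = "SELECT 1",
                   max_latency_ms: float = 100.0) -> ProbeResult:
    """Judge health by running a real query, not by inspecting CPU or lock stats."""
    start = time.perf_counter()
    try:
        row = conn.execute(query).fetchone()
    except sqlite3.Error as exc:
        # Any driver error (closed connection, missing table, etc.) means unhealthy.
        elapsed_ms = (time.perf_counter() - start) * 1000
        return ProbeResult(False, elapsed_ms, str(exc))
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Healthy only if the query returned a row and did so within the latency budget.
    ok = row is not None and elapsed_ms <= max_latency_ms
    return ProbeResult(ok, elapsed_ms, None if ok else "slow or empty result")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(probe_database(conn))  # expected to be healthy on an open connection
    conn.close()
    print(probe_database(conn))  # a closed connection is reported as unhealthy
```

The probe measures what users actually experience, namely whether a query comes back and how quickly. Metrics such as CPU load remain useful for diagnosing why a probe failed.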
Distributed system security: A few speakers discussed the challenges of securing distributed environments that include containers and functions. Unlike VMs, containers share a single host kernel, so if one container is compromised, the issue can spread across the host. Scanning images for vulnerabilities is a first step, but runtime detection and enforcement are critical to preventing malicious activity (StackRox). One technology highlighted at Velocity was BPF (Berkeley Packet Filter), a Linux kernel technology that can help enforce least privilege for pod-to-pod traffic in the container world (Cilium).
Serverless architectures bring both security benefits and drawbacks. According to Luis Eduardo Colon of AWS, serverless environments offer some security advantages. For example, there are no unpatched servers, denial of service becomes a billing issue, and immutability eliminates compromised servers. However, there are drawbacks as well. Security monitoring becomes extremely difficult, and greater flexibility can lead to a larger attack surface. Colon called out the ten most critical security risks in serverless architectures, including function event-data injection, broken authentication, insecure serverless deployment configuration, and over-privileged function permissions and roles. We believe another important threat vector is insecure third-party open-source libraries used in functions, which can allow malicious actors to plant backdoors.
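Function event-data injection is the serverless version of classic injection. The event payload, whether it comes from an API request, a queue message, or a storage notification, is untrusted input. The Python sketch below shows two common defenses: validating the event against an allowlist pattern, and passing values to the datastore as bound parameters instead of concatenating them into SQL. It assumes a hypothetical `orders` table and an event carrying an `order_id` field. The name `handle_order_lookup` and the response shape are illustrative, and SQLite stands in for whatever datastore a real function would use.

```python
import re
import sqlite3
from typing import Any, Dict

# Hypothetical schema rule: order IDs are 1-32 ASCII letters or digits.
ORDER_ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,32}")


def handle_order_lookup(conn: sqlite3.Connection, event: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical function handler that treats event data as untrusted."""
    order_id = event.get("order_id")
    # 1) Validate: reject anything that is not a string matching the allowlist.
    if not isinstance(order_id, str) or not ORDER_ID_PATTERN.fullmatch(order_id):
        return {"statusCode": 400, "body": "invalid order_id"}
    # 2) Parameterize: the driver binds order_id as data, never as SQL text.
    row = conn.execute("SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
    if row is None:
        return {"statusCode": 404, "body": "not found"}
    return {"statusCode": 200, "body": row[0]}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO orders VALUES ('A123', 'shipped')")
    print(handle_order_lookup(conn, {"order_id": "A123"}))
    print(handle_order_lookup(conn, {"order_id": "A123' OR '1'='1"}))  # rejected
```

Either defense alone helps. Used together, malformed input never reaches the datastore, and well-formed input still cannot change the structure of the query.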
Deploying security in serverless environments is also different. Operators can't instrument the kernel or run an agent to secure a function. Instead, developers and security professionals need to embed policy libraries within the function itself. We are excited by next-generation application security startups that can address heterogeneous environments spanning VMs, containers, and functions.
Velocity is one of the best vendor-agnostic, DevOps-focused conferences. This year, speakers and attendees highlighted reliability/resiliency, observability, and distributed system security. We look forward to speaking with startups that solve challenges across these categories.