In October we cohosted a special Test in Production session with Honeycomb.io. Since our monthly session fell one night before Halloween, we decided to invite speakers to share their scariest stories of operational outages and other horrors. Though stories of epic failure can be hard to hear, they are invaluable learning experiences.

Paul Biggar, CTO and Co-founder of Dark, former CEO and Co-founder of CircleCI, tells a horrific tale of how a massive security breach nearly ended their company. As a result of Adobe getting hacked, Mongo HQ was compromised, and as a customer, CircleCI was also affected. …


Reliability of a service is a combination of availability and correctness. At LaunchDarkly, reliability is our number one priority. Over 40 billion feature flags are evaluated on the LaunchDarkly platform every single day, and our customers count on us to deliver. Learn how we did it.

When evaluating a service like LaunchDarkly, you may have questions about how that service will behave under worst-case scenarios. At LaunchDarkly, we have thought a great deal about failure and worst-case possibilities. Over the last four years, we’ve worked hard to create a service that’s incredibly robust against different kinds of failures. …


At our July Test in Production Meetup we focused on testing systems and processes at scale. Girish Patangay, Manager and Engineer at Facebook, kicked off the evening with a talk about how Facebook manages different versions of its platform in the real world.

“This talk is about how not all Facebook looks the same. Even in this room, all you folks have different versions of our software, mainly because of Gatekeeper. So if you think about this, each person just has a slightly different version and the way we use Gatekeeper, even today, is almost everywhere through our entire stack.”


In June we focused our Test in Production Meetup around chaos engineering. Nora Jones, Senior Software Engineer at Netflix, kicked off the evening with a talk about how Netflix tests in production.

“Chaos engineering…is the discipline of experimenting on production to find vulnerabilities in the system before they render it unusable for your customers. We do this at Netflix through a tool that we call ChAP…[It] can catch vulnerabilities, and allows users to inject failures into services and prod that validate their assumptions about those services before they become full-blown outages.”

Watch her talk (or read the transcript) below to…


At our March Meetup Isaac Mosquera, CTO at Armory.io, joined us to talk about canary deployments. He shared lessons learned, best practices for doing good canaries, and how to choose the right metrics for success/failure.

“The reason it’s pretty hard is because it’s a pretty stressful situation when you’re deploying into production and you’re looking at your dashboard, just like Homer is here. And he’s trying to figure out what to do next. Does this metric look similar to this metric? Should I roll back? Should I continue moving forward? Is this okay? …


Guest post by Isaac Mosquera, Armory CTO and Co-Founder.

My co-founder DROdio likes to say, “Business runs on relationships, and relationships run on feelings.” It’s easy to misjudge the unseen force of human emotions when changing the culture of your organization. Culture is the operating system of the company; it’s how people behave when leadership isn’t in the room.

The last time I was part of driving a cultural change at a large organization it took over three years to accomplish with many employees lost — both voluntary and involuntary. It’s hard work and not for the faint of heart…


Companies exploring feature management are looking for control over their releases. A common theme is using feature flags to roll out a feature to a small percentage of users, or to quickly roll a feature back if it is not working properly. These feature flaggers also seek to control a feature further by limiting its visibility to individual users or a group of users. In this piece, we’ll explore how LaunchDarkly lets you control your releases in these ways.
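
The three controls described above (a percentage rollout, a quick rollback, and targeting individual users) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LaunchDarkly's SDK or its actual bucketing algorithm: the flag dictionary layout, the `bucket` and `is_enabled` helpers, and the flag and user keys are all hypothetical.

```python
import hashlib


def bucket(flag_key: str, user_key: str) -> float:
    """Map a (flag, user) pair to a stable value in [0, 100).

    Hashing keeps the assignment deterministic, so a given user stays
    in or out of the rollout across requests. (Hypothetical helper.)
    """
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000 * 100


def is_enabled(flag: dict, user_key: str) -> bool:
    """Evaluate a hypothetical flag definition for one user."""
    # Kill switch: turning the flag off rolls the feature back for everyone.
    if not flag.get("on", False):
        return False
    # Individual targeting: listed users always see the feature.
    if user_key in flag.get("targets", ()):
        return True
    # Percentage rollout: users whose bucket falls below the threshold.
    return bucket(flag["key"], user_key) < flag.get("rollout_percent", 0)


# Hypothetical flag: on, one targeted QA user, 10% of everyone else.
flag = {
    "key": "new-checkout",
    "on": True,
    "targets": {"qa-user-1"},
    "rollout_percent": 10,
}

print(is_enabled(flag, "qa-user-1"))  # targeted user, so True
```

Raising `rollout_percent` widens the audience without reshuffling users who already have the feature, because each user's bucket value is fixed by the hash. Setting `"on"` to `False` acts as the rollback.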

When launching a new feature, or simply updating an existing feature, you may not want everyone to receive the same experience. When working with a new front-end marketing…


Last week I attended the QCon London conference from Monday to Wednesday. It was a thoroughly interesting and informative three days. The sessions I heard ranged from microservice architectures to Chaos Engineering, and from how to retain the best people to new features of the Windows console. But there was one talk that really stood out to me — it took a hard look at whether we are Software Developers or Software Engineers.

“QCon London is a conference for senior software engineers and architects on the patterns, practices, and use cases leveraged by the world’s most innovative software shops.”

QCon…


In February, Steven Czerwinski, Head of Engineering at Scalyr, spoke at our Test in Production Meetup. The session focused on monitoring and observability while testing in production, and Steve showed why he considers monitoring an important element of that process. If you’re interested in joining us at a future Meetup, you can sign up here.

Steve presented a case study of latency issues a Scalyr customer recently faced. He shared how his colleague, John Hart, explored the issue, and then reviewed some key lessons learned after the event.

“Monitoring is so important to testing in production. I want…


In February, we invited New Relic Developer Advocate, Clay Smith, to our Test in Production Meetup to talk about instrumenting CI pipelines. If you’re interested in joining us at a future meetup, you can sign up here.

Clay took a look at the three-pillar approach to monitoring: metrics, tracing, and logging. He wanted to explore what tracing looks like within a CI pipeline, so he observed a single run of a multi-step build kicked off by a code commit.

“I wanted to try and apply some of this stuff to understanding AWS CodePipeline that I…

LaunchDarkly

Empowering all teams to deliver and control their software.
