Monitoring for developers

Shane Harter
Published in Crafting Cronitor
4 min read · Feb 19, 2017

Monitoring and QA are the same thing. You’d never think so until you try doing a big SOA. But when your service says “oh yes, I’m fine”, it may well be the case that the only thing still functioning in the server is the little component that knows how to say “I’m fine, roger roger, over and out” in a cheery droid voice. — Steve Yegge

Ten years ago, still young in my career building software things, I got my first taste of DevOps. We didn’t call it that back then but for the first time I had to worry about how my software would be delivered to the world.

It was an awakening. A new professional class of tooling was being built inside companies like Google and Amazon, but for most of us things were still primitive: usually you paid somebody to host servers that you drove to the data center in the back of your Subaru. Then you set up everything else, which is quite a bit when you really need to be on top of things. The process where I worked, and sadly I think many can relate, was simply to clone everything from the most recent working system, manually updating where necessary. Automation, where I found it, was often a bundle of Perl scripts or .bat files that came with their own folklore. I learned that ops was just another legacy system that had to be maintained by people who hadn’t built it.

As a developer, I was changed. I started thinking about and building ops configuration into my (Subversion) repos. Have we built a daemon? The init conf should be auto-generated. A website? Dev and prod Apache conf should be committed with the code that relies on it. I still remember where I was when I first read about Puppet 😍.
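The auto-generated init conf mentioned above can be sketched in a few lines: a template committed alongside the code, rendered into a service definition at build time. This is a minimal illustration of the idea, not any particular tool; the unit name, paths, and template fields are assumptions for the example.

```python
# Render a service definition from a template kept in version control,
# so ops config ships with the code that depends on it.
# The template shape and field names here are illustrative assumptions.
from string import Template

UNIT_TEMPLATE = Template("""\
[Unit]
Description=$name daemon

[Service]
ExecStart=$exec_start
Restart=always
""")

def render_unit(name, exec_start):
    """Generate an init conf from values committed in the repo."""
    return UNIT_TEMPLATE.substitute(name=name, exec_start=exec_start)

if __name__ == "__main__":
    print(render_unit("reportd", "/usr/local/bin/reportd --workers 4"))
```

Because the template lives next to the daemon's source, a change to how the service starts is reviewed and deployed like any other code change.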

We developers have kept ourselves fed for the last five years in part by breaking big apps into littler apps and calling them microservices. I enjoy slaying a monolith as much as anybody. In the second decade of my career it helps soothe the guilt of having built so many to begin with. But too often microservices are bespoke, one-off mini-monoliths. This is not the way to engineer good outcomes. To avoid developer malpractice in 2017, the table stakes for microservices include:

  • Conformity of build process. If your services cannot be developed, tested, staged, and deployed in a uniform way you will pay with decreased velocity and more mistakes. This is the assembly line part of your software product. It should be boring, consistent and reliable.
  • Conformity of craftsmanship. In a service-oriented architecture you will often need to make changes in multiple code bases to ship a new feature. It’s an anti-pattern for teams or developers to seal-up services to themselves. The only way that people can move from one service to the next is consistency of craftsmanship and expectations. If you don’t have a runbook, linter, or tests that run with a single command you’re doing it wrong.
  • Aggregation of errors and logs. Use an error aggregation tool like Sentry. Combine logs into Elastic Search or Splunk. Give yourself operational visibility into your distributed system.
  • Uniform visibility into the health of your services. The smaller a microservice is, the easier it is to overlook when the system is down and you’ve temporarily lost a dozen IQ points. In the chaos you might just forget about something like the single sign-on redirect service. It happens. Distributed systems are simply more complicated, and you need better monitoring.
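The last point above — uniform visibility, so even the smallest service can't be forgotten — can be sketched as a registry that tracks when each service last reported in and flags anything that has gone quiet. Everything here (the class, the service names, the staleness window) is an illustrative assumption, not a specific tool:

```python
# Minimal sketch of uniform service-health visibility: every service,
# however small, heartbeats into one registry, so nothing (like a
# single sign-on redirect service) is overlooked during an outage.
import time

class HealthRegistry:
    def __init__(self, stale_after=60):
        self.stale_after = stale_after   # seconds of silence before alarm
        self.last_seen = {}              # service name -> last heartbeat

    def heartbeat(self, service, now=None):
        """Record that a service checked in (now= is injectable for tests)."""
        self.last_seen[service] = now if now is not None else time.time()

    def unhealthy(self, now=None):
        """Return every registered service that has gone quiet."""
        now = now if now is not None else time.time()
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.stale_after)
```

The point of the sketch is the uniformity: one place to ask "what is silent right now?", regardless of how small the silent thing is.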

In the era of the monolith, these tasks didn’t really need to scale. You configured Nagios once and tweaked it every now and then when something important changed. Today, the only sensible way to do this work is in code. Tools like Chef, Puppet, Terraform, Ansible and database migrations are the result. Things aren’t as primitive anymore.

In 2014 I started working nights and weekends with a friend to launch Cronitor, a service for monitoring cron jobs and scheduled tasks. A lot of great companies use Cronitor and we’ve had fun building it. Last year, on the hop from San Francisco to Portland, we found ourselves talking about what to do next. Cronitor is built with microservices, and while discussing our own pain points we were startled by a question: why can’t we just keep monitoring configuration directly within our code?

On the return flight a few days later we drafted a plan to build web service health checks and SDKs for popular frameworks that would enable developers to configure monitoring directly from application code, with updates published automatically when you deploy your app.

Introducing: Auto Health Checks

In the fall we launched the first part, web application health checks with assertions like response body contains "SUCCESS".
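To make the assertion idea concrete, here is a hedged sketch of how a rule like response body contains "SUCCESS" might be evaluated against a fetched response. The rule format and function names are assumptions for illustration, not Cronitor’s actual implementation:

```python
# Evaluate simple assertions against a response body.
# Rule tuples like ("contains", "SUCCESS") are an illustrative format.

def assert_body_contains(body, needle):
    """Pass when the response body contains the expected substring."""
    return needle in body

def run_check(body, assertions):
    """Evaluate each (kind, expected) rule; return the failing rules."""
    checks = {"contains": assert_body_contains}
    return [(kind, expected) for kind, expected in assertions
            if not checks[kind](body, expected)]
```

A check passes when `run_check` returns no failures; a non-empty list is what would trigger an alert.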

This week, we are launching our first SDK for Django. With django_auto_healthchecks, you attach monitoring directly to routes in urls.py and when your app is restarted, health checks are collected in a batch and published. Cronitor runs them immediately for post-release validation and as frequently as every 30 seconds thereafter. SDKs for Express, Laravel and Rails are coming soon.
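The collect-then-publish pattern described above can be sketched generically. To be clear, every name below is a hypothetical stand-in, not django_auto_healthchecks' actual API; the sketch only illustrates the idea of declaring checks next to routes and publishing them as one batch at startup:

```python
# Hypothetical sketch of declaring health checks beside routes and
# publishing the collected batch when the app starts. Not the real
# SDK API -- all names here are illustrative assumptions.
_registry = []

def healthcheck(route, assertion):
    """Declare a check alongside the route it protects."""
    _registry.append({"route": route, "assertion": assertion})
    return route

def publish_batch(publisher):
    """On app start, push every collected check in a single call."""
    return publisher(list(_registry))

# Checks are declared where the routes are declared, so they are
# version controlled and deployed with the code they monitor.
urlpatterns = [
    healthcheck("/login/", 'response body contains "Sign in"'),
    healthcheck("/api/status/", 'response body contains "SUCCESS"'),
]
```

Because the checks ride along with the route definitions, a deploy that adds or changes a route updates its monitoring in the same commit.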

Our goal with Auto Health Checks is simple: a version controlled single source of truth that defines the health of your app, with instant alerts when you need them. Check out django_auto_healthchecks on GitHub and sign up free at https://cronitor.io to get started.
