Launching Our Public Beta

Cliff
Published in The Opsee Blog
Mar 30, 2016


It’s been a little over a year since we started Opsee. We knew that the changing landscape of code, the adoption of the cloud and containerization, would precipitate the need for better monitoring tools. Not just because the infrastructure is different, but because these changes are putting infrastructure in the hands of developers. All the while, our colleagues at other monitoring companies have been busy ingesting more data, building ever more baroque dashboards, and generally adding complexity in their never-ending quest to build a “single pane of glass”. We, however, suspected that a different approach was called for. So we started talking to potential customers, lots of them. As we talked to developers and operations teams, a few themes quickly became clear:

  1. Most production incidents are fairly routine. Teams develop technical debt that often manifests as bad behavior in production: deadlocks, memory leaks, and other more mysterious lockups. When the survival of the company depends on shipping new features, this debt gets relegated to being managed through the on-call shift.
  2. Existing tools weren’t designed for developers. Monitoring has traditionally been built by and for operations people, with a focus on collecting more data, novel visualization techniques, and building bigger dashboards. Fine goals if you’re building a product for someone who’s paid to look at those graphs, but it’s a disorienting nightmare for a developer who’s getting notified of an issue.
  3. Existing tools are too hard to install and maintain. Many larger businesses address this with full-time ops or devops teams, but smaller companies often do not have this luxury. Remarkably, we found that even in large companies there are teams that cannot get the time of day from internal IT Ops.

Monitoring for on-call developers should be simpler

Since then we’ve been building the Opsee product and making the tradeoffs and decisions that we think best serve our target user, the on-call developer. One of the earliest tradeoffs we made was to make Opsee AWS-only. It narrows our applicability, certainly, but it also allows us to automate things other products can’t: give us a pair of AWS keys and we can get your entire environment monitored automatically.

Just add our instance to cover your environment

Another early call we made was to abandon the traditional monitoring requirement of needing a software agent colocated in every system that a customer runs. Instead we decided that our software could run isolated on its own EC2 instance, shielding the customer from the risks associated with a more invasive agent.

And we also decided to focus on health checks as the means of monitoring services. Metrics are great, but they are too noisy a data source to alert on. If we want alerts to be clear and actionable, their cause cannot be vague; it needs to be a very clear-cut “is this thing working or not?”

Health is more than status codes

Of course, in a real production environment developers care about more than just whether or not a service is responding. Opsee’s assertions capability allows developers to pull out arbitrary data from a health check response and ensure that it conforms to their expectations. And of course, an alert is useless if it doesn’t help you on the path to fixing a problem. That’s why Opsee lets you restart impacted instances directly from the alert.
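To make the assertions idea concrete, here is a minimal Python sketch of what checking extracted data from a health check response might look like. It is not Opsee's actual implementation or API: the names `CheckResponse`, `Assertion`, and `failed_assertions`, and the `queue_depth` field, are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class CheckResponse:
    """Hypothetical stand-in for the result of one health check request."""
    status: int
    body: str


@dataclass
class Assertion:
    """Hypothetical assertion: pull a value out of the response, then test it."""
    description: str
    extract: Callable[[CheckResponse], Any]
    expect: Callable[[Any], bool]


def failed_assertions(response: CheckResponse,
                      assertions: List[Assertion]) -> List[str]:
    """Return the descriptions of every assertion the response does not satisfy."""
    failures = []
    for assertion in assertions:
        try:
            value = assertion.extract(response)
            ok = assertion.expect(value)
        except (KeyError, TypeError, ValueError):
            # Missing fields or an unparseable body count as a failed assertion.
            ok = False
        if not ok:
            failures.append(assertion.description)
    return failures


# Example assertions: one on the status code, one on data inside the body.
assertions = [
    Assertion("status is 200",
              lambda r: r.status,
              lambda v: v == 200),
    Assertion("queue depth under 100",
              lambda r: json.loads(r.body)["queue_depth"],
              lambda v: v < 100),
]

healthy = CheckResponse(200, '{"queue_depth": 12}')
backed_up = CheckResponse(200, '{"queue_depth": 950}')

print(failed_assertions(healthy, assertions))
print(failed_assertions(backed_up, assertions))
```

The second response still returns a 200, yet it fails the queue depth assertion. That is the sense in which health is more than status codes: the service answers, but it is not behaving the way the developer expects.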

We aren’t anywhere near done yet, but the team has done an amazing job bringing such a technically challenging product to fruition thus far. It may seem like a cliche, but there’s a lot of truth to the statement, “It’s really complex to make something simple.” And we feel like the product does enough right now to meet the needs of a whole bunch of developers out there. That’s why we’re taking the cover off and declaring the launch of our public beta.

For the duration of the public beta, Opsee will be free of charge. We’ll be adding tons of good stuff to the product as well, including monitoring for RDS, ECS, and ElastiCache. We’ll also be adding support for multiple Opsee instances and multiple logins per account.

So if you’re ready to stop fighting overcomplicated and antiquated monitoring systems, I invite you to Get Opsee and give us a try. Feel free to leave us your feedback here in the comments, or on Twitter.
