Building Heroku ChatOps for Slack

The development and design decisions behind Heroku’s new Slack app

Teams have been using chatbots to manage repetitive tasks for decades, from Eggdrop bots on IRC to Hubot on Campfire, Slack, and HipChat. At Heroku, we wanted a way to release and manage our applications from a conversational interface, so we began developing an internal ChatOps tool.

Heroku ChatOps takes operational processes that usually happen behind the scenes, on a single engineer’s laptop, and brings them to the forefront in a collaborative interface. Teams share the same tooling, which makes the workflow transparent. When Heroku migrated to Slack last year, we saw an opportunity to share this product with Heroku customers.

With ChatOps, we’ve streamlined the tasks of releasing your pipeline-based applications, viewing their latest releases, and receiving CI notifications, all within Slack.

Pipeline notifications and deployment from Slack

As the engineers who worked on this app, we feel the Slack platform gave us opportunities we could never have pursued without a modern messaging system. In this article, we’d like to give you a peek into how we extended the Heroku platform by creating a Slack application. We’ll cover why we chose Slack’s slash commands, give an architectural overview of the application, and walk through some of the security considerations and error handling we dealt with while building it.

Slash commands

When a team installs Heroku ChatOps, we provide a set of slash commands to help manage your Heroku applications from inside Slack. Unlike a Slack bot, which listens to every message in a channel hoping to be helpful, slash commands extend CLI-like functionality to your Slack channels. We found this approach compelling because our users don’t need to grant us full access to their Slack messages. From an operational perspective, we wanted to work with as little customer data as possible and focus on providing users with declarative tools to use on demand.

We configured an endpoint in the Slack UI that points to our Heroku ChatOps Rails application. When someone types a Heroku ChatOps command, like /h promote my-pipeline, Slack POSTs the submitted command, along with additional metadata, to a callback URL in our application. Requests to this callback URL are authenticated with a secret shared between Slack and our ChatOps application. The secret confirms that the user, command text, and team came from Slack and not from a malicious third party.
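The post doesn't say whether ChatOps checks Slack's older verification token or the signing secret that Slack's documentation now describes. The following is a sketch of one way to check a shared secret using the signing scheme. It computes Slack's v0 signature, which is an HMAC-SHA256 of `v0:<timestamp>:<raw body>`, hex-encoded and prefixed with `v0=`. It then rejects requests older than five minutes, compares signatures in constant time, and decodes the form-encoded body. `SlackRequest` and its method names are hypothetical.

```ruby
require "openssl"
require "uri"

# Hypothetical helper for Slack's v0 request signing. The signing secret is the
# value shared between Slack and the app; the timestamp and signature arrive as
# the X-Slack-Request-Timestamp and X-Slack-Signature headers.
module SlackRequest
  MAX_SKEW_SECONDS = 5 * 60 # refuse stale requests to blunt replay attacks

  # "v0=" + hex HMAC-SHA256 over "v0:<timestamp>:<raw body>"
  def self.signature(signing_secret, timestamp, raw_body)
    base = "v0:#{timestamp}:#{raw_body}"
    "v0=" + OpenSSL::HMAC.hexdigest("SHA256", signing_secret, base)
  end

  def self.verified?(signing_secret, timestamp, raw_body, given_signature, now: Time.now.to_i)
    return false if (now - timestamp.to_i).abs > MAX_SKEW_SECONDS
    expected = signature(signing_secret, timestamp, raw_body)
    secure_compare(expected, given_signature.to_s)
  end

  # Constant-time comparison, so response timing doesn't reveal matching prefixes.
  def self.secure_compare(a, b)
    return false unless a.bytesize == b.bytesize
    a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end

  # Slash commands arrive form-encoded, e.g. command=%2Fh&text=promote+my-pipeline
  def self.parse(raw_body)
    URI.decode_www_form(raw_body).to_h
  end
end
```

In a Rails controller, the raw body would come from `request.raw_post`. Slack sends the slash command itself (`/h`) as `command` and everything after it (`promote my-pipeline`) as `text`.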

Multi-OAuth

Accountability has become more and more important to Heroku with our recent Heroku Shield offering. Since we use Heroku ChatOps internally to deploy many of our own apps, it’s important to know who introduces changes into the system in order to meet our own compliance requirements. We need to attribute any API calls we make to the initiating user instead of creating all-powerful superusers within Heroku and GitHub.

Instead of building our own access control inside the ChatOps Slack app to manage user capabilities, we prompt people in Slack to authenticate directly with both GitHub and Heroku. We store the resulting tokens and later act on that user’s behalf, so every action is linked back to them. These tokens are encrypted at rest with RbNaCl and refreshed every six hours.
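The post doesn't show its token storage code. The block below is a hedged sketch of encrypting a token at rest. It swaps in AES-256-GCM from Ruby's standard OpenSSL bindings in place of RbNaCl, so it runs without libsodium. The `TokenBox` class is hypothetical, and so is the choice to bind the user ID as associated data. Each stored blob carries its own random nonce and authentication tag. Decryption fails if the blob is tampered with or attached to a different user.

```ruby
require "openssl"

# Hypothetical TokenBox: a stdlib stand-in for RbNaCl, using AES-256-GCM.
class TokenBox
  IV_BYTES  = 12 # GCM's standard nonce size
  TAG_BYTES = 16 # GCM authentication tag size

  def initialize(key)
    raise ArgumentError, "key must be 32 bytes" unless key.bytesize == 32
    @key = key
  end

  # Returns Base64(iv || tag || ciphertext). The user ID is authenticated but not
  # encrypted, so a token copied onto another user's row will not decrypt.
  def encrypt(token, user_id:)
    cipher = OpenSSL::Cipher.new("aes-256-gcm")
    cipher.encrypt
    cipher.key = @key
    iv = cipher.random_iv # fresh random nonce for every encryption
    cipher.auth_data = user_id.to_s
    ciphertext = cipher.update(token) + cipher.final
    [iv + cipher.auth_tag + ciphertext].pack("m0") # strict Base64
  end

  # Raises OpenSSL::Cipher::CipherError if the blob or user ID doesn't check out.
  def decrypt(blob, user_id:)
    raw = blob.unpack1("m0")
    iv = raw.byteslice(0, IV_BYTES)
    tag = raw.byteslice(IV_BYTES, TAG_BYTES)
    ciphertext = raw.byteslice(IV_BYTES + TAG_BYTES, raw.bytesize)

    decipher = OpenSSL::Cipher.new("aes-256-gcm")
    decipher.decrypt
    decipher.key = @key
    decipher.iv = iv
    decipher.auth_tag = tag
    decipher.auth_data = user_id.to_s
    plain = decipher.update(ciphertext) + decipher.final
    plain.force_encoding(Encoding::UTF_8)
  end
end
```

With RbNaCl itself, `RbNaCl::SimpleBox` plays a similar role and manages nonces for you. The key would come from configuration rather than being generated per process.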

A user’s access to Heroku Pipelines and GitHub repos is managed through the Heroku Dashboard and GitHub, respectively. This way, if a user leaves their company and their access to Heroku is revoked, that change is immediately reflected in Heroku ChatOps – and that person will no longer be able to deploy their company’s application from Slack.

Command processing flow

When the Heroku ChatOps application receives the POST from Slack, we store the command in our database and kick off a Sidekiq job to execute it.

Slack gives us only 3 seconds to acknowledge their initial POST, so we send back an HTTP 200 OK response immediately to let Slack know that we received their message. This acknowledgement isn’t to be confused with the response we send to Slack, which is displayed to the end user — that will come later.

The job we enqueued earlier is then tasked with parsing, and ultimately executing, the user’s command.
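Here is a minimal sketch of the acknowledge-then-work split, built from plain Ruby stand-ins. A Hash plays the commands table and a `Queue` plays Sidekiq. `CommandInbox`, `receive`, and `work_off` are hypothetical names. The request path only stores and enqueues, so it can answer well inside Slack's 3-second window. Parsing and API calls happen later, in the worker.

```ruby
# Hypothetical stand-ins: a Hash for the commands table, a Queue for Sidekiq.
Command = Struct.new(:id, :team_id, :user_id, :text, :response_url, :status)

class CommandInbox
  attr_reader :commands

  def initialize
    @commands = {}
    @jobs = Queue.new
    @next_id = 0
  end

  # Web request path: persist, enqueue, acknowledge. No parsing, no API calls.
  def receive(params)
    @next_id += 1
    cmd = Command.new(@next_id, params["team_id"], params["user_id"],
                      params["text"], params["response_url"], :queued)
    @commands[cmd.id] = cmd
    @jobs << cmd.id  # the job gets only the ID and reloads the record
    [200, {}, [""]]  # Rack-style empty 200: "we got it"
  end

  # Background path: runs after Slack already has its 200.
  def work_off
    until @jobs.empty?
      cmd = @commands.fetch(@jobs.pop)
      yield cmd # parse and execute; the user-facing reply goes to response_url
      cmd.status = :done
    end
  end
end
```

In the Rails app, `receive` would roughly correspond to the controller action and `work_off` to a Sidekiq job's `perform` method.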

Command processing architecture

Pipeline notifications

ChatOps users can route custom events to specific Slack channels with the /h route command, which helps teams set up a default Slack channel for pipeline-related events. We focused on the events most common in our own day-to-day workflow: GitHub’s pull request and commit status events (like passing CI tests).

An example of a pipeline notification in Slack

We were able to build on existing functionality at Heroku, which already receives relevant GitHub webhooks for Heroku users’ pipelines. We receive messages over a Kafka-powered stream that we can use to match up your events and route them to the correct channel based on the pipeline.

At installation, we request a token with the chat:write:bot scope, which lets us post to your Slack channels without any initiation from a team member.
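The following is a rough sketch of the routing step, under stated assumptions. Events arrive as simple structs, because the stream's real schema isn't described here. `/h route` is modeled as a pipeline-to-channel mapping. Only `pull_request` and `status` events are forwarded; these are GitHub's webhook names for pull request and commit status events. `NotificationRouter`, `Event`, and `request_for` are hypothetical. The result is an unsent request to Slack's `chat.postMessage` Web API method, authorized with the bot token obtained at installation.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical event shape; the real stream's schema isn't shown in the post.
Event = Struct.new(:pipeline_id, :type, :summary)

ROUTED_TYPES = %w[pull_request status].freeze # GitHub webhook event names

class NotificationRouter
  def initialize(bot_token)
    @bot_token = bot_token
    @routes = {} # pipeline_id => Slack channel ID, as set by `/h route`
  end

  def route(pipeline_id, channel_id)
    @routes[pipeline_id] = channel_id
  end

  # Returns a chat.postMessage request, or nil if the event isn't routed anywhere.
  def request_for(event)
    return nil unless ROUTED_TYPES.include?(event.type)
    channel = @routes[event.pipeline_id]
    return nil unless channel

    req = Net::HTTP::Post.new(URI("https://slack.com/api/chat.postMessage"))
    req["Authorization"] = "Bearer #{@bot_token}"
    req["Content-Type"] = "application/json; charset=utf-8"
    req.body = JSON.generate({ channel: channel, text: event.summary })
    req
  end
end
```

Sending the request is one more call, for example `Net::HTTP.start(req.uri.host, req.uri.port, use_ssl: true) { |http| http.request(req) }`.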

Pipeline notification architecture

Must love regex

How do we parse the command a user submits? It’s tricky when users can type /h followed by any text.

Heroku ChatOps uses some fairly complex regular expressions to distinguish between different kinds of user input; our most intricate ones are over 100 characters long. We keep these under control in a few different ways.

One way is the single responsibility principle: we treat each command individually. First, we separate out the command itself (e.g., promote) and route it to its own parser class. That parser’s sole job is to split the rest of the command into individual pieces.
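The routing step can be sketched as a lookup table from command word to parser class. `CommandRouter`, `PromoteParser`, and `RouteParser` are hypothetical names, and the parser bodies are placeholders. The point is that each parser only ever sees its own command. A trailing `!` (the force flag the promote pattern accepts) is stripped before the lookup, and the full text is still handed to the parser.

```ruby
# Hypothetical parsers: placeholders that would each run their own regex.
class PromoteParser
  def self.parse(text)
    { parser: :promote, text: text }
  end
end

class RouteParser
  def self.parse(text)
    { parser: :route, text: text }
  end
end

class CommandRouter
  PARSERS = {
    "promote" => PromoteParser,
    "route"   => RouteParser
  }.freeze

  UnknownCommand = Struct.new(:name)

  # "promote! my-pipeline from staging" -> PromoteParser gets the whole string.
  def self.dispatch(text)
    word = text.strip.split(/\s+/, 2).first.to_s
    parser = PARSERS[word.delete_suffix("!")]
    parser ? parser.parse(text.strip) : UnknownCommand.new(word)
  end
end
```

Because each parser is a small class with one entry point, its unit tests can feed it strings directly without going through Slack.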

Breaking it down this way lets us easily write unit tests to ensure different kinds of inputs do what the user expects, like /h promote my-pipeline, /h promote my-pipeline from staging to production, and even /h promote this is an invalid command.

Instead of one giant regex string, we build our regex as a list of strings that are joined together at the end. We extract portions of the regex into helper methods with descriptive names and describe each piece with a comment. Ruby’s Regexp class lets us capture pieces of the input.

```ruby
# In double-quoted Ruby strings, "\s" is a literal space, so the regex
# whitespace class is written "\\s".
pattern = [
  "(promote)",                  # task
  "(!)?\\s+",                   # forced?
  valid_pipeline,               # pipeline name
  "\\s*",                       # optional space
  "(?:from\\s*#{valid_slug})?", # optional stage
  "(?:",
  "\\s*(?:to|in|on)\\s+",       # to
  "#{valid_slug}?",             # optional downstream
  ")?"
]
matcher = Regexp.new(pattern.join(""))
```
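To run the pattern end to end, it needs definitions for `valid_pipeline` and `valid_slug`, which the post doesn't show. The version below assumes each is a single capture group of lowercase letters, digits, and hyphens. It also adds `\A` and `\z` anchors, which are our assumption, so that leftover words make the match fail. `PROMOTE`, `PromoteCommand`, and `parse_promote` are hypothetical names. The tests exercise the three example inputs mentioned earlier.

```ruby
# Hypothetical helpers: each returns one capture group for a Heroku-style name.
def valid_pipeline
  "([a-z0-9][a-z0-9-]*)"
end

def valid_slug
  "([a-z0-9][a-z0-9-]*)"
end

PROMOTE = Regexp.new([
  "\\A(promote)",               # task, anchored at the start (assumption)
  "(!)?\\s+",                   # forced?
  valid_pipeline,               # pipeline name
  "\\s*",                       # optional space
  "(?:from\\s*#{valid_slug})?", # optional stage
  "(?:",
  "\\s*(?:to|in|on)\\s+",       # to
  "#{valid_slug}?",             # optional downstream
  ")?",
  "\\s*\\z"                     # nothing else may follow (assumption)
].join(""))

PromoteCommand = Struct.new(:forced, :pipeline, :from_stage, :to_stage)

# Returns a PromoteCommand, or nil when the text isn't a valid promote command.
def parse_promote(text)
  m = PROMOTE.match(text.strip)
  return nil unless m
  PromoteCommand.new(!m[2].nil?, m[3], m[4], m[5])
end
```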

But the best user input is one that doesn’t involve a complicated regex at all. Slack’s interactive messages let you generate buttons and drop-downs, so your users can start an interaction with a slash command and be presented with a fixed set of options in return. While Heroku ChatOps is still very CLI-like, we’re exploring more ways to incorporate interactive messages into the app flow to improve the user experience.
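As a hedged illustration, the code below builds a reply that offers a pipeline's downstream stages as buttons, so the user doesn't have to type them. It uses Slack's Block Kit format (section and actions blocks), which has since superseded the attachment-based layout that interactive messages originally used. `promote_picker`, the `action_id` scheme, and the JSON stored in each button's `value` are our assumptions. When a user clicks a button, Slack POSTs the action, including its value, to the interactivity Request URL configured for the app.

```ruby
require "json"

# Hypothetical: offer each downstream stage of a pipeline as a button.
def promote_picker(pipeline, stages)
  {
    response_type: "ephemeral",                  # only the invoking user sees it
    text: "Promote #{pipeline} to which stage?", # fallback text for notifications
    blocks: [
      { type: "section", text: { type: "mrkdwn", text: "Promote *#{pipeline}* to:" } },
      {
        type: "actions",
        elements: stages.map do |stage|
          {
            type: "button",
            text: { type: "plain_text", text: stage },
            action_id: "promote_#{stage}",
            # Echoed back to the app on click, so no text needs parsing.
            value: JSON.generate({ pipeline: pipeline, stage: stage })
          }
        end
      }
    ]
  }
end
```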

Now that we’ve parsed the command and made any relevant API calls, we build up a formatted response to send back to the user in Slack. We then post back to a response URL that Slack provided with their initial POST to our application.
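Replying through the response URL needs no bot token, because the URL itself identifies the conversation. The following is a minimal sketch under that assumption. `reply_request` and `deliver` are hypothetical helpers, and `response_type` chooses between an ephemeral reply and one visible to the whole channel. Per Slack's documentation, a response URL can be used up to five times within 30 minutes.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical: build the delayed reply for a slash command.
def reply_request(response_url, text, visible_to_channel: false)
  req = Net::HTTP::Post.new(URI(response_url))
  req["Content-Type"] = "application/json"
  req.body = JSON.generate({
    response_type: visible_to_channel ? "in_channel" : "ephemeral",
    text: text
  })
  req
end

# Sends the request built above (not exercised here).
def deliver(req)
  uri = req.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(req)
  end
end
```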

To cut down on the noise these notifications create, we post them in Slack threads. We do our best to group similar messages together, such as deploy, release, and restart messages.
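One way to group messages, sketched under assumptions, is to remember the `ts` of the first message posted for each (channel, app, kind) group. Later messages in that group then pass it as `thread_ts`, which is how Slack's `chat.postMessage` places a message in a thread. `ThreadTracker` and its grouping key are hypothetical. The poster block stands in for a real `chat.postMessage` call, whose response includes the new message's `ts`.

```ruby
# Hypothetical: remembers the parent message ts for each (channel, app, kind) group.
class ThreadTracker
  def initialize(&poster)
    @poster = poster # callable(payload) -> ts of the posted message
    @parents = {}
  end

  def post(channel:, app:, kind:, text:)
    key = [channel, app, kind]
    payload = { channel: channel, text: text }
    parent = @parents[key]
    payload[:thread_ts] = parent if parent # reply inside the existing thread
    ts = @poster.call(payload)
    @parents[key] ||= ts # the first message in a group becomes the parent
    payload
  end
end
```

A production version would likely also expire groups, so that tomorrow's deploy starts a fresh thread instead of replying to today's.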

Using threads to contain related notifications

“We’re sorry, something went wrong!”

As with most production software, things go wrong. When they do, we use Sentry, an open source error-tracking tool that helps us visualize and triage our errors.

When Heroku ChatOps encounters a runtime exception, we rescue it and send it to our on-premises installation of Sentry. We attach the team, user, and command ID to help us debug later. This approach helps us quickly identify why we’re receiving errors, and whether they’re widespread or localized to a specific team’s setup. We return a custom-formatted error to the user so they have some feedback about why things did not go as planned.
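Here is a minimal sketch of the rescue, report, and reply pattern. `run_command` is hypothetical. `reporter` stands in for a Sentry client call, and `reply` stands in for posting to the command's response URL. The context hash mirrors the team, user, and command ID described above.

```ruby
# Hypothetical wrapper around one command's execution.
# reporter: callable(exception, context) standing in for the Sentry client
# reply:    callable(message) standing in for posting to the response URL
def run_command(command, reporter:, reply:)
  yield command
rescue StandardError => e
  reporter.call(e, {
    team_id: command[:team_id],
    user_id: command[:user_id],
    command_id: command[:id]
  })
  reply.call("We're sorry, something went wrong! (command #{command[:id]})")
  nil
end
```

A real worker would re-raise the errors it wants Sidekiq to retry instead of swallowing them here.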

We also make use of Sidekiq’s built-in retries. Sometimes we’ll see 503 and 504 responses from the various APIs we hit. These requests are retried with exponential backoff, and the failures are logged to Sentry so we can keep an eye on how widespread an API problem might be.
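The following is a hedged sketch of the retry side. The worker turns 503 and 504 responses into a dedicated exception and lets it escape, which is what triggers Sidekiq's built-in retry. `UpstreamUnavailable`, `check_status!`, and `backoff_delay` are hypothetical names. `backoff_delay` is a generic capped exponential schedule included only for illustration, because Sidekiq computes its own retry delays.

```ruby
# Hypothetical: an error the worker lets bubble up so Sidekiq schedules a retry.
class UpstreamUnavailable < StandardError
  attr_reader :status

  def initialize(status)
    @status = status
    super("upstream API returned HTTP #{status}")
  end
end

TRANSIENT_STATUSES = [503, 504].freeze

# Raise for transient gateway errors; return the status otherwise.
def check_status!(status)
  raise UpstreamUnavailable.new(status) if TRANSIENT_STATUSES.include?(status)
  status
end

# Illustrative capped exponential backoff: 15s, 30s, 60s, ... up to one hour.
def backoff_delay(attempt, base: 15, cap: 3_600)
  [base * 2**attempt, cap].min
end
```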

The future is bright

The ChatOps Slack app was a welcome change from the tooling we’d used at Heroku before. Suddenly, we had a simple, mobile-ready CLI that works everywhere, with very little work on our side.

We were able to build our application with a familiar Rails/PostgreSQL/Sidekiq stack. And we’re in a much better place to test, verify, and maintain Heroku ChatOps than we would have been if we had tried to build for a range of platforms.

There’s also an opportunity for us to create more sophisticated workflows later, because everything is built around Heroku Pipelines.

If you’d like to check out Heroku ChatOps, visit the Heroku Dev Center to learn more and install the app.


This blog post was co-written by the Heroku Tools Team: Corey Donohoe, Reid McFarland, Stella Cotton, Thomas Balthazar, & Yannick Schutz