
ChatOps Part 1: An Introduction to ChatOps
If you work in a DevOps or infrastructure role, or are even remotely active in the tech blog community, you have probably run across the hot topic of Chat Operations (ChatOps). Whether you’ve actually happened upon it in the wild, or would know that you had if you did, is another story entirely.
So to start — quite simply, what is ChatOps? Who coined it? What is it good for?
The term ChatOps is believed to have been coined at GitHub, and ChatOps tools generally fall into a few basic categories.
1. Vendor-powered notification systems such as VictorOps, PagerDuty and Datadog
These tools have a broad range of uses: from orchestrating on-call rotations in organizations and escalating software-level bugs and incident reports, to connecting to Application Performance Management (APM) solutions and letting engineering teams know when there is server- or application-level performance degradation.
2. Tools that integrate directly into chat platforms such as Slack, Discord and the soon-to-be-defunct HipChat
This is where we are going to spend most of our time in this article, and since our company uses Slack, we are going to be referencing it.
Many of us know these integrations as bots, or, as they’re known in the Slack community, Apps. Whichever name you prefer, they typically provide discrete operations scoped to a specific integration.
Take, for example, the Google Drive integration, which, you guessed it, lets users of Slack interact directly with items in their Drive from the comfort of their chat window.
At EquityZen we aim to address daily workflows in much the same way as the other integrations we take for granted in Slack every day.
So now that we have a better idea of what ChatOps is, the reality is that it can be whatever you want it to be. With that in mind, at EquityZen we evaluated ChatOps as a way to address the following issues we faced daily.
1. Lowering Engineering Churn
In our traditional software development workflow, engineers spent a non-trivial amount of time moving code around (promoting code between environments, checking out code per PR, etc.). For local environments, engineers deployed manually from their machines, leaving no audit trail. If something went wrong, engineers had to reach out to infrastructure staff or debug it themselves. The same was true for production deployments, which required significant infrastructure knowledge. Leveraging ChatOps to promote code largely removed this knowledge requirement from our engineering team.
2. Context Switching
This one goes hand in hand with the previous point, but stresses the value of a developer focusing exclusively on creating and iterating on bug-free code. Introducing tasks into their workflow that do not contribute to that goal is essentially wasted time. Removing these kinds of time sinks can mean big performance gains, especially at a company in a rapid growth phase, where every new engineer multiplies the time saved.
3. Enabling Non-technical Teams
By its very nature, the product being deployed at any given time is the responsibility of our hard-working Product team. Only out of necessity were engineers involved in deciding when and how to do these deploys. As an organization we wanted to enable Product-centric deploys with minimal to no engineering intervention. We knew that to do this confidently we would have to instill faith, not only in the Product team but in the company as a whole, that the software we were making was functional and bug-free.
In two follow-up blog posts we will go more in depth on how we went about enabling our Engineering and Product teams specifically. But for now, let’s dig into some technical bits.
Firstly, the most important of all technical decisions: what to name the bot? Well, with our abundance of creativity, being EquityZen and all, we went with… wait for it… Zenbot.
Here at EquityZen our core product is built on a Python / Django stack, but, as with all of our other DevOps-centric tools, we chose to build Zenbot in Go. If you’ve never heard of Go, or have and want to know more, head over to golang.org and read up on it. It’s a really cool, highly performant language that is a blast to write code in. To quote directly from the Go documentation:
Go is expressive, concise, clean, and efficient. Its concurrency mechanisms make it easy to write programs that get the most out of multicore and networked machines, while its novel type system enables flexible and modular program construction. Go compiles quickly to machine code yet has the convenience of garbage collection and the power of run-time reflection. It’s a fast, statically typed, compiled language that feels like a dynamically typed, interpreted language.
Ok, so we’ve chosen the name, and the language. What’s next?
We wanted:
1. Bot Deployed on Kubernetes
Kubernetes is an open-source container orchestration system used by some of the most well-known applications and websites for deployment, scaling and management. As an organization we recently transitioned all of our deployed sites and microservices to Kubernetes, and it has added a level of stability the company had not previously been accustomed to. This could fill another entire blog post, and probably someday will, but to be concise: Kubernetes has given us container-based services, resilient deployments and on-demand autoscaling, meaning we no longer worry about scaling up for new traffic. This has had a tangible impact on our business teams: it has empowered them to launch new high-profile deals whenever they want.
2. High Availability
This goes hand in hand with the point above. We wanted a bot that was highly available, redundant and scalable, and Kubernetes gave us exactly that.
3. Key-based API Authentication
We wanted a simple way to create API keys which we could distribute to our various microservices so they could authenticate and talk to Zenbot. As our use case was simple, we took a simple approach.
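To make the approach concrete, here is a minimal sketch of startup code that mints 32-character API keys. The `APIKey` struct, the service names and the `generateAPIKey` helper are illustrative stand-ins, not our production code; in the real service the keys are persisted with Gorm inside a transaction and the service list comes from Viper-managed configuration, as noted in the comments.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// APIKey is an illustrative model; in the real service this is a Gorm
// model inserted inside a transaction at startup.
type APIKey struct {
	Service string // name of the microservice the key belongs to
	Key     string // unique 32-character key passed as a request header
}

// generateAPIKey returns a random 32-character hex string
// (16 random bytes, hex-encoded).
func generateAPIKey() string {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // no entropy available; unrecoverable at startup
	}
	return hex.EncodeToString(buf)
}

func main() {
	// In the real startup code the service names come from Viper-managed
	// configuration and each key is persisted via Gorm.
	for _, svc := range []string{"keds", "reporting"} {
		k := APIKey{Service: svc, Key: generateAPIKey()}
		fmt.Printf("%s -> %s\n", k.Service, k.Key)
	}
}
```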
The startup code above, where we used [Gorm](https://gorm.io/) for transactional database operations and Viper for external configuration, allowed us to create unique 32-character API keys for any number of microservices or other services interacting with Zenbot. To authenticate, all a microservice has to do is pass its unique API key as a header in API calls to Zenbot.
As Zenbot matures we will consider moving to a Role-Based Access Control (RBAC) model using Casbin, as we have with our other Go-based microservices.
4. Accept incoming commands from Slack
Being a Slack bot and all, Zenbot had to provide a means by which Slack could interact with it. For this we used Slack’s Real Time Messaging API through a library called Slacker. Real Time Messaging (RTM) uses WebSockets to let an application receive events from Slack in real time and reply as a bot user.
First we connect to Slack through RTM. Again we’re using Viper to pull sensitive information from settings. You’ll also see a reference to Logrus, a wonderful structured logging library we use throughout our Go applications.
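A minimal sketch of that connection code, assuming the v1-era Slacker API (`slacker.NewClient`, `Listen`); exact signatures vary between Slacker versions, and the config key name `slack.token` is illustrative:

```go
package main

import (
	"context"

	"github.com/shomali11/slacker"
	log "github.com/sirupsen/logrus"
	"github.com/spf13/viper"
)

func main() {
	// Pull the bot token from Viper-managed settings so the secret
	// never lives in source control.
	viper.SetConfigName("settings")
	viper.AddConfigPath(".")
	if err := viper.ReadInConfig(); err != nil {
		log.WithError(err).Fatal("could not read settings")
	}

	// Slacker wraps Slack's RTM API: it opens a websocket and
	// delivers incoming events to registered command handlers.
	bot := slacker.NewClient(viper.GetString("slack.token"))

	log.WithField("bot", "zenbot").Info("connecting to Slack over RTM")

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if err := bot.Listen(ctx); err != nil {
		log.WithError(err).Fatal("slack connection failed")
	}
}
```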
Once connected to Slack through RTM we need to be able to receive commands and parse them.
For this we’ve created a struct to encapsulate the entire anatomy of a command. You can see the fields commented inline below.
The structs are then populated and added to command lists. Below you can see a command that lists deployments. The struct includes an auth function, which restricts commands to users in specific Slack groups (here “Engineering” and “Product” only), and also defines the function which executes the command logic.
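The snippet below sketches both the struct and a populated “list deployments” command. The field names, the `memberOf` helper and the stubbed handler are illustrative rather than our production code:

```go
package main

import "fmt"

// Command encapsulates the anatomy of a single Zenbot command.
type Command struct {
	Usage         string   // what the user types, e.g. "list deployments"
	Description   string   // shown in help output
	AllowedGroups []string // Slack groups permitted to run the command
	// Auth gates the command: it must return true before Handler runs.
	Auth func(user string, groups map[string][]string) bool
	// Handler executes the command logic and returns the reply text.
	Handler func(args []string) string
}

// memberOf reports whether user belongs to any of the allowed Slack groups.
func memberOf(user string, allowed []string, groups map[string][]string) bool {
	for _, g := range allowed {
		for _, u := range groups[g] {
			if u == user {
				return true
			}
		}
	}
	return false
}

// listDeployments is restricted to the "Engineering" and "Product" groups.
var listDeployments = Command{
	Usage:         "list deployments",
	Description:   "Show current deployments (via the KEDS microservice)",
	AllowedGroups: []string{"Engineering", "Product"},
	Auth: func(user string, groups map[string][]string) bool {
		return memberOf(user, []string{"Engineering", "Product"}, groups)
	},
	Handler: func(args []string) string {
		// The real handler calls KEDS; stubbed here for illustration.
		return "staging: release-42\nproduction: release-41"
	},
}

func main() {
	groups := map[string][]string{"Engineering": {"alice"}, "Product": {"bob"}}
	if listDeployments.Auth("alice", groups) {
		fmt.Println(listDeployments.Handler(nil))
	}
}
```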
We loop through the command structs we have populated, and assign them as actions the bot can execute.
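Here is that registration loop in miniature. The `Bot` type stands in for the Slacker client, and `Register`/`Dispatch` are hypothetical helpers that mimic assigning each populated command struct as an action the bot can execute:

```go
package main

import "fmt"

// Command is a trimmed-down version of the command struct (illustrative).
type Command struct {
	Usage   string
	Handler func(args []string) string
}

// Bot stands in for the Slacker client.
type Bot struct {
	actions map[string]func(args []string) string
}

func NewBot() *Bot {
	return &Bot{actions: map[string]func([]string) string{}}
}

// Register assigns a command struct as an action the bot can execute.
func (b *Bot) Register(c Command) {
	b.actions[c.Usage] = c.Handler
}

// Dispatch looks up and runs the action for an incoming command string.
func (b *Bot) Dispatch(usage string, args []string) string {
	h, ok := b.actions[usage]
	if !ok {
		return "unknown command"
	}
	return h(args)
}

func main() {
	commands := []Command{
		{Usage: "list deployments", Handler: func([]string) string { return "staging: release-42" }},
		{Usage: "excuse", Handler: func([]string) string { return "It works on my machine" }},
	}
	bot := NewBot()
	// Loop through the populated command structs and register each one.
	for _, c := range commands {
		bot.Register(c)
	}
	fmt.Println(bot.Dispatch("excuse", nil))
}
```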
Finally, here is an example implementation of the command to list deployments. The comments in this code reference KEDS, a microservice we run to manage deployment pipelines. In this case KEDS returns information about current deployments, and Zenbot relays them to users. In further blog posts we will go into detail about how our engineers use this in their daily workflow.
5. Pin specific Slack commands to particular groups within Slack (i.e. the Engineering team can run one set of commands, the Product team another, and business users yet another)
As you saw in the code snippet above, we have the ability to pin functionality to specific groups within Slack. This allows us to have subsets of functionality that only certain teams can access.
6. Connect to other integrations and microservices
As you’ve seen above, Zenbot has the ability to reach out to other integrations and microservices, such as our deployment-pipeline microservice, KEDS. And because every project needs a fun little easter egg, we also added the ability to reach out to http://programmingexcuses.com/ and have Zenbot deliver an excuse on demand.

Some of the libraries we used
- Gorm — Absolutely fantastic ORM library
- Gin — Super powerful HTTP web framework
- Viper — Indispensable configuration library
- nlopes/slack — Slack library
- Logrus — Essential logging library
- Slacker — Slack RTM Library
That brings to a close the first in a series of three posts on ChatOps. Please stay tuned for the next two, which will cover how we empowered our Engineering and Product teams with Zenbot. Thanks for reading!
If you’re interested in joining our team check out our open engineering positions here.
