This is not the end, but the beginning.

Bot-to-bot communication models for Slack

Bot-to-bot communication is quickly becoming something I spend a lot of time thinking about. I’ve heard tell that this kind of contemplation will be the end of us all, but in truth, although I enjoy dabbling in a bit of amateur eschatology myself from time to time, I’m optimistic about our bot-powered future.

In particular, I like to think about what it means for bots to talk to each other in the first place, how bot communication opens the door to composing bots into larger units and workflows, and how those compositions can make our (working) lives better.

Why should we care about bot-to-bot communication? (Aside from dark fascinations with the bot apocalypse?) When bots can talk to each other, they can coordinate their actions (for ill, or, yes, for good—let’s stay focused on the good). And bots that coordinate can be composed into even more interesting entities. Imagine—a swarm of bots working together to make your working life simpler, more pleasant, and more productive. I find this idea fascinating.

So, what does bot-to-bot communication look like?

(Before we begin, I’d like to thank Mike Brevoort of BeepBoop for planting this seed in my head. Mike has been talking for some time now about some of the concepts I introduce towards the end. I don’t claim these ideas as my own—I just want to start a wider discussion of the possibilities!)

Let’s consider some models of communication, looking at a particular moment in time in which we have one (or more!) bots _speaking_ and one (or more!) bots listening. There are a number of ways in which we can describe the speaker-listener relationship. Because we are looking at just a single instant, let’s set aside the fact that real bot communication is often a two-way affair; this simplifying assumption lets us get some basics sorted out first.

One : one

These bots are talking. Nay, conspiring.

One obvious model of bot communication is one in which the speakers and the listeners know about each other. The speaker will directly mention or address the listener, and the listener will be listening specifically for the speaker. This model has some distinct advantages worth considering.

First, because both bots need to know each other’s identity, there is a built-in, if modest, guarantee about how the messages will be structured and what they will say. For example, you can be reasonably certain that a GitHub bot will always have something to say about GitHub. Likewise, when a bot knows its audience, it can be reasonably sure that the audience will understand what is being said. There is no need to standardize on a particular message structure or presentation (more on this later).

Second, this model of communication (it is, e.g., the basis of TCP/IP) is ubiquitous and well understood. You can open up a raw TCP socket between the bots, for instance (but why would you bother with that?), or have a conversation in a private chat channel. (Which, please note, Slack does not generally allow, but let us set this aside. Maybe we are using IRC, I dunno.)

Of course, on this model, because identities matter, each bot needs to know in advance who it will be talking to. Messages are targeted, and the listener only listens to or processes messages from bots it is configured to watch. This introduces a very tight coupling between bots and reduces the overall flexibility of what is possible: support for each additional bot to converse with must be added explicitly.
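To make that coupling concrete, here is a minimal Python sketch of the one-to-one model. The `Message` type, the `DirectListener` class, and the bot names are hypothetical, invented for illustration; they are not part of any Slack API. The listener hard-codes the speakers it trusts, and every message names exactly one recipient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    sender: str     # who is speaking
    recipient: str  # who the message is addressed to (one : one)
    text: str


class DirectListener:
    """A hypothetical bot that only processes messages addressed to it by speakers it watches."""

    def __init__(self, name: str, watched_senders: set[str]):
        self.name = name
        # Every new conversation partner has to be added to this set by hand.
        self.watched_senders = watched_senders

    def handle(self, msg: Message) -> Optional[str]:
        # Ignore anything not addressed to us, or sent by a bot we were not configured to watch.
        if msg.recipient != self.name or msg.sender not in self.watched_senders:
            return None
        return f"{self.name} got {msg.text!r} from {msg.sender}"


deploybot = DirectListener("deploybot", watched_senders={"githubbot"})
print(deploybot.handle(Message("githubbot", "deploybot", "new commit on master")))
# An equally useful message from a bot we never configured is silently dropped.
print(deploybot.handle(Message("gitlabbot", "deploybot", "new commit on master")))
```

Supporting `gitlabbot` means editing `deploybot`'s configuration, which is exactly the explicit, per-bot work described above.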

This seems bad. Let’s see if we can decouple the bots a bit, and what that gets us when we do.

One : many

This bot is gathering a following. Watch out!

Another interesting model is one in which the listeners know who they want to listen to, and the speakers don’t care who is listening. You might recognize this as the model for broadcast television: The speaker in this analogy is a television station, and the listener is a television set. The station does not need to know anything about the sets tuned in to it; it treats them all the same. (Let’s ignore the fact that television is a unidirectional medium for the moment.)

This removes a huge source of coupling, as the speakers no longer need to know who to address. They address everyone, and the onus of understanding is placed squarely on the listeners.

However, some coupling remains, because listeners still need to know which bots to listen to. That leaves us with a problem of discovery (who should I listen to?) as well as a coupling that is still tight.
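Here is a minimal Python sketch of the one-to-many model, assuming a hypothetical in-memory `Broadcaster` (not a Slack API). Speakers publish under their own name without knowing who is tuned in, but each listener must still subscribe to a specific speaker by name, which is the coupling and discovery problem just described.

```python
from typing import Callable

# A listener's handler receives the speaker's name and the text it broadcast.
Handler = Callable[[str, str], None]


class Broadcaster:
    """Speakers broadcast to whoever is tuned in; listeners tune in to named speakers."""

    def __init__(self) -> None:
        # Maps a speaker's name to the handlers of listeners tuned in to that speaker.
        self._subscribers: dict[str, list[Handler]] = {}

    def tune_in(self, speaker: str, handler: Handler) -> None:
        # The listener must already know which speaker it wants: the remaining coupling.
        self._subscribers.setdefault(speaker, []).append(handler)

    def broadcast(self, speaker: str, text: str) -> None:
        # The speaker never learns who, if anyone, received the message.
        for handler in self._subscribers.get(speaker, []):
            handler(speaker, text)


received: list[str] = []
station = Broadcaster()
station.tune_in("githubbot", lambda speaker, text: received.append(f"ci heard {speaker}: {text}"))
station.tune_in("githubbot", lambda speaker, text: received.append(f"audit heard {speaker}: {text}"))

station.broadcast("githubbot", "new commit on master")  # reaches both listeners
station.broadcast("gitlabbot", "new commit on main")    # nobody knew to tune in, so nobody hears it
print(received)
```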

Hrm.

Many : many

Confuse the hoo-man!

One model that has been neglected in the context of the web is one in which anyone can talk, and everyone listens. I’ve heard this model characterized as "pub-sub", but a better analogy is really a chat room (see what I did there?). On this model, as with the one-to-many model, the speakers don’t care who is listening. But, moreover, the listeners don’t care who is speaking.

In systems where conversations are conducted on a strictly one-to-one basis (or in a raw TCP/IP environment), this model is unwieldy at best, as every bot has to maintain a connection to every other bot, creating connection overload. The number of connections grows quadratically with the number of bots (n bots need n(n-1)/2 pairwise connections), so even modest-sized networks will be unstable in practice and will needlessly consume system resources.
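A quick worked comparison of that growth: with dedicated one-to-one links, n bots need n(n-1)/2 connections, whereas if they all join one shared channel each bot needs only its single link to the channel (ignoring how the channel itself is implemented). The function names below are just illustrative.

```python
def pairwise_connections(n: int) -> int:
    """Links needed if every pair of n bots talks over its own one-to-one connection."""
    return n * (n - 1) // 2


def channel_connections(n: int) -> int:
    """Links needed if all n bots instead join one shared channel (one link each)."""
    return n


for n in (5, 10, 100):
    print(n, pairwise_connections(n), channel_connections(n))
# 5 10 5
# 10 45 10
# 100 4950 100
```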

Of course, as you can imagine, there is a better way. One of Slack’s distinct advantages in this arena is the very notion of a channel, where—as I hinted before—anyone can speak, and everyone can listen. (You were wondering when I would plug Slack, weren’t you?) If we think of a Slack channel instead as a kind of event bus, this model becomes a whole lot more tenable.

But what does this model give us? Because it doesn’t matter who is speaking, the message itself becomes what is important. Now any bot can offer a message for other bots to digest: for example, any bot can report a GitHub event, not just the GitHub bot. This opens the door to bot composability, the ability to chain arbitrary bots together so that the output from one becomes the input to the next. You just throw the bots you want to compose together into a channel, and watch the sparks fly.

Composability also creates a more open bot ecosystem, as any part of a bot composition can be swapped out for a new one that works better or offers new features. And because we don’t need to know anything about who is speaking, experimentation becomes easier too. For example, you (a mere hoo-man!) can step in and interpose yourself into a bot conversation to influence the flow for testing and experimentation purposes.
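Here is a minimal in-memory sketch of a channel used as an event bus. It assumes nothing about Slack's actual API: the `Channel` class and the `tagger` and `announcer` bots are hypothetical. Every post goes to every bot in the channel, each bot decides from the message content alone whether to act, and anything a bot says goes back onto the bus, which is what lets bots chain.

```python
from collections import deque
from typing import Callable, Optional

# A bot is just a function from a message it hears to an optional reply.
Bot = Callable[[dict], Optional[dict]]


class Channel:
    """Anyone can speak, everyone listens: every post is delivered to every bot."""

    def __init__(self) -> None:
        self.bots: list[Bot] = []
        self.history: list[dict] = []  # the full, observable record of the conversation

    def invite(self, bot: Bot) -> None:
        self.bots.append(bot)

    def post(self, message: dict, max_messages: int = 100) -> None:
        queue = deque([message])
        while queue and len(self.history) < max_messages:  # guard against endless loops
            current = queue.popleft()
            self.history.append(current)
            for bot in self.bots:
                reply = bot(current)
                if reply is not None:
                    queue.append(reply)  # replies go back on the bus for everyone


def tagger(message: dict) -> Optional[dict]:
    # Cares only about what was said, never about who said it.
    if message.get("event") == "git commit":
        return {"event": "needs review", "repository": message["repository"]}
    return None


def announcer(message: dict) -> Optional[dict]:
    # Consumes the tagger's output: the output of one bot is the input to the next.
    if message.get("event") == "needs review":
        return {"event": "text", "text": f"Please review {message['repository']}"}
    return None


channel = Channel()
channel.invite(tagger)
channel.invite(announcer)
# The speaker could be a GitHub bot, some other bot, or a human; the listeners can't tell.
channel.post({"event": "git commit", "repository": "Statlerbot"})
for message in channel.history:
    print(message)
```

Replacing `announcer` with a different bot that also listens for `needs review` events would require no change to `tagger`, and a human could post a `git commit` message by hand to test the chain.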

A real-world, if farcical, example

Oh, the botmanity.

I built a pair of bots designed to heckle people in #random: Statlerbot and Waldorfbot. Drop them into a channel and they will engage in a dialog, making fun of what they hear. Go ahead, install them, I’ll wait here.

These bots are not very smart at all, but they exhibit an emergent behavior that is fascinating to watch. In short, their dialogs have all been deconstructed into call-and-response pairs, like this:

Statlerbot.hears("Boooo!") {
    reply("That was the worst thing I've ever heard!");
}
Statlerbot.hears("It was terrible!") {
    reply("Horrendous!");
}

&

Waldorfbot.hears("That was the worst thing I've ever heard!") {
    reply("It was terrible!");
}
Waldorfbot.hears("Horrendous!") {
    reply("Well it wasn't that bad.");
}

And so on.

The two bots are only aware of each other’s existence when they are invited into a channel, and even then only to prompt you to install or invite the other one. So, in practice, they totally don’t even know the other exists. The conversation itself is a simple matter of waiting to see if something comes across the event bus, and then putting a new event on the bus in response. There is no conversational state, and no coupling between the two bots at all.
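The call-and-response pairs can be simulated in a few lines of Python. This is a hypothetical, in-memory stand-in for a Slack channel, not the actual bots' code: each bot is just a lookup table from a line it hears to the line it says back, and neither bot holds any conversational state or any reference to the other. Note that the lines must match exactly, apostrophes included, for a response to fire.

```python
from typing import Optional

# Each bot is nothing but a table of call-and-response pairs.
STATLERBOT = {
    "Boooo!": "That was the worst thing I've ever heard!",
    "It was terrible!": "Horrendous!",
}
WALDORFBOT = {
    "That was the worst thing I've ever heard!": "It was terrible!",
    "Horrendous!": "Well it wasn't that bad.",
}


def hears(pairs: dict[str, str], message: str) -> Optional[str]:
    """Reply only if this exact line is one the bot knows how to answer."""
    return pairs.get(message)


def run_channel(opening_line: str, bots: dict[str, dict[str, str]], max_turns: int = 10) -> list[str]:
    transcript = [f"someone: {opening_line}"]
    last_line = opening_line
    for _ in range(max_turns):
        # Every bot sees the latest line on the bus; the first one with a response speaks.
        for name, pairs in bots.items():
            reply = hears(pairs, last_line)
            if reply is not None:
                transcript.append(f"{name}: {reply}")
                last_line = reply
                break
        else:
            break  # nobody had anything to say, so the conversation ends
    return transcript


for line in run_channel("Boooo!", {"Statlerbot": STATLERBOT, "Waldorfbot": WALDORFBOT}):
    print(line)
```

Starting from "Boooo!", the two tables bounce lines back and forth until Waldorfbot's "Well it wasn't that bad.", which neither bot has a response for.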

At this point the one real disadvantage of the event bus model should be glaringly clear: Although bots don’t need to know about each other, they do need to know something about what could be said. This model of communication works only if we have public protocols for what messages look like, so that speakers can produce messages that will actually be consumed, and listeners can effectively identify messages of interest. But I think this disadvantage is outweighed by what we get. Observe…
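One way to picture such a public protocol is a shared table of event types and the fields each must carry, which every bot checks before acting. The sketch below is hypothetical: the event names and fields are borrowed from the JSON messages in the next section, and the `parse_event` helper is invented for illustration. Anything that doesn't conform, including ordinary human chatter, is simply ignored.

```python
import json
from typing import Optional

# A hypothetical shared protocol: each event type and the fields it must carry.
PROTOCOL = {
    "git commit": {"site", "repository", "branch"},
    "ci report": {"repository", "branch", "result"},
}


def parse_event(text: str) -> Optional[dict]:
    """Return the message as a dict if it follows the shared protocol, otherwise None."""
    try:
        message = json.loads(text)
    except json.JSONDecodeError:
        return None  # plain chat (e.g. a heckle), not a structured event
    if not isinstance(message, dict):
        return None
    event = message.get("event")
    if not isinstance(event, str) or event not in PROTOCOL:
        return None  # not an event type this protocol knows about
    if not PROTOCOL[event].issubset(message.keys()):
        return None  # a required field is missing
    return message


print(parse_event('{"event": "ci report", "repository": "Statlerbot", "branch": "master", "result": "pass"}'))
print(parse_event("Boooo!"))                                    # None: not JSON at all
print(parse_event('{"event": "ci report", "result": "pass"}'))  # None: missing fields
```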

A real-world, if imaginary, example

Imagine this (admittedly very engineering heavy) scenario: We have a software product that is backed by a GitHub repository, and we want to automate deploying that product to production. We put three bots in a channel, one watching a GitHub repository, one from our continuous integration service, and one that represents our production servers.

The GitHub bot observes that one of our programmers has pushed a new commit to master in our GitHub repository, and reports this event into the channel:

{
  "event": "git commit",
  "site": "GitHub",
  "repository": "Statlerbot",
  "branch": "master"
}

Our continuous integration (CI) bot, as it happens, is watching for messages that encode git commit events. When it hears a message containing this event, it kicks off a new integration run. On this occasion the run succeeds, and the CI bot posts the following into the same channel:

{
  "event": "ci report",
  "repository": "Statlerbot",
  "branch": "master",
  "result": "pass"
}

And, as you may have guessed, our production bot is watching for ci report events that have passed. It takes the initiative to push the new commit out to our production servers and posts a new message into the channel, and we can call it a day.
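The whole pipeline fits the event-bus pattern. Here is a hypothetical, in-memory Python simulation, not a real integration with GitHub, a CI service, or Slack: the bot functions, the `run_tests` stub (which always passes), and the `deploy` message format are assumptions made for illustration. Each bot inspects only message content, using the same fields as the JSON events above.

```python
import json
from typing import Callable, Optional

Bot = Callable[[dict], Optional[dict]]


def run_tests(repository: str, branch: str) -> bool:
    # Hypothetical stub: a real CI bot would start a build here and wait for the result.
    return True


def ci_bot(message: dict) -> Optional[dict]:
    # Reacts to any git commit event, no matter which bot (or human) reported it.
    if message.get("event") != "git commit":
        return None
    passed = run_tests(message["repository"], message["branch"])
    return {"event": "ci report", "repository": message["repository"],
            "branch": message["branch"], "result": "pass" if passed else "fail"}


def production_bot(message: dict) -> Optional[dict]:
    # Deploys only when a CI report for master has passed.
    if (message.get("event") == "ci report"
            and message.get("result") == "pass"
            and message.get("branch") == "master"):
        return {"event": "deploy", "repository": message["repository"], "status": "done"}
    return None


def run_channel(first_post: dict, bots: list[Bot]) -> list[str]:
    """Deliver every post to every bot; replies go back on the bus. Returns the history as JSON."""
    history, pending = [], [first_post]
    while pending:
        message = pending.pop(0)
        history.append(json.dumps(message))
        pending.extend(reply for reply in (bot(message) for bot in bots) if reply is not None)
    return history


# The GitHub bot's report of a push to master kicks everything off.
commit = {"event": "git commit", "site": "GitHub", "repository": "Statlerbot", "branch": "master"}
for line in run_channel(commit, [ci_bot, production_bot]):
    print(line)
```

Printing the history shows the git commit, the ci report, and the hypothetical deploy message, in that order: the channel's history doubles as the deployment log.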

We can make this scenario as baroque as we like: We can imagine the CI bot blocking pull requests whose builds fail, linters and smoke-testers getting involved…there are so many ways to compose this kind of workflow, and Slack channels are a great way to create those compositions.

Best of all, because this all happens in Slack, you can see the entire history of deployments and passing integration tests and so forth. The entire causal structure of your deployment workflow is open to observation and examination.

This is only the beginning

Now, consider this: Let’s forget structured messages. As bots get better at understanding humans, they also get better at understanding each other, to the point that we can totally set aside the public protocols we decided we needed above.

Once bots can talk to humans, and bots can talk to each other in human languages, we begin to form a very robust network of communications that doesn’t hinge on other bots doing the exact right thing. So long as intents can be clearly identified and extracted, we can compose bots without any care about how they ought to fit together.

This is the exciting world I’m aiming for: A world in which bots and humans interact together to get work done. 👷🏾🤖👩🏿


So, I’m pretty fired up about bot-to-bot communications. Maybe you are too. Let’s make our bot-infused future a great one!