On-Site Friendly

The infinite space between words

The story of 2 tech centers

The first distributed team I had to deal with was at AWS: a team split between 2 junior people in Seattle and 4 people in Dublin. The two offices were 8 time zones apart.

The way we faced the challenge was to give each side components of a system to develop. But because there were more people (and more seniority) on the Dublin side, they were supposed to review the code for Seattle, so Seattle spent most of its time waiting.

Having 2 teams, 2 tech centers, and 2 views, we started spending a lot of time in meetings (in the 2-hour overlap that we were able to stretch out) to concentrate all the discussions and try to reach consensus as a single team.

Effectively, our initial decision led to B-class citizenship for the Seattle engineers. This was unfair to them, and they ended up refusing to accept it. At some point they just started approving each other’s code requests so they could move faster. Eventually, they simply left the team.

Conway’s law declares that

organizations which design systems… are constrained to produce designs which are copies of the communication structures of these organizations.

Our communication pattern was pretty much nonexistent, thus our organization disappeared.

Doing it right

My second taste of a distributed team was at GitLab.

GitLab is a 100% remote company. It had 50 people when I joined, and it has over 400 nowadays.

At the beginning I had a massive culture shock. In the past, I had days full of meetings, and I was expecting more of the same. However, I found nothing of the sort: suddenly I had a lot of time to do actual work. Also, I was being forced to think and write things down (in the open) in the issue tracker. Almost any meaningful comment on Slack was followed by “hey, maybe we should move the conversation to an issue”.

I struggled to understand how to work for a while. I found myself with a lot of free time and not a lot of explicit direction. Talking in Slack was not the way to go, and there weren’t any meetings either. How were these people staying on top of things? Then I met my onboarding buddy over a Zoom call, who explained a new way of working and, with it, the power of uninterrupted, focused work.

The key for me was that the expectation was completely different. Companies that follow the SCRUM way™ inadvertently strive to keep state consistent across all the actors. All the people must be on the same page all the time: everybody must share the state every morning so we all move as a single block. Any failure in sharing the state will halt the team, blocking it from moving on. When this happens often enough, people start losing patience and working alone, not caring about whatever else happens around them. Can you blame them?

GitLab (and the remote culture) has a radically different approach. There you work trying to solve a somewhat defined business problem, in short iterations, delivering something, and then adjusting.

Theoretically, SCRUM (not going to say agile, as that’s a different discussion) tries to follow this model: short iterations (2 weeks by default) then adjusting. As much as it sounds similar, there is one critical difference: SCRUM ceremonies, which can be seen in their communication models.

The difference is both philosophical and practical. The office model tries to keep strong consistency. The (GitLab) distributed system strives for weak consistency, freedom, then adjustment, in short loops.

SCRUM ceremonies (daily stand ups, sprint plannings, retrospectives) are rigorously held synchronous events. They require everyone to be in the same place (physical or virtual) at the same time to share the state.

This model works great when everyone is in the same place at the same time and can share a high-bandwidth communication channel. But if you look a little closer you will see that, besides the retrospective, no meeting leaves a trace. There is no persistent record of the event that you can read or follow up on later; the only state shared in the daily meeting is the current state and the planned future state.

Any distributed system, like a group of people working together, will always end up bound by one set of rules: the CAP theorem.

The CAP theorem states that it’s impossible for any distributed data store to provide more than two out of the following three guarantees:

  • Consistent reads: every read receives the most recent write (or an error).
  • Availability: every request receives a non-error response (without the guarantee of it containing the most recent write).
  • Partition Tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

We can compare a storage system, whose job is to produce some result, with a group of people working as a unit, which is also trying to produce some result. This way we can apply the CAP theorem to this human system.

Network partitions are part of life — we can’t not have them. Thus we must design our systems to either be consistent or available.

Classic offices are Consistent and Partition-Tolerant. GitLab is an Available and Partition-Tolerant working implementation.
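To make the analogy a bit more tangible, here is a minimal toy sketch of the two choices. It is not a real storage system: the `Replica` class, its `mode` values, and the "plan v1" state are all made up for illustration. A "CP" replica refuses to answer during a partition, while an "AP" replica answers with whatever state it last synced.

```python
class Replica:
    """One node (or teammate) holding a local copy of the shared state."""

    def __init__(self, mode: str, value: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = value  # last state this replica managed to sync

    def read(self, partitioned: bool) -> str:
        # CP: refuse to answer rather than risk returning stale state.
        if partitioned and self.mode == "CP":
            raise RuntimeError("unavailable until the partition heals")
        # AP (or no partition): answer with whatever we have locally.
        return self.value


# Hypothetical example: the same state, read during a partition.
office = Replica("CP", "plan v1")
remote = Replica("AP", "plan v1")

print("remote:", remote.read(partitioned=True))
try:
    office.read(partitioned=True)
except RuntimeError as err:
    print("office:", err)
```

In the human version, the "CP" replica is the person who can’t move until the next status meeting, and the "AP" replica is the person who keeps working from the last written state and adjusts later.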

The GitLab product was built as a solution for GitLab’s own problem: being a 100% remote company with people around the world. Because of this specific problem, they created a set of rules that everybody must follow. You can read them all in the remote manifesto:

  1. Work from anywhere you want.
  2. Communicate asynchronously.
  3. Recognize that the future is unknown.
  4. Have face-to-face meetings online.
  5. Meetings are for bonding, blockers and future.
  6. Bond in real life.
  7. Give credit where it’s due and remember to say thank you.

Some are guarantees, some are rules that you have to follow. I will be focusing on the 3 rules that add the most value.

Communication tools

Quoting the remote manifesto

Don’t try to mimic an office. Communicate using issue mentions and chat tools. Reduce task switching and put an end to email overload. Choose the right channel of communication according to the necessity of the task you’re working on. Can it wait a few minutes, a few hours, even a few days? Don’t take someone from their work if you don’t have to.
If people are working from the same location, it is important that they do not skimp on writing things down.
Everyone should use the same tools to communicate.

As we were discussing before, to be distributed we need to distribute the state. Status meetings tend to concentrate the state in a synchronous event in which you are either present or not. This model of state sharing has a set of strong limitations.

  • The state is not persistent, it’s not recorded anywhere.
  • Only people in roughly the same timezone can be present. People from different time zones do not have access to this state — thus they are not consistent.
  • If we insist on doing it, then the meeting must be operated in a remote-friendly way.
  • With timezone difference, this means either that one of the groups will have the meeting later in the day, or that the remote team will “do it asynchronously” while the office one “does it synchronously” effectively turning remote people into B-class citizens.

There are multiple details that can destroy the everyday experience and get us to the AWS distributed work experience: the isolated group of people who can only read the state but not really contribute to it.

As the remote manifesto claims: to work as a distributed company we need to stop behaving as a colocated office simply because a colocated office is not distributed: it’s a single point of failure.

If we analyze the different communication models, we can identify traits for each of them, which gives us a better view of which model would work best:

Meetings

Synchronous events.

They require everyone to be there; whoever is not there can’t add value.

They are also volatile: when things are decided in a meeting there is no evidence of this being decided (unless there is an agenda in place, or a meeting is recorded).

Meetings that are not operated in a remote-friendly way create an unbalanced power situation in which remote people are poorly represented.

Meetings are extremely expensive too. There is no such thing as a 1 hour meeting because you will always have at least 2 people in the room, which means 2 working hours. This scales linearly as you add more people to the meeting.
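The arithmetic above can be written down as a tiny sketch. The function name `meeting_cost_hours` is made up for illustration, and it only counts the time spent in the room, not the context switching around it.

```python
def meeting_cost_hours(attendees: int, duration_hours: float) -> float:
    """Working hours consumed by one meeting: attendees times duration."""
    if attendees < 2:
        raise ValueError("a meeting needs at least 2 people")
    return attendees * duration_hours


# The cost of a "1 hour" meeting grows linearly with the number of attendees.
for attendees in (2, 4, 8):
    cost = meeting_cost_hours(attendees, 1.0)
    print(f"{attendees} people for 1 hour = {cost} working hours")
```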

On the bright side, meetings are valuable as a high-bandwidth communication tool. Ad-hoc meetings (tapping on someone’s shoulder), however, can become a real problem, and many people “work from home” expressly to prevent this form of interruption.

Google Docs

Persistent, asynchronous or synchronous depending on the usage. Private or public, depending on the usage.

Google Docs are awesome for making a meeting persistent by using them as an agenda in which people can take shared notes as the meeting develops.

Docs are also great to review a text asynchronously, as long as people use the suggestions feature.

Docs start falling short when people use the comments feature: the interface for handling comments is generally bad and does not scale beyond one or two comments. Using comments to “assign” tasks makes sense though.

Don’t expect Docs to be searchable in the long term because generally they tend to fragment a lot.

Slack

Semi-synchronous, fully asynchronous when there is the right level of expectations set.

Some people expect to get an immediate reply in Slack, to the point of getting upset if they don’t get one. This is similar to the “tapping on someone’s shoulder” interruption style, or to calling someone on the phone: it effectively asks people to stop what they are doing and context switch.

Personally, I use Slack as an asynchronous queue.

Persistent but extremely chaotic. When people speak in Slack, at any point in time they may open a thread. This can happen in any order, which makes it impossible to build a well-reasoned and consistent state out of it.

Most of the conversations in Slack happen in direct messages, meaning that most of it is private communication. At Cabify, Slack communication is split into 90% private messages, 5% private channels, and 5% public channels.

On the bright side, it can be used to have quick conversations or to share snippets.

Slack is a valuable solution, but I would keep an eye on how many back-and-forth interactions a conversation takes and, when there are too many, consider a higher-bandwidth solution (a call). I also wouldn’t count on Slack as a long-term state storage solution.

Email

Asynchronous, persistent, and ordered, as long as there are no threaded conversations taking place. When this happens it’s impossible to keep track of things.

The biggest weakness of email is that it is private. This blocks it from being a valid solution for distributed teams because new people joining a team will not be able to reach the previous state.

I use email as a private to-do list, to keep track of what I have to do as unread emails, aggressively archiving emails that have already been handled.

Issue trackers

Good issue trackers behave like the best asynchronous tool there is for having conversations with people who live far away, one that comes from the good old days of the internet: forums.

These tools are asynchronous, ordered, persistent and public. Some tools also include other tooling, such as labels to categorize issues better. A well-managed issue tracker is an extremely powerful tool that not only allows you to track work, but also enables effective asynchronous conversations.

These conversations can have different rhythms and solve different purposes. It is common for my organization to have 2 rhythms:

  1. The state of an issue, which is updated daily with what happened, which blockers are found and which things are going well.
  2. The planning of an issue or business value. This has a slower pace and is used to discuss the things that will be done in the future: it can be short-term, like your next task, or can be long-term, like strategic thinking.

This is probably the key point to consider when talking about asynchronous work. I will follow up on this later.

Roadmaps

Roadmaps must be persistent, public and asynchronous. They should have a good reason to change and should only be changed when people are fully aware.

Personally, I like to keep roadmaps in our handbook (under version control), so in order to make a change, someone has to send a merge request and explain what is going on. This allows people in the future to go back in time and read why a specific change was made, and by whom.

Roadmaps should have an even slower pace of change than an issue tracker, and they should have a linear history. If you are using an issue tracker to track your roadmap, consider dropping it in favor of plain text on top of a strict version control tool.
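The traits listed for each communication channel above can be put side by side in a small sketch. The `Channel` class and the `fits_distributed_team` check are hypothetical names, and the values are my reading of the descriptions above: `None` means "depends on usage" or "not stated", which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Channel:
    name: str
    asynchronous: Optional[bool]
    persistent: Optional[bool]
    ordered: Optional[bool]
    public: Optional[bool]


# None = depends on usage, or not stated in the text (an assumption).
CHANNELS = [
    Channel("Meetings", False, False, None, False),
    Channel("Google Docs", None, True, None, None),
    Channel("Slack", None, True, False, False),
    Channel("Email", True, True, None, False),
    Channel("Issue trackers", True, True, True, True),
]


def fits_distributed_team(channel: Channel) -> bool:
    """True only when every trait holds unconditionally."""
    traits = (
        channel.asynchronous,
        channel.persistent,
        channel.ordered,
        channel.public,
    )
    return all(trait is True for trait in traits)


suitable = [c.name for c in CHANNELS if fits_distributed_team(c)]
print(suitable)  # → ['Issue trackers']
```

The strict check is deliberate: a channel that is only public or only ordered "depending on usage" still leaves someone, somewhere, without access to the state.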

Our model

At Cabify we are little by little embracing asynchronous, distributed work.

Some teams are more advanced, some teams have more work to do to get there.

The way we are doing it follows a simple set of rules that start at the organization level and end at the individual contributor.

Mission and Vision

Every team or squad has a clear roadmap that defines the business value and the expectation of when a business value should be delivered. The roadmap is easy to find, is public, and is tracked to prevent rogue changes.

Use the right tool

Every person joining a team from anywhere in the world must be able to catch up without the need of a synchronous event (meeting), to understand the state of the roadmap at the task level, and to understand why a decision was made.

It’s fine to have a conversation over Slack, in person or over a Zoom call. But what is not written down in a persistent, public place didn’t happen.

Because of their volatile and private nature, we discourage the use of meetings to share the state of a project unless it’s absolutely necessary. We also discourage the use of Slack to do “stand ups” because of the chaotic nature of the tool.

State should be kept in issues. Anyone subscribed to an issue can easily follow its progress. Some people use GitLab RSS to “follow users” or projects so they can see how they are doing in a single place without synchronous events. Some people use email for the same reason.

Organically, we have seen that different teams created lobby repositories which are only used as an issue tracker. This has turned out to be really interesting because anytime you need to talk to a team, you can just use the team lobby. And finding the right team lobby is as easy as visiting the /product/<team> namespace, or just using the search box.

Understand the rhythm of a task

As an individual contributor, the biggest problem you can have is finding yourself waiting for someone else to do something. The way to prevent yourself from being blocked is to keep multiple conversations going at the same time.

We encourage people to work on a task and track its state in an issue.

While you work on today’s task, use strategic thinking to prepare your next piece of work by removing dependencies or making them explicit. The way to do this is to be aware of the roadmap and have a conversation with all the stakeholders, asynchronously, in a public and persistent place, until you have enough data to tackle the work autonomously.

The critical part of this way of working is applying the theory of constraints to your project. This is probably a topic by itself, but generally speaking, your project work should be structured in a chain of events which identifies dependencies upfront. Once the structure is in place, you then apply buffers in such a way that when your task can be started you don’t depend on anything else.
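As a minimal sketch of "identifying dependencies upfront", the chain of events can be modeled as a dependency graph. The task names below are invented for illustration, and `ready_tasks` is a hypothetical helper. The sketch covers only the dependency structure, not the buffers. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each one maps to the set of tasks it depends on.
dependencies = {
    "design API": set(),
    "agree on schema": set(),
    "build service": {"design API", "agree on schema"},
    "deploy": {"build service"},
}


def ready_tasks(dependencies: dict, done: set) -> list:
    """Tasks not yet done whose dependencies have all been completed."""
    return sorted(
        task
        for task, deps in dependencies.items()
        if task not in done and deps <= done
    )


# One valid order for the whole chain (raises CycleError on circular deps).
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# What can be started today without waiting for anybody?
print(ready_tasks(dependencies, done=set()))
```

Anything that `ready_tasks` does not return is blocked by something else, which makes it a good candidate for an asynchronous conversation now rather than a meeting later.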

The anti-pattern is to wait to make decisions until the last possible moment (student syndrome). This means delaying those decisions to synchronous meetings, which are all or nothing: people who weren’t able to attend don’t have a voice, and whoever is working in a different timezone can’t move forward.

Use the number of meetings required to move forward as a metric for how performant and organized a team is. If you need to have a lot of meetings, then you may need to ask for help to start planning ahead and avoid being reactive. This kind of behavior is particularly visible in silo-oriented cultures, where handoffs between silos happen.

Keep your chain of work free of dependencies by looking ahead. That way you will be able to move at a consistent pace, without any wait time.

TL;DR

  • Use the right tool for communication. Prefer public, organized and asynchronous models. Keep the status and future conversations in a public and persistent space.
  • Think tactically and strategically at the same time: work on your future issues to remove dependencies, and work on your current ones with full end-to-end ownership.
  • Build up buffers to prevent being blocked.
  • Measure the health of your team or organization through the number of meetings: the more meetings and synchronous communications required, the worse the health is and the more reactive you are.