Turn app interactions into team knowledge in Slack
An in-depth look at how Geekbot provides valuable team insights
The era of the digital workspace
Open offices, coffee nooks, meeting rooms and more. The traditional workplace as we know it is undergoing a radical transformation. Our work is now mostly digital — in a home office, on chat platforms and video calls, with documents and code stored in the cloud.
This workplace digitization highlights how communication platforms such as Slack could become the testing ground for the next generation of team collaboration tools. Project progress, team metrics, collaboration insights (collectively, team knowledge) can now be extracted algorithmically.
An advanced knowledge base combined with a smart digital assistant unlocks perks like empowered remote work, improved management, data-driven decisions and more.
Geekbot is a digital assistant that lives where teams do their best work, such as Slack. It was initially designed for asynchronous standup meetings, but it quickly became apparent that standup report data could be turned into valuable knowledge, from projects and tasks to overall team health.
Our goal is to provide a digital assistant that decentralizes teams by unlocking team knowledge. In this post, we will present the methods, algorithms and tools that we employ to build such a system.
What is our data?
Although it varies between companies, Geekbot’s primary data source is a digitized version of the daily standup. Most of the conversations you can have with Geekbot fall into two categories of data: questions asked by the app and answers provided by the user.
Our main challenge is extracting knowledge related to a team from bot-user interaction. Real natural language text is messy and comes in varied formats. A typical answer may include team-specific terms, slang, grammatical errors, and multiple sentences that pertain to different topics or internal context. Last but not least, it may contain sensitive information that should be handled with care.
Questions, on the other hand, are easier to work with. They are typically more structured than answers because they follow a pattern dictated by the Geekbot function you’re using. For example, most daily standups include a:
- Will-do question, i.e. What will you do today?
- Done question, i.e. What did you do yesterday?
- Blocked-by question, i.e. Is there something impeding your progress?
- Sentiment question, i.e. How do you feel today?
Questions are generally more concise, better written and shorter than the average answer. Lastly, they provide critical context to the answers themselves.
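As a rough sketch, such question-answer data could be modeled along these lines. The classes and field names here are our own simplification for illustration, not Geekbot's actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class QuestionType(Enum):
    WILL_DO = "will_do"
    DONE = "done"
    BLOCKED_BY = "blocked_by"
    SENTIMENT = "sentiment"

@dataclass
class Answer:
    member: str  # Slack handle of the respondent
    text: str    # free-form, often messy, natural language

@dataclass
class Question:
    qtype: QuestionType  # the structured role the question plays
    text: str            # e.g. "What will you do today?"
    answers: List[Answer] = field(default_factory=list)

# A completed standup pairs each structured question with free-form answers.
standup = [
    Question(QuestionType.DONE, "What did you do yesterday?",
             [Answer("@maria", "shipped the export fix, reviewed 2 PRs")]),
    Question(QuestionType.BLOCKED_BY, "Is there something impeding your progress?",
             [Answer("@maria", "waiting on staging access")]),
]
```

The asymmetry the post describes is visible even here: the question side is an enum-like, predictable structure, while the answer side is an opaque string that needs NLP to unlock.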
Our goal is to provide a digital assistant that decentralizes teams by unlocking team knowledge. Achieving this also requires action and initiative from the assistant to enhance the team’s performance and flow.
For instance, you can currently use Geekbot to request status on a project over a certain timeframe, view individual or entire team activity, understand main influencers and overall team health by simply asking a question in your own words.
Geekbot can even be proactive and, for example, send you personalized notifications or suggestions about other teammates relevant to an issue you mention as an obstacle to your work.
We consider the team to be a living entity whose functions and activities are defined by the individual tasks of its members, but not in an additive fashion. That is, the team as a whole is different from the mere sum of its members. For instance, if a teammate has listed a debugging task, it may be part of an inter-team project or activity containing other members’ tasks unrelated to debugging.
Additionally, a team’s work features several different entities (members, projects, tickets, meetings), and various relationships between them (membership, dependence, similarity), immediately hinting at the many-to-many data model.
Knowledge graphs are ideal for representing many-to-many relationships and modeling both the whole and the parts of a complex system. They fit the requirement of a framework that’s easily and vastly expandable, suitable for an exploratory research project.
Knowledge as a graph
Our knowledge graph aims to depict the various entities in a workplace and the relationships between them. All teams that use Geekbot have their own unique subgraph in the Geekbot knowledge graph. Every completed report or survey triggers a series of actions in the background to generate the suitable vertices and edges.
Members continuously enrich the knowledge graph by answering Geekbot’s questions. Their answers get broken into sentences, which are the building block of the knowledge graph, and get connected with member nodes according to the type of question they answer.
Entities with attributes are constantly being created and connected to other entities based on relationships. Besides members and sentences, we also have tasks, topics, meetings, and skills. We create the appropriate relationships according to the entity type.
For instance, once a task is created, we can connect it with its owner via the owner-of relationship, connect the sentence(s) that define it via the describes relationship, and include any skills it involves via the requires relationship.
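Using a graph library such as NetworkX (one of the tools listed later in this post), a toy version of these entities and relationships might look like the following. The node names, attributes, and wiring are invented for illustration, not our production schema:

```python
import networkx as nx

# Sketch of how a completed task might be wired into a knowledge graph.
# Relationship names (owner-of, describes, requires) follow the post.
G = nx.MultiDiGraph()

G.add_node("maria", kind="member")
G.add_node("task-42", kind="task")
G.add_node("sent-1", kind="sentence", text="Deployed feature X on staging")
G.add_node("python", kind="skill")

G.add_edge("maria", "task-42", rel="owner-of")
G.add_edge("sent-1", "task-42", rel="describes")
G.add_edge("task-42", "python", rel="requires")

# Traversing the edges answers questions like "who owns task-42?"
owners = [u for u, v, d in G.in_edges("task-42", data=True)
          if d["rel"] == "owner-of"]
print(owners)
```

The many-to-many shape mentioned earlier falls out naturally: nothing stops a second member from also pointing an owner-of edge at the same task, or a sentence from describing several tasks.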
Now, let’s see the methods we use from graph theory and natural language processing to build both the entities and relationships.
Classifying text into work-related matters
Formally, text classification is the task of assigning a label to a piece of text. We use it extensively to classify the content of your conversations with Geekbot in order to extract work-related entities and their metadata.
Fortunately, the machine learning community has an abundance of information on the topic, and without getting too technical, we’ll share our most significant learnings:
- Carefully designing and iterating on the labeling process is well-invested time.
- Fine-tuning transformer models outperforms other approaches most of the time, but it’s not always worth the required resources.
- Multi-language NLP is not here yet. Infrastructure and performance issues must be addressed, and Sebastian Ruder elaborates on this point very well here.
- Even when the classification task is pretty standard, such as sentiment analysis, visualizing the results in a meaningful way is a challenge.
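To make the task concrete, here is a deliberately tiny classification baseline using scikit-learn (one of the libraries listed later). The labels and training sentences are invented for illustration; production models are far more involved than this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy task: label standup sentences as "blocker" vs "progress".
# In practice the labeled data comes from the careful labeling
# process described above, not a handful of made-up examples.
train_texts = [
    "waiting on access to the staging server",
    "blocked by the pending API review",
    "cannot proceed until the database migration lands",
    "finished the export feature",
    "deployed the new dashboard",
    "reviewed and merged three pull requests",
]
train_labels = ["blocker", "blocker", "blocker",
                "progress", "progress", "progress"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

prediction = model.predict(["blocked by a failing migration"])[0]
print(prediction)
```

Even a baseline like this makes the trade-off in the list above tangible: it trains in milliseconds, while a fine-tuned transformer would likely win on accuracy at a much higher resource cost.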
Different words, same meaning
The purpose of the similarity relationship we’re using in our schema is to describe how close two pieces of text are in the semantic sense.
For instance, we want the sentence “Deployed X feature on staging” to be close to the sentence “Tested feature X on production,” but nowhere near “Staged a funny prank on @Robert yesterday!”
The problem of calculating the similarity between two pieces of text is well studied in NLP research and has seen considerable improvement in recent years. One major concept behind the recent advancements is that of word embeddings.
The idea behind embeddings is to represent words (and, by extension, sentences) as vectors. But how do we go from word embeddings to sentence embeddings? There are many methods, but we’ve discovered that a combination of mean pooling and learnable sentence embeddings works best in our case.
The last piece to be added is measuring the closeness of sentence embeddings. Following standard practice, we use cosine similarity: a metric representing the angle difference between the vectors.
We’ve always wondered how such a crude process can still retain information about the meaning carried by individual words. It turns out the answer lies in a mathematical concept called concentration of measure. Simply put, this tells us that in a sufficiently high-dimensional space, we can approximately retrieve the individual word vectors even after applying some function over them. The only practical limitation is that the number of word vectors added must not be too large (as that would effectively make all sentence embeddings the same).
If you’ve followed everything we’ve covered so far, you’ll know we’re in luck as our problem domain mostly comprises short sentences.
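The mean-pooling and cosine-similarity steps can be sketched in plain Python with toy word vectors. Real embeddings are learned and have hundreds of dimensions; the four-dimensional vectors below are invented for illustration:

```python
import math

# Toy 4-dimensional "word embeddings" chosen so that work-related
# words cluster together and prank-related words sit elsewhere.
EMB = {
    "deployed": [0.9, 0.1, 0.0, 0.2],
    "tested":   [0.8, 0.2, 0.1, 0.1],
    "feature":  [0.1, 0.9, 0.1, 0.0],
    "staging":  [0.2, 0.1, 0.9, 0.1],
    "prank":    [0.0, 0.1, 0.0, 0.9],
    "funny":    [0.1, 0.0, 0.1, 0.8],
}

def sentence_embedding(words):
    """Mean pooling: average the word vectors component-wise."""
    vecs = [EMB[w] for w in words if w in EMB]
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

work_a = sentence_embedding(["deployed", "feature", "staging"])
work_b = sentence_embedding(["tested", "feature", "staging"])
prank = sentence_embedding(["funny", "prank"])

print(cosine_similarity(work_a, work_b))  # close to 1: similar meaning
print(cosine_similarity(work_a, prank))   # much lower: different topic
```

This mirrors the staging-versus-prank example above: the two work sentences end up nearly parallel, while the prank sentence points in a different direction.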
Clustering or how to discover your tasks
To understand the inner workings of your team, and to build tools that help you manage it, we first need to know where you consume time and energy at work. In the context of Geekbot, this activity is embedded in conversations, which we depict as sentence entities in the knowledge graph.
These sentences, when grouped together based on their contextual meaning, make up a task. The grouping can be done using a class of algorithms from complex networks. Our type of clustering not only groups semantically similar sentences into the same task; it also has the appealing property that sentences that are not strongly similar to each other may still land in the same task through their shared connections.
Over the course of a month, you’ll likely start, stop, restart and finish multiple tasks. As a result, simply taking the first and last dates you mentioned something can be inaccurate. The clusters we construct carry semantic information, but we’d also like to have temporal information. To get it, we use an algorithm from graph theory called graph matching.
This process works as follows:
- Given a set of semantically similar sentences in a cluster, we first divide the nodes into two groups: sentences from responses to will-do questions, and sentences from responses to done questions.
- We then connect the nodes in these groups with edges that have the time difference of the sentences as weights and apply the graph matching algorithm to match pairs of sentences such that the total time difference is minimized.
The end result is a partially connected graph, with each connected component showing a period in which you were actively working on a task. All this is shown in a Gantt chart depicting both the descriptions of the tasks and their evolution through time.
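As a simplified stand-in for the matching step, the sketch below greedily pairs each will-do mention with the nearest later done mention. The dates and sentences are invented, and a production system would solve a proper minimum-weight bipartite matching rather than matching greedily:

```python
from datetime import date

# Hypothetical will-do / done mentions of the same task over a month.
will_do = [
    (date(2021, 3, 1), "Will refactor the export module"),
    (date(2021, 3, 8), "Picking the export refactor back up"),
]
done = [
    (date(2021, 3, 2), "Refactored half of the export module"),
    (date(2021, 3, 9), "Export refactor finished"),
]

def greedy_match(will_do, done):
    """Pair each will-do with the nearest unused later done mention."""
    pairs, used = [], set()
    for w_date, w_text in will_do:
        candidates = [(d_date - w_date, i)
                      for i, (d_date, _) in enumerate(done)
                      if i not in used and d_date >= w_date]
        if candidates:
            _, best = min(candidates)  # smallest time difference wins
            used.add(best)
            pairs.append(((w_date, w_text), done[best]))
    return pairs

periods = greedy_match(will_do, done)
for (start, _), (end, _) in periods:
    print(start, "->", end)  # each pair is one active-work period
```

Each matched pair becomes one bar on the Gantt chart described above: two separate periods of activity instead of one misleading span from the first to the last mention.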
As we mentioned earlier, our goal is to provide the tools that give access to team knowledge — which brings us to our next point: if you’re already using Geekbot to sync with your team through daily standups, why not use it to get answers on work-related questions as well?
This happens via two NLP tasks, which we use to transform your questions into knowledge graph queries: intent classification, the automated association of text with specific goals; and named entity recognition, the classification of entities mentioned in the text into predefined categories.
Using this, we built a feature you can use to ask Geekbot a question directly in Slack about your team.
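As an illustration of the two tasks, here is a deliberately crude rule-based stand-in. Real intent classifiers and entity recognizers are trained models (e.g. via Rasa or spaCy, both listed later); the intent names and patterns below are invented:

```python
import re

# Toy intent classification: map a question to a goal by keyword.
INTENTS = {
    "project_status": re.compile(r"\b(status|progress)\b", re.I),
    "team_health": re.compile(r"\b(feel|mood|health)\b", re.I),
}

def parse_question(text):
    """Return a crude intent label plus any @-mentioned members."""
    intent = next((name for name, pattern in INTENTS.items()
                   if pattern.search(text)), "unknown")
    members = re.findall(r"@\w+", text)  # toy NER: just @-mentions
    return {"intent": intent, "members": members}

parsed = parse_question("What is the status of @maria's work this week?")
print(parsed)
```

The output of a parser like this is exactly what a knowledge graph needs: an intent selects the query template, and the recognized entities fill in its parameters.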
We hope you now have a good idea of how we are building team knowledge and making it available to you. In this section, we will get a bit more technical in order to describe the necessary steps and tools used for realizing this system in production.
One deciding factor of our design is the requirement to keep the core service of Geekbot separated from the knowledge graph building process. The motivation behind this is to avoid having the two compete for resources or slow one another down.
The separation takes place at the abstraction level of the report. Completed or edited reports enter the knowledge graph building funnel in real time and are broken down into sentences, which are then passed along the necessary paths with the help of message queues.
We use service replication to horizontally scale the various time-consuming inference procedures the knowledge graph requires.
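A toy version of this funnel might look like the following, with Python's standard-library queue standing in for the Pub/Sub topics used in production; the splitting logic and message shape are invented for illustration:

```python
import queue

# Each sentence of a completed report becomes one message, so that
# downstream consumers (similarity, sentiment, etc.) can scale out
# independently of the core service.
sentence_queue = queue.Queue()

def ingest_report(member, answer_text):
    """Split a report answer into sentences and enqueue each one."""
    for sentence in answer_text.split(". "):
        sentence_queue.put({
            "member": member,
            "sentence": sentence.rstrip("."),
        })

ingest_report("@maria", "Deployed feature X on staging. Reviewed two PRs.")

while not sentence_queue.empty():
    print(sentence_queue.get())  # one message per sentence
```

Because each message is self-contained, any number of replicated workers can consume from the same topic, which is what makes the horizontal scaling mentioned above straightforward.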
Our architecture is broken down into three core components, each of which interacts with the knowledge graph.
A team agnostic component continuously receives data from reports and creates sentence nodes, similarity edges, and necessary attributes. This component includes sentence creator, sentiment, blockers, and sentence similarities.
The main inference component consists of tasks and topics, meetings, and skills; it differs from the first in that it uses past knowledge from the team to build the corresponding entity nodes and relationships. This component also runs on a recurring schedule rather than in real time.
Lastly, the usage component serves the knowledge to users through dedicated APIs in the form of visualizations, features, and bot conversations. This includes activity viewer, topics, Q&A, and insights.
Cloud, database & tools
Our microservices-based architecture is deployed on the Google Cloud Platform. We are using Pub/Sub message queues for intercommunication and a series of APIs to connect the knowledge graph with the core Geekbot service.
The most natural choice of database abstraction to represent a knowledge graph is a graph database. Our database of choice here is Neo4j as it is one of the most popular native graph database implementations.
The graph algorithms that Neo4j offers through its Cypher query language, along with its Elasticsearch interface, have been indispensable features. On the flip side, graph databases are definitely not as mature as relational databases, which took a toll on us when searching for documentation and reliable database adapters.
Python is our go-to programming language for prototyping and data-oriented procedures. We’ve benefited a lot from the surrounding ecosystem of libraries around NLP and ML.
Let’s briefly cover some of the libraries and tools we use:
- Graphs: NetworkX
- NLP: spaCy, NLTK, Hugging Face’s Transformers, fastText, Duckling, Rasa
- ML: PyTorch, Keras, scikit-learn, imbalanced-learn, TensorFlow
- Web interfacing: Flask
Give it a try
Our goal is to provide a digital assistant that decentralizes teams by unlocking team knowledge, all by processing and understanding your daily interactions with Geekbot.
While it is uncertain to what extent the system we created will contribute to the digital workplace transformation, we hope it will serve as an inspiration for others to create such tools.