Harnessing Generative AI for collective discovery: lessons from two years of deployment at scale.
Euro Beinat
A tool for collective discovery
Prosus’ consumer internet services reach one quarter of the global population in over 100 countries and process a vast number of transactions daily. Operating at this size and scale, we have used AI and Machine Learning extensively for several years. AI/ML is fully integrated into our services, with hundreds of models in production. Like many technology companies, we rely on automation and predictions to operate at this scale.
In the summer of 2022, we embarked on developing a personal AI assistant for our colleagues across the Prosus group. Our goal was to provide everyone with the opportunity to test Generative AI firsthand and to explore its potential impact on their work and business. As of this writing, the assistant is utilized by approximately 13,000 colleagues across 24 companies. Here are some insights we gained from deploying the tool at scale.
Our journey in GenAI
In 2019, we began developing solutions based on Large Language Models (LLMs) such as BERT and GPT-2. Although these models were not yet ready for widespread use, they represented a significant advancement for processing language and unstructured data. We initiated a large-scale program of field tests in collaboration with companies in our group to identify viable applications for LLMs and the conditions under which they could be effective.
Between 2020 and 2021, we conducted over 20 practical field experiments, testing applications that ranged from creating educational materials and Q&A to document synthesis, code automation, documentation and bug fixing, and more. Many companies within the group found similar use cases, such as analyzing help desk tickets, but also applied the tools in unexpected ways to increase work efficiency and gain independence. Most use cases emerged from bottom-up discovery, often in collaborative project channels. To make this collective discovery more efficient, we launched an AI Assistant, Toqan.ai, initially to our engineering teams and then to everyone.
Toqan is a general-purpose chatbot designed with the needs of product and technology teams in mind. Initially accessible through Slack, it integrates several Large Language Models (commercial, open-source and fine-tuned in-house) together with image interpretation and generation, voice encoding and generation, large document processing, data analysis and code creation, for a total of over 20 models and tools. It also accesses the internal knowledge bases of the companies to provide grounded responses. We implemented several guardrails, including privacy and security measures such as no-learning and no-retention policies, which keep data from being used to train future models. Additionally, we launched an extensive education and training program, partly delivered through the tool itself.
Use cases for the AI Assistant
Based on a comprehensive analysis of interactions and on feedback from user interviews, about 50% of the assistant’s use is for engineering-related tasks, while the remaining half serves a diverse array of purposes. Examples include:
- Correcting and explaining a code snippet error in the team’s style and documenting it accordingly (source: an engineer)
- Summarizing all experiments performed on the store wall in the past six months in Confluence (source: a product manager)
- Improving and rewriting feedback on a colleague’s performance, strengths, and weaknesses (source: a team manager)
“Software engineering” and “writing and communication” are the most frequent uses. Surprisingly, engineering tasks are also common among non-engineers, from HR to finance and customer support, focusing on simpler, exploratory tasks or personalization of tools and data analysis. A notable group of users seeks direct database access, formulating queries in English to bypass traditional dashboard interfaces. This trend is dubbed the “movement of liberation from the dashboards.”
“Writing and communication” is prevalent among non-engineers, with a constant demand for enhancing clarity and nuance in communication, from report writing to polite inquiry. This underscores the value of a personal, private tool as a safe space for asking even basic questions.
Feedback mechanism
Early on, we introduced a feedback mechanism with options for positive feedback (thumbs up, heart) and negative feedback (thumbs down, and Pinocchio for unreliable or fabricated answers). In fall 2022, “Pinocchio” feedback accounted for almost 10%, indicating a need for careful oversight. This rate dropped to below 3% by June 2023 and then stabilized around 1.5%, thanks to improvements in the underlying models, enhanced prompting techniques, and better user proficiency in crafting prompts. While eliminating bad responses entirely is impossible, they can be effectively managed.
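As a rough illustration of how such a rate could be tracked, the sketch below computes each reaction's share of all recorded feedback. The reaction labels, the list-of-strings format and the choice of denominator (all feedback reactions rather than all responses) are assumptions made for the example, and the sample data is invented; none of it describes how Toqan actually stores or reports feedback.

```python
from collections import Counter


def feedback_shares(reactions):
    """Return each reaction label's share of all recorded feedback (0.0 to 1.0).

    `reactions` is a hypothetical list of labels, one per feedback event.
    """
    counts = Counter(reactions)
    total = sum(counts.values())
    if total == 0:
        # No feedback recorded yet, so there is nothing to report.
        return {}
    return {label: count / total for label, count in counts.items()}


# Invented sample of 20 feedback events, not real data.
sample = ["thumbs_up"] * 14 + ["heart"] * 3 + ["thumbs_down"] * 2 + ["pinocchio"]
shares = feedback_shares(sample)
print(f"pinocchio share: {shares['pinocchio']:.1%}")  # prints "pinocchio share: 5.0%"
```

Computing the same share per month would give a trend line comparable to the one described above, provided the denominator is kept consistent over time.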
Impact on work
User feedback highlights three main benefits: increased speed in task execution, especially in engineering; the ability to undertake more tasks, such as design work and data analysis; and greater independence, reducing reliance on colleagues.
The tool’s impact on productivity is significant, with over 81% of users reporting productivity increases of more than 5%-10%. A/B testing for certain tasks shows time reductions of 50% or more, aligning with industry results. In about 60% of cases, users turn to the assistant as their first source of help, to get unstuck or to get going.
When we introduced the tool, we had a simplistic view, one in which a well-defined portion of work could be automated. What we found instead is a wide array of micro-productivity bursts distributed across all workflows. Increasingly, they cluster around themes and “jobs to be done”, for instance, data access without intermediation, or market research. Insights from these experiences are guiding the development of vertical applications and specialized AI assistants.
Use case discovery
All teams use the tool to discover and test use cases for their organization. First, they stress-test the use cases with the AI Assistant until they are convinced that they work. Then they graduate them into regular engineering practice and into production. Products that have emerged from this pattern include:
- Genie, a learning assistant (K12), developed by brainly.com
- compr.ai, a conversational grocery ordering application at ifood.com.br
- Kodie, an award-winning coding mentor, developed by SoloLearn.com
- Simulation RolePlay (part of learning sales skills), developed by goodhabitz.com
We have seen this pattern a dozen times and it has become ingrained in the operations of the companies using the AI Assistant.
Agents: the next act
The assistant is evolving from a Q&A platform to a tool capable of performing complex tasks, such as web browsing, code creation and execution, and API connections. After nearly two years of development, agent-based functionality is robust enough for practical business use, and we are gradually introducing a version of the assistant that intelligently selects agents based on the task. This shift towards vertical, agent-based tools represents a significant opportunity for value creation and differentiation for the Group, marking the near future of the AI Assistant.
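To make the idea of selecting an agent based on the task more concrete, here is a deliberately simple sketch. It routes a request by keyword overlap, which is a stand-in chosen for illustration and not the selection method used in the assistant (the article does not describe it). The `Agent` class, the agent names and the keyword sets are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical agent description: a name plus keywords that hint at its tasks."""
    name: str
    keywords: frozenset


def select_agent(task, agents, fallback):
    """Pick the agent whose keywords overlap most with the words in the task.

    Returns `fallback` when no agent matches at all. Keyword overlap is an
    assumption made for this example, not the assistant's real routing logic.
    """
    if not agents:
        return fallback
    tokens = set(re.findall(r"[a-z]+", task.lower()))
    best = max(agents, key=lambda agent: len(agent.keywords & tokens))
    return best if best.keywords & tokens else fallback


# Illustrative agents mirroring the capabilities named in the text.
AGENTS = [
    Agent("web_browsing", frozenset({"browse", "web", "search"})),
    Agent("code_execution", frozenset({"code", "script", "execute"})),
    Agent("api_connection", frozenset({"api", "endpoint"})),
]
FALLBACK = Agent("chat", frozenset())

print(select_agent("Browse the web for competitor prices", AGENTS, FALLBACK).name)
# prints "web_browsing"
```

A production router would more plausibly rely on a language model or a trained classifier to choose the agent. The keyword version only shows the shape of the decision: one task in, one agent out, with a plain chat fallback when nothing fits.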