Let the Robots Decide: AI-Powered Decision-Making for More Equitable Digital Mental Health Services

In partnership with the Patrick J. McGovern Foundation’s (PJMF) Data to Safeguard Human Rights Accelerator Program, SameSame is leveraging Reinforcement Learning (RL) to implement a digital mental health intervention that automatically adapts to local context, rapidly expanding the ability to respond to the challenges faced by millions of LGBTQI+ youth across the Global South.

by Jonathan McKay, Dena Batrice, Codie Roelf, and Copland Shepherd

Amara’s Law states that people tend to overestimate the short-term impact of new technologies while underestimating their long-term effects. For the team at SameSame, decisions about whether, when, and how to leverage artificial intelligence have always involved vigorous debate about the potential benefits and risks of using new and often untested technologies to support vulnerable LGBTQI+ youth. However, thanks to our partnership with the Patrick J. McGovern Foundation, we’ve moved beyond purely theoretical discussions. In the last year, we’ve built, deployed, and tested a system that allows us to run multiple versions of our WhatsApp chatbot simultaneously, automatically analyzing the performance of each version and funneling our users to the version that is performing best. Much to the chagrin of our Product Lead, the ‘robots’ have been making (some) decisions about our product. As a result, we now have more than eight different experiments running across two countries, and the percentage of users who complete the chatbot onboarding has increased from 46% to 96%. How exactly did we get here?

Those Most in Need Face the Greatest Barriers

For Lesbian, Gay, Bisexual, Transgender, Questioning, and Intersex (LGBTQI+) youth living in many parts of the world, accessing mental health support can be an uphill battle fraught with stigma, discrimination, and even the threat of violence. In the more than 70 countries and territories where consensual same-sex activities and forms of gender expression are still criminalized, discriminatory laws create additional barriers to accessing face-to-face services, exacerbating the challenges already posed by the persistent stigma surrounding mental health. The need for support is critical. LGBTQI+ individuals are disproportionately affected by mental health issues due to societal pressures and discrimination — in South Africa, for example, LGBTQI+ youth suffer from depression at four times the rate of their straight and cisgender peers.

The Potential of Digital Mental Health Services

Digital mental health services hold immense promise for LGBTQI+ youth, offering a lifeline to those who may otherwise struggle to access support. These platforms provide a safe space for individuals to learn coping skills, seek guidance, and access resources without fear of judgment or discrimination. Moreover, they offer anonymity, which is crucial for individuals reluctant to disclose their identities or seek help in traditional settings. There are risks as well as potential rewards, however. Privacy breaches and unauthorized access to sensitive information pose a significant risk in countries where LGBTQI+ identities are criminalized, with severe consequences for individuals’ safety and well-being.

This is where organizations like SameSame step in. We are a group of engineers, clinicians, and product designers with lived experience of struggling with our identity and overcoming mental health challenges in hostile communities. We’ve come together to leverage digital technologies to safely and responsibly address the mental health needs of LGBTQI+ youth in challenging environments — to build what we wish we had when we were growing up. We believe that by harnessing the power of technology and data, we can reach different and larger audiences than many of our on-the-ground partners and tailor mental health services to meet the specific needs of LGBTQI+ individuals.

In early 2023, the SameSame team built and launched our first product in South Africa — a WhatsApp chatbot that provided automated mental health support and links to vetted services. Initial results were promising: more than 2,000 chatbot users were engaging with a broad range of the content sets we created, including our adaptation of an evidence-based Cognitive Behavioural Therapy (CBT) course called AFFIRM, which is explicitly designed for LGBTQI+ youth. There were also challenges. We were struggling to analyze and make sense of the large volumes of data we were collecting and, as a result, to determine what changes to make to our chatbot.

Founded in late 2021, SameSame is still a young organization with only four full-time staff. From the start, our approach has been to find and work with like-minded partners with deep technical expertise and complementary skills who can help us go further and faster than we could on our own. Enter the Patrick J. McGovern Foundation. Through our participation in PJMF’s Data to Safeguard Human Rights Accelerator programme, we haven’t just upgraded our data infrastructure to make data more readily available for interrogation; we’ve also found a way to respond more quickly and nimbly to users’ needs and preferences through the introduction of a Reinforcement Learning (RL) system that automatically selects ‘winners’ from the multiple, simultaneous experiments we run. As a result, we are better poised to take our chatbot to new countries and respond to different constraints and user preferences in those countries, even with our small team.

Our Hypothesis

Our hypothesis in the PJMF Accelerator Programme, informed by a conversation with Rob On and The Agency Fund, had two parts. First, we believed we could use ‘multi-armed bandit’ experiments (essentially A/B testing on steroids) to learn what combination of content, features, and user experience is most likely to engage our users and help them complete individual modules of our CBT course. The second and more unconventional part of our hypothesis was that we couldn’t assume the findings of these experiments would always hold true; they would need continuous retesting. To flesh this out a little: just because most users seem to prefer one experience over another — voice notes over videos, for example — we can’t know for sure that the preference will be constant over time or consistent across different countries. Changing data prices that reduce downloading costs, for instance, may shift users’ preferences between voice notes and video. The world we live in, online and offline, is constantly changing, and our products need to change with it to remain relevant and valuable to our users. Artificial intelligence, particularly Reinforcement Learning (RL), seemed to offer SameSame an effective way of doing just that.
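To make the mechanics concrete, here is a minimal sketch of a bandit loop in Python. It is not SameSame’s production code: the arm names, the completion ‘reward’, and the simple epsilon-greedy routing rule are all illustrative (our actual system uses Thompson Sampling, discussed later). The point is the shape of the loop: every incoming user is routed based on everything observed so far, and the routing keeps adapting as outcomes arrive.

```python
import random

# Illustrative arms: competing versions of a chatbot flow.
ARMS = ["voice_notes", "videos", "text_only"]

# Running tallies per arm: users routed, and completions observed.
pulls = {arm: 0 for arm in ARMS}
completions = {arm: 0 for arm in ARMS}

EPSILON = 0.1  # fraction of traffic reserved for ongoing exploration


def choose_arm() -> str:
    """Epsilon-greedy routing: mostly exploit the best arm so far,
    but keep exploring so a shift in preferences can still be detected."""
    untried = [arm for arm in ARMS if pulls[arm] == 0]
    if untried:
        return random.choice(untried)
    if random.random() < EPSILON:
        return random.choice(ARMS)
    return max(ARMS, key=lambda arm: completions[arm] / pulls[arm])


def record_outcome(arm: str, completed: bool) -> None:
    """Update the arm's tallies once we know whether the user
    completed the flow (the 'reward' in bandit terms)."""
    pulls[arm] += 1
    completions[arm] += int(completed)
```

Unlike a classic A/B test with a fixed sample size and a final verdict, this loop keeps a fraction of traffic exploring indefinitely, which is exactly what our continuous-retesting hypothesis calls for.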

From Idea to Action

Our previous blog post details some of the technical aspects of introducing RL to our system. That technical work, however, was relatively simple compared to some of the other work we needed to undertake to get the system up and running.

First, we needed to understand what experiments we wanted to run. This meant determining which aspects of the user experience we believed were most likely to influence outcomes and were also likely to vary from context to context. Looking at our data, we felt confident we could make some improvements to the user experience of our chatbot without significant testing and experimentation — simple fixes like restructuring our menu so it’s easier to navigate. We believe a considerable number of UX improvements like these are likely to benefit our users across time and place. So, instead, we focused on identifying elements of our bot’s content and user experience that could make a significant difference to a user’s engagement but might also be susceptible to change across contexts. Preferences regarding different media types seemed like one of the most prominent examples, as they are highly subject to changes in the affordability and availability of data services.

Other factors took more time to settle on. For example, one of the tests we decided was worth running continuously across contexts examines whether users are more responsive to content with an informal tone, as if written by a peer, or to content with a more authoritative tone that draws on statistics and facts to instill confidence. Our Executive Director, Jonathan McKay, had previously spent time interviewing adolescent girls in countries across the world about their trusted sources of information. Those interviews convinced him that a message’s style and tone strongly affect the likelihood of it being read, understood, and internalized. The more general lesson from these discussions is that new technologies offer organizations like ours new capabilities and tools, but working out how to use those tools, even when it seems self-evident on the surface, still takes time.

Our Results

Throughout the Accelerator programme, we launched more than eight multi-armed experiments supported by our RL system. These experiments have tested the influence of multiple factors on completion rates for our message ‘flows’ — including message length, message tone, requests for personal information, the structure and timing of the consent process, user preferences related to media formats, linear versus user-driven pathways, and the structure and naming of different modules of the CBT course. Thanks to the support of PJMF, we were also able to launch new campaigns that helped us acquire the users we needed to test whether the system was working. Over the course of the programme, more than 80,000 young people in South Africa and Zimbabwe began conversations with our chatbot and interacted with one or more of the experiments mentioned above. Through these experiments, we have tripled the number of users starting our CBT course and more than doubled the number of users completing one of our CBT modules. Of the users who complete the clinically validated mental health self-assessments before and after engaging with our CBT course, 63% show improvements in their mental health. Our bot is supporting more LGBTQI+ youth and engaging them more meaningfully than ever before.

The System Has Yet To Be Adequately Tested

Despite encouraging results, the team at SameSame is still grappling with the newness and complexity of an algorithm that makes decisions for us. Each member of the SameSame team, particularly our Product Manager, is used to running time-bound experiments and reaching definitive clarity on the best option based on the data at a particular time. Now, we have multiple experiments running, none of which is ever supposed to end or reach a conclusion. This may seem trivial, but it has required a shift in how we approach product design, one that sometimes feels uncomfortable.

There are also technical consequences. Because no experiment ever really ends, every new experiment permanently adds multiple conversational flows, dramatically increasing the complexity of our back-end system, which now supports an ever-expanding set of variations of our chatbot experience.

As we started to look at the decisions the RL system was making, we also became concerned that it was choosing winning arms without sufficiently exploring all the options. Further reading, research, and discussion revealed that our algorithm may occasionally select winning arms prematurely, especially if the initial data samples do not represent each arm’s underlying probabilities. Algorithms like Thompson Sampling, which we use for our RL system, aim to reduce this risk by sampling arms according to their likelihood of being the best. However, this isn’t foolproof. The solution we’ve landed on is to restart the algorithm periodically, rerunning the experiment from the start, to prevent premature convergence. The lesson for us, reassuring in some respects, is that the system we’ve built augments our decision-making rather than replacing it: we still need human beings. This is particularly true when interpreting the results, a part of our work that is crucial if we’re going to design even more effective digital mental health services in the future. To learn more, read our insights report.
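For readers curious about the mechanics, here is a minimal sketch of Beta-Bernoulli Thompson Sampling with the periodic restart described above. It is illustrative rather than our production implementation; the arm names, the restart interval, and the class structure are all assumptions made for the sake of the example.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a set of arms, with a
    periodic restart to guard against premature convergence."""

    def __init__(self, arms, restart_every=10_000):
        self.arms = list(arms)
        self.restart_every = restart_every  # illustrative tuning choice
        self._reset()

    def _reset(self):
        # Beta(1, 1) priors: every arm starts out as an even coin flip.
        self.alpha = {arm: 1 for arm in self.arms}  # observed successes + 1
        self.beta = {arm: 1 for arm in self.arms}   # observed failures + 1
        self.observations = 0

    def choose_arm(self):
        """Sample a plausible completion rate for each arm from its
        posterior and route the user to the highest draw. Arms we are
        still uncertain about win sometimes, preserving exploration."""
        draws = {
            arm: random.betavariate(self.alpha[arm], self.beta[arm])
            for arm in self.arms
        }
        return max(draws, key=draws.get)

    def record_outcome(self, arm, completed):
        """Fold one observed outcome into the arm's posterior, and
        restart from the priors once enough data has accumulated."""
        if completed:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
        self.observations += 1
        if self.observations >= self.restart_every:
            self._reset()  # forget history so a shifted preference can surface
```

In use, each incoming user would get `arm = sampler.choose_arm()`, and the observed outcome would be fed back with `sampler.record_outcome(arm, completed)`. The restart interval is itself a tuning decision: too short and the system never accumulates enough evidence to pick a winner, too long and a stale winner can dominate after users’ preferences have shifted.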

What’s Next

The real test of the system we’ve built will come when we launch our WhatsApp chatbot in Nigeria. By launching with multi-armed experiments running right from the start, we should be able to learn much more quickly what LGBTQI+ youth in Nigeria want and need from our chatbot, providing us with data we can dig into further to plot the evolution of our work in Nigeria and beyond, until all LGBTQI+ youth understand that while we might be a little different from our peers, ultimately, we’re all the SameSame.
