Inside Welli — Noom’s New AI-Powered Health Assistant

Yifan Xia
Engineering@Noom
9 min read · Jun 27, 2024

Authors: Yifan Xia (Data Science @ Noom); Calvin Hopkins (Engineering @ Noom)

Part 1 of the AI at Noom Series

On June 27, 2024, Noom publicly announced Welli, our new Health Assistant. Welli is Noom’s first productionized Large Language Model (LLM) powered feature. This is our first post in a new AI blog post series at Noom. Today, we’ll discuss the process we took for leveraging LLMs in production, how we built some specific components of Welli, and some notable things we learned along the way.

What do you do when the technology landscape changes overnight?

Fig 1. Image generated by DALL·E 3

Prior to 2023, the ML Platform Team at Noom was focused on more “traditional” machine learning problems: predicting user actions and outcomes like engagement and customer lifetime value (LTV), classifying and understanding users, etc. (this will be a topic for a future post). But with the advent of GPT-4 in early 2023, we had to immediately ask ourselves: “What does this powerful new technology mean for Machine Learning at Noom?”

While there are far more answers to that question than there is room for in this post, we’ll mention a few relevant ones below, along with their impact on our team roadmap and how each influenced our path towards building Welli:

  • Team Prediction #1: “It is unclear who will be the winner in the GenAI race; we don’t want to back the wrong horse. GenAI at the beginning of 2023 will be nothing compared to GenAI at the beginning of 2024. We need to be able to adapt and fast.”

Almost 12 months later, this prediction still holds true. While OpenAI may have been the ones to kick off the race, over the past three months we’ve seen the leader on the Chatbot Arena Leaderboard (hosted on Hugging Face) swap multiple times with the releases of Gemini Advanced from Google, Claude 3.5 Sonnet from Anthropic, and GPT-4o from OpenAI.

  • Team Prediction #2: “We need to think about GenAI use cases beyond chatbots, even if that is our first use case for Noom.”

Chatbots are often the first type of application that comes to mind when thinking about how to apply LLMs, but GenAI can also be used for other tasks, including text summarization, image interpretation, content rewriting, and classification.

  • Team Prediction #3: “LLMs are just one part of the overall tooling needed in order to support these use cases in production. These tools will likely evolve just as fast as the LLMs themselves.”

Some of these tooling and overall platform components have come a long way over the past year (notably GPT Assistants and Vertex AI Agent Builder), but others we have had to build and maintain ourselves.

  • Team Prediction #4: “We really know next to nothing about how to apply LLMs to production use cases yet.”

While accurate at the time, this is thankfully no longer the case.

In order to effectively support LLMs, we embarked on extending our Machine Learning Platform with several goals:

  • Quickly learn the essential capabilities crucial for leveraging LLMs in production
  • Facilitate easy adoption of these cutting-edge technologies by the broader engineering team
  • Maintain agility amidst the constantly evolving LLM landscape
  • Evangelize these technologies with the entire Noom team
  • Expedite exploration of additional use cases

But most importantly, our goal was to improve Noom’s user experience. Welli was just our first step on this journey.

Welcome to the World, Welli!

The idea of Welli first materialized after an initial exploration into LLMs, some quick data analysis of existing product features, and some initial brainstorming. We asked ourselves if we could build something that would enable our coaching team to support our users better and help them get faster responses to their questions. Less than ten days later, the first version of Welli came into existence as a prototype in a Hex app.

Fig 2. Hallucinations on Noom feature from initial prompt vs. correct instruction from current model

At that time, Welli was a single basic prompt built on top of gpt-4-0314 and wrapped with LangChain. It became apparent very quickly that this was not something we could put in front of users: it was trivially easy to get Welli to hallucinate; Welli had no context about any of the recent features in our app (gpt-4-0314’s training data cutoff was September 2021); Welli’s tone did not match the persona that we wanted our users to associate with Noom; Welli had no way to leverage information about users to personalize and customize its responses; and more. It was also apparent, though, that with further experimentation and iteration to resolve these issues and improve response quality, Welli could become a reality for most of our users.

Today, Welli is built with multiple prompts on top of the latest OpenAI and Google Vertex AI models, powering different components of its logic. Some portions of the model are configured in the Retrieval-Augmented Generation (RAG) paradigm, where relevant data is retrieved from a FAISS Vector Store curated by experts from Noom’s customer support, coaching, program management, and content teams (we call this the Noom Knowledge Base). We’ve run multiple red team exercises on our models and have built out a Human-In-The-Loop adjacent (closer to Human-Out-The-Loop) system for model output validation that also enables fast human intervention when required. Our ML Serving infrastructure supports retrieving and injecting user data (device type, user preferences, etc.) to personalize responses to the individual interacting with Welli.

Fig 3. Noom’s overall LLM Serving Infrastructure
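To make the retrieval step concrete, here is a minimal sketch of RAG-style lookup. It is purely illustrative: the snippets are placeholders for Noom Knowledge Base entries, and a toy bag-of-words embedding with cosine similarity stands in for a real embedding model and FAISS index.

```python
import math

# Illustrative placeholders for curated Noom Knowledge Base snippets.
knowledge_base = [
    "how to log a meal in the noom app",
    "what to expect during the first week of glp-1 medication",
    "resetting your password and updating account settings",
]

# Toy embedding: bag-of-words counts over the knowledge-base vocabulary.
# A production system would use a learned embedding model instead.
vocab = sorted({w for snippet in knowledge_base for w in snippet.split()})

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k snippets most similar to the query by cosine similarity."""
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda s: cosine(embed(s), q), reverse=True)
    return ranked[:k]
```

The retrieved snippets are then injected into the prompt, so the model answers from curated facts rather than from its (potentially stale) training data.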

Next, we’ll walk through our process for prompt development and how we’re leveraging techniques like Few-shot prompting, CoT reasoning, Self-consistency, tone setting, and more to build out Welli’s prompt.

Decoding Welli’s Thought Process

In order to decode Welli’s thought process, we need to understand a few critical elements. First, there is the Prompt Instruction working with GPT-4, which forms Welli’s foundation. On top of that, we implemented Prompt Augmentation with the Noom Knowledge Base (RAG), Dynamic Prompts based on user data, and JSON-formatted responses. Together, these elements enable Welli to accurately process user messages, understand their needs, and deliver customized responses. We will discuss each of these elements in greater detail in the following paragraphs.

Fig 4. Critical Elements in Welli’s Thought Process

During the initial phase of constructing our prompt, each team member started by individually exploring different approaches. We then shared our findings and merged them into a single solid prompt that powers Welli as an intelligent AI chatbot capable of supporting users on their journey to better health. Below is an outline of the different components in our Prompt Instruction:

Fig 5. Different Components in Prompt Instruction
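As an illustration of how such components compose, the sketch below assembles a system prompt from role setting, CoT instructions, and few-shot examples. The section contents and names here are simplified stand-ins, not Noom’s actual prompt.

```python
# Illustrative prompt assembly; all text below is placeholder content.
ROLE = (
    "You are Welli, a friendly health assistant for Noom users. "
    "Answer warmly, concisely, and only from the provided knowledge."
)
COT = (
    "Before answering, reason step by step: identify the user's intent, "
    "check the knowledge snippets for relevant facts, then draft a reply."
)
FEW_SHOT = [
    {"user": "How do I log a meal?",
     "assistant": "Tap the '+' on the home screen, then choose 'Log a meal'."},
]

def build_system_prompt(knowledge_snippets: list[str]) -> str:
    """Combine role, CoT instructions, examples, and retrieved knowledge."""
    examples = "\n".join(
        f"User: {ex['user']}\nWelli: {ex['assistant']}" for ex in FEW_SHOT
    )
    knowledge = "\n".join(f"- {s}" for s in knowledge_snippets)
    return "\n\n".join(
        [ROLE, COT, "Examples:\n" + examples, "Knowledge:\n" + knowledge]
    )
```

Keeping each component as a separate block makes it easy to iterate on one piece (say, the few-shot examples) without touching the rest.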

After successfully creating a robust prompt framework, our team shifted focus towards further enhancing Welli’s capabilities by incorporating three other crucial elements:

  1. Grounding Welli in Noom Knowledge: A crucial aspect of our prompt involved implementing the RAG framework, which integrated Noom Knowledge Base data into the CoT process. We used embedding search techniques to feed Welli the most relevant information based on the user’s inquiry. This improvement has significantly boosted Welli’s factual accuracy (in other words, reduced hallucination).
  2. Parameterizing the system with User Data: We tailored our Prompt and Noom Knowledge dynamically based on user data through various parameters, unlocking a personalized approach that best suits the user’s unique situation. For instance, when Welli interacts with a user undergoing GLP-1 treatment, Welli will specifically know more about what to expect from weight loss medications and how to navigate symptoms, thus offering more personal care to our users.
  3. Structuring unstructured generative output with JSON: We instructed Welli to save its intermediary analysis from the CoT process and output responses in JSON format. This structured method guaranteed a seamless integration between systems and provided us with comprehensive data points to gain insights into the Welli decision-making process.
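On the serving side, the structured output needs to be validated before anything downstream consumes it. A hedged sketch, with hypothetical field names (the model is asked to return its intermediate CoT analysis alongside the final reply):

```python
import json

# Hypothetical schema for illustration; not Welli's actual field names.
REQUIRED_FIELDS = {"analysis", "response", "escalation"}

def parse_welli_output(raw: str) -> dict:
    """Parse and validate a model reply that should be a JSON object.

    Raises ValueError if the payload is not valid JSON or is missing
    fields, so the caller can retry or fall back to a safe response.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return payload
```

Because the intermediate analysis is captured as data rather than free text, it can be logged and analyzed to understand how Welli arrived at a given response.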

AI & Humans: Supporting Noomers Together

Now that Welli is finely tuned to engage with users, what comes next? We recognized the importance of human interaction, particularly when a personal touch and feelings of accountability to another human can make all the difference. To achieve the goal of connecting humans with AI, we implemented a series of LLM-based classification models to detect whether, and what type of, human touch is required under certain circumstances. This is also incorporated into the actual response prompt (part of the CoT strategies discussed above) so Welli can give users instructions on how to connect with humans if they prefer. As of publication, all of our escalation detection models are built with Vertex AI LLMs.

At Noom, we have three critical people teams who support our users together with Welli:

  1. Clinical Specialist Team: Clinical Case Specialists will intervene when there are potential safety concerns with users.
  2. Human Coach Team: A group of trained human coaches provides motivational support and accountability guidance, ultimately helping users work towards long-term healthy goals.
  3. Support Team: A Customer Support team assists users with technical questions like app bugs or account issues.

We created a specific escalation path to each of these three teams for users to get connected.

Fig 6. Human & Welli Collaboration
  1. Clinical Specialist Escalation — Welli determines and initiates it. This is Welli’s first classification task. Here, Welli assesses the presence of safety concerns within user messages. When such concerns are identified, Welli will forward the message to our dedicated clinical specialist team. The team will conduct a comprehensive review and deliver appropriate interventions to ensure the safety and well-being of our users.
  2. Human Coach Escalation — Prompted by Welli and initiated by the user. In the instance where a user types “message coach” or a similar command, Welli will connect the user to a human coach who is trained to support more complex coaching needs. This also gives our users control over the type of support that they wish to receive.
  3. Support Escalation — Instructed by Welli and conducted by the user. The support team serves as an additional personal touchpoint. When Welli cannot fulfill a user’s needs, particularly for more technical and account-related inquiries, it will share instructions on how to contact the support team.
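The three paths above amount to a routing decision. In the sketch below the classification itself is stubbed with keyword matching so it runs standalone; in production, as described, this is an LLM classifier, and the labels and trigger phrases here are illustrative only.

```python
# Illustrative routing; the real classifier is an LLM, not keyword matching.
def classify_escalation(message: str) -> str:
    """Map a user message to one of the escalation paths (or 'none')."""
    text = message.lower()
    # Safety concerns: Welli determines and initiates the hand-off.
    if any(kw in text for kw in ("hurt myself", "chest pain", "emergency")):
        return "clinical"
    # Coaching: prompted by Welli, initiated by the user.
    if "message coach" in text:
        return "coach"
    # Technical/account issues: Welli shares contact instructions.
    if any(kw in text for kw in ("refund", "bug", "account")):
        return "support"
    # Otherwise Welli answers directly.
    return "none"
```

Separating the routing decision from response generation keeps each classifier small and independently testable.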

In the previous section, we discussed various prompt engineering methods, such as Role Setting, the CoT technique, and Few-shot prompting, which are also used extensively in these classification models. For example, we helped Welli distinguish between the human teams in the Role Setting step and provided Welli with CoT instructions to identify safety concerns and determine severity levels. Additionally, Few-shot prompting is incorporated to help Welli classify scenarios that are challenging to describe in plain text.

Measuring Welli’s performance

As Welli’s development continues to evolve, our team has created a series of tools and methods to measure Welli’s performance at multiple stages. In this section, we spotlight a few key tools from each stage to give readers a preview of how we evaluate LLMs:

Fig 7. Tools and Methods for Measuring Performance
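One simple kind of check worth sketching (purely illustrative, and not any specific tool from the figure above): an offline regression harness that runs canned user messages through the model and asserts properties of the structured output. The model call is stubbed here, and all names are hypothetical.

```python
# Illustrative offline evaluation harness; the model call is a stub.
def fake_welli(message: str) -> dict:
    """Stub standing in for a real LLM call that returns structured output."""
    return {"response": "Tap the '+' button to log a meal.", "escalation": "none"}

# Hypothetical eval cases pairing a message with expected output properties.
EVAL_CASES = [
    {"message": "How do I log a meal?",
     "must_mention": "log a meal",
     "expected_escalation": "none"},
]

def run_eval(model) -> float:
    """Return the fraction of eval cases the model passes."""
    passed = 0
    for case in EVAL_CASES:
        out = model(case["message"])
        ok = (case["must_mention"] in out["response"].lower()
              and out["escalation"] == case["expected_escalation"])
        passed += ok
    return passed / len(EVAL_CASES)
```

Running a harness like this on every prompt change turns qualitative “does it still answer correctly?” spot checks into a repeatable pass rate.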

By leveraging all the tools and methods above, we can continue to grow Welli and support our users positively at each iteration.

To be Continued…

We plan to continue sharing how Welli is doing and what changes we may make in the future, as well as providing deeper insights and lessons learned during our development. Stay tuned!

Thank you to everyone who has contributed to Welli’s growth across different aspects. Welli’s success is a reflection of collaborative effort from our dedicated teams, including but not limited to Data Science, ML Platform, Software Engineering, Coaching, Product, User Experience Research, and Design.

If you’re excited about what we’re doing at Noom, join us! Check out our open roles here or contact us directly on LinkedIn (Yifan Xia/Calvin Hopkins).
