The “Jobs to Be Done” of Conversational Systems

Sam Bobo
Speaking Artificially
7 min read · Jul 18, 2023
Generated by Bing Image Creator, powered by DALL-E 2

One framework Product Managers employ to empathize with customers and understand which features will make them want to purchase a product is “Jobs to be Done” (JTBD).

A definition pulled from BuiltIn:

Jobs to Be Done is a theory stating that customers don’t buy products, they buy the completed jobs the products help bring about. For example, someone doesn’t buy a screwdriver because of its features, they buy what the screwdriver ultimately does for them: helps assemble furniture so their home looks better.

Effectively, Product Managers are trying to understand the ultimate intended end goal of the consumer and how the product fulfills that need, not necessarily the individual features the solution comprises.

So what are the Jobs to Be Done by conversational systems? Specifically, when we examine the common jobs across today’s conversational systems and, more broadly, those that utilize natural language processing, are there any patterns?

Note: for the purposes of this analysis, we will examine non-contextualized conversational and NLP systems; these will be compared and contrasted with contextualized ones in an upcoming post.

Let’s examine the common types of systems and extract the Jobs to Be Done:

Command-and-Control — Often seen within smart assistants such as Google Assistant or Alexa, command-and-control interfaces are intended to accept requests and take immediate action.

They take short, command-like utterances from a user, such as “turn off the light!” or “tell me the weather,” and perform a specific task the system is pre-trained to do. Often they are limited to a specific scope: modifying the state of smart home devices such as lights, querying the internet for information to answer questions, or more playful tasks such as telling jokes and making small talk. These systems are also fueled by extensions from third-party developers in a broader ecosystem that invoke specific tasks, such as skipping a song in Pandora or Spotify or reading back sports news from ESPN.

The system uses speech recognition to translate the command into text after a wake-up word or phrase triggers the microphone to start listening (if applicable), natural language understanding to classify the intent and extract the appropriate parameters/slots, and audio playback to deliver a response once the action has been performed.
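
To make that pipeline concrete, below is a minimal sketch of the classify-and-extract step in Python, using toy regular-expression rules in place of a trained NLU model; the intent names, patterns, and utterances are purely illustrative.

```python
import re

# Toy NLU: map a transcribed utterance to an intent plus slot values.
# Real assistants use trained classifiers; these rules are illustrative only.
INTENT_PATTERNS = {
    "lights.set": re.compile(r"turn (?P<state>on|off) the (?P<device>\w+)"),
    "weather.get": re.compile(r"(?:tell me|what's) the weather(?: in (?P<city>[\w ]+))?"),
}

def understand(utterance: str):
    """Classify the intent and extract slots from a command-like utterance."""
    text = utterance.lower().strip("!?. ")
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots = {k: v for k, v in match.groupdict().items() if v}
            return intent, slots
    return "fallback", {}

print(understand("Turn off the light!"))  # ('lights.set', {'state': 'off', 'device': 'light'})
print(understand("Tell me the weather"))  # ('weather.get', {})
```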

Command-and-control systems center around one primary job: hands-free convenience and the manipulation of devices. Imagine someone at home carrying a basket of laundry into a dark room to put the clothes away. What a hassle it would be to put the basket down, turn on the light, and pick the basket back up. Could those steps be removed by simply asking a smart system to turn the lights on for you? In an alternative scenario, imagine a family sitting around a dinner table discussing a particular topic. A point is raised that has no answer and requires a quick internet search. One member calls out to an assistant and, without leaving the table, receives the answer. Isn’t that convenience?

Self-Service — Self-service engines are among the most frequently employed use cases for conversational systems. They are intended to handle common tasks without a live person highly trained on the subject matter, reserving those agents for long-tail or more complex requests and subsequently cutting costs.

Self-service systems are found on voice channels such as interactive voice response (IVR) systems, and as chatbots within a variety of applications across e-commerce, banking, learning management systems, and much more. The intended goal is to contain the person within the system and accomplish a specific task: opening a credit card account, checking the status of a package, or guiding them through a process.

These systems rely on a conversational interface powered by a state system or dialog engine, speech processing, natural language understanding for intent classification, and, most of all, external functions. These functions can be queries to a database to retrieve information or external API calls to backend systems to perform the requested task.
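
As a rough illustration of how a dialog engine ties a classified intent to one of those external functions, consider this sketch of a single turn; the order-status endpoint, intent name, and slot names are hypothetical stand-ins for a real backend.

```python
import requests

def handle_turn(state: dict, intent: str, slots: dict) -> tuple[dict, str]:
    """One state-machine step: collect the required slot, then call the backend."""
    if intent == "order.status":
        order_id = slots.get("order_id") or state.get("order_id")
        if not order_id:
            # Missing slot: stay in the dialog and prompt for it.
            return {"awaiting": "order_id"}, "Sure, what is your order number?"
        # External function: query a (hypothetical) backend API for the status.
        resp = requests.get(f"https://api.example.com/orders/{order_id}", timeout=5)
        status = resp.json().get("status", "unknown")
        return {}, f"Order {order_id} is currently: {status}."
    return state, "I can help with order status. What would you like to do?"
```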

The job for self-service systems does not serve the actual end users but rather the organizations employing the system. A common trend in customer service is deflection toward chat channels and the imposition of fees for using live agents for tasks that can be done on one’s own. For anyone forced through such a system (especially when companies go to great lengths to hide their phone numbers), isn’t that frustrating? For the organizations themselves, the job is to reduce the number of inquiries and cut operational expenditures.

Question and Answer — Commonly seen in self-service chatbots, where a subset of the inquiries entered into the system consists of short-tail, frequently asked questions.

Question and answer use cases typically involve using a natural language understanding engine to classify the incoming request, whether by voice (post-transcription) or text, querying a backend FAQ database, and passing the answer back as the response.
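
A minimal sketch of that retrieval step appears below, using TF-IDF cosine similarity as a stand-in for the trained intent classifiers or embedding models a production FAQ bot would use; the FAQ entries and escalation threshold are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative FAQ "database": question -> canned answer.
faq = {
    "What are your business hours?": "We are open 9am-5pm, Monday through Friday.",
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "Where is my order?": "You can track your order under Account > Orders.",
}

questions = list(faq.keys())
vectorizer = TfidfVectorizer()
faq_matrix = vectorizer.fit_transform(questions)

def answer(user_question: str, threshold: float = 0.3) -> str:
    """Return the closest FAQ answer, or escalate when nothing matches well."""
    scores = cosine_similarity(vectorizer.transform([user_question]), faq_matrix)[0]
    best = scores.argmax()
    if scores[best] < threshold:
        return "Let me connect you with an agent."  # long-tail: hand off
    return faq[questions[best]]

print(answer("how can i reset my password"))
```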

The job for Q&A use cases is to deflect common questions away from a modality reserved for answering complex and nuanced situations via a skilled agent, and to encourage users to find those answers on a web page or another source.

Extraction and Analytics — often overlooked, this type of system aims to take natural language inputs and convert them into quantitative outputs for analytical purposes.

Common extraction services such as sentiment analysis and tone analysis utilize sophisticated natural language models built on psychological research (such as hume.ai) to extract these features and assign a numerical value to the result. These numerical values can be confidence scores, indicative of the probability that the estimate is correct as evaluated by the system, or grades of intensity, such as how positive or how negative a response might be.

For example, an application might extract sentiment scores from online customer reviews of a particular product or product line to power a forecasting model of future sales, or extract features from movie or TV show scripts to create a cohort analysis of a particular set of viewers to power recommendations.
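
As a rough sketch of that review-scoring idea, the snippet below uses the open-source Hugging Face transformers sentiment pipeline (one option among many; services like hume.ai expose similar capabilities via API) to turn each review into a signed score a downstream model could consume; the reviews are invented.

```python
from transformers import pipeline

# Off-the-shelf sentiment classifier; returns a label plus a confidence score.
classifier = pipeline("sentiment-analysis")

reviews = [
    "This vacuum is fantastic and cut my cleaning time in half.",
    "Broke after two weeks. Very disappointed.",
]

for review in reviews:
    result = classifier(review)[0]  # e.g. {'label': 'NEGATIVE', 'score': 0.99}
    # Crude but common: fold label and confidence into one signed intensity.
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    print(f"{signed:+.2f}  {review}")
```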

The true job of these systems is translation: taking a non-quantifiable format such as natural language text and translating it into a quantifiable value to act upon with a particular set of logic criteria or an algorithm. When combined with other attributes and processed within a machine learning model, this data can prove extremely valuable for crossing the barrier between conversational AI and more traditional machine learning.

Biometric Confirmation — Advancements in biometric analysis have enabled a new form of identity confirmation through voice.

Biometric systems create an imprint of your voice and can detect deepfakes or impersonators to flag fraud accordingly. (I personally am not a biometrics expert, so I will speak briefly on the subject.)

If you have encountered systems that create a vocal password with a phrase like “At {company}, my voice is my password,” then you have already interacted with these types of systems.
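
Conceptually (and, again, I am not a biometrics expert), the matching step reduces to comparing voiceprint embeddings. The sketch below assumes an upstream speaker-embedding model has already turned each audio clip into a vector; the vectors here are random toys, and the threshold is illustrative.

```python
import numpy as np

def is_same_speaker(enrolled: np.ndarray, attempt: np.ndarray,
                    threshold: float = 0.75) -> bool:
    """Accept if cosine similarity between voiceprint embeddings clears a threshold."""
    cos = float(np.dot(enrolled, attempt) /
                (np.linalg.norm(enrolled) * np.linalg.norm(attempt)))
    return cos >= threshold

# Toy demo: real embeddings would come from a trained speaker model.
rng = np.random.default_rng(0)
voiceprint = rng.normal(size=256)                           # stored enrollment
same_person = voiceprint + rng.normal(scale=0.1, size=256)  # same voice, new sample
impostor = rng.normal(size=256)                             # different voice

print(is_same_speaker(voiceprint, same_person))  # True
print(is_same_speaker(voiceprint, impostor))     # False (with high probability)
```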

Biometric use cases are among the simplest to match within the Jobs to Be Done framework, as the job is extremely straightforward: combat the fraud enabled by evolving technology while continuing to reduce the friction of authenticating oneself to access one’s own confidential, highly sensitive materials.

Content Creation — Powered by Generative AI algorithms, these systems take natural language (and other unstructured data) as prompts and output computer-generated content in the form of text, images, code, or video.

The systems use large language models (LLMs), models with billions of parameters trained on enormous corpora of tokens, or word fragments, with underlying transformer architectures (oversimplifying here), to translate from one modality into another using a natural language instruction or inquiry. Effectively, LLMs are broadly trained to amass a large corpus of “knowledge” and can maintain a level of “memory” of the initial prompt and previously generated content within a response. These systems then predict the next words of the sentence being generated, one at a time, to produce the end result.
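
To see that next-word prediction in action, the tiny snippet below asks the small open-source GPT-2 model (a stand-in for far larger commercial LLMs) for its most likely next tokens via the Hugging Face transformers library.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The main job of a conversational assistant is to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every vocabulary token, per position

# The five tokens the model considers most likely to come next.
top = torch.topk(logits[0, -1], k=5)
print([tokenizer.decode(t) for t in top.indices])
```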

The recent surge of this modality has permeated all aspects of productivity applications, including generating first drafts of common templates (cover letters, marketing copy, home pre-approval letters), writing code to aid software engineering or data analytics, generating images for book illustrations, and even producing training content with a virtual avatar.

Most notably, content creation systems’ job is to resolve the blank canvas problem, promote agility and iteration, and reduce the time to proof of concept, thereby saving time and money.

The point to be made here about Artificial Intelligence in the Jobs-to-be-Done framework is that the overall “Job” of AI is augmenting human intelligence and removing lower-level work, allowing us as a species to focus on critical thinking, creativity, and furthering society as a whole. Summarizing the jobs uncovered above:

  1. Hands-free convenience (Command-and-Control)
  2. Deflection and cost reduction (Self-Service)
  3. Focus on complex tasks (Question & Answer)
  4. Quantification of the Human Language (Extraction)
  5. Self-Authentication / Reduction of Friction (Biometrics)
  6. Productivity (Content Creation)

To be clear, the above focuses on natural language based services, NOT quantitative machine learning, which would further expand the list.

The thesis behind Artificial Intelligence stemmed from solving large-scale human problems by training a 24/7, hyper-intelligent, never-aging, continuously learning subject matter expert. Our own human curiosity has led us to explore artificial general intelligence (AGI), but I leave that analysis to the reader and a future blog post. Technological shifts change the paradigm of operation. While technology (not limited to software) removes some jobs, it creates others and advances society as a whole.

Sam Bobo
Speaking Artificially

Product Manager of Artificial Intelligence, Conversational AI, and Enterprise Transformation | Former IBM Watson | https://www.linkedin.com/in/sambobo/