Navigating the GenAI Landscape: Selecting Ideal Use Cases for Your Organization
In the midst of GenAI’s accelerating digital transformation, many organizations are actively identifying and prioritizing specific use cases for integrating Generative AI into their applications. These use cases are carefully collected and evaluated to determine their potential return on investment (ROI) or to gauge the possible savings in full-time equivalent (FTE) resources. The urgency to deploy GenAI applications is palpable across various sectors, and the process of selecting the most suitable use case for experimentation or implementation is a common challenge. In this blog post, I will discuss some common approaches to this selection process and offer insights on how to refine these methods to enhance the success rate of your GenAI initiatives.
Current approaches
Typically, teams tasked with the integration of GenAI begin by conducting interviews with various teams and departments, seeking to pinpoint areas where GenAI could yield the most significant savings in human capital. This is done by calculating the potential reduction in FTEs. While this approach seems straightforward and promises to identify the projects with the best ROI, it can be deceptive, leading to suboptimal choices and, in some cases, unsuccessful GenAI trials.
Consider the following illustrative example of what a use case collection might resemble (the figures are hypothetical):

| Use case | Mode | Estimated FTE savings |
|---|---|---|
| Drafting a marketing campaign | Co-pilot (drafting and brainstorming aid) | 0.5 FTE |
| Automating financial report generation | Full automation | 3 FTEs |
At first glance, you might be inclined to prioritize the second use case, but by the end of this post you will see why the first use case should take priority.
Identifying the gaps
GenAI diverges from traditional software development in that the outputs of GenAI applications are non-deterministic [1] and occasionally wrong, a failure mode we refer to as hallucination. This distinction is crucial when considering the integration of GenAI into your applications. One question is always worth asking: if a customer or user is asked about incorporating GenAI into their application and their response is:
Any help from GenAI is better than none
then you have likely identified an ideal GenAI use case. Such scenarios include drafting correspondence, creating a project plan, or assisting in risk identification. Broadly speaking, these instances can be categorized as “GenAI as a Co-Pilot,” where the primary objective is to aid humans in completing tasks. A commonality among these examples is that if the GenAI’s output is flawed or incomplete, the user can refine and build upon it; in rare cases, the entire output might simply be discarded.
Conversely, if the customer or user states:
I want GenAI to automate this task for me
this type of request should trigger a series of follow-up questions that could significantly impact your FTE or ROI calculations. Consider what happens if the GenAI performs the task correctly only 80% of the time. For instance, if you’re developing an application that generates financial reports by analyzing data from various sources, you must recognize that Retrieval-Augmented Generation (RAG) applications can overlook critical information during the retrieval phase. If there is a risk that the report could contain incorrect figures or misclassified data, how valuable is the application? Is a partially correct output better than none? Here, the resounding answer is no. Such an application could not only be deemed useless but could also erode trust in AI systems and force users to redo the work, negating any time initially saved.

Mitigation strategies exist, such as implementing a “Show your work” feature that allows users to trace the AI’s reasoning or the steps taken to generate the final output. However, this may not always be feasible if the user lacks the skills to validate those steps.
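As a rough illustration of what a “Show your work” feature can look like in practice, here is a minimal sketch of a RAG answer that cites its sources. It assumes the official openai Python client; the retrieve() helper, the model name, and the snippet contents are placeholders, not part of any specific product.

```python
# Minimal "show your work" sketch for a RAG application, assuming the
# official openai Python client. retrieve() stands in for a real
# vector-store lookup (e.g., FAISS or pgvector); its snippets are fake.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def retrieve(query: str, k: int = 3) -> list[str]:
    # Placeholder retrieval step; swap in your own vector search.
    return [
        "Q3 revenue was $4.2M, up 8% quarter over quarter.",
        "Marketing spend in Q3 totaled $0.9M.",
        "Headcount grew from 40 to 46 during Q3.",
    ][:k]

def answer_with_citations(question: str) -> str:
    snippets = retrieve(question)
    # Number each snippet so the model can point back at its sources.
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using ONLY the numbered snippets below.\n"
        "After every figure or claim, cite the supporting snippet, "
        "e.g. [2]. If the snippets do not contain the answer, say so.\n\n"
        f"Snippets:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Citations let a reviewer spot-check each figure against its source, though, as noted above, that only helps when the reviewer is qualified to judge the sources.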
A refined approach
The initial step remains the same: gather a comprehensive list of use cases, as illustrated in the previous example.
Next, we must assess the expected performance of the LLM application for each specific use case. Will it function correctly 60% or 90% of the time? Prompt engineering and rapid prototyping should be employed to test this, and the outcomes should be critically evaluated.
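Here is a minimal sketch of what such a rapid-prototyping probe might look like, assuming a small hand-labeled test set. The generate() stub is a placeholder for your LLM call, and keyword matching stands in for whatever acceptance check fits your use case.

```python
# Quick accuracy probe for a candidate prompt. The test set, the
# generate() stub, and the keyword check are all placeholders for
# your own application and evaluation criteria.
test_cases = [
    # (prompt sent to the application, keywords a good answer must hit)
    ("Summarize the Q3 revenue drivers", ["revenue", "Q3"]),
    ("List the top three risks for the campaign", ["risk"]),
]

def generate(prompt: str) -> str:
    # Placeholder: swap in the real LLM call with your prompt template.
    return ""

def is_acceptable(output: str, keywords: list[str]) -> bool:
    # Crude check; production evaluations often use human review or
    # an LLM-as-judge instead of keyword matching.
    return all(k.lower() in output.lower() for k in keywords)

passed = sum(
    is_acceptable(generate(prompt), keywords)
    for prompt, keywords in test_cases
)
print(f"Observed accuracy: {passed / len(test_cases):.0%}")
```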
Finally, informed by these results, we introduce a new metric: the adjusted FTE/ROI. As the scenarios above demonstrate, the LLM’s performance directly influences its value: if users must validate the results, what does that imply for the purported FTE savings? Here is how an adjusted table might look (again with illustrative figures):

| Use case | Estimated FTE savings | Expected accuracy | Adjusted FTE savings |
|---|---|---|---|
| Drafting a marketing campaign | 0.5 FTE | 60% | 0.5 FTE |
| Automating financial report generation | 3 FTEs | 80% | 0.6 FTE |
Note that although the anticipated accuracy of the model for the “Drafting a Marketing Campaign” use case is only 60%, its adjusted FTE is unchanged. This use case was identified as one that gives users a starting point or assists with brainstorming, so a 60% accuracy level is satisfactory for that purpose.
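If you want to mechanize this adjustment, here is one possible heuristic as a sketch. The 50% usefulness threshold and the validation overhead are assumptions chosen to match the illustrative table above, not an established formula.

```python
# Illustrative heuristic for discounting FTE savings by expected
# accuracy. The threshold and overhead values are assumptions for
# this sketch, not an established formula.

def adjusted_fte(raw_fte: float, accuracy: float, mode: str,
                 validation_overhead: float = 0.6) -> float:
    if mode == "co-pilot":
        # A draft the user refines keeps its value as long as it is
        # useful more often than not.
        return raw_fte if accuracy >= 0.5 else 0.0
    # Automation: only correct runs save time, and humans still have
    # to validate (and sometimes redo) every output.
    return max(raw_fte * (accuracy - validation_overhead), 0.0)

print(adjusted_fte(0.5, 0.60, "co-pilot"))              # 0.5 FTE, unchanged
print(round(adjusted_fte(3.0, 0.80, "automation"), 2))  # 3 * (0.8 - 0.6) = 0.6 FTE
```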
I hope this is helpful in deciding which use case to pick for GenAI in your organization! Happy coding.
[1] There are many efforts to make LLM outputs more deterministic, such as OpenAI function calling and other prompting techniques.