Use cases and challenges of Generative AI for text data

konrad bachusz
Credera Engineering
8 min readNov 1, 2023
Photo by Lukas: https://www.pexels.com/photo/person-wearing-black-watch-holding-macbook-pro-574080/

The aim of this article is to give you a better understanding of Generative AI technology in the context of Natural Language Processing (NLP). We’ll define what we mean by Generative AI and talk about some common use cases and challenges.

Before we delve into the topic, let’s define what we mean by Generative AI.

What is Generative AI?

Generative AI is a newer form of artificial intelligence and is contained within a couple of AI domain areas. In order for us to understand how they relate to each other, let’s look at some definitions.

  • Artificial Intelligence (AI): The theory and development of computer systems that are able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
  • Machine Learning: Application of AI that enables systems to learn and improve from experience without being explicitly programmed.
  • Deep Learning: A type of machine learning based on artificial neural networks in which multiple layers of processing are used to extract progressively higher-level features from data.
  • Generative AI: A type of artificial intelligence technology that can produce various types of content, including text, imagery, audio, and synthetic data. It’s sometimes referred to as Gen AI. This article is specifically focused on applications related to text data.

The diagram below gives you an intuitive understanding of how these areas relate to one another.

How Generative AI relates to AI as a whole

Gen AI has grown in popularity after recent advancements in the Natural Language Processing (NLP) space and subsequent releases of interactive tools such as DALL-E (used for image generation) and ChatGPT (used for text generation). The below timeline illustrates some of the most recent AI innovations. It’s worth mentioning that the field of AI is much older, with the earliest neural networks being developed in the 1940s.

The timeline of recent developments within AI

Given the popularity of LLMs and tools like ChatGPT, Github, Copilot, and Midjourney, Gen AI is currently considered to be in the Peak of Inflated Expectations stage of the Gartner Hype Cycle for Emerging Technologies.

Peak of Inflated Expectations

It’s important to emphasise that not every AI solution is (or should be) using Generative AI under the hood. Gen AI is simply one of the tools within the broader field of AI and data science.

Use cases of Generative AI

“A model is only good where it brings value.”

In the NLP space, LLMs and Gen AI are mostly relevant to organisations that deal with huge amounts of text data. Examples of such organisations are those operating in retail and investment banking, insurance, energy sector, e-commerce, healthcare, the legal field, and so forth.

Some of the most common use cases are listed below:

  • Search
  • Text generation
  • Document processing
  • Document understanding
  • Generating product descriptions
  • Text summarisation
  • Policy or recommendation analysis
  • Translations
  • Personalised marketing
  • Customer service and support
  • Sentiment / Intent recognition
  • Text editing
  • Code generation
  • Code auditing
  • Code documentation generation
  • Knowledge bots
  • Troubleshooting incidents
  • Question answering
  • Speeding up operating processes and increasing staff efficacy
  • Running applications to perform the above solutions locally — e.g. on planes, container ships etc.

Challenges with Generative AI

As with any new technology, Gen AI introduces some challenges.

Hallucinations:

Hallucinations occur when an LLM generates false information that seems plausible. Hallucinations can be deviations from external facts, contextual logic, or both. Below is an example of a hallucinated response from a GPT-3.5 model. The description of AWS Jumpstart wasn’t included in the model’s training data, therefore the model response is incorrect. For someone unfamiliar with Jumpstart, this could easily appear plausible.

Example of LLM hallucination

One common approach to prevent hallucinations is to use Retrieval Augmented Generation (RAG). It’s a technique that can provide more accurate results to queries than a generative large language model on its own because RAG uses knowledge and additional knowledge by the user, as well as data already contained in the LLM. With RAG, it’s possible to see which document the model used to generate the answer. If the answer is unknown, we can make the model say so by a properly engineered prompt.

Model bias and toxicity:

Given that the foundation models are often trained on publicly available text from the internet, they may contain a lot of toxic assumptions about gender, race, religion, age etc. When such models are exposed to the public, they can have a significantly negative impact on individuals and present reputational risks for companies. In order to prevent that curation of training and fine-tuning, data is often required.

Reliability:

There are a couple of challenges in terms of the reliability of LLMs. First of all, prompt engineering methodologies often have to change depending on the model used. For example, GPT-4 and AWS Claude may not give the most accurate response despite using the same prompt template. In addition, given the stochastic nature of neural networks, there’s no guarantee that we’ll get the same response each time given the same prompt. Furthermore, scaling and latency are additional challenges in production, as we often need to accommodate a large number of users at the same time and models need some time to produce the outputs. Moreover, in the LLM world, there is always data drift. Every language model has a cut-off date corresponding to the up-to-dateness of the text used for training. For example, GPT-4 has a cut-off date of September 2021, meaning that the model wouldn’t be aware of the events that happened after that date unless we explicitly fine-tune it or provide it in the prompt context.

Safety, privacy, and intellectual property concerns:

This is a risk that many individuals and organisations have. When training or fine-tuning the model, as well as using a third-party Gen AI API, there is a possibility that the sensitive data may be exposed to a model that we don’t have control of. In order to prevent this, it’s recommended to involve tech, PR, and legal when starting new Gen AI initiatives to discuss these risks. In addition, it is important to curate the training dataset to make sure it doesn’t contain sensitive data like personal information, API keys, passwords, or secrets. Furthermore, it’s possible to use an open-source LMM model and manage it within your IT infrastructure or use a private LLM third-party provider to make sure that your data is not used to train someone else’s model.

Accuracy and testing:

The standard machine learning metrics like accuracy, precision, recall, or f1-score are no longer relevant when it comes to testing LLMs. Objective LMM model testing and explainability are difficult and not yet solved. There are emerging tools like HELM that help us evaluate the models on metrics like fairness, bias, robustness, toxicity etc. However, it’s often difficult to use HELM to test domain-specific models (e.g. models focused on technical documentation of aeroplanes) and it doesn’t cover testing solutions with RAG.

Operationalising Gen AI:

Building a production quality solution that involves Gen AI is challenging given the recency of this technology and the fact that there are not many best practices developed yet. Data governance standards, IT infrastructure, and data engineering resources have to be accounted for when building such solutions. Large cloud vendors like AWS, GCP, and Azure are trying their best to release new Gen AI offerings, like AWS Bedrock or GCP’s Generative AI Studio, to reduce those challenges for their clients.

New security risks:

There are new cyber security risks associated specifically with Generative AI. These risks include prompt injections, jailbreaking (including simulations and role-playing), and prompt leaking. Some examples are shown below.

Example of prompt injection
Example of prompt leaking

Using appropriate guardrails or adversarial prompt detectors are some of the current mitigation strategies against the above attacks.

Limited reasoning, planning, and mathematical capabilities of models:

The current state-of-the-art LMM models are generally struggling with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. Below is an example of one such case.

ChatGPT simple math calculation mistake

The research done in the academic and private sectors is trying to address these limitations.

The pace of development in the AI field:

Both within the academic field and private sector, the area of Generative AI is evolving very rapidly. New, larger LMMs are being released frequently and organisations are struggling to decide on the best solution for their problems given the ever-changing landscape. At the same time, hiring experts in the field is a challenge given the recency of Gen AI.

Costs and sustainability:

The process of training new foundation models, fine-tuning them, or hosting the associated cloud infrastructure needed to allow consumers to use Generative AI solutions is expensive and has an environmental impact in terms of carbon footprint. For example, GPT-4 was trained on over 16,000 A100 GPUs, and the cost of training was more than $100 million. In practice, most organisations won’t train foundation models from scratch, but this is still something to keep in mind when fine-tuning and making inferences.

Managing expectations:

Given the stage of the hype cycle that Gen AI is in right now, it is important to be realistic about what this technology can and can’t do at this moment in time. It’s important to communicate this to relevant stakeholders in order to avoid future disappointment and failed projects.

Conclusion

We can see that Generative AI has many interesting use cases as well as challenges. To wrap up, it is important to note that Generative AI does not present a solution to all technology problems. Even within the NLP-specific domain, we have some LMMs that perform better on certain tasks than others.

When beginning new Gen AI initiatives, it’s important to start by understanding what business value the project would bring. If you are considering starting a new Generative AI initiative but find it hard to make sense of the ever-changing AI landscape, we at Credera have vast experience in helping our clients to generate enterprise value from their data by supporting them with selection, development, and operationalisation of insights. We do this by enabling machine learning techniques and automating their integration within the business. Our three focus areas are aimed at supporting clients with their end-to-end ML project lifecycle.

These areas include:

  • AI Strategy, Use Case & Op-Model Development
  • Rapid Prototyping / Model Building
  • ML Engineering & Operationalisation
Credera's AI Focus Areas

To find out more, visit our Credera AI website page.

Got a question?

Please get in touch to speak to a member of our team.

--

--

konrad bachusz
Credera Engineering

Senior Data Engineer @ Credera. I’m passionate about all things to do with AI, data and analytics 📈