Supercharge your Apps with AI

David Übelacker
6 min read · Nov 15, 2023


In today’s rapidly evolving technological landscape, the integration of Large Language Models (LLMs) into various applications has become increasingly popular. These models, such as GPT-3 and GPT-4 from OpenAI, Turing-NLG from Microsoft, LaMDA from Google, and Llama 2.0 from Meta, possess the remarkable ability to comprehend and generate human-like text. In this article, I will focus on integrating OpenAI’s GPT into Java applications using Spring AI, an upcoming framework designed to simplify the development of AI-powered applications.


You can easily integrate LLMs into your app by simply extending the user’s input with the right prompt.

Large Language Models (LLMs)

Large Language Models, or LLMs for short, are at the forefront of natural language processing technology. These models are known for their general-purpose language understanding and generation capabilities. Here are some notable LLMs:

  • GPT-3 and GPT-4 (OpenAI)
  • Turing-NLG (Microsoft)
  • LaMDA (Google)
  • Llama 2.0 (Meta)

Choosing the Right Prompt

Before diving into the integration process, it’s crucial to understand the significance of choosing the right prompt when working with LLMs. The choice of prompt plays a pivotal role in the accuracy and relevance of the model’s responses. Small adjustments to the prompt can lead to entirely different results, making it essential to fine-tune your input.

OpenAI API — A Glimpse

The OpenAI API is the gateway to harnessing the power of GPT-3 and GPT-4 in your applications. It operates in a stateless manner, meaning it does not retain any conversation history. Each interaction requires sending the entire conversation context, and it expects a prompt as input, returning a model-generated completion as output.

Token Limitations

Understanding token limitations is essential when working with the OpenAI API. For GPT-3.5, you have a limit of 16,000 tokens, while GPT-4 offers a more extensive 32,000 tokens. In this context, one token roughly translates to four characters, and 100 tokens equate to approximately 75 words. You can check the token count of your text using OpenAI’s tokenizer.
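As a quick sanity check before sending a request, you can turn the rule of thumb above into a rough estimate in plain Java (this is only an approximation; OpenAI’s tokenizer gives the exact count):

```java
public class TokenEstimator {

    // Rule of thumb from OpenAI: one token is roughly four characters of English text.
    static final double CHARS_PER_TOKEN = 4.0;

    // Estimates how many tokens a text will consume in the prompt.
    static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / CHARS_PER_TOKEN);
    }

    // Checks whether a prompt is likely to fit into a model's context window.
    static boolean fitsContext(String text, int contextLimit) {
        return estimateTokens(text) <= contextLimit;
    }

    public static void main(String[] args) {
        String prompt = "Translate the following text into German: Hello, world!";
        System.out.println(estimateTokens(prompt));     // rough token estimate
        System.out.println(fitsContext(prompt, 16000)); // fits GPT-3.5's 16k window
    }
}
```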

Challenges and Pitfalls

Working with LLMs, including GPT-3 and GPT-4, presents its own set of challenges and pitfalls:

  • Hallucination: The model may generate inaccurate or fictitious information.
  • Mistakes: It can make factual errors or misinterpret the context.
  • Context Size Limitations: The model’s understanding is limited by the context provided.
  • Not Up to Date: It may not possess the latest information.
  • Prompt Sensitivity: Small changes in the prompt can drastically alter responses.
  • Non-Deterministic: The model’s responses can vary between requests.
  • Prompt Injection: Injecting malicious prompts can lead to undesirable results.
  • Privacy Concerns: Handling sensitive data requires careful consideration.

Spring AI: Simplifying Integration

Spring AI is a project that aims to streamline the development of applications incorporating artificial intelligence functionality. By leveraging Spring AI, you can seamlessly integrate OpenAI’s GPT into your Java applications, minimizing unnecessary complexity.

The project is currently in an experimental phase, but I have strong confidence that it won’t stay in this stage for long! 😉

Hello World with Spring AI

To get started, let’s take a look at a simple “Hello World” example of integrating GPT-4 with Spring AI:


Add the following repository and dependency to your pom.xml (or the equivalent to your build.gradle). Since Spring AI is still experimental, the exact coordinates may change; at the time of writing they looked like this:

<repository>
  <id>spring-snapshots</id>
  <name>Spring Snapshots</name>
  <url>https://repo.spring.io/snapshot</url>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>

<dependency>
  <groupId>org.springframework.experimental.ai</groupId>
  <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
  <version>0.7.1-SNAPSHOT</version>
</dependency>


To begin, you must get an API key for the OpenAI API from the following source:

You can then pass that API key to Spring via an application property:

spring.ai.openai.api-key=<your-api-key>

or via an environment variable:

SPRING_AI_OPENAI_API_KEY=<your-api-key>



Now, you can utilize OpenAI and easily implement your own translator, for instance:
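A minimal sketch of such a translator, assuming the experimental AiClient API that Spring AI exposed at the time of writing (class and method names may have changed in later releases):

```java
import org.springframework.ai.client.AiClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TranslatorController {

    private final AiClient aiClient;

    // Spring Boot auto-configures the AiClient from the spring.ai.openai.* properties.
    public TranslatorController(AiClient aiClient) {
        this.aiClient = aiClient;
    }

    // Wraps the user's text in a translation prompt and lets the LLM do the work.
    @PostMapping("/translate")
    public String translate(@RequestParam String targetLanguage, @RequestBody String text) {
        String prompt = "Translate the following text into " + targetLanguage
                + ". Reply with the translation only:\n\n" + text;
        return aiClient.generate(prompt);
    }
}
```

Note how the controller does nothing more than extend the user’s input with the right prompt, exactly as described at the beginning of this article.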

Roles and Personas

When interacting with ChatGPT, you can assign roles and personas to shape the responses. A message can have one of the following roles:

  • User
  • Assistant
  • System
  • Function

By assigning roles, you can customize the AI’s behavior and responses to align with your application’s requirements.

Using this prompt, for example, you can create your own Yoda Assistant:

{"role": "system", "content": "You are yoda from star wars.
You should always reply like yoda would."},
{"role": "user", "content": "hi, who are you?"},
{"role": "assistant", "content": "Ah, the Force keeps me well it does ..."},
{"role": "user", "content": "tell me a joke"}

I’ve published this example implementation on GitHub. Feel free to use it, extend it and experiment with Spring AI:

Prompt Stuffing

How can you make effective use of a Large Language Model (LLM) when you need information or insights on a topic it lacks knowledge of or when its data is outdated? One approach to tackle this challenge is through a technique known as “prompt stuffing.” Prompt stuffing involves incorporating the entire dataset into the prompt itself. While this method is straightforward, it’s most effective when dealing with relatively small datasets.

When asking about the current American president, GPT will respond by stating that it doesn’t possess this information.

However, by including an excerpt from Wikipedia with the pertinent information in the prompt, you can obtain the accurate response.

{"role": "system", "content": "Consider the following information when answering
questions: Joe Biden is the 46th and current president of the United States,
having assumed office at noon EST on January 20, 2021."},
{"role": "user", "content": "Who is the current president of the USA?"},
{"role": "assistant", "content": "Joe Biden ..."},

See also:

Retrieval Augmented Generation (RAG)

The challenge associated with prompt stuffing, as previously discussed, lies in the restriction imposed by the context size. While this approach proves effective for handling relatively small datasets, it becomes impractical when dealing with more extensive sources of information such as entire books, documentation, or article databases.

This is where RAG comes into play. RAG is a powerful concept that combines the capabilities of information retrieval with text generation. It allows the model to provide more contextually relevant responses by accessing external knowledge sources.

To put it in straightforward terms, imagine the data you want to use is divided into smaller chunks and stored within a vector database. Vector databases offer a unique advantage: they allow you to search for text similarity.

So, when you need to find relevant information, you can perform a search within the vector database to locate text pieces that closely resemble the user’s input. These similar text segments are highly likely to contain pertinent information, and you can then feed them into the Large Language Model (LLM) through the prompt for further analysis or generation.
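The similarity search at the heart of this approach can be sketched in plain Java: each text chunk is stored with an embedding vector, and the chunks whose embeddings are most similar (by cosine similarity) to the query embedding are stuffed into the prompt. In a real application the embeddings would come from an embedding model and the search from a vector database; the toy vectors below are made up for illustration:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SimilaritySearch {

    // Cosine similarity: 1.0 means same direction, values near 0 mean unrelated.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the topK stored chunks ranked by similarity to the query embedding.
    static List<String> mostSimilar(Map<String, double[]> chunks, double[] query, int topK) {
        return chunks.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(e.getValue(), query)))
                .limit(topK)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Toy embeddings; a real system would get these from an embedding model.
        Map<String, double[]> chunks = Map.of(
                "Joe Biden is the 46th president of the USA.", new double[]{0.9, 0.1, 0.0},
                "Yoda is a character from Star Wars.", new double[]{0.1, 0.9, 0.2});
        double[] queryEmbedding = {0.8, 0.2, 0.1}; // embedding of "Who is the US president?"
        System.out.println(mostSimilar(chunks, queryEmbedding, 1).get(0));
    }
}
```

The best-matching chunks are then prepended to the prompt as system context, just like in the prompt-stuffing example above, but selected dynamically per request.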

See also:


Function Calling

Functions enable you to extend the capabilities of ChatGPT by integrating custom logic into the conversation. This flexibility allows you to create dynamic and interactive applications.

An illustrative example in this context involves providing the Large Language Model (LLM) with a function that allows it to inquire about the weather for a specific location and seamlessly incorporate this information into its response. For further information and detailed instructions, you can refer to the following resource:
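As a sketch, such a function is described to the OpenAI API as a JSON schema; the model then decides when to call it and with which arguments, and your application executes it and feeds the result back into the conversation (the function name and parameters below are illustrative):

```json
{
  "name": "get_current_weather",
  "description": "Get the current weather for a given location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "City name, e.g. Zurich"
      }
    },
    "required": ["location"]
  }
}
```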

Other Programming Languages

When it comes to integrating a Large Language Model (LLM) into your application, you can always access the API directly, even if it isn’t a Java-based application. For Python and JavaScript, one of the most widely used frameworks for seamless LLM integration is Langchain:

More Resources

If you are looking for more, here are two good talks about this topic:



David Übelacker

Fullstack Developer & Software Architect | In love with Web & Mobile Applications