Creating a Locally-Running LLM Chatbot Using Java and Spring Boot

Rizky Satrio
Published in Javarevisited

This article explains how to create a chatbot that interacts with a pre-trained LLM model. The solution runs locally on a desktop or laptop, with no internet connection required, and is implemented in Java with Spring Boot. Hopefully it will be of interest to anyone looking to build something similar, since there are few tutorials on how to do this (most of them use Python).

Background

In recent times, a number of applications have appeared that can run LLM chatbots on a user’s local machine, Ollama being a well-known example. Most of the tutorials and tooling in this space, however, are built on Python. This article aims to demonstrate that Java is feasible for the same purpose. After reading this article and trying out the provided code, you should be able to judge for yourself whether Java is viable for running a locally-running LLM chatbot.

Problem Statement

In short, this article addresses the following question:

“How do I run an LLM chatbot on my local machine using Java?”

Solution

First of all, we need to decide how to interact with the LLM model. Following the example of other solutions such as Ollama, we can use llama.cpp (https://github.com/ggerganov/llama.cpp) for inference. Fortunately, llama.cpp has already been ported to Java (https://github.com/kherud/java-llama.cpp), and that library will handle all of our interactions with the LLM model.

In summary, the main external dependencies we will use are:

  • Spring Boot 3.3.2
  • Java-Llama 3.2.1
  • Java 17+
  • A GGUF-format LLM model

The complete code for this solution is available on GitHub: https://github.com/rsatrio/llm-chatbot-springboot .
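If you are setting up the project yourself, the java-llama.cpp bindings are published to Maven Central. A dependency declaration along these lines should pull in version 3.2.1 (double-check the exact coordinates against the java-llama.cpp README):

<dependency>
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>3.2.1</version>
</dependency>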

Let’s discuss several points of the code. First, the LlamaModelComponent. This component defines the LlamaModel we use. I usually use TinyLlama (https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) to test the code.

import de.kherud.llama.LlamaModel;
import de.kherud.llama.ModelParameters;
import de.kherud.llama.args.LogFormat;

import jakarta.annotation.PostConstruct;
import jakarta.annotation.PreDestroy;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class LlamaModelComponent {

    @Value("${llamacpp.model}")
    private String modelPath;

    @Value("${llamacpp.thread.cpu}")
    private Integer cpuCount;

    @Value("${llamacpp.number.context}")
    private Integer nCtx;

    private LlamaModel modelLlm;

    @PostConstruct
    public void init() {

        // Nullify the (very verbose) llama.cpp log output
        LlamaModel.setLogger(LogFormat.TEXT, (level, message) -> {
        });

        // Prepare the model parameters: CPU thread count, context size, model file
        ModelParameters params = new ModelParameters()
                .setNThreads(cpuCount)
                .setNCtx(nCtx)
                .setModelFilePath(modelPath);

        modelLlm = new LlamaModel(params);
    }

    public LlamaModel getModelLlm() {
        return modelLlm;
    }

    @PreDestroy
    public void beforeExit() {
        // Release the native resources held by llama.cpp
        modelLlm.close();
    }
}

Several things to note from the code snippet above:

  • Logging has been disabled because the llama.cpp log output is very verbose
  • Inference runs on the CPU
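All the @Value placeholders in this article map to Spring configuration keys. A sample application.properties is shown below; the keys are taken from the code, while the values are only illustrative and should be adjusted for your machine and model:

# Path to the GGUF model file
llamacpp.model=models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
# Number of CPU threads used for inference
llamacpp.thread.cpu=4
# Context window size, in tokens
llamacpp.number.context=2048
# Optional custom prompt template file
llamacpp.prompt.path=prompt.txt
# Sampling parameters
llamacpp.temperature=0.7
llamacpp.topp=0.95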

Next, let’s see the PromptComponent. It is responsible for building the prompt, one of the key ingredients of an LLM chat.

import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

import jakarta.annotation.PostConstruct;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class PromptComponent {

    @Value("${llamacpp.prompt.path}")
    private String promptPath;

    private String promptContent;

    final Logger log1 = LoggerFactory.getLogger(this.getClass());

    @PostConstruct
    public void init() {
        try {
            // Load a custom prompt template if one exists, otherwise use the default
            if (Files.exists(Paths.get(promptPath), LinkOption.NOFOLLOW_LINKS)) {
                List<String> stringPrompt = Files.readAllLines(Path.of(promptPath));
                promptContent = stringPrompt.stream()
                        .collect(Collectors.joining(System.lineSeparator()));
            }
            else {
                promptContent = getDefaultPrompt();
            }
        }
        catch (Exception e) {
            log1.error("Error:", e);
        }
    }

    public String getPromptContent() {
        return promptContent;
    }

    private String getDefaultPrompt() {
        // TinyLlama chat template; {question} is replaced with the user's input
        return "<|system|>\r\n"
                + "You are Chatty, a friendly assistant chatbot who always responds to user questions and inquiry.</s>\r\n"
                + "<|user|>\r\n"
                + "{question} </s>\r\n"
                + "<|assistant|>";
    }
}

As illustrated in the code above, the default prompt utilizes the template format employed by the TinyLlama model. The “{question}” placeholder will be substituted with the question entered by the user.

<|system|>
You are Chatty, a friendly assistant chatbot who always responds to user questions and inquiry.</s>
<|user|>
{question} </s>
<|assistant|>
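For example, if the user types “What is the capital of France?”, the rendered prompt sent to the model becomes:

<|system|>
You are Chatty, a friendly assistant chatbot who always responds to user questions and inquiry.</s>
<|user|>
What is the capital of France? </s>
<|assistant|>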

Next, let’s check out the chatbot service that we have implemented.

import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaOutput;
import de.kherud.llama.args.MiroStat;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class ChatbotServicesImpl implements ChatbotServices {

    // Sampling parameters; both are fractions, so they are bound as floats
    @Value("${llamacpp.temperature}")
    private Float modelTemperature;

    @Value("${llamacpp.topp}")
    private Float modelTopP;

    @Autowired
    PromptComponent promptComponent;

    @Autowired
    LlamaModelComponent modelComponent;

    final Logger log1 = LoggerFactory.getLogger(this.getClass());

    @Override
    public void generateResponse(String question) {
        log1.info("Receive question:{}", question);
        LlamaModel modelLlama = modelComponent.getModelLlm();

        // Insert the user's question into the prompt template
        String prompt = promptComponent.getPromptContent()
                .replace("{question}", question);
        // Stop strings: generation halts as soon as one of these is produced
        String listAntiprompt = "</s>,<|im_end|>,User:";
        log1.info("Prompt:{}", prompt);
        InferenceParameters inferParams = new InferenceParameters(prompt)
                .setTemperature(modelTemperature)
                .setTopP(modelTopP)
                .setFrequencyPenalty(0.2F)
                .setMiroStat(MiroStat.V2)
                .setStopStrings(listAntiprompt.split("[,]"));

        // Stream the generated tokens to standard output as they arrive
        for (LlamaOutput output : modelLlama.generate(inferParams)) {
            System.out.print(output.toString());
        }
        // Add a line separator after the full response
        System.out.print(System.lineSeparator());
    }
}

As illustrated in the code above, this service uses the LlamaModel created earlier. We also need to set several inference parameters, which regulate the behaviour of the inference process. For instance, temperature determines how creative the chatbot’s responses are, while top-p restricts sampling to the most probable tokens (both are fractional values between 0 and 1). The stop strings specify when the inference process terminates: generation halts as soon as one of those strings appears in the output.

Last but not least is the chat service. It uses the chatbot service and is the main service called by the Spring Boot main class.

import java.util.Scanner;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// Class and interface names assumed here; the original snippet omits the declaration
@Service
public class ChatServicesImpl implements ChatServices {

    @Autowired
    ChatbotServices chatbot;

    @Override
    public void startChatService() {

        Scanner scanner = new Scanner(System.in);

        // Welcome greetings
        StringBuilder welcomeGreetings = new StringBuilder();
        welcomeGreetings.append(System.lineSeparator() + System.lineSeparator());
        welcomeGreetings.append("Welcome to LLM Chatbot Apps" + System.lineSeparator());
        welcomeGreetings.append("Please insert your question after the > prompt" + System.lineSeparator());
        welcomeGreetings.append("Type 'exit' to leave the chat prompt " + System.lineSeparator());

        System.out.print(welcomeGreetings);

        // Read questions until the user types "exit"
        while (true) {
            System.out.print(System.lineSeparator() + "user (it's you)> ");
            String input = scanner.nextLine();
            if (input.equalsIgnoreCase("exit")) {
                break;
            }
            chatbot.generateResponse(input);
        }

        // Close scanner
        scanner.close();

        // Close the app
        System.exit(0);
    }
}

As you can see from the snippet above, it is fairly straightforward: an infinite loop that reads user input and breaks when the user types the “exit” keyword.
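For completeness, here is a minimal sketch of what the Spring Boot main class might look like. The actual class in the repository may differ; CommandLineRunner is simply one idiomatic way to invoke startChatService on startup, and the ChatServices interface name is the one assumed above.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class LlmChatbotApplication implements CommandLineRunner {

    // The chat service shown above
    @Autowired
    ChatServices chatServices;

    public static void main(String[] args) {
        SpringApplication.run(LlmChatbotApplication.class, args);
    }

    @Override
    public void run(String... args) {
        // Hand control to the interactive chat loop
        chatServices.startChatService();
    }
}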

Is the code above working? Yes: on my local machine the chatbot runs well.
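To try it yourself, clone the repository, download a GGUF model, and run the application. Assuming a standard Spring Boot Maven setup (the exact jar name depends on the project’s pom.xml), something like the following should work; Spring Boot lets you override any configuration property from the command line:

mvn clean package
java -jar target/llm-chatbot-springboot.jar --llamacpp.model=models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf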

And that’s it: your working local LLM chatbot. If you have any questions or suggestions, please leave a comment below and I’ll do my best to answer. Also, don’t forget to clap for this article if you found it helpful. Cheers!
