Running a ChatGPT-style LLM on a local machine for sensitive data

House of Test
4 min read · Apr 19, 2023

Introduction

ChatGPT has taken the world by storm. The simple chat-style interface backed by an enormously impressive Large Language Model (LLM) has made it possible for people to ask questions in plain language and receive high-quality answers about everything from coding help to business ideas (although not always right and sometimes quite bananas). In a previous post, I wrote about how I learned that ChatGPT could provide me with code for a test data generation tool for a very specific context within a couple of minutes.

However, there is a question of privacy: there have been incidents where sensitive code was sent to ChatGPT for feedback, potentially making that code available to OpenAI, the company behind ChatGPT. The answer to the problem of getting feedback and help from an LLM such as ChatGPT without letting code slip outside an organisation is not around the corner; it is already here, has an installer, and can be run on your local laptop.

Competition Has Entered the Arena — Meet GPT4All

There is a wild number of LLMs floating around the internet (check out the open LLM leaderboards for examples), and a large share of them are openly available for everyone to download to their computer. I recommend a visit to Hugging Face to take a look at what is available today. To actually use an LLM, you need software with an interface that lets you ask or instruct the model. While investigating different tools, I found GPT4All working particularly well on my office-grade laptop without a dedicated GPU, so I have chosen to use it as an example in this post. It has a user-friendly installer for different systems and a user-friendly interface that lets you select a compatible LLM to download and use. Anyone can install it.

In my previous post, I showed how I used ChatGPT to create a random test persona to help a tester put themselves in the shoes of a user with various levels of technical understanding. Here is a perfectly usable answer from GPT4All, correctly formatted as JSON:

Using GPT4All to create test data, correctly formatted as a JSON response
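The point of my previous post was that an LLM can write this kind of tool for you. As a rough illustration of what such a test-persona generator could look like in plain Python, here is a minimal sketch; all field names and value pools below are made up for the example, not taken from the actual GPT4All output:

```python
import json
import random

# Illustrative value pools -- a real generator would use
# domain-specific data for the system under test.
NAMES = ["John", "Maria", "Aisha", "Kenji"]
OCCUPATIONS = ["software developer", "accountant", "nurse", "teacher"]
SKILL_LEVELS = ["beginner", "intermediate", "expert"]

def random_persona(seed=None):
    """Return a random test persona as a JSON string."""
    rng = random.Random(seed)
    persona = {
        "name": rng.choice(NAMES),
        "occupation": rng.choice(OCCUPATIONS),
        "technical_skill": rng.choice(SKILL_LEVELS),
        "age": rng.randint(18, 80),
    }
    return json.dumps(persona, indent=2)

print(random_persona(seed=1))
```

Passing a seed makes the persona reproducible, which is handy when a bug report should point at the exact test data that triggered an issue.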

The locally available LLMs are today generally more limited than ChatGPT (especially compared to ChatGPT-4), and just like ChatGPT, they can give some interesting responses. Meet John, the experienced software developer with the technical skills of a beginner:

John, the experienced software engineer with the technical skill level of a beginner

What This Means

In this example, GPT4All running an LLM is significantly more limited than ChatGPT, but it runs on my office laptop and can already give me usable answers. Running it locally opens up a world of possibilities where companies, organisations, or hobbyists can train and run an LLM without having to worry about sensitive data leaking to the companies controlling the big LLMs. In theory, you could train a local LLM on all the data you have about a certain type of client or about a market area, and testers could then use the LLM to create test data or test tools on the fly without the risk of leaking sensitive data. This can potentially accelerate testing, since a tester could get an answer directly from the LLM instead of searching for and booking meetings with busy people in marketing or business roles who sit on the required information about the customers. The tester could also get more correct test data, compared to guessing, which could help find issues at an earlier stage. It can still go bananas, though, so at this point in time one has to be a bit careful and able to judge the validity of the returned data.
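Judging the validity of returned data can partly be automated. As a minimal sketch, assuming the model was asked for a JSON persona with a few known fields (the field names here are hypothetical), a sanity check in Python could look like this:

```python
import json

# Hypothetical fields we asked the model to include in the persona.
REQUIRED_FIELDS = {"name", "occupation", "technical_skill"}

def validate_persona(reply: str) -> dict:
    """Parse an LLM reply and verify it is a usable persona.

    Returns the parsed persona dict, or raises ValueError if the
    model went bananas and produced unusable output.
    """
    try:
        persona = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(persona, dict):
        raise ValueError("reply is JSON but not an object")
    missing = REQUIRED_FIELDS - persona.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return persona

good = '{"name": "John", "occupation": "developer", "technical_skill": "beginner"}'
print(validate_persona(good)["name"])  # prints John
```

A check like this will not catch a persona that is well-formed but nonsensical, so a human still has to review what comes back, but it filters out the replies that are outright broken.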

Summary

Large Language Models such as ChatGPT can raise privacy concerns when potentially sensitive information is sent to them. There are plenty of alternatives that can be run locally, even on an office-grade laptop. This means that a company or an organisation can keep sensitive data in-house while still providing access to an LLM to accelerate testing.

About the author

Martin Nilsson has long experience in software testing and software development. He has worked with everything from test automation to exploratory testing, and from embedded software to the cloud. Today his focus is on quality throughout the development life cycle, from business case to code to test to customer and back. You can follow him on Twitter @MartinNilsson8.

