Running Open-Source LLMs Locally
In my previous posts, I showed you how to build a chatbot connected to databases using OpenAI. However, the security risks associated with external services led me to explore a better solution.
In this blog, I’ll guide you through the simplest way to deploy a Language Model (LM) locally using LMStudio. Learn how to tweak your chatbot’s code for local LM integration and join me in testing its capabilities against the conventional OpenAI service. Let’s enhance security without compromising on AI brilliance — let the journey begin!
1. Overview of UI Tools for LLMs
There are various tools with different functions that make it simple to download and run LLM models with just a few clicks. Three commonly used open-source options are:
While these open-source tools have different functionalities, most of them support easy model downloads and provide a chat UI for convenient interaction. Notably, tools like LMStudio go a step further by offering a function to run the model as an API service with just a few clicks.
Considering I am using Windows and aiming to utilize the model via API, my preferred choice is LMStudio.
2. LMStudio
2.1 Download and Install LMStudio
You can easily download LMStudio from their website. After downloading and running the installation file, here is the user interface you will encounter.
2.2 Setup Llama 2
The chart below compares open-source LLM models across different datasets. Since I run these models on my PC (32GB RAM, RTX 3070), I prefer lightweight options for better performance.
Among the 7 billion parameter models, Llama-2 (7B) outperforms MPT (7B) and Falcon (7B). Therefore, I’ll start testing with Llama-2 (7B).
To download and install Llama-2 (7B) models, follow the steps outlined in the provided image.
Once the download is complete, you can initiate a chat with the model by clicking on the Chat icon. Choose the desired model, and you’re ready to start using it.
2.3 Serving Llama-2 Models
To run Llama-2 models as an API service, click on the depicted icon, and then click “Start” as shown in the image below. This will enable access to the Llama model via API. The tool also offers sample code to connect to the API service, which you can easily copy and try out.
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # this field is currently unused
    messages=[
        {"role": "system", "content": "Always answer in rhymes."},
        {"role": "user", "content": "Introduce yourself."},
    ],
    temperature=0.7,
)

print(completion.choices[0].message)
Result:
ChatCompletionMessage(content="I'm a poet, so I must confess,\nMy name is Rhyme, and that's my address! 😊", role='assistant', function_call=None, tool_calls=None)
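If you prefer not to install the `openai` package, the same endpoint can be reached with only the Python standard library, since LMStudio exposes an OpenAI-compatible API. Here is a minimal sketch assuming LMStudio's default port (1234) and its `/chat/completions` route; `chat_local` is a hypothetical helper name, not part of LMStudio.

```python
import json
import urllib.request

def chat_local(prompt, base_url="http://localhost:1234/v1", temperature=0.7):
    """Call the locally served model via its OpenAI-compatible endpoint.

    Assumes LMStudio is running on the default port (1234); adjust
    base_url if you changed it in the server settings.
    """
    payload = {
        "model": "local-model",  # ignored by the local server, kept for schema compatibility
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Extract the assistant's reply from the standard response shape
    return body["choices"][0]["message"]["content"]
```

Because the request/response shapes mirror the OpenAI API, any client that lets you override the base URL should work the same way.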
3. SQL Agent with local LLM Model
In the last blog, I showed how to use Langchain and OpenAI API to create a SQL Agent — a chatbot that connects to databases, generates queries, executes them, and answers questions based on the results. Please check here for more detail.
In this section, I’ll guide you through creating an SQL Agent that uses the Llama-2 model instead of the OpenAI API. This approach keeps the SQL Agent entirely on-premise, eliminating the security concerns of sending data to an external service.
Step 1: Import the required libraries
from langchain.llms.openai import OpenAI
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.agents.types import AgentType
Step 2: Set up the model
To define the model, we’ll use the OpenAI library. However, instead of using the default OpenAI URL, we’ll utilize the URL provided by LMStudio in part 2.3.
model = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
Step 3: Configure the remaining components and create the SQL Agent
# Configure DB
db = SQLDatabase.from_uri("teradatasql://dbc:dbc@192.168.11.7:1025/Sales")
# Configure toolkit
toolkit = SQLDatabaseToolkit(db=db, llm=model)
# Create SQL Agent
agent_executor = create_sql_agent(
    llm=model,
    toolkit=toolkit,
    verbose=True,
)
Step 4: Test the SQL Agent
agent_executor.run("Number of customer")
Result:
> Entering new AgentExecutor chain...
Action: sql_db_schema
Action Input: None
Observation: Error: table_names {'None'} not found in database
Thought:It seems that the table names are not defined, so I need to use the `sql_db_list_tables` tool to get a list of tables in the database.
Action: sql_db_list_tables
Action Input: None
Observation: UserHistory, UserHistoryReference, customer, orderdetail, product, user_activity
Thought:
---------------------------------------------------------------------------
OutputParserException Traceback (most recent call last)
File ~\miniconda3\envs\easyml_llm\lib\site-packages\langchain\agents\agent.py:1066, in AgentExecutor._iter_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)
1065 # Call the LLM to see what to do.
-> 1066 output = self.agent.plan(
1067 intermediate_steps,
1068 callbacks=run_manager.get_child() if run_manager else None,
1069 **inputs,
1070 )
1071 except OutputParserException as e:
The SQL agent isn’t working well because it invokes its tools in the wrong order: it calls “sql_db_schema” first, before it knows which tables exist. It should start with “sql_db_list_tables” to discover the tables in the database, and only then request schemas. Correcting this tool order would help the agent perform better and produce more accurate results.
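One low-effort mitigation worth trying is to steer the agent through its prompt. The sketch below builds a custom instruction that tells the model to list tables before requesting any schema; note that the `prefix` parameter is an assumption here (some LangChain versions of `create_sql_agent` accept it), so verify it against your installed version before relying on it.

```python
# Sketch: nudge the agent toward the correct tool order via its prompt.
CUSTOM_PREFIX = (
    "You are an agent designed to interact with a SQL database.\n"
    "ALWAYS call sql_db_list_tables first to see which tables exist, "
    "and only then call sql_db_schema on the relevant tables.\n"
)

# Hypothetical wiring -- `prefix` support depends on your LangChain version:
# agent_executor = create_sql_agent(
#     llm=model,
#     toolkit=toolkit,
#     prefix=CUSTOM_PREFIX,
#     verbose=True,
# )
```

Smaller models often need this kind of explicit ordering instruction, whereas larger API models infer it on their own.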
Conclusion
In conclusion, LMStudio offers a powerful tool for effortlessly running open-source large language models, with the standout feature being the ability to serve models as APIs with just a few clicks.
However, Llama-2 (7B) performs noticeably worse than even basic OpenAI models such as `text-davinci-003`: it falls short even on straightforward SQL tasks.
To address this challenge, I will explore solutions such as adjusting prompts, fine-tuning, exploring RAG, or experimenting with alternative models. These strategies and their detailed implementation will be discussed in the upcoming blog.