Exploring the Updated LM Studio and Embedchain: Improved Functionality and Performance

Lakshmi narayana .U
4 min read · Dec 16, 2023
Joyful Tech Innovations (image created by the author with DALL·E 3)

In a previous article, I discussed some challenges with LM Studio and its server calls.

I am pleased to report that these issues have been addressed in the latest version of the software.

Here is a simple code sample that works with a local LLM. In this instance, I’m again using mistral-7b-instruct-v0.1.Q4_K_M.gguf.

# Simple example to make a request to the LM Studio server, with a model running.
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # this field is currently unused
    messages=[
        {"role": "system", "content": "Always answer in rhymes."},
        {"role": "user", "content": "Who is Warren Buffet in 50 words"},
    ],
    temperature=0.7,
)

print(completion.choices[0].message)
print(completion.choices[0].message.content)

Here is another example that maintains conversation history and streams the response. This one works well too.

# Chat with an intelligent assistant in your terminal
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

history = [
    {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]

while True:
    completion = client.chat.completions.create(
        model="local-model",  # this field is currently unused
        messages=history,
        temperature=0.7,
        stream=True,
    )

    new_message = {"role": "assistant", "content": ""}

    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content

    history.append(new_message)

    print()
    history.append({"role": "user", "content": input("> ")})

In this article, I also share a simple code example that uses Embedchain. Instead of calling the OpenAI API, this code uses the local LLM served by LM Studio.

A brief explanation of Embedchain:

Embedchain is an open-source data platform for building and deploying RAG applications. It lets users load, index, retrieve, and sync unstructured data through a suite of APIs. Embedchain is designed for a wide range of users, from AI professionals such as data scientists and machine learning engineers to those just starting out, including college students, independent developers, and hobbyists. It offers many options for customizing an app, including custom configurations for the model and data sources; a minimal add-and-query sketch follows the list below.

  • Embedchain simplifies the development of RAG (Retrieval-Augmented Generation) pipelines, handling complexities such as integrating and indexing data from diverse sources, determining optimal data chunking methods, and synchronizing with updated data sources.
  • It offers conventional yet customizable APIs, making it suitable for AI professionals, students, independent developers, and hobbyists.
  • The platform’s clear and well-structured abstraction layers allow users to tailor the system to meet their specific needs, from simple projects to complex AI applications.
  • Embedchain automates data handling, processing, and storage, making it easy to add data to the RAG pipeline and to generate responses when users chat, search, or query.
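
Here is the minimal add-and-query sketch mentioned above. It mirrors the call pattern of the full example later in this article; the Wikipedia URL and the question are placeholders, Embedchain’s default configuration expects an OpenAI API key for embeddings and answers unless another provider is configured, and the exact order of the add() arguments can differ between Embedchain versions.

# Minimal Embedchain sketch: index one web page and ask a question about it.
# Assumes Embedchain's default configuration (which expects an OpenAI API key)
# unless the app is configured for another provider.
from embedchain import App

bot = App()

# Index a data source; Embedchain handles chunking, embedding, and storage.
# Depending on the Embedchain version, this may instead be
# bot.add("https://en.wikipedia.org/wiki/Warren_Buffett", data_type="web_page")
bot.add("web_page", "https://en.wikipedia.org/wiki/Warren_Buffett")

# Retrieve relevant chunks and generate an answer.
print(bot.query("What is Warren Buffett best known for?"))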

As in the previous article, I’ve written a piece of code here. It is inspired by an example from the Embedchain website, which I adapted with ChatGPT’s help.

from embedchain import App
from openai import OpenAI

# Initialize the Embedchain application
elon_bot = App()
elon_bot.add("web_page", "https://en.wikipedia.org/wiki/Bill_Gates")

# Set up the local language model client
client = OpenAI(base_url="http://localhost:1234/v1")

# Initialize a list to keep track of the conversation history
conversation_history = []

while True:
    # Prompt for user input
    user_query = input("Enter your query (or type 'exit' to stop): ")

    # Check if the user wants to exit
    if user_query.lower() == 'exit':
        break

    # Add user query to history
    conversation_history.append({"role": "user", "content": user_query})

    # Use elon_bot to query based on user input
    response = elon_bot.query(user_query)

    # Add system response to history
    conversation_history.append({"role": "system", "content": response})

    # Make a request to the local LLM using the conversation history
    completion = client.chat.completions.create(
        model="local-model",  # this field is currently unused
        messages=conversation_history,
        temperature=0.7,
    )

    # Extract and format the response from the local language model
    formatted_response = completion.choices[0].message.content
    formatted_response = formatted_response.strip()  # Remove any leading/trailing whitespace

    # Print the neatly formatted response
    print("\nResponse:\n" + "=" * 10)
    print(formatted_response)
    print("=" * 10 + "\n")

    # Update conversation history with the latest response
    conversation_history.append({"role": "assistant", "content": formatted_response})

print("Exiting the query system.")

Below, I’ve outlined the steps I followed for this specific example, building on the tips I shared in my previous article. This comes after further testing and troubleshooting.

Specific Steps for LM Studio, Embedchain, and Python

In the coming weeks, I aim to use this method with other Embedchain applications and to try it with different local LLMs.
