Running a Local OpenAI-Compatible Mixtral Server with LM Studio

Local and Loaded: Elevate Your Terminal Talk with Mixtral on Your Mac

Ingrid Stevens
5 min read · Jan 7, 2024
Guide to Local Inference with LM Studio

LM Studio is an easy-to-use desktop app for experimenting with local and open-source Large Language Models (LLMs). In this guide, we’ll walk through the simple steps to set up an OpenAI-compatible local server using LM Studio. You can seamlessly swap your OpenAI client code over to an LM Studio endpoint by changing the base URL, directing your completion requests to your local Mixtral instead of OpenAI’s servers. (Note: you can use any model available in LM Studio, but for this demo I use Mixtral.)
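To make the base-URL swap concrete, here is a minimal sketch using the official OpenAI Python client pointed at a local LM Studio server. It assumes LM Studio's default port (1234); the model name and API key are placeholders, since the local server ignores both and simply serves whichever model you have loaded.

```python
# Minimal sketch: reuse the standard OpenAI client, but send requests to
# LM Studio's local server instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's local endpoint (default port assumed)
    api_key="not-needed",                 # placeholder; the local server does not check it
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model is currently loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello from my local Mixtral server."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

The only change from typical OpenAI client code is the `base_url` (and the dummy `api_key`); everything else, including the chat completions call, stays the same.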

This demonstration uses the Mixtral 8x7B Instruct model with Q3 quantization (thanks to TheBloke). I ran these steps on an M1 Mac with 32 GB of RAM, so if you have similar hardware, they should work for you as well.

Wait… a point of clarification: what is a local inference server? A local inference server processes and executes predictions, or inferences, from a machine learning model, running locally on your computer rather than on a remote server. (“Inference” in the context of machine learning refers to using a trained model to make predictions or draw conclusions from input data, essentially deducing new information from the patterns the model has learned.)
