Configure LM Studio for Apple Silicon: run local LLMs with faster time to completion.

Apple Silicon’s Power: Maximizing LM Studio’s Local Model Performance on Your Computer

Ingrid Stevens
5 min read · Dec 9, 2023

In this guide, I’ll show you how to configure LM Studio for optimal performance on your Mac. Simple adjustments, like harnessing your GPU for completion requests, can boost LLM response speed by up to 87% (tested on an M1 with 32GB of RAM). Let’s dive in!

What is LM Studio?

LM Studio is an application you can use to run local, open-source models on your computer. It’s like having ChatGPT without needing an internet connection or worrying about leaking personal data to OpenAI. It also offers a Local Inference Server, which lets any model running in LM Studio act as a drop-in replacement for the OpenAI API.
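
To make the drop-in idea concrete, here is a minimal sketch using the official openai Python client pointed at a local LM Studio server. It assumes you’ve started the Local Inference Server on its default address (http://localhost:1234/v1); the model name and API key are placeholders, since the local server serves whichever model you have loaded and ignores the key.

    from openai import OpenAI

    # Point the client at LM Studio instead of api.openai.com.
    # Assumes the Local Inference Server is running on its default port (1234);
    # the API key is required by the client but ignored by the local server.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    response = client.chat.completions.create(
        model="local-model",  # placeholder: LM Studio serves whichever model is loaded
        messages=[{"role": "user", "content": "Say hello from my Mac."}],
    )
    print(response.choices[0].message.content)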

Install & Set up LM Studio for Apple Silicon

Download LM Studio

Navigate to https://lmstudio.ai/ and download the version which suits your machine.

Set up LM Studio for M1/M2/M3 Mac (Apple Silicon)

Out of the box, LM Studio runs inference on the CPU. To take advantage of Apple Silicon’s GPU, change these settings:

Please note: only change these settings if your machine can afford to; it really depends on the configuration of your Mac.

  1. Open the Chat tab (💬) in the left sidebar
  2. Open Settings -> change the preset to “Default LM Studio macOS”

Select Settings for Chat interface

3. Confirm your changes with “Accept New System Prompt”

Confirm system changes

4. “Keep entire model in RAM” (use_mlock): it depends…

This one really depends on your machine. If you change this setting, you may see an “Experimental Warning” that gives you a guideline on whether or not it is likely to fry your computer:

Do you know what you’re doing? 😉

use_mlock pins the model’s memory pages in RAM so macOS can’t swap them out to disk, which shortens time to first token. I have 32GB of RAM on my M1 machine, so I turn use_mlock ON.

More than 16GB of RAM? You’re probably safe to turn use_mlock on.

16GB of RAM or less? You probably want to keep use_mlock off, since locking a large model into limited memory can crash your computer.
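
Not sure how much RAM your Mac has? Apple menu > About This Mac tells you, or you can query it programmatically. A tiny sketch using macOS’s sysctl:

    import subprocess

    # hw.memsize reports total physical memory in bytes on macOS.
    mem_bytes = int(subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    ).stdout)
    print(f"{mem_bytes / 1024**3:.0f} GB of RAM")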

5. Enable “Apple Metal (GPU)”

6. Reload your model and (if necessary) restart LM Studio to ensure the changes have taken effect.

Comparison of Results

I did a quick comparison using TheBloke’s codellama instruct 7B q4_0 GGUF, asking “Write a python function that calculates prime numbers” with and without these settings. I compared the speed of completion and also looked at my Activity Monitor charts.
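
For reference, a correct answer to that prompt looks something like this (my own sketch, not the model’s verbatim output):

    def primes_up_to(n: int) -> list[int]:
        """Return all prime numbers up to and including n."""
        primes = []
        for candidate in range(2, n + 1):
            # candidate is prime if no previously found prime divides it
            if all(candidate % p != 0 for p in primes):
                primes.append(candidate)
        return primes

    print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]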

TL;DR: with these settings enabled, time to completion is 87.7% faster.

Compare Time to First Token

WITHOUT “Apple Metal GPU” or “Default LM Studio macOS” enabled
WITH “Apple Metal GPU” and “Default LM Studio macOS” enabled
  • Time to first token dropped from 3.73s without the settings to 0.69s with them: 81.5% faster
  • Time to completion dropped from 18.12s to 2.23s: 87.7% faster (see the timing sketch below if you’d like to measure this yourself)
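
If you’d like to reproduce these numbers against your own setup, here is a minimal timing sketch using the openai client’s streaming mode (same assumptions as before: Local Inference Server on the default port 1234, placeholder model name):

    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    start = time.perf_counter()
    first_token_at = None

    # Stream the response so we can observe when the first token arrives.
    stream = client.chat.completions.create(
        model="local-model",  # placeholder: LM Studio serves whichever model is loaded
        messages=[{"role": "user", "content": "Write a python function that calculates prime numbers"}],
        stream=True,
    )
    for chunk in stream:
        if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
            first_token_at = time.perf_counter()

    end = time.perf_counter()
    if first_token_at is not None:
        print(f"Time to first token: {first_token_at - start:.2f}s")
    print(f"Time to completion:  {end - start:.2f}s")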

Compare CPU & GPU Usage

FYI: to get the charts on the right, open the application “Activity Monitor” and press ⌘3 (CPU History) and ⌘4 (GPU History)

WITHOUT “Apple Metal GPU” or “Default LM Studio macOS” enabled
WITH “Apple Metal GPU” and “Default LM Studio macOS” enabled

By following the steps outlined in this guide for installing and configuring LM Studio, you can unlock the potential of your Apple M1/M2/M3 Mac. The comparison results speak for themselves: an 87.7% faster time to completion when enabling “Apple Metal (GPU)” and switching to “Default LM Studio macOS.”

These improvements not only enhance performance but also optimize resource utilization, as is visually evident in the Activity Monitor charts (specifically “GPU History”, where you can see that the default settings don’t actually take advantage of the M1/M2/M3 GPU).

Aside: How to Save Your Custom Presets

I’ve shown how you can adjust the “Default LM Studio macOS” settings to optimize memory usage and utilize “Apple Metal (GPU).” However, those tweaks mean the preset no longer matches LM Studio’s defaults, and I found myself manually re-applying them every time I loaded a model. To streamline this, I exported the entire configuration as a JSON preset file named “Apple Silicon.” You can find it on GitHub as a gist, and it will appear in your presets once you import it via “Import Preset From File…” (see screenshot below).

Import Preset From File… → upload the JSON preset file to apply all the Apple Silicon settings at once.

Thank You!

Thank you so much for reading this far, and I hope you have as much fun exploring these open source models locally as I have!

To check out how to run a local inference server, please visit this article:

If you have any questions, please feel free to leave a comment!

A big thank you to the following source, who inspired this post:
