Using Ollama to run local LLMs on your computer
With Ollama it is possible to run Large Language Models locally on your PC. In this post I will show you how you can install and use the software.
What is a Large Language Model?
A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification. LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word. The famous ChatGPT by OpenAI is based on a large language model, which enables users to refine and steer a conversation.
Install Ollama and download LLMs
Let’s start by downloading Ollama from the official website ollama.com.
You will find two download buttons on the screen; pressing one of them redirects you to the official download page. There you can select your operating system, in my case Windows, and download the corresponding file.
If you use the Models link in the top right corner, you will get a list of all available Large Language Models that can be downloaded and used locally with Ollama.
After the installation you can open a Terminal and use the ollama command. By calling ollama pull <model name> you can download a Large Language Model. I want to try Phi-2, an LLM by Microsoft.
After the download of the model is complete, we can use ollama run <model name> to start a conversation with the corresponding model. You just need to enter your prompt and the model will answer accordingly.
By typing /bye you can exit the chat. If you add the --verbose parameter to the call, you will receive some additional statistics at the end of the response.
Ollama also acts as a server, so we are able to write code to simulate a chat conversation. I will show you two ways to access the Ollama server using Python. I assume that you already have Python installed on your machine. Let’s open Visual Studio Code and create a new folder ollama. In this folder add a file called requirements.txt, which contains all the needed packages. In our case we need langchain_community and requests.
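With those two packages, the requirements.txt file is simply:

```
langchain_community
requests
```

You can then install them from the folder with pip install -r requirements.txt.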
Now we create another file in the folder called main-langchaincommunity.py. This file uses the langchain_community package to connect to the Ollama server, send a simple prompt and print the response to the console.
If you open another Terminal window, you can switch to the created folder and run our Python script. You will see a programming joke on the console.
I will show you another approach using the requests package. Let’s create a new file called main-api.py in our folder. The Ollama server is running on localhost:11434 and provides the endpoint api/generate to generate a response. We just configure the headers and data objects and finally call our generate_response method in Python.
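A sketch of main-api.py might look like the following; the helper names (build_payload, generate_response) are my own choice here, and I again assume the phi model and the default port:

```python
# main-api.py
# Minimal sketch: call the Ollama REST API directly with the requests package.
import requests

URL = "http://localhost:11434/api/generate"
HEADERS = {"Content-Type": "application/json"}

def build_payload(prompt, model="phi"):
    # "stream": False makes Ollama return one complete JSON object
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate_response(prompt):
    # POST the prompt to the local Ollama server and return the answer text.
    data = build_payload(prompt)
    response = requests.post(URL, headers=HEADERS, json=data)
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    print(generate_response("Tell me a joke about programming."))
```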
If we open the Terminal window again and call our main-api.py script, you will also get an answer from the locally running Large Language Model.
Conclusion
In this post I’ve explained how you can easily install Ollama on your Windows machine and use Large Language Models locally.
You will find the used code on my GitHub repository.