A guide to running an LLM on Akash Network with KoboldCPP

TAB
7 min read · Aug 24, 2024


“Akash is an open network that lets users buy and sell computing resources securely and efficiently. Purpose-built for public utility.” https://akash.network/

“KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI” https://github.com/LostRuins/koboldcpp

This guide is by no means a detailed guide to Akash Console, KoboldCPP, or LLMs in general. It’s intended to demonstrate how to run an LLM of your choice on Akash Network.

Also, I will assume you’re somewhat familiar with cryptocurrency and know how to acquire the $AKT token. If you’re coming from a different ecosystem, I’d suggest using the Keplr wallet to interact with Akash Console.

If you’re not a crypto user, Akash Network is working on a fiat payment system that will allow users to use Akash Network without having to interact with cryptocurrency. You can track the current development here: https://github.com/orgs/akash-network/projects/5?pane=issue&itemId=68556494

For this guide I’ll be using Meta-Llama-3.1-8B-Instruct-Q6_K.gguf from bartowski, and this Docker image from Docker Hub: https://hub.docker.com/r/koboldai/koboldcpp

Good places to start for GGUF/GGML quantized models:
https://huggingface.co/bartowski
https://huggingface.co/TheBloke

The GGUF file format is the successor to GGML. If you’d like to learn more about GGUF/GGML, check out:
https://medium.com/@phillipgimmi/what-is-gguf-and-ggml-e364834d241c
https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
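
As a quick illustration of the format: per the spec linked above, a GGUF file starts with the 4-byte ASCII magic “GGUF” followed by a little-endian uint32 format version. A minimal Python sanity check, assuming you’ve already downloaded the model file locally:

import struct

# Per the GGUF spec, the file begins with the 4-byte magic "GGUF"
# followed by a little-endian uint32 format version.
with open("Meta-Llama-3.1-8B-Instruct-Q6_K.gguf", "rb") as f:
    magic = f.read(4)
    version = struct.unpack("<I", f.read(4))[0]

print("valid GGUF:", magic == b"GGUF", "| version:", version)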

The YAML configuration for this guide can be found at the bottom of the page.

Alright, let’s get started:

First, open the Akash Console at https://console.akash.network/ and connect your wallet.

Open the SDL Builder from the left sidebar.

For the Service Name I’ll use ‘koboldcpp’ (feel free to change it to any name you prefer, as it won’t affect the workload).

And for the Docker Image/OS, enter koboldai/koboldcpp (the image from Docker Hub we’re going to use).

For the resources I’ll enter 2 CPU, 4 GB memory, tick the GPU option, and select Nvidia P100 16 Gi from the drop-down menu.

For the model I’m using (Meta-Llama-3.1-8B-Instruct-Q6_K.gguf) and the context size (8192), 16 GB of VRAM is plenty to run it at an acceptable generation speed. Plus, it’s currently one of the cheapest 16 GB GPUs I could get :p

VRAM Calculator:

https://huggingface.co/spaces/DavidAU/GGUF-Model-VRAM-Calculator

This will help you figure out how much VRAM you’re going to need for the model you’re planning to run.
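
If you just want a rough sanity check without the calculator, a common back-of-the-envelope estimate is the model file size plus the KV cache for your context length, plus some headroom. A minimal sketch for my setup, assuming the published Llama 3.1 8B architecture (32 layers, 8 KV heads, head dim 128) and an fp16 KV cache:

# Back-of-the-envelope VRAM estimate for a fully GPU-offloaded GGUF model.
# The calculator linked above is more accurate; this is just a rough check.
model_file_gb = 6.6   # size of Meta-Llama-3.1-8B-Instruct-Q6_K.gguf
n_layers = 32         # Llama 3.1 8B
n_kv_heads = 8        # grouped-query attention
head_dim = 128
context_len = 8192
bytes_per_el = 2      # fp16 KV cache

# K and V caches: 2 tensors per layer, each context_len x n_kv_heads x head_dim
kv_cache_gb = 2 * n_layers * context_len * n_kv_heads * head_dim * bytes_per_el / 1024**3
total_gb = model_file_gb + kv_cache_gb + 1.0  # ~1 GB headroom for compute buffers

print(f"KV cache ~{kv_cache_gb:.1f} GB, total ~{total_gb:.1f} GB")  # ~1.0 GB, ~8.6 GB

So the 16 GB P100 has plenty of room to spare at 8192 context.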

Note:

  • It’s possible to run GGUF models on CPU and system RAM, but generation will be much slower compared to a GPU.
  • You can check the available GPUs and an approximate price/hr to run them at https://akash.network/gpus/
  • If you want to get bids on all the available GPUs on Akash, just select the vendor name and leave everything else empty.

For the Ephemeral Storage, enter the amount of storage your model is going to need, plus approximately 3 GB for KoboldCPP, the Cloudflare tunnel, and additional system files. The model I’m using (Meta-Llama-3.1-8B-Instruct-Q6_K.gguf) requires 6.6 GB, so I’ll enter 10 GB for the total storage (3 GB + 6.6 GB, rounded up).

Next up are the Environment Variables.

Click on the Edit button and add the first Key and Value:

Key: KCPP_MODEL

Value: https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q6_K.gguf (copy and paste the model download link of your choice from Hugging Face)

Add the second variable by clicking the “Add Variable” button:

Key: KCPP_ARGS

Value: --usecublas --contextsize 8192 --gpulayers 999

When you’re done, click on the Close button.

For the Expose section, click on the Edit button and change:

Port: 5001 (KoboldCPP’s default port)

As: 80

This will enable you to access the KoboldCPP frontend directly from the Akash provider-generated link.

When you’re done, click on the Close button.

Almost done. For the Placement, click on the Edit button. I’ll change the Name to akash and set the pricing to 50000.

Feel free to change the Name to your preference. The pricing will depend on the resources you’re trying to get; you might have to increase it if you don’t get any bids from providers.
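
To put that 50000 in perspective: if I’m reading the docs right, the pricing amount is denominated in uakt (1 AKT = 1,000,000 uakt) per block, and it acts as a ceiling that providers bid at or below. A rough sketch of the implied worst-case monthly cost, assuming an average Akash block time of about 6 seconds:

# Rough ceiling on monthly cost implied by the pricing field.
# Assumptions: amount is uakt per block, ~6 s average Akash block time.
UAKT_PER_AKT = 1_000_000
BLOCK_TIME_S = 6.0
max_price_uakt_per_block = 50_000

blocks_per_month = 30 * 24 * 3600 / BLOCK_TIME_S  # ~432,000 blocks
ceiling_akt = max_price_uakt_per_block * blocks_per_month / UAKT_PER_AKT

print(f"max ~{ceiling_akt:,.0f} AKT/month")  # actual bids are usually far lower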

Click Done when you’re done.

Note:

  • Hover over the “i” to get information about the settings.

Done.

Now let’s deploy our workload. Click on the “Deploy” button at the top of the page.

It will redirect you to the YAML configuration page. Check that everything is correct, then click on the “Create Deployment” button.

Sign the transaction.

Next you’ll be greeted with the bid page.

You can click on the provider link to get more info about the provider, and tick the “Audited” checkbox to filter out non-audited providers.

Select the provider you want to deploy your workload with, then click on the “Accept Bid” button and sign the transaction.

Next, you’ll be redirected to your Deployment Details. Click on “Logs” to follow the process, and wait a couple of minutes until you see the Cloudflare tunnel link.

Now you have two options to access the KoboldCPP frontend: either copy the Cloudflare link and paste it into a new browser tab, or connect from the provider-generated link via the “Leases” tab.

Personally, I tend to use the Cloudflare tunnel instead of the provider’s direct link (both will work just fine tho).

Whichever link you choose to connect from, you’ll be greeted with the KoboldCPP frontend.

Depending on your model, don’t forget to change the instruction preset in the Settings menu. There are also default scenarios in the Scenarios menu.

Note:

  • There are many ways to set your system prompt and sampler settings, which can affect your output, but that’s beyond the scope of this guide. I’d recommend browsing around to learn more about the settings; you can usually find recommended settings on the model author’s page on Hugging Face.

That’s it, now you can start to chat with your AI.
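
If you’d rather script against your deployment than use the frontend, KoboldCPP also serves an HTTP API alongside the UI. A minimal sketch using Python’s requests against the KoboldAI-compatible generate endpoint; BASE_URL is a placeholder you’d replace with your own Cloudflare tunnel or provider-generated link:

import requests

# Placeholder: replace with your own Cloudflare tunnel or provider-generated URL.
BASE_URL = "https://your-tunnel.trycloudflare.com"

payload = {
    "prompt": "What is the Akash Network?",
    "max_length": 120,   # number of tokens to generate
    "temperature": 0.7,
}

# KoboldCPP exposes the KoboldAI-compatible /api/v1/generate endpoint.
resp = requests.post(f"{BASE_URL}/api/v1/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])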

When you’re done, don’t forget to close your deployment. Head back to Akash Console, click on the … (meatball menu??) and Close the Deployment.

Or you can go to the Deployments menu in the left sidebar and close it from there.

YAML for this guide:

---
version: "2.0"

services:
  koboldcpp:
    image: koboldai/koboldcpp
    expose:
      - port: 5001
        as: 80
        to:
          - global: true
    env:
      - >-
        KCPP_MODEL=https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q6_K.gguf
      - KCPP_ARGS=--usecublas --contextsize 8192 --gpulayers 999

profiles:
  compute:
    koboldcpp:
      resources:
        cpu:
          units: 2
        memory:
          size: 4GB
        storage:
          - size: 10GB
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia:
                - model: p100
                  ram: 16Gi
  placement:
    akash:
      pricing:
        koboldcpp:
          denom: uakt
          amount: 50000

deployment:
  koboldcpp:
    akash:
      profile: koboldcpp
      count: 1

If you’re feeling adventurous, you could try installing SillyTavern as your frontend on your local device to unlock more advanced features, and connect it to the KoboldCPP instance you’ve deployed on Akash Network as the backend.

https://sillytavernai.com/
https://github.com/SillyTavern/SillyTavern

Example:

https://x.com/txartblock/status/1822970075487535537

I hope you find this guide useful. If you have any questions or feedback, feel free to hit me up on Twitter (yes, I’m still calling it Twitter :p).

https://x.com/txartblock
