ปลดล็อกพลัง Gen AI ในเครื่องของคุณด้วย Ollama

Published in

SET-IT-TEAM

5 min readFeb 28, 2024

Introduction

บทความนี้จะพาไปรู้จักกับเจ้า ollama ครับ ซึ่งเป็นเครื่องมือที่ช่วยให้การรัน generative ai ในเครื่องของเราเอง (แบบ local) เป็นเรื่องง่าย เปิดทางให้เราสามารถสร้างผู้ช่วยส่วนตัวที่ personalize ตอบสนองต่องานที่แตกต่างกันของแต่ละคนโดยไม่ต้องเสียค่าใช่จ่ายรายเดือนใดๆ คับผมมมม

About Ollama

อย่างที่เกรินไปข้างต้นครับว่าเจ้า ollama เป็นโปรแกรม open source ที่ช่วยให้เราสามารถรัน Large Language Model (LLM) ได้โดยตรงจากในเครื่องของเรา เมื่อติดตั้ง ollama แล้วน้องเค้าจะจัดการให้เราแบบครบวงจรตั้งแต่การดาวน์โหลดโมเดล, model weight, configuration และข้อมูลที่จำเป็นอื่นๆ รวมถึงตั้งค่าใช้งาน gpu acceration ในเครื่องเราให้เสร็จสรรพเลยครับ

จุดเด่นของเจ้า Ollama สามารถแบ่งได้ออกเป็นสามจุดใหญ่ๆ ดังนี้ครับ

ติดตั้งง่าย (Easy to setup)
หลากหลายโมเดลให้เลือกใช้ (Fairly learge model library)
ต่อยอดได้ง่าย ตั้งค่าง่าย (Extensitble & Configurable)

1. Easy to setup

Multiple platform support

ณ ตอนที่เขียนบทความนี้เจ้า ollama มีตัวติดตั้งให้ใช้งานทั้งใน windows, macos, linux รวมถึงมี docker image ให้ด้วยครับ เรียกได้ว่าครบทุกแพลตฟอร์มกันไปเลย แต่ต้องออกตัวไว้ก่อนนะครับว่าฝั่ง windows ยังเป็นเวอร์ชั่น preview ประสบการณ์การใช้งานเลยอาจจะไม่เต็มร้อยนะครับ

How to setup on each platform

การติดตั้งก็ง่ายๆ เลยครับ แค่ดาวน์โหลดละติดตั้งตามขั้นตอนปกติของแต่ละ platform เลย

Windows Installation

https://github.com/ollama/ollama/releases/download/v0.1.26/OllamaSetup.exe

MacOS Installation

https://github.com/ollama/ollama/releases/download/v0.1.26/Ollama-darwin.zip

Linux Installation

curl -fsSL https://ollama.com/install.sh | sh

และสำหรับ Linux user สาย tech savvy โปรเจค ollama ก็มี manual installation ให้ด้วยครับ

https://github.com/ollama/ollama/blob/main/docs/linux.md

Prompt first command

เมื่อติดตั้งเรียบร้อยแล้วสามารถสั่งรัน ollama ได้เลยครับ ด้วยคำสั่งนี้

ollama run llama2

ในการสั่งรันครั่งแรกจะมีการโหลด model มาลงในเครื่องเรา ซึ่งขนาดของโมเดลก็จะขึ้นกับจำนวน parameter ของแต่ละโมเดลนะครับ

ตัวอย่างในรูปจะเป็นโมเดล llama2 เวอร์ชั่นพารามิเตอร์ 7b จากทาง Meta ซึ่งจะมีขนาด 3.8 GB ครับ (เพิ่มเติมเรื่องขนาดโมเดล llama2 https://ollama.com/library/llama2/tags)

หลังจากนี้ก็สามารถเริ่ม prompt คำสั่งได้เลยครับ

ทดสอบการ prompt คำสั่งแบบต่างๆ กับโมเดล llama2 7b

2. Multiple model support with fairly large model library

จุดเด่นถัดมาก็คือจำนวนโมเดลที่เจ้า ollama รองรับ เพราะนอกจากจะรองรับการรันโมเดลจากหลายค่าย หลายผู้พัฒนา ไม่ว่าจะเป็น Meta, Google, Amazon ยังสามารถสลับไปมาระหว่างโมเดลได้อย่างสะดวกสะบายอีกด้วยครับ เดี๋ยว section นี้ผมจะสอนวิธีการสลับ model และจะแนะนำโมลเดลเด่นๆ ที่มีอยู่ใน library

How to change model

การเปลี่ยนโมเดลก็ทำได้ง่ายๆ เลยครับ ก่อนอื่นเราต้องออกจาก prompt ของโมเดลเก่าก่อน ด้วยการพิมพ์ /bye นะครับ ต่อมาเราก็เลือกโมเดลที่เราจะเปลี่ยน อย่างในกรณีนี้ผมเลือกเป็นโมเดล llava เราก็พิมพ์คำส่ัง

ollama run llava

ถ้ายังไม่มีโมเดลตัวนี้อยู่ในเครื่องเรา เจ้า ollama ก็จะไปดำเนินการโหลดมาให้เสร็จสรรพพร้อมใช้งานครับ (ถ้าโมเดลโหลดไว้อยู่แล้วก็จะสามารถสลับไปใช้งานได้เลย)

ความสามารถของโมเดลนี้คือสามารถ input เป็นภาพเข้าไปได้นะครับ อย่างที่ผมลองก็คือรูปมือถือเครื่องเก่าที่ผมจับมาลง lineageOS (อัพเกรดจาก android 5.0 เป็น android 13) จะเห็นได้ว่าโมเดลรู้ด้วยว่าเป็น android phone

อีกตัวอย่างเป็นการตกแต่งในที่ทำงาน ช่วงปีใหม่ครับ จากคำบรรยายภาพที่โมเดล generate ขึ้นมาถือว่าเก็บรายละเอียดได้ดีเลยครับ

Introduce outstanding model

นอกจากสองโมเดลที่ได้ลองเล่นให้ดูแล้ว เจ้า ollama ยังรองรับโมเดลอื่นๆ อีกหลายตัวเลยครับ ที่นับคร่าวๆ ใน libray ตอนนี้น่าจะไม่ตำ่กว่า 60 ตัว

Ollama Model Library

library

Get up and running with large language models, locally.

ollama.com

Gemma

gemma

Gemma is a family of lightweight, state-of-the-art open models built by Google DeepMind.

ollama.com

ชื่อนี้ทุกคนอาจจะไม่คุ้นเคย แต่ถ้าเรียกว่า Gemini น่าจะคุ้นๆ กันบ้างมั้ยครับ เจ้า Gemma ตัวนี้เป็นโมเดลที่ใช้โครงสร้างเดียวกับ gemini ที่กูเกิลทำขายเลย แต่ตัวนี้จะเป็น opensource เต็มรูปแบบครับ

กูเกิลเพิ่งเปิดตัวมาเมื่อวันที่ 21 กุมภา 2024 ที่ผ่านมานี้เอง ณ วันที่เขียนบทความคือวันที่ 27 เอามาลง ollama ได้แล้ว ไวสุดๆ ไปเลย

จุดเด่นของโมเดลนี้นอกจากจะใช้โครงสร้างเดียวกับ gemini ตัวเสียเงินแล้ว ทาง google ยังอวดว่า gemma ขนาด 7B สามารถเอาชนะคู่แข่งระดับเดียวกันอย่าง llama2 ขนาด 7B จากทาง Meta ได้ด้วย (llama2 คือโมเดลตัวแรกที่ demo ให้ดูในบทความนี้)

ดูประกาศจากทาง google เต็มๆ ได้ที่นี้

Gemma: Introducing new state-of-the-art open models

Gemma is a family of lightweight, state\u002Dof\u002Dthe art open models built from the same research and technology…

blog.google

Orca-mini

orca-mini

A general-purpose model ranging from 3 billion parameters to 70 billion, suitable for entry-level hardware.

ollama.com

จุดเด่นของโมเดลนี้คือดีไซน์มาให้ใช้งานได้ใน entry-level hardware ครับ ใครที่คอมไม่ค่อยแรงอาจจะพิจารณาลองเจ้าตัวนี้ดูครับ มีให้เลือกสามขนาดตั้งแต่ 7b, 13b และ 70b โดยตัวเล็กสุดขนาด 7b ใช้คอมที่มีแรมขั้นต่ำ 8gb ก็รันได้แล้วครับ

เปเปอร์งานวิจัยของโมเดล

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the…

arxiv.org

Codellama

codellama

A large language model that can use text prompts to generate and discuss code.

ollama.com

ตัวนี้เป็นโมเดลที่ specialize ในแง่การเขียนโค้ดและการ discuss เกี่ยวกับโค้ดนะครับ โดยเป็นการพัฒนาต่อยอดจากโมเดล llama2 อีกทีหนึ่ง สามารถเขียนได้หลากหลายภาษาโปรแกรมเช่น Python, C++, Java, PHP, Typescript (Javascript), C#, Bash

3. Extensible & Configurable

ความดีงามอีกอย่างนึงของเจ้า ollama ก็คือด้วยความที่มันเป็น opensouce จึงทำให้มีคอมมูนิตี้โปรเจคมากมายที่มาใช้งานเจ้า ollama ไม่ว่าจะเป็นในแง่ของ ui สำหรับใช้ prompt คำสั่งแบบสวยๆ ไม่ต้องมาพิมพ์ prompt ใน terminal หรือจะเป็น application ในมือถือที่ interface กับเจ้า ollama ที่รันในเครื่องเราแบบนี้ก็มีครับ

Community Project

ตัวแรกที่จะแนะนำก็คือ Ollama-ui

GitHub - ollama-ui/ollama-ui: Simple HTML UI for Ollama

Simple HTML UI for Ollama. Contribute to ollama-ui/ollama-ui development by creating an account on GitHub.

github.com

เป็นทางเลือกที่เร็วสะดวกที่สุดก็ว่าได้เพราะมาในรูปแบบ chrome extension ครับ หลังจากที่เราติดตั้ง ollama ในเครื่องเราเรียบร้อย โหลดโมเดลมาแล้วก็ไปติดตั้ง extension ตัวนี้ได้เลย

https://chrome.google.com/webstore/detail/ollama-ui/cmgdpmlhgjhoadnonobjeekmfcehffco

ติดตั้งแล้ว enable extension แล้วก็ pin เอาไว้สักหน่อย จากนั้นก็กดเข้าไปได้เลยครับ

pin ollama-ui extesion ไว้เพื่อให้เข้าถึงง่าย

เข้ามาแล้วเราก็จะเจอกับ ui ง่ายๆ ให้เราสามารถแชทคุยกับ ollama ได้ โดยเราสามารถเลือกสลับใช้งานโมเดลที่มีอยู่ในเครื่องได้เลยครับ จาก dropdown มุมบนขวา

ทดสอบการใช้งาน ollama-ui คุยกับโมเดล llama2

อีกตัวที่น่าสนใจคือ maid เป็น flutter application ที่รองรับการเชื่อมต่อกับ ollama มีให้ใช้ทั้ง android และ windows

GitHub - Mobile-Artificial-Intelligence/maid: Maid is a cross-platform Flutter app for interfacing…

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI…

github.com

Rest API

นอกจากนี้ ollama ยังรองรับการสั่งงานผ่าน rest api ด้วยครับ โดยหลังจากติดตั้งแล้วเจ้า ollama จะรันใน background รอเราอยู่ที่ localhost:11434 ครับ

curl http://localhost:11434/api/generate -d ‘{
“model”: “llama2”,
“prompt”:”Why is the sky blue?”
}’
curl http://localhost:11434/api/chat -d ‘{
“model”: “mistral”,
“messages”: [
{ “role”: “user”, “content”: “why is the sky blue?” }
]
}’

Full API Documentation

ollama/docs/api.md at main · ollama/ollama

Get up and running with Llama 2, Mistral, Gemma, and other large language models. - ollama/docs/api.md at main ·…

github.com

Ollama Alternative

ไม่กี่วันมานี้ทาง nvidia ก็เพิ่งออกมาเปิดตัว chat with rtx เป็น client สำหรับ chat bot generative ai สามารถโหลดมารันในเครื่องคอมที่มีการ์ดจอ nvidia ตระกูล RTX ตั้งแต่ 30 ขึ้นไปครับ โดยเจ้าต้วนี้จะใช้ประโยขน์จาก tensor core ที่อยู่ใน gpu Geforce RTX 30 ขึ้นไปในการรัน generative ai ให้ประมวลผลได้เร็วยิ่งขึ้นครับ

(เครื่องต้องมี gpu Geforce RTX 30 ขึ้นไปและมี VRAM ไม่ต่ำกว่า 8GB นะครับ)

Chat with RTX Now Free to Download | NVIDIA Blog

New tech demo gives anyone with an NVIDIA RTX GPU the power of a personalized GPT chatbot, running locally on their…

blogs.nvidia.com

Summary

ทามกลางสงคราม AI ระหว่างบริษัท Tech ยักษ์ใหญ่ที่กำลังดูเดือดในช่วงนี้ เราในฐานะผู้บริโภคจึงได้ประโยชน์จากการแข่งขันพัฒนา LLM โมเดลไปเต็มๆ เลยครับ เจ้า ollama จะช่วยให้เราสามารถใช้งานโมเดลเหล่านี้ได้สะดวกมากขึ้น เปิดทางให้เราสามารถสร้าง AI ผู้ช่วยส่วนตัวที่ตอบสนองต่อความต้องการของแต่ละคนได้ดีขึ้นหวังว่าจะช่วยให้งานของทุกคนง่ายขึ้นนะครับ :)

ปลดล็อกพลัง Gen AI ในเครื่องของคุณด้วย Ollama

Introduction

About Ollama

1. Easy to setup

Multiple platform support

How to setup on each platform

Prompt first command

2. Multiple model support with fairly large model library

How to change model

Introduce outstanding model

library

Get up and running with large language models, locally.

Gemma

gemma

Gemma is a family of lightweight, state-of-the-art open models built by Google DeepMind.

Gemma: Introducing new state-of-the-art open models

Gemma is a family of lightweight, state\u002Dof\u002Dthe art open models built from the same research and technology…

Orca-mini

orca-mini

A general-purpose model ranging from 3 billion parameters to 70 billion, suitable for entry-level hardware.

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the…

Codellama

codellama

A large language model that can use text prompts to generate and discuss code.

3. Extensible & Configurable

Community Project

GitHub - ollama-ui/ollama-ui: Simple HTML UI for Ollama

Simple HTML UI for Ollama. Contribute to ollama-ui/ollama-ui development by creating an account on GitHub.

GitHub - Mobile-Artificial-Intelligence/maid: Maid is a cross-platform Flutter app for interfacing…

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI…

Rest API

ollama/docs/api.md at main · ollama/ollama

Get up and running with Llama 2, Mistral, Gemma, and other large language models. - ollama/docs/api.md at main ·…

Ollama Alternative

Chat with RTX Now Free to Download | NVIDIA Blog

New tech demo gives anyone with an NVIDIA RTX GPU the power of a personalized GPT chatbot, running locally on their…

Summary

Written by Micky Chanachai