Notes on Studying and Trying Out a Local LLM + Vector Database on My Own Machine

Published in

T. T. Software Solution

15 min read · Sep 16, 2024

This article continues the study from my previous article (linked below). This time the topic is LLMs (Large Language Models).

Notes on Studying Generative AI and Vector Databases

Hello! This article is my notes on studying Generative AI and Vector Databases. At first, I thought I would study…

medium.com

In this article, I learned about the following:

How many types of models are there?
What is NLP (Natural Language Processing)?
Getting to know Language Models
RNN vs Transformer
Designing a solution to experiment with
Trying out a Sentence-transformer model
What is an LLM (Large Language Model)?
Enabling LLM processing on the GPU
Trying out an LLM (GPT-Neo 125M)
Writing the actual prototype solution
Trying a full-size LLM (scb10x/typhoon-7b)
Trying something else with the full-size LLM

The solution I experimented with looks like the diagram at the top of this article.

How Many Types of Models Are There?

Last week, I summarized my understanding of neural networks in this diagram:

That made me wonder how many types of models exist. I started by browsing https://huggingface.co/, which hosts a wide variety of models. The left-hand panel (outlined in red) lists tasks, and grouping models by task turned out to be an easy way to understand them.

The screenshot above shows how Hugging Face groups models by the tasks they support. The tasks fall into the following categories:

1. Multimodal (multiple data types):

These models work with several kinds of data, such as images, text, and video. Tasks in this category include:

Image-Text-to-Text: generate text from an image plus text (e.g., describing an image)
Visual Question Answering: answer questions about an image
Document Question Answering: answer questions about a document
Video-Text-to-Text: generate text from a video plus text

2. Computer Vision:

Models in this group work with images and video. Tasks include:

Depth Estimation: estimate depth from an image
Image Classification: assign an image to a category
Object Detection: locate objects in an image
Image Segmentation: divide an image into regions
Text-to-Image: generate an image from text
Image-to-Image: transform one image into another
Unconditional Image Generation: generate new images from random noise, with no input prompt
Zero-Shot Image Classification: classify images into categories the model was not specifically trained on

3. Natural Language Processing:

Models in this group process text. Tasks include:

Text Classification: assign a text to a category
Token Classification: label each token (unit of text)
Question Answering: answer questions from a text
Translation: translate text between languages
Summarization: summarize text
Text Generation: generate new text
Sentence Similarity: measure how similar two sentences are

4. Audio:

These models work with audio data. Tasks include:

Text-to-Speech: convert text to speech
Automatic Speech Recognition (ASR): convert speech to text
Audio Classification: classify sounds
Voice Activity Detection: detect speech in an audio signal

5. Tabular:

These models analyze tabular data. Tasks include:

Tabular Classification: classify rows of tabular data
Time Series Forecasting: predict future values from time-ordered data

6. Reinforcement Learning:

Reinforcement Learning: an agent learns by trial and error, improving from the rewards and penalties it receives
Robotics: controlling robots

7. Other:

Graph Machine Learning: machine learning on graph-structured data

Grouping models this way makes it easier to choose a suitable model for each task.

What Is NLP (Natural Language Processing)?

Next, I looked for the category closest to LLMs. I concluded that it is NLP, so here is what I read about NLP:

NLP (Natural Language Processing) is a branch of artificial intelligence (AI). It focuses on enabling computers to understand, interpret, and respond to human language, whether spoken or written, in a natural way.

Goals of NLP:

Enable computers to understand and process text or speech, e.g., document analysis, translation, text generation, or question answering
Enable computers to interact with people in natural language, e.g., chatbots or voice assistants such as Siri and Alexa

Key components of NLP:

Tokenization: splitting text into smaller units, such as words or subwords
Part-of-Speech Tagging (POS Tagging): identifying the part of speech of each word in a sentence, such as noun or verb
Named Entity Recognition (NER): detecting and classifying named entities such as people, places, or organizations
Sentiment Analysis: determining the opinion or emotion in a text, e.g., whether it is positive or negative
Machine Translation: translating text from one language to another
Text Generation: generating new text from a given input
Speech Recognition: converting speech to text
Question Answering: answering questions based on the available information
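To make "tokenization" concrete, here is a minimal sketch of a rule-based word tokenizer. It uses a regular expression I chose for illustration: runs of word characters become tokens, and each punctuation mark becomes its own token. Real LLMs do not work this way. They use learned subword tokenizers, which are also needed for languages like Thai that are written without spaces between words.

```python
import re

def simple_tokenize(text: str) -> list[str]:
    """Toy word-level tokenizer: words and single punctuation marks become tokens."""
    # \w+ matches a run of letters/digits/underscore; [^\w\s] matches one punctuation char.
    # Whitespace matches neither alternative, so it is simply skipped.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Can you explain cat?"))  # ['Can', 'you', 'explain', 'cat', '?']
```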

Applications of NLP:

Search engines: ranking and displaying results relevant to a query
Machine translation: translating text between languages, e.g., Google Translate
Voice assistants: Siri, Google Assistant, Alexa, etc.
Summarization: condensing long documents or articles
Sentiment analysis: gauging how users feel from social media posts or product reviews

NLP is a key technology that lets computers communicate with people effectively.

Getting to Know Language Models

Before moving on to Large Language Models, I figured I should first understand what a Language Model is. Here's what I found:

Language Models (LMs) are at the heart of NLP. They are used to understand and generate natural-sounding text. An LM learns the relationships between words in a language, and that knowledge is applied to tasks such as translation, text generation, and question answering.
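One common way to write down what a language model learns uses the standard textbook chain-rule notation (my addition, not from the sources I read). The probability of a whole word sequence is the product of next-word probabilities. The model families below mostly differ in how much of the history they can actually use. An n-gram model, for example, keeps only the last n-1 words:

```latex
\begin{aligned}
P(w_1, \dots, w_n) &= \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}) \\
P(w_i \mid w_1, \dots, w_{i-1}) &\approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) \quad \text{(n-gram approximation)}
\end{aligned}
```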

Types of Language Models:

Language models can be grouped by their architecture and training objective. The main types are:

1. Traditional Models (the early era):

n-gram Models: predict the next word from the previous n-1 words. For example, a bigram model (n=2) looks at one previous word and a trigram model (n=3) looks at two. They cannot capture long-range relationships, and they need a lot of data to be accurate.
Markov Models: based on a Markov process, they assume the next word depends only on the current state (e.g., the preceding word). Like n-grams, they struggle to remember long-range relationships in text.
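To see how an n-gram model "predicts the next word", here is a minimal bigram (n=2) sketch on a made-up three-sentence corpus. It only counts which word follows which and turns the counts into probabilities (a maximum-likelihood estimate without smoothing). This also shows the limitation mentioned above: each prediction sees only one previous word.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each word follows each previous word (bigram model, n=2)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]  # sentence start/end markers
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """P(next | prev) = count(prev, next) / count(prev, anything); {} for unseen words."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()} if total else {}

corpus = ["the cat sat", "the cat ran", "the dog sat"]  # toy data, made up for the example
model = train_bigram(corpus)
print(next_word_probs(model, "the"))  # cat ≈ 0.67, dog ≈ 0.33
print(next_word_probs(model, "cat"))  # {'sat': 0.5, 'ran': 0.5}
```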

2. RNN-based Models (the middle era):

Models built on Recurrent Neural Networks (RNNs) handle sequential data such as text better, because they carry information forward from earlier words.

RNNs (Recurrent Neural Networks): neural networks with a loop that feeds each step's state into the next step, so they can process sequences. However, they struggle to retain information over long spans (the vanishing gradient problem).
LSTMs (Long Short-Term Memory): an RNN variant designed to fix the long-range memory problem. Gates control what to keep and what to forget, so LSTMs remember information better than a plain RNN.
GRUs (Gated Recurrent Units): similar to LSTMs but simpler in structure, so they can be faster on some tasks

3. Transformer-based Models (the current era, since 2017):

These models have become very popular because they handle long-range context better than RNNs. They can also process all positions of a sequence in parallel during training.

BERT (Bidirectional Encoder Representations from Transformers): reads each word in the context of the words on both its left and its right, which gives it a better grasp of context. It is well suited to tasks such as text classification and question answering.
GPT (Generative Pre-trained Transformer): generates text by predicting the next word, which makes it suitable for automatic text generation, such as continuing a sentence
T5 (Text-To-Text Transfer Transformer): frames every task (e.g., translation, summarization) as a text-to-text problem, where text goes in and text comes out
XLNet: addresses limitations of BERT by combining autoregressive training with bidirectional context (via permutation language modeling), improving both text generation and understanding of context
ALBERT: a lighter version of BERT designed to train faster while keeping accuracy high
RoBERTa: builds on BERT with more training data and an improved training procedure
ELECTRA: a training method that uses less compute than BERT. The model learns to predict whether each word in a sentence has been replaced.

4. Specialized Models:

These models target specific problems, such as translation or shrinking model size.

Seq2Seq (Sequence-to-Sequence Models): transform one sequence into another, which suits translation and summarization
Multilingual Models: trained to support many languages, e.g., mBERT (Multilingual BERT) and XLM-R (XLM-RoBERTa)
Distillation Models: e.g., DistilBERT, a smaller model distilled from BERT that reduces size and complexity while keeping reasonable accuracy

Applications:

Language models can be used for many kinds of tasks, such as:

Text Generation
Machine Translation
Summarization
Question Answering
Text Classification

The shift from RNNs to Transformer-based models, especially BERT and GPT, brought major progress in NLP. These models improved both efficiency and the ability to understand complex language.

RNN vs Transformer

One interesting point I found in my reading is the turning point from RNNs to Transformers. The details are below. In short, RNNs struggle with long inputs because of the vanishing gradient problem and must process one step at a time. The Transformer was designed to fix both problems by processing all positions in parallel and capturing long-range dependencies.

The diagram shows the structure of an RNN (Recurrent Neural Network) and a Transformer, two architectures widely used in NLP. Let's compare them and see why the Transformer is considered better in many respects.

1. RNN (Recurrent Neural Networks)

RNNs suit sequential data such as text. The result at each step is computed using information from the previous steps.
An RNN loops over the sequence (recurrent): its current state depends on its previous state, which lets it remember past information.
Disadvantages of RNNs:
Vanishing Gradient Problem: with long inputs, RNNs tend to lose older information. The gradient signal shrinks as it is backpropagated through many steps, so the network cannot remember distant information well.
Sequential Processing: an RNN must process one step at a time, so the work cannot be parallelized, which makes training and inference slow
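The vanishing gradient problem can be shown with a one-unit RNN, h_t = tanh(w_h·h_{t-1} + w_x·x_t). The weights (w_h = 0.5, w_x = 1.0) and the constant input are made up for the demo. By the chain rule, the gradient of the last state with respect to the first is a product of one factor per step, w_h·(1 - h_t²). When each factor is below 1, the product shrinks exponentially with sequence length. The loop also shows the sequential-processing point: step t cannot start until step t-1 is done.

```python
import math

def gradient_through_time(w_h, w_x, inputs, h0=0.0):
    """Run a 1-unit RNN h_t = tanh(w_h*h_{t-1} + w_x*x_t) and return dh_T/dh_0.

    By the chain rule, dh_T/dh_0 is the product of the per-step derivatives
    dh_t/dh_{t-1} = w_h * (1 - h_t**2).
    """
    h, grad = h0, 1.0
    for x in inputs:                       # strictly sequential: each step needs the previous h
        h = math.tanh(w_h * h + w_x * x)
        grad *= w_h * (1.0 - h * h)        # multiply in this step's local derivative
    return grad

for steps in (5, 20, 50):
    g = gradient_through_time(w_h=0.5, w_x=1.0, inputs=[1.0] * steps)
    print(f"{steps:>2} steps -> dh_T/dh_0 = {g:.2e}")
```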

2. Transformer

The Transformer was designed to solve the RNN's problems, in particular to process data in parallel and to capture long-range dependencies.
A Transformer consists of an Encoder and a Decoder. Each is built from modules such as Multi-Head Attention and Feed Forward Neural Networks, and Positional Encoding conveys the order of the words.
Multi-Head Attention lets the model look at every word in the sentence at once, so it takes in the context of the whole sentence in a single step. An RNN, by contrast, processes one word at a time.
Advantages of the Transformer:
Parallelization: a Transformer processes many words at once because it uses Self-Attention instead of step-by-step processing. As a result, it trains faster and more efficiently.
Long-Range Dependencies: attention lets a Transformer capture relationships between words that are far apart better than an RNN can
Much less affected by vanishing gradients: there is no recurrence through time, so the gradient path between any two tokens is short (residual connections also help in deep stacks)
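The core of "looking at every word at once" is scaled dot-product attention, softmax(QKᵀ/√d_k)V. Below is a single-head sketch in plain Python. To keep it short, it uses the token vectors directly as Q, K and V. A real Transformer first multiplies them by learned projection matrices and runs several heads in parallel. The optional causal flag blocks attention to later tokens, which is the GPT-style setup.

```python
import math

def softmax(xs):
    m = max(xs)                                   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]          # exp(-inf) == 0.0, so masked scores vanish
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of vectors (one per token). Each query is compared with every key
    independently, so all rows could be computed in parallel (unlike an RNN's loop).
    With causal=True, token i may only attend to tokens 0..i.
    """
    d_k = len(K[0])
    outputs, all_weights = [], []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)                 # how much token i "looks at" each token
        out = [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # three made-up 2-dim token vectors
out, weights = attention(X, X, X)                  # self-attention: Q = K = V = X
print(weights[2])                                  # token 3's attention over all tokens
```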

Key differences between RNN and Transformer:

Data processing: an RNN works sequentially, one step at a time, while a Transformer processes in parallel
Long-range dependencies: a Transformer uses the attention mechanism to connect distant words effectively, while an RNN may fail to retain distant information
Efficiency: a Transformer trains faster than an RNN because its computation can be parallelized instead of running one word at a time
Real-world use: Transformer-based models such as BERT and GPT have been very successful in NLP tasks such as summarization, translation, and text generation

In summary, Transformers are more popular than RNNs today because they are flexible, perform better, and scale to very large datasets.

Read more here:

RNN vs Transformers or how scalability made possible Generative AI?

LLMs are built on top of the Transformer architecture, but before Transformers, the leading architecture for building…

shchegrikovich.substack.com

Designing a Solution to Experiment With

To study NLP and LLMs hands-on, I designed a small solution to code up and play with. It looks like the diagram below.

The diagram above shows the workflow involving the Sentence-transformer model and the LLM. In summary:

1. Sentence-transformer model (left side of the diagram):

When a user sends a message (e.g., “Can you explain cat?”), the Sentence-transformer model converts it into a vector, a numerical representation that a computer can work with.
The vector is then used to search the Vector Database (DB) for previously stored vectors that are similar to the question.
If a similar vector is found: the stored answer is returned to the user immediately
If no matching vector is found: the question is passed to the LLM (Large Language Model) to generate a new answer

2. LLM model (right side of the diagram):

The LLM generates an answer from the question text by passing it through its weights. It tokenizes the text with its own tokenizer rather than reusing the Sentence-transformer vector.
Once the answer is generated, the system stores the question's vector together with the answer in the vector database, so the answer can be reused when a similar question comes in later.
Finally, the response is returned to the user.

Summary of the flow:

The user's message is converted into a vector by the Sentence-transformer model and searched against the vector database
If there is a match, the stored answer is returned immediately; otherwise, the LLM generates a new answer
The answer generated by the LLM is stored in the vector database for future use
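The flow above can be sketched as a tiny in-memory "semantic cache". The class name SemanticCache, the toy bag-of-words embedding, the fake LLM and the 0.75 cosine threshold are all made up for illustration. They stand in for the Sentence-transformer, GPT-Neo and Milvus in the real prototype, so the caching logic can be run and checked on its own.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """In-memory stand-in for the vector DB: stores (question vector, answer) pairs."""

    def __init__(self, embed, generate, min_similarity=0.9):
        self.embed = embed                  # text -> vector (plays the Sentence-transformer)
        self.generate = generate            # text -> answer (plays the LLM)
        self.min_similarity = min_similarity
        self.entries = []                   # list of (question_vector, answer)

    def ask(self, question):
        q = self.embed(question)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.min_similarity:
            return best[1], "cache"         # similar question seen before: reuse its answer
        answer = self.generate(question)    # otherwise ask the LLM...
        self.entries.append((q, answer))    # ...and remember question vector + answer
        return answer, "llm"

# Hypothetical toy embedding: word counts over a tiny fixed vocabulary
VOCAB = ["please", "explain", "cat", "dog", "for", "me", "us", "what", "is"]

def toy_embed(text):
    words = text.lower().replace("?", "").split()
    return [words.count(w) for w in VOCAB]

def fake_llm(question):
    return f"(generated answer to: {question})"

cache = SemanticCache(toy_embed, fake_llm, min_similarity=0.75)
print(cache.ask("Please explain cat for me?"))  # source: "llm"
print(cache.ask("Please explain cat for us?"))  # source: "cache" (cosine 0.8)
print(cache.ask("What is dog?"))                # source: "llm"
```

The threshold is the key design knob. If it is too strict, paraphrases never hit the cache. If it is too loose, unrelated questions get someone else's answer.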

Advantages of this design:

The vector database answers previously asked questions without generating a new answer, which saves resources
The LLM can generate answers for questions that have never been asked before

To run Milvus, use the docker-compose setup from my previous article.

Trying Out a Sentence-transformer Model

A Sentence-transformer model creates sentence embeddings. It converts text into a numerical vector that captures the meaning or context of the sentence. These vectors can be used to find similar sentences, cluster texts, or store in a vector database for efficient search.

all-MiniLM-L6-v2:

all-MiniLM-L6-v2 is one of the sentence-transformer models. It is designed to be small while still producing accurate sentence embeddings. Its key properties are:

It is smaller and faster than large transformer models such as BERT or RoBERTa, yet it still captures sentence context well
The "L6" in the name means the model has 6 layers, compared with 12 in BERT-base, so it computes faster and uses fewer resources. It outputs 384-dimensional vectors.

How the model works:
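As far as I understand from the all-MiniLM-L6-v2 model card, a sentence vector is produced in three steps. The Transformer outputs one vector per token, these are averaged while skipping padding (mean pooling), and the result is scaled to unit length. Here is a plain-Python sketch of the last two steps, using made-up 2-dimensional "token vectors" instead of the model's real 384-dimensional ones:

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            summed = [s + v for s, v in zip(summed, vec)]
            count += 1
    return [s / max(count, 1) for s in summed]

def l2_normalize(vec):
    """Scale a vector to length 1, so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

# Toy token vectors for 3 positions; the last one is padding and must be ignored
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))                 # [2.0, 3.0]
print(l2_normalize(mean_pool(tokens, mask)))   # same direction, length 1
```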

Usage examples:

Sentence similarity search: given many sentences in a database and a query sentence, the model converts all of them to vectors, then compares the vectors to find the sentences closest in meaning
Storing in a vector database: these vectors can be stored in a vector database such as Milvus (or indexed with a library such as FAISS) for efficient search over large datasets
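The simplest form of "compare the vectors to find the closest sentences" is an exact, brute-force search: score every stored vector against the query and keep the best k. Vector databases return similar results much faster at scale. For example, the IVF_FLAT index used later in this article groups vectors into clusters (nlist) and scans only some of them (nprobe). The function name top_k_similar and the toy vectors below are made up for illustration.

```python
import heapq
import math

def top_k_similar(query, vectors, k=3):
    """Exact (brute-force) search: score every stored vector, keep the k best.

    Returns (index, cosine_similarity) pairs, most similar first.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    scored = ((i, cosine(query, v)) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up "sentence vectors"
print(top_k_similar([1.0, 0.1], stored, k=2))  # indices 0 and 2 are closest
```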

all-MiniLM-L6-v2 is a good fit for tasks that need speed and low resource usage but still require good accuracy, such as sentence similarity search in large systems.

Run this first:

pip install sentence-transformers

Example code for generating vectors:

import warnings
from sentence_transformers import SentenceTransformer

# Suppress warnings related to transformers and PyTorch
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=UserWarning)

# Load the model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Example sentences
sentences = [
    "This is an example sentence.",
    "This is another example."
]

# Generate embeddings
embeddings = model.encode(sentences)

# Display the embeddings
for i, sentence in enumerate(sentences):
    print(f"Sentence: {sentence}")
    print(f"Embedding: {embeddings[i]}")

Example output:

What Is an LLM (Large Language Model)?

An LLM (Large Language Model) is a language model trained on an enormous amount of data, with a complex structure that lets it understand and generate meaningful text. LLMs typically have a very large number of parameters (the learned weights inside the model), which lets them capture the complexity of language and context.

LLMs are used for many tasks, such as text generation, translation, question answering, and summarization. They are built on the Transformer architecture, which helps them capture long-range relationships within a sentence or text.

Examples of LLMs include OpenThaiGPT, scb10x/typhoon-7b, and EleutherAI/gpt-neo-125M.

Enabling LLM Processing on the GPU

Do the following:

Install PyTorch with CUDA Support

You might be running a CPU-only version of PyTorch. To ensure PyTorch uses the GPU, you need to install the version of PyTorch that supports CUDA (which allows PyTorch to use your NVIDIA GPU).

To install PyTorch with CUDA support, follow these instructions:

Go to the PyTorch installation page and select your configuration.
For example, if you are using CUDA 11.8, you can install PyTorch with CUDA support via pip:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Run a quick test with the following code:

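import torch
print(torch.cuda.is_available())  # Should print True if PyTorch can see the GPU
print(torch.version.cuda)  # CUDA version PyTorch was built with, e.g. '11.8' (None on a CPU-only build)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # Name of your GPU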

Example test output:

Explanation:

Trying Out an LLM (GPT-Neo 125M)

With the GPU enabled, let's build a simple chat to talk to a small LLM first, such as GPT-Neo 125M.

GPT-Neo-125M is a language model in the GPT-Neo family developed by EleutherAI. It is a small version with 125 million parameters. Like other GPT models, it uses the Transformer architecture and is designed for text generation. It is a base model, not fine-tuned for chat, so it simply continues whatever text it is given. It uses far fewer resources than large models such as GPT-3, which makes it suitable for systems with limited memory and compute.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Check if GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Print out whether the model will run on GPU or CPU
if torch.cuda.is_available():
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    print("Using CPU")

# Set cache directory to avoid re-downloading the model and tokenizer
cache_dir = "./gpt_neo_cache"

# Load the tokenizer and model from the cache or download if not present
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M", cache_dir=cache_dir)
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M", cache_dir=cache_dir)

# Add a pad token if not already present (using eos_token as pad_token)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Move the model to GPU if available
model.to(device)

def chat_with_model():
    print("Chatbot: Hello! I am a chatbot powered by GPT-Neo 125M. Type 'exit' to end the conversation.")
    
    while True:
        user_input = input("You: ")
        if user_input.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Tokenize input and move to GPU if available
        inputs = tokenizer(user_input, return_tensors='pt', padding=True, truncation=True).to(device)

        # Generate text using the model with sampling enabled (do_sample=True)
        outputs = model.generate(
            inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_length=100,
            pad_token_id=tokenizer.eos_token_id,
            temperature=0.7,  # Controls randomness
            top_p=0.9,        # Nucleus sampling for diversity
            do_sample=True    # Enable sampling to allow temperature and top-p
        )

        # Decode and print the response from the model
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"Chatbot: {response}")

# Start the chat
chat_with_model()
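The generate() call above uses do_sample=True, temperature=0.7 and top_p=0.9. Here is a simplified sketch of what those settings mean for choosing one next token from the model's raw scores (logits). It illustrates the idea only; it is not Hugging Face's actual implementation, and the function name and example logits are made up.

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Pick a token index from raw logits using temperature + nucleus (top-p) sampling."""
    # 1) Temperature: divide logits before softmax; < 1 sharpens, > 1 flattens the distribution
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 2) Top-p: keep the smallest set of most-likely tokens whose total probability >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # 3) Sample one of the kept tokens in proportion to its probability
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

logits = [2.0, 1.0, 0.5, -1.0]   # made-up scores for a 4-token vocabulary
print([sample_next_token(logits) for _ in range(10)])
```

Lower temperature or lower top_p makes the output more predictable. Higher values make it more varied, but with a small model like GPT-Neo 125M the text also becomes less coherent.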

Test run results:

As you can see, the answers aren't very smart, because this is a small LLM. For a simple prototype, though, we'll use it for now.

Writing the Actual Prototype Solution

Now that we understand the basics and have done a "Hello World" with both the Sentence-transformer model and a small LLM, let's build the actual prototype solution.

Before running, install these first:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers sentence-transformers pymilvus

Here's the code:

from sentence_transformers import SentenceTransformer
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility, db
from transformers import GPTNeoForCausalLM, GPT2Tokenizer
import torch
import json

# Milvus connection details
MILVUS_HOST = 'localhost'
MILVUS_PORT = '19530'
MILVUS_URI = f"http://{MILVUS_HOST}:{MILVUS_PORT}"

# Database and collection name variables
database_name = "my_database"
collection_name = "text_embeddings_with_responses"

# Set cache directory for GPT-Neo model and tokenizer
cache_dir = "./gpt_neo_cache"

# Step 0: Remove existing database and collections
def remove_existing_database():
    # Connect to the built-in "default" database first:
    # passing db_name=database_name here can fail if that database does not exist yet
    connections.connect("default", host=MILVUS_HOST, port=MILVUS_PORT)

    # Check if the database exists
    if database_name in db.list_database():
        print(f"Database '{database_name}' exists. Proceeding to drop all collections.")

        # Switch the existing connection to that database
        db.using_database(database_name)

        # List and drop all collections (the database must be empty before it is dropped)
        collections = utility.list_collections()
        for collection in collections:
            utility.drop_collection(collection)
            print(f"Dropped collection: {collection}")

        # Switch back to "default", then drop the database
        db.using_database("default")
        db.drop_database(database_name)
        print(f"Database '{database_name}' has been dropped.")
    else:
        print(f"Database '{database_name}' does not exist.")

    # Create a new database and make it the active one for the collection created next
    db.create_database(database_name)
    db.using_database(database_name)
    print(f"Database '{database_name}' created.")

# Step 1: Define the schema for the collection
def create_collection():
    # Load the pre-trained model for embedding
    model = SentenceTransformer('all-MiniLM-L6-v2')

    # Define the fields schema (primary key, text, and embedding vector)
    fields = [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),  # Auto-generated ID
        FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=2000),  # Store the original text
        FieldSchema(name="response", dtype=DataType.VARCHAR, max_length=4000),  # Store the response (GPT-Neo output can be longer than 500)
        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384)  # 384 is the dimension of the embeddings
    ]

    # Define the collection schema
    schema = CollectionSchema(fields, description="Collection to store text, response, and embeddings")
    
    # Create collection
    collection = Collection(collection_name, schema)
    
    return collection, model

# Step 2: Insert embeddings and response into Milvus with corresponding text
def insert_texts(collection, model, texts, responses):
    # Convert the text into embeddings
    embeddings = model.encode(texts)

    # Prepare data for insertion
    data = [
        texts,  # Text field
        responses,  # Response field
        embeddings.tolist()  # Embedding field
    ]

    # Insert the data into the collection
    collection.insert(data)
    
    # Ensure the data is saved
    collection.flush()

# Step 3: Create index for faster search
def create_index(collection):
    # Create an index on the "embedding" field
    index_params = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
    collection.create_index(field_name="embedding", index_params=index_params)

# Step 4: Search for similar text in the collection
def search_similar_texts(collection, model, query_text, top_k=5):
    # Convert query text into embeddings
    query_embedding = model.encode([query_text])  # Query is encoded as a list

    # Define the search parameters
    search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
    
    # Load collection into memory for search
    collection.load()

    # Perform similarity search
    results = collection.search(query_embedding, "embedding", param=search_params, limit=top_k, output_fields=["id", "text", "response"])

    # Check if there is a similar result
    if results[0]:
        for result in results[0]:
            if result.distance < 0.1:  # Milvus "L2" is squared L2 distance; 0.1 only matches near-identical sentences
                return result.entity.get("response")
    
    return None

# Step 5: Generate response using GPT-Neo 125M
def generate_response_gpt_neo(prompt):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    # Print if GPU is being used
    if device == 'cuda':
        print("Using GPU for inference.")
    else:
        print("Using CPU for inference.")

    # Load GPT-Neo 125M model and tokenizer with cache directory
    tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-125M", cache_dir=cache_dir)
    model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M", cache_dir=cache_dir).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_length=100, pad_token_id=tokenizer.eos_token_id)

    # Explicitly set clean_up_tokenization_spaces to False to suppress the warning
    response = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
    return response

if __name__ == "__main__":
    # Step 0: Remove existing database
    remove_existing_database()

    # Step 1: Create Collection and Load Model
    collection, model = create_collection()

    # Step 3: Create Index
    create_index(collection)

    # Chat loop
    while True:
        query_text = input("You: ")

        # Exit conditions
        if query_text.lower() in ['exit', 'quit']:
            print("Exiting the chat. Goodbye!")
            break

        # Step 4: Search for a similar text in the collection
        existing_response = search_similar_texts(collection, model, query_text)
        
        if existing_response:
            print("Response from Database")
            print(f"Bot: {existing_response}")
        else:
            # Step 5: Generate response using GPT-Neo
            print("Response from GPT-Neo")
            new_response = generate_response_gpt_neo(query_text)
            print(f"Bot: {new_response}")
            
            # Insert the new query and response into the collection
            insert_texts(collection, model, [query_text], [new_response])
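One thing I noticed about generate_response_gpt_neo above: it calls from_pretrained for the tokenizer and model on every question. The files come from the local cache, but loading them again each time is still slow. A common Python pattern to load once and reuse is functools.lru_cache. The sketch below uses a hypothetical load_generator that returns a placeholder string, so the caching behaviour can be checked without downloading anything. In the real script its body would hold the two from_pretrained calls.

```python
from functools import lru_cache

LOAD_COUNT = {"n": 0}  # only here to show how often the loader really runs

@lru_cache(maxsize=1)
def load_generator(model_name):
    """Hypothetical loader. In the real script this would call
    GPT2Tokenizer.from_pretrained(...) and GPTNeoForCausalLM.from_pretrained(...)."""
    LOAD_COUNT["n"] += 1
    return f"<model {model_name}>"          # placeholder for the (tokenizer, model) pair

def generate(prompt):
    model = load_generator("EleutherAI/gpt-neo-125M")  # loaded on the first call, cached after
    return f"{model} answered: {prompt}"

print(generate("Can you explain cat?"))
print(generate("Please explain cat for me?"))
print("model loads:", LOAD_COUNT["n"])      # 1
```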

Here are the results. The prototype doesn't match similar sentences very well yet: a question has to be extremely close to a stored one before the answer is pulled from the DB, as in the example below. I'll improve this later. For today, a prototype will do.

Please explain cat for me?

Please explain cat for us?
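A likely reason only near-identical questions hit the DB is the threshold of 0.1. My understanding of the Milvus docs is that metric_type "L2" reports the squared Euclidean distance. I also believe all-MiniLM-L6-v2 normally outputs unit-length vectors, though that is worth checking. If both hold, the identity ||a - b||² = 2 - 2·cos(a, b) turns the threshold into a cosine-similarity requirement. The helper names below are mine.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance (no square root), which is what Milvus "L2" reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_from_squared_l2(d2):
    """For unit-length vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d2 / 2."""
    return 1.0 - d2 / 2.0

def squared_l2_from_cosine(cos):
    """Inverse: the squared-L2 threshold that corresponds to a desired cosine similarity."""
    return 2.0 - 2.0 * cos

# Check the identity on two unit vectors 30 degrees apart
a = [1.0, 0.0]
b = [math.cos(math.radians(30)), math.sin(math.radians(30))]
print(cosine_from_squared_l2(squared_l2(a, b)), math.cos(math.radians(30)))

# The prototype's threshold of 0.1 therefore means "cosine similarity of at least ~0.95"
print(cosine_from_squared_l2(0.1))
```

Under those assumptions, matching looser paraphrases would mean raising the distance threshold, for example to squared_l2_from_cosine(0.8) = 0.4. The right value would still have to be tuned on real questions.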

Trying a Full-Size LLM (scb10x/typhoon-7b)

The previous examples used a small LLM. If you want to try a real LLM, I recommend scb10x/typhoon-7b.

Before running, install these:

pip install transformers tokenizers
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Code to try it out:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Ask user to select device preference
user_input = input("Select device (cpu/gpu): ").lower()

# Check if GPU is available and user selected it
if user_input == 'gpu' and torch.cuda.is_available():
    device = torch.device('cuda')
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device('cpu')
    if user_input == 'gpu' and not torch.cuda.is_available():
        print("GPU not available, falling back to CPU")
    else:
        print("Using CPU")

# Set cache directory to avoid re-downloading the model and tokenizer
cache_dir = "./typhoon-7b_cache"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("scb10x/typhoon-7b", cache_dir=cache_dir)

# Load the model and use half precision (FP16)
# Only use FP16 if GPU is selected
torch_dtype = torch.float16 if device.type == 'cuda' else torch.float32
model = AutoModelForCausalLM.from_pretrained("scb10x/typhoon-7b", cache_dir=cache_dir, torch_dtype=torch_dtype)

# Move the model to the selected device
model.to(device)

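# The tokenizer may not define a pad token; reuse EOS so padding=True does not raise an error
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tokenize input ("Please explain what a cat is?" in Thai)
input_text = "ช่วยอธิบายว่าแมวคืออะไร?"
inputs = tokenizer(input_text, return_tensors='pt', padding=True, truncation=True)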

# Move inputs to the selected device
input_ids = inputs['input_ids'].to(device)
attention_mask = inputs['attention_mask'].to(device)

# Generate text using the model on the selected device
outputs = model.generate(input_ids, attention_mask=attention_mask, max_new_tokens=50)  # up to 50 new tokens (max_length would also count the prompt)

# Decode and print the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated text:", generated_text)

However, when I ran it on my Windows 11 Pro machine, I ran into these problems:

I couldn't use the GPU because it didn't have enough memory (I didn't keep a screenshot). At first I was puzzled that a GPU has its own memory at all, so that's something to study later.
I wrote the code to load the model from a local cache, but when I actually ran it, it downloaded the model again every time. That made every run slow and painful.
I left it running all night and it still hadn't finished, while using a huge amount of the machine's resources.

The error when it ran out of memory:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacity of 4.00 GiB of which 0 bytes is free. Of the allocated memory 3.04 GiB is allocated by PyTorch, and 3.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables
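The error makes sense once you do the arithmetic. The weights alone take (number of parameters) × (bytes per parameter). The sketch below takes "7b" at face value as 7 billion parameters; the exact count is somewhat higher. It also ignores activations and other runtime overhead, so the real requirement is larger still. Even in float16, the weights are more than three times the 4 GiB reported in the error. The CPU path loads float32 weights, which need roughly 26 GiB of RAM. If the machine has less free RAM than that, the OS has to swap to disk, which could explain why the CPU run never finished.

```python
def weights_memory_gib(num_params, bytes_per_param):
    """Approximate memory for the model weights alone, in GiB.

    Ignores activations, the KV cache during generation, and CUDA/runtime
    overhead, so the real requirement is higher.
    """
    return num_params * bytes_per_param / 1024**3

PARAMS = 7e9          # "7b" taken at face value (assumption); the exact count is a bit higher
GPU_MEMORY_GIB = 4.0  # total capacity reported in the error message above

for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    need = weights_memory_gib(PARAMS, nbytes)
    verdict = "fits" if need < GPU_MEMORY_GIB else "does not fit"
    print(f"{dtype}: ~{need:.1f} GiB of weights -> {verdict} in {GPU_MEMORY_GIB} GiB")
```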

After that, I tried running on the CPU instead, but it still hadn't finished by morning.

As the screenshot shows, I pressed Ctrl+C to stop it because I needed the machine for other work. Otherwise, I couldn't use the machine at all.

Result: it failed, and I haven't managed to fix it yet T_T. That's homework for later.

Trying Something Else with the Full-Size LLM

After failing to get chat-style responses, I changed approach. I used the model to create vectors, stored them in the vector DB, and then searched the vector DB for sentences similar to a query sentence. The code is below:

from transformers import AutoTokenizer, AutoModelForCausalLM
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility, db
import torch
import json

# Milvus connection details
MILVUS_HOST = 'localhost'
MILVUS_PORT = '19530'
MILVUS_URI = f"http://{MILVUS_HOST}:{MILVUS_PORT}"

# Database and collection name variables
database_name = "my_database"
collection_name = "thai_text_embeddings"

# Ask user to select device preference once
user_input = input("Select device (cpu/gpu): ").lower()

# Check if GPU is available and user selected it
if user_input == 'gpu' and torch.cuda.is_available():
    device = torch.device('cuda')
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device('cpu')
    if user_input == 'gpu' and not torch.cuda.is_available():
        print("GPU not available, falling back to CPU")
    else:
        print("Using CPU")

# Step 0: Remove existing database and collections, then create a fresh database
def remove_existing_database():
    # Connect to Milvus (this starts on the "default" database)
    connections.connect("default", host=MILVUS_HOST, port=MILVUS_PORT)

    # Check if the database exists
    if database_name in db.list_database():
        print(f"Database '{database_name}' exists. Proceeding to drop all collections.")

        # Switch the current connection to the existing database
        db.using_database(database_name)

        # List and drop all collections (a database must be empty before it can be dropped)
        for collection in utility.list_collections():
            utility.drop_collection(collection)
            print(f"Dropped collection: {collection}")

        # Switch back to the default database, then drop this one
        db.using_database("default")
        db.drop_database(database_name)
        print(f"Database '{database_name}' has been dropped.")
    else:
        print(f"Database '{database_name}' does not exist.")

    # Create a new database and make it the active one, so the collection
    # created in Step 1 lands in it rather than in the "default" database
    db.create_database(database_name)
    db.using_database(database_name)
    print(f"Database '{database_name}' created.")

# Step 1: Define the schema for the collection and load the Typhoon-7B model
def create_collection():
    # Load the pre-trained Typhoon-7B model for embedding
    model_name = "scb10x/typhoon-7b"  # Adjust this if the model name is different
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Assign a pad_token if it's missing
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # Use eos_token as the padding token

    # Define the fields schema (primary key, text, and embedding vector)
    fields = [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),  # Auto-generated ID
        FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=500),  # Store the original text
        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=4096)  # Must match the model's hidden size (model.config.hidden_size)
    ]

    # Define the collection schema
    schema = CollectionSchema(fields, description="Collection to store Thai text and embeddings")
    
    # Create collection
    collection = Collection(collection_name, schema)
    
    return collection, tokenizer, model

# Step 2: Insert embeddings into Milvus with corresponding text and auto-generated id
def insert_texts(collection, tokenizer, model, texts):
    model = model.to(device)

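    # Tokenize and convert the text into embeddings using Typhoon-7B
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True).to(device)
    
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
        hidden_states = outputs.hidden_states[-1]  # Last-layer hidden states: (batch, seq_len, hidden_size)
        # Masked mean pooling: average only over real tokens, so the pad tokens
        # added by padding=True do not distort the shorter sentences' embeddings
        mask = inputs["attention_mask"].unsqueeze(-1).to(hidden_states.dtype)
        embeddings = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)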

    # Prepare data for insertion
    data = [
        texts,  # Text field
        embeddings.tolist()  # Embedding field
    ]

    # Insert the data into the collection
    collection.insert(data)
    
    # Ensure the data is saved
    collection.flush()

# Step 3: Create index for faster search
def create_index(collection):
    # Create an index on the "embedding" field
    index_params = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
    collection.create_index(field_name="embedding", index_params=index_params)

# Step 4: Search for similar text and print all fields (id, text, distance) as JSON from loop
def search_similar_texts(collection, tokenizer, model, query_text, top_k=5):
    model = model.to(device)

    # Convert query text into embeddings
    inputs = tokenizer([query_text], return_tensors="pt", padding=True, truncation=True).to(device)
    
    with torch.no_grad():
        query_embedding = model(**inputs, output_hidden_states=True).hidden_states[-1].mean(dim=1)  # Average pooling for query embedding

    # Define the search parameters
    search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
    
    # Load collection into memory for search
    collection.load()

    # Perform similarity search
    results = collection.search(query_embedding.tolist(), "embedding", param=search_params, limit=top_k, output_fields=["id", "text"])

    # Prepare results in a list of JSON objects from loop
    json_results = []
    for result in results[0]:
        result_json = {
            "ID": result.id,
            "Text": result.entity.get("text"),  # Keep the text as it is
            "Distance": result.distance
        }
        print(json.dumps(result_json, indent=4, ensure_ascii=False))  # Print each JSON result with `ensure_ascii=False`
        json_results.append(result_json)  # Also append to list if needed later

    return json_results

if __name__ == "__main__":
    # Step 0: Remove existing database
    remove_existing_database()

    # Example Thai texts to store
    texts = [
        "สุนัขเป็นสัตว์เลี้ยงที่ซื่อสัตย์ต่อเจ้าของ",
        "แมวชอบปีนป่ายและเล่นในที่สูง",
        "ช้างเป็นสัตว์บกที่ใหญ่ที่สุดในโลก",
        "ประเทศไทยเป็นประเทศที่มีวัฒนธรรมที่หลากหลาย",
        "ม้าเป็นสัตว์ที่วิ่งเร็วและแข็งแรง"
    ]

    # Step 1: Create Collection and Load Typhoon-7B Model
    collection, tokenizer, model = create_collection()

    # Step 2: Insert Thai Texts
    insert_texts(collection, tokenizer, model, texts)

    # Step 3: Create Index
    create_index(collection)

    # Step 4: Search for Similar Texts and print each JSON from loop
    query_text = "ประเทศไทยเป็นประเทศที่มีธรรมชาติสวยงาม"
    search_similar_texts(collection, tokenizer, model, query_text)
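Step 2 embeds five sentences of different lengths in one batch with `padding=True`, so the shorter sentences get pad tokens appended. A plain `.mean(dim=1)` over the hidden states would average those pad positions in as well, while weighting by `attention_mask` averages only the real tokens. The toy sketch below illustrates the difference in plain Python; the 2-dimensional "hidden states" and the helper names are hypothetical stand-ins for the real 4096-dimensional tensors.

```python
# Toy illustration of why mean pooling should ignore padding tokens.
# Assumption: hand-made 2-D "hidden states" stand in for the real
# 4096-dimensional vectors a 7B model would produce.
from typing import List

Vector = List[float]


def mean_pool(hidden: List[Vector]) -> Vector:
    """Plain mean over every position, padding included."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]


def masked_mean_pool(hidden: List[Vector], mask: List[int]) -> Vector:
    """Mean over positions whose attention-mask value is 1 (real tokens)."""
    kept = [h for h, m in zip(hidden, mask) if m == 1]
    return [sum(col) / len(kept) for col in zip(*kept)]


# A 2-token sentence padded to length 4 so it can be batched with a longer one.
sentence = [[1.0, 3.0], [3.0, 5.0]]
pad = [9.0, 9.0]  # whatever the model happens to output at the pad positions
padded = sentence + [pad, pad]
mask = [1, 1, 0, 0]

print(mean_pool(sentence))              # embedding without any padding
print(mean_pool(padded))                # skewed towards the pad vector
print(masked_mean_pool(padded, mask))   # matches the unpadded embedding
```

The masked version gives the same embedding no matter how much padding a sentence received, which is what you want when every sentence in the batch should be comparable.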

Here is the result. The run took about 15 minutes.

I'd call it a success, because the results it returned were very close to the query text.
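"Close" here means a small L2 distance between the query embedding and each stored embedding, because the search uses `metric_type: "L2"`. The sketch below shows the exact, brute-force version of that ranking; an `IVF_FLAT` index with `nprobe` approximates it by scanning only the clusters nearest the query. It uses squared L2, which ranks identically to true Euclidean distance, and Milvus's documentation notes its L2 metric reports the value before the square root. The 3-D vectors, labels, and helper names are made up for illustration and are not real Typhoon-7B embeddings.

```python
# Brute-force top-k search by squared L2 distance, i.e. the exact ranking
# that an L2 index approximates. Assumption: tiny hand-made 3-D vectors
# stand in for real sentence embeddings; labels are illustrative only.
from typing import Dict, List, Tuple


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k stored items closest to the query, nearest first."""
    scored = [(text, squared_l2(query, vec)) for text, vec in store.items()]
    scored.sort(key=lambda item: item[1])  # smaller distance = more similar
    return scored[:k]


store = {
    "dog": [1.0, 0.0, 0.0],
    "cat": [0.9, 0.1, 0.0],
    "thailand_culture": [0.0, 0.0, 1.0],
}
query = [0.1, 0.0, 0.9]  # a "Thailand is ..." style query

for text, dist in top_k(query, store, k=2):
    print(text, round(dist, 3))
```

Because a smaller distance means a better match, the first row printed by the search in Step 4 is the stored sentence the model considers most similar to the query.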

That said, this run also maxed out the machine's CPU, RAM, and disk I/O the whole time it was working, but I forgot to capture a screenshot of it.

I hope this article is useful to you. Thanks for reading!