Building a Perplexity clone for fun

5 min read · Jun 17, 2024

(Note: for my non-Thai fans, the English version will come soon :)

It all started when I wanted to challenge myself: could I build an application that works a bit like Perplexity (a search engine combined with an LLM) without using any API key at all, and what would it take? That question became this post. Roughly, my approach is:

Use a scraper to pull data from DuckDuckGo: this way we don't have to rely on a search engine service such as Serp, and we don't need a vector store to retrieve data.

Use Ollama to serve a Qwen model: I went with an open-source model because I didn't want to worry about token limits.

Use a multi-agent technique so the model can produce a sensible answer; in other words, to reduce hallucination.

If that sounds interesting, read on. If not, even a quick scroll through is appreciated.

Use a scraper to pull data from DuckDuckGo

To do RAG, we need a way to retrieve data and put it into the prompt we send to the LLM. The approach most people use is a vector store, which finds the data whose context is closest to the question.

For this challenge, though, I wanted to do something different and skip the usual technique. I chose to send the question straight to DuckDuckGo, the search engine known for its strong privacy. (Google would work too, but Google's anti-scraper protection is pretty brutal, so I used DuckDuckGo instead.)

How the DuckDuckGo website works

When we query something in DuckDuckGo's search engine, it creates a variable called vqd. We have to attach this variable when we call the DuckDuckGo API, which I found by inspecting the page.

def search(q: str) -> list:
    vqd = get_vqd(q)

    raw_search_results = get_data(q, vqd)
    cleaned_results = [
        clean_data(raw_search_result)
        for raw_search_result in raw_search_results
    ]
    return cleaned_results

As the example code shows, the search function first gets the vqd variable from get_vqd. That function requests the web page and extracts the value with BeautifulSoup. The vqd is then used to call the API found by inspecting the site's backend requests.
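
get_vqd is not shown in this post. As a rough sketch of the extraction step only, assuming the search page embeds the token somewhere in its HTML or inline script in the form vqd="..." (an assumption about the page that may not hold), a regular expression is enough to pull it out. The author's version uses BeautifulSoup; extract_vqd and the sample HTML below are hypothetical and use only the standard library so the snippet runs offline.

```python
import re
from typing import Optional

# Assumption: the token appears as vqd="4-123...", vqd='4-123...' or vqd=4-123...
VQD_PATTERN = re.compile(r"""vqd=["']?([\d-]+)""")


def extract_vqd(html: str) -> Optional[str]:
    """Return the first vqd token found in the page source, or None."""
    match = VQD_PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical page fragment, for illustration only.
sample_html = '<script>var vqd="4-1234567890";</script>'
print(extract_vqd(sample_html))
```

Returning None instead of raising lets the caller decide whether to retry the page request or give up.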

Example output

if __name__ == "__main__":
    print(search("ดูดวงราศีเมถุน"))

# Output:
# [
#   {'header': '...', 'sample_text': '...', 'url': '', 'post_date': ''},
#   {'header': '...', 'sample_text': '...', 'url': '', 'post_date': ''},
#   ...
# ]
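
clean_data is not shown in this post either. Here is a minimal sketch of what such a step might look like, assuming each raw result is a dict with a title, an HTML snippet, a URL and an optional date. The raw key names ("title", "snippet", "link", "date") are placeholders for illustration, not the real DuckDuckGo field names. It strips tags and HTML entities and maps everything onto the four keys shown in the output above.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove tags, decode HTML entities and collapse whitespace."""
    no_tags = TAG_PATTERN.sub("", text)
    return " ".join(html.unescape(no_tags).split())


def clean_data(raw: dict) -> dict:
    """Map a raw result (hypothetical key names) to the cleaned schema."""
    return {
        "header": strip_html(raw.get("title", "")),
        "sample_text": strip_html(raw.get("snippet", "")),
        "url": raw.get("link", ""),
        "post_date": raw.get("date", ""),
    }


raw_example = {
    "title": "Daily <b>horoscope</b>",
    "snippet": "Your <b>Gemini</b> forecast &amp; more",
    "link": "https://example.com/horoscope",
}
print(clean_data(raw_example))
```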

Use Ollama to serve a Qwen model

You could call Ollama the Docker of LLMs. It is a tool that makes serving an LLM as an API very easy, and it doesn't require a dedicated GPU. In my case I used a MacBook Pro M3 Pro for the demo (which, to be honest, is already a fairly powerful machine, haha).

Introduction to Ollama

For how to install Ollama, see this link.

Once it's installed, we can use the ollama command from our terminal.

example:

# This command should run without any error
ollama help

Then run

ollama serve

to start the Ollama service.

Our large language model

Around the time I was working on this project, a model called Qwen2 had just been released, and its benchmark scores looked very good, so I took the opportunity to try it out.

We can pull the model with this command:

ollama pull qwen2:7b-instruct-q6_K

This command pulls the 7b-instruct-q6_K version of Qwen2 onto your machine. Other versions can be pulled from this link.

The download takes a while. Once it finishes, you can try running the model with the command

ollama run qwen2:7b-instruct-q6_K

This lets us interact with the model just like a chat interface. To exit, type /bye

Calling it from Python

import ollama

MODEL = "qwen2:7b-instruct-q6_K"
OLLAMA_HOST = "http://localhost:11434"
ollama_client: ollama.Client = ollama.Client(host=OLLAMA_HOST)

response = ollama_client.chat(
    model=MODEL,
    messages=[
        {
            "role": "system",
            "content": "SYSTEM PROMPT"
        },
        {"role": "user", "content": "USER_PROMPT"},
    ],
    stream=False,
)
print(response["message"]["content"])

Multi-agent technique

Now we have the model we want to use. The next step is to call that model so that it follows the process we defined.

As the diagram shows, the process sends the user's question to the scraper tool to query data from DuckDuckGo, then passes the query results to an agent called "validate and summarize", which summarizes the data and hands it to the user proxy to answer the user.

validate and summarize

This agent checks whether the retrieved data can be used to answer the user's question. If it can, the agent also writes a summary; if it can't, that data is filtered out.

initial:

validate_document checks whether a document is usable.

get_context passes each document into validate_document.

import json


def validate_document(question, document):
    """Ask the model whether `document` helps answer `question`; return a dict."""
    summary_prompt = get_summary_prompt(question, document)
    response = ollama_client.chat(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an expert in lawfirm who is assigned to consider whether a text data source "
                    "is useful to answer a user question or not. If yes, you will summarize the text "
                    "which corresponds to the user's question for another expert to write the answer for the user; otherwise, do nothing. "
                    """Your answer must be in JSON format with fields:
"is_useful": boolean determining whether the source is useful,
"summarize": string your summarization referring to the relevant part of the text, or empty string if not useful
"""
                    " Return your answer only and do not include prologue, prefix or suffix"
                ),
            },
            {"role": "user", "content": summary_prompt},
        ],
        format="json",  # ask Ollama to constrain the reply to valid JSON
        stream=False,
    )
    # The reply is a JSON string; parse it so callers can index its fields.
    try:
        return json.loads(response["message"]["content"])
    except json.JSONDecodeError:
        # Treat an unparseable reply as "not useful".
        return {"is_useful": False, "summarize": ""}

def get_context(question: str) -> str:
    contexts = []

    search_results = search(question)
    for search_result in search_results:
        # Only the result header is sent to the validator here.
        document = search_result["header"]
        context = validate_document(question, document)

        is_useful = context.get("is_useful", False)
        search_result["is_useful"] = is_useful
        search_result["summarize"] = context.get("summarize", "")
        if is_useful:
            contexts.append(search_result)
        # Stop once five useful sources have been collected.
        if len(contexts) >= 5:
            break

    line_template = """Header:
    {header}

    Summary:
    {summarize}

    Reference: {url}
    """

    writing_prompt = "\n".join([line_template.format(**context) for context in contexts])
    return writing_prompt
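
get_summary_prompt is not shown in this post, and a 7B model does not always reply with bare JSON. The sketch below is a guess at both pieces under those assumptions: a hypothetical get_summary_prompt that simply puts the question and the document into one message, and a hypothetical parse_validation helper that tolerates a Markdown code fence or stray text around the JSON and falls back to "not useful" when parsing fails. The prompt wording and helper names are mine, not the author's.

```python
import json
import re


def get_summary_prompt(question: str, document: str) -> str:
    """Hypothetical prompt builder: the question plus the candidate source text."""
    return f"Question: {question}\n\nText data source:\n{document}"


def parse_validation(raw: str) -> dict:
    """Parse the agent's reply into {'is_useful': bool, 'summarize': str}."""
    fallback = {"is_useful": False, "summarize": ""}
    # Grab the outermost {...} span so code fences or stray prose are ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    return {
        # Only a real JSON `true` counts as useful.
        "is_useful": data.get("is_useful") is True,
        "summarize": str(data.get("summarize", "")),
    }


fence = "`" * 3  # built this way to avoid a literal fence inside the example
reply = fence + 'json\n{"is_useful": true, "summarize": "Gemini forecast for today"}\n' + fence
print(parse_validation(reply))
print(parse_validation("Sorry, I cannot help."))
```

Falling back to "not useful" means a malformed reply drops one source rather than crashing the whole pipeline.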

User proxy

Once we have the raw material for writing the user's answer, another agent takes the summarized data and writes the final answer.

initial:

def write_the_output(question: str, context_prompts: str):
    response = ollama_client.chat(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful user's assistant, and your secretary already "
                    "prepared gists from the search engine results for "
                    "you to answer user's question. "
                    "Your duty is to answer the question "
                    "with confidence using the prepared "
                    "data source as a reference. "
                    "You must add your idea to support user to understand sources. "
                    "You must add the reference of data source with URL "
                    "and encourage user to find out more information with it. "
                    "You must answer with the same language that the user uses."
                ),
            },
            {"role": "user", "content": f"""Question: {question}

Context:
{context_prompts}
"""},
        ],
        stream=True,
    )
    return response

Putting all the code together

def suplexity(question):
    context_prompts = get_context(question)
    stream = write_the_output(question, context_prompts)
    return stream

Note: I renamed it Suplexity. It reminds me of Brock Lesnar, haha.
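
The display_answer helper used in the examples that follow is not shown in this post. With stream=True, the ollama client returns an iterator of chunks, each carrying a piece of the reply under chunk["message"]["content"]. Here is a minimal sketch of such a helper, assuming a plain terminal (a notebook version would look different). The fake_stream list stands in for a real response so the snippet runs without a model.

```python
from typing import Iterable


def display_answer(stream: Iterable[dict]) -> str:
    """Print streamed chunks as they arrive and return the full answer."""
    parts = []
    for chunk in stream:
        piece = chunk["message"]["content"]
        print(piece, end="", flush=True)
        parts.append(piece)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in for the iterator returned by a streaming chat call.
fake_stream = [
    {"message": {"role": "assistant", "content": "Hello"}},
    {"message": {"role": "assistant", "content": ", world"}},
]
answer = display_answer(fake_stream)
```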

Example questions and answers

Question: "ดูดวงราศีเมถุน" (Gemini horoscope)

display_answer(suplexity("ดูดวงราศีเมถุน"))

Answer:

As you can see, the output includes links to the sources of the answer. For example, this link leads to the horoscope column on the Sanook website.

https://www.sanook.com/horoscope/264603/

Question: "สรุปข่าวหุ้นไทย วันนี้" (Summary of today's Thai stock news)

Answer:

The output may not look very polished because I put this together in a hurry, but I hope it gives you a direction for building LLM RAG and sparks some creative ideas of your own.

The full code is available at the GitHub link below.

GitHub - batprem/mini-perplexity-clone


Reference:

https://www.promptingguide.ai/techniques/rag

Download Ollama on macOS (ollama.com)

https://ollama.com/library/qwen2