Takeaways from YouTube session: Prompt Engineering for Open-Source LLMs

Feng Li
3 min read · Jan 24, 2024


Duffin Creek, Pickering, ON, Jan 21, 2024

Dr. Sharon Zhou gave an amazing live session on YouTube yesterday covering prompting and RAG for Mistral. This post captures some of my takeaways. Please check out the YouTube link for details.

1 Prompt engineering is not engineering

Prompting is about finding a style of communication with LLMs. Each LLM expects a specific style; using it helps the model best understand our inquiries so it can respond with its best capabilities.

So prompting an LLM is just writing a string in the format that LLM favors. A prompt is just a string.

2 How do we know which LLM likes which style?

Different LLMs like different styles of communication/prompting.

When using closed-source LLMs/AI apps like ChatGPT, the app formats our text into a prompt before sending it to GPT-4. In addition, there are plenty of tips out there on how to talk to ChatGPT.

The OpenAI API docs show prompt examples for constructing the payload sent to the OpenAI APIs. We used this in a previous post to call the OpenAI API from a Snowflake function, like below: the “role” and “content” fields in the prompt string tell GPT-3.5 Turbo, which understands this format, what to do with our text.

my_payload_dict = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": my_sentence}],
    "temperature": 0.7,
}

Anthropic Claude, on the other hand, needs to be fed “Human”/“Assistant”-formatted strings to chat.
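
For illustration, the legacy Anthropic text-completion format puts the whole conversation into one string with turn markers (reusing my_sentence from above as the user text):

# Legacy Anthropic text-completion style: "\n\nHuman:" / "\n\nAssistant:" turn markers.
claude_prompt = f"\n\nHuman: {my_sentence}\n\nAssistant:"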

When using open-source LLMs like Mistral and Llama 2, the documentation mentions what format should be used to fine-tune the base model.

Prompt example sources can be found in the GitHub repo for this session. Essentially, the Mistral model expects the string to follow the format below, so that we can get proper responses.

hydrated_prompt = f"<s>[INST] <<SYS>>\n{prompt['system']}\n<</SYS>>\n{prompt['user']} [/INST]"
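
For example, with a hypothetical system/user pair the hydrated string would look like this:

# Hypothetical messages, just to show what the hydrated string looks like.
prompt = {"system": "You are a helpful assistant.", "user": "What is RAG?"}
hydrated_prompt = f"<s>[INST] <<SYS>>\n{prompt['system']}\n<</SYS>>\n{prompt['user']} [/INST]"
# -> "<s>[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\nWhat is RAG? [/INST]"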

As we know, an LLM base model (a Foundation Model in Bedrock terms) just predicts the next words; it is fine-tuning that trains it to chat and reason. So it’s the fine-tuning that decides what style a fine-tuned LLM likes to be talked to in. (Check Understanding Generative AI for details.)

3 “Pants”

“Pants” is Dr. Sharon Zhou’s analogy for the format of the prompt string. It’s as if the LLM expects our text to show up wearing “pants,” so it won’t get surprised or embarrassed and not know what to say!

4 RAG is just a way of prompting our text

RAG just concatenates chunks of custom data to our text when building the prompt.

This simple RAG Python code shows what’s going on under the hood. In the source code:

When the custom data is loaded, it is split into chunks, which are then embedded into vectors.

When our question text comes in, it is embedded into a vector too and used to find the top-k matches among the custom-data vectors above. These top-k chunks are concatenated with our original question text to form an “augmented” prompt that is sent to the Mistral model.

No vector store is used in this simple demo; a vector store only holds the custom embedded vectors anyway, and you don’t really need one unless you have TBs of data.
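
Here is a minimal sketch of those steps, assuming the sentence-transformers library for embeddings and a hypothetical my_docs.txt file as the custom data (the session’s demo code may differ in the details):

import numpy as np
from sentence_transformers import SentenceTransformer

# 1) Load the custom data and split it into chunks (toy fixed-size splitter).
def split_into_chunks(text, chunk_size=200):
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

custom_data = open("my_docs.txt").read()  # hypothetical custom data file
chunks = split_into_chunks(custom_data)

# 2) Embed the chunks into vectors, kept in memory -- no vector store needed.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

# 3) Embed the question and find the top-k most similar chunks (cosine similarity).
question = "What is covered in the custom docs?"
question_vector = embedder.encode([question], normalize_embeddings=True)[0]
scores = chunk_vectors @ question_vector
top_k = [chunks[i] for i in np.argsort(scores)[::-1][:3]]

# 4) Concatenate the retrieved chunks with the question into an "augmented" prompt
#    using the [INST] format shown above, ready to be sent to the Mistral model.
context = "\n".join(top_k)
augmented_prompt = (
    f"<s>[INST] <<SYS>>\nAnswer using this context:\n{context}\n<</SYS>>\n"
    f"{question} [/INST]"
)
print(augmented_prompt)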

Happy Reading!



Feng Li

Software Engineer, playing with Snowflake, AWS and Azure. Snowflake Data Superhero 2024. SnowPro SME, Jogger, Hiker. LinkedIn: https://www.linkedin.com/in/fli01