Steering LLMs with Prompt Engineering

William Zheng
Published in Better Programming
9 min read · Jun 2, 2023


Image via iStock by Getty Images under license to William Zheng

Large Language Models (LLMs) have captured our attention and imagination in the six months since the announcement of ChatGPT. However, LLM behavior is often stochastic, which makes LLMs difficult to integrate into business applications with well-defined limits. In this article, we will explore some ways of making LLMs more predictable and controllable through prompt engineering.

“So, what is prompt engineering?”

In a sense, if you have used ChatGPT, you have already engaged in prompt engineering. When we ask GPT-3.5/4 (that’s the LLM behind ChatGPT… if you’re reading this in 2023) a question, and then often follow up with additional information, we are essentially prompting the LLM to produce a downstream answer that we find useful. The great thing about this process is that it comes naturally to us, like holding a back-and-forth conversation with another human, except with an AI/LLM instead. In short, prompt engineering is the methodology of crafting appropriate prompts to produce the desired response from the LLM.

But what if we want to do this programmatically? What if we could design our prompts beforehand and then use them to steer or control the LLM's responses?

To get a general idea of what we’re talking about, think about the typical back-and-forth prompt interactions, where after the initial prompt, we are engaged in a context loop of prompts and responses with the LLM:

Prompt loop context with LLM

The part we want to handle programmatically is the shaded area labelled here as the “prompt loop context.”

The programmatic version of our process looks like this:

Programmatic prompt loop context with LLM

Conceptually, we can provide a set of prompts (sequential, or following more complex algorithmic logic) that is issued to the LLM through our programmatic control environment. This allows us to preconfigure and execute the process in an automated way.
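To make this more concrete, here is a minimal, hypothetical sketch of such a preconfigured prompt loop. Note that call_llm is just a stand-in for whichever LLM API you use, not a real library function:

# A minimal, hypothetical sketch of a preconfigured prompt loop.
# call_llm is a placeholder for your LLM API call, not a real function.
def run_prompt_loop(call_llm, initial_prompt, followup_prompts):
    response = call_llm(initial_prompt)
    for prompt in followup_prompts:
        # Each preconfigured prompt is combined with the previous response
        # to form the next request in the loop
        response = call_llm(f"{prompt}\n{response}")
    return response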

Still, to make this work well, there are some key assumptions to consider:

  1. We need to know what kind of prompts can produce what kind of response from the LLM.
  2. We need to anticipate generally the kind of prompts a potential user will issue to the LLM.

Unfortunately (or fortunately), weighing up these assumptions deterministically is not an exact science, at least at the time of writing. But depending on your requirements, you may have some idea of what is involved in assumption no. 2 (e.g., you anticipate your users will ask about promotions for clothing items, not about politics). And with some testing and practice with the LLM, you can also improve your understanding of its response capabilities to satisfy assumption no. 1. So, with the background context out of the way, let's proceed to the code walkthrough.

Code Walkthrough

This walkthrough proceeds through a series of steps, with code written in Python. We will use OpenAI's GPT-3.5 as our main LLM (I will cover why we may want a secondary LLM later). There are several options for working with OpenAI's LLMs via their API: you can call the OpenAI API directly, use LangChain, or, as we will in this tutorial, use an open source Python library called PanML. You can find PanML's GitHub repository and documentation here: https://github.com/Pan-ML/panml

Step 1: Setup OpenAI API key

If you haven't already done so, you will need an OpenAI account to create an API key. The sign-up process and key creation are both relatively quick.

Just go to their website: https://platform.openai.com/, and once you have created an account, click on your profile icon in the top right corner of the page and then select “View API keys.” Next, you should see the option “+ Create new secret key.” Copy this key and keep it safe. You will need it in a later step.
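As a side note (this is a common practice, not something OpenAI or PanML requires), you can keep the key out of your code by storing it in an environment variable and reading it at runtime:

import os

# Read the API key from an environment variable (the name OPENAI_API_KEY
# is just a convention assumed here)
openai_api_key = os.environ.get("OPENAI_API_KEY")

You can then pass this variable to the api_key argument in the next step instead of pasting the key directly into your code.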

Step 2: Install PanML

pip install -U panml

Step 3: Setup LLM using OpenAI’s API

from panml.models import ModelPack  # import path as shown in PanML's documentation

lm = ModelPack(model="text-davinci-003", source="openai", api_key="Your key")

We instantiate the LLM using OpenAI's API backend, passing the API key to the api_key argument.

Step 4: Test it out

lm.predict("How to improve my fitness?", max_length=600)["text"]
1. Start with a goal: Decide what you want to achieve with your fitness program. 
Do you want to lose weight, build muscle, or improve your overall health?
2. Make a plan: Create a plan that outlines how you will reach your goal.
...

Step 5: Modify the prompt

# Set the prompt modifier
prompts = [
    {"prepend": "As a runner, tell me:"},
]

lm.predict("How to improve my fitness?",
           prompt_modifier=prompts, max_length=600)["text"]
1. Increase your mileage gradually. Start by adding a few miles to your 
weekly runs and build up gradually over time.
2. Incorporate interval training into your runs. Interval training involves
alternating between periods of high-intensity running and periods of rest
or low-intensity running.
...

In this step, we have included additional text, “As a runner, tell me:”, which is prepended to the initial prompt, “How to improve my fitness?”. This results in a different response, with the context more specific and relevant to a runner, which makes sense.

You may have noticed that we have introduced a function argument: prompt_modifier. The prompt_modifier is a Python list that holds the prompts we want to include in our programmatic prompt loop context. You can think of each element of the list as a prompt add-on corresponding to its sequence position in the loop.
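For example, if you only wanted to modify the second pass of the loop, you could leave the first element empty (a pattern also used later in this article). The prepended text shown here is just a hypothetical illustration:

# Leave the first pass unmodified; prepend text only on the second pass
prompts = [
    {},
    {"prepend": "Summarise the above in three bullet points:"},
]

lm.predict("How to improve my fitness?",
           prompt_modifier=prompts, max_length=600)["text"]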

Step 6: Modify the prompt with an added prompt sequence

# Set the prompt modifier
prompts = [
    {"prepend": "As a runner, tell me:"},
    {"prepend": "What tools can I use to support each point:"},
]

lm.predict("How to improve my fitness?",
           prompt_modifier=prompts, max_length=600)["text"]
Tools: 
1. A running app to track your mileage and progress.
2. A timer to help you keep track of your intervals.
...

Now we have executed a chain of prompts to produce a final output from the LLM, and the result has been steered towards the tools that can support improving fitness in the running context.

Step 7: Modify the prompt for output filtering

In this use-case scenario, we show how to filter the output of the LLM by preventing it from engaging with certain topics.

Consider a scenario where you're building an LLM application about cooking and lifestyle. During use, users may engage with the LLM through various queries, and some of those queries will likely fall outside the intended scope of your application. For example:

# Query the LLM
queries = [
    "What is the best way to cook steak?",
    "How do you vote in the US election?",
]

output = lm.predict(queries, max_length=600)
print("\n\n".join(output))
 The best way to cook steak is to use a combination of high heat and 
short cooking time. Start by preheating a heavy skillet or grill over
high heat. Season the steak with salt and pepper and then add it to the
hot pan. Sear the steak for 1-2 minutes per side, then reduce the heat
to medium-high and cook for an additional 3-4 minutes per side, or
until the steak reaches the desired doneness. Let the steak rest for
5 minutes before serving.

In the United States, voting in federal elections is done through a
state-run process. To vote in a federal election, you must be a U. S. citizen,
at least 18 years old on Election Day, and a resident of the state in which
you are voting. You must also register to vote in your state before you can
cast a ballot. Registration requirements vary by state, so you should check
with your local election office for specific information.

So, even though the LLM's response may be quite capable, you still may not want to surface it, for various reasons. One potential way to control and “filter” the LLM's responses is through the prompt_modifier, specifically by supplying the control logic as a Python function:

# Custom keyword filter function
def my_keyword_filter(text):
    keywords_to_refuse = ["politic", "election"]
    text = text.lower()
    refuse = [word for word in keywords_to_refuse if word in text]

    # Set responses based on keywords
    if len(refuse) == 0:
        return f"Break into details: {text}"
    else:
        return "Produce response to politely say I can't answer"

# Set the prompt modifier
prompts = [
    {},
    {'transform': my_keyword_filter},
]

# Query the LLM
queries = [
    "What is the best way to cook steak?",
    "How do you vote in the US election?",
]

output = lm.predict(queries, prompt_modifier=prompts, max_length=600)
print('\n\n'.join(output))
1. Preheat a heavy skillet or grill over high heat. 
2. Season the steak with salt and pepper.
3. Add the steak to the hot pan.
4. Sear the steak for 1-2 minutes per side.
5. Reduce the heat to medium-high.
6. Cook for an additional 3-4 minutes per side.
7. Check the steak for desired doneness.
8. Let the steak rest for 5 minutes before serving.

I'm sorry, I'm not able to answer that at this time.

In this use case, we include our own custom keyword filter function in the prompt_modifier for execution. This is a simple demo: the filtering logic is applied based on keywords found in the context of our prompt loop, and the LLM is instructed to refuse to answer when certain keywords are caught.
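As a quick sanity check, you can also run the filter function on its own, outside the prompt loop, against the same two queries. The expected return values, shown as comments, follow directly from the function's logic:

# Sanity-check the keyword filter on its own
print(my_keyword_filter("How do you vote in the US election?"))
# Produce response to politely say I can't answer
print(my_keyword_filter("What is the best way to cook steak?"))
# Break into details: what is the best way to cook steak?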

Step 8: Modify the prompt for LLM-assisted output filtering

As a variation of the above filtering approach, we can achieve a similar outcome by employing an LLM to help filter out the topics we don’t want to respond to. Here, we are leveraging an LLM’s capability in semantic understanding to hopefully provide more effective coverage aligned with our filter's intent. We test this by removing the “election” keyword from our topics with the intuition that our evaluation LLM will identify queries about elections as similar to “politics” and then filter them out in the final response.

First, we need to set up an LLM for evaluation. We have opted for Google's FLAN-T5 model (large) in this example. You can experiment with other or smaller models, as long as they are good enough for topic classification:

# Set the evaluation LLM
lm_eval = ModelPack(model="google/flan-t5-large", source="huggingface")

# Custom topic filter function
def my_topic_filter(text):
    topics_to_refuse = ["politics"]

    # Use the evaluation LLM to extract topics and check their similarity
    # against the refused topics
    topics = lm_eval.predict(f"Identify one word topics in:\n {text}")['text'].split(',')
    refuse = 'no'
    for topic in topics:
        for refuse_topic in topics_to_refuse:
            refuse = lm_eval.predict(f"Answer yes or no. Is {topic} similar to {refuse_topic}?")['text'].lower()
            if refuse == 'yes':
                break
        if refuse == 'yes':  # stop checking once a refused topic is found
            break

    # Set responses based on LLM evaluations
    if refuse == 'no':
        return f"Break into details: {text}"
    else:
        return "Produce response to politely say I can't answer"

# Set the prompt modifier
prompts = [
    {},
    {'transform': my_topic_filter},
]

# Query the LLM
queries = [
    "What is the best way to cook steak?",
    "How do you vote in the US election?",
]

output = lm.predict(queries, prompt_modifier=prompts, max_length=600)
print('\n\n'.join(output))
1. Preheat a heavy skillet or grill over high heat. 
2. Season the steak with salt and pepper.
3. Add the steak to the hot pan.
4. Sear the steak for 1-2 minutes per side.
5. Reduce the heat to medium-high.
6. Cook for an additional 3-4 minutes per side.
7. Check the steak for desired doneness.
8. Let the steak rest for 5 minutes before serving.

I'm sorry, I'm not able to answer that at this time.

Now, by using the evaluation LLM for topic-similarity classification, we can achieve a similar result without the burden of defining every relevant keyword in our filter. However, this method comes with trade-offs: increased memory, compute, and latency, since we are issuing more calls to an LLM in our prompt loop.
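If latency or cost becomes a concern, one possible mitigation (a sketch, not a built-in PanML feature) is to cache the evaluation LLM's verdicts so that repeated topic checks don't trigger additional calls:

from functools import lru_cache

# Cache the evaluation LLM's yes/no verdicts so repeated
# topic/refused-topic pairs don't trigger extra LLM calls
@lru_cache(maxsize=256)
def is_similar(topic, refuse_topic):
    answer = lm_eval.predict(
        f"Answer yes or no. Is {topic} similar to {refuse_topic}?"
    )["text"].lower()
    return answer == "yes"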

Conclusion

So, that’s it! In this article, we have covered the general idea behind prompt engineering, its potential use cases, some ideas about how we can use prompt engineering to steer or control our LLM, and a code walkthrough demonstrating all of this in action using PanML and OpenAI APIs.

As a final note, PanML is an open source, high-level Python library designed to help data scientists and machine learning engineers experiment with and run LLMs in their local environment with ease. The code and documentation are available on GitHub: https://github.com/Pan-ML/panml

Thanks for taking the time to read this article. I plan to write more about data science and machine learning, so if you find value in this piece, hit the follow button to stay updated with more of my articles.

Want to Connect?

You can find my profiles on LinkedIn and Twitter.
