Introducing Promptimizer – an Automated AI-Powered Prompt Optimization Framework
LLM practitioners often argue that prompt engineering is more of an art than a science: it takes gut feel, manual tweaking, and lots of practice to craft a prompt that conforms to your goals and expectations.
But what if… it didn’t have to?
Close your eyes, and imagine a world in which you could give an AI model a list of inputs and expected outputs, and it automatically generates the best possible prompt for your specific use-case.
Now open your eyes. You are now in this world.
The Automatic Prompt Optimizer
I created an open-source prompt optimization engine. Using genetic algorithms, the Promptimizer iteratively improves any arbitrary prompt towards the best performance it can find.
Practically, this means:
- Robust, accurate prompts: The automated approach searches for the best-performing prompts, so you can spend your effort on defining real use-cases and shaping how you want the model to respond
- Evaluation framework: Get concrete metrics on how your prompts are performing (in and out of sample)
- Unparalleled steerability: Control everything about how your AI responds, including the tone, response length, accuracy, or any quantifiable metric
In one example, the prompt’s accuracy increased from 70% to 84–85%, a dramatic improvement from a relatively small dataset (40 examples).
Here’s how you can achieve similar improvements with any prompt you can imagine.
How to use the Promptimizer
Step 1: Define Your Goal
When creating your system prompt, you need to understand the desired behavior of the LLM.
Most commonly, the goal is to create syntactically valid JSON that corresponds to the user input. For example, if I’m creating an LLM-Powered Stock Screener, I want to return a valid SQL query that I can run against my database.
However, the goal doesn’t always have to be returning data in a certain format. Perhaps you want the model to ask certain questions or get clarification before diving in.
For example, if I’m creating an LLM-powered legal assistant and the user asks a legal question, the first step is NOT to just start talking about how different jurisdictions have different laws and that they should consult a lawyer for legal advice…
The very first step is for the model to ask the user where they live. Then, a reasonable next step is for the model to fetch information about the laws in that jurisdiction.
Whatever you want your agent to do, it must have concrete, definable, and quantifiable goals and sub-goals. Then, you will create a list of system prompts that accomplish those goals.
Step 2: Create a list of approximately 5 (different) system prompts
This is the step that will undoubtedly take the most time, but if you’re familiar with LLM Applications, it shouldn’t take you more than an hour.
You must sit down and create a population of prompts. While it is possible to get an LLM to generate prompts for you, you will have much more success if you do the legwork of creating unique, distinct prompts that accomplish your goal.
Within this framework, a prompt is an object with the following 3 attributes:
- systemPrompt: Instructions that steer the model towards the desired behavior
- examples: A list of conversations that you’d want to have with the model
- model: The specific model you’re using, e.g. GPT-4o mini
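As a sketch, the three attributes above could be modeled like this in TypeScript (the type names and example values are illustrative, not the repo’s exact definitions):

```typescript
// Hypothetical shape of a prompt object; the fields mirror the three
// attributes described above, not necessarily the repo's exact types.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface Prompt {
  systemPrompt: string;      // instructions that steer the model
  examples: ChatMessage[][]; // list of example conversations
  model: string;             // e.g. "gpt-4o-mini"
}

const screenerPrompt: Prompt = {
  systemPrompt: "You translate natural-language stock screens into SQL.",
  examples: [
    [
      { role: "user", content: "Show me tech stocks under $50" },
      {
        role: "assistant",
        content: "SELECT * FROM stocks WHERE sector = 'Tech' AND price < 50;",
      },
    ],
  ],
  model: "gpt-4o-mini",
};
```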
Currently, only the system prompt and examples are changed during the optimization process. However, you can imagine a world in which the model that is used is also optimized automatically. Maybe Haiku is great at certain tasks and GPT-4o-mini is better at others!
After creating your list of prompts, get ready to define your model behavior.
Step 3: Create a list of inputs and populate their desired outputs
Similar to supervised learning, in order to steer the model towards the desired behavior, we need to know exactly how we want the model to respond to a wide range of inputs.
To do this, you will update the file input.ts. Add the filenames and inputs you want the model to understand; there’s already a concrete example populated in the repo.
Then, you will execute the script populateGroundTruth.ts, which allows you to create ground truths in a semi-automated way.
The script is likely more involved than you need. For example, it includes logic for querying a table in BigQuery and presenting the results, because my specific use-case required evaluating the outputs of queries. But again, this framework can be used to optimize any arbitrary prompt.
The more examples you include, the more accurate your results will be. But be careful: more examples also make the optimization more expensive, so you may have to get creative in balancing cost against accuracy.
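To make the ground-truth idea concrete, here is a hypothetical shape for the input/output pairs (the field names are my own; check input.ts in the repo for the real format):

```typescript
// Hypothetical shape of a ground-truth entry: a user input paired with
// the output we want the model to produce for it.
interface GroundTruth {
  input: string;          // what the user would say
  expectedOutput: string; // how we want the model to respond
}

const groundTruths: GroundTruth[] = [
  {
    input: "What biotech stocks have a market cap over $10B?",
    expectedOutput:
      "SELECT * FROM stocks WHERE sector = 'Biotech' AND market_cap > 10e9;",
  },
  {
    input: "Find dividend stocks yielding over 4%",
    expectedOutput:
      "SELECT * FROM stocks WHERE dividend_yield > 0.04;",
  },
];
```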
Step 4: Create a scoring heuristic for your model
Using some method (such as a large language model), you need to be able to quantify how close your output is to your desired output. You can do this using the LLM-based “Prompt Evaluator” within the repo.
The “Prompt Evaluator” takes the model’s output and the expected output and returns a score. While, in theory, scores can be unbounded, a good start is to score each answer on a scale from 0 to 1; ranges like 0 to 5 or -1 to 1 also work. As long as the scoring guide is consistent, the algorithm will optimize towards it.
This scoring mechanism gives our LLM a goal to strive towards: the closer the model’s output is to the desired output, the better the score it receives.
Just like in reinforcement learning, you can give the model positive reward for behaving like you want it to, and a punishment (or negative reward) for behaving how you don’t want it to.
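The repo’s “Prompt Evaluator” uses an LLM to grade semantic closeness. As a deterministic stand-in for illustration, here is a simple token-overlap scorer that maps onto the 0-to-1 scale described above (the function name and heuristic are my own, not the repo’s):

```typescript
// Deterministic stand-in for the LLM-based "Prompt Evaluator":
// returns a score in [0, 1] based on how many of the expected output's
// tokens appear in the actual output.
function scoreOutput(actual: string, expected: string): number {
  const tokenize = (s: string): Set<string> =>
    new Set(s.toLowerCase().split(/\s+/).filter(Boolean));
  const actualTokens = tokenize(actual);
  const expectedTokens = tokenize(expected);
  if (expectedTokens.size === 0) return actualTokens.size === 0 ? 1 : 0;
  let overlap = 0;
  for (const t of expectedTokens) if (actualTokens.has(t)) overlap++;
  return overlap / expectedTokens.size; // fraction of expected tokens recovered
}
```

A real evaluator should judge meaning rather than word overlap, but any function with this signature can plug into the same scoring slot.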
After we have our scoring system, our final step is to use AI to improve the behavior of our prompt over time.
Step 5: Use AI to improve your prompt towards your goals
Using the genetic optimization algorithm in main.ts, optimize your prompts to bring them closer to the goal state.
I’m biased towards genetic algorithms and chose one as the optimization algorithm for a number of reasons. For one, I graduated from Cornell with a degree in biology… I like to think that my degree wasn’t a complete waste of money!
More importantly, the algorithm quite simply works well for nearly any problem: it generates a population of viable candidate solutions and improves them generation after generation.
Here are the 5 phases to genetic algorithms:
- Initialization: An initial population is generated
- Selection: Individuals in the population are “selected” to reproduce, with more fit individuals being more likely to be selected
- Crossover (or recombination): We combine the “genes” of our parents to create new offspring (or solutions)
- Mutation: We introduce random changes to our offspring that can have positive or detrimental effects on their fitness
- Evaluation: We then calculate the fitness of our offspring using the AI Stock Screener Prompt Evaluator or other quantifiable methods (like the length of the string)
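To make the five phases concrete, here is a minimal, self-contained genetic algorithm sketch in TypeScript. It evolves a string towards a fixed target instead of evolving a prompt against an LLM-based scorer, but the phases map one-to-one (all names and parameters are illustrative):

```typescript
// Toy GA: evolve a random string towards TARGET. Swap `fitness` for a
// prompt scorer and individuals for prompts to get the real setup.
const TARGET = "prompt";
const CHARS = "abcdefghijklmnopqrstuvwxyz ";

const fitness = (s: string): number => {
  let matches = 0;
  for (let i = 0; i < TARGET.length; i++) if (s[i] === TARGET[i]) matches++;
  return matches / TARGET.length; // 1.0 means a perfect match
};

const randomChar = () => CHARS[Math.floor(Math.random() * CHARS.length)];
const randomIndividual = () =>
  Array.from({ length: TARGET.length }, randomChar).join("");

// 1. Initialization: generate an initial population
let population = Array.from({ length: 50 }, randomIndividual);

for (let gen = 0; gen < 200 && fitness(population[0]) < 1; gen++) {
  // 5. Evaluation: rank the population by fitness (fittest first)
  population.sort((a, b) => fitness(b) - fitness(a));
  // 2. Selection: keep the top half as parents
  const parents = population.slice(0, 25);
  const offspring: string[] = [];
  while (offspring.length < 50) {
    const mom = parents[Math.floor(Math.random() * parents.length)];
    const dad = parents[Math.floor(Math.random() * parents.length)];
    // 3. Crossover: splice the two parents at a random point
    const cut = Math.floor(Math.random() * TARGET.length);
    let child = mom.slice(0, cut) + dad.slice(cut);
    // 4. Mutation: occasionally flip one character
    if (Math.random() < 0.2) {
      const i = Math.floor(Math.random() * TARGET.length);
      child = child.slice(0, i) + randomChar() + child.slice(i + 1);
    }
    offspring.push(child);
  }
  population = offspring;
}

population.sort((a, b) => fitness(b) - fitness(a));
console.log(population[0]); // prints the best individual found (often "prompt")
```

In the real framework, the expensive part is evaluation: each fitness call means LLM API requests, which is exactly why optimization cost grows with the number of ground-truth examples.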
While this article won’t go into how each step is implemented, you can check out my past article or browse the GitHub repo to see how it works.
The Promptimizer automatically handles most of the advanced data science work for you. For example, it splits the ground truths into a training set and a validation set, so we can measure how well our prompts generalize to unseen data.
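The train/validation split can be sketched as follows (assuming ground truths live in an array, as earlier; the repo’s exact split logic may differ):

```typescript
// Simple holdout split: shuffle, then cut at trainFraction.
function trainValidationSplit<T>(data: T[], trainFraction = 0.8): [T[], T[]] {
  const shuffled = [...data];
  // Fisher-Yates shuffle so the split is unbiased
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.floor(shuffled.length * trainFraction);
  return [shuffled.slice(0, cut), shuffled.slice(cut)];
}

const items = Array.from({ length: 40 }, (_, i) => i);
const [train, validation] = trainValidationSplit(items);
// The optimizer scores prompts on `train`; `validation` estimates how
// well they generalize to inputs the optimizer never saw.
```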
The end result of the optimization process is several prompts, each measurably better than the original.
Step 6: Graph the change in your prompt’s performance
When you’re done, you will likely be curious how much (if at all) your prompt improved over time. Did the prompt actually improve? Or did you waste $80 for nothing?
The repo contains utilities for helping you determine this.
First, there is a function within main.ts that formats the data into a JSON file.
Then, there is a Python script (graph.py) that generates graphs so you can see how the performance of your prompt changed over time.
Concluding Thoughts
We are leveraging the strength of different AI algorithms.
Large Language Models are great at generating text, specifically text that conforms to a certain specification. In contrast, old-school genetic algorithms are great at optimizing pretty much anything, because they don’t require gradient information the way neural networks do.
The combination of the two is extremely powerful. It creates a robust framework for optimizing any prompt, eliminating the need for tedious prompt engineering!
However, please be cautious when using this framework. Due to the number of API calls to OpenAI, the optimization process is surprisingly expensive. It absolutely saves you time and will improve the accuracy of your prompt, but it will cost you a pretty penny, even with relatively small sample sizes.
Overall, I’m happy to release my technique into the wild and let others look at it, copy it, and contribute to a world where manual prompt engineering is a thing of the past.
Contributions to the repo are welcome!
Thank you for reading! If you’re intrigued by the potential of AI in finance and want to see the results of optimized prompts, I invite you to explore NexusTrade, where this optimized AI Stock Screener is just one of many innovative features.
Follow me: LinkedIn | X (Twitter) | TikTok | Instagram | Newsletter