Self-Optimizing Prompt Automation

Published in Operations Research Bit · 10 min read · Dec 27, 2023

In the rapidly evolving field of artificial intelligence (AI), the ability to refine natural language prompts for better user interaction and response quality is a significant advancement. The downside is that refining prompts manually and iteratively is a drawn-out process. This article describes Prompt Automation, a machine learning-driven method for iteratively improving the relevance and precision of prompts used in AI applications.

Introduction to Prompt Automation

Prompt Automation emerges as a solution to the often-labor-intensive task of crafting and revising prompts that guide AI in generating useful and contextually appropriate responses. Through an automated feedback loop, this process employs natural language processing (NLP) and machine learning algorithms to refine prompts iteratively.

Setting the Stage with Libraries and Resources

The foundation of prompt automation lies in leveraging powerful programming languages and libraries. Python, a language renowned for its simplicity and robust library ecosystem, is at the core of this process. Key libraries include Pandas for data manipulation, NumPy for numerical computing, Scikit-learn for machine learning, Matplotlib for visualization, and NLTK for natural language processing.

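# Setup and Initial Imports
# Note: the openai calls in this article (openai.Embedding, openai.ChatCompletion)
# use the interface from versions of the openai package before 1.0.
import openai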
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
import time
import openpyxl

Secure API Authentication

Secure access to AI services such as OpenAI is ensured by using unique API keys, which authenticate requests made to the AI models for prompt generation and embedding retrieval.

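# Set OpenAI API Key (read from an environment variable rather than hard-coding it)
import os
openai.api_key = os.environ.get("OPENAI_API_KEY", "")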

Natural Language Processing at Work

The NLP aspect involves downloading stopwords from the NLTK corpus, essential for filtering out common words to focus on the more meaningful content within prompts. Tokenization of strings is performed to break down prompts into fundamental components for further processing.

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

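def tokenize_string(input_string):
    # Strip punctuation first so that words like "the," are still recognized as stop words
    words = [word.strip(",.!?") for word in input_string.split()]
    tokens = [word for word in words if word and word.lower() not in stop_words]
    return tokens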
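
To see what this kind of tokenizer keeps and drops, here is a minimal sketch that avoids the NLTK download by assuming a tiny hand-picked stop-word set. SAMPLE_STOP_WORDS and tokenize_sample are illustrative names, not part of the article's code, and the real NLTK list is much longer. Punctuation is stripped before the stop-word check, so a word like "events," is cleaned and a word like "the," would still be filtered.

```python
# Illustrative stop words only; the article uses NLTK's full English list.
SAMPLE_STOP_WORDS = {"the", "a", "of", "on", "in", "and", "to"}

def tokenize_sample(input_string, stop_words=SAMPLE_STOP_WORDS):
    """Split on whitespace, strip edge punctuation, drop stop words and empty tokens."""
    tokens = []
    for word in input_string.split():
        cleaned = word.strip(",.!?")
        if cleaned and cleaned.lower() not in stop_words:
            tokens.append(cleaned)
    return tokens

tokens = tokenize_sample("Report on the events, and the sources!")
print(tokens)  # ['Report', 'events', 'sources']
```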

Harnessing Embeddings for Similarity

The heart of prompt automation lies in generating and comparing embeddings — high-dimensional vectors that capture the semantic essence of text. Using OpenAI’s API, embeddings for given texts are retrieved, and cosine similarity measures are used to assess the closeness between prompts and target concepts.

def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    embedding = openai.Embedding.create(input=[text], model=model)['data'][0]['embedding']
    return embedding

def get_word_embeddings(input_string, model="text-embedding-ada-002"):
    tokens = tokenize_string(input_string)
    embeddings = {token: get_embedding(token, model) for token in tokens}
    return embeddings

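# Visualization Helper
def visualize_embeddings(embeddings, labels):
    embeddings = np.asarray(embeddings)
    # t-SNE requires perplexity to be smaller than the number of samples
    perplexity = min(30, len(embeddings) - 1)
    reduced = TSNE(n_components=2, perplexity=perplexity).fit_transform(embeddings)
    plt.figure(figsize=(10, 6))
    for i, label in enumerate(labels):
        plt.scatter(reduced[i, 0], reduced[i, 1], label=label)
    plt.legend()
    plt.show()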

def similarity_score(embedding1, embedding2):
    return cosine_similarity([embedding1], [embedding2])[0][0]

def flatten_dictionary(data_dict):
    flat_string = ""
    
    for key, value in data_dict.items():
        if isinstance(value, dict):
            flat_string += flatten_dictionary(value)
        else:
            flat_string += f"{key}: {value} "
            
    return flat_string

def average_embeddings(embeddings_dict):
    """Averages a dictionary of embeddings."""
    embeddings_list = list(embeddings_dict.values())
    return np.mean(embeddings_list, axis=0)
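
The two numerical building blocks above, cosine similarity and element-wise averaging, can be checked by hand on small vectors. The sketch below re-implements both in plain Python so it runs without an API key. The three-dimensional vectors are toy values chosen for illustration (real embeddings have far more dimensions), and cosine and mean_vector are hypothetical helper names.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Toy per-word "embeddings", keyed by token as in get_word_embeddings.
word_vectors = {"border": [1.0, 0.0, 0.0], "report": [0.0, 1.0, 0.0]}
avg = mean_vector(list(word_vectors.values()))  # [0.5, 0.5, 0.0]

same_direction = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors, ~1.0
orthogonal = cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # unrelated vectors, 0.0
print(avg, round(same_direction, 6), orthogonal)
```

Cosine similarity depends only on direction, not length, which is why scaling a vector by 2 leaves the score at 1.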

Excel Document Setup

The practical implementation of Prompt Automation relies on a structured approach to data organization. Central to this process is an Excel document that records each stage of a prompt’s evolution. The document has the following columns:

Prompt ID: A unique identifier is allocated to each prompt, establishing a clear reference system for tracking iterative changes.

Iteration Number: This column meticulously logs the sequence of refinements that a prompt undergoes, offering insights into the evolution and fine-tuning process.

Input Prompt: Initially, the document houses the original prompts, which encapsulate the tasks or queries presented to the AI.

Model’s Improved Prompt: As the name suggests, this space is reserved for the AI’s enhanced version of the original prompt, reflecting the incremental improvements from the feedback loop.

Match Score: A numerical measure of the alignment between the prompt and the model’s output. In the code, this is the cosine similarity between the averaged word embeddings of the prompt and the embedding of the model’s output.

Similarity to Target: This metric gauges the resonance between the enhanced prompt and the predefined target concept, ensuring the prompts stay on course.

Average Similarity to Control: A comparison against a set of control prompts, this average similarity score helps to maintain the distinctiveness of the improved prompts.

Feedback: Constructive comments and guidance are documented here, serving as a cornerstone for subsequent iterations, guiding the AI towards an ever-refined output.

This Excel framework is not just a record-keeping tool but an integral component of the Prompt Automation process, enabling the systematic improvement of prompts through a feedback loop mechanism. Each column is a testament to the iterative journey of a prompt, from its inception to its refined state, underscoring the dynamic and responsive nature of AI in enhancing user interaction.
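
One way to keep these columns consistent across iterations is to describe a row as a small record type. The sketch below is an assumption for illustration: IterationRecord and to_row are hypothetical names, the column headers follow the ones used in the article's code, and the values are made-up placeholders rather than real results.

```python
from dataclasses import dataclass, asdict

# Column headers as they appear in the article's code.
COLUMN_HEADERS = [
    "Prompt ID",
    "Iteration Number",
    "Input Prompt",
    "Models Improved Prompt",
    "Match Score",
    "Similarity to Target",
    "Average Similarity to Control",
    "Feedback",
]

@dataclass
class IterationRecord:
    """One row of the tracking sheet (hypothetical helper)."""
    prompt_id: int
    iteration_number: int
    input_prompt: str
    improved_prompt: str
    match_score: float
    similarity_to_target: float
    avg_similarity_to_control: float
    feedback: str

    def to_row(self):
        """Map the fields, in declaration order, to the spreadsheet headers."""
        return dict(zip(COLUMN_HEADERS, asdict(self).values()))

# Placeholder values for illustration only.
record = IterationRecord(1, 1, "original prompt", "improved prompt",
                         0.5, 0.5, 0.5, "placeholder feedback")
row = record.to_row()
print(row["Prompt ID"], row["Iteration Number"], row["Feedback"])
```

A list of such rows can be passed to pd.DataFrame and written out with to_excel, matching the layout described above.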

# Iterative Prompt Improvement
df = pd.read_excel(r'C:\Users\AkimFitzgerald\Documents\Propmpt OP.xlsx')

# Handle potential NaN values or non-string data in 'Input Prompt' column
df['Input Prompt'] = df['Input Prompt'].fillna('').astype(str)
df['Models Improved Prompt'] = df['Models Improved Prompt'].astype(str)
results_df = pd.DataFrame()
iterations = 3

Contextualizing AI with Target Dictionaries

At the heart of this methodology lies the target context dictionary. This is a carefully curated map of key-value pairs, where each entry represents a distinct aspect of the scenario the AI is expected to understand and generate text for. For instance, in an application designed for intelligence analysts, the dictionary includes entries like “Analysis,” “Intelligence Analyst,” and “Objective,” each followed by a prompt that sets a specific scene or task.

The target context dictionary is not only a set of instructions; it’s a blueprint for the AI’s thought process, nudging it towards a particular way of thinking and writing. It instructs the AI to behave as an intelligence analyst would, to maintain honesty, to take a critical approach, and to structure information in a way that’s typical for reports in that field.

Flattening for Focus

The flatten_dictionary function is a practical solution to transform this multi-dimensional dictionary into a flat string. This transformation is not merely for aesthetics; it serves a functional purpose. A flattened dictionary becomes a single string of text that can be fed into an embedding model, which in turn translates it into a form that machine learning algorithms can work with more effectively.
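
A small run makes the behavior concrete. The sketch below repeats the article's flatten_dictionary so it is self-contained and applies it to a shortened, two-entry dictionary invented for the example. One detail worth noticing: the key of a nested dictionary ("Report Structure" here) does not appear in the output; only the leaf keys and values do.

```python
def flatten_dictionary(data_dict):
    """Concatenate 'key: value ' pairs, recursing into nested dictionaries."""
    flat_string = ""
    for key, value in data_dict.items():
        if isinstance(value, dict):
            flat_string += flatten_dictionary(value)
        else:
            flat_string += f"{key}: {value} "
    return flat_string

# Shortened example dictionary, not the article's full target context.
nested = {
    "Honesty": "Report only on the provided context.",
    "Report Structure": {"Event Title": "A short concise event title"},
}
flat = flatten_dictionary(nested)
print(repr(flat))
# 'Honesty: Report only on the provided context. Event Title: A short concise event title '
```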

Benchmarking Against Control

To gauge the AI’s alignment with the intended context, we compare the ‘target context’ with a set of ‘control contexts.’ These controls are everyday scenarios like “Grocery Shopping” or “Attending a Concert” — activities unrelated to the intelligence analysis context. By embedding both the ‘target’ and ‘control’ contexts, we can measure how closely the AI’s output matches our target and how distinctly it stands out from the irrelevant control contexts.

Embedding and Measuring

The process uses OpenAI’s embedding model to convert these contexts into embeddings — numerical vectors that capture the semantic essence of the text. The target_embedding is the vector representation of our flattened target context, and each control_embedding is the vector for one of our control activities. By comparing these, we can quantitatively assess whether the AI’s generated text is on point (i.e., closer to the target embedding) or if it’s veering off-topic (i.e., closer to the control embeddings).
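
The comparison can be sketched with toy vectors standing in for real embeddings. Everything below is an assumption for illustration: the three-dimensional vectors are invented, and cosine is a plain-Python stand-in for scikit-learn's cosine_similarity. The point is only the shape of the calculation: one similarity to the target, and an average over the controls.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy stand-ins for target_embedding, control_embeddings, and a candidate output.
target_vec = [0.9, 0.1, 0.0]
control_vecs = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
candidate_vec = [0.8, 0.2, 0.0]

target_sim = cosine(target_vec, candidate_vec)
control_sims = [cosine(c, candidate_vec) for c in control_vecs]
avg_control_sim = sum(control_sims) / len(control_sims)

# "On point" here means closer to the target than to the controls on average.
on_topic = target_sim > avg_control_sim
print(round(target_sim, 3), round(avg_control_sim, 3), on_topic)
```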

The Feedback Loop

Equipped with these metrics, the AI enters a feedback loop where it receives guidance on how to adjust its text generation to more closely match the target context. The system is programmed to try several times (MAX_RETRIES) and wait a short duration (DELAY_BETWEEN_CALLS) between attempts, ensuring a robust process that aims for the highest quality result.
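
The retry behavior can be exercised without calling any API by wrapping an arbitrary function. This is a minimal sketch under stated assumptions: call_with_retries and flaky_call are hypothetical names, the delay is set to zero so the example finishes immediately, and a real run would catch the API client's own error type rather than every exception.

```python
import time

def call_with_retries(func, max_retries=3, delay_seconds=0.0):
    """Call func(); on an exception, wait and retry, re-raising after the last attempt."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds)

attempts = {"count": 0}

def flaky_call():
    """Fail twice, then succeed, to simulate a transient error."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = call_with_retries(flaky_call, max_retries=3, delay_seconds=0.0)
print(result, attempts["count"])  # ok 3
```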

Through this interplay of context dictionaries, embeddings, and iterative feedback, the model can be steered toward text that is contextually appropriate and tuned to the specific nuances of a given task. No model weights are updated; the refinement happens entirely through successive prompts and the similarity scores that accompany them.

The Iterative Refinement Loop

The iterative process begins with an initial prompt, which is then compared against a target concept’s embedding and a set of control embeddings representing common topics. The similarity scores inform the nature of feedback given to the AI model, which then generates an improved version of the prompt. This cycle repeats, with the aim of increasing the alignment of prompts with the target concept while ensuring differentiation from the control topics.
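
The way similarity scores turn into feedback can be sketched as a small classification step. The 0.05 threshold and the starting value of 0 for the previous score follow the article's code; classify_improvement is a hypothetical helper, and the score sequence is invented for the example.

```python
def classify_improvement(current, previous, threshold=0.05):
    """Label the change in target similarity between two iterations."""
    improvement = current - previous
    if improvement > threshold:
        return "significant improvement"
    if improvement > 0:
        return "minor improvement"
    if improvement == 0:
        return "no change"
    return "regression"

# Invented target-similarity scores for four successive iterations.
scores = [0.80, 0.82, 0.82, 0.79]
previous = 0.0  # starting value, as in the article's loop
labels = []
for score in scores:
    labels.append(classify_improvement(score, previous))
    previous = score

print(labels)
# ['significant improvement', 'minor improvement', 'no change', 'regression']
```

Because the previous score starts at 0, the first iteration is labeled a significant improvement for any similarity above 0.05, so the labels become informative only from the second iteration onward.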

Visualization for Clarity

To visualize the process and results, dimensionality reduction techniques like t-SNE are employed, which provide a two-dimensional representation of the high-dimensional embeddings, making it easier to understand the relationships and distances between various prompts.

Feedback Loop and AI Interaction

At the core of this system is a feedback loop, wherein each iteration’s output is fed back into the AI model, along with the comparative similarity scores and constructive feedback. This loop is crucial for the gradual refinement of prompts.

Practical Implementation and Results Documentation

The process is realized by reading from and writing to Excel files, which serve as the input and output points for prompt data. The results of each iteration, including the improved prompts and their similarity scores, are meticulously documented, allowing for a transparent and traceable improvement process.

target = context_dictionary = {
    "Analysis": "You will analyze social media data concerning {country}.",
    "Intelligence Analyst": "Imagine yourself as an intelligence analyst from the Customs and Border Patrol agency in the United States.",
    "Objective": "Your objective is to compile a comprehensive report on recent events in {location}, {country} on {date}, presenting your findings with the expertise of a seasoned professional.",
    "Date of Analysis": "The date of analysis is {date}.",
    "Perspective": "Acknowledge that the data you're examining might encompass diverse perspectives, and not all information is guaranteed to be entirely accurate or impartial.",
    "Critical Approach": "Maintain a critical approach, framing your report in terms of 'Here is what the documents or sources are indicating or suggesting,' rather than presenting established facts.",
    "Honesty": "Please refrain from fabricating information and only report on the provided context.",
    "Immigration Event Report": {
        "Recent Events": "Report as many recent events as possible, provide as much detail as possible regarding each event, there is no need to mention a lack of information.",
        "Event Details": "Include an event title, a highly detailed 4 to 6 sentence description, and full source links. Ensure the correct full source links are provided for each event.",
        "Combining Data Points": "If multiple data points are pointing towards the same event, combine them into one summary. Only one summary per event, even if there is more than one source talking about the event. Keep an intuitive eye out for sources talking about the same event, but calling certain things by different names.",
        "Grouping Events": "Regarding grouping similar events into the same summary, combine events discussing the same object or event into the same summary. For example, if multiple events discuss the same monument then group them into the same summary.",
        "Relevant Topics": "Report only on events that are happening in and around {location} that have to do with immigration, transnational crime, terrorism, or United States border security. Please refrain from reporting on statistics. If there are no events to report on regarding these event topics then your response can say that there is nothing to report on for {date}",
        "Integrity": "Above all DO NOT LIE or fabricate events.",
        "Report Structure": {
            "American Related Events": "American Related Events in and around {location}, {country}:",
            "Event Title": "Event Title: (A short concise event title)",
            "Description": "Description: (A highly detailed description of the event in 3 to 6 sentences) Do not implicitly or explicitly refer to any document that the summary came from in the wording of the summary.",
            "Source Links": "Source Links: (Provide the source link to all documents being referenced here)"
        }
    },
    "Note": "Include as many related events as the provided sources indicate. But do not duplicate events and do not make anything up."
}

target_flat = flatten_dictionary(target)


control = [
    "Grocery Shopping",
    "Morning Commute",
    "Family Picnic",
    "Weekend Hiking",
    "Watching a Movie",
    "Dining Out",
    "Attending a Concert",
    "Home Cleaning",
    "Reading a Book",
    "Playing Video Games",
    "Visiting the Beach",
    "Cooking Dinner",
    "Attending School",
    "Work Meeting",
    "Gardening",
    "Online Shopping",
    "Gym Workout",
    "Listening to Music",
    "Hosting a Party",
    "Renovating Home",
    "Studying for Exams",
    "Taking a Nap",
    "Going for a Run",
    "Photography Session"
]

target_embedding = get_embedding(target_flat)

control_embeddings = [get_embedding(item) for item in control]


MAX_RETRIES = 3  # maximum number of attempts per API call
DELAY_BETWEEN_CALLS = 5  # seconds to wait between calls


def call_openai_api(prompt_to_ask):
    for attempt in range(MAX_RETRIES):
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo-0301",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt_to_ask}
                ]
            )
            return response
        except openai.error.OpenAIError as e:
            print(f"Error on attempt {attempt + 1}: {str(e)}")
            if attempt < MAX_RETRIES - 1:
                print("Retrying in a few seconds...")
                time.sleep(DELAY_BETWEEN_CALLS)
            else:
                raise  # re-raise the exception if max retries are reached



for index, row in df.iterrows():
    initial_prompt = row['Input Prompt']  # the original prompt from the spreadsheet
    prompt_id = row['Prompt ID']
    Models_output = row['Models Improved Prompt']
    previous_target_similarity = 0
    # Skip rows with an empty input prompt or a missing improved prompt ("nan" after astype(str))
    if not initial_prompt.strip() or Models_output.strip().lower() == "nan":
        continue

    for i in range(iterations):
        
        prompt_embeddings = get_word_embeddings(initial_prompt)
        desired_output_embedding = get_embedding(Models_output)
        average_prompt_embedding = average_embeddings(prompt_embeddings)

        match_similarity = similarity_score(desired_output_embedding, average_prompt_embedding)
        target_similarity = similarity_score(target_embedding, desired_output_embedding)
        
        # Compute average similarity with all control embeddings
        control_similarities = [similarity_score(control_embedding, desired_output_embedding) for control_embedding in control_embeddings]
        avg_control_similarity = np.mean(control_similarities)

        # Constructing the feedback loop
        improvement = target_similarity - previous_target_similarity
        if improvement > 0.05:  # Threshold for significant improvement
            feedback = f"Significant improvement observed! The response is aligning more with the concept of '{target}' (Similarity: {target_similarity:.2f}) and less with the control topics (Average Similarity: {avg_control_similarity:.2f})."
        elif improvement > 0:
            feedback = f"Minor improvement observed. Try aligning more with the concept of '{target}' (Similarity: {target_similarity:.2f}) and less with the control topics (Average Similarity: {avg_control_similarity:.2f})."
        elif improvement == 0:
            feedback = f"No change observed. Your response (Similarity: {target_similarity:.2f}) seems aligned with the desired concept, but try to differentiate more from control topics (Average Similarity: {avg_control_similarity:.2f})."
        else:
            feedback = f"Some regression observed. Realign with the concept of '{target}' (Similarity: {target_similarity:.2f}) and differentiate more from control topics (Average Similarity: {avg_control_similarity:.2f})."


        prompt_to_ask = f"Original prompt: '{initial_prompt}'. Initial response: {Models_output}. Match score: {match_similarity}. {feedback} Can you provide an improved response?"

        # Update the previous target similarity for the next iteration
        previous_target_similarity = target_similarity
        
        # Use the retry wrapper defined above so a transient API error does not abort the run
        response = call_openai_api(prompt_to_ask)
        ai_response = response['choices'][0]['message']['content'].strip()

        new_row = pd.DataFrame({
            'Prompt ID': [prompt_id],
            'Iteration Number': [i+1],
            'Models Improved Prompt': [ai_response],
            'Match Score': [match_similarity],
            'Similarity to Target': [target_similarity],
            'Average Similarity to Control': [avg_control_similarity],
            'Feedback': [feedback]
        })

        time.sleep(DELAY_BETWEEN_CALLS)
        results_df = pd.concat([results_df, new_row], ignore_index=True)


        Models_output = ai_response



results_df.to_excel('Prompt Automation Refined Output Test2.xlsx', index=False)

Conclusion: The Future of Prompt Automation

Prompt Automation stands as a testament to the potential of AI to enhance its own performance through self-improvement mechanisms. It saves time and resources while ensuring that AI-generated prompts are as relevant and effective as possible. As this technology matures, it promises to revolutionize how we interact with AI systems, making them more intuitive and responsive to our needs.

Written by Akim Fitzgerald