Prompt Engineering 1: Top 10 Best Prompting Practices for LLMs
Post 1 in our new series on Prompt Engineering on Medium
NEW (since Oct 2024): You can now also listen to an abridged and relaxed podcast episode of this Medium post on Spotify.
Table Of Contents
1. GIGO, or Why Are “Good” Prompts Important for Large Language Models (LLMs)?
2. Why Can’t Johnny or Jenny Prompt?
3. How Can We Address These Issues?
4. The Top 10 Best Prompting Practices
↳ a. Best practice 1: Be clear and specific
↳ b. Best practice 2: Provide context
↳ c. Best practice 3: Be concise
↳ d. Best practice 4: Adjust the style and tone
↳ e. Best practice 5: Adjust the LLM’s temperature for creativity and consistency
↳ f. Best practice 6: Control the length and detail of responses
↳ g. Best practice 7: Experiment with different phrasings
↳ h. Best practice 8: Iteratively refine your prompt by using output fragments
↳ i. Best practice 9: Try different LLMs and choose the right one for the task / context
↳ j. Best practice 10: Customize LLMs (keywords: fine-tuning / customization)
5. Conclusion
Note
This series of posts related to prompt engineering is a byproduct of the ongoing IT project BPM as a Service (https://www.bpmaas.ai/) of WAITS Software- und Prozessberatungsgesellschaft mbH, where generative AI is being combined with business process management (BPM).
Our hope is that these prompt-engineering posts will also benefit other readers in their own use of generative AIs!
1. GIGO, or Why Are “Good” Prompts Important for Large Language Models (LLMs)?
Large Language Models, such as
- OpenAI’s GPT-4,
- Google’s Gemini,
- Microsoft’s Copilot,
- Meta’s open source LLaMA,
- x.AI’s Grok, and others,
are based on complex neural network architectures. Trained on large data sets, they generate mono- or multimodal content (text, images, audio, etc.) probabilistically in response to text inputs (prompts).
However, as with other human or technical fields, LLMs follow the GIGO principle: Garbage in — garbage out. This means that if you feed these language models vague, generic, ambiguous, or nonsensical input, you should expect the output to be just as poor.
In short, the phrasing of the input (the prompt) significantly affects the results of all LLMs.
But what does a well-formulated prompt look like? We provide the answer to this question in the following sections.
2. Why Can’t Johnny or Jenny Prompt?
Since prompts rely on natural language input, it seems intuitive to assume a low barrier for using LLMs.
In other words, why prepare or think much about the structure or wording of a prompt when you can just go for it, like hitting the “I’m Feeling Lucky” button on Google Search?
The findings of the 2023 article Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts are therefore interesting, but not really surprising:
- Lack of systematic approaches
Although interacting with LLMs via simple instructions seems easy, non-experts find it difficult to create robust and systematic prompts. A key reason is that they tend to build prompts through opportunistic trial and error rather than taking a more structured approach.
- Lack of concrete examples and dialogues
Although concrete examples and dialogues can significantly improve LLM outputs, non-LLM-experts tend to avoid them, viewing them as too specific, and favor direct instructions instead.
- Tendency to overgeneralize
Prompt novices often overgeneralize from a single success or failure, leading to exaggerated conclusions about the LLM’s capabilities.
- Expectations based on anthropomorphic projections
Many users attribute human-like abilities to LLMs, leading to inappropriate prompts and expectations.
- Lack of testing and optimization
Participants rarely collected systematic data to test and optimize their prompts.
3. How Can We Address These Issues?
Here are some suggestions on how to address the issues mentioned in the previous section:
- More Education
Non-LLM-experts need a better understanding of how LLMs work, in particular the fact that LLMs’ ability to probabilistically predict the next word produces impressive results that mimic human language but are not identical to the language process(es) of humans (see, for instance, Hans-Jörg Schmid (2020), The Dynamics of the Linguistic System: Usage, Conventionalization, and Entrenchment, Oxford University Press).
- Realistic Expectations
LLM users need to adjust their expectations to align with LLM strengths and weaknesses. That is, instead of relying on magical thinking or violating the GIGO principle, prompt design should be based on a solid understanding of LLM limitations, such as challenges with reasoning, world knowledge, and common-sense insights.
- Explainable AI (XAI)
We need more and better explainable AI (XAI) to make the black-box processes within LLMs more transparent and understandable.
- Better Prompting Tools
Tools that support prompt optimization must be further improved.
- More Training
Practical skills in prompt engineering need further development: see the top 10 best prompting practices in the next section.
4. The Top 10 Best Prompting Practices
a. Best practice 1: Be clear and specific
- Clarity and specificity are crucial in prompting. Every LLM relies heavily on input, so unclear, generic, or vague prompts lead to inaccurate results.
- Example: Instead of asking, What are the benefits of exercise?, try: Explain the health benefits of regular bodyweight training (calisthenics) for adults over 40.
b. Best practice 2: Provide context
- Context helps the LLM understand the nuances of a request, especially with complex questions. The more precise the context, the better the result — regardless of the LLM used.
- Example: As a small business owner looking to expand online, which social media marketing strategies would you recommend?
c. Best practice 3: Be concise
- Overly long prompts can confuse the models, as they attempt to process too much information simultaneously. Instead, focus on brevity to ensure clarity.
- Example: Explain the main features of LLMs: (1) Training data, (2) Transformer architecture, (3) Context understanding.
Note
The ideal length of a prompt depends on the task, and there is a tension between conciseness and the need to provide enough context (see best practice 2)!
d. Best practice 4: Adjust the style and tone
- Regardless of the LLM, you can influence style and tone by specifying the desired formality, technicality, or creativity in the prompt.
- Example for formal style: Explain the environmental impact of electric cars in a formal tone.
- Example for informal style: Write a casual, humorous blog post about the pros and cons of flying to Mars with Elon Musk.
e. Best practice 5: Adjust the LLM’s temperature for creativity and consistency
- LLM temperature affects how creative or predictable the responses are:
- Lower temperatures lead to more focused and consistent responses.
- Higher temperatures result in more creative and sometimes unpredictable outcomes.
- Example for low temperature, i.e., precise answers: Explain how GPS works in simple, factual language.
- Example for high temperature, i.e., creative answers: Write a cyberpunk poem in the style of Arthur Rimbaud’s “Illuminations” and William Gibson’s “Neuromancer” about future urban co-existence between AI and humans.
The resulting genAI image based on this poem looks like this, for example (Note: I have also added to the prompt that it should be positive and optimistic, otherwise the cyberpunk atmosphere might be too depressing):
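Under the hood, temperature simply rescales the model’s next-token logits before the softmax turns them into probabilities: dividing by a small temperature sharpens the distribution, dividing by a large one flattens it. A minimal sketch in Python (the four-token toy distribution is invented purely for illustration, not taken from any real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize into probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token scores over four candidate words
logits = [4.0, 2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # sharp: top token dominates
high = softmax_with_temperature(logits, 2.0)  # flat: more randomness possible

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At temperature 0.2 the most likely token receives nearly all of the probability mass (near-deterministic answers), while at 2.0 the mass spreads across alternatives, which is what makes sampled outputs more varied and creative.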
f. Best practice 6: Control the length and detail of responses
You can influence response length by providing clear guidelines:
- Example for detailed responses: Explain the process of photosynthesis in detail, including the role of chlorophyll and sunlight.
- Example for shorter responses: Summarize the key points of photosynthesis in three sentences.
g. Best practice 7: Experiment with different phrasings
- Experimenting with different phrasings can yield better results, as changes in wording affect how the model interprets the input.
- Example: If What are the pros and cons of remote work? produces vague results, try rephrasing it as Describe the pros and cons of working from home for employees.
h. Best practice 8: Iteratively refine your prompt by using output fragments
- Continually refining and fine-tuning prompts over multiple iterations is key to high-quality outputs, regardless of the LLM used. For example:
- Prompt 1: Explain climate change.
- Output 1: […]
- Refined prompt 2: Explain climate change in more detail and describe its main causes. (Tip: incorporate fragments from Output 1 that you like or don’t like to improve the next response.)
- Output 2: […]
- Refined prompt 3: Explain climate change, its main causes, and its effects on oceans and weather. (Again, use fragments from Output 2 that you like or don’t like.)
- Output 3: […]
- etc.
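The refinement loop above can be sketched as a small helper that folds fragments of the previous output back into the next prompt. Note that `refine_prompt` is a hypothetical helper written for this post, not part of any LLM API, and the wording of the appended instructions is one possible choice:

```python
def refine_prompt(base_prompt, keep=(), avoid=()):
    """Build the next prompt from the previous one plus output fragments
    the user liked (keep) or disliked (avoid)."""
    parts = [base_prompt]
    if keep:
        parts.append("Expand on these points from your previous answer: "
                     + "; ".join(keep) + ".")
    if avoid:
        parts.append("Avoid repeating: " + "; ".join(avoid) + ".")
    return " ".join(parts)

prompt_1 = "Explain climate change."
prompt_2 = refine_prompt(prompt_1, keep=["greenhouse gases", "ocean warming"])
prompt_3 = refine_prompt(prompt_2,
                         keep=["effects on oceans and weather"],
                         avoid=["political debates"])
print(prompt_3)
```

Each iteration keeps the full context of the earlier prompt while steering the model toward the fragments you want more (or less) of.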
i. Best practice 9: Try different LLMs and choose the right one for the task / context
- Since various LLMs have different strengths and weaknesses, it’s worth trying them on different tasks: for example, OpenAI’s GPT-4, Google’s Gemini, Meta’s LLaMA, and x.AI’s Grok each have unique
- training data,
- values (fairness, etc.), and
- focus areas
that may support specific applications better than others.
Note on values orientation and its impact on LLM outputs
Different LLMs adhere to different ethical guidelines, heavily influenced by the values of the companies that developed them. These values affect aspects like
- transparency,
- fairness,
- bias control, and
- the handling of sensitive or controversial topics.
- x.AI (Elon Musk and his team), for instance, prioritizes more freedom of speech with fewer content restrictions, potentially producing more unfiltered and open answers, but also risking controversial or offensive content.
- Azure OpenAI (Microsoft): Microsoft’s models emphasize accountability, transparency, and fairness, offering stricter filters against inappropriate content and more bias protections. They may be less creative but more reliable in sensitive areas like education or healthcare.
- Copilot (GitHub, Microsoft): Copilot is heavily focused on code generation, optimized to avoid producing problematic or insecure code, emphasizing technical accuracy and ethics in software development.
In sum, these value orientations significantly influence the results, especially with sensitive topics: Some models impose restrictions to ensure they don’t produce harmful or unethical content, while others offer more freedom, potentially leading to diverse but controversial results.
j. Best practice 10: Customize LLMs (keywords: fine-tuning / customization)
- Many LLMs offer customization options through user-defined instructions or fine-tuning, allowing the model to better suit your tasks and needs.
- Example: Use custom instructions in ChatGPT or fine-tuning options in models like LLaMA to tailor the output to your requirements.
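To make the data side of fine-tuning concrete: several providers (OpenAI among them) accept training data as JSONL, one chat example per line, each example being a list of role/content messages. The BPM-flavored content below is invented for illustration; the general `{"messages": [...]}` line format is what such tooling expects:

```python
import json

# One training example per JSONL line: a system/user/assistant exchange
examples = [
    {"messages": [
        {"role": "system",
         "content": "You answer as a concise BPM consultant."},
        {"role": "user",
         "content": "What is a swimlane diagram?"},
        {"role": "assistant",
         "content": "A diagram that groups process steps by the role "
                    "or department responsible for them."},
    ]},
]

# Serialize to JSONL, then parse one line back to check the structure
jsonl = "\n".join(json.dumps(e) for e in examples)
restored = json.loads(jsonl.splitlines()[0])
print([m["role"] for m in restored["messages"]])
```

A few hundred such examples in your own domain and tone are typically the starting point for a fine-tuning run; round-tripping the file through a JSON parser, as above, is a cheap sanity check before uploading.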
5. Conclusion
When trying to build good LLM prompts, it is a good idea to keep the following best practices in mind:
- Use clear and specific, but context-rich formulations.
- Precise instructions and continuous refinement yield high-quality results, regardless of the model.
- By adjusting style, tone, temperature, experimenting with different LLMs, and customizing / fine-tuning models, you can get the most out of any generative AI.
- The choice of the right model depends heavily on which ethical values and technical priorities are most important for your task.
Because the developers’ values and ethical guidelines influence which topics are allowed or prohibited and what type of content you can expect, it can be worthwhile to test different models to find the one that best aligns with your goals and requirements.
Finally, it is also worth noting that, depending on the task or problem, it can be helpful to use a single prompt pattern or a combination of patterns to achieve high-quality output.
However, this will be the topic of one of our upcoming Prompt Engineering posts.
Thanks for your time and attention!
Author for WAITS Software und Prozessberatungsgesellschaft mbH, Cologne, Germany: Peter Bormann — August 2024