Advanced Guide to OpenAI Prompts: User Intent Classification, Thought Process, and Tool Usage (Building a Chinese-English Translation GPT Example)

pamperherself
8 min read · Aug 1, 2024


Continuing from the previous article Beginner’s Guide to OpenAI’s Official Prompt Framework: Introduction for Prompt Newbies, this piece explains the remaining prompt tutorials from OpenAI engineers.

Based on my own daily needs, I've constructed two prompt examples to help you apply the abstract concepts from the prompt tutorials more concretely.

User Intent Classification

If a chatbot needs to handle several types of questions rather than one specific task, it should perform intent recognition before executing a workflow: ChatGPT first classifies the user's question, then runs the corresponding workflow.

Here, I’ll use the example of building a Chinese-English translation chatbot. This need arose from my actual workflow for writing AI articles to enhance efficiency.

  1. It needs to translate Chinese into colloquial American English.
  2. It needs to explain unfamiliar English words, and to use a more formal, serious tone when less familiar words come up in conversation.
  3. When I input Chinese or English, ChatGPT should directly classify the intent and provide the desired result without needing instructions like “please translate,” simplifying the input process.
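The third requirement, classifying intent from the input language alone, can be approximated outside the prompt as well. A minimal sketch (the character-range heuristic is my own assumption; in the actual setup, ChatGPT itself performs this detection):

```python
def detect_intent(text: str) -> str:
    """Classify input as 'translate' (contains Chinese) or 'lookup' (English only)
    by scanning for CJK Unified Ideographs. A rough heuristic, not the real
    classifier used by the prompt."""
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    return "translate" if has_cjk else "lookup"

print(detect_intent("脚踝扭伤了"))   # → translate
print(detect_intent("suboptimal"))  # → lookup
```

In the prompt itself, this branching is expressed in natural language inside the step-by-step thinking section below.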

We will still base it on the framework from the previous article, making modifications for specific task requirements within the framework:

<systemprompt>
<backstory>The user is Joe, a Chinese high school student studying in the USA, who is not proficient in English. He needs to understand the English expression of some Chinese sentences and query the pronunciation, Chinese meaning, different usage examples, and commonness of unfamiliar English words. Joe will directly input the Chinese sentences that need translation and the English words that need querying without providing other instructions.</backstory>
<role>You are his English learning assistant, helping him solve the Chinese-English translation issues mentioned in the backstory for the following 2 scenarios, expanding his English vocabulary and enhancing his grasp of English sentences in daily life.
</role>
<stepbystepthinking>
- First, detect whether the user input is English or Chinese
- 1. If the user input is detected as Chinese, translate it into 3 different English sentences with varied expressions for Joe to choose the most suitable one.
- 2. If the user input is detected as English, answer Joe's question according to the format provided in """fewShotExample""", helping him learn more English words and content. For each word, search for its pronunciation link using web browser plugins from https://www.merriam-webster.com/, https://www.britannica.com/, and https://dictionary.cambridge.org.
- Second, verify the authenticity of the pronunciation links. They must be real URLs, providing Joe with the required word pronunciation [IMPORTANT!!!].
- Third, [IMPORTANT!!!!] you must present the above thinking process step-by-step to Joe, so he better understands your generation process and ensures the information provided is accurate.
</stepbystepthinking>
<fewShotExample>
When the user input is detected as English, output the result in the following format:
[Repulsive](https://dictionary.cambridge.org/zhs/%E5%8F%91%E9%9F%B3/%E8%8B%B1%E8%AF%AD/repulsive)
1. Disgusting, repulsive: Commonness 80%
- The repulsive smell from the garbage can made me feel sick.
2. Repellent, having repulsive force: Commonness 20%
- The repulsive force between the two magnets was so strong that they couldn't be pushed together.
Synonyms:
- Disgusting
- Revolting
- Offensive
</fewShotExample>
<expectedAnswer>
# When the input is in English
### Thinking Process
1. **Detect Language**: The user input word is “suboptimal”.
2. **Role Requirements**: As per the role, provide the word's Chinese meaning, pronunciation link, different usage examples, and commonness.
3. **Search Pronunciation Link**: Use the browser tool to search for the pronunciation link on dictionary.cambridge.org.
4. **Provide Definition and Examples**: Extract definitions, examples, and synonyms from reliable dictionary sources and provide them to Joe as per the format.
### Final Result
[Suboptimal](https://www.merriam-webster.com/dictionary/suboptimal)
1. Suboptimal: Commonness 80%
- The team's suboptimal performance cost them the championship.
- Living on junk food generally means that you have a suboptimal diet.

2. Below optimal standard or condition: Commonness 20%
- Human activity has pushed animals into fragmented and suboptimal habitats.
- The inaccessibility of medications can lead to suboptimal adherence.
Synonyms:
- Inferior
- Substandard
- Deficient
# When the input is in Chinese
### Thinking Process
1. **Detect Language**: The user input is the Chinese sentence "脚踝因为扭伤肿了个大包" ("my ankle swelled up badly from a sprain").
2. **Role Requirements**: As per the role, provide 3 different English sentences with varied expressions.
### Final Result
1. My ankle is swollen because of a sprain and looks like a balloon.
2. Due to the sprain, my ankle has swelled up quite a bit.
3. My ankle is swollen from the sprain and it's puffed up significantly.
These three expressions convey that the ankle is swollen due to a sprain, and Joe can choose the one that suits his needs best.
</expectedAnswer>
</systemprompt>
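For readers who want to use this system prompt outside the ChatGPT UI, here is a sketch of wiring it into an OpenAI Chat Completions call. The client usage follows the openai-python v1 style, and the model name is my own assumption, not part of the original article:

```python
# Joe types only the sentence or word; all instructions live in the system role.
SYSTEM_PROMPT = "<systemprompt>...</systemprompt>"  # paste the full prompt from above

def build_messages(user_input: str) -> list:
    """Assemble the two-message payload for a single-turn request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]

messages = build_messages("suboptimal")

# Uncomment to actually call the API (requires OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(reply.choices[0].message.content)
```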

If the prompt does not specify which websites to get the word's pronunciation link from, in my tests it does not return a real pronunciation link. I therefore listed the following pronunciation websites in the prompt; in testing, Merriam-Webster could provide pronunciations.

- https://www.merriam-webster.com/
- https://www.britannica.com/
- https://dictionary.cambridge.org

While searching for real pronunciation links, I encountered a problem: ChatGPT can generate real links, but they cannot be directly clicked.

For example, the blue link text below cannot be clicked, even though it uses correct markdown link syntax. Bare URLs cannot be clicked either.

It might be site-dependent: Britannica links can be clicked, but Merriam-Webster and Cambridge Dictionary links cannot. If anyone has ideas about this issue, please share them in the comments. I found that GPT can directly access some websites' URLs, while ChatGPT filters out others.

In the end, I used only Britannica's dictionary and reversed the order, giving the result first and the analysis after. You can also use ChatGPT's built-in pronunciation to hear how a word sounds: I have the response repeat the word three times, and clicking the small speaker icon plays ChatGPT's built-in TTS.
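The prompt's "verify the authenticity of the pronunciation links" step can also be enforced outside the model. A sketch under my own assumptions (the whitelist idea and helper names are mine; the hosts come from the prompt above; only the offline check runs without network access):

```python
from urllib.parse import urlparse
import urllib.request

# Hosts taken from the prompt above; the whitelist approach is my own addition.
ALLOWED_HOSTS = {"www.britannica.com", "www.merriam-webster.com", "dictionary.cambridge.org"}

def looks_valid(url: str) -> bool:
    """Offline sanity check: https scheme and a known dictionary host."""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.netloc in ALLOWED_HOSTS

def resolves(url: str, timeout: float = 5.0) -> bool:
    """Online check: does the URL actually answer? (Needs network access.)"""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

print(looks_valid("https://www.britannica.com/dictionary/repulsive"))  # → True
```

This does not solve the clickability problem inside ChatGPT, but it catches hallucinated links before you rely on them.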

The Britannica link below is also clickable.

Step-by-Step Thinking

To make ChatGPT execute each step accurately, have it show its thinking step by step. Not only can we then see the execution process clearly, but ChatGPT also executes more accurately, especially on logic-heavy problems such as math questions.

In the Chinese-English translation case above, I tested a situation: if ChatGPT does not present its thinking process, its explanation of English word meanings might become English explanations rather than the required Chinese explanations.

If it presents its thinking process step by step, it will accurately follow the prompt flow.

The following three screenshots:

The first image includes both the third step above (the instruction to present the thinking process) and the expectedAnswer example I provided.

The second image omits the instruction to present the thinking process, even though the expected output still includes one. The generated result falls far short of the requirement, producing Chinese explanations throughout and not following the prompt's format.

The third image deletes the expectedAnswer but retains the instruction to present the thinking process in the workflow. The result, though a bit lengthy, is generated as the prompt requires.

The conclusion: to make ChatGPT generate what we want, having it generate its thinking process is necessary, and even more effective than providing reference examples and expected outputs.

Reference examples and expected outputs further standardize and refine the effect we want to achieve.

Of course, sometimes we need ChatGPT to think step by step and hide the thinking process. This can also be set.

The following image shows the user's input: the thinking process is displayed only when the user explicitly asks for it; otherwise, the result is generated directly.
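If you post-process the model's output yourself, the same effect can be had by stripping the thinking section before display. A sketch assuming the heading names from the expectedAnswer format above (the function name and trigger flag are my own):

```python
import re

def hide_thinking(answer: str, show: bool = False) -> str:
    """Strip the '### Thinking Process' section unless the user asked to see it.
    Assumes the '### Thinking Process' / '### Final Result' headings used above."""
    if show:
        return answer
    return re.sub(r"### Thinking Process.*?(?=### Final Result)", "", answer, flags=re.S)

raw = "### Thinking Process\n1. Detect language\n### Final Result\nMy ankle is swollen."
print(hide_thinking(raw))
```

This keeps the accuracy benefit of step-by-step generation while showing the reader only the final result.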

Postscript

Besides the two complex modules above, the tool calls covered by OpenAI's prompt engineers include the code interpreter, DALL·E, and the web browser. These tools can be listed in the prompt, with an explanation of what each does and when to call it.
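When calling the API directly rather than using ChatGPT's built-in tools, the same idea maps to the `tools` parameter of the Chat Completions API. A sketch with a hypothetical pronunciation-lookup function (the function name and parameters are my own, not from the article):

```python
# Declaring a tool the model may choose to call. Only the schema is shown here;
# handling the model's tool_calls response is a separate step.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_pronunciation_link",
            "description": "Look up a word's pronunciation page on britannica.com",
            "parameters": {
                "type": "object",
                "properties": {
                    "word": {"type": "string", "description": "English word to look up"}
                },
                "required": ["word"],
            },
        },
    }
]
print(tools[0]["function"]["name"])  # → get_pronunciation_link
```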

When customizing GPT, there is also the action module, which can connect to other app APIs to achieve more complex and customized needs. Another article can be dedicated to the action module.

To learn basic prompting, first work through this article and the previous one, Beginner's Guide to OpenAI's Official Prompt Framework: Introduction for Prompt Newbies, so you clearly understand the impact and role each part of a prompt has on the model across different application scenarios.

Once you have a clear understanding of the fuzzy boundaries of prompts, you can prepare to learn API calls, further enhancing ChatGPT’s capabilities.

The image below shows the efficiency gains that prompts bring to LLMs of different parameter counts. Models at the 175B scale benefit significantly from prompts: larger models are trained on more data, so much of the knowledge we want is already baked into the model, and we need to ask better questions to extract it.

When prompts reach their bottleneck, further improvement requires more complex mechanisms like SFT, Agent, RAG, etc.

The 2020 OpenAI paper, “Language Models are Few-Shot Learners,” explains why few-shot learning is needed instead of full fine-tuning.

When this paper was published, GPT-3 was the current model; GPT-3.5 had not yet become popular worldwide, and prompts were still largely example-like. The paper mainly emphasized few-shot learning, that is, in-context learning.

  1. Fine-tuning for every specific task wastes time and affects the model’s practical usability. Fine-tuning requires preparing thousands of specific task cases with annotations.
  2. Fine-tuning for specific tasks can lead to model overfitting, performing well only on trained tasks but reducing generalization abilities for real-world applications and other tasks.
  3. Large models’ generalization abilities are somewhat like human beings; we can solve new problems with a few hints. Similarly, models can solve specific tasks through few-shot task descriptions.

The last point makes me feel that creating prompts is like setting questions for the model, asking it to write essays like giving writing prompts, and summarizing articles like giving reading comprehension questions.

Large models, especially those above 100B parameters (the larger the model, the more it learns), learn to solve various downstream tasks during training. In the inference stage, a little guidance can help them complete the task.
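That "little guidance" is just examples placed in the context window. A sketch of a few-shot prompt in the spirit of the GPT-3 paper (the translation pairs are my own illustrative examples):

```python
# Few-shot (in-context) learning as plain prompt text: the examples teach the
# task inside the context window, with no fine-tuning of the model's weights.
few_shot_prompt = (
    "Translate Chinese into colloquial American English.\n"
    "Chinese: 我今天很累。 -> English: I'm beat today.\n"
    "Chinese: 这也太贵了吧。 -> English: That's way too expensive.\n"
    "Chinese: 脚踝扭伤肿了。 -> English:"  # the model completes this line
)
print(few_shot_prompt)
```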

A reliable way to improve LM accuracy in all of these settings is by scaling up: increasing the number of parameters and the amount of computation used during training and inference.

In fact, some generalization properties only emerge in very large models, including much improved zero- and few-shot learning.

Recently, GPT-3 (Brown et al., 2020) demonstrated that large LMs can perform zero- and few-shot learning without fine-tuning through in-context learning. Notably, many of these in-context zero- and few-shot learning behaviors emerge or amplify at scale.
