The ‘Quizium’ Journey in Prompt Engineering

Dongwon Kim
7 min read · Dec 4, 2023


AI-Driven Quiz Creation: The Quizium Approach

At the forefront of educational innovation, our team at Quizium is pioneering a new era in learning. By harnessing the power of ‘prompt engineering’ and utilizing advanced Large Language Models (LLMs) like OpenAI’s ChatGPT, we generate high-quality quiz questions from YouTube video content. Our expertise in crafting prompts ensures these questions parallel the depth, accuracy, and educational value of those created by human education experts. This initiative marks a significant advancement in educational technology, blending scientific inquiry with AI proficiency to redefine the realms of learning and knowledge assessment.

Defining Excellence in AI-Generated Quizzes

It is important to define what a ‘high-quality’ question means in this context; without a clear definition, we cannot set a direction or a method for achieving it. Take the multiple-choice question (MCQ) as an example. The basic structure of an MCQ consists of a stem, a correct answer, and distractors. For a high-quality MCQ, each component should meet specific criteria, and the components should form a cohesive whole. Each question should assess higher-order thinking skills rather than mere recall, be clearly worded, grounded in the provided content, and solvable without ambiguity; together, the set of questions should cover a broad range of the video’s content.
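To make that structure concrete, here is a minimal sketch in Python of how these components fit together (the class and field names are our illustration, not Quizium’s actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class MultipleChoiceQuestion:
    """Illustrative model of an MCQ's basic structure."""
    stem: str                     # the question text
    correct_answer: str           # the single keyed answer
    distractors: list[str] = field(default_factory=list)  # plausible wrong options

    def is_well_formed(self) -> bool:
        # Minimal structural checks only; the real quality criteria
        # (higher-order thinking, grounding, lack of ambiguity) require
        # human or LLM review and cannot be verified mechanically here.
        return (
            bool(self.stem.strip())
            and bool(self.correct_answer.strip())
            and len(self.distractors) >= 2
            and self.correct_answer not in self.distractors
        )
```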

One might question what qualifications a prompt engineer on this project really needs, since generating MCQs seems straightforward with an advanced LLM like ChatGPT: simply ask it to generate MCQs from the provided text. However, to ensure the LLM consistently produces high-quality MCQs, the prompt engineer must have a solid understanding of how LLMs work, their strengths and weaknesses, and effective techniques for instructing them in natural language to produce the desired output, i.e., prompting techniques.
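For reference, the naive approach might look something like this (a minimal sketch using OpenAI’s Python client; the model name and prompt wording are placeholders, not our production setup):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

transcript = "..."  # transcript text extracted from a YouTube video

# One bare instruction, no criteria: output quality varies widely.
response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"Generate five multiple-choice questions based on this text:\n\n{transcript}",
    }],
)
print(response.choices[0].message.content)
```

Everything that follows is about closing the gap between this baseline and consistently high-quality output.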

Beyond the Basics: Maximizing LLMs in Educational Innovation

Decoding LLM Mechanics for Advanced Prompt Engineering

Andrew Ng, a respected figure in machine learning, recently highlighted a point essential to our work: understanding the intricacies of how LLMs function is vital for effective prompt engineering [1]. He notes that seemingly small differences in how we structure prompts can lead to vastly different outcomes. For example, a prompt that encourages an LLM to reason through a problem before answering can produce more accurate responses than one that does not (see the image below). Because LLMs generate output by predicting the most likely next word, structuring a prompt to elicit intermediate reasoning gives the model useful context to condition on, leading to more coherent and accurate responses. This insight is crucial in our field: knowing the mechanics behind these models lets us craft prompts that not only ask the right questions but do so in a way that plays to the LLM’s strengths. It is not just about using the technology, but understanding it deeply enough to build tools that truly enhance learning and decision-making.

Two similar prompts yield distinctly different results: one correct, one incorrect. Image Credit: The Batch, DeepLearning.AI [1]
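As a sketch of the difference (the arithmetic task below is our own illustration, not the exact example from The Batch), compare a direct prompt with one that asks the model to reason first:

```python
task = (
    "A quiz has 12 questions worth 60 points in total. Recall questions are "
    "worth 4 points each and analysis questions 6 points each. How many of "
    "each type are there?"
)

# Direct: the model must commit to an answer in its first predicted tokens.
direct_prompt = task + " Answer with the two numbers only."

# Reason-first: the intermediate steps the model generates become part of
# the context it conditions on before stating the final answer.
reason_first_prompt = (
    task + " Think through the problem step by step, then state the two numbers."
)
```

With the first prompt, the model must answer immediately; with the second, its own reasoning becomes evidence it can build on, which is why such prompts tend to be more accurate.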

Harnessing the Strengths of LLMs in Quiz Creation

One area where LLMs differ markedly from the previous generation of language models is their ability to summarize long texts. Summarization may sound easy, but it demands a strong ability to understand context. Quiz questions can essentially be viewed as concise summaries of key concepts, and because of this close relationship to summarization, they posed a significant challenge for earlier models. Only with the advent of LLMs, and their advanced capabilities in distilling and synthesizing essential information, did generating effective quiz questions become tractable.

The Transformer architecture, the backbone of LLMs, enables all this through its core innovation, the attention mechanism [2]. Attention allows the model to focus on different parts of the input text when predicting an output, capturing the long-range dependencies and nuanced relationships between words and phrases that are crucial for understanding and summarizing long articles. Each word in a sequence processed by a Transformer is understood in the context of every other word, yielding a more nuanced, context-aware representation. This is essential for summarization, where grasping how the parts of a text relate to one another is key.
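For readers who want to see the mechanism itself, here is a minimal NumPy sketch of scaled dot-product attention as defined in [2] (a toy illustration, not an LLM implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, from [2]."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relevance of positions
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # context-aware mix of values

# Toy self-attention: 4 "words", each an 8-dimensional embedding.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```

Each output row mixes information from every input position, which is exactly the “every word in the context of every other word” property described above.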

Navigating the Limitations of LLMs in Quiz Development

Despite these capabilities, current Transformer-based LLMs struggle with tasks such as counting specific words in a text. Measuring word frequency matters for certain question types, like fill-in-the-blank, because it bears on learning objectives and question difficulty, so Quizium’s prompt engineers must be aware of this limitation and understand where it comes from. The core design of LLMs is oriented toward recognizing and generating language patterns, not toward detail-oriented tasks like word counting. This emphasis on pattern recognition means LLMs excel at understanding context, generating coherent text, and handling a wide array of language tasks, but are less adept at tasks requiring precise attention to detail. Their operation is based on probabilities and predictions rather than rule-based algorithms: counting words is a precise, rule-based task ideally suited to a simple algorithm, while LLMs are designed for estimating, inferring, and generating language, not for executing exact calculations. This trait persists regardless of the size of the context window.

The practical remedy is to define a counting function in a conventional programming language and let the model call it when needed, for example through OpenAI’s function-calling API or the LangChain framework.
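Here is a minimal sketch of that remedy using OpenAI’s function-calling interface (the model name, schema, and conversation are illustrative; LangChain offers an equivalent tool abstraction):

```python
import json
from openai import OpenAI

def count_word(text: str, word: str) -> int:
    """Exact, rule-based counting: the kind of task LLMs are unreliable at."""
    return sum(token.strip(".,!?;:").lower() == word.lower() for token in text.split())

# Advertise the function to the model as a callable tool.
tools = [{
    "type": "function",
    "function": {
        "name": "count_word",
        "description": "Count exact occurrences of a word in a text.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "The text to search."},
                "word": {"type": "string", "description": "The word to count."},
            },
            "required": ["text", "word"],
        },
    },
}]

client = OpenAI()
transcript = "The cell divides. Each cell then grows, and the cell cycle repeats."
response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{"role": "user",
               "content": f"How many times does 'cell' appear here?\n\n{transcript}"}],
    tools=tools,
)

# If the model chose to call the tool, run it locally and use the exact result.
message = response.choices[0].message
if message.tool_calls:
    args = json.loads(message.tool_calls[0].function.arguments)
    print(count_word(**args))  # 3
```

The model decides when counting is needed, but the counting itself is done by deterministic code, playing to the strengths of each.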

Evolving Prompting Techniques for Enhanced AI Efficiency

So far, we have explored the functionality, strengths, and limitations of LLMs through specific examples. Taking all of this into account, we now need to consider how to get the best out of LLMs for our desired outcomes. Machine learning scientists and engineers have recently been exploring various prompting techniques for different applications. One of the most widely used is Chain-of-Thought prompting [3], which guides the model through a logical, step-by-step reasoning process, closely mirroring how LLMs process information, as explained earlier. This method not only improves the clarity and accuracy of responses but has also inspired more advanced techniques such as ReAct (Reason + Act) [4] and Tree of Thoughts [5]. Additionally, techniques inspired by human psychology, like EmotionPrompt, are gaining traction: these append an emotional stimulus to the original prompt, such as “This is very important to our students,” and have been shown to boost performance on generative tasks. For those keen on the specifics of why emotional prompting helps, I recommend the original paper [6]. Incorporating these techniques into our product is part of our roadmap for the major upgrade planned by the end of this year, aiming to further enhance the educational experience we provide.
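To give a flavor of what these look like in practice, here are minimal sketches of a Chain-of-Thought prompt and an EmotionPrompt-style stimulus (the wording is our illustration, not taken from the papers):

```python
base_task = (
    "Write one multiple-choice question, with four options, testing the main "
    "argument of the transcript above."
)

# Chain of Thought [3]: elicit step-by-step reasoning before the final output.
cot_prompt = base_task + (
    " First identify the key concept, then draft the stem, the correct answer, "
    "and three plausible distractors, explaining each step. Only after that, "
    "output the final question."
)

# EmotionPrompt [6]: append an emotional stimulus to the original prompt.
emotion_prompt = base_task + " This is very important to our students."
```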

Precision and Vision: Redefining the Future of Learning with AI

In the dynamic landscape of AI language models, prompt engineering at Quizium is akin to archery: precision, expertise, and dedication are key to crafting prompts that make effective use of these technologies. Like archery, it transcends mere technical skill; it demands a deep understanding of our tools and objectives, striving not just for accuracy but for the genuine enrichment of learning through AI. Each carefully crafted prompt is an arrow aimed at better educational content, and hitting the mark requires both the precision of our AI tools and the adaptability to translate varied educational materials into questions that engage, inform, and elevate the learning experience. Balancing LLMs’ strengths and limitations with the prompting techniques that harness their potential, our journey at Quizium reflects the evolving nature of education technology. This journey is not just about hitting the target but about redefining it, ensuring that what we build with AI is as impactful and meaningful as a well-aimed arrow. In this continuous pursuit, we are crafting not only questions but the future of learning itself.

➡️ Explore Quizium — Try Now!

References

[1] Ng, Andrew. “The Hidden Value of Deep Technical Knowledge.” The Batch, DeepLearning.AI, 23 Aug. 2023.

[2] Vaswani, Ashish, et al. “Attention is all you need.” Advances in Neural Information Processing Systems 30 (2017).

[3] Wei, Jason, et al. “Chain-of-thought prompting elicits reasoning in large language models.” Advances in Neural Information Processing Systems 35 (2022): 24824–24837.

[4] Yao, Shunyu, et al. “ReAct: Synergizing reasoning and acting in language models.” arXiv preprint arXiv:2210.03629 (2022).

[5] Yao, Shunyu, et al. “Tree of thoughts: Deliberate problem solving with large language models.” arXiv preprint arXiv:2305.10601 (2023).

[6] Li, Cheng, et al. “EmotionPrompt: Leveraging psychology for large language models enhancement via emotional stimulus.” arXiv preprint arXiv:2307.11760 (2023).
