Behind Quizium: The Technology for Creating Questions from Videos

Nov 29, 2023

Quizium is a service that uses AI to generate questions from YouTube videos, which you can then answer.
In this post, I’d like to take you behind the scenes of how I built it.

OK, let’s get started.

Utilizing AI Based on an LLM and Whisper Speech Technology for Text Extraction

Our video-based question generation service uses an OpenAI large language model (LLM) to automate text extraction. First, the LLM reads the text obtained from the video and extracts the important sentences and keywords, which gives the later steps a clearer picture of the content.

Text Analysis Using LLM

The LLM understands and extracts semantically relevant sentences from the video’s text. For instance, it can identify and extract sections emphasizing the “significance of artificial intelligence” in a lecture video.
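As a rough illustration of this step (not Quizium’s actual code), the sketch below splits a transcript into sentence-aligned chunks and builds an extraction prompt for each one. The function names, the character budget, and the prompt wording are all assumptions made for the example; the call to the LLM itself is left out.

```python
import re


def split_into_chunks(transcript: str, max_chars: int = 2000) -> list[str]:
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", transcript.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_extraction_prompt(chunk: str, max_sentences: int = 3) -> str:
    """Build a prompt asking the model to quote the key sentences of a chunk."""
    return (
        f"From the transcript excerpt below, quote up to {max_sentences} "
        "sentences that carry the main ideas, one per line.\n\n"
        f"Transcript excerpt:\n{chunk}"
    )


chunks = split_into_chunks(
    "AI matters. It changes work. It changes school.", max_chars=30
)
prompts = [build_extraction_prompt(chunk) for chunk in chunks]
```

Chunking on sentence boundaries keeps each prompt within a size budget without cutting a sentence in half, which matters when the model is asked to quote sentences verbatim.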

Application of Whisper Speech Technology

Whisper, OpenAI’s speech recognition model, accurately converts the spoken content of the video into text. This lets us extract what is said in the video quickly and combine it with the text analyzed by the LLM for richer information.
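To make the hand-off from speech to text more concrete, here is a minimal sketch that turns Whisper-style segments into a timestamped transcript. It assumes each segment is a dictionary with `start`, `end`, and `text` keys, which is the shape the open-source `whisper` package returns under `result["segments"]`. Whether Quizium keeps timestamps at all is an assumption, and the helper names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS for videos longer than an hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def segments_to_transcript(segments: list[dict]) -> str:
    """Join speech segments into one timestamped line per segment."""
    lines = []
    for segment in segments:
        text = segment["text"].strip()
        if text:  # skip segments whose text is empty or only whitespace
            lines.append(f"[{format_timestamp(segment['start'])}] {text}")
    return "\n".join(lines)


# Hand-written sample data in the assumed segment shape.
sample_segments = [
    {"start": 0.0, "end": 4.2, "text": " Welcome to the lecture."},
    {"start": 4.2, "end": 5.0, "text": "   "},
    {"start": 75.4, "end": 80.1, "text": " AI is changing education."},
]
transcript = segments_to_transcript(sample_segments)
```

Keeping the start time next to each line would let a generated question point back to the moment in the video it came from.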

Integrated Text-Based Question Generation

By combining the LLM and Whisper, the service dynamically generates questions from the extracted text. For example, objective questions (such as multiple-choice) or summary questions are generated automatically from the extracted core content, giving learners a variety of question types.
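Generated questions are only useful if they arrive in a shape the application can trust. The sketch below assumes the model was asked to return a JSON list of multiple-choice questions with `question`, `choices`, and `answer_index` fields. That schema and the class and function names are illustrative assumptions, not Quizium’s real format.

```python
import json
from dataclasses import dataclass


@dataclass
class MultipleChoiceQuestion:
    question: str
    choices: list[str]
    answer_index: int


def parse_questions(raw_json: str) -> list[MultipleChoiceQuestion]:
    """Parse model output and keep only well-formed questions."""
    try:
        items = json.loads(raw_json)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(items, list):
        return []
    questions = []
    for item in items:
        if not isinstance(item, dict):
            continue
        question = item.get("question")
        choices = item.get("choices")
        answer_index = item.get("answer_index")
        is_valid = bool(
            isinstance(question, str)
            and question.strip()
            and isinstance(choices, list)
            and len(choices) >= 2
            and all(isinstance(choice, str) for choice in choices)
            # bool is a subclass of int, so reject True/False explicitly.
            and isinstance(answer_index, int)
            and not isinstance(answer_index, bool)
            and 0 <= answer_index < len(choices)
        )
        if is_valid:
            questions.append(
                MultipleChoiceQuestion(question.strip(), choices, answer_index)
            )
    return questions


raw = (
    '[{"question": "What does Whisper do?", '
    '"choices": ["Transcribes speech", "Draws images"], "answer_index": 0}, '
    '{"question": "Broken", "choices": ["Only one"], "answer_index": 3}]'
)
questions = parse_questions(raw)
```

Dropping malformed items silently is one possible design choice; a stricter service might instead log them or ask the model to regenerate.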

The essence of this technology lies in extracting text from diverse content sources and dynamically generating precise and varied types of learning questions from it. The LLM excels at understanding context and extracting crucial information, while Whisper accurately transforms natural speech into text. Together, they help learners comprehend video content and build knowledge through diverse learning experiences.

Development Challenge: Lack of Video Transcripts

One of the most challenging aspects in developing our video-based question generation service was the absence of accurate transcripts for each video. The lack of precise transcripts directly impacted the accuracy of our language model, making it difficult to extract nuances and contextual information.

Despite integrating Whisper Speech Technology, conveying nuances accurately remained a persistent challenge. The diversity in video content, coupled with variations in language patterns, further complicated the situation.

However, by combining the language model, Whisper, and contextual analysis, we succeeded in extracting meaningful information even in the absence of accurate transcripts. This experience underscored the ongoing effort required for creating a flexible AI solution.

Efforts to Enhance Problem Quality: Significance of Prompt Refinement

Continually improving the quality of our problems is a core value of our service. High-quality problems are essential for giving users effective and meaningful learning experiences.

In particular, we have paid significant attention to prompt refinement. A precise and clear prompt is the key to guiding the model in the desired direction. If the prompt is ambiguous or unclear, the model’s response becomes unpredictable, which can lower quality. Therefore, we have consistently explored and adjusted various prompts to learn which ones produce the most suitable results for different kinds of problems.
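One simple way to compare prompts side by side is a small evaluation loop like the sketch below. It is only an illustration of the idea: `generate` stands in for whatever function calls the model, `score` stands in for whatever quality measure is used, and both are hypothetical placeholders supplied by the caller.

```python
from statistics import mean
from typing import Callable


def compare_prompts(
    templates: dict[str, str],
    transcripts: list[str],
    generate: Callable[[str], str],
    score: Callable[[str], float],
) -> list[tuple[str, float]]:
    """Return (template name, mean score) pairs, best first.

    Each template must contain a {transcript} placeholder; any other
    literal braces in a template would need to be doubled for str.format.
    """
    if not transcripts:
        raise ValueError("at least one transcript is required")
    results = []
    for name, template in templates.items():
        scores = [
            score(generate(template.format(transcript=transcript)))
            for transcript in transcripts
        ]
        results.append((name, mean(scores)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Stand-ins for a real model call and a real quality metric.
def fake_generate(prompt: str) -> str:
    return prompt  # echoes the prompt instead of calling a model


def count_question_marks(output: str) -> float:
    return float(output.count("?"))


ranking = compare_prompts(
    templates={
        "vague": "Make a quiz: {transcript}",
        "specific": "Write one question ending in ?: {transcript}",
    },
    transcripts=["first transcript", "second transcript"],
    generate=fake_generate,
    score=count_question_marks,
)
```

In practice the scoring function is the hard part. It could be a human rating, a rule-based check, or another model acting as a reviewer, and the ranking is only as meaningful as that score.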

These efforts contribute to providing outstanding problems that are easily understandable and conducive to focused learning. Prompt refinement stands as a crucial element in ensuring our service continually evolves, delivering high utility to users.