Using GCP Transcription Service and Custom NLP Models for Evaluating Educational Content

Saqib Awan
5 min read · Feb 4, 2021



Evaluating the educational content delivered in university courses, and in other educational settings, is an important part of maintaining the quality and standard of education. Institutions need to ensure that instructors deliver content that complies with the standards and desired outcomes set for each course. Educational content comes in multiple forms of media, such as lecture recordings, slides, and various types of documents. Reviewing hours of lectures and the associated media for compliance with a standard can be extremely tedious and time-consuming. With recent advances in ML, especially in speech-to-text conversion and state-of-the-art Natural Language Processing (NLP) techniques, it is possible to automate much of this evaluation and compliance-checking process.

The Compliance Checking Problem for Educational Content

As an example, each course offered in a university setting consists of video recordings of lectures, slides, and assignments in PDF or other document formats. We can also assign specific learning outcomes to different parts of the course, such as the topics to be covered and the terminology to be introduced. With these two sets of information, i.e. the content delivered by the instructor and the text describing the desired outcomes, we can score each part of the delivered content against the outcomes. This helps verify that the quality of education was maintained throughout the course.

The Solution

GCP (Google Cloud Platform) provides an online transcription service that exposes a REST API for converting speech to text. It transcribes speech effectively in multiple languages and offers advanced features such as content filtering, speaker diarization (identifying who spoke when), and multiple ML models fine-tuned for specific scenarios such as video and phone recordings. This service can be combined with custom NLP models that compare the delivered content with the desired outcomes and score each lecture against them.

As a basic analysis, we can compute a similarity score between the transcribed speech and the desired outcome text. NLP topic modeling, text summarization, and keyword extraction techniques, applied to the lecture transcripts and the associated slides, give an initial assessment of how much of each outcome was actually covered in the content. Further analysis can use state-of-the-art Transformer-based question-answering models to answer specific questions about the content, such as whether a given topic was actually covered or a given question was answered in the lectures.
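
To make the idea of a similarity score concrete, here is a minimal sketch that compares a transcript with each outcome using TF-IDF weighting and cosine similarity. This is only one simple way to compute such a score and is not necessarily the technique used in the solution described here; the function names are illustrative, and a real pipeline would likely add stop-word removal, stemming, or summarization first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    doc_count = len(token_lists)
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF keeps every weight positive, even for shared terms.
            term: (count / total)
            * (math.log((1 + doc_count) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_against_outcomes(transcript, outcomes):
    """Return one similarity score per outcome, in the order given."""
    vectors = tfidf_vectors([transcript] + list(outcomes))
    transcript_vector = vectors[0]
    return [cosine_similarity(transcript_vector, v) for v in vectors[1:]]
```

An outcome that shares vocabulary with the transcript gets a score between 0 and 1, while an outcome with no words in common scores 0. A purely lexical score like this misses paraphrases, which is one reason to apply it to summaries and extracted keywords rather than raw text alone.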

The following diagram depicts the architecture of the Trillo Solution for the Educational Content Evaluation problem.

Solution Components

The solution consists of the following components:


Trillo Workbench

This is Trillo’s flagship service creation and orchestration engine. It runs on top of GCP Compute services and uses several other GCP services such as Cloud Storage and Cloud SQL. It orchestrates the flow of the whole application and invokes multiple back-end microservices running as part of the Trillo Workbench for GCP.

Microservice Front-end

This is the main web service, exposing a REST API for transcription and content scoring. It takes the following inputs in a JSON request payload:

  • Path in GCP Cloud Storage to the lecture video recording that is to be scored
  • Path in GCP Cloud Storage to the accompanying content files such as PDFs, slides, etc.
  • Path in GCP Cloud Storage where the output transcript file will be stored (both JSON and .txt formats are supported)
  • Path in GCP Cloud Storage where the scored output file (a JSON file) will be stored

This service routes the request to the specific back-end service for further processing.
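
As an illustration of what such a request payload might look like, the sketch below parses and validates a JSON body with four Cloud Storage paths. The field names (`video_path`, `content_path`, `transcript_output_path`, `score_output_path`) and the bucket name are hypothetical, since the actual schema of the service is not given here; only the `gs://` URI scheme is standard for Cloud Storage.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScoringRequest:
    """Hypothetical request schema: one Cloud Storage path per input."""
    video_path: str              # lecture video recording to be scored
    content_path: str            # accompanying PDFs, slides, etc.
    transcript_output_path: str  # where the transcript should be written
    score_output_path: str       # where the JSON score file should be written


def parse_request(payload):
    """Parse a JSON payload string and check that every field is a gs:// path."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    expected = [f.name for f in fields(ScoringRequest)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    for name in expected:
        value = data[name]
        if not isinstance(value, str) or not value.startswith("gs://"):
            raise ValueError(name + " must be a gs:// Cloud Storage path")
    return ScoringRequest(**{name: data[name] for name in expected})


# Example payload; the bucket and object names are made up for illustration.
EXAMPLE_PAYLOAD = json.dumps({
    "video_path": "gs://example-bucket/lectures/week1.mp4",
    "content_path": "gs://example-bucket/content/week1/",
    "transcript_output_path": "gs://example-bucket/transcripts/week1.json",
    "score_output_path": "gs://example-bucket/scores/week1.json",
})
```

Validating the payload at the front end means a malformed request is rejected before any back-end service starts downloading or transcribing.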

Transcription Service

This service performs the following steps:

  • Automatically downloads the input video recording file
  • Converts it to a PCM-encoded .wav audio file
  • Sends the PCM audio to the GCP transcription service
  • Receives the transcribed text and creates output files in both .txt and JSON formats
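
The steps above can be sketched roughly as follows. This is a hedged outline, not the service's actual code: it assumes `ffmpeg` is used for the conversion, that the audio is referenced by a Cloud Storage URI, and that the request goes to the Speech-to-Text v1 REST method `speech:longrunningrecognize` (lecture-length audio generally needs the asynchronous method). The helper names and the `{"segments": [...]}` output layout are illustrative. The code only builds the command and request body and formats a response; it makes no network calls.

```python
import json


def build_ffmpeg_command(video_file, wav_file, sample_rate=16000):
    """Command that extracts mono 16-bit PCM audio from a video into a .wav file."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", video_file,          # input video recording
        "-vn",                     # drop the video stream
        "-ac", "1",                # mix down to a single audio channel
        "-ar", str(sample_rate),   # resample to the requested rate
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM (LINEAR16)
        wav_file,
    ]


def build_recognize_request(audio_gcs_uri, language_code="en-US", sample_rate=16000):
    """JSON body for a Speech-to-Text v1 recognition request."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate,
            "languageCode": language_code,
            "model": "video",                    # model tuned for video audio
            "enableAutomaticPunctuation": True,
        },
        "audio": {"uri": audio_gcs_uri},
    }


def transcript_outputs(response):
    """Turn a recognition response dict into (.txt content, JSON content)."""
    segments = [
        result["alternatives"][0]["transcript"].strip()
        for result in response.get("results", [])
        if result.get("alternatives")
    ]
    plain_text = "\n".join(segments)
    # Illustrative JSON layout; the real output format is not specified.
    json_text = json.dumps({"segments": segments}, indent=2)
    return plain_text, json_text
```

The command list can be passed to `subprocess.run(..., check=True)`, and the request body can be posted with any HTTP client that carries valid Google Cloud credentials.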

Content Scoring Service

This service performs the following steps:

  • Downloads the transcription file from GCP Cloud Storage
  • Downloads other associated files such as PDFs, lecture slides, etc. from GCP Cloud Storage
  • Downloads outcome files that describe the standard for the specific course in question
  • Converts all document files (PDF, Word, slides, etc.) to plain text
  • Runs the NLP algorithms on the text as well as on the standard outcome files
  • Generates a score file that contains scores for the input files against each of the desired outcomes
  • Uploads the scores file to the given GCP Cloud Storage path
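
To show what the generated score file could contain, here is a small sketch that scores every input text against every outcome and serializes the result as JSON. The `coverage` scorer (the fraction of an outcome's distinct words that appear in the content) is a deliberately simple stand-in for the real NLP algorithms, and the JSON layout with `input`, `outcome`, and `score` keys is an assumption, not the service's documented format.

```python
import json
import re


def token_set(text):
    """Distinct lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def coverage(content_text, outcome_text):
    """Fraction of the outcome's distinct words that also occur in the content."""
    outcome_tokens = token_set(outcome_text)
    if not outcome_tokens:
        return 0.0
    return len(outcome_tokens & token_set(content_text)) / len(outcome_tokens)


def build_score_file(inputs, outcomes, scorer=coverage):
    """Score each input against each outcome and return the report as JSON.

    inputs:   dict mapping an input file name to its plain-text content
    outcomes: list of outcome descriptions
    scorer:   any function (content_text, outcome_text) -> float
    """
    report = {
        "scores": [
            {
                "input": name,
                "outcome": outcome,
                "score": round(scorer(text, outcome), 3),
            }
            for name, text in inputs.items()
            for outcome in outcomes
        ]
    }
    return json.dumps(report, indent=2)
```

Because the scorer is passed in as a parameter, a stronger similarity model can replace `coverage` without changing the shape of the score file.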

Limitations of Content Evaluation

Content scoring based on basic similarity over text-summarization NLP models has limitations that need careful consideration when building and using such models.

In particular, a basic similarity model may not give an adequate analysis of the content for some use cases, for example when we want to:

  • Categorize content qualitatively, which is essentially a classification problem
  • Ask specific questions about different parts of the lectures and assignments of a course, e.g. how long a specific topic was covered, or how thoroughly students' questions on a particular topic were answered
  • Evaluate the assessments for the course, e.g. quizzes and exams, in terms of their relevance to and coverage of the topics

For such cases, we need more advanced NLP models and labeled data, which means a significant initial effort in labeling data. Models that could help here include classification models trained on labeled data (such as those used for sentiment analysis), question-answering models, reading-comprehension models, and language models first trained on the corpus of a specific discipline such as medicine, law, engineering, or computing. The embedding vectors learned by training such language models could then serve as the basis for advanced models for more specific tasks.
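
As a rough sketch of how embedding vectors plus a small labeled set could support a classification task, the example below implements a nearest-centroid classifier: it averages the embeddings of each label and assigns a new embedding to the label whose centroid is most similar. The choice of classifier is an assumption made for illustration, and the vectors are expected to come from a separately trained language model, which is not shown here.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def cosine(vec_a, vec_b):
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fit_centroids(labeled_embeddings):
    """Map each label to the mean of its embeddings.

    labeled_embeddings: iterable of (embedding_vector, label) pairs.
    """
    grouped = defaultdict(list)
    for vector, label in labeled_embeddings:
        grouped[label].append(vector)
    return {label: mean_vector(vectors) for label, vectors in grouped.items()}


def classify(embedding, centroids):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(embedding, centroids[label]))
```

In practice the labeled pairs would be embeddings of lecture segments tagged by reviewers, and a trained classifier on top of the embeddings would usually replace the centroid rule once enough labeled data is available.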


Conclusion

Evaluating and analyzing educational content to ensure quality is an important task for many educational institutions. Recent advances in ML, especially in speech-to-text and NLP, have made it possible to automate such analysis to a large extent. This can significantly reduce human effort and save many hours of tedious work. Human review is still helpful, but the evaluation can be done with ML first and then passed on to human reviewers, significantly reducing the work involved. The same analysis could also be used to train instructors to improve their craft and to improve the curriculum itself. Basic similarity-based techniques have some limitations, but these can be overcome by investing in data labeling and domain-specific language model training.

Trillo has created several successful solutions for its customers on GCP in the Speech-to-Text and NLP space and is helping businesses harness the power of ML for their processes with cost-effective solutions that work and deliver continuous value. If you are interested in having us build a solution at a very low cost, contact us at