Using GCP Transcription Service and custom NLP Models for Evaluating Content Relevance

Saqib Awan
Published in trillo-platform
Feb 4, 2021 · 6 min read

Introduction

Evaluating and analyzing media content for compliance against a given set of standards is a key requirement in many fields, such as medicine, law, and online learning, to name a few. For organizations that deliver content-based services, compliance checking is essential to maintaining quality and consistency. Media content comes in multiple forms, including video and audio recordings, presentation slides, and various types of accompanying documents. Analyzing hours of media to ensure compliance with a set of standards and goals can be an extremely tedious and time-consuming task.

With recent advances in ML, especially in speech-to-text conversion and state-of-the-art Natural Language Processing (NLP) techniques, it is possible to automate much of this evaluation and compliance-checking process.

The Compliance Checking Problem for Media Content

As an example, consider online computer-programming training offered by an institution. The content in this case consists of video recordings of lectures, slides, and descriptions of coding projects and assignments in the form of PDFs or other document types. Furthermore, we can attach specific desired goals, in the form of descriptive text, to different parts of the training material, such as the breadth and depth of topics to be covered, the terminology to be introduced, and so on. These two sets of information, i.e., the content delivered by the instructor and the text describing the standard, allow us to score each part of the delivered content against the standard for compliance purposes. This helps ensure that quality is maintained throughout the course.

The Solution

GCP (Google Cloud Platform) provides an online transcription service (Speech-to-Text) that exposes a REST API for transcribing speech data to text. It is highly effective at transcribing speech in multiple languages and offers advanced features such as content filtering, speaker recognition, and multiple fine-tuned ML models for specific scenarios such as video and phone recordings. This service can be combined with custom NLP models to compare the delivered content against the standard and score each lecture’s transcript.
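As a minimal sketch, a transcription call with the Speech-to-Text Python client library might look like the following (bucket names, file paths, and configuration values here are illustrative assumptions, not part of the Trillo solution):

```python
# Minimal sketch: transcribing a long audio file stored in Cloud Storage
# with the GCP Speech-to-Text Python client. Bucket/file names and config
# values are illustrative assumptions.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # PCM .wav
    sample_rate_hertz=16000,
    language_code="en-US",
    model="video",                      # model tuned for video/lecture audio
    profanity_filter=True,              # content filtering
    enable_automatic_punctuation=True,
)
audio = speech.RecognitionAudio(uri="gs://example-bucket/lectures/lecture-01.wav")

# Long-running recognition is required for audio longer than about a minute.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=3600)

transcript = " ".join(r.alternatives[0].transcript for r in response.results)
print(transcript[:500])
```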

As a basic analysis, a similarity score between the transcribed speech and the text describing the standard can be calculated by applying NLP topic modeling, text summarization, and keyword extraction techniques to the lecture transcripts and associated slides. This gives us an initial assessment of how many of the goals were actually achieved in the content. Further analysis can be performed with state-of-the-art Transformer-based question-answering models to answer specific questions about the content, for example whether a particular topic was actually covered or a question answered in the lectures.
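As one possible illustration of this basic analysis, a bag-of-words similarity between a transcript and a goal description can be computed with TF-IDF and cosine similarity. This is a sketch using scikit-learn; the example texts are assumptions, and the actual Trillo scoring model is more elaborate:

```python
# Sketch of a basic relevance score: TF-IDF cosine similarity between a
# lecture transcript and the text describing a goal/standard. This stands in
# for the richer topic-modeling and keyword-based scoring described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def relevance_score(transcript: str, goal_text: str) -> float:
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform([transcript, goal_text])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0][0])

transcript = "Today we cover recursion, base cases, and stack frames..."
goal = "The course should introduce recursion, including base cases."
print(f"relevance: {relevance_score(transcript, goal):.2f}")
```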

The following diagram depicts the architecture of the Trillo Solution for the Content Relevance Evaluation problem.

Solution Components

The solution consists of the following components:

1. Trillo Workbench

This is Trillo’s flagship service-creation and orchestration engine. It runs on top of GCP Compute services and utilizes several other GCP services such as Cloud Storage, Cloud SQL, etc. It orchestrates the flow of the whole application and invokes multiple backend microservices running as part of the Trillo Workbench for GCP.

2. Microservice Front-end

This is the main web service exposing the REST API for transcription and content scoring. It takes the following inputs in a JSON-based request payload:

  • Path in GCP Cloud Storage to the lecture’s video recording file for which scoring is desired
  • Path in GCP Cloud Storage to the accompanying content files, such as PDFs, slides, etc.
  • Path in GCP Cloud Storage where the transcription output file will be stored (both JSON and .txt formats are supported)
  • Path in GCP Cloud Storage where the scored output file (a JSON-formatted file) will be stored

This service routes the request to the specific back-end service for further processing.
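A request to this front-end service might look roughly like the following sketch. The endpoint URL and JSON field names are hypothetical placeholders, since the actual Trillo API contract is not shown here:

```python
# Hypothetical request to the front-end microservice. The endpoint and the
# JSON field names are assumptions used for illustration only.
import requests

payload = {
    "videoPath": "gs://example-bucket/lectures/lecture-01.mp4",
    "contentPaths": ["gs://example-bucket/lectures/lecture-01-slides.pdf"],
    "transcriptOutputPath": "gs://example-bucket/output/lecture-01-transcript.json",
    "scoreOutputPath": "gs://example-bucket/output/lecture-01-scores.json",
}

resp = requests.post("https://frontend.example.com/api/v1/score-content", json=payload)
resp.raise_for_status()
print(resp.json())
```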

3. Transcription Service

This service performs the following steps:

  • Automatically downloads the input video recording file
  • Converts it to a PCM-encoded .wav speech file (a conversion sketch follows this list)
  • Sends the PCM file to the GCP transcription service
  • Receives the transcribed text and creates output files in both .txt and JSON formats
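A minimal sketch of the conversion and upload steps, assuming ffmpeg is installed and using placeholder file and bucket names (the actual Trillo service may use different tooling):

```python
# Sketch: extract mono 16 kHz PCM audio from the lecture video with ffmpeg,
# then upload the .wav file to Cloud Storage for transcription.
# ffmpeg availability, bucket names, and paths are assumptions.
import subprocess
from google.cloud import storage

subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "lecture-01.mp4",       # input video
        "-ac", "1",                   # mono
        "-ar", "16000",               # 16 kHz sample rate
        "-acodec", "pcm_s16le",       # 16-bit linear PCM
        "lecture-01.wav",
    ],
    check=True,
)

bucket = storage.Client().bucket("example-bucket")
bucket.blob("lectures/lecture-01.wav").upload_from_filename("lecture-01.wav")
```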

4. Content Scoring Service

This performs the following steps:

  • Downloads the transcription file from GCP Cloud Storage
  • Downloads other associated files, such as PDFs, slides, etc., from GCP Cloud Storage
  • Downloads the standards files that describe the standard for the specific domain in question
  • Converts all document files (PDF, Word, slides, etc.) to plain text
  • Runs the NLP algorithms on the transcript text as well as on the standards files
  • Generates a score file that contains a score for the input files against each of the desired goals, as sketched below
  • Uploads the scores file to the given GCP Cloud Storage path
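A rough sketch of the scoring and upload steps, reusing the relevance_score helper sketched earlier (the goal texts, file names, bucket, and the hypothetical scoring module are illustrative assumptions):

```python
# Sketch: score a transcript against a list of goal descriptions, write the
# result as a JSON file, and upload it to Cloud Storage. Goal texts and paths
# are assumptions; relevance_score is the TF-IDF helper sketched earlier,
# assumed here to live in a hypothetical "scoring" module.
import json
from google.cloud import storage
from scoring import relevance_score  # hypothetical module

goals = [
    "Introduce recursion, including base cases and stack frames",
    "Cover the time complexity of recursive algorithms",
]

with open("lecture-01-transcript.txt") as f:
    transcript = f.read()

scores = [{"goal": goal, "score": relevance_score(transcript, goal)} for goal in goals]

with open("lecture-01-scores.json", "w") as f:
    json.dump({"lecture": "lecture-01", "scores": scores}, f, indent=2)

storage.Client().bucket("example-bucket").blob(
    "output/lecture-01-scores.json"
).upload_from_filename("lecture-01-scores.json")
```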

Important use cases for Content Relevance

Content relevance analysis is applicable to several domains. Some examples where it can be extremely useful are:

  • Compliance with legal and financial instruments
  • Healthcare transcripts and their relevance vis-à-vis ICD codes (much as a human reviewer would assess a transcript to decide which codes apply)
  • The relevance of legal testimony, i.e., whether it meets certain goals or supports certain deductions, and whether it scores higher on one line of argument versus another

Limitations of Basic Content Relevance

Content scoring based on a basic similarity measure over text-summarization NLP models has limitations that we need to consider carefully when building and using such models. In particular, a basic similarity model may not give an adequate analysis of the content for many use cases. Here are some examples:

  • We may want to categorize content qualitatively, which is essentially a classification problem
  • We may want to ask specific questions about different parts of the content, e.g., for how long a specific topic was covered by a lawyer in a legal proceeding or by an instructor in an online course
  • How descriptive and relevant were the answers given by the relevant person on a particular topic?
  • What was the feedback from stakeholders, e.g., the sentiment in comments, or in judicial remarks in the case of legal proceedings?
  • We may want to evaluate assessments, e.g., quizzes, exams, etc., in terms of their relevance to the topic and its coverage

For many such cases, we need more advanced NLP models and labeled data, which means putting significant initial effort into data labeling. Models that could help here include classification models trained on labeled data (e.g., for sentiment analysis), question-answering and comprehension models, and domain-specific language models that are first trained on the corpus of a specific discipline such as medicine, law, engineering, or computing. Embedding vectors learned by training such language models can then serve as the basis for more task-specific advanced models.
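As a small illustration of the kind of advanced models mentioned above, pretrained Transformer pipelines from the Hugging Face transformers library can answer questions against a transcript or score sentiment out of the box. The default models and example texts below are assumptions; as discussed above, a production system would normally fine-tune on domain-specific labeled data:

```python
# Sketch: question answering and sentiment analysis over a transcript using
# Hugging Face Transformer pipelines. Default pretrained models are used
# here; domain-specific fine-tuning would be needed for real deployments.
from transformers import pipeline

transcript = (
    "In this lecture we introduced recursion, discussed base cases, "
    "and walked through the call stack for a factorial function."
)

qa = pipeline("question-answering")
answer = qa(question="What topic was introduced in this lecture?", context=transcript)
print(answer["answer"], answer["score"])

sentiment = pipeline("sentiment-analysis")
print(sentiment(["The instructor explained the material very clearly."]))
```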

Conclusion

Using machine learning to evaluate and analyze educational content to ensure quality is an important task for many educational institutions. Recent advances in ML, especially in Speech-to-Text and NLP, have made it possible to automate such analysis to a large extent, significantly reducing human effort and saving countless hours of tedious work. Human review can still be helpful, but the evaluation can be done with ML first and then passed on to humans, greatly reducing the work involved. The results could also be used to train instructors to improve their craft and to improve the curriculum itself. Basic similarity-based techniques do have limitations, but these can be overcome by investing effort in data labeling and domain-specific language model training.

Trillo has created several successful solutions for its customers on GCP in the Speech-to-Text and NLP space and is helping businesses harness the power of ML for their processes with cost-effective solutions that work and deliver continuous value. If you are interested in having us build a solution at a very low cost, contact us at info@trillo.io.
