Part 1: Developing a Model to Synthesize Professors’ Lectures

David Morley
Intel Student Ambassadors
3 min read · Mar 4, 2019
Note-based synthesis models could greatly affect the way students learn

The first challenge when confronting the issue of converting speech into notes is determining how to train our model. There are plenty of text resources out there, and plenty of notes, but finding ones that match up perfectly is admittedly more challenging. Part 1 of this series focuses on some of the avenues explored to develop a representative data set, along with a preliminary overview of the techniques being considered for the model.

The first breakthrough when pondering this issue was the significant similarity between textbooks and lectures. Textbooks are admittedly different: they have more content, are written in a different style, and can never quite teach in the same way as a professor. Still, their overall formulation of ideas and breakdown of information give them many of the critical attributes our model needs. Using books sidesteps the difficulty of obtaining lectures from professors, makes it more likely that corresponding notes can be found, and makes it easier to break the model into simplified structures.

Another major concept that came to light with more thought was the potential application of recursive logic. Textbooks are made of chapters, which are made of paragraphs, which are made of sentences, and so on, so a recursive model that repeatedly breaks the problem into smaller and smaller subsets could have great potential. By having our network first understand individual words, then their context, and finally their meaning within paragraphs, just one more loop through the network could be the difference between a summary of a single sentence and a large-scale summary. This reasoning lends confidence that the model remains valid with smaller data sets, since even a small corpus can simply be broken down into many small pieces: a thousand-page textbook easily provides tens of thousands of paragraphs and hundreds of thousands of sentences, allowing our model to learn quickly without the logistics of finding huge sources of information. A sketch of this hierarchical breakdown follows.
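To make the idea concrete, the kind of hierarchical splitting described above might look something like the minimal Python sketch below. The file name, the chapter delimiter, and the crude sentence splitter are all placeholder assumptions for illustration, not part of the actual pipeline.

```python
# Minimal sketch: break one plain-text textbook into nested chapters,
# paragraphs, and sentences so a single book yields many training samples.
# The file name and chapter delimiter are assumptions for illustration.
import re

def split_book(path, chapter_marker=r"\nChapter \d+"):
    with open(path, encoding="utf-8") as f:
        text = f.read()

    chapters = re.split(chapter_marker, text)
    book = []
    for chapter in chapters:
        # Paragraphs are separated by blank lines in this sketch.
        paragraphs = [p.strip() for p in chapter.split("\n\n") if p.strip()]
        # A crude sentence splitter; a real pipeline might use nltk or spaCy.
        book.append([re.split(r"(?<=[.!?])\s+", p) for p in paragraphs])
    return book  # indexed as book[chapter][paragraph][sentence]

book = split_book("textbook.txt")
print(sum(len(p) for ch in book for p in ch), "sentences from one book")
```

Even this naive splitting multiplies a single source into thousands of sentence-level training examples, which is the point of the recursive framing.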

The final insight was the abundance of university lectures publicly available on YouTube. With a tool such as youtube-dl, thousands of lectures and their captions can be pulled down as text, giving a relative abundance of high-quality lecture content that is easily accessible and ready to use (see the sketch below). Although this does not solve the problem of gathering corresponding note data, it seems plausible that some students have taken notes on these YouTube videos. The main obstacle would be finding those notes, or developing a program with a school to generate them for the model.
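For example, one plausible way to gather transcripts is to request only YouTube's auto-generated English captions for a lecture playlist rather than the videos themselves. The sketch below uses the youtube_dl Python API as I understand it; the playlist URL and output template are placeholders.

```python
# Sketch: download only the auto-generated English captions for a lecture
# playlist via the youtube_dl Python API (the URL below is a placeholder).
import youtube_dl

ydl_opts = {
    "skip_download": True,        # captions only, no video files
    "writeautomaticsub": True,    # YouTube's auto-generated subtitles
    "subtitleslangs": ["en"],
    "outtmpl": "captions/%(title)s.%(ext)s",
}

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(["https://www.youtube.com/playlist?list=PLACEHOLDER"])
```

The resulting caption files would still need light cleanup (timestamps, duplicated lines) before they resemble the textbook-style text discussed above.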

Approaches to the Challenge

Using the recursive insight mentioned earlier, the challenge becomes analyzing the base case, the meaning of one word, and then working out its significance in the context of other words, sentences, et cetera. An implementation that seems quite powerful for this use case is word vectors, where each word in a corpus is mapped into a high-dimensional vector space based on its content and its relation to other words. After using a standard implementation of the word vector model (likely the word2vec model developed at Google), the issue then becomes finding the most important elements of each subset. To approach this task, I propose a convolutional neural net that filters the word vector layers based on their uniqueness from other words (helping to identify words that are vocabulary terms or newly introduced) and also applies average pooling to grasp the general idea of the sentence. Each input to the model would likely contain individual word vectors, clustered into sentences and then filtered into main ideas; a minimal sketch of this pipeline appears below. After the model successfully grasps the main ideas of sentences, it would build up to paragraphs, and then finally pages. Later layers could use the data gained from previous ones to improve training times and emulate the recursive structure of the model. The next article will cover the classification of sentences by their main idea, the results, and how well this structure scaled.
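A minimal sketch of this idea, assuming gensim's word2vec implementation and a PyTorch 1-D convolution with average pooling, might look like the following. The toy corpus, hyperparameters, and layer sizes are illustrative placeholders, not the actual model.

```python
# Sketch: embed a sentence with word2vec, then run a 1-D convolution and
# average pooling over the word vectors to produce a single sentence-level
# summary vector. Corpus and hyperparameters are toy placeholders.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

corpus = [
    ["entropy", "measures", "disorder", "in", "a", "system"],
    ["the", "professor", "defines", "entropy", "with", "probability"],
]
# gensim 4.x naming; older releases call vector_size "size".
w2v = Word2Vec(corpus, vector_size=50, window=3, min_count=1)

sentence = corpus[0]
vectors = np.stack([w2v.wv[w] for w in sentence])   # (sentence_len, 50)
x = torch.from_numpy(vectors).T.unsqueeze(0)        # (1, 50, sentence_len)

# The convolution picks out locally salient word-vector patterns; average
# pooling collapses them into one vector summarizing the whole sentence.
conv = nn.Conv1d(in_channels=50, out_channels=16, kernel_size=3, padding=1)
pool = nn.AdaptiveAvgPool1d(1)
sentence_vector = pool(torch.relu(conv(x))).squeeze(-1)
print(sentence_vector.shape)                        # torch.Size([1, 16])
```

Stacking the same convolve-and-pool step over sentence vectors, then paragraph vectors, is one way the recursive structure described above could be emulated in practice.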


David Morley
Intel Student Ambassadors

UCLA student currently interested in artificial intelligence and control theory; fascinated with natural language processing.