Blog #1: All the Ideas.

Swojit Mohapatra
Published in GatesNLP
5 min read · Apr 9, 2019

👋🏼 Hello! Welcome to our collection of thoughts, challenges, and updates as we begin our NLP capstone journey.

So, who are we?

We are GatesNLP, the next biggest thing in NLP (ha!) — if you didn’t yet get the joke, do note that we are not in any way officially sponsored by Bill & Melinda Gates, just inspired by them :) Our team consists of 3 Allen School students — Bryan Hanner, Mitali Palekar and Swojit Mohapatra.

Below are some ideas that we are considering pursuing:

Idea 1: Analyzing model drift to understand what models are really learning

Today’s models are becoming extremely complex and hard to intuitively understand, especially with the advent of neural networks. As such, for this project, we want to develop a greater insight into what models are really learning.

To do that, we plan to focus on the summarization task. We will first summarize passages using several different state-of-the-art models, covering both extractive and abstractive summarization techniques (so that we obtain summaries that are all supposedly good but emphasize different aspects). We will then examine what information is encoded in the different summaries produced. From there, we repeatedly apply the models to the summarized texts to see what a summary of a summary looks like for each model. This is shown in the diagram below.

[Diagram: Method to analyze model drift on the summarization task]

As these models continue to train, we gain greater insight into what kind of information different summarization models encode. We then seek to develop a metric that translates the information encoded in successive summaries into conclusions about what the models are really learning, which information is weighted most heavily, and how drift manifests.
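As a toy illustration of the summarize-then-resummarize loop described above, the sketch below uses a deliberately simple word-frequency extractive summarizer (a stand-in for the real state-of-the-art models we would use) and records each stage of the chain so the surviving information can be inspected. The function names and the summarizer itself are our own illustrative assumptions, not a component of any existing system.

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    # Toy extractive summarizer: score each sentence by the total
    # corpus-wide frequency of its words, keep the top-n sentences,
    # and emit them in their original order.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_freq = Counter(w.lower() for w in re.findall(r"\w+", text))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(word_freq[w.lower()] for w in re.findall(r"\w+", sentences[i])),
    )
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)

def drift_chain(text, rounds=3, n_sentences=2):
    # Repeatedly summarize the previous summary, recording every stage
    # so we can study what information survives each round.
    chain = [text]
    for _ in range(rounds):
        chain.append(extractive_summary(chain[-1], n_sentences))
    return chain
```

With a real model swapped in for `extractive_summary`, comparing the stages in `chain` is exactly the raw material the drift metric would operate on.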

Once this basic technique works, we also plan to extend the analysis to different datasets and passages, to see how the source text affects the kind of summaries produced and the information encoded in them. This will let us analyze the same problem from a slightly different perspective.

For our stretch goals, we hope to extend this method of analysis to other NLP problems such as question answering and machine translation. One of the most challenging parts of this project will be developing the metric that maps encoded information to insights about what models are really learning; homing in on that metric and developing it beyond its most basic form might itself constitute a stretch goal.

Idea 2: Summarizing textbooks to extract the most important points

For the second project, we aim to create a short summary of a textbook's contents, evaluated by comparison to a “gold standard” summary. Finding good data for this gold standard is imperative, as is defining what makes a summary good. We also need to decide whether the output should be a list of key points or a prose summary; the right choice may depend on the type of textbook. For a self-help book, a list of key ideas might work well, while a story might call for a short essay.

Potential evaluation metrics

  • Have humans rate the output on a scale of 1–10, giving higher weight to raters with experience in the humanities.
  • BLEU
  • METEOR
  • hLEPOR
  • ROUGE
  • MEWR
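To make the automatic metrics above concrete, here is a minimal sketch of ROUGE-1 recall, one of the simplest variants of ROUGE: the fraction of reference unigrams that also appear in the candidate summary, with counts clipped. This is only an illustration; a real evaluation would use an established implementation rather than this toy version.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    # ROUGE-1 recall: clipped unigram overlap between candidate and
    # reference, divided by the number of reference unigrams.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)
```

For example, `rouge1_recall("the cat sat", "the cat sat on the mat")` is 0.5: three of the six reference unigrams are recovered.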

The only stretch goal we can think of is supporting query-based summaries that take a particular user query as input alongside the textbook: for instance, a summary of the social structure in Hamlet. The query could also be used to control the conciseness of the summary.

Rough Project Plan:

We kept buffer time in the plan because projects usually take longer than originally estimated.

Data [2–4 weeks]: We need to find the most relevant data for the task. Ideally, we'll need more than 10 textbooks to build good summaries. We'll start by getting a good handle on what our data contains, then parse it into the format required by the models we choose.

Model [3–4 weeks]: We need to find the models best suited for the task. Ideally, the model would recognize and discard superficial or trivial details, which would give a good idea of what the summary must look like. We need to research the best plan of action for this task, and also figure out how much compute we would need.

Web Interface [Optional]: A simple interface for entering queries and viewing output summaries; it shouldn't take more than 3 days to implement.

Idea 3: Natural language queries to find people to ask for help

Our final project idea is a system that models individuals based on what they have written and then lets the user ask for help with a natural language question and get pointed to the most helpful resources. This is similar to prior work, notably GrapAL, which lets the research community query publications in the Semantic Scholar literature graph through a SQL-like domain-specific language. The goal of our work would be to synthesize all of an author's work into one representation so we can suggest specific experts rather than publications. The essence of our architecture is this: given an author, some portion or form of the text they have written, and the user's question, output a relevance score for each resource or person in our model. We would then pick the items with the highest relevance for the reader to use. Since no labelled dataset exists for this problem, we would likely train using a proxy for a person's relevance, perhaps the number of citations they have received in recent years.

[Diagram: Basic architecture for the author-ranking system]

At first, we would start with a specific domain of knowledge, such as natural language processing, since many of its papers are openly available, and then branch out to other areas as time allows. Stretch goals include redesigning our model and adding more data to handle a broad range of topics. It would also be useful to let users search with only a natural language question instead of a structured query. Refining our evaluation methods will also be important, because a labeled set of who is most helpful for various topics is not readily available to us and is somewhat subjective.
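The relevance-scoring step at the heart of this idea can be sketched in miniature: represent the query and each author's aggregated writing as bags of words, and rank authors by cosine similarity. This is a deliberately simple stand-in for whatever learned representation we would actually build; the author names and helper functions below are purely illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_authors(query, author_texts):
    # author_texts maps each author to the concatenated text of their
    # papers. Returns (author, score) pairs, most relevant first.
    q = Counter(query.lower().split())
    scored = [
        (author, cosine(q, Counter(text.lower().split())))
        for author, text in author_texts.items()
    ]
    return sorted(scored, key=lambda pair: -pair[1])
```

A real system would replace the bag-of-words vectors with learned embeddings and fold in the citation-based relevance proxy, but the rank-by-similarity skeleton stays the same.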

Here’s the GitHub link: https://github.com/mitalipalekar/GatesNLP

That’s all for now folks! See you all in a few days!
