NLP Pedagogy Interview: Yejin Choi (University of Washington)

David Jurgens
Sep 25, 2018 · 10 min read


(The following is the second in a series of interviews with natural language processing faculty on how they teach their courses.)

Lucy and I sat down with Yejin Choi, Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington and recent winner of the Borg Early Career Award. Her research interests include language grounding, commonsense knowledge, natural language generation, conversational AI, and AI for social good. She is also an adjunct professor of Linguistics, an affiliate of the Center for Statistics and the Social Sciences, and a senior research manager at the Allen Institute for Artificial Intelligence. In the 2017–2018 school year, Yejin taught two undergraduate NLP classes: CSE 447, an introductory course, and CSE 481 N, a capstone project course. The former has a traditional lecture format, while the latter groups students into teams of two or three to propose, conduct, and present research. Yejin has also taught graduate, professional, and advanced NLP courses at UW in previous years.

Q: You start your classes using traditional methods such as n-gram language modeling. Why are these methods still important to teach when deep learning manages to do the same tasks really well?

Yejin: I don’t think we can skip n-gram language models entirely, since they refresh the basic notion of conditional probability models, which later topics such as HMMs and parsing also depend on. In addition, language models raise interesting questions about how to handle previously unseen words. There is also an empirical lesson to be learned: depending on the choice of unknown-word conversion ratio, smoothing algorithm, target domain, and application, the empirical results can vary significantly, which is an important practical insight to gain. After all, there was some amount of black art even before deep learning. Finally, n-gram models are still practically relevant today, as there can be cases where you cannot apply deep learning methods.
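
(For readers who have not implemented one recently, here is a minimal sketch of the kind of bigram model Yejin describes, with add-k smoothing and a simple unknown-word policy. The toy sentences, the rare-word cutoff, and the choice of add-k smoothing are illustrative assumptions, not taken from her assignments.)

```python
from collections import Counter

# Toy illustration (not from the course): a bigram language model with
# add-k smoothing and a simple <UNK> policy for previously unseen words.
UNK, K = "<UNK>", 0.5

def train(sentences, min_count=2):
    """Count unigrams and bigrams, mapping rare words to <UNK>."""
    raw = Counter(w for s in sentences for w in s)
    vocab = {w for w, c in raw.items() if c >= min_count} | {UNK, "<s>", "</s>"}
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + [w if w in vocab else UNK for w in s] + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return vocab, unigrams, bigrams

def prob(w, prev, vocab, unigrams, bigrams):
    """P(w | prev) with add-k smoothing; unseen words back off to <UNK>."""
    w = w if w in vocab else UNK
    prev = prev if prev in vocab else UNK
    return (bigrams[(prev, w)] + K) / (unigrams[prev] + K * len(vocab))

sents = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab, uni, bi = train(sents)
print(prob("sat", "cat", vocab, uni, bi))    # seen bigram
print(prob("zebra", "the", vocab, uni, bi))  # unseen word -> <UNK>
```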

Q: Some people teach NLP by organizing their material around tasks such as sentiment analysis, but it seems like you organize yours by method: for example, you teach probability-based methods, feature-based methods, and deep learning. Why do you choose this setup over the more task-focused one?

Yejin: I focus on methods over applications because we are on a quarter system, which is really too short to cover everything. However, I try to include fun applications through homework assignments, such as Twitter sentiment analysis or reading comprehension / question answering, so that students can see how the methods learned in class can be applied in practice. That said, I would love to spend more time on NLP applications such as summarization or dialogue if there were a second NLP course that allowed me to expand on such topics.

Q: You teach both a capstone class where students conduct a research project and a more traditional NLP class. Is your non-capstone class oriented more towards preparing students to apply NLP in industry and software engineering, or towards an NLP research position?

Yejin: In fact, I try to make both classes serve both purposes, because students often don’t know what they might want to do with their career before taking the class. In addition to teaching the fundamentals, what I like to emphasize throughout the class is insight into how new ideas are introduced in the field, and how students should feel confident and inspired to innovate in the field as well. One thing I like to do when covering the parsing segment is to give students an overview of the history of the literature and how different parsing ideas evolved over time. I do this so that they get a sense of how practitioners and researchers make progress when confronted with challenges. If students were to apply the vanilla CKY (Cocke–Younger–Kasami) parsing algorithm to the Penn Treebank, they would only achieve about 70% accuracy. It is only after you apply various grammar refinements that the performance starts improving drastically. The key takeaway here is that some of the textbook algorithms, in their purest forms, might not work well right away when applied to real data. When confronted with less than ideal results, instead of assuming the challenge is infeasible, I want students to feel confident to dig further based on what they have learned so far. After all, many of the grammar refinements, such as lexicalization or additional Markovization, are not so much about proving new theorems or setting new algorithmic bounds that go much beyond their education. Instead, the empirical advancements are often achievable through careful error analysis and a bit of thinking outside the box. I want them to know that innovative ideas are often just around the corner, within their intellectual reach. Whether they pursue a career in research or in industry, it is important for them to feel confident looking for innovative solutions.
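
(As an illustration of what "vanilla CKY" looks like before any grammar refinement, here is a minimal recognizer over a hand-written toy grammar in Chomsky normal form. The grammar and sentence are invented for this sketch and are not the Penn Treebank setup she describes.)

```python
# Toy illustration of vanilla CKY recognition over a hand-written CNF grammar.
# Real treebank grammars need the refinements (lexicalization, Markovization,
# ...) discussed above before they perform well.
from collections import defaultdict

lexicon = {            # terminal rules  A -> word
    "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"},
}
binary = {             # binary rules    A -> B C
    ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"},
}

def cky_recognize(words, start="S"):
    n = len(words)
    chart = defaultdict(set)                 # chart[i, j] = nonterminals over words[i:j]
    for i, w in enumerate(words):
        chart[i, i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):             # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):        # try every split point
                for b in chart[i, k]:
                    for c in chart[k, j]:
                        chart[i, j] |= binary.get((b, c), set())
    return start in chart[0, n]

print(cky_recognize("the dog chased the cat".split()))  # True
```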

Q: What makes your two courses different from NLP courses being taught at other universities? Do you have any secret sauce?

Yejin: The capstone class at UW is probably the more unusual of the two, in that not all universities offer classes in that format. It allows undergraduate students to undertake a significantly more substantial project than would be possible in a traditional class setting.

David: How big is the capstone course usually?

Yejin: The capstone class consists of about 20–25 students, which means there are easily 15 different research projects to advise throughout the quarter. It is hard to scale much bigger than this while maintaining individualized advising.

Q: Have you seen enrollment pressure change for those two courses over time?

Yejin: Absolutely! We introduced the undergrad NLP class only three years ago. Since then the enrollment has steadily gone up, and the last offering filled up within just a few days!

Q: How do you choose the kinds of activities you have students do in the assignments given that you only have ten weeks for the quarter?

Yejin: The purpose of the first assignment, which is on language models, is for students to develop a crisp understanding of conditional probability models, since later topics like HMMs and parsing will build on them. While they have all seen conditional probabilities before, many of them get easily confused when they start looking closely at the equations and notation of language models and trying to implement them. Another purpose of this assignment is for students to get comfortable dealing with a corpus of a “reasonable” size. Some students struggle at first with the Brown corpus if they are not used to the right kinds of data structures.
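
(As a rough sketch of the corpus-handling point, here is one way to load and count the Brown corpus with a hash-based data structure. NLTK is just one convenient source for the corpus; the course may distribute the data differently.)

```python
# Loading the Brown corpus and counting word frequencies in a single pass.
import nltk
from collections import Counter

nltk.download("brown", quiet=True)
from nltk.corpus import brown

words = brown.words()                       # on the order of a million tokens
counts = Counter(w.lower() for w in words)  # one O(N) pass with a hash map
print(len(counts), counts.most_common(5))
```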

In the second assignment, I have students implement HMMs and the Viterbi algorithm so they internalize dynamic programming, which is a good thing to practice for anybody in CS because it tends to appear in industry interviews. I also provide more involved optional bonus problems, such as implementing the forward-backward algorithm or trying full EM. During lectures I only cover bigram-based HMM transition probabilities, and then in the homework I have them extend this to trigram-based transition scores when writing the Viterbi algorithm, which can be surprisingly confusing to many students until they fully internalize the concepts of conditional probability models and dynamic programming. I used to have an equally involved parsing assignment, but these days I replace it with a deep learning-based assignment.
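
(Here is a minimal sketch of Viterbi decoding for a bigram HMM, with made-up tags and probabilities rather than anything from the assignment; a comment notes where the trigram extension she mentions would change the dynamic program.)

```python
# Toy illustration of Viterbi decoding for a bigram HMM (made-up probabilities).
# Extending to trigram transitions means each DP state becomes a pair of tags.
import math

tags  = ["D", "N", "V"]
start = {"D": 0.6, "N": 0.3, "V": 0.1}                    # P(tag_1)
trans = {"D": {"D": 0.1, "N": 0.8, "V": 0.1},             # P(tag_t | tag_{t-1})
         "N": {"D": 0.1, "N": 0.3, "V": 0.6},
         "V": {"D": 0.5, "N": 0.4, "V": 0.1}}
emit  = {"D": {"the": 0.9, "dog": 0.05, "barks": 0.05},   # P(word | tag)
         "N": {"the": 0.05, "dog": 0.8, "barks": 0.15},
         "V": {"the": 0.05, "dog": 0.15, "barks": 0.8}}

def viterbi(words):
    # pi[t][tag] = best log-prob of any tag sequence ending in `tag` at position t
    pi = [{t: math.log(start[t]) + math.log(emit[t][words[0]]) for t in tags}]
    back = [{}]
    for w in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: pi[-1][p] + math.log(trans[p][t]))
            scores[t] = pi[-1][best_prev] + math.log(trans[best_prev][t]) + math.log(emit[t][w])
            pointers[t] = best_prev
        pi.append(scores)
        back.append(pointers)
    seq = [max(tags, key=lambda t: pi[-1][t])]   # best final tag
    for pointers in reversed(back[1:]):          # follow backpointers
        seq.append(pointers[seq[-1]])
    return list(reversed(seq))

print(viterbi(["the", "dog", "barks"]))  # expected: ['D', 'N', 'V']
```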

Q: Do you think there’s any difference between what you’re currently teaching and what students want to learn?

Yejin: Most students these days are excited about deep learning, so naturally that’s what they want to learn more of. They are also generally excited about applications they have heard a lot about recently, such as reading comprehension and question answering. To satisfy their interests, I recently introduced a deep learning assignment based on the Stanford reading comprehension dataset (SQuAD). The goal of this assignment is to implement and experiment with basic neural network components such as embedding questions, embedding answers, and computing attention.
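
(For concreteness, here is a toy sketch of the attention computation she mentions, using random vectors in place of learned question and passage embeddings; it is not the assignment’s starter code.)

```python
# Toy illustration of dot-product attention between a pooled question vector
# and passage token embeddings (random vectors stand in for learned ones).
import numpy as np

rng = np.random.default_rng(0)
d = 8
passage = rng.normal(size=(5, d))   # 5 passage tokens, d-dim embeddings
question = rng.normal(size=(d,))    # a single pooled question embedding

scores = passage @ question         # one score per passage token
weights = np.exp(scores - scores.max())
weights /= weights.sum()            # softmax over passage positions
context = weights @ passage         # attention-weighted summary, shape (d,)

print(weights.round(3), context.shape)
```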

In my view, what is unique and still important in NLP is structured inference, which is at the core of many NLP tasks. I think tree-based algorithms are really fun and worth taking the effort to learn. I find that student reactions can be a bit divided, however; some students love parsing algorithms and grammar formalisms, while others wish to see less of them. Currently about a third of my class covers deep learning, and the rest spans language models, sequence models, parsing, log-linear models, and translation.

Q: How much influence is there from your own research in what you choose to teach?

Yejin: As I am teaching in a quarter system, there’s just not much room to talk about research. But many students are quite curious about what research we do, so I try to include tiny lecture modules on it once or twice to liven up the class.

Q: You mentioned that you reduced the parsing material to make room for deep learning in your course. How do you choose what changes to make and what stays the same in your syllabus over the years?

Yejin: Looking back over the past few years, I have switched out nearly half of my lecture modules. If you think about it, this is rather unusual, as most professors are able to reuse their lecture material for several years, if not longer. However, a drastic renovation was necessary because deep learning started changing both our field and student expectations so quickly.

So, my current course modules are designed to include both what students seem excited about and what I wish them to learn. As an example of the latter, I like to teach the dependency parsing algorithm known as the Eisner algorithm because it’s really cool. But there was also an occasion when I had to back off a bit. At some point I was excited to make a new lecture module focused just on EM, including proofs and bounds, and I thought students would like it as much as I do! Instead, they seemed to really suffer through this particular module, and in the end I decided to keep it out of the syllabus, as I couldn’t quite figure out how to make it more appealing and engaging for students. I still teach more applied cases of EM, such as EM for HMMs (including the forward-backward algorithm) and EM for MT alignments, but I no longer do the generalized version with proofs and bounds. As a rule of thumb, I have found that having a slide deck with detailed and intuitive visualizations of the algorithm traces helps a great deal in making the lectures more engaging and digestible.
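
(As a sketch of the “EM for MT alignments” case, here is a minimal IBM Model 1-style EM loop on a made-up two-sentence bitext, with no NULL word and no claim to match her lecture version.)

```python
# Toy illustration of EM for word-alignment probabilities (IBM Model 1 style,
# no NULL word, made-up two-sentence bitext).
from collections import defaultdict

bitext = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split())]

t = defaultdict(lambda: 0.25)             # uniform initialization of t(e | f)

for _ in range(10):                       # a few EM iterations
    count = defaultdict(float)            # expected counts c(e, f)
    total = defaultdict(float)            # expected counts c(f)
    for f_sent, e_sent in bitext:
        for e in e_sent:                  # E-step: fractional alignment counts
            z = sum(t[(e, f)] for f in f_sent)
            for f in f_sent:
                count[(e, f)] += t[(e, f)] / z
                total[f] += t[(e, f)] / z
    for (e, f), c in count.items():       # M-step: re-estimate t(e | f)
        t[(e, f)] = c / total[f]

print(round(t[("the", "das")], 3), round(t[("house", "haus")], 3))
```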

Q: You mentioned that your course can be math-heavy. What’s the ratio of programming to math in your class, and how do you incorporate math into the course material?

Yejin: Lectures are relatively more math-heavy, whereas the homeworks are more programming-oriented. The purpose of the homeworks is for students to be able to translate math equations into programs.

Q: How do you decide what goes in an undergraduate class versus a graduate class in terms of the difficulty of the material or the amount of math?

Yejin: That’s a great question! When I was about to teach an undergrad class at UW for the first time, I was told that undergrad students are very smart, but because they are less experienced, it is important to explain everything step by step and inside out. I find that undergrads digest material a little more slowly than graduate students do, and it is important to avoid notational errors or inconsistencies. Also, undergrads seem to require more visualization of the concepts and become mentally fatigued sooner than graduate students. Taking all these factors into account, my undergraduate class covers about 70–75% of what is covered in the graduate class.

Q: In your intro slides, you note that your class is different from computational linguistics because it involves more math, algorithms, and programming. There is some overlap between your class and UW’s Intro to Computational Linguistics course. How do you plan your class relative to that one?

Yejin: Computational linguistics classes are designed for students who have a strong linguistics background but not the standard CS background. The classes I teach are designed primarily for CSE students, so my classes are heavier on programming assignments and mathematical concepts and relatively lighter on linguistic theories.

Q: Do you encourage your students to still explore some of the linguistics behind ideas in NLP?

Yejin: Yes. Grammar formalisms are really great to teach, especially after going through dense algorithmic concepts. If time permits, I like to include mildly context-sensitive grammars such as tree-adjoining grammars and combinatory categorial grammar (CCG).

Another topic I always like to include is frame semantics. In that module, I like telling students about Chuck Fillmore’s original vision when he first conceptualized frame semantics. In its earlier version, the notion of frame semantics was much broader, encompassing pragmatics and common sense. However, as researchers developed frame semantic resources such as FrameNet and PropBank, the scope narrowed substantially to what corresponds to semantic role labeling. My personal wish is that our field will eventually find a way to broaden the scope beyond semantic role labeling, and I teach frame semantics in the hope that some of my students might be able to achieve that.

Q: NLP is a big, rapidly changing field, especially with the recent emergence of deep learning. How do you prepare students for both the skills they need now and also the skills they’ll need in five to ten years?

Yejin: The future is hard to predict, so the best strategy seems to be diversifying our teaching to balance the traditional concepts with the currently hot deep learning methods. Because UW doesn’t yet have a class dedicated to deep learning, I try to cover the basic concepts of neural network architectures such as RNNs/LSTMs/GRUs, encoder-decoder architectures, attention mechanisms, copy mechanisms, highway/skip/residual connections, convolutions, and stacking.
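
(For readers new to the last few items on that list, here is a toy sketch contrasting a residual connection with a highway connection; the weights, dimensions, and nonlinearities are arbitrary illustrative choices.)

```python
# Toy illustration of residual vs. highway connections (random weights;
# shapes and nonlinearities are illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=(d,))
W, Wt = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def relu(v): return np.maximum(v, 0.0)
def sigmoid(v): return 1.0 / (1.0 + np.exp(-v))

h = relu(W @ x)                        # a plain transformed layer
residual = x + h                       # residual (skip) connection: add the input back
gate = sigmoid(Wt @ x)                 # highway connection: a learned gate decides
highway = gate * h + (1 - gate) * x    # how much to transform vs. carry through

print(residual.round(2), highway.round(2))
```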

Another way to prepare them for the next five to ten years is to be transparent about the limitations and challenges we face today with deep learning. I want them to be excited about deep learning, but also keep a skeptical eye so that they can think critically about how to address some of the current challenges.

The above interview has been edited for clarity. Also thanks to teaching assistant Nelson Liu for providing useful information about NLP courses at UW.


David Jurgens is an Assistant Professor in the School of Information at the University of Michigan