I recently tweeted something that I didn’t expect to be controversial. When not distorted through a 140-character lens, the idea I had hoped to convey is the following: the work published in NLP doesn’t draw on recent advances in linguistics and is, thus, not fairly characterized as interdisciplinary. Perhaps this view is most bluntly represented by the following tweet:
While I do not agree that nothing interesting has come out of linguistics in the past 30 years, it seems self-evident to me that linguistics and NLP are divorced. As an example, among the NLP reading groups whose schedules I could easily find posted online (e.g., Stanford, CLSP, Stony Brook, and Arizona), I can’t really find examples of recent linguistics papers. If we are really doing research at the intersection of two academic disciplines, why don’t more of us read papers from both? The weaker version of my claim is that NLP does not build on recent linguistic advances (from the past 10 to 20 years), but I can barely find examples of even older (classic) papers on those lists!
Before delving deeper into this point, however, I will start with two definitions: one of computational linguistics and one of natural language processing, both due to my advisor Jason Eisner. (See this Quora post.)
What is Computational Linguistics?
Computational linguistics is analogous to computational biology or any other computational fill-in-the-blank. It develops computational methods to answer the scientific questions of linguistics.
The core questions in linguistics involve the nature of linguistic representations and linguistic knowledge, and how linguistic knowledge is acquired and deployed in the production and comprehension of language. Answering these questions describes the human language ability and may help to explain the distribution of linguistic data and behavior that we actually observe.
In computational linguistics, we propose formal answers to these core questions. Linguists are really asking what humans are computing and how. So we mathematically define classes of linguistic representations and formal grammars (which are usually probabilistic models nowadays) that seem adequate to capture the range of phenomena in human languages. We study their mathematical properties, and devise efficient algorithms for learning, production, and comprehension. Because the algorithms can actually run, we can test our models and find out whether they make appropriate predictions.
Linguistics also considers a variety of questions beyond this core — think of sociolinguistics, historical linguistics, psycholinguistics, and neurolinguistics. These scientific questions are fair game as well for computational linguists, who might use models and algorithms to make sense of the data. In this case, we are not trying to model the competence of everyday speakers in their native language, but rather to automate the special kind of reasoning that linguists do, potentially enabling us to work on bigger datasets (or even new kinds of data) and draw more accurate conclusions. Similarly, computational linguists may design software tools to help document endangered languages.
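To make the notion of a probabilistic formal grammar concrete, here is a minimal sketch: a toy probabilistic context-free grammar and a sampler that crudely models “production.” The grammar, its rules, and its probabilities are invented for illustration and are not drawn from any particular paper.

```python
import random

# A toy probabilistic context-free grammar: each nonterminal maps to a
# list of (probability, right-hand side) pairs. Symbols not listed as
# keys are treated as terminal words. This grammar is a made-up example.
PCFG = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["the", "N"]), (0.3, ["NP", "PP"])],
    "VP": [(0.8, ["V", "NP"]), (0.2, ["VP", "PP"])],
    "PP": [(1.0, ["with", "NP"])],
    "N":  [(0.5, ["linguist"]), (0.5, ["model"])],
    "V":  [(1.0, ["studies"])],
}

def generate(symbol="S"):
    """Sample a word sequence from the grammar, top-down."""
    if symbol not in PCFG:           # terminal symbol: emit it
        return [symbol]
    r, acc = random.random(), 0.0
    for p, rhs in PCFG[symbol]:      # pick a rule by its probability
        acc += p
        if r <= acc:
            return [w for s in rhs for w in generate(s)]
    return []                        # unreachable: probabilities sum to 1

random.seed(0)
print(" ".join(generate()))
```

Because the probabilities on the recursive rules are subcritical, sampling terminates with probability one; every sampled sentence begins with “the” and contains the sole verb “studies.”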
What is NLP?
Natural language processing is the art of solving engineering problems that need to analyze (or generate) natural language text. Here, the metric of success is not whether you designed a better scientific theory or proved that languages X and Y were historically related. Rather, the metric is whether you got a good solution to the engineering problem.
For example, you don’t judge Google Translate on whether it captures what translation “truly is” or explains how human translators do their job. You judge it on whether it produces reasonably accurate and fluent translations for people who need to translate certain things in practice. The machine translation community has ways of measuring this, and they focus strongly on improving those scores.
NLP is mainly used to help people navigate and digest large quantities of information that already exist in text form. It is also used to produce better user interfaces so that humans can better communicate with computers and with other humans.
By saying that NLP is engineering, I don’t mean that it is always focused on developing commercial applications. NLP may be used for scientific ends within other academic disciplines such as political science (blog posts), economics (financial news and reports), medicine (doctor’s notes), digital humanities (literary works, historical sources), etc. But then it is being used as a tool within computational X-ology in order to answer the scientific questions of X-ologists, rather than the scientific questions of linguists.
What does interdisciplinarity have to do with it?
Computational linguistics is by definition interdisciplinary. NLP, however, may or may not be. Just as aeronautical engineering does not have to draw inspiration from birds to be effective, NLP does not have to draw inspiration from how humans process language. With regard to NLP, I think it’s important to emphasize that there is no value judgment in whether NLP should or should not be interdisciplinary. Practitioners should choose the best method from their toolbox to solve the engineering problem at hand. My point was simply that the (vast) majority of work presented at *ACL events cannot be fairly described as interdisciplinary.
What does interdisciplinary work look like?
Hayes and Wilson (2008), published in Linguistic Inquiry, is a prime example of what interdisciplinary research in computer science (machine learning) should look like. (Full disclosure: Colin Wilson was my undergraduate advisor.) It draws on techniques from NLP and machine learning, maxent modeling in this case, to propose something that advances our understanding of language. They make substantive scientific claims about language and back them up with experimentation. More recently, consider Futrell et al. (2017), published in TACL, which looks at the same problem, and Linzen et al. (2016), which analyzes agreement in an LSTM language model but also engages extensively with work in linguistics, psycholinguistics, and cognitive science. Note that my point was never that there aren’t brilliant examples of such papers, but rather that they are exceedingly rare. Moreover, I think you’re much more likely to find good work at the intersection of machine learning and linguistics in cognitive science venues.
Is this a stringent definition of interdisciplinary work? I don’t really think so. I think academic disciplines are ever-fluctuating entities, and interdisciplinary work is that which takes place at the intersection of what both groups think is interesting. Having to acculturate oneself to the traditions of two fields makes such work hard and notoriously difficult to fund. Is anyone of the opinion that NLP is difficult to fund? Several tweeters argued that using the notion of a “word” or “punctuation” constitutes interdisciplinarity. This is absurd; does using the concept of a logarithm make NLP interdisciplinary with mathematics? We certainly use mathematical techniques, but what we do is a far cry from what’s published in mathematics journals these days.
Next, there are two intertwined claims that I want to separate:
Claim 1: NLP fails without linguistic theory.
This is a claim that Emily Bender has made. I would tend to agree with it, but I really do not work on core, human-facing NLP tasks, e.g., machine translation or question answering, so I can’t truly speak to it.
Claim 2: Work in Computational Linguistics is not really present at *ACL conferences.
I think this is basically true, but there are some exceptions. (See above.) My claim here is that this sort of work is rare (and often ends up in smallish rooms with few attendees). If pressed, I personally identify less as an NLP person and more as a computational linguist. I find it difficult to motivate some of the problems I think are cool to NLPers; I often get responses like: what is that useful for? What can I do with it? On the other hand, I find it very difficult to explain the methods I employ to most linguists, many of whom took their last math class in high school. To be clear, I am making an observation about the field as a whole as I see it, having only been around for four years or so, and I am not necessarily advocating for a particular change in the field. Based on these experiences, I believe that ACL is not really an interdisciplinary place and, moreover, that it is becoming ever less so. I will present three examples that motivate this belief, but there are many others I could adduce:
- A number of computationally oriented linguists as well as linguistically oriented ACLers formed a new conference, whose first iteration will be held in early 2018. If *ACL truly embodied interdisciplinary collaboration between computer scientists and linguists, why did so many people think we needed yet another conference? Why did so many push for it to be held jointly with ACL? I would argue that it’s because there is little interaction between the disciplines, i.e., neither *ACL nor linguistics conferences are really interdisciplinary events. But don’t take my word for it; take a look at the comments found here for the founders’ motivations.
- In my view, the field is moving away from linguistics faster than ever. I presented a poster at EMNLP 2017 about multilingual morphological tagging (Cotterell and Heigold 2017), and one of the first questions I got was from an industrial NLP researcher who asked in all earnestness: why do we need part-of-speech tagging when we can train everything end-to-end now? This view is also held, to some degree, by more established researchers, e.g., Kyunghyun Cho. I want to be very clear that there’s nothing wrong with having or not having part-of-speech tags in your models: you should choose whatever works best. However, we are now in a position where the old stand-bys of NLP must be justified to a younger generation. And, if we are being honest with ourselves, part-of-speech tags are a relatively superficial aspect of syntactic theory, one that takes up at most a day of a syntax class. Fred Jelinek famously quipped that every time he fired a linguist, performance went up, and that mantra remains the battle cry of much of NLP.
- Another point that I made in the Twitter thread was that many NLPers aren’t studying linguistics. As Emily points out, interdisciplinary research inherently requires expertise in both fields. I would argue that the source of most of this expertise should be instruction of some kind, or another form of mingling with the experts. This does not seem to be happening, as far as I can tell.
Quantifying the Claim
I am currently trying to quantify the influence linguistics has on NLP, i.e., are the papers published in linguistics conferences and journals cited by papers in NLP conferences and journals? And are any of the cited linguistics papers recent? Luckily, we have data that allows for this, thanks to Dragomir Radev’s LILY lab. (See Hou et al. (2016).) Preliminary results indicate that the overlap is very minimal. I welcome any suggestions for how to do this study most effectively :-)
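One way to make such a study concrete: given a citation graph and a venue label for each paper, compute the fraction of outgoing citations from NLP papers that land in linguistics venues. Below is a minimal sketch; the venue lists, paper ids, and citation pairs are hypothetical placeholders, not the actual data from Hou et al. (2016).

```python
# Hypothetical venue sets used only for this illustration.
NLP_VENUES = {"ACL", "EMNLP", "NAACL", "TACL"}
LING_VENUES = {"Linguistic Inquiry", "Language", "NLLT"}

def linguistics_citation_rate(citations, venue_of):
    """Fraction of citations made by NLP papers that point to papers
    published in linguistics venues.

    citations: iterable of (citing_id, cited_id) pairs
    venue_of:  dict mapping paper id -> venue name
    """
    from_nlp = [(a, b) for a, b in citations
                if venue_of.get(a) in NLP_VENUES]
    if not from_nlp:
        return 0.0
    to_ling = [(a, b) for a, b in from_nlp
               if venue_of.get(b) in LING_VENUES]
    return len(to_ling) / len(from_nlp)

# Toy example with made-up paper ids:
venue_of = {"p1": "ACL", "p2": "EMNLP",
            "p3": "Linguistic Inquiry", "p4": "ACL"}
citations = [("p1", "p3"), ("p1", "p4"), ("p2", "p4"), ("p2", "p1")]
print(linguistics_citation_rate(citations, venue_of))  # 0.25
```

A real version would also bucket the cited linguistics papers by publication year to answer the recency question.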