Joint IWPT + DepLing Panel

Panel: Eva Hajicova (EH), Joakim Nivre (JN), Stephan Oepen (SO), Kenji Sagae (KS) and David Hall (DH). Moderated by Yusuke Miyao (YM). I add the names of commenters as they join the conversation.

Passages marked "My note" are my own notes.

(Text in parentheses is my comment.)

How can we make parsing research relevant to linguistic theory?

EH: We learn from mistakes. We write annotation guidelines and annotators come with questions. We have to adapt, develop, complement, change the theory according to the data. This transfers to the domain of parsing: when the result of parsing is erroneous, we learn from our own mistakes (if using a rule-based parser), the parser’s mistakes, or mistakes in the underlying theory.

JN: We need to have linguists and parsing researchers talk to each other. We have to ask ourselves: why aren’t they (if they aren’t)? 25 years ago it was obvious they should talk to each other. Maybe there was some feedback from CL to linguists. This is not happening to the same degree today, for many reasons. DepLing has in its best moments been a good example of how we can open that dialogue again. Note however that if people don’t want to talk to each other, we can’t do this.

SO: disagrees with JN. The neural revolution leads to a large number of papers that say: I don’t need feature engineering. They don’t care about the representation. This weakens the connection between Computational Linguistics and Linguistics. But there is a trend towards deeper analysis. Research on the PTB has tried to add traces and all. We need lexical knowledge to make a distinction between "Mary wanted to dance" and "Mary seemed to dance". That is where linguistics has worked for ages and will be used increasingly.

DH: what linguistics has to offer to NLP is more on the generation side. Parsing is the wrong application of linguistic theory. (I strongly disagree with this).

KS: There is relevant work, for example by Emily Bender (+ gave other examples). That type of work exists but has been pushed out of ACL-like conferences. Cranking out PTB parsers has taken the space and pushed out the rest. Parsing just to improve some scores will be de-emphasized. It is natural to talk about syntactic theory and look for what parsing research can bring to linguistic theory for syntax. But parsing might be relevant to other parts of linguistic theory: discourse!

KS: UD made practical decisions but there are many treebanks in many languages. I’ve been working on typology using UD. UD may not satisfy every linguist’s desires but if it didn’t exist, I wouldn’t be able to do the work I’m doing now.

Alessandro Lenci (I think): Construction Grammar (CxG) is growing in linguistics. There was a workshop on computational linguistics and CxG: communities that didn’t know each other. Something is missing in parsing: semantics. CxG makes the strong claim that syntax and semantics should go in parallel. We’ve been too task-dependent. A large community can be interested in what we are doing. We need to work on the syntax-semantics interface to attract them. There is a whole bunch of idiomatic constructions. We can reach that community if we try to understand what they are looking for.

EH: KS, you stole my discourse topic! :) Going beyond the sentence was not present in DepLing this year. It would be good to see this!

DH: we are doing this for dialogue, there’s not a lot we can draw on from the community.

Giuseppe Attardi: we can go beyond plain parsing to provide a tool that can construct and retransform structure (I think I missed the main point here).

Xavier Carreras (XC): Neural Network tools allow you to build parsers without thinking. We have systems that don’t use the notion of hierarchy or stack, they just output bracketing. It’s absurd but it works. But it cannot generalise. We have a great opportunity: we can redefine what it means to generalise. We have abused supervised learning. Test examples are just rewrites of the training examples. (missed something). It is not just about the representation but trying to understand (?). We need to use Computer Science and Linguistics.

KS: I never said parsing is useless. A lot of parsing papers were focused on improving results but we are now going to look at more important issues.

JN: I like what XC said. The way to use linguistics and NLP is to start doing science again. (:D :D :D)

Eric de la Clergerie (EdlC): there should be a dialogue between parsing and linguistics to understand why there is so much variation in language. (I missed some things here)

What’s next for parsing?

JN: What is the question: what should be the next thing or what will be? I’m not sure what the question is so I’ll go in-between.

Looking at languages other than English and trying to understand what is going on. For English, the results went from 93 to 96 with deep learning. That’s unbelievable! For other languages they went from 75 to 78. Maybe that’s unbelievable too but it’s still very far from 96. Maybe it’s lack of resources. Maybe it’s rich morphology and free word order. But we still don’t have much understanding of the interaction between typology and parsing.

Other thing: at EMNLP, there was a lot of work about graph parsing or paraphrasing. We are seeing and will continue to see more diversity in the problem. We need to care about seeing parsing not as a piece in itself but as a piece of a larger puzzle. The key for results in Deep Learning is end-to-end training. Syntax and Semantics are still going to be relevant in many problems but we need to work out techniques where they will be used differently: multi-task learning, or linguistic scaffolding as it was called at ACL. Forcing the model to learn parse trees in parallel with other tasks improves the accuracy of the other tasks.

SO: We have been oversimplifying. We have been guilty of focusing too long on single languages, isolating problems. The parsing task has long been: take POS tags and put them in a tree. (Didn’t get the exact words here but I think the point is we assume we have tokenized text). We forgot about information that was added in the PTB. We forgot it was a compromise. PTB builders were listening to syntactic theories. Let’s go back to acknowledging the complexity and sophistication of the problem. Don’t consider syntax in isolation but add some notion of semantics (he added something I missed about downstream applications and semantics).

EH: some linguists fear that UD is bringing too many simplifications. On the other hand, I see one positive point in UD: the comparison of languages. The results may be oversimplified but if you just reveal some of the points you discuss internally (I know about your internal discussions), it would be very important and informative for linguists and for linguistic theory.

Teresa Lynn (TL): we’ve had too much of a focus on scores but applications reveal where we’re making progress and that’s where linguists will come into play, rather than in the engineering part.

KS: How do we measure progress in linguistic theory?

DH: There has been a focus on metric pushing. This enables us to grab money but also has a different value: those researchers care about performance and linguists care about explanations. It’s hard to bridge that gap. (I missed the rest of the point)

Kim Gerdes (KG): The problem is that big data is everywhere. It’s all about capturing data and machine learning. It’s a general problem and we have to ask ourselves: how do we position ourselves as scientists? Big data research is useful for production. But now tools move ahead without understanding. Why does society pay us, scientists? Because we understand how things work. But it is no longer true. Every science has the same problem as us: ML does better than humans. Maybe this will change. But at the moment we have to work on understanding even if we don’t understand what we are doing. A lot of people have no hope that theory will come back in MT.

KS: (I’m paraphrasing here) it is easy to have performance as a goal and it gets you funding. It’s hard to keep working on interesting problems.

JN: more diverse problems will bring more diverse representations. They are designed by engineers. Figuring out ways of getting more people involved would be useful. There has been a move away from discrete/sparse representations to things that live in a continuous space. The advantages: it makes optimisation much easier because things are now differentiable. It also seems to work better. When we’re doing parsing, the only thing we don’t make continuous is the target representation: the trees. Why? Because we haven’t figured out how to use it as training data if it is continuous. Maybe we think we need discreteness. Training data has to be produced by humans. Try asking a person: give me a distribution of trees. Is this an obstacle to progress? Should we rethink this? It would be interesting to get linguists involved in this.

DH: discreteness has value because it is auditable. It is hard to verify something that is continuous. When you have constraints you have to have discrete choices. Discreteness allows the establishment of a protocol that you can’t have with continuous representations.

EdlC: we have fuzzy borders everywhere: maybe there’s a point of discussion between linguists and us: can we be almost discrete? Representations could be a superposition of alternatives.
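To make the "distribution of trees" and "superposition of alternatives" remarks concrete, here is a minimal sketch, not anything a panelist proposed. It assumes a tree is stored as a dependent-to-head map, and it mixes two readings of a classic attachment ambiguity into per-arc probabilities. The function name `head_marginals` and the weights are made up for illustration.

```python
from collections import defaultdict

def head_marginals(weighted_trees):
    """Turn weighted alternative trees into per-arc probabilities.

    weighted_trees: list of (weight, heads) pairs, where heads maps each
    dependent's token index to its head's index (0 = artificial root).
    """
    total = sum(weight for weight, _ in weighted_trees)
    marginals = defaultdict(float)
    for weight, heads in weighted_trees:
        for dep, head in heads.items():
            marginals[(dep, head)] += weight / total
    return dict(marginals)

# "I saw the man with the telescope"
# 1=I 2=saw 3=the 4=man 5=with 6=the 7=telescope
verb_attach = {1: 2, 2: 0, 3: 4, 4: 2, 5: 7, 6: 7, 7: 2}  # saw ... with the telescope
noun_attach = {1: 2, 2: 0, 3: 4, 4: 2, 5: 7, 6: 7, 7: 4}  # the man with the telescope

# Illustrative weights only (say, 3 of 4 hypothetical annotators chose verb attachment).
soft_tree = head_marginals([(3, verb_attach), (1, noun_attach)])

print(soft_tree[(7, 2)])  # 0.75
print(soft_tree[(7, 4)])  # 0.25
print(soft_tree[(1, 2)])  # 1.0
```

Only the arc that the two readings disagree on ends up "fuzzy"; everything else stays effectively discrete, which is one way to read "almost discrete".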

SO: we can go back to having several interacting layers. I sometimes look at UD and think: you are trying to have your cake and eat it too. Now I’m less sure. (I wrote words that I now can’t make sense of but I think the gist of it is that semantic annotation is more complex). That complexity has no home in much active parsing research. If we want to deliver more on the syntax-semantics interface, we need to go down that path.

What impact has UD had on linguistics and parsing research? Are there alternatives?

EdlC: UD is a good thing because it enables cross-linguistic comparison, etc. It should not be the end of the story. It should only be a first step.

EH: It has had a very very good impact: those who work with underresourced languages now have something to start with.

KG: What is good about UD is the visibility of resources. Now, do we really want to get linguists into this discussion? If so, how? UD breaks with traditions, both phrase-structure grammar and dependency grammar. Can we make a version that is more appealing to linguists working with these frameworks? PPs with the preposition as dependent is not what I want to show my students. We should add the distinctions we need to transfer automatically from one representation to the other. Some information is missing in UD. It is hard to get something more substantially syntactically relevant. The community that does research on dependency grammar is small: is there any hope of getting real synergy between UD and them? Will they be forced to be empirical? (I think I missed some more things here).
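As a rough illustration of what "transfer automatically from one representation to the other" could mean for PPs, here is a minimal sketch. It assumes tokens are plain dicts with `id`, `form`, `head` and `deprel`, and that prepositions are attached to their noun with the UD `case` relation. The function `promote_prepositions` and the label `pcomp` are hypothetical, not part of UD or of any existing tool, and only the first `case` dependent of a noun is promoted.

```python
from copy import deepcopy

def promote_prepositions(tokens, complement_label="pcomp"):
    """Re-attach UD-style PPs so that the preposition heads its noun.

    The preposition takes over the noun's head and relation; the noun
    becomes a dependent of the preposition with a (hypothetical) label.
    """
    out = deepcopy(tokens)
    by_id = {tok["id"]: tok for tok in out}
    promoted = set()  # nouns that already received a prepositional head
    for tok in out:
        if tok["deprel"] != "case":
            continue
        noun = by_id[tok["head"]]
        if noun["id"] in promoted:
            continue  # simplification: leave any further case markers untouched
        tok["head"], tok["deprel"] = noun["head"], noun["deprel"]
        noun["head"], noun["deprel"] = tok["id"], complement_label
        promoted.add(noun["id"])
    return out

# "Mary walked to school" in a UD-style analysis (content word as head).
ud_sentence = [
    {"id": 1, "form": "Mary", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "walked", "head": 0, "deprel": "root"},
    {"id": 3, "form": "to", "head": 4, "deprel": "case"},
    {"id": 4, "form": "school", "head": 2, "deprel": "obl"},
]

converted = promote_prepositions(ud_sentence)
for tok in converted:
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

The reverse direction is harder in general, which is arguably the point about adding the distinctions needed for the transfer: here the original `obl` label survives on the preposition, but a conversion that drops it could not be undone.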

DH: Slav Petrov once said we are dropping prepositions (in a specific task I guess? Can’t remember) because it works better. Is that wrong? Do we know that that’s a wrong representation of language?

JN: UD is good primarily because it exists for many languages. For many it’s the only thing that exists of its kind. UD is not a linguistic theory, I always stress that. It’s not an annotation scheme that needs to replace the other ones. You will always be able to use a more informative one for individual languages. UD is trying to strike a reasonable balance between comprehensibility, parsability and annotatability. The current annotation is far from perfect. It will never be perfect. We try to keep moving in a good direction and doing it while interacting with the community around us.

TL: I would not use UD for teaching. It is not the purpose.

My note: Bill Croft does use something similar to UD for teaching.

The Irish treebank and UD Irish are being kept in parallel for different purposes. There’s information we don’t want to lose but we want to have it in UD for cross-linguistic studies. Most importantly: as a group of potential reviewers we should not flag: you haven’t used the UD data. As a community we can prevent UD becoming the gospel.

KG: The choice of content word as head is just wrong from a typological perspective. But this is not a discussion we should have in this panel. (Made other points that I missed).

SO: (to JN) you should be less defensive. UD is based on Tesnière and other linguistic research. You should be more specific about which decisions you embrace and which ones you ‘just had to decide’. Show the various pieces of the compromise. This might ease some of the hesitation to show this stuff in a dependency class.

JN: I agree with the idea of opening up and having variants like EdlC and KG said. I agree but there’s a concern here: trying to add complexity to languages might make all of it collapse. We still have so many things to fix in morphology and syntax, and we need to do it consistently. The more cross-linguistically consistent the better. So I have been pushing back against adding semantic layers. But we want to encourage adding a semantic layer. And we will try to give recommendations so that we can say: if you want to add a semantic layer, here is how to do it. UD is not a linguistic theory but we hope that it is based on linguistic theory. It would be nice to see which theory that is. It was collected in a pragmatic and eclectic way. I do think there is a story that can be told about theory and I’ve been trying to share this with people. But is it worth the effort? We’re never gonna be able to please all the different schools in linguistics and NLP…

KS: We’ve been saying the exact same thing about the PTB. As long as there’s criticism, it’s a good sign. We want to have what we have done consistently rather than having variants. (I missed the main point here I think)

Future of DepLing and IWPT

YM: I don’t know why this collocation happened.

SO: partly because, as IWPT program chair, I felt that recent IWPT meetings were getting cosy beyond … yeah… too cosy. We have a collection of special events (+TLT + framework-specific workshops) with partly overlapping people who like to go to these small events, which are less hectic and less engineering-focused than xACL. But to keep them healthy, we need substance. The original idea was just to collocate the two; the joint day is an experiment. We don’t know yet if we can consider it a success because IWPT is just starting.

JN: for DepLing it felt natural, there is an obvious connection, syntax is central. Every DepLing has had papers on dependency parsing. It’s been positive. One effect of collocating: fewer parsing papers in DepLing. This freed up time for other things. So we’re open to discussing further collocations.

My note: the later deadline for IWPT may have played a role in parsing papers not being submitted to DepLing (especially because the IWPT deadline was well timed after the CoNLL shared task, while the DepLing deadline was too close to it).

Originally published at gist.github.com.