ACL 2020 — The 58th Annual Meeting of the Association for
Computational Linguistics

DATEV eG
DATEV TechBlog
Published Sep 1, 2020 · 5 min read

By: Neha Pawar

The Association for Computational Linguistics conference had an unusual venue this year. Instead of Seattle, it took place online on platforms such as SlidesLive, RocketChat, and Zoom.

The online version was impeccably organized. It was apparently also an expensive endeavor for the organizers, who had to pay the contracts with the various online vendors and maintain an open and friendly environment while keeping the price fair for attendees. With about 4,000 attendees and 800 research papers, the conference was spread over four days. The online format enabled sessions around the clock, with the majority scheduled according to the Pacific time zone. This meant that, like me, many attendees from other parts of the world sat awake with a big mug of coffee in the middle of the night for their chosen live sessions.

At this conference, one gets direct contact with the authors of the research papers. The presentations take place in short sessions followed by question-and-answer rounds. A few of the papers that gave me useful insights into my ongoing work at AI Lab** are:

1. Multimodal data extraction:

A multimodal document represents a topic not just with text but also with tables, numbers, images, and videos. Such documents range from movie websites and YouTube videos to invoices. DATEV can benefit from following this research field, since many of our departments process invoices to serve our clients. I have analyzed one such research paper from Google about invoices in a paper review here**, where I compare it with a similar PoC at AI Lab. Apart from this paper, interesting research on multimodal websites also came from Amazon.

2. Preprocessing steps for NLP Tasks:

A study presented at the conference debunked some preprocessing beliefs while affirming the importance of handling negations in tasks like sentiment analysis. Earlier, removing stop words was a rule of thumb in NLP tasks. However, with the increasing use of modern language models (where a word gets a varying embedding depending on its context), this removal step has become unnecessary. Some ablation studies in the paper even show that stop word removal hinders learning, producing worse results.
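A minimal sketch of why blanket stop word removal can backfire for sentiment analysis: negation words like "not" appear on typical stop word lists, yet they flip the polarity of a sentence. The stop word set below is a tiny illustrative subset, not a real list from any library.

```python
# Illustrative subset of a stop word list; real lists (e.g., in NLTK or
# spaCy) contain "not" and other negations as well.
STOP_WORDS = {"the", "a", "is", "not", "was"}

def remove_stop_words(tokens):
    """Classic preprocessing step: drop every token on the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]

sentence = "the movie was not good".split()
print(remove_stop_words(sentence))  # ['movie', 'good'] -- the negation is lost
```

After removal, a negative sentence looks identical to a positive one, which is exactly the kind of signal a contextual language model would have preserved.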

3. Reinforcement learning:

In one of the PoCs at AI Lab, we experimented with a reinforcement learning approach to counter the lack of training data. The approach was not successful a few months ago, but the talk by Ye et al. (2020) could help in this endeavor.

4. German tokenization:

A research paper introduced a method to tokenize German words. It piqued my interest since it reminded me of the Dejoin library written at DATEV. Dejoin** tackles the problem by splitting the long compound words found in German. Trivial as it may seem, it is far from it. If a long or noisy word is not found in the word embedding vocabulary, it is assigned a zero vector. This can be avoided by splitting it into smaller words that can be found in the embedding space. The library helps wherever an embedding model is used. Examples:

a. Motorradmechatroniker >> Motorrad mechatroniker

b. Energietechnikingenieur >> Energietechnik ingenieur
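The fallback idea can be sketched as follows. This is a hypothetical illustration of the general technique, not the actual Dejoin implementation: if a compound is out of vocabulary, try to split it into two known parts and look those up instead of returning a zero vector.

```python
# Toy vocabulary standing in for the embedding model's word list.
VOCAB = {"motorrad", "mechatroniker", "energietechnik", "ingenieur"}

def split_compound(word, vocab=VOCAB):
    """Split a German compound at a position where both halves are known."""
    word = word.lower()
    if word in vocab:
        return [word]  # already in the embedding vocabulary
    # Try split points from right to left, preferring the longest prefix.
    for i in range(len(word) - 1, 0, -1):
        prefix, rest = word[:i], word[i:]
        if prefix in vocab and rest in vocab:
            return [prefix, rest]
    return [word]  # no valid split found: keep the original token

print(split_compound("Motorradmechatroniker"))  # ['motorrad', 'mechatroniker']
```

A real splitter would also handle linking elements (Fugen-s, as in "Arbeitszeit") and recursive compounds, which this sketch ignores.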

Apart from the many interesting papers, there were keynote speeches and award ceremonies as well. Like every year, such conferences help attendees spot the trends in NLP. They also make an open call for more research in certain directions, and they raise warnings about topics whose research could cause havoc if used for the wrong reasons. This year, there was a clear attempt to draw researchers' attention to the following topics:

1. Measurement metrics:

The metrics currently used to evaluate how good or bad an algorithm is fall short of a complete explanation. Accuracy, F-measures, BLEU scores, and ROUGE scores fail to capture the whole picture of a model's behavior. In some cases, a model that does very well according to a given metric still fails to make an accurate prediction on a very easy data point. This is one of the common reasons a model is rejected for production.
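A toy illustration of the point, with made-up labels: an aggregate metric like accuracy can look perfectly acceptable while the model still fails on a trivially easy case.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]  # last item: an "easy" positive, missed

print(accuracy(y_true, y_pred))  # 0.9 -- looks fine, hides the easy failure
```

A single number cannot say *which* examples were missed; if the one error is a case any human would get right, 90% accuracy may still be unacceptable in production.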

2. Importance of common sense in models:

The keynote speaker Prof. Josh Tenenbaum (MIT) brought our attention to the fact that we are still far from developing a system that can hold a real conversation with humans with a shared sense of meaning and common sense. Though this remains an unsolved quest, Prof. Tenenbaum suggested that the interdisciplinary combination of computational, behavioral, and neural studies could help tackle the problem. The NLP community is indeed making slow progress by combining concepts such as neural language models, probabilistic programs, hierarchical Bayesian learning, and neuro-symbolic program synthesis, all of which were explained in his informative talk.

3. Interpretability & Explainability:

With the increasing use of deep learning algorithms, interpretability is taking a hit: a model cannot explain why it made a particular prediction, and more research is needed to improve this. In the AI Lab reading group**, we read an early research paper in this field. DATEV is already running a PoC on this topic with FibuA.

4. Ethics & Bias:

Many current NLP systems exhibit racial, social, gender, and other biases that trickle down from the data used for training. Moreover, different people conceptualize bias in different ways, so how do we reach a common understanding? It is very important to involve the end users of AI systems in the conversation about ethics, and our understanding of ethics keeps evolving over time. This topic is under discussion at most AI conferences, and AI Lab is already involved in talks on this theme with the responsible departments.

5. Lack of real-world applications:

In many cases, academic research cannot be applied directly to real-world scenarios, for several reasons:

a. The academic research is based on outdated datasets.

b. Skewed metrics: the metric used to optimize a model is not the same as the metric used to accept it in production.

As at bigger companies, the AI model at DATEV is only a small part of a larger pipeline. A huge amount of effort goes into data preprocessing and postprocessing to make the AI useful in production.

This year's conference reminded me of my first Data Science Community of Practice** (a DATEV meetup), where one of the organizers asked me whether there is an NLP system that can answer anything and everything. As of today, the answer remains "42".

Such conferences give a more realistic view of the state of AI. Today's AI systems are still shadows of the more exciting sales pitches built around them. Gradually, we tweak our models from one day to the next, hoping to eventually develop a system that can answer anything and everything with a shared sense of meaning and understanding with us humans.

**: The links are exclusively available in DATEV intranet.

Photo by Alexandra on Unsplash


DATEV eG stands for high-quality software solutions and IT services for tax consultants, auditors, lawyers, and businesses.