State of the Art in Document Analysis (omni:us at DAS 2018)

Published in omni:us, May 7, 2018

Last week, three of our scientific engineers spent three days in Vienna at DAS 2018 to catch up on the latest research trends in document analysis and to present our joint paper with our research partner, the Computer Vision Center (CVC) in Barcelona. This article gives a short summary of the event, together with a few links and visual impressions.

Introduction

DAS 2018 is the 13th edition of the International Workshop on Document Analysis Systems (DAS), organized by TC11 (Reading Systems) of the International Association for Pattern Recognition (IAPR). Alongside the International Conference on Document Analysis and Recognition (ICDAR) and the International Conference on Frontiers in Handwriting Recognition (ICFHR), it is one of the most important events dedicated solely to document analysis. It is organized as a three-day, single-track workshop with tutorials, keynotes, oral and poster sessions, and discussion groups, a format meant to encourage in-depth exchange between experts in the field. About 150 participants from academia and industry attended from all over the world.

The workshop program covered a wide range of topics related to document analysis. The first day started with a keynote by Lawrence O’Gorman (Bell Labs) on the history of document analysis, from the early systems of the 1990s to today's deep-learning-based systems. It was followed by oral sessions on word spotting, handwritten text recognition, historical document analysis, and databases/evaluation, and ended with the first poster session. The second day continued with oral sessions on scene text detection and recognition and on document analysis applications, followed by the second poster session and the discussion groups. The third day concluded the workshop with a keynote by Rolf Ingold (University of Fribourg) on historical document analysis and oral sessions on document understanding, graphics recognition, and forensic document analysis.

Discussion groups

One very interesting part of the workshop was the set of discussion groups, where experts from academia and industry came together to discuss the status quo and future work on particular aspects of document analysis. Based on the votes of the participants, 6 discussion groups with 10–30 participants each were formed for the most popular topics. The reports of the discussion groups will be published on the corresponding IAPR TC11 webpage.

Deep learning for document analysis: The discussion revolved around general aspects of deep learning and its use for document analysis. These include efficient ways of finding the optimal architecture and parameters, understanding the black box by visualizing features and activations, ways to obtain enough representative training data, and evaluation from an end user's perspective.
Document analysis systems: The discussion focused on architectural aspects of document analysis systems that help achieve modularity, flexibility, scalability, interoperability and interactivity. The common opinion is that pipelines based on microservices, running on premises or in the cloud, are the most promising solution. If possible, the data exchange should rely on common formats such as PageXML, ALTO or TEI.
Document layout analysis: This discussion centered on the extraction of structural information from arbitrary documents, including hierarchical and spatial relationships between different elements (headers, paragraphs, images, graphics). The approaches are typically application-dependent, and it is quite difficult to obtain suitable ground truth. The most promising directions are treating layout analysis as the inverse of typesetting and deep learning on graphs.
Historical document analysis: This group started with a discussion about the difficulty of specific problems including solved problems (custom applications, academic datasets), well-understood problems (online handwriting, language modeling, word spotting) and really difficult problems (lack of ground truth data, true generalization). Furthermore, the group discussed the status quo of digital libraries, including the standardization of data and models as well as the use of dynamic systems to improve the quality over time.
Camera-based document analysis: This discussion focused on online document analysis from mobile device cameras, which is very interesting from an industrial perspective and also has a large range of applications (invoices, translation, education, augmented reality). Existing products currently work well in easy scenarios (known size and layout, flat surface, good resolution). However, many open problems remain, including torn and wrinkled documents, multiple pages, and fraud detection.
Datasets, users and interaction: This group focused on realistic scenarios where the documents are initially processed manually, with the system learning and improving over time by making the most of the user interaction. Based on this, the group discussed the idea of a living laboratory in which data scientists, engineers, and end users work together to explore the possibilities and limitations of artificial intelligence for document processing.
Transfer and semi-supervised learning: This group focused on one of the fundamental problems of learning-based approaches: the scarcity and cost of representative, annotated training data. Several solutions to this problem were discussed, including transfer learning (adapting models from different domains or problems) and semi-supervised learning (training models on both unlabeled and labeled data).
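To make the semi-supervised idea from the last group a bit more concrete, here is a minimal sketch of self-training (pseudo-labeling), one common way to combine labeled and unlabeled data. It is only an illustration and was not presented in the discussion group: the nearest-centroid classifier, the function names, the `min_margin` threshold and the toy numbers are all assumptions chosen to keep the example small.

```python
from statistics import mean


def centroids(labeled):
    """Return the mean feature value per class for 1-D labeled samples."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}


def predict(cents, x):
    """Return (label, margin) for x; needs at least two classes.

    The margin is how much closer x is to the nearest centroid
    than to the second-nearest one.
    """
    ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - cents[second]) - abs(x - cents[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=1.0, rounds=5):
    """Pseudo-label confident unlabeled samples and refit, for a few rounds."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        cents = centroids(labeled)
        scored = [(x, predict(cents, x)) for x in pool]
        accepted = [(x, label) for x, (label, margin) in scored if margin >= min_margin]
        if not accepted:
            break  # nothing confident enough is left to add
        labeled.extend(accepted)
        accepted_xs = {x for x, _ in accepted}
        pool = [x for x in pool if x not in accepted_xs]
    return centroids(labeled)


# Toy data (made-up numbers): two labeled samples and six unlabeled ones.
labeled = [(0.0, "printed"), (10.0, "handwritten")]
unlabeled = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
model = self_train(labeled, unlabeled)
```

In a real document analysis system the 1-D feature would be replaced by learned features and the centroid rule by a neural network, but the loop stays the same: train, label what the model is confident about, retrain.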

Our paper

In the second poster session, Manuel Carbonell (one of our industrial PhD students from CVC) and Mauricio Villegas (one of our experienced scientific engineers) presented our joint paper, “Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-End Model”. Unlike other approaches, which treat the transcription of text and the recognition of named entities as separate steps, our method solves both tasks jointly using a single neural network. In combination with curriculum learning, we attain state-of-the-art performance with lower complexity and without the need for dictionaries, language modelling or postprocessing. This original approach drew a lot of interest in our paper, keeping Manuel and Mauricio busy during the entire poster session.
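As a rough illustration of what "jointly" can mean in practice, the sketch below assumes that a single decoded output sequence contains entity tag tokens interleaved with the transcribed words, so that both the plain transcription and the named entities can be read from the same output. This encoding, the tag names and the example line are assumptions made for illustration, not a description of the model in the paper.

```python
def split_tagged_transcription(tokens, tag_prefix="<", tag_suffix=">"):
    """Split one decoded token sequence into (word, entity_tag) pairs.

    A tag token such as "<name>" applies to the next word only;
    words without a preceding tag get the tag None.
    """
    pairs = []
    pending_tag = None
    for token in tokens:
        if token.startswith(tag_prefix) and token.endswith(tag_suffix):
            # Remember the tag name (without the brackets) for the next word.
            pending_tag = token[len(tag_prefix):-len(tag_suffix)]
        else:
            pairs.append((token, pending_tag))
            pending_tag = None
    return pairs


# Hypothetical decoder output for one handwritten line (made-up example).
decoded = ["<name>", "Maria", "<surname>", "Ferrer", "married", "in", "<location>", "Barcelona"]
pairs = split_tagged_transcription(decoded)
transcription = " ".join(word for word, _ in pairs)
entities = [(word, tag) for word, tag in pairs if tag is not None]
```

With an output like this, no separate named entity recognition step has to run on the transcription afterwards, which is the property the paragraph above describes.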

Video presentation by Manuel.

Other papers

Overall, more than 80 papers were presented during the 9 oral and 2 poster sessions. The full papers will all be published by the IEEE Conference Publication Service and be available on IEEE Xplore. The short papers are available here. The best paper award went to Praveen Krishnan et al. for “Word Spotting and Recognition using Deep Embedding”, which presents an end-to-end framework that jointly learns text and image embeddings using a state-of-the-art deep convolutional neural network. There were many other interesting papers as well. We compiled a list of our favourites below:

Daniel Stromer et al. “Non Destructive Digitization of Soiled Historical Chinese Bamboo Scrolls”: Interesting approach that relies on X-ray imaging and 3D reconstruction to recognize text on bamboo scrolls without opening them.
Alexander Pacha et al. “Handwritten Music Object Detection: Open Issues and Baseline Results”: Musical object detection using a convolutional neural network similar to our region detection method for documents.
Syed Saqib Bukhari et al. “OCR Error Correction: State of the Art vs. an NMT Based Approach”: Uses an LSTM to learn a model for the correction of common OCR errors.
Herve Dejean et al. “Comparing Machine Learning Approaches for Table Recognition in Historical Register Books”: Comparison of conditional random fields (CRF) and graph convolutional networks (GCN) for table cell recognition from detected text lines.
Christoph Wick et al. “Fully Convolutional Networks for Page Segmentation of Historical Document Images”: Segmentation of document images into semantic regions (paragraphs, images, footers etc.) based on a fully convolutional neural network (pix2pix).
Frederic Rayar et al. “CNN Training with Graph Based Sample Preselection: Application to Handwritten Character Recognition”: Reduces the training dataset without affecting performance by considering the distance between data samples through a relative neighborhood graph.
Vincent Poulain d’Andecy et al. “Field Extraction by hybrid incremental and a priori structural templates”: Describes a framework for extracting information from invoices based on initial templates that are incrementally refined using user feedback.
Marcel Würsch et al. “Web Services in Document Analysis - Recent Developments and Importance of Building an Ecosystem”: Describes a web-based framework for document analysis, similar to our platform, using Docker-based microservices and workflows described in the Common Workflow Language (CWL).
Mark Vol et al. “Automatic recovery of corrupted font encoding in PDF documents using CNN-based symbol recognition with language model”: Describes a method that detects and corrects problematic font encodings by performing partial OCR on glyphs using a convolutional neural network.
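For readers unfamiliar with the relative neighborhood graph mentioned in the entry on sample preselection, here is a small brute-force sketch of how such a graph can be built. Two samples are connected unless a third sample is closer to both of them than they are to each other. The function name and the toy coordinates are made up for this example, and the sketch stops at the graph construction: how the paper selects samples from the graph is not covered here.

```python
from itertools import combinations
from math import dist


def relative_neighborhood_graph(points):
    """Return the RNG edges as a set of index pairs (i, j) with i < j.

    Points i and j are connected unless some third point k is strictly
    closer to both of them than they are to each other.
    """
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = dist(points[i], points[j])
        blocked = any(
            max(dist(points[i], points[k]), dist(points[j], points[k])) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.add((i, j))
    return edges


# Toy 2-D samples (made-up coordinates): three collinear points and one outlier.
samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 5.0)]
edges = relative_neighborhood_graph(samples)
```

This brute-force version checks every third point for every pair, so it takes cubic time and is only practical for small sample sets; it needs Python 3.8 or later for `math.dist`.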

Conclusion

Overall, DAS 2018 was a very interesting event for catching up on recent developments in the field of document analysis. The discussion groups in particular were a good opportunity to exchange thoughts about the status quo in both research and industry. The upcoming events dedicated to document analysis are:

International Conference on Frontiers in Handwriting Recognition (ICFHR) 2018 in Niagara Falls (USA)
Summer School on Document Analysis (SSDA) 2018 in La Rochelle (France)
International Conference on Document Analysis and Recognition (ICDAR) 2019 in Sydney (Australia)