Digital transformation has become a top priority for companies, and the COVID-19 pandemic is set to accelerate this trend even further. As part of this development, companies and public institutions are working to digitize their documents and, as a result, collect vast amounts of data every day.
This raises the question: is there a point at which we will no longer work with documents?
Understanding Documents: Human vs. Machine
First, let us contrast how machines read and understand documents with how humans do.
Machines can only process structured data directly. For a machine, structured data are strings of numbers or characters, arranged so that the logic is given and all information is explicit; such data can be processed as they are. Documents such as scans, PDFs or emails are generally more complex, and much of their information is only implicit. The great variety in layouts, structures, inherent logic and required context knowledge makes automated processing extremely difficult.
That is where humans come in. As we have been taught how to read and understand information in documents all our lives, we can grasp and analyze documents “on the fly”.
This makes humans great at processing unstructured data, which is what most documents are. As digitization progresses, new technologies such as machine learning are becoming better at accessing unstructured data and making it processable for machines in general.
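The contrast between structured and unstructured data can be illustrated with a minimal Python sketch. The invoice record, the email text and both function names are hypothetical, made up purely for illustration. The regular expression is a deliberately naive stand-in for the far more capable machine learning approaches mentioned above.

```python
import re

# Structured data: every field is explicit, so a machine can use it directly.
# (Hypothetical invoice record, invented for this example.)
invoice_record = {"invoice_number": "INV-1042", "total": 249.90, "currency": "EUR"}


def total_from_record(record):
    """Read the total from a structured record: a plain lookup."""
    return record["total"]


# Unstructured data: the same facts are buried in free text.
# (Hypothetical email, invented for this example.)
email_text = (
    "Hello, please find attached invoice INV-1042. "
    "The total amount due is 249.90 EUR. Thanks!"
)


def total_from_text(text):
    """Naive extraction: only works if the text uses this exact wording."""
    match = re.search(r"total amount due is (\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


print(total_from_record(invoice_record))
print(total_from_text(email_text))
# A human still understands this sentence; the naive pattern does not.
print(total_from_text("You owe us 249.90 EUR for invoice INV-1042."))
```

The lookup on the record always works because the logic is given. The text pattern breaks as soon as the wording changes, which is exactly the variety in layout and phrasing that makes documents hard to process automatically.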
The human: a never-ending source for documents
Is the end of documents near?
There is one thing that makes this kind of scenario difficult, if not impossible: the creativity of humanity. As long as we have ideas and want to exchange them with our colleagues and friends in written form, we will use documents as a means of communication. Creativity and interpretation cannot be reduced to code, since they require human interaction and exchange.
We come across a problem and try to find solutions using previous experience together with outside input, partly in the form of structured data and partly in the form of human feedback. This is just one example of how we analyze, draw conclusions, and create new things in a way that is too complex to break down into a standardized string of numbers.
A full hard drive instead of a mountain of files
Therefore, it is unlikely that documents will disappear anytime soon. What we can expect is a decrease in documents in their physical form as companies and other institutions advance on their digital journey. On the flip side, the number of digital documents has increased dramatically in the past few years, and the volume of unstructured data in particular is expected to grow almost exponentially.
The reality is: around 2.5 trillion PDF documents are created each year, and in 2019 around 293 billion emails were sent every single day. The IDC (International Data Corporation) estimated that worldwide data production will reach 175 zettabytes in 2025, which is ten times the amount of 2017, and that most of it will be unstructured data.
So how do we prepare for a future with such a great amount of information? Many businesses have already initiated one of the first steps: the digitization of their documents. As a next step, you will want to look for a smart system that allows you to handle the massive number of documents and to transform them from unstructured to structured data. And while machines may not be able to fully replace us in reading and understanding documents, they can at least help us organize them and make our lives a bit simpler.