Analyzing data for automatic data visualization
In the world of technology, data is one of the most important resources, so processing data and presenting it in statistical form matters more than ever. Many tools already exist for this (Excel, Google Sheets…), but most are closed source products, and some of them do not analyze user data and derive statistics from it well enough, as the image below shows:
With the OpenData module, Linagora aims to develop an efficient open source product capable of processing data and producing statistics from it.
So what is the OpenData module?
OpenData is an OpenPaaS module that processes data and presents it in statistical form through charts.
Currently at the research and development stage, it can process data and build charts automatically for users. In the near future it will evolve into an open data portal integrated into OpenPaaS.
Ideas and activities of OpenData
As you know, data is stored in many places, from large database management systems to small files.
So is there a way for the computer to know what the data is about and show it in a way that every user can understand? That is the problem OpenData needs to solve.
The data can be about time, numbers, or anything else, but ultimately all of it falls into three main categories:
The problem therefore comes down to identifying and presenting data.
Features of OpenData
As mentioned above, OpenData has two main features: identifying data and presenting it.
Let’s look at some simple data:
Identifying this data and putting it into a structured form is part of what OpenData does:
But real data is rarely that simple. Let’s look at another example:
The data has three columns, each representing a different data type: the first column is “text”, the second is “datetime”, and the third is “number”. As in the example above, we have to examine each record in a column, determine the column's data type, and return the result in table form.
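The column identification described above can be sketched in a few lines of Python. This is only an illustration and not OpenData's actual implementation: the function names, the sample rows, and the short list of date formats are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical, deliberately short list of formats; real data needs many more.
DATETIME_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")


def classify_value(value):
    """Classify one cell as 'number', 'datetime', or 'text'."""
    text = value.strip()
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    for fmt in DATETIME_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "datetime"
        except ValueError:
            continue
    return "text"


def infer_column_type(values):
    """Return the type shared by every non-empty record, else 'text'."""
    kinds = {classify_value(v) for v in values if v.strip()}
    return kinds.pop() if len(kinds) == 1 else "text"


def infer_table_types(rows):
    """Infer one type per column from a list of rows (lists of strings)."""
    return [infer_column_type(list(column)) for column in zip(*rows)]


# Made-up sample rows, for illustration only.
rows = [
    ["Paris", "2018-01-05", "12"],
    ["Hanoi", "2018-02-10", "7.5"],
]
print(infer_table_types(rows))
```

A column whose records disagree on their type falls back to “text” here, which is the safest choice when nothing more specific can be proven.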
For data whose type is “datetime”, each country and region has its own date and time format, so we need to identify the format of the data before processing it.
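One possible way to identify the format is to try a set of candidate formats against the whole column and keep only those that parse every record. The sketch below assumes a small, hypothetical list of candidates; it is not taken from OpenData itself.

```python
from datetime import datetime

# Hypothetical candidate formats: ISO, day-first, month-first, dotted.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d.%m.%Y")


def detect_datetime_formats(values):
    """Return every candidate format that parses all values in the column."""
    matches = []
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # at least one record does not fit this format
        matches.append(fmt)
    return matches


# "25" cannot be a month, so only the day-first format survives.
print(detect_datetime_formats(["25/12/2017", "03/01/2018"]))
# Both day-first and month-first fit: the column alone is ambiguous.
print(detect_datetime_formats(["03/01/2018", "04/02/2018"]))
```

Looking at the whole column rather than a single record matters: a value such as 03/01/2018 is ambiguous on its own, but one record with a day above 12 settles the question for the column.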
In addition, the values in the third column include their unit, so for data like this OpenData must also identify the unit.
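As a rough sketch of unit identification, a regular expression can split a cell into its numeric part and a trailing unit. The pattern and function names below are assumptions for illustration; the sketch only handles plain decimal numbers followed by a unit with no internal spaces.

```python
import re

# A number, optional whitespace, then an optional unit made of non-space characters.
_VALUE_WITH_UNIT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([^\d\s]\S*)?\s*$")


def split_value_and_unit(cell):
    """Return (number, unit) for a cell, or None if it is not numeric."""
    match = _VALUE_WITH_UNIT.match(cell)
    if match is None:
        return None
    return float(match.group(1)), match.group(2) or ""


def column_unit(cells):
    """Return the unit shared by every cell, or None if cells disagree."""
    units = set()
    for cell in cells:
        parsed = split_value_and_unit(cell)
        if parsed is None:
            return None
        units.add(parsed[1])
    return units.pop() if len(units) == 1 else None


print(split_value_and_unit("12.5 kg"))
print(column_unit(["10 kg", "2.5 kg"]))
```

Once a single unit is known for the column, the numeric parts can be treated as an ordinary “number” column and the unit reused as an axis label.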
The data also has a notable property: the values in its first two columns repeat. Identifying repeated data is very important, because many real datasets have thousands of records but fewer than 10 distinct values repeated across them, and recognizing this makes the representation much simpler.
The examples above do not cover all real data, but they are basic examples of what must be identified. That is how OpenData works. There are still many data types and attributes that OpenData must identify; support for them will come in later versions.
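Detecting repetition amounts to counting distinct values in a column. The following sketch uses a hypothetical threshold of 10 distinct values, echoing the figure mentioned above; the function name and the returned fields are illustrative assumptions.

```python
from collections import Counter


def summarize_repetition(values, max_distinct=10):
    """Count distinct values and flag columns that behave like categories."""
    counts = Counter(values)
    distinct = len(counts)
    return {
        "records": len(values),
        "distinct": distinct,
        # Categorical: few distinct values, and at least one of them repeats.
        "is_categorical": 0 < distinct <= max_distinct and distinct < len(values),
        "counts": dict(counts),
    }


# Made-up column: five records but only two distinct values.
summary = summarize_repetition(["FR", "VN", "FR", "FR", "VN"])
print(summary["distinct"], summary["is_categorical"], summary["counts"])
```

A column flagged this way can be grouped and aggregated, so a chart only needs one bar or slice per distinct value instead of one per record.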
After identifying the data, the next task is to represent it:
OpenData will display the identified data as graphs, giving users the most complete view of their data.
The question of representation is to determine which kinds of graph can be drawn from the data and, where feasible, how to draw them.
In addition, the most important issue is not to present too many graphs to the user (we cannot display 1000 graphs and make users scroll through 999 of them to find the one they need). Each graph must be the one that represents its data best (e.g., time-related data is better suited to a line chart than a pie chart).
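To make the idea concrete, here is a minimal sketch of a rule-based chart suggester that scores candidate charts and keeps only the best few. The rules, scores, and names are assumptions chosen for the example, not OpenData's actual selection logic.

```python
from itertools import permutations


def suggest_charts(columns, max_charts=3, max_categories=10):
    """Rank candidate charts and return at most max_charts of them.

    columns maps a column name to a (type, distinct_count) pair, where
    type is 'text', 'datetime', or 'number'.
    """
    candidates = []
    for (x, (x_type, x_distinct)), (y, (y_type, _)) in permutations(columns.items(), 2):
        if y_type != "number":
            continue  # in this sketch the y axis is always numeric
        if x_type == "datetime":
            candidates.append((3, "line", x, y))  # time series: prefer a line
        elif x_type == "text" and x_distinct <= max_categories:
            candidates.append((2, "bar", x, y))
            candidates.append((1, "pie", x, y))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(kind, x, y) for _, kind, x, y in candidates[:max_charts]]


# Made-up column descriptions, for illustration only.
columns = {
    "country": ("text", 4),
    "date": ("datetime", 120),
    "sales": ("number", 500),
}
print(suggest_charts(columns, max_charts=2))
```

Capping the result with max_charts addresses the first concern (too many graphs), while the scores encode the second (a line chart ranks above a pie chart for time-related data).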
That’s how OpenData represents data. In the future, with the help of AI, machine learning, and big data, OpenData could become a valuable application for businesses.
OpenData focuses entirely on identifying and presenting data, so it is expected to do this better than today’s multifunctional data processing software such as Excel or LibreOffice.
OpenData is still under development, so it has drawbacks, such as slow computation and limited ability to process large amounts of data. These problems may be solved in the future with big data processing technologies.
Our development team researched and developed this project to build an application that automatically identifies and presents data in the most appropriate way. We ran into many problems during development that we could not have predicted, but after analyzing them and finding solutions, OpenData has taken its first form.
Many problems remain to be solved, but we hope that Linagora in particular and the development community in general will contribute to OpenData so that it can soon become a great open source application.