The architecture of the CortexDB in comparison

As a multi-model database, the CortexDB is designed to manage data according to several database paradigms. This makes it easy to use for a wide range of applications in different fields.

This article was created with the support of Cortex AG and Deepl.com and is also available in German.

Overview

The core of the CortexDB consists of a schemaless document store and a multidimensional key/value store. In addition, users and developers can use graph functions, temporal validity for each field value, repeated use of a single field, and the storage of binary data (files) and JSON objects.
The CortexDB therefore differs considerably from relational databases. At the same time, it offers the core functionality that search engines and full-text search also rely on. This gives the CortexDB flexible and agile functions for implementing database applications in many different fields.

The basis of today’s database applications: relational.

Databases can be divided into several types. The best known are relational databases, which are based on tables. Every data record in a table has the same columns (the schema). If a value is not available, the corresponding field of the record remains empty. The following short summary shows why the CortexDB differs so much from this approach and which advantages result:

To avoid wasting storage space on duplicate content (redundancies) and to speed up queries and updates, data records are split (“normalized”) across several tables. A record then refers to a record in another table only through the content of one field: the foreign key, which exists as a unique ID in the other table.

The field “Company” is the foreign key and can be used more than once; the field “ID” is uniquely assigned to a data record in each table. The second table shows the “target record”.
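The split into two tables can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and sample rows are assumptions and are not taken from the original figure.

```python
import sqlite3

# Hypothetical example data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (
        id   INTEGER PRIMARY KEY,   -- unique ID of the target record
        name TEXT NOT NULL
    );
    CREATE TABLE person (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        company INTEGER REFERENCES company(id)  -- foreign key, may repeat
    );
""")
conn.executemany("INSERT INTO company VALUES (?, ?)",
                 [(1, "Example Ltd"), (2, "Sample Inc")])
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Alice", 1), (2, "Bob", 1), (3, "Carol", 2)])

# The company name is stored once; a join resolves the foreign key.
rows = conn.execute("""
    SELECT person.name, company.name
    FROM person JOIN company ON person.company = company.id
    ORDER BY person.id
""").fetchall()
print(rows)
```

The company name is stored only once, and two person records point to it through the same foreign key value.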

The query language SQL (“Structured Query Language”) was originally developed for relational databases. Each manufacturer offers its own dialect of the language, so switching from one database product to another usually takes considerable time and effort, because the SQL dialect and other vendor-specific functions have to be adapted.

If further information is to be stored in a relational database, tables may have to be created or adapted accordingly. Information about different kinds of objects is therefore stored in different tables.

The complexity of the table structure therefore grows with the avoidance of redundancies, the number of different record types and the amount of information that is linked together. Note that there is no uniform specification for designing relational databases; only the normal forms (the stages of “normalization”) provide some orientation, and their use is by no means mandatory.

The relational approach also includes the “index”. This is an additional structure in which only selected columns of a table are stored in sorted order (often several columns together as a combined index). Search algorithms use such an index to find data records quickly. Without an index, every data record has to be read sequentially to find a given value.
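The difference between a sequential read and an index lookup can be sketched as follows. This is a minimal illustration using a sorted list and binary search; real database engines typically use tree structures, and the table contents here are invented.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: each row is (record_id, name, city).
table = [
    (1, "Meyer", "Hamburg"),
    (2, "Schmidt", "Berlin"),
    (3, "Becker", "Munich"),
    (4, "Schmidt", "Cologne"),
]

def scan(rows, name):
    """Without an index: read every record sequentially."""
    return [row[0] for row in rows if row[1] == name]

# The index stores only the selected column, sorted, with the record ID.
name_index = sorted((row[1], row[0]) for row in table)
keys = [key for key, _ in name_index]

def lookup(name):
    """With an index: binary search over the sorted keys."""
    lo = bisect_left(keys, name)
    hi = bisect_right(keys, name)
    return [record_id for _, record_id in name_index[lo:hi]]

print(scan(table, "Schmidt"), lookup("Schmidt"))
```

Both functions return the same record IDs, but the lookup only touches the sorted column instead of every record.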

There is no uniform specification and no universal schema for indexes in relational databases either. Database administrators and developers therefore depend on a precise definition of the end application in order to find a suitable architecture and to implement later changes to an existing database as smoothly as possible.

Alternative to the index of relational databases: inverted index

The concept of the “inverted index” is often used for large amounts of data (“big data”), for text search (e.g. search engines, Wikipedia) and for other applications, for example with Lucene or Apache Solr. During indexing, a sorted list of terms (the index) is built from the documents. For each term, a further sorted list records the locations (the documents) and the positions (of the term within the text).

If a single term or a combination of terms is searched for, the index shows directly which documents contain them. The more precisely the query is delimited (e.g. with a phrase in quotation marks), the more accurately relevance can be evaluated.

Terms from different documents are stored in an inverted index to be able to search it and provide relevant results.

An inverted index is therefore a content-based index in which each term only exists once and for which a list of occurrences is stored. This is also used in an adapted variant for the CortexDB and combined with other functions. In particular, the inverted index is updated each time the database is changed and is available for each attribute of a data record.
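A minimal sketch of such an index, with documents and term positions, might look like this. The sample documents and function names are assumptions for illustration; this is not the implementation used by Lucene, Solr or the CortexDB.

```python
from collections import defaultdict

# Hypothetical documents, keyed by document ID.
documents = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs and the quick fox",
}

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}; each term exists only once."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index

def search_all(index, terms):
    """Return the IDs of documents that contain every term."""
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

def search_phrase(index, phrase):
    """Return the IDs of documents that contain the terms as adjacent words."""
    terms = phrase.lower().split()
    hits = []
    for doc_id in search_all(index, terms):
        starts = index[terms[0]][doc_id]
        if any(all(start + offset in index[term][doc_id]
                   for offset, term in enumerate(terms))
               for start in starts):
            hits.append(doc_id)
    return hits

index = build_inverted_index(documents)
print(search_all(index, ["quick", "fox"]))
print(search_phrase(index, "quick fox"))
```

The term search finds every document that contains both words, while the phrase search uses the stored positions to keep only documents in which the words are adjacent.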

This also shows that “simple” applications that rely only on an inverted index are not necessarily transaction-safe and are not database applications. The CortexDB and the CortexPlatform, by contrast, offer these capabilities with simple configuration, so that business departments can often configure the standard applications of the CortexPlatform on their own. The database functions and APIs mentioned above are available to developers for building individual applications.

CortexDB compared to other databases

In contrast to relational databases, the CortexDB stores all data records in a schemaless format. There is no predefined structure for data records; instead, a storage area is defined in which the records are stored as independent objects (so-called containers).

If all records of a relational database were stored in a single table, a column would have to exist for every piece of information that occurs in any record.

However, this structure would run counter to avoiding redundancies and reducing storage requirements. In addition, a large number of fields would be empty, and the database would still have to manage them.

Dropping the fixed table structure solves these problems. The only requirement is a storage for the data records that allows a free structure, i.e. a “schemaless” storage. Document stores make this possible.

A “document” is any kind of “record container” in which field names and contents are stored together. Because there is no schema as there is with tables, each data record must carry the field name along with every value. This is the case, for example, with the XML and JSON formats, and several database solutions therefore use these formats to store data records.
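The contrast between one wide table and self-describing documents can be sketched as follows. The two records and their field names are invented, and JSON is used purely for illustration.

```python
import json

# Hypothetical records of two different types.
person = {"type": "person", "name": "Alice", "company": "Example Ltd"}
product = {"type": "product", "title": "Widget", "price": 9.99}

# One wide table: a column for every field that occurs in any record.
columns = sorted(set(person) | set(product))
wide_rows = [[record.get(column) for column in columns]
             for record in (person, product)]
empty_cells = sum(cell is None for row in wide_rows for cell in row)

# Document store: each record carries its own field names, no empty fields.
documents = [json.dumps(record) for record in (person, product)]

print(columns, empty_cells)
print(documents)
```

With only two record types, four of the ten cells in the wide table are already empty, while each document stores exactly the fields it uses.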

For historical reasons, the CortexDB uses its own container format to store information. Data from an XML- or JSON-based application (or from CSV files and other databases) can therefore only be transferred with a transformation. Import tools and APIs are available for this, and applications can also use the APIs to access the database directly. Responses are always returned as JSON objects.

The basis of future database applications: CortexDB?

Within the CortexDB, imported transaction data can be used in the same way as aggregated data. Transformation is therefore only necessary during the “loading process” and is handled by the import mechanisms.

The import process transfers the data records from the data sources to the document store of the CortexDB. There, link structures (graphs) can be set up, for example to map hierarchies (parent-child references, long-term shareholdings and similar). Whenever a data record is changed (created, edited or deleted), the transaction also updates the key/value store.

This extends the inverted index by additional dimensions. The index manages, in sorted form, the individual attributes (fields), their contents and, for each content, the list of IDs of the data records in which it occurs.

“For each content (value) it is known in which fields (keys) and data records (Doc-ID) it exists and for each field it is known which different contents exist.”
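The idea in the quotation can be sketched as a small in-memory structure. This is an assumption-laden toy model: the class `FieldIndex` and its methods are hypothetical names and do not reflect the CortexDB API or its storage format, and the combined update is not a real transaction.

```python
from collections import defaultdict

class FieldIndex:
    """Toy field/value index kept in sync with a document store."""

    def __init__(self):
        self.documents = {}                                    # doc_id -> record
        self.by_field = defaultdict(lambda: defaultdict(set))  # field -> value -> doc IDs
        self.by_value = defaultdict(set)                       # value -> fields

    def put(self, doc_id, record):
        """Create or replace a record and update the index in the same step."""
        self.delete(doc_id)
        self.documents[doc_id] = dict(record)
        for field, value in record.items():
            self.by_field[field][value].add(doc_id)
            self.by_value[value].add(field)

    def delete(self, doc_id):
        """Remove a record and drop index entries that no longer occur."""
        old = self.documents.pop(doc_id, {})
        for field, value in old.items():
            ids = self.by_field[field][value]
            ids.discard(doc_id)
            if not ids:
                del self.by_field[field][value]
                self.by_value[value].discard(field)
                if not self.by_value[value]:
                    del self.by_value[value]

    def select(self, field, value):
        """Doc IDs for a field/value pair, answered from the index alone."""
        return sorted(self.by_field.get(field, {}).get(value, ()))

    def values_of(self, field):
        """All distinct contents of a field."""
        return sorted(self.by_field.get(field, {}))

    def fields_with(self, value):
        """All fields in which a content occurs."""
        return sorted(self.by_value.get(value, ()))

db = FieldIndex()
db.put(1, {"Name": "Meyer", "DSType": "Person"})
db.put(2, {"Name": "Schmidt", "DSType": "Person"})
db.put(3, {"Name": "Meyer", "DSType": "Company"})
print(db.select("Name", "Meyer"), db.values_of("DSType"), db.fields_with("Meyer"))
```

Each field/value pair is stored once with its list of record IDs, so a selection never has to read the records themselves.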

Note that both the document store and the key/value store can be located on a hard disk, on an SSD or in RAM. This can be configured, and the stores can be relocated.

In the key/value store, each field and each content is managed only once. This also applies to past (and future) values of a field (its history). This architecture is therefore not only an inverted index; it also corresponds to the highest normal form, which cannot be applied in a practical way in relational databases.

Every selection, and also every simple analysis of field contents, uses only this key/value store (also called field index or inverted index).

Note that selections are always carried out in working memory. The two ID lists to be intersected (e.g. for “Name” and “DSType”) and the result list (“Result”) must therefore be held in memory. Internally, an ID is 12 bytes long (24 bytes when represented as UTF characters). If two sets of 100 and 200 IDs are combined, the input sets occupy 300 × 12 bytes = 3,600 bytes, plus the result set, all of which must be held in working memory.
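The memory estimate can be reproduced with a short calculation. The ID lists below are hypothetical (plain integers stand in for the 12-byte IDs, and the overlap of 50 IDs is an assumption); only the ID length of 12 bytes and the set sizes come from the text.

```python
ID_BYTES = 12  # internal ID length stated in the text

def selection_memory(list_a, list_b):
    """Intersect two ID lists and estimate the bytes held in memory."""
    result = sorted(set(list_a) & set(list_b))
    input_bytes = (len(list_a) + len(list_b)) * ID_BYTES
    result_bytes = len(result) * ID_BYTES
    return result, input_bytes, result_bytes

# Hypothetical ID lists: 100 IDs for "Name", 200 IDs for "DSType",
# with an assumed overlap of 50 IDs.
name_ids = list(range(0, 100))
dstype_ids = list(range(50, 250))

result, input_bytes, result_bytes = selection_memory(name_ids, dstype_ids)
print(len(result), input_bytes, result_bytes)
```

Under these assumptions the input lists need 3,600 bytes and the result list a further 600 bytes.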

Since selections are made only in the key/value store of the database, individual data records do not have to be read. The rights and roles concept takes this into account, so that users can be granted read, write and selection rights separately.

Because access rights to the key/value store are separate, other operations are possible as well, for example viewing and analyzing field contents.

If we look at the “Income” field in the example above, we see that there are four different values, which are used in five different data records. This makes basic analyses possible.
As with any other field, the distribution of values can be seen for this field. In addition, the sum of all values (287,001) or the median (57,300) can be determined, for example.
Further analyses are also possible. For example, the deviation from the mean can be calculated for each value: with a mean of 287,001 / 5 = 57,400.20, the deviation for the value 65,000 is 7,599.80. Many further analyses can be implemented in the same way.
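These figures can be checked with Python's statistics module. The individual income values are not listed in the text, so the five values below are hypothetical; they were chosen only so that they match the stated count of distinct values, the sum and the median. The stated deviation of 7,599.80 corresponds to the difference from the arithmetic mean.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical values: four distinct values in five records,
# chosen to match the sum and median given in the text.
income = [45_000, 57_300, 57_300, 62_401, 65_000]

distribution = Counter(income)       # how often each value occurs
total = sum(income)                  # sum of all values
middle = median(income)              # median value
average = mean(income)               # arithmetic mean
deviation = round(65_000 - average, 2)

print(len(distribution), total, middle, average, deviation)
```

All of these results depend only on the field contents and their frequencies, which is the information the key/value store holds.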

Conclusion

By combining different database models (multi-model), the CortexDB provides a basis for fast and agile project implementation and for working with large amounts of data. The potential of this database solution has not yet been exhausted, and further developments can be expected, not only from the manufacturer itself but also from other developers and partner companies.

