The Rise of Modern Data Management in China FinTech

Jun 26, 2018

Executive Summary

The financial industry is driving fintech innovation as Internet and mobile financial services grow. Among all the technologies involved, data management is the key to big data and AI in the financial industry. The distributed database, as an important foundation of the data management architecture, has a huge effect on the entire architecture.

Now, expanding financial services directly to private businesses and individuals has become the top priority in China’s financial industry. Mobile-based technology is growing rapidly to supplement the traditional financial system.

Emerging “FinTech” services such as Alipay and TenPay have taken customer engagement to a new level. Each of them probably operates third-party payment services for over 475 million users. Under this pressure, traditional banks are embracing modern technology, profoundly and gradually, to stay competitive. Smart banking describes the transition banks are making from systematic to intelligent operations. The expectations of the banking industry are:

· Data security: no business or customer data shall be breached;

· Delivery of all financial services to major devices, particularly mobile phones;

· Reduced IT and maintenance costs;

· Business continuity.

Given the scale involved, the challenges are significant. Instead of hiring thousands of developers and engineers as the new “FinTech” companies did, banks are buying products and solutions. From a vendor’s perspective, those products and solutions must be differentiated as “enterprise class”: a full package that includes security, support, productivity, the ability to deploy in multiple use cases, IT complexity reduction, integration capability, and policy management. To achieve enterprise class, the vendor must own the code and have full control of the product.

SequoiaDB, an enterprise distributed database, was developed under these circumstances. With a heritage in RDBMS development and long experience in data management, its developers have built the product from the ground up alongside the evolution of the banking industry.

SequoiaDB was listed in Gartner’s DBMS report in 2017, the first time a Chinese database vendor has ever been listed.

Data Management Technology Evolution

The banking industry has led enterprise-class technology in China for over 30 years. Banking systems have evolved over the decades along with the regulations that the government has created for every aspect of the business. Even though banking is a technology-driven business, the evolution of data management has been a long journey: it has to manage the complexity of business requirements and regulations while remaining compatible with legacy systems. This journey shows that the critical capabilities for modern data management in the banking industry fall mainly into the categories below:

· Scalability and performance

· Distributed object storage

· High availability and disaster recovery

· Hybrid transactional/analytical processing (HTAP)

· In-DBMS analytical capabilities

· Multi-model data management

· Distributed OLTP capabilities

SequoiaDB is customer-oriented and has grown gradually along with customer needs. Its roadmap mirrors the evolution of China’s banking industry.

Scalability and Performance

The RDBMS is the key to core banking systems. Oracle and IBM DB2 have been extremely reliable for this purpose and have dominated online transaction processing since day one. However, an RDBMS is designed to run on a single server to maintain data integrity with ACID support. It is not designed for scale.

For example, banks used to offer customers transaction history search only for the most recent three months, or one year at most. Customers had to download older transactions in a pre-processed document format such as a PDF file, so that the transaction data could be kept at a controllable size by moving the historical data to offline backup disks. Search was compromised to make way for critical core transaction processing, since both ran on the same RDBMS.

A similar scenario arises in e-banking systems. Customers perform many operations, such as browsing and searching products, comparing prices, and reading reviews, before they decide to buy. This means data queries outnumber data inserts or updates by at least ten to one. The large number of queries slows down transaction processing, since they all run in the same OLTP system under ACID. However, expanding the RDBMS to handle these queries is very complicated and costly.

NoSQL was designed with a distributed architecture to scale and achieve performance by compromising on ACID support. It moves most queries out of the OLTP system and keeps the core system healthy. SequoiaDB introduced the first version of its distributed NoSQL database in 2012. Its performance was ten times faster than the existing RDBMS at large scale. With a data volume of 10 billion rows, response time was within 100 ms under thousands of concurrent requests. It also proved its robustness and automated scalability in tests with over 1,000 nodes. As a result, SequoiaDB was adopted quickly in China’s banking industry.

Customers, however, were not very satisfied without transaction and SQL support. SQL and transaction support was added in SequoiaDB v2 in response. Tunable consistency and multi-model support became standard features.
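The idea of moving queries off the transactional system can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Store` and `Router` classes are hypothetical stand-ins, not SequoiaDB or RDBMS APIs, and the copy to the query store is done synchronously purely to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for a database; it only counts operations."""
    name: str
    reads: int = 0
    writes: int = 0
    rows: dict = field(default_factory=dict)


class Router:
    """Send transactional writes to the core store and queries elsewhere."""

    def __init__(self, core: Store, query_store: Store):
        self.core = core
        self.query_store = query_store

    def write(self, key, value):
        # The transactional write lands on the core system first ...
        self.core.rows[key] = value
        self.core.writes += 1
        # ... and is then copied to the query store (synchronously here).
        self.query_store.rows[key] = value
        self.query_store.writes += 1

    def read(self, key):
        # Queries never touch the core system.
        self.query_store.reads += 1
        return self.query_store.rows.get(key)


core = Store("core-rdbms")
replica = Store("distributed-query-store")
router = Router(core, replica)

router.write("txn-1", {"amount": 100})
for _ in range(10):  # roughly the ten-to-one query/update ratio from the text
    router.read("txn-1")

print(core.reads, replica.reads)  # prints: 0 10
```

In this sketch the core store sees one write and no reads, which is the property the text describes: the query load grows on the distributed side while the core system only handles transactions.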

Object Storage and Content Management

Financial services firms face the challenge of handling large amounts of unstructured data of different types, such as pictures, videos, and files. Enterprise Content Management (ECM) solutions such as IBM CM8, FileNet, and Documentum have been used for a long time. Like RDBMSs, they are not designed to scale. Furthermore, they can be replaced more easily than an RDBMS.

People started to build in-house platforms that integrate a database with file-level storage, i.e., network-attached storage (NAS). The database holds the metadata that points to the file location on the NAS. This solution later fell out of favor due to its data inconsistency, limited scalability, and high cost.
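A minimal sketch of this metadata-plus-file pattern shows where the inconsistency comes from. It assumes a temporary directory standing in for the NAS mount and an in-memory SQLite database standing in for the metadata store; the table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Assumptions for illustration: a temp directory plays the NAS share and
# SQLite plays the metadata database.
nas_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, path TEXT NOT NULL)")


def store_document(doc_id: str, content: bytes) -> str:
    """Write the file to the share, then record its location in the database."""
    path = os.path.join(nas_dir, doc_id + ".bin")
    with open(path, "wb") as f:
        f.write(content)
    db.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, path))
    db.commit()
    return path


def find_dangling() -> list:
    """Return doc_ids whose metadata row points at a file that no longer exists."""
    rows = db.execute("SELECT doc_id, path FROM documents ORDER BY doc_id").fetchall()
    return [doc_id for doc_id, path in rows if not os.path.exists(path)]


store_document("id-card-001", b"scan")
removed_path = store_document("contract-002", b"pdf bytes")

# The file system and the database are not covered by one transaction, so a
# file can disappear while its metadata row survives.
os.remove(removed_path)
print(find_dangling())  # prints: ['contract-002']
```

Because the file write and the metadata insert are two separate operations on two separate systems, nothing guarantees they stay in step, which is the kind of inconsistency the paragraph above refers to.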

HDFS came into the picture once Hadoop became a standard setup for Big Data. People started using HDFS as storage for ECM. This solution is fine for archiving, but it is not designed for highly concurrent requests.

Real-time object data access is a new trend. For example, the banking and insurance industries in China were the first to adopt facial recognition to support account opening and security verification. Based on a single Massively Parallel Processing (MPP) architecture, SequoiaDB built a service layer on top of the SequoiaDB object storage engine as a new distributed ECM competitor on the market, called SequoiaCM.

It has proved very successful in penetrating the banking industry, including tens of the top 50 banks. Many traditional ECM vendors in China have integrated with SequoiaCM to enhance their existing solutions.

GDPS and Active-Active Disaster Recovery

High availability is the mantra of the day. The China Banking Regulatory Commission has required all bank data centers to be Geographically Dispersed Parallel Sysplex (GDPS) capable, which means distributed data centers are required as the standard deployment. A distributed DBMS will be the top priority for data management modernization.

Most banking data centers also require active-active disaster recovery capability, also known as dual active data centers, which combines high availability and disaster recovery in the approach to data storage, data processing, and data recovery. Data management can then be expected to achieve continuous availability at a lower cost and to maximize the use of data centers with the least effort.

SequoiaDB uses its MPP architecture to serve GDPS natively. It can be deployed flexibly according to the customer’s high availability requirements. SequoiaDB supports dual active data centers, depending on network capacity. A SequoiaDB deployment across two data centers in the same city works well, but it is harder when the data centers are far apart, for instance in different cities.

Data Centralization (Operational Data Lake) and HTAP

Banking systems were built independently, and each maintains its own data in an RDBMS. Business requirements sometimes call for multiple service calls to these systems to retrieve the data in pieces and then merge the pieces into a new result. For example, a personal statement in private banking is a data summary drawn from over 40 core systems such as debit, credit, and investment. Such inefficient data processing cannot be offered as a general function to customers, the end users. Handling concurrent requests of this kind across so many core systems would be a disaster.
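The fan-out-and-merge pattern described above can be sketched as follows. The three system functions and their return values are hypothetical placeholders for real core-system service calls; only the shape of the pattern is taken from the text.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical core systems, each returning its own slice of customer data.
def debit_system(customer_id):
    return {"debit_balance": 1200}


def credit_system(customer_id):
    return {"credit_due": 300}


def investment_system(customer_id):
    return {"investment_value": 5000}


CORE_SYSTEMS = [debit_system, credit_system, investment_system]


def build_statement(customer_id: str) -> dict:
    """Call every core system for one customer and merge the partial results."""
    statement = {"customer_id": customer_id}
    with ThreadPoolExecutor(max_workers=len(CORE_SYSTEMS)) as pool:
        for part in pool.map(lambda call: call(customer_id), CORE_SYSTEMS):
            statement.update(part)
    return statement


def backend_calls(concurrent_customers: int, systems: int) -> int:
    """Each statement costs one call per core system."""
    return concurrent_customers * systems


statement = build_statement("C001")
print(statement)
print(backend_calls(1000, 40))  # prints: 40000
```

Even with the calls issued in parallel, one statement still costs one request per core system, so a thousand concurrent customers against 40 systems translates into 40,000 backend calls landing on systems that also process transactions.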

Data centralization, or a data lake, is the answer. A data lake is a single central data repository that handles high-concurrency data processing. The data in the data lake comes from different sources, such as transaction data, behavioral data, logging data, and so on. Transactional data is synchronized through database binary logs (binlogs). Behavioral data and logging data can be sent either through a RESTful API or by log extraction tools. The operational DBMS for a data lake must be ETL (Extract, Transform, Load) friendly.
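Log-based synchronization can be pictured as replaying change events from each source system into the central store. The sketch below is a simplified illustration: the event layout (`source`, `op`, `key`, `row`) is an assumption made for this example and does not reflect any real binlog format or SequoiaDB interface.

```python
def apply_change(lake: dict, event: dict) -> None:
    """Apply one change event to the central store, keyed by (source, key)."""
    target = (event["source"], event["key"])
    if event["op"] in ("insert", "update"):
        lake[target] = event["row"]
    elif event["op"] == "delete":
        lake.pop(target, None)
    else:
        raise ValueError(f"unknown operation: {event['op']}")


# Hypothetical change events, in the order the source systems logged them.
change_log = [
    {"source": "debit", "op": "insert", "key": 1, "row": {"balance": 100}},
    {"source": "debit", "op": "update", "key": 1, "row": {"balance": 80}},
    {"source": "credit", "op": "insert", "key": 7, "row": {"due": 300}},
    {"source": "credit", "op": "delete", "key": 7, "row": None},
]

lake = {}
for event in change_log:
    apply_change(lake, event)

print(lake)  # prints: {('debit', 1): {'balance': 80}}
```

Replaying events in log order keeps the central copy in the same final state as the sources without issuing any queries against the core systems themselves.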

In many scenarios, businesses want real-time data monitoring, reporting, and decision making. The other purpose of a data lake is therefore hybrid transactional/analytical processing (HTAP).

Traditional architecture separates OLTP and OLAP. The gap creates difficulties in data consistency, data platforms, and skills, which become large hurdles for an enterprise adopting a new data management solution. RDBMS and Big Data (Hadoop) became the two sides of data processing. Those who chose the operational side had to give up the analytical side, and vice versa.

The operational DBMS is now evolving, with new, innovative entrants and incumbents supporting in-DBMS analytical capability. SequoiaDB has a Spark connector to integrate with Spark. It can be used as a Spark data source and supports Spark SQL.

The SequoiaDB data analytics solution supports two scenarios: pre-defined requests and ad-hoc requests. Pre-defined requests are easy to handle, through either offline batch processing or real-time stream processing.

Ad-hoc queries are very different: there is no way to know in advance what the ad-hoc SQL will be or which data it touches, so SequoiaDB has to use a scheduling tool to find and retrieve the data on the fly. A sandbox has to be introduced for ad-hoc test runs to guard against destructive requests.

Distributed OLTP

Data scale is increasing dramatically. Core banking systems are required to keep up with the pressure, which means the RDBMS needs to be scalable and distributed. For example, people started building middleware on top of an RDBMS such as MySQL to improve parallel processing. Distributed online transaction processing is therefore required when transactions need to span multiple data nodes or instances.

A distributed transaction must be synchronized and provide full ACID support. At the application level, distributed OLTP is transparent to developers. Developers don’t need to change any code to go distributed, and the transactions are managed in the same way as local transactions.
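One common way to keep a transaction atomic across several nodes is a two-phase commit. The article does not say which protocol SequoiaDB or the MySQL middleware uses, so the sketch below is only a generic illustration of the atomicity part; the `Participant` class and `run_transaction` function are hypothetical, and the sketch stages at most one write per participant and ignores node crashes and timeouts.

```python
class Participant:
    """One data node holding a staged change until the coordinator decides."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.data = {}
        self.staged = None

    def prepare(self, key, value):
        # Phase 1: stage the change and vote yes or no.
        if not self.can_commit:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        # Phase 2, commit path: make the staged change visible.
        if self.staged is not None:
            key, value = self.staged
            self.data[key] = value
            self.staged = None

    def rollback(self):
        # Phase 2, abort path: discard the staged change.
        self.staged = None


def run_transaction(writes):
    """writes: list of (participant, key, value). Returns True if committed."""
    participants = [p for p, _, _ in writes]
    votes = [p.prepare(key, value) for p, key, value in writes]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


node_a, node_b = Participant("node-a"), Participant("node-b")
committed = run_transaction([(node_a, "acct-1", 50), (node_b, "acct-2", 150)])

failing = Participant("node-c", can_commit=False)
aborted = run_transaction([(node_a, "acct-1", 0), (failing, "acct-3", 200)])

print(committed, aborted, node_a.data)  # prints: True False {'acct-1': 50}
```

The second transaction leaves `node_a` untouched because one participant voted no, which is the all-or-nothing behavior an application expects. A coordinator hidden behind the database driver is what makes this transparent to developers.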

SequoiaDB is approaching from the other direction: it is improving OLTP support in its native MPP architecture. SequoiaDB 3.0 will fully support distributed OLTP.

Conclusion

While the traditional RDBMS is still a common choice in enterprise-class data management, demand for high performance and reliability at competitive cost is growing as legacy systems are modernized. The next-generation DBMS is built for distributed OLTP, hybrid transactional/analytical processing (HTAP), and distributed object storage at large scale to help modernize data management in China.

IT spending in banking ranks at the top among all industries in China. Hence SequoiaDB, an enterprise-class distributed database management platform, is proving its success by penetrating the financial services vertical, especially the banking industry.

Company Introduction

SequoiaDB is a financial-grade distributed database vendor and the first Chinese database vendor listed in Gartner’s Magic Quadrant OPDBMS report.

The product SequoiaDB is a distributed multi-model database that combines distributed NewSQL, a distributed file system and object storage, and high-performance NoSQL capabilities. SequoiaDB has recently released version 3.0.

SequoiaDB is now quickly penetrating the financial services vertical and has more than 50 banking clients and hundreds of enterprise customers in industries including government, telecommunications, Internet, and IoT.