What’s the new fashion in data?

In recent years, data has become a buzzword. Data scientist has been called the sexiest job of the 21st century. According to the International Data Corporation (IDC), the global big data and business analytics market reached approximately $189B in 2019 and is expected to grow to $274B by 2022, a compound annual growth rate (CAGR) of roughly 13% over the period.
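The growth rate can be checked from the two market figures alone. This is a worked calculation under one assumption: that the period is the three years from 2019 to 2022. IDC may have computed its own figure over a different window.

```latex
\text{CAGR} = \left(\frac{V_{2022}}{V_{2019}}\right)^{1/n} - 1
            = \left(\frac{274}{189}\right)^{1/3} - 1
            \approx 1.4497^{1/3} - 1
            \approx 0.132
```

That is about 13.2% per year, consistent with the "~13%" figure above.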

Having studied history in college and worked in the communications industry, I used to think data wasn’t part of my life. Now, however, everything is data, and every user-facing business is data-driven. No matter what field you work in, you cannot ignore its power.

Last week, when I interviewed a developer friend for a homework assignment, he mentioned a new trend in the database field that excites him: hybrid transaction/analytical processing (HTAP). It intrigued me as well. The idea is to merge the distributed transactional database and the analytical database into a single system. This integration allows for a leaner team. Today, companies typically need at least three teams: one for the front-end transactional database, one for the back-end analytical database, and one for the Extract-Transform-Load (ETL) process in between. HTAP also speeds up iteration, and fast iteration is the ultimate objective in building software.

Analytic databases are purpose-built to analyze extremely large volumes of data very quickly, and for such tasks they often perform 100–1,000 times faster than transactional databases. It wasn’t until the founding of Vertica in the mid-2000s that the modern analytic database came into being. Since then, the market has exploded.

Companies usually need a data warehouse to store and analyze all that data. A data warehouse is a central repository of integrated data from multiple disparate sources, used for reporting and analysis. Most business applications store their data in an OLTP (On-Line Transaction Processing) database, which many users access to run fast, simple queries. For example, the point-of-sale system at a cash register, the teller recording each transaction at a bank, and the purchase confirmation on an e-commerce website all rely on OLTP databases. A transaction in this sense is not necessarily a business transaction in which money changes hands, although it can be. OLTP is designed to store day-to-day business transactions and is well suited to querying specific records, such as the email address of a particular customer. Thousands of such queries can run simultaneously on an OLTP database.
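To make the OLTP access pattern concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. SQLite is only a stand-in here, since production OLTP systems are usually client-server databases. The table names and the helpers `record_purchase` and `customer_email` are hypothetical. The sketch shows the two typical operations: a small write per purchase confirmation and a point lookup of one customer’s email.

```python
import sqlite3

# In-memory SQLite database standing in for an OLTP system (an assumption:
# real OLTP deployments are usually client-server databases).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL,
        created_at  TEXT NOT NULL
    );
""")

def record_purchase(conn, customer_id, amount, created_at):
    """A small, fast write: one row per purchase confirmation."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO orders (customer_id, amount, created_at) VALUES (?, ?, ?)",
            (customer_id, amount, created_at),
        )
    return cur.lastrowid

def customer_email(conn, customer_id):
    """A point lookup by primary key: touches a single record."""
    row = conn.execute(
        "SELECT email FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None

with conn:
    conn.execute("INSERT INTO customers (id, email) VALUES (1, 'ana@example.com')")

order_id = record_purchase(conn, 1, 19.99, "2020-03-01")
print(order_id, customer_email(conn, 1))  # 1 ana@example.com
```

Each operation touches only a row or two, which is why thousands of them can run side by side.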

However, people higher up in a company’s hierarchy usually do not need every detail of the transactional information. They need the bigger picture: a strategic view of business data. This is where analytical processing comes in, because the queries become increasingly complex and require aggregations across numerous tables. For instance, a query that compiles a monthly sales report, year-over-year profits, the current balance of a financial portfolio, or current inventory levels is best suited to an OLAP (On-Line Analytical Processing) database, which provides a multi-dimensional view of enterprise data rather than a transaction-level view. OLAP is used for managerial analysis and decision making. And unlike OLTP databases, OLAP databases are typically organized using a star or snowflake schema.
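A star schema puts one central fact table (for example, individual sales) in the middle and surrounds it with dimension tables (dates, products) that describe each fact. The sketch below is a toy example in SQLite. All table and column names are hypothetical. It runs the kind of monthly sales report mentioned above: join the fact table to its dimensions, then aggregate.

```python
import sqlite3

# A toy star schema (hypothetical names): one fact table of sales surrounded
# by small dimension tables for dates and products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        day      TEXT NOT NULL,
        month    TEXT NOT NULL
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        category    TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        amount      REAL NOT NULL
    );
    INSERT INTO dim_date VALUES
        (1, '2020-01-05', '2020-01'), (2, '2020-01-20', '2020-01'),
        (3, '2020-02-11', '2020-02');
    INSERT INTO dim_product VALUES (10, 'books'), (20, 'games');
    INSERT INTO fact_sales VALUES
        (1, 10, 12.0), (2, 10, 8.0), (2, 20, 30.0), (3, 20, 45.0);
""")

# Monthly sales report: scan the facts, join the dimensions, aggregate.
MONTHLY_REPORT = """
    SELECT d.month, p.category, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
"""
report = conn.execute(MONTHLY_REPORT).fetchall()
for month, category, revenue in report:
    print(month, category, revenue)
```

Unlike the point lookup in an OLTP system, this query reads every fact row. That is why analytical workloads benefit from storage built for large scans.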

Transactional data, however, is an integral part of analytical data. Without good records of daily sales, we cannot compile a useful report to identify trends. That’s why efficient handling of transactional information is so important. The main purpose of a transactional database is to ensure the accuracy and integrity of information and to keep all up-to-date information readily available. Transactions are usually issued in SQL (Structured Query Language). OLTP also requires highly reliable and durable systems: even a small network glitch can cause a company serious technical and financial distress.
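Integrity in practice means a transaction is all-or-nothing. The sketch below uses SQLite again, with a hypothetical `accounts` table and a `CHECK` constraint standing in for a business rule. It moves money between two accounts. If the debit would overdraw the source account, the credit that already ran is rolled back too, so the data never shows half a transfer.

```python
import sqlite3

# Hypothetical accounts table; the CHECK constraint stands in for a business rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commit on success, rollback if an exception escapes
            # Credit first, so a failed debit visibly has something to undo.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 80}
print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```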

Together, OLTP and OLAP form the two sides of the data warehousing coin. OLTP systems are the original, disparate data sources across the enterprise, while OLAP systems integrate data from these transactional sources and present a multi-dimensional view for reporting and analytics. Traditional application architectures kept transactional and analytical systems separate, which is why dedicated teams are required to manage and monitor each side.
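The traditional split can be sketched as two separate databases joined by a batch ETL job. This is a minimal, assumed design, not any particular product’s pipeline: `run_etl`, the table names, and the incremental "copy rows with a higher id" rule are all hypothetical. The sketch shows that the warehouse holds a second copy of the data and only sees new transactions after the next ETL run.

```python
import sqlite3

# Two separate databases (hypothetical schemas): an OLTP source and an OLAP warehouse.
oltp = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER, created_at TEXT)"
)
warehouse.execute(
    "CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, month TEXT, amount REAL)"
)

def run_etl(oltp, warehouse):
    """One batch ETL pass: extract new orders, transform them, load into the warehouse."""
    last = warehouse.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM fact_sales"
    ).fetchone()[0]
    # Extract: only rows the warehouse has not copied yet.
    rows = oltp.execute(
        "SELECT id, amount_cents, created_at FROM orders WHERE id > ? ORDER BY id",
        (last,),
    ).fetchall()
    # Transform: derive the reporting month and convert cents to currency units.
    facts = [(oid, created[:7], cents / 100) for oid, cents, created in rows]
    # Load: append to the warehouse's fact table in one transaction.
    with warehouse:
        warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", facts)
    return len(facts)

def warehouse_total(warehouse):
    return warehouse.execute("SELECT COALESCE(SUM(amount), 0) FROM fact_sales").fetchone()[0]

with oltp:
    oltp.executemany(
        "INSERT INTO orders (amount_cents, created_at) VALUES (?, ?)",
        [(1999, "2020-01-15"), (4500, "2020-02-03")],
    )

print(warehouse_total(warehouse))  # 0: the warehouse has not seen the orders yet
run_etl(oltp, warehouse)
print(warehouse_total(warehouse))  # now includes both orders
```

The gap between the first and second `print` is the analytic latency that a separate ETL step introduces.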

In 2014, Gartner Inc., an information technology research and advisory company, coined the term hybrid transaction/analytical processing (HTAP). It describes the capability of a single database to perform both online transaction processing (OLTP) and online analytical processing (OLAP) for real-time operational intelligence. HTAP reduces analytic latency in several ways, including eliminating the need for multiple copies of the same data and removing the requirement to offload data from operational databases to data warehouses via ETL processes. Business moments are transient opportunities that must be exploited in real time. If an organization cannot recognize and respond quickly to a business moment with fast, well-informed decisions, another organization will, and the result is a missed opportunity (or a new business threat). HTAP allows advanced analytics to run in real time on “in-flight” transaction data, providing an architecture that empowers users to respond more effectively to business moments.
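The HTAP idea can be sketched at the level of access patterns. A caveat up front: SQLite is not an HTAP engine. It stands in only to show one store serving both workloads, with no second copy of the data and no ETL step. The table and the helpers `place_order` and `revenue_by_region` are hypothetical.

```python
import sqlite3

# One store serves both workloads. SQLite is NOT an HTAP engine; it only
# stands in to show the pattern: no second copy of the data, no ETL step.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

def place_order(db, region, amount):
    """Operational side: a small transactional write."""
    with db:
        db.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))

def revenue_by_region(db):
    """Analytical side: an aggregate over the same, live table."""
    return dict(db.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ))

place_order(db, "east", 10.0)
place_order(db, "west", 25.0)
print(revenue_by_region(db))  # {'east': 10.0, 'west': 25.0}
place_order(db, "east", 5.0)  # a new "in-flight" transaction...
print(revenue_by_region(db))  # ...is visible to analytics at once: {'east': 15.0, 'west': 25.0}
```

This sketch leaves out the hard part: keeping heavy analytical scans from slowing down concurrent transactional writes on the same system.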

The main technical challenges for an HTAP database are how to stay efficient for both operational workloads (many small transactions with a high fraction of updates) and analytical workloads (large, complex queries that traverse many rows) on the same system, and how to prevent analytical queries from interfering with the operational workload. Although HTAP functionality is already offered in products such as Alibaba DRDS, Microsoft SQL Server, Oracle 12c In-Memory, and Amazon Aurora (Parallel Query), industry experience, skills, and best practices remain limited. But once organizations learn to use it well, the world will be much more efficient.
