Powering an OLTP system with an OLAP feeder system

Background of the system

Mani sankar J
Capillary Technologies
Mar 31, 2023


Capillary Technologies Loyalty+ is an enterprise-grade loyalty management platform that allows brands to manage customer relationships by providing long-term strategic rewards and short-term tactical incentives. The Loyalty system maintains points, tiers, and incentives for a user/entity, awarded based on their interactions with the brand over time. Maintaining these requires three types of activities: Earn, Burn (redeem), and Expire.

The Earn and Burn activities are purely transactional and are triggered only by an external or internal event, such as a registration, a transaction, a milestone completion, or other user activity. The Expire activity, however, can be either event-driven or time-triggered. Some sample scenarios:

  • A non-activity-based downgrade of the customer tier (slab).
  • Time-based expiry of points.
  • Time-based expiry of the slab.

The existing PEB (Points Engine Bulk — bulk points processing) system (shown below) was a pure OLTP system, and all time-based activities were powered by a cron. A cron expression in the system picks jobs sequentially and processes them. Every time a job is initiated, the request is queued, after which a consumer divides the job into batches and processes them sequentially. Multiple consumers on each queue provide some amount of parallelism.
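The cron-driven queue-and-batch flow described above can be sketched as follows. This is an illustrative simplification, not the actual PEB implementation; the batching scheme and consumer count are assumptions:

```python
import queue
import threading


def make_batches(start_id, end_id, batch_size):
    """Split a job's ID range into contiguous batches."""
    return [(lo, min(lo + batch_size - 1, end_id))
            for lo in range(start_id, end_id + 1, batch_size)]


def run_job(start_id, end_id, batch_size, num_consumers=3):
    """Queue the batches and drain them with a small pool of consumers,
    mirroring the queue-plus-consumer parallelism described above."""
    q = queue.Queue()
    for batch in make_batches(start_id, end_id, batch_size):
        q.put(batch)

    processed = []
    lock = threading.Lock()

    def consumer():
        while True:
            try:
                batch = q.get_nowait()
            except queue.Empty:
                return
            # Stand-in for real batch processing (expiry checks, updates).
            with lock:
                processed.append(batch)
            q.task_done()

    threads = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Each job is thus bounded by the slowest batch, and parallelism comes only from the number of consumers attached to the queue.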

Most such jobs need to perform heavy read operations, in terms of the number of rows scanned, to figure out whether there is an expiry requirement. While the index on the expiry date column definitely reduces the scanned-rows metric, we still perform further reads to create batches, populate temp tables during batch processing, and so on. This places unnecessary load on the OLTP system that can be effectively off-loaded. From our metrics, we observed that for every update operation we were scanning approximately 7x as many rows.
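The shape of that read-versus-write imbalance looks roughly like the pair of queries below. The table and column names (`points`, `expiry_date`, `points_remaining`, `org_id`) are illustrative stand-ins, not the real schema:

```python
def expiry_scan_sql(org_id: int, cutoff: str) -> str:
    """The read-heavy side: the index on expiry_date narrows the range,
    but batching and temp-table population still re-read these rows."""
    return (
        "SELECT id, user_id, points_remaining "
        "FROM points "
        f"WHERE org_id = {org_id} AND expiry_date <= '{cutoff}' "
        "AND points_remaining > 0"
    )


def expiry_update_sql(org_id: int, id_batch: list) -> str:
    """The write that actually matters: one targeted update per batch."""
    ids = ", ".join(str(i) for i in id_batch)
    return (
        "UPDATE points SET points_remaining = 0, status = 'EXPIRED' "
        f"WHERE org_id = {org_id} AND id IN ({ids})"
    )
```

Only the second statement needs the OLTP database; everything the first statement does is a candidate for off-loading.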

The need for a new approach

The existing system limits processing to off-peak hours and requires vertical scaling of the OLTP systems.

  • Idle machines for a large part of the day — This is a cron-based system that does bulk processing of customer data. All cron jobs are scheduled during off-peak hours to ensure the transactional systems are not impacted, leading to under-utilisation of system resources, including node capacity and the queuing service.
  • High CPU usage on the transactional system during bulk processing — Usage of the OLTP system spikes during bulk processing, and table scans are performed, leading to delays/timeouts in transactional processing. The OLTP system must be vertically scaled to avoid failures.
  • Day-level expiry only — As of now, bulk processing happens only during non-peak hours, and the product supports expiry only at the day level. If we want to introduce hour-level expiries for flash sales, short-lived promotions, etc., the bulk processing cannot run during peak hours owing to the scans and joins that the read queries perform.

Note: The picture shows the spike in ReadIO when expiry jobs are running. WriteIO here is roughly equal to ReadIO because of the indexes on the core tables, which must be maintained on every update.

OLAP feeder system

The primary objective of an OLAP feeder system is to reduce the load on the OLTP database during bulk processing. As part of our Analytics/OLAP workflows, OLTP data is synced to the OLAP system and made available for processing by storing it on S3. This data can be easily queried using Databricks, which provides horizontal scaling for the query workload and keeps the read load off the OLTP system. However, this system runs with a lag of approximately 24 hours due to the daily processing of ETL jobs. This limitation can be mitigated by doing a delta Sqoop from the current sqoop_master_time to the desired_master_time.
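The delta Sqoop step amounts to importing only the rows modified between the two watermarks. A minimal sketch of that filter, assuming a `last_updated_on` modification-timestamp column (the column name is an assumption, not the real schema):

```python
from datetime import datetime


def delta_sqoop_predicate(sqoop_master_time: datetime,
                          desired_master_time: datetime) -> str:
    """Build the incremental-import filter: only rows modified after the
    last sync watermark (sqoop_master_time) and up to the point in time
    the job needs (desired_master_time)."""
    if desired_master_time <= sqoop_master_time:
        raise ValueError("nothing to sync: desired time is not after the last sync")
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f"last_updated_on > '{sqoop_master_time.strftime(fmt)}' "
        f"AND last_updated_on <= '{desired_master_time.strftime(fmt)}'"
    )
```

Bounding the window on both sides keeps the delta import idempotent: re-running it with the same watermarks pulls the same rows.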

Final approach

A hybrid approach that keeps the processing with the cron-based system and uses the OLAP system to fetch the data yields the desired optimisation. The OLAP system provides the capability to query the data from Parquet files in S3 using dynamically generated SQL that can be triggered on Databricks.

The OLAP system provides a delta processing job, which updates the delta information in Databricks since the last ETL run and can be triggered on demand from the PEB system. Once the delta processing is complete, all data is available in S3 and can be used by the PEB system to continue processing. This removes the vertical scaling requirement on the database: the read operations are off-loaded to the OLAP system, while the OLTP system is used only for the write operations, drastically reducing its CPU requirements. As a result, the processing can even run outside off-peak hours with little or no impact on the OLTP system.
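The orchestration of the hybrid flow can be sketched as below. The three callables are hypothetical stand-ins for the real integrations (the Databricks delta job trigger, the S3 read, and the batched OLTP writes), so this shows only the control flow, not the actual PEB code:

```python
def run_hybrid_expiry_job(trigger_delta_sync, fetch_expiring_ids, apply_expiry,
                          batch_size=500):
    """Hybrid flow sketch:
      1. trigger_delta_sync()  - bring the OLAP copy up to date since the
                                 last ETL run (the on-demand delta job)
      2. fetch_expiring_ids()  - the read-heavy scan, now served from
                                 S3/Databricks instead of the OLTP database
      3. apply_expiry(batch)   - write-only batched updates on the OLTP DB
    Returns the number of entities expired."""
    trigger_delta_sync()
    ids = fetch_expiring_ids()
    for i in range(0, len(ids), batch_size):
        apply_expiry(ids[i:i + batch_size])
    return len(ids)
```

Because step 2 never touches the OLTP database, the only OLTP cost left is the batched writes in step 3.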
