
Dynamic Pricing Platform (3/5)

Dr. Manoj Kumar Yadav
Published in redbus India Blog · 6 min read · May 7, 2022


It is understood that the heuristics used for revenue management in flights (Littlewood’s rule and EMSRa/b) address only a subset of the problems that bus revenue management needs to solve. This article will connect the dots best if read after the first and second articles of this series. Let’s look at the two primary differences. First is the process of pricing seats before booking: in India and many other countries, buses let travellers pick their seats upfront, with the price shown per seat. What this translates into is that the protection level (y*) of the capacity at a given price has to be predicted first, and then the price itself also has to be decided upfront (refer to the second article on EMSR). Second is the need to protect seats across the routes of the network as well, and this too needs to be known upfront.
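As a quick refresher from the second article, the protection level from Littlewood’s rule can be sketched in a few lines. This is a minimal illustration assuming normally distributed high-fare demand; the fares and demand parameters below are made up, not production values.

```python
# Littlewood's rule: protect y* seats for the higher fare class, where
# y* = F^{-1}(1 - p_low / p_high) and F is the CDF of high-fare demand.
# Normal demand is an assumption for illustration only.
from statistics import NormalDist

def littlewood_protection_level(p_high: float, p_low: float,
                                mean_demand: float, std_demand: float) -> int:
    """Seats to protect for the high fare under N(mean, std) demand."""
    critical_ratio = 1 - p_low / p_high
    y_star = NormalDist(mu=mean_demand, sigma=std_demand).inv_cdf(critical_ratio)
    return max(0, round(y_star))

# Example: high fare 1200, low fare 800, high-fare demand ~ N(15, 5)
print(littlewood_protection_level(1200, 800, 15, 5))  # → 13
```

For buses this value cannot be applied reactively as in airline nesting; as noted above, both y* and the per-seat prices have to be published upfront.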

Wikipedia: EMSRb, Littlewood’s rule

There were two ways to move forward: one was to try out various approaches manually by partnering with the business owners of buses, and the other was to modify EMSRb to fit the bus use case. Meanwhile, the data engineers were setting up the pipelines for collecting and processing data, and building a platform to distribute the prices to the various portals where bus owners sell travel tickets.

With various experiments and learnings, and with input from various functions on the robustness and flexibility needed in the platform, the current version of the dynamic pricing platform was built, with the product revMax on top of it. Now, let’s walk through the reasoning behind the architectural choices made in building the platform.

Expectation of computations:
Event-based compute was needed, as this had to be a completely automated system handling signals from various systems. Price computations are expected to complete within a few seconds, and the price compute of each bus service has to be independent of the others.

SLAs
There is value in computing the fares only at the right time. To price seats based on value and demand, it is essential to respond to a change in demand before the next demand arrives and while capacity is still available to sell. All the external distribution systems should have the same SLAs for data distribution.

Type of compute requirements
Dynamic pricing can at times make a decision based on a single signal, or it may need to look at a large volume of data to make the appropriate decision. Essentially, three distinct categories of compute are needed: first, near real-time; second, near real-time with heavy memory needs; and third, batch processing for model building.
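The three compute paths can be pictured as a simple routing decision per incoming signal. This is a hypothetical sketch, not the production routing logic; the signal fields and path names are made up for illustration.

```python
# Hypothetical routing of pricing signals to the three compute categories
# the platform distinguishes: near real-time, near real-time with heavy
# memory needs, and batch processing for model building.

def route_signal(signal: dict) -> str:
    """Return the compute path a signal should be dispatched to."""
    if signal.get("needs_history"):      # decision needs a volume of data
        return "near-real-time-heavy"
    if signal.get("batch"):              # periodic model (re)building
        return "batch"
    return "near-real-time"              # single-signal price update

# Each bus service routes its own signals, keeping computes independent.
print(route_signal({"service_id": "BUS-101", "event": "seat_sold"}))
```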

Flexibility for analysts and data scientists
This may look the least critical, but from the Littlewood’s rule and EMSRa simulations it was clear that many variables would affect the pricing, and hence many training runs and iterations would be needed to arrive at the models, strategies and policies. It is also understood that making too many or too frequent changes to a running production system increases the chances of introducing escaped bugs.

Components
With all these specifications, the following software applications and languages have been used in building the dynamic pricing platform:
1. Cassandra: Cassandra has been the in-house database used for recording the final states of all the bus inventory. It was picked to store the bus service model objects.
2. Apache Storm: A preferred choice for real-time analytics (RTA) systems. Similar to multi-master Cassandra for storage, Apache Storm has Nimbus H/A in place to run a cluster of masters, thus decreasing the overall chances of compute failures. One could also use independent consumers to do a similar job, but that would result in excess hardware maintenance and monitoring. Airflow was attempted for this job, but at scale the cost in engineers’ time was quite high.
3. Apache Spark: As mentioned in the previous sections, pricing may at times need to process just a signal, or it may need to process history to make a pricing decision. That is where Spark comes into the picture. Between Apache Storm and Apache Spark, the systems decide and pass on the jobs for price calculation, based on the type of need, via Kafka.
4. Kafka: This needs no explanation :)
5. redis: Pub/Sub is used to generate the time events. It is usually tricky to get consistent time events, but with enough parameter tweaking it can be achieved.
6. Core Java: All the object definitions have been modeled in core Java.
7. Scala: Used for building the Spark jobs. pySpark was looked at, but it appeared to be largely a Runtime.exec()-style wrapper, so we decided to go with Scala-based jobs for performance and consistent usage of libraries.
8. Jeasy (Easy Rules) rule engine: The simplest rule engine available to integrate with. It largely takes rules and conditions in YAML format. The rule engine is an essential part for moderating and translating the output of the model into the language of the business. It also allows the analysts to suggest strategies.
9. j2html: This amazing library makes Java backend engineers full-stack engineers when used wisely. It fits well with any Java HTTP server.
10. Rapidoid (fast HTTP): One of the fastest HTTP servers available, for lightweight and quick hosting with minimal lines of code. It largely helps in hosting the APIs and web UIs built with j2html.
11. Python: All the data processing and modeling is done with Python, for reasons that need no explanation. R could have been another choice, but Python gets the work done.
12. MongoDB: An amazing data store; we log everything here. It could very well have been used as the object store, but since Cassandra was already available, the use of MongoDB was limited to logging and monitoring.
13. MapDB database engine: Backs the web UI with local caching for the purpose of log viewing.
14. MySQL: Meta store for business-level logic and configurations. With the current design, the relational database is expected to be utilized in less than 5% of instances.
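The rule-engine layer (item 8) can be illustrated with a toy moderation pass: business rules, expressed as data much like the YAML rules the engine consumes, adjust the raw model output before it becomes a published fare. The rule contents and field names below are invented for illustration, and the Python here only mimics the shape of what the Java rule engine does.

```python
# Toy rule layer: each rule has a condition ("when") and an action ("then").
# In the real platform these live in YAML and run inside the rule engine;
# the floor/ceiling rules here are illustrative, not actual business rules.
RULES = [
    {"name": "fare floor",
     "when": lambda fare, ctx: fare < ctx["min_fare"],
     "then": lambda fare, ctx: ctx["min_fare"]},
    {"name": "fare ceiling",
     "when": lambda fare, ctx: fare > ctx["max_fare"],
     "then": lambda fare, ctx: ctx["max_fare"]},
]

def moderate_fare(model_fare: float, ctx: dict) -> float:
    """Apply business rules, in order, to the model's suggested fare."""
    fare = model_fare
    for rule in RULES:
        if rule["when"](fare, ctx):
            fare = rule["then"](fare, ctx)
    return fare

ctx = {"min_fare": 500.0, "max_fare": 1500.0}
print(moderate_fare(1720.0, ctx))  # model overshoots → capped at 1500.0
```

Keeping rules as data is what lets analysts suggest strategies without touching the engine code.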

The diagram below puts all the components together.

Dynamic Pricing (revMax) Platform at redBus

Plugging the data science modules into the system, along with the signals, comes easy on this platform. The model and signal processing components are attached to the engine, but there is no tight coupling: all the modules can be developed and deployed independently of each other. The engineering team spends almost no time maintaining the infrastructure and focuses largely on the top layer of the pricing engine.

Putting the Models and Signals together.

The pricing engine allows different models and strategies to run for each bus service and for given days. Essentially, it allows parallel iteration on models and algorithms for pricing without impacting any of the SLAs.
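One way to picture per-service, per-day strategy selection is a registry keyed by service and day bucket, with a default fallback. This is an assumed sketch of the idea, not the engine’s actual mechanism; the service IDs, buckets, and premium are hypothetical.

```python
# Hypothetical strategy registry: (service, day-bucket) -> pricing strategy.
# Lets different models run in parallel per bus service without touching
# the services that use the default.
from datetime import date

def weekend_strategy(base: float) -> float:
    return base * 1.15        # illustrative weekend demand premium

def default_strategy(base: float) -> float:
    return base

STRATEGIES = {
    ("BUS-101", "weekend"): weekend_strategy,
}

def price(service_id: str, journey: date, base_fare: float) -> float:
    bucket = "weekend" if journey.weekday() >= 5 else "weekday"
    strategy = STRATEGIES.get((service_id, bucket), default_strategy)
    return round(strategy(base_fare), 2)

print(price("BUS-101", date(2022, 5, 7), 1000.0))  # a Saturday → 1150.0
```

Swapping a service’s entry in the registry changes its model without redeploying anything else, which is what keeps iteration from impacting the SLAs.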

Finally, the revMax product is built into redPro Win. redPro Win is the B2B product from redBus for bus operators to manage day-to-day operations, host offers and deals, and many more value-added services. The diagram below shows the placement of the product on top of the dynamic pricing platform.

revMax in redPro Win

Next Chapter:
A sample topology, resource utilization, scale-up, and the rate of computation will be discussed. We will explore the systems’ performance, how they reduced maintenance time, and how they provided a platform to do dynamic pricing on time, every time.

Single Apache Storm Topology visualization with multiple “spout”

Chapter 1: Introduction

Chapter 2: Littlewood’s rule and EMSR

Chapter 3: Technical Architecture

Chapter 4: Details & Reasoning

Chapter 5: Future Scope

