Hotel Price Recommendation @ MMT

Dynamic Pricing is a strategy where prices change in real time based on supply, demand, need and user behaviour. Dynamic Price Optimisation Models are used to tailor pricing for customer segments by simulating how targeted customers will respond to price changes. In this blog I describe how we at MakeMyTrip are building our Dynamic Pricing Engine.

We love our customers, and they love it whenever we get a great deal for them. Dynamic pricing isn’t just about raising prices; it often leads to lowering them.

We are building an automated, self-learning system that can refresh prices in near real time. Along with profit maximisation, building such an engine has helped us understand market trends, helped hoteliers clear out their slow-moving inventory, and even helped our customers pick bookings at the prices they are willing to pay.

General Revenue generation comparison

We started tracking certain signals which impact a hotel’s price at run time, for example:

  • Demand and supply trends in the market.
  • The customer segment, based on the nature of the visit.
  • The advance period, i.e. how far ahead of the stay the customer is booking.
  • Where we stand relative to our competitors, and many more.

While analysing the inventory and booking data we found a few interesting anomalies. The bulge in the pink line of the graph below shows hoteliers closing their online inventory while online bookings were increasing (as shown by the blue line).

Many such anomalies were observed in 2017, where hotels got booked well in advance of the travel dates.

Our Approach

We are building a predictive model which finds booking or exposed-inventory anomalies over the advance-purchase window of the next 90 days. Specifically, we use MAD (Mean Absolute Deviation), a well-known error measurement statistic, which we customised further to quantify the amount of deviation we see in our booking data. This deviation in turn controls the strength of the surge signal.
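
To make the idea concrete, here is a minimal sketch of how a Mean Absolute Deviation can be turned into a signal strength. It is not our production model: the function names, the `threshold` and `cap` parameters and the scaling to the range [-1, 1] are assumptions chosen purely for illustration.

```python
from statistics import fmean


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean of the series."""
    center = fmean(values)
    return fmean([abs(v - center) for v in values])


def surge_signal(history, observed, threshold=2.0, cap=5.0):
    """Map an observed booking count to a signal strength in [-1, 1].

    history   : past booking counts for the same slice (e.g. same hotel and
                same number of days before check-in); hypothetical input
    observed  : the booking count seen now
    threshold : deviations smaller than this many MADs are ignored (assumed)
    cap       : deviations are clipped at this many MADs (assumed)
    """
    center = fmean(history)
    mad = mean_absolute_deviation(history)
    if mad == 0:
        return 0.0  # flat history gives no scale to measure a deviation against
    score = (observed - center) / mad  # deviation expressed in units of MAD
    if abs(score) < threshold:
        return 0.0  # within normal variation: no surge, no dip
    clipped = max(-cap, min(cap, score))
    return clipped / cap  # positive = surge signal, negative = dip signal


history = [10, 12, 11, 9, 13]
strong_surge = surge_signal(history, 17)
no_signal = surge_signal(history, 11)
dip = surge_signal(history, 5)
```

A positive value would be read as a surge signal and a negative value as a dip; how the strength translates into an actual price change is a separate decision that this sketch does not cover.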


The complete infrastructure is hosted on the Amazon cloud and makes extensive use of its server-less and on-demand capabilities.

Airflow- Airflow is a platform to programmatically author, schedule and monitor workflows. All our data pipelines are built, managed and scheduled using Airflow. Some of its out-of-the-box capabilities are scheduling, alerting, retries, logging and a rich interactive UI.

EMR- Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data. Airflow provisions EMR clusters on-demand.

Spark- Apache Spark is a general-purpose data processing engine to rapidly query, analyse and transform data at scale. We are using Spark to process both batch and real-time streaming data.

S3- Amazon Simple Storage Service is highly scalable, reliable, fast, inexpensive data storage for the internet. We are using it as a data lake where all the raw events, ETL dumps and (partially) processed data are persisted.

Confluent- Confluent provides a better-packaged distribution of Kafka, along with sink and source connectors, Kafka Streams processing and many open source tools.

Athena- Athena is a server-less interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL.

Superset- Superset is an open-source, code-free, interactive BI visualisation tool.

Kafka Mirror- MirrorMaker is a stand-alone tool for copying data between two Apache Kafka clusters.

Redis- An open source, in-memory data structure store used as a database, cache and message broker.


Key considerations

  • Data Accumulation involves collecting data from multiple sources in a centralised, scalable, easy-to-access place. We have our data lake built on Amazon S3. Our data pipelines (configured in Airflow) pull inventory availability from the hotelier’s inventory system. Currently the platform works on hourly and daily snapshots of the inventory, but eventually it should become a learning model that factors in the popularity of the city, area or hotel. The inventory data is stored in Parquet format for performant querying. The graph below shows a quick comparison of Parquet with other data formats.
  • Processing is done on a Spark-enabled EMR cluster. All inventory data is sliced and aggregated at city level, hyper-location level and hotel level. The aggregations are analysed against historical time series data for inventory consumption. Anomalies in bookings are emitted as pricing surge/dip signals.
  • Availability: The signals (generated by the Spark jobs) are persisted in S3 and made available with low latency through a Redis cluster. We built our Dynamic Pricing REST APIs on top of the Redis cache, with code written in RxJava. They serve signals in ~15 ms (95th percentile).
  • Visualisation: For all data-driven platforms the biggest challenge is making the data easy to analyse. In our case all model iterations were data driven, so being able to analyse the data easily and draw patterns from it was a prerequisite. We used open source Superset and Amazon Athena to run SQL queries and plot graphs and line charts.
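
The roll-up to hotel, hyper-location and city level, and the way a serving layer might read the result, can be sketched as follows. This is only an illustration under stated assumptions: the snapshot rows, the key layout such as `hotel:H1`, the plain dict standing in for the Redis cache, and the fallback from hotel to hyper-location to city are all hypothetical and not taken from our pipeline.

```python
from collections import defaultdict

# Hypothetical snapshot rows: (city, locality, hotel_id, rooms_total, rooms_booked)
SNAPSHOT = [
    ("Goa", "Calangute", "H1", 20, 15),
    ("Goa", "Calangute", "H2", 30, 15),
    ("Goa", "Panaji", "H3", 50, 10),
]


def aggregate(rows):
    """Roll a snapshot up to hotel, hyper-location and city level.

    Returns a dict of cache key -> share of rooms booked at that level.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [rooms_total, rooms_booked]
    for city, locality, hotel_id, rooms_total, rooms_booked in rows:
        keys = (
            f"hotel:{hotel_id}",
            f"locality:{city}/{locality}",
            f"city:{city}",
        )
        for key in keys:
            totals[key][0] += rooms_total
            totals[key][1] += rooms_booked
    return {
        key: booked / total
        for key, (total, booked) in totals.items()
        if total > 0  # skip slices with no inventory at all
    }


def lookup(cache, city, locality, hotel_id):
    """Return the most specific value available, or None if nothing matches.

    Falls back from hotel to hyper-location to city (an assumed policy).
    """
    for key in (f"hotel:{hotel_id}", f"locality:{city}/{locality}", f"city:{city}"):
        if key in cache:
            return cache[key]
    return None


cache = aggregate(SNAPSHOT)  # a plain dict standing in for the Redis cache
known_hotel = lookup(cache, "Goa", "Calangute", "H1")
new_hotel = lookup(cache, "Goa", "Panaji", "H9")
```

Keeping the three levels under separate keys means a hotel with little history of its own can still be served a value derived from its neighbourhood or city.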

We have seen some promising results from dynamic pricing (DP) on our conversion and revenue. As shown, the system keeps conversion intact, and in fact improves it, along with a substantial increase in profit margin. Expected Conversion shows what we would have achieved with a static pricing mechanism; Actual Conversion is what we actually achieved after integrating flexible pricing into our system.

We are still in the learning stage, and a more stable system is yet to evolve, one that will benefit both our customers and the industry.

To summarise, dynamic pricing is an innovative, flexible pricing mechanism which helps a business respond nimbly to market trends. It stays within the bounds of what consumers already accept, and even embrace, as market dynamics.

Contributors: Shashidhar Singhal, P Gautam, Piyush Mittal, Manish Swarnakar