Transforming Pricing for Oil & Gas Wholesales: Personalized Pricing for Petroleum Products

Mudit Mishra
GAMMA — Part of BCG X
16 min read · Nov 13, 2019

Written by Mudit Mishra, Pavan Theja & Prateep Dasgupta

Part 1: Petroleum wholesale pricing

Data-driven pricing is an essential strategy that helps boost bottom-line performance in the face of ever-growing competition in the digital age.

While pricing analytics has long been entrenched in some industries such as airlines and banks, some traditional industries such as energy and industrial goods are still very new to the concept of algorithmic pricing. They have only recently embraced the data science wave. Setting prices for energy products in a wholesale market is currently in that nascent stage. The established processes rely heavily on salespeople’s experience, old and rigid segmentations, and legacy systems. In other words, their pricing strategies and practices are neither data-driven, nor dynamic, nor “modern”.

In this two-part article we describe in detail how we helped an integrated refinery make the transition from traditional practices to a data-driven, personalized pricing practice. We helped the client revamp their pricing strategy for both of their business units: petroleum (variants of diesel and gasoline) and petrochemicals (various grades of polymers and polyols.) The solution marked the first-of-its-kind implementation of machine learning in pricing for oil and gas products in a wholesale market.

The opportunities identified in personalized pricing underscore the fundamental and compelling difference between algorithmic pricing and traditional pricing. We estimated a margin uplift opportunity of between 1.0 and 1.5 cents/gallon for petroleum products, and up to $30/ton for petrochemical products.

In the petroleum business, pricing is a one-way communication, where sales teams release hourly or daily prices based on market conditions, inventory, sales targets etc. To tap into the substantial upside promised by algorithmic pricing, we took several steps. We helped the client build an extensive data platform with over 300 variables based on wholesaler behavior and market indicators. After screening key hypotheses related to wholesaler behavior, we used hierarchical clustering to identify lookalike wholesalers based on their price sensitivities, historical transactions, and profitability. Then we built daily or weekly price-demand curves to predict wholesalers’ demand with respect to the available price options for that hour or day. The available price options are then fed into a linear optimizer, which finds a single optimized price per wholesaler that maximizes overall profit given the inventory constraints. These prices can then be shared directly with the wholesalers. This is essentially an optimized, personalized spot price.

As part of our comprehensive digital stack, we also handed over to the client a digital front-end to trigger the algorithms and to generate pricing quotes for customers. This front-end helps reduce the time to quote generation by 10x, gives transparency to the sales team on algorithmically derived prices, and keeps the back-end complexity low for the sales team. That allows them to focus only on the execution part of the process.

Part 1 of this article focuses on how we designed the solution for the petroleum business. But first, we need to highlight a few general challenges and hurdles in introducing data science into a traditional industry.

Applying data science to traditional pricing strategies is difficult due to the nature of existing experience-based approaches, the static way of classifying customers, and the technical challenges of working with legacy systems. In our case, existing pricing across both business units (petroleum and petrochemicals) drew on limited sources of information, which sales teams either used approximately or set aside in favor of their own experience. For example, sometimes the time available to generate prices was so short that they would look only at selected information. It is doubtful that this approach would consistently yield optimal prices.

Compounding this is the fact that traditional companies often keep the same segmentation in place for years. By definition, such segmentations, usually based on easily observed parameters such as volume purchased and frequency, cannot account for the dynamic buying patterns of wholesalers. On top of this, legacy systems are designed only to support this group-based pricing, not personalized pricing. Our client also had limited capabilities for pricing wholesaler-specific deals in proportion to deal size, time of purchase etc.

To remedy this situation, understanding the data science maturity of a company is paramount to striking the right balance between complexity, interpretability, and returns. Simpler solutions are easily conveyed and implemented, but the returns may not be as significant. Complex “black box” solutions are hard to interpret and challenging to implement, but the returns could reach previously unimagined levels.

Challenges in pricing for petroleum business unit

Most companies have a short-term view of pricing. They evaluate deals as “one-offs” against profit hurdles and manage discounts based upon authority levels to make “price exceptions.”

Tom Nagle (author of The Strategy and Tactics of Pricing)

The petroleum business unit has its own specific challenges. First, the product portfolio is very concentrated. Only 2–3 government regulated variants of diesel and gasoline contribute most of the revenue. Yet prices for these products fluctuate multiple times a day based on global oil price movements, which are tracked both in local and in global markets based on different indices. Three or four price changes per product per day can lead to more than 50 price releases by each fuel station every day.

Figure 1: Business dynamics for petroleum business unit (applicable mainly to refinery products)

In general, within any petroleum wholesale community, companies sell multiple products via multiple key locations called depots. Wholesalers send their fleets to depots to pick up a particular product. As part of the pre-algorithmic strategy, a decision-maker takes a few pieces of readily accessible data (market price indicators and competitor prices for each product-depot combination) and then sets prices lower or higher than the most competitive prices, based on monthly sales targets.

This pre-algorithmic process has several drawbacks that inherently prevent it from yielding optimal prices on a consistent basis. First, how can an organization make the best pricing decision given the limited available information, with the added constraint of needing to release prices in less than 30 minutes? Second, how does it make the best decision when, at the same time, it needs to position the prices relative to the available inventory and the monthly sales target?

To bring petroleum pricing into the data-driven era and enable companies to implement it, we developed a comprehensive digital stack that includes three broad components, as shown in Figure 2.

Figure 2: BCG’s product offering (applicable mainly to refinery products)

The first component is an extensive data platform, which captures market prices, seasonality, economic indicators, exchange rates, weather parameters, and other significant data. The second component is the advanced pricing algorithm, which generates personalized prices for over 4,000 wholesalers. It is a self-learning algorithm that adapts to the market environment and can be adjusted to fit wholesalers’ behavior every month. Finally, the stack includes a digital front end that helps users define the seed parameters and trigger the pricing algorithm. It displays suggested prices for individual wholesalers and products. It automates relevant calculations and analyses, and also integrates with SAP & SMS engines to allow for wholesaler-level price differentiation.

The data platform on its own is not useful in a traditional pricing process, because decision makers can mentally process only a limited amount of data without technological support. An attractive front end is likewise sub-optimal unless the prices and guidance it provides reflect the best possible use of available data. The central part of the digital stack, the algorithm, is the linchpin in the process of deriving and implementing personalized prices. In this article, we explore the advanced pricing algorithm in greater detail for the petroleum wholesale business.

An algorithmic pricing strategy helps petroleum companies make better-informed pricing decisions based on available information. The algorithm can be viewed as a sequence of four steps: data diagnostics, clustering, price-demand curves, and optimization.

Four steps to developing an algorithmic pricing strategy

Step 1: Performing data diagnostics

The objective of data diagnostics is to understand and validate data sources that aid in pricing products better. The diagnostics provide in-depth knowledge about wholesaler behavior by quantifying it and by separating the relevant signals from the noise. But the determination of relevance vs. noise does not take place ex ante. The diagnostic step begins with a laundry list of all the possible hypotheses that are potentially relevant both from a wholesaler’s buying perspective as well as from a price setter’s (sales team’s) perspective.

To illustrate, let’s look at competitor price sensitivity analysis. In general, a wholesaler is always seeking the lowest purchase price, because fuel and fuel-derived products are government-regulated commodities. That means quality differences between suppliers don’t play a role. Nevertheless, we can identify price sensitivity among wholesalers by comparing product prices with those of a competitor and observing real fluctuations in the wholesaler buying pattern.

Figure 3: Fluctuation in average daily volume lift (in percent) for wholesaler XYZ vs. price differential to competitor minimum (cents per gallon)

As shown in Figure 3, a negative price differential (red line) leads to increased volume sold (green line) to wholesaler XYZ, and vice versa. The correlation between the two variables quantifies a given wholesaler’s price sensitivity. This quantification helps us understand which wholesalers are price sensitive with respect to which competitor.
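As a minimal sketch of this quantification, the snippet below computes a Pearson correlation between the price differential and the change in volume lifted for one wholesaler. The data values and the variable names (`price_diff`, `volume_change`) are hypothetical and only illustrate the idea; the actual diagnostics may have used a different measure.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily observations for one wholesaler (illustrative numbers only).
# price_diff: our price minus the competitor minimum, in cents per gallon.
# volume_change: change in daily volume lifted versus the average, in percent.
price_diff = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
volume_change = [12.0, 7.0, 1.0, -4.0, -9.0, -15.0]

sensitivity = pearson(price_diff, volume_change)
print(f"correlation = {sensitivity:.2f}")
```

A value close to -1 would indicate a wholesaler whose volume moves strongly against the price differential, while a value near 0 would suggest little sensitivity to that competitor.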

Similar to the above price sensitivity analysis, we can also analyze other global and local market indicators to which wholesalers may be price sensitive. We can look for and quantify how wholesalers’ behavior patterns vary based on these variables (working assumption in parentheses):

  • Wholesaler’s size (the bigger the account, the higher the price sensitivities)
  • Discounts granted during marketing campaigns
  • Wholesaler arbitrage (wholesalers tend to optimize transportation costs across different company depots)
  • Forecasted market prices (wholesalers tend to stock-up when future forecasts indicate higher market prices)
  • Public holidays (anticipating a supply drop can also lead to an increase in volume lifts)
  • Traffic data (a particular route closure may shift volume across company depots)
  • Temporal variables (certain wholesalers may show varying price sensitivities at different times of day, days of the week etc.)
  • Wholesaler profitability (high, medium or low unit profit wholesalers)

Once the hypotheses are identified, the next step is to prioritize them based on their perceived value and to use quick analyses to filter out the less relevant ones. These quick analyses can include checking the availability of relevant data sources, checking the quality and time frame of available data, and performing simple correlation checks, e.g. R-squared checks with respect to the variable(s) of interest.
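One way such a quick screen could look is sketched below: each candidate variable is scored by the R-squared of a one-variable linear fit against daily volume, and candidates below a cut-off are set aside. The candidate names, the data, and the 0.1 threshold are all assumptions made for illustration.

```python
def r_squared(xs, ys):
    """R-squared of a one-variable linear fit (the squared Pearson correlation)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return sxy ** 2 / (sxx * syy)

# Hypothetical daily volume for one wholesaler and three candidate drivers.
daily_volume = [100, 122, 90, 140, 80, 128]
candidates = {
    "price_diff_to_competitor": [0.5, -0.5, 1.0, -1.5, 1.5, -1.0],
    "public_holiday_next_day": [0, 0, 0, 1, 0, 1],
    "day_of_week": [1, 2, 3, 4, 5, 6],
}

# Rank candidates from most to least explanatory.
ranked = sorted(
    ((r_squared(values, daily_volume), name) for name, values in candidates.items()),
    reverse=True,
)
THRESHOLD = 0.1  # assumed cut-off for keeping a hypothesis
shortlist = [name for score, name in ranked if score >= THRESHOLD]
print(ranked)
print(shortlist)
```

A screen like this is deliberately crude. It only helps decide which hypotheses deserve a closer look before any modeling starts.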

Step 2: Clustering wholesaler behavior

Clustering will group the most homogeneous wholesaler behaviors together to permit further action and analysis. The clusters also allow a business to put constraints on lookalike wholesalers. In Step 3, we will identify different price-demand curves suitable for the clusters. But first we will explain why clustering matters and how to apply it.

Why clustering matters

Let us imagine we have data on three behaviors for each wholesaler after the data diagnostics step: wholesaler size (average daily lift volume), price sensitivities, and unit profit margin. An initial look at Figure 4 reveals two commonalities or types. Type 1 includes some large wholesalers who show high price sensitivities and low unit profit margin. Type 2 includes some medium-sized wholesalers who are relatively less price sensitive and more profitable.

Figure 4: Plotting three metrics for about 100 wholesalers (before clustering). X-axis shows average daily volume. Y-axis shows price sensitivity. Bubble size shows unit profit margin.

But understanding “general” behavior is insufficient to classify wholesalers. Simply put — where do we draw lines on the x and y axes so that we can distinguish reliably and robustly across high, low or medium price sensitivities, and across large, small, and medium-sized wholesalers? Any manually drawn line is unlikely to find the best possible separation across wholesalers. It also risks giving too much weight to bizarre or counter-intuitive behaviors drawn from a small population, or confusing apparent trends with random effects. The solution is to find a mathematical means to reveal the most homogeneous sets of behavior.

That is where clustering comes to the rescue. Clustering helps the data scientist aggregate homogeneous behaviors together and scope out various use cases where we can apply machine learning. Compare Figure 4 with Figure 5, which translates the scatter in Figure 4 into three clusters.

  • Cluster 1 (Yellow): Wholesalers with high daily average, high price sensitivity, and low margin
  • Cluster 2 (Orange): Wholesalers with medium daily average, medium price sensitivity, medium margin
  • Cluster 3 (Blue): Wholesalers with low daily average, low price sensitivity, and high margin

Figure 5: Plotting three metrics for about 100 wholesalers (after clustering). X-axis shows average daily volume. Y-axis shows price sensitivity. Bubble size shows unit profit margin.

The chosen number of clusters is based on statistical validity of the clusters and in consultation with the business. In the above example, a four-cluster solution is also possible by breaking cluster 2 (orange) into two separate clusters instead.

How to apply clustering

Now that we are clear on the “why” of clustering, let us revisit our pricing example where we identified a small number of behaviors from Step 1 (“Performing data diagnostics.”) We are trying to cluster wholesalers that show similar behaviors in response to a pricing decision. This requires a two-stage process.

First, we prepare the data to run clustering algorithms by consolidating it at the wholesaler-product level and identifying a suitable time period to calculate variables such as:

  • Wholesaler size: total volume lift during the period
  • Price sensitivity: a correlation (discussed earlier) expressed either as a correlation value or categorical dummy variables
  • Profitability: unit profit earned from wholesaler; absolute value will be best to use in this case
  • Regional arbitrage: This quantifies the tendency to optimize transportation costs across different regions. It can be a correlation value or categorical dummy variables
  • Temporal preference: This quantifies the preference across time of day. It can be categorical dummy variables across morning, evening, and night
  • Lifting latency: average number of days between two liftings

Please note that the above variables are for illustration purposes only. Also, the time period should be defined by observing the variation in wholesaler behavior across multiple time periods and should be augmented with business fundamentals.
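To make the consolidation step concrete, here is a small sketch that rolls hypothetical transaction records up to the wholesaler-product level and derives three of the variables listed above: total volume, unit profit, and lifting latency. The record layout and the field names are assumptions, and using a volume-weighted average for unit profit is our choice for the illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (wholesaler, product, lift date, gallons, unit profit in cents/gallon).
transactions = [
    ("W1", "diesel", date(2019, 1, 2), 8000, 2.0),
    ("W1", "diesel", date(2019, 1, 4), 12000, 1.5),
    ("W1", "diesel", date(2019, 1, 8), 10000, 1.0),
    ("W2", "diesel", date(2019, 1, 3), 3000, 4.0),
    ("W2", "diesel", date(2019, 1, 13), 2000, 5.0),
]

def build_features(rows):
    """Aggregate transactions into one feature record per (wholesaler, product)."""
    grouped = defaultdict(list)
    for wholesaler, product, day, gallons, unit_profit in rows:
        grouped[(wholesaler, product)].append((day, gallons, unit_profit))

    features = {}
    for key, lifts in grouped.items():
        lifts.sort()  # chronological order
        total_volume = sum(gallons for _, gallons, _ in lifts)
        # Volume-weighted average unit profit over the period.
        unit_profit = sum(g * p for _, g, p in lifts) / total_volume
        # Average number of days between consecutive liftings.
        gaps = [(later[0] - earlier[0]).days for earlier, later in zip(lifts, lifts[1:])]
        latency = sum(gaps) / len(gaps) if gaps else None
        features[key] = {
            "total_volume": total_volume,
            "unit_profit": unit_profit,
            "lifting_latency_days": latency,
        }
    return features

features = build_features(transactions)
for key, record in features.items():
    print(key, record)
```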

The second stage is to run a hierarchical clustering algorithm on these data to plot a dendrogram (Figure 6 below) and select a suitable n-cluster solution. There are many comprehensive tutorials online on clustering code (link 1, link 2, link 3), so we will not elaborate on those here.

Figure 6: Plot of a dendrogram as a result of hierarchical clustering. Shown are a two-cluster and a four-cluster solution. Each horizontal green line on the right represents one wholesaler.

We selected hierarchical clustering because it is easier to interpret and can yield consistent, reproducible results. We could also have used some of the better-known approaches (the elbow method, the average silhouette method, the gap statistic method) to find the optimal number of clusters. Here are some links to implement these techniques in Python: link 1, link 2.
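For readers who want to see the mechanics, the sketch below implements a bare-bones agglomerative (bottom-up hierarchical) clustering with average linkage in plain Python. The six wholesalers, their feature values, and the choice of average linkage are assumptions for illustration. In practice a library routine such as SciPy's `scipy.cluster.hierarchy` module would normally be used, and it also draws the dendrogram.

```python
from math import dist  # available in Python 3.8+

# Hypothetical wholesalers: (average daily volume in gallons,
# price sensitivity score, unit profit margin in cents/gallon).
wholesalers = {
    "A": (9000, 0.9, 1.0),
    "B": (8500, 0.8, 1.2),
    "C": (4000, 0.5, 2.5),
    "D": (4500, 0.6, 2.2),
    "E": (800, 0.1, 4.5),
    "F": (1000, 0.2, 4.0),
}

def standardize(rows):
    """Scale each column to zero mean and unit variance so no feature dominates."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
        for col, m in zip(columns, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def agglomerative(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def avg_distance(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: avg_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

names = list(wholesalers)
points = standardize(list(wholesalers.values()))
clusters = agglomerative(points, n_clusters=3)
named_clusters = [[names[i] for i in members] for members in clusters]
print(named_clusters)
```

Standardizing first matters here because volume is measured in thousands of gallons while sensitivity lies between 0 and 1. Without scaling, distance would be driven almost entirely by volume.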

Step 3: Creating personalized price demand curves

In economics, a demand curve depicts the relationship between the price of a certain commodity (the y-axis) and the quantity of that commodity demanded at that price (the x-axis). Price demand curves are widely used across industries to perform personalized pricing.

Figure 7: An example of a demand curve shifting. D1 and D2 are alternative positions of the demand curve, S is the supply curve, and P and Q are price and quantity respectively. The shift from D1 to D2 means an increase in demand with consequences for the other variables

In this third step, we construct a demand curve for each wholesaler in order to estimate the right price for the quantity that the wholesaler is willing to buy. It builds naturally on the first two steps, because willingness to pay for each wholesaler varies by the wholesaler’s historical purchase behavior, present demand, market conditions, and other relevant variables. Figure 8 shows the significant differences in realized margins when a company uses the same underlying demand curve for flat pricing or for personalized (differentiated) pricing.

Figure 8: The margin differences between flat pricing and differentiated pricing using the same underlying demand curve.

Flat pricing (left) reduces realized margin in two ways. First, it undercharges wholesalers who are willing to pay higher prices. Second, it overcharges price-sensitive wholesalers, which means the company loses an opportunity to sell more volume. Personalized pricing (right) shows the influence on realized margin and achieved volume when wholesalers are given prices based on volume targets and their respective willingness to pay.
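A toy calculation can make the two effects tangible. The sketch below assumes three hypothetical wholesalers, each with a maximum premium they accept and a fixed volume they lift if the quoted premium does not exceed it. This all-or-nothing behavior is a simplification of a real demand curve, and the numbers are invented.

```python
# Hypothetical wholesalers: (highest premium accepted in cents/gallon, gallons lifted).
wholesalers = {"W1": (0.5, 10000), "W2": (1.0, 6000), "W3": (2.0, 2000)}

def flat_margin(premium):
    """Total margin in cents when every wholesaler is quoted the same premium."""
    return sum(
        premium * gallons
        for max_premium, gallons in wholesalers.values()
        if premium <= max_premium  # wholesalers quoted too high a price walk away
    )

# The best flat premium is always one of the wholesalers' acceptance levels.
best_flat = max((max_premium for max_premium, _ in wholesalers.values()), key=flat_margin)
flat = flat_margin(best_flat)

# Differentiated pricing quotes each wholesaler its own acceptance level.
differentiated = sum(max_premium * gallons for max_premium, gallons in wholesalers.values())

print(f"best flat premium: {best_flat} cents/gallon, margin: {flat:.0f} cents")
print(f"differentiated margin: {differentiated:.0f} cents")
```

In this toy setup, a flat premium of 1.0 loses W1 entirely (overcharging) while leaving money on the table with W3 (undercharging). Differentiated pricing avoids both.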

Demand curves are also handy when the prices need to be adjusted in real-time, given the fixed supply. They allow the seller to decrease prices when demand is low (to avoid “spoilage”) and increase prices when demand is high (to avoid “spill.”)

How to develop personalized demand curves

This follows a three-stage process: defining a modeling objective, preparing the data, and generating the curves. The modeling objective involves defining the exact price and quantity relationship at certain granularity to model. For example, we can model for a “cluster X product X depot” granularity. Our assumption is that wholesalers within a cluster for each product sold at each depot have similar price sensitivities. Modeling for similar wholesalers gives us more data points and hence, helps in identifying the most appropriate demand curve for wholesalers within that cluster. We also assume that different products and depot locations have different effects on wholesalers’ buying behavior due to varying competition for each.

We then need to prepare the data at the wholesaler level for a particular time period. The time period should generally start from the most recent data and cover multiple seasons. One can estimate the appropriate time period by modeling multiple time periods (e.g. 6 months, 1 year, 3 years, 5 years etc.) as training data, and then comparing model accuracies for each model (R-squared, in-sample or out-of-sample MAPE etc.). For our client in this case, the models were based on a two-year timeframe, because accuracies did not improve for longer periods. We then prepared over 300 variables for each cluster at ‘wholesaler X product X depot’ level.

To generate the demand curves, we used stepwise linear regression and random forest for variable selection. We then fitted a simple multivariate linear regression on the most important predictors with quantity as the dependent variable and price differential to competitor minimum price as the main independent variable.

We chose regression for its interpretability and ease of implementation. We also tried multiple transformations across variables. Out of the 30+ models that we built across all the clusters, a log transformation on quantity (log-linear models) gave better predictions and fit. To update model coefficients, we refresh the equations every month and train on a rolling two-year window of data.
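The log-linear form can be sketched with a single predictor: ln(quantity) = intercept + slope × price differential. The models described above were multivariate, so the one-variable fit below is a deliberate simplification, and the observations are hypothetical.

```python
from math import exp, log

def fit_log_linear(price_diffs, quantities):
    """Ordinary least squares fit of ln(quantity) = intercept + slope * price_diff."""
    ys = [log(q) for q in quantities]
    n = len(ys)
    mean_x, mean_y = sum(price_diffs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(price_diffs, ys)) / sum(
        (x - mean_x) ** 2 for x in price_diffs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, price_diff):
    """Predicted quantity at a given price differential to the competitor minimum."""
    return exp(intercept + slope * price_diff)

# Hypothetical history for one cluster-product-depot combination.
price_diff = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]        # cents per gallon
quantity = [13500, 11800, 10100, 8700, 7600, 6400]   # gallons per day

intercept, slope = fit_log_linear(price_diff, quantity)

# Evaluate the fitted curve at the price options available today.
price_options = [-0.5, 0.0, 0.5, 1.0]
forecast = [predict(intercept, slope, p) for p in price_options]
for p, q in zip(price_options, forecast):
    print(f"price differential {p:+.1f} -> predicted volume {q:,.0f} gallons")
```

With a log-transformed quantity, the slope reads as an approximate percentage change in volume per cent of price differential, which is part of what makes this form easy to explain to a sales team.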

Here are a couple of good links on modeling a demand curve using multivariate regression: link 1, link 2, link 3.

Figure 9 shows the final model accuracies and the list of significant variables. R-squared varies due to limited data availability on small wholesalers and due to other external factors that were not part of the model (e.g. traffic data, other global market indicators etc.)

Figure 9: Model accuracies and the list of significant variables

Interpreting the daily demand curves

In order to track daily demand, we evaluate each equation to see how demand would vary due to market conditions for that particular price setting. Figure 10 shows how daily demand curves look and how the theoretical maximum demand changes with respect to the main price variable.

Figure 10: Showing two value levers generated by correctly shaping the demand

If the prices are too high (more than 1.5 cents per gallon on the main price variable, the price differential to the competitor minimum), demand shifts to the competitors. Conversely, there is a minimum price point at which the maximum theoretical demand is reached, i.e. there is no reason to give wholesalers additional discounts below that point.

Step 4: Optimization

In case of petroleum wholesalers, the end goal of the first three steps is to provide one price for each wholesaler every time there is a price change. The wholesalers then choose to buy more, less, or not at all at the given price. Once we have the demand curves for each wholesaler, the job is still not finished, however. A demand curve is a list of price and volume pairs, but it does not say anything in isolation about margin effects.

Optimization finds the highest combination of ‘premium x volume’ across all wholesalers. In other words, optimization prioritizes wholesalers that are more profitable than others but, at the same time, takes into account that these wholesalers will generally have smaller volumes and may not contribute sufficiently to meeting the required sales volume target.

The optimization step looks beyond maximizing the objective function, which in this case is the overall profit margin. We also need to establish constraints such as volume targets, floor/ceiling premiums etc. as inputs to the optimizer. These constraints define the boundary conditions within which the optimization algorithm searches for a single optimized price for each wholesaler. Figure 11 below is a visual representation of a 3-step process of implementing an optimization algorithm:
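One way to write this problem down is shown below. This is an assumed formulation for illustration, not necessarily the exact model used. Here $w$ indexes wholesalers and $k$ indexes the available price options, $m_k$ is the premium of option $k$, $q_{wk}$ is the demand predicted in Step 3 for wholesaler $w$ at option $k$, $x_{wk}$ equals 1 if option $k$ is quoted to wholesaler $w$, and $V_{\min}$ and $V_{\max}$ stand for the sales volume target and the available inventory. Floor and ceiling premiums can be enforced simply by restricting which options $k$ are offered.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{w}\sum_{k} m_{k}\, q_{wk}\, x_{wk} \\
\text{s.t.}\quad & \sum_{k} x_{wk} = 1 \quad \text{for every wholesaler } w, \\
& V_{\min} \le \sum_{w}\sum_{k} q_{wk}\, x_{wk} \le V_{\max}, \\
& x_{wk} \in \{0, 1\}.
\end{aligned}
```

Because the $q_{wk}$ are fixed numbers read off the demand curves, both the objective and the constraints are linear in $x$, which is what allows a linear (integer) programming solver to be applied.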

Figure 11: Step-by-step process for optimization algorithm

We used linear programming for our optimization tasks for its simplicity and ease of implementation. Here are a couple of links to implement a linear optimization algorithm in Python: link 1, link 2, link 3.
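To show what the optimizer is asked to do, the sketch below solves a tiny instance by exhaustive search instead of linear programming. Enumeration is a stand-in that keeps the example dependency-free and easy to verify. It reaches the same optimum on a toy problem but would not scale to thousands of wholesalers, where a proper solver is needed. All names and numbers are hypothetical.

```python
from itertools import product

# Hypothetical inputs (illustrative numbers only).
premiums = [0.5, 1.0, 1.5]  # available price options, premium in cents/gallon
# Predicted gallons per wholesaler at each premium, as Step 3 would supply.
demand = {
    "W1": [10000, 6000, 1000],
    "W2": [5000, 4500, 2800],
    "W3": [2000, 1900, 1800],
}
volume_target = 9000   # minimum gallons that must be sold
inventory = 12000      # maximum gallons available

def optimize(premiums, demand, volume_target, inventory):
    """Pick one premium per wholesaler to maximize margin within volume bounds."""
    names = list(demand)
    best = None
    for choice in product(range(len(premiums)), repeat=len(names)):
        volume = sum(demand[name][k] for name, k in zip(names, choice))
        if not volume_target <= volume <= inventory:
            continue  # violates the sales target or the inventory limit
        margin = sum(premiums[k] * demand[name][k] for name, k in zip(names, choice))
        if best is None or margin > best[0]:
            quotes = {name: premiums[k] for name, k in zip(names, choice)}
            best = (margin, quotes, volume)
    return best

best = optimize(premiums, demand, volume_target, inventory)
margin, quotes, volume = best
print(f"margin: {margin:.0f} cents, volume: {volume} gallons, quotes: {quotes}")
```

In this instance the inventory limit is binding. Without it, W2 would be quoted the lower premium of 1.0 for a higher total margin, but the combined volume would exceed what is available.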

It is important to note that implementing the above four-step process requires a fully automated solution in which the applicability of each step is thoroughly examined before a full-scale launch. This precaution is necessary because client-specific constraints can render some of the components irrelevant and hence affect the feasibility and accuracy of the solution.

In the second part of this two-part series, we will apply a similar four-step approach to the petrochemicals business unit, which faces different challenges because prices are negotiated rather than determined on a dynamic, spot basis.
