Expanding physician reach for biopharma - a case using machine learning

Gagan Bhatia
Jul 8, 2020
Photo by Luis Melendez on Unsplash

Discovering new growth opportunities is a key strategic question during the growth phase of a product lifecycle. Marketers often face the challenge of expanding reach to potential prescribers while still attaining the required ROI on their spend. Hence, there is strong motivation to improve data-driven targeting and reach-expansion methods. In what follows, we introduce a relatively novel method for biopharma to reach a large prescriber base using machine learning based look-alike modeling.


Traditional methods for finding targets have included looking at growth potential using market sales or the relevant patient base, and/or building regression models. Essentially, the experience with a smaller audience is extrapolated to a larger audience using basic calculations. While these methods are commonly used, they often cannot capture the full effect of the broad range of factors now available across multiple types of data.

Reach expansion based on static demographic or location data is not new and is routinely used in other industries such as retail and CPG. However, applying it in biopharma requires deeper domain understanding and attention to the nuances of the data. In addition to the static data, we built the model using physician touchpoint data such as call and sample volume. We applied this approach to a company's brand that was in its growth phase and achieved relatively high model accuracy (about 80% on a hold-out set) while maintaining explainability.

Case study


The brand had captured the initial launch uptake, along with a few physicians in segments not initially targeted. The goal was to see whether this small adoption could be converted into a larger prescriber base. The expected outcome was a prioritized list of prescribers, characterized by segment.



In look-alike modeling, a known seed audience is typically chosen first. In our case, the seed audience is the set of physicians who have previously prescribed the brand. These physicians differed widely in patient mix, prescription frequency, other treatments used, demographics, and various other factors. Finding similarities within the seed group using descriptive analytics across so many factors (many of which showed positive or negative correlation) was a formidable task.


We used private data sources such as insurance claims, sales, and managed care favorability. Additionally, internal promotional activity data provided the level of promotion to each physician, including touchpoints such as call volume and sample distribution. We also used publicly available data, such as the Google search trend for the brand, to capture geographic effects.


We used a machine learning based look-alike model: a binary classification model with two classes, physicians who prescribe (Class 1) and physicians who do not prescribe (Class 0). We trained about 10 ML algorithms on 50+ initial variables, including buckets of physician demographics, patient mix, patient comorbidities, payer mix, and promotional activity. Physicians' prescribing behavior for competitive brands and their early adoption of new products were also considered (and turned out to be important factors).
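
As a rough illustration of comparing several algorithms on the same two-class problem, here is a minimal sketch using scikit-learn. Everything in it is an assumption made for illustration: the data is synthetic (generated by `make_classification`), the three candidate algorithms are arbitrary picks, and cross-validated AUC is just one reasonable way to compare them. It is not the modeling pipeline used in the case study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a physician feature table (hypothetical data):
# 50 features, with prescribers (class 1) as the minority class.
X, y = make_classification(
    n_samples=1000, n_features=50, n_informative=10,
    weights=[0.8, 0.2], random_state=0,
)

# Three illustrative candidates; the article does not name its algorithms.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate with the same 5-fold cross-validated AUC.
cv_auc = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Print the candidates from best to worst.
for name, auc in sorted(cv_auc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean CV AUC = {auc:.3f}")
```

Scoring all candidates with the same folds and the same metric keeps the comparison like for like, which matters when the final choice also has to balance accuracy against explainability.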


We held a set of physicians out of model training (the test group). In hold-out testing, the model correctly classified prescribers and non-prescribers with an accuracy of 80%. Other outputs, including AUC and feature importance, were also compared across models.
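
To make the two hold-out metrics concrete, the sketch below computes accuracy and AUC from scratch on a tiny, made-up hold-out set. The labels, scores, and the 0.5 threshold are all hypothetical; the point is only to show what each metric measures.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Share of physicians whose thresholded score matches the true class."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random prescriber outscores a random non-prescriber.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up hold-out set: 1 = prescriber, 0 = non-prescriber.
holdout_labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
holdout_scores = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.3, 0.2, 0.8, 0.5]

print("accuracy:", accuracy(holdout_labels, holdout_scores))
print("AUC:", roc_auc(holdout_labels, holdout_scores))
```

Accuracy depends on the chosen threshold, while AUC only depends on how the scores rank prescribers against non-prescribers, which is one reason to look at both.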

Outcome / Takeaways

Business Impact

We used the model to classify more than 20,000 prescribers (an unseen universe of physicians who had not prescribed the brand before). For each prescriber, we also derived a propensity score from the model, which enabled us to identify the top prescribers: those with the highest propensity to prescribe the brand in the coming period.
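
Turning propensity scores into a priority list can be sketched as a simple sort. The physician IDs and scores below are invented, and the scores are assumed to be the classifier's predicted probability of Class 1; the helper name `top_targets` is hypothetical.

```python
def top_targets(propensity_by_physician, n):
    """Return the n physician IDs with the highest propensity, best first."""
    ranked = sorted(
        propensity_by_physician.items(), key=lambda kv: kv[1], reverse=True
    )
    return [physician_id for physician_id, _ in ranked[:n]]


# Hypothetical propensity scores (predicted probability of prescribing).
scores = {"P001": 0.12, "P002": 0.81, "P003": 0.47, "P004": 0.93, "P005": 0.30}

target_list = top_targets(scores, n=2)
print(target_list)
```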

We successfully created a target audience of the top 2,500 prescribers. This gave the company a priority list for planning its sales and marketing campaigns. While this needs to be measured further, our initial estimate using actual historical data gave an uplift in expected response from the target audience of 1.5–2x compared to purely volume-based targets and 5x compared to a randomly sampled audience.
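
One simple way to express such an uplift is as a ratio of response rates between two audiences. The sketch below assumes that definition and uses invented outcomes for ten physicians; the numbers it produces are not the case study's results, and the function names are hypothetical.

```python
def response_rate(responded, audience):
    """Fraction of the audience that responded (prescribed) in the period."""
    return sum(responded[p] for p in audience) / len(audience)


def lift(responded, target, baseline):
    """Response rate of the target audience relative to a baseline audience.

    Assumes the baseline has at least one responder (non-zero rate).
    """
    return response_rate(responded, target) / response_rate(responded, baseline)


# Invented historical outcomes: 1 = prescribed in the period, 0 = did not.
responded = {"A": 1, "B": 1, "C": 0, "D": 1, "E": 0,
             "F": 0, "G": 0, "H": 1, "I": 0, "J": 0}

model_targets = ["A", "B", "D", "E"]    # picked by propensity score
volume_targets = ["C", "E", "F", "H"]   # picked by patient volume only
everyone = list(responded)              # proxy for a random sample

print("lift vs volume-based:", lift(responded, model_targets, volume_targets))
print("lift vs random:", lift(responded, model_targets, everyone))
```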


  • We ran a feature importance analysis on the model outcome and found that the top features affecting prescribing propensity were high patient volume, certain comorbidities in the patient mix, and the physician’s previous adoption of new drugs. High-volume physicians had more opportunities to prescribe the brand, which is why patient volume came up as one of the top features. However, the model was able to capture the complex relationship between prescribing propensity and patient volume in the presence of various other factors. In other words, the highest-volume physicians were not necessarily the highest-propensity prescribers (such effects are hard to capture in predominantly patient-volume-based modeling).
  • Although we used a wide variety of prescriber features, some behavioral elements were not fully captured, or were limited by missing or sparse data, e.g. the physician’s underlying belief in the effectiveness of the brand. However, we were confident in the model given the overall 80% accuracy.
  • While this study was done before the current pandemic, one can imagine environmental or geographic factors that affect a physician’s patient volume, prescribing behavior, or willingness to try something new. These could also be captured in the data, for example through prescribing in other markets or the severity of the pandemic in the physician’s geography. Some of this data is now publicly available to draw features from.
  • The model showed high sensitivity to promotion or touchpoints, as most prescribers had some type of touchpoint. This inflated the propensities when a future promotion assumption was applied. However, when similar promotion levels were assumed, the relative value of prescribers was correctly established. We will talk more about this in future blogs on multi-touch analysis to optimize promotional touchpoints for physicians.
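
The feature importance analysis mentioned in the takeaways can be sketched with permutation importance, which measures how much hold-out AUC drops when one feature is shuffled. This is one common technique and an assumption on our part here, not necessarily the method used in the study. The data is synthetic, and the feature names and the "ground truth" relationship are invented so that three features matter and one is pure noise.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
feature_names = ["patient_volume", "comorbidity_share", "early_adopter", "noise"]
X = np.column_stack([
    rng.gamma(2.0, 50.0, n),   # patients seen per period
    rng.uniform(0.0, 1.0, n),  # share of patients with a relevant comorbidity
    rng.integers(0, 2, n),     # 1 if the physician adopted past launches early
    rng.normal(0.0, 1.0, n),   # pure noise, unrelated to the outcome
])

# Invented ground truth: the first three features drive prescribing.
logit = 0.02 * X[:, 0] + 3.0 * X[:, 1] + 2.0 * X[:, 2] - 4.5
y = (rng.uniform(0.0, 1.0, n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature on the hold-out set and record the mean AUC drop.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)
importance = dict(zip(feature_names, result.importances_mean))

for name, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean AUC drop = {value:.4f}")
```

Because the importance is measured on held-out data, a feature that the model merely memorized during training should not score highly, which makes the ranking easier to explain to a business audience.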

Until then, stay tuned!

Gagan Bhatia is the founder and Principal at DataStride Solutions, an analytics-based strategic consulting firm focused on biopharma. This article was originally published at www.datastridesolutions.com

Anshuman Lall, PhD, is the founder of Predmatic, a data science and artificial intelligence consulting firm that provides high-impact, scalable business solutions.