How we used medicine searches to uncover health trends

Published in Tata 1mg Technology · 6 min read · Mar 3, 2017

As India’s foremost consumer health platform, 1mg powers ~100 Million healthcare searches for users in India every year. Users search for medicines or tests (prescribed by their doctors) on our platform for a variety of reasons:

To learn more about composition of a medicine, side-effects, usage, interactions and key things to keep in mind to maximize effectiveness of medication (India lacks a robust pharmacist infrastructure)
To discover cheaper substitutes (multiple brands exist for the same underlying drug salt often with wildly varying prices)
To order medicines or tests on our platform or chat with a doctor online

We’ve often wondered whether we could identify and, potentially, highlight some interesting public health trends from these user searches. Over the past couple of years, as our platform grew, doing so started to look increasingly feasible. Additionally, a couple of incidents really helped drive the realization home:

One morning during the monsoons of 2016, a colleague casually observed that bookings for the platelet count test (a key indicator for dengue) had surged on our platform. The local news outlets, however, were not yet reporting a major dengue outbreak. The trend on our platform persisted, and 2–3 days later the news worsened, with reports of hospitals overflowing with dengue patients.
Almost a year back, while looking at medicine searches in Hyderabad, we observed that searches for viral encephalitis drugs, as well as reminders being set for them (we have an adherence feature on our app as well), had gone up. While there was no public data available in this case, a signal like this bubbling up was indication enough for us to start thinking about Trends.

Towards fall of last year we built the first product on our data stack: 1mg-Trends. 1mg-Trends attempts to identify rising (and falling) trends based on what users are searching or browsing on our platform. It currently covers some of the most important entities on our platform such as medicines, drug-salts, OTC (over the counter) products, therapeutic classes etc.

Our work has been fairly effective in identifying public health trends on the platform. For example, we noticed a spike in searches for air-pollution masks in Delhi after a not-so-eco-friendly Diwali.

Similarly, during the most recent Valentine’s Day, sexual wellness items showed a considerable increase.

Other trends observed have been around seasonal influenza, broncho-dilators (during the cold season), etc.

As we scale, Trends could become a very useful input for public health, and we hope that the ecosystem will use this information to take timely action. These are still early days and there will, without doubt, be a few false alarms. Even so, we believe it is better to be prepared: time is often the most valuable currency, and it can be the difference between a controllable outbreak and a public health disaster.

How we built Trends

Data processing

We started out by building our data pipeline. We evaluated various options for data collection and eventually settled on API log processing. Every product page view is an API call, so the request log is as good as an event (only more verbose). There are several advantages to API log-based events (as opposed to client events):

Client independent
Multi-platform from the get go (as opposed to building event loggers for all platforms)
No data usage on the client
Capability of potentially capturing any interaction between users and us

Lua scripts (running inside Nginx via OpenResty) were used to push API request and response logs to a Redis queue that acts as a buffer.

Once we had the logs, we built a post-processing system. A daemon script popped log messages from the Redis queue, flattened nested objects, performed data sanitization and cleanup, and uploaded the processed logs to Amazon S3. From there, every day a “smart” script further processed this data to identify user sessions and actions, and stored browsing details for each user and entity in a database.
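The pop, flatten and sanitize steps of such a daemon can be sketched roughly as follows. This is a minimal illustration, not our production script: the dotted key format, the `flatten` and `drain` names, and the use of a `collections.deque` as a stand-in for the Redis queue are all assumptions made for the example.

```python
import json
from collections import deque


def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts/lists into one level with dotted keys (assumed format)."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {parent_key: obj}
    flat = {}
    for key, value in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def drain(queue, batch_size=1000):
    """Pop up to batch_size raw JSON messages and return flattened records.

    `queue` is a deque standing in for the Redis buffer (hypothetical);
    malformed messages are dropped as a very basic form of sanitization.
    """
    records = []
    while queue and len(records) < batch_size:
        raw = queue.popleft()
        try:
            records.append(flatten(json.loads(raw)))
        except json.JSONDecodeError:
            continue  # skip messages that are not valid JSON
    return records
```

A batch returned by `drain` would then be serialized and uploaded to S3; that step is omitted here because it depends on the storage client in use.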

We chose Amazon Redshift, a column-oriented database system that is great for performing quick aggregations on large amounts of data. We store the count of daily views of each entity (drug, salt, OTC product, article). In addition to these core entities, our medicine database also has a set of derived entities (e.g. the therapeutic class of a drug, or diseases). By exploiting Redshift’s relational model, we get the same kind of daily counts for derived entities by simply doing a join.
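To illustrate the join, here is a small sketch that rolls daily drug views up to therapeutic classes. It uses an in-memory SQLite database as a stand-in for Redshift, and the table names, column names and sample rows are hypothetical; only the shape of the query matters.

```python
import sqlite3

# In-memory SQLite stands in for Redshift; the schema below is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_views (day TEXT, drug_id INTEGER, views INTEGER);
    CREATE TABLE drug_class (drug_id INTEGER, therapeutic_class TEXT);
    INSERT INTO daily_views VALUES
        ('2017-02-01', 1, 120), ('2017-02-01', 2, 80), ('2017-02-01', 3, 40);
    INSERT INTO drug_class VALUES
        (1, 'antipyretic'), (2, 'antipyretic'), (3, 'bronchodilator');
""")


def class_daily_views(conn):
    """Aggregate per-drug daily views into per-class daily views via a join."""
    query = """
        SELECT v.day, c.therapeutic_class, SUM(v.views) AS views
        FROM daily_views AS v
        JOIN drug_class AS c ON c.drug_id = v.drug_id
        GROUP BY v.day, c.therapeutic_class
        ORDER BY v.day, c.therapeutic_class
    """
    return conn.execute(query).fetchall()
```

The resulting per-class counts have the same shape as the per-drug counts, so the same scoring step can be applied to both.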

Scoring

Once we had the daily counts of each entity for the past n days, we computed the top trends using a rolling Z-score metric. The Z-score accounts for the mean and variance of each entity’s own distribution, so two distinct entities can be compared fairly even if one of them is far more popular. Moreover, the rolling aspect of the metric gives more weight to recent values than to older ones, which is what we want when computing trends.
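One possible reading of this metric is sketched below. The exact formula is not spelled out above, so treat the details as assumptions: the sketch scores the latest day against the mean and standard deviation of the preceding `window` days (days outside the window get no weight at all), and the function names and the default window of 7 are made up for illustration.

```python
from statistics import mean, pstdev


def rolling_z_score(daily_counts, window=7):
    """Z-score of the latest count against the preceding `window` days."""
    if len(daily_counts) < window + 1:
        raise ValueError("need at least window + 1 daily counts")
    history = daily_counts[-(window + 1):-1]
    latest = daily_counts[-1]
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: no variance to compare against, treated as no signal here.
        return 0.0
    return (latest - mean(history)) / sigma


def top_trends(counts_by_entity, n=10, window=7):
    """Return the n entities with the highest rolling Z-score, highest first."""
    scored = [
        (entity, rolling_z_score(counts, window))
        for entity, counts in counts_by_entity.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

With this scoring, an entity that jumps from about 10 to 30 daily views outranks one that moves from about 1000 to 1020, even though both gained 20 views, because the jump is much larger relative to the smaller entity's usual variation.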

Optimization

Despite Redshift’s fast performance, our APIs were still slow. This was primarily due to additional joins and wasteful processing of data (why process every entity when only the top n need to be displayed?). To fix this, we processed everything on an ‘id’ basis. Instead of performing an expensive join to get entity details (such as name and tooltip details) or location details (pin-code to city and city to state mappings), we stored them in a Python dictionary. This was possible because this data does not change often, so we can afford to take a daily dump of it and store it in ‘pickles’ instead of querying for it every time the API is called.
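The lookup idea can be sketched as follows, assuming the daily dump is a plain dictionary keyed by entity id. The helper names (`dump_lookup`, `load_lookup`, `decorate`) and the field names are hypothetical and only serve the illustration.

```python
import pickle


def dump_lookup(mapping, path):
    """Write the daily dump of id -> details to a pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(mapping, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_lookup(path):
    """Load the pickled mapping once, e.g. when the API process starts."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def decorate(top_ids, entity_lookup):
    """Attach display details to (entity_id, score) rows without a database join."""
    return [
        {"id": entity_id, "score": score, **entity_lookup.get(entity_id, {})}
        for entity_id, score in top_ids
    ]
```

Only the top-n ids coming back from the scoring query are decorated, so the cost of adding names and tooltips stays constant per request regardless of how many entities exist. Pickle files should only be loaded from a trusted source, which holds here since the service writes them itself.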

The Trends home page involves fetching the trends for 10 different entities (half of which require further joins). To keep this fast, we introduced a Redis-backed caching layer that caches the response of every API call, keyed on the request parameters. This ensures that frequently viewed trends are returned quickly and do not burden the database. Moreover, we felt that the Redis instance mentioned above (the one that acts as a buffer for unprocessed logs) was underutilized and deserved another job.
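A parameter-keyed cache along these lines might look like the sketch below. The key format, the one-hour expiry and the `DictCache` class are assumptions for illustration; `DictCache` is an in-memory stand-in that mimics the `get` and `setex` calls of a Redis client so the example runs without a server.

```python
import hashlib
import json


class DictCache:
    """In-memory stand-in for a Redis client (ignores expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def setex(self, key, ttl_seconds, value):
        self._store[key] = value  # a real Redis would expire this after ttl_seconds


def cache_key(endpoint, params):
    """Build a key that is identical for the same parameters in any order."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return f"trends:{endpoint}:{digest}"


def cached_trends(cache, endpoint, params, compute, ttl_seconds=3600):
    """Return the cached response if present, else compute and store it."""
    key = cache_key(endpoint, params)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = compute(params)
    cache.setex(key, ttl_seconds, json.dumps(result))
    return result
```

Sorting the parameters before hashing means that two requests differing only in parameter order share one cache entry, and an expiry aligned with the daily refresh keeps stale trends from lingering.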

To give Trends a 2K17 look, we added an experimental voice search functionality to it. Tooltips were also added on the home-page over the entity names so that average users can easily access basic information about the top trending entities.

Future Work

While we are excited about the first Trends product it’s clear that there’s a lot to be done.

An immediate area of work is to move from absolute search numbers to relative ones. This would let our trends compensate for overall traffic growth, instead of showing everything as trending just because site traffic is growing at a certain clip.
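One simple way to do this, sketched here purely as an assumption about how it could work, is to divide each entity's daily count by the total searches on the platform that day and score the resulting shares instead of the raw counts. The function name is hypothetical.

```python
def relative_counts(entity_counts, total_counts):
    """Convert absolute daily counts into each day's share of total searches."""
    if len(entity_counts) != len(total_counts):
        raise ValueError("both series must cover the same days")
    return [
        count / total if total else 0.0
        for count, total in zip(entity_counts, total_counts)
    ]
```

An entity whose searches double while overall traffic also doubles keeps a constant share, so it would no longer register as trending.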

A further area of investigation is to automatically detect the reason behind an entity’s trend by correlating it with data from other sources such as weather, seasonality or marketing. For example, a cursory Google search will reveal that seasonality can cause a spike in respiratory disorders and hence a corresponding increase in searches for related medicines.

Another potential area is to increase the time span of the trends being surfaced. We intend to explore techniques and algorithms to identify key trends across the whole year (e.g. spring was allergy season and fall was flu season).

Give 1mg Trends a spin here and share your feedback in comments.