Dr. Bing Is Seeing You — Right Now
A NY Times article published yesterday, titled “Microsoft Finds Cancer Clues in Search Queries,” described how Microsoft researchers asked whether analyzing specific healthcare-related web searches could predict the onset of pancreatic cancer before it is diagnosed. The story is likely to get lost in the noise of the news cycle, but it is worthwhile and very significant.
The problem with acquiring data
When all is said and done, the “problem” is that diagnosing a condition requires data, but that data is not easily accessible and, more often than not, is not even available. Any physician will tell you that one of the key things they learn in med school is differential diagnosis (DDx): the only way to arrive at a set of potential diagnoses is by asking questions. Asking questions takes time. Life-threatening conditions are complicated by nature and require even more questions. Now weigh the physician’s need to spend enough time with a patient to ask all these questions against what she can expect to get paid for that time. Therein lies the human problem: the tradeoff of time vs. money.
What the NY Times article describes is a possible way to remove time as a constraint. What if patients were already asking their questions, online, at the moment they felt their symptoms? And maybe, like me, you are asking:
- Can this actually work?
- I get the possible value, but how would I find out about a result? And what about false positives?
- Should I be worried about privacy?
The opportunity
You might remember from some time ago how Target got into hot water because it sent a family coupons for diapers and other baby-related items. The paterfamilias was incensed until it came to light that his teenage daughter was pregnant. You can read more about that story here but the key takeaway was the question: “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?” Leaving aside the “not wanting us to know” bit, isn’t that exactly the door this data-driven approach opens?
The NY Times article starts asking a similar question: “If we heard the whispers of people online, would it provide strong evidence or a clue that something’s going on?” Microsoft started by asking powerful questions.
Pancreatic cancer is a terrible disease with little warning and no recourse. The story of Randy Pausch and his last lecture is legendary. But the data shows this best. Edward Tufte put together a graphic that shows survival rates by type of cancer, ordered by decreasing likelihood of survival. Pancreatic cancer is at the very bottom.
The further challenge with diagnosing this disease is that “it typically produces a series of subtle symptoms, like itchy skin, weight loss, light-colored stools, patterns of back pain and a slight yellowing of the eyes and skin that often don’t prompt a patient to seek medical attention.” [Source]
What the researchers at Microsoft have done is document a way to aid in the diagnosis of pancreatic cancer using user-generated search terms collected over a period of time.
What if you could do the same across any disease?
The challenge
It all comes back to data and asking the right questions.
- Who captures it?
- Should they capture it?
- Should they do something with it?
- What about false positives?
- How is it communicated?
- Should it be communicated?
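To make the false-positive question concrete, here is a minimal sketch of the base-rate arithmetic behind it, using Bayes' rule. Every number below is a made-up placeholder chosen only for illustration; none of them comes from the Microsoft study or the NY Times article, and the function name is my own.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Share of flagged searchers who actually have the disease (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, NOT taken from the study:
# 1 in 10,000 searchers has the disease, the signal catches 10% of them,
# and 1 in 10,000 healthy searchers is flagged by mistake.
ppv = positive_predictive_value(
    prevalence=0.0001,
    sensitivity=0.10,
    false_positive_rate=0.0001,
)
print(f"Share of flagged users who are truly sick: {ppv:.1%}")
```

Under these assumed numbers, most flagged users would be healthy even though the false-positive rate looks tiny. That is why the questions about how, and whether, a result gets communicated matter so much for a rare disease.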
The number of questions this can generate is enormous. The fundamental question is whether to even start going down this rabbit hole.
The answer is a resounding yes.
There are certainly open questions to be answered, but let’s not let them stand in the way of saving lives, especially if that is possible via passive data collection. I already appreciate getting a calendar update for an upcoming trip based on an email from an airline. This is so much more than that.
The future
The future is fraught with challenges, of course. Google Flu Trends debuted with much fanfare, but it ran into problems and was shut down. No doubt more data-driven population health innovations will struggle at first. That doesn’t mean we should stop pushing these initiatives forward.
Challenges notwithstanding, I hope you share my enthusiasm that this data-based but targeted approach is something the search and research communities should embrace.