Published in Geek Culture

A data scientist’s guide to storytelling

If you want to convince and dazzle others with your cool, data-science-derived insights, you need to master the craft of storytelling.

Storytelling with data is an art that can be learnt and evolved (picture credits @rainbennet, unsplash.com)

Who doesn’t like a good story? From J.K. Rowling and Stephen King to Michio Kaku and Neil deGrasse Tyson, the massive influence of storytellers of every kind can hardly be overstated. Their masterful stories drive home a key point: unless you get the audience hooked from the word go and keep them engaged throughout, the message you are trying to convey, however cool it may be, will fall on deaf ears. The typical human mind shies away from disparate pieces of information but seamlessly tunes into cogent stories.

Data science is no exception to this rule. Consider our data scientist who has worked for months, including weekends at times, to derive new, cool and actionable insights from a data set fraught with issues of quality and quantity. After nailing this challenging intellectual exercise, she proceeds to present the results to her manager, a non-technical professional. Alas, the manager appreciates neither the creativity of the data scientist’s solution nor the enormous challenge posed by the data set. Here, we ask the question: “What did the data scientist or the manager miss?” Since we are not discussing manager training in this article, let’s focus solely on the data scientist.

The above scenario is all too common. Without persuasive storytelling that highlights the challenges with the data set and the ingenuity of the solution, a data scientist will find it difficult to generate enthusiasm in colleagues, managers or stakeholders. More dangerously, a non-technical person who understands bits and pieces of information stripped of their proper context might come up with their own erroneous interpretation. So, how does a data scientist rise to become a master storyteller who can influence others?

As with acquiring any new and valuable skill, this will take lots of deliberate practice over time. The fine print worth pondering here is the word ‘deliberate’: it is deliberate practice that makes perfect. Contrast it with blind practice, which only makes permanent, not perfect!

Here are a few pointers to help you get started:

1. Leverage the power of three-act storytelling:

Act I: State in the clearest terms possible the problem you are trying to solve or are working on. Let’s pick a few real-world machine learning examples to illustrate this point:

(a) a computer vision solution for identifying COVID-19 in chest radiographs,

(b) identifying credit card fraud, and

(c) predicting the actual doomsday year and month (Ouch!), aimed at those looking to reserve their spot on Elon Musk’s SpaceX spaceship bound for Mars. The list goes on…

Act II: Tell your readers or audience exactly why solving the problem is a big deal. Applied to the examples above, this translates to:

(a) Diagnosing COVID-19 solely with molecular tests can take days and can turn up false negatives up to 30% of the time. Importantly, in several regions of the world, X-ray diagnostics are easier to access than molecular diagnostics. These delays, inconclusive test results and access issues all negatively impact patient treatment, especially in a resource-starved public health setting. How can we leverage X-ray diagnostics to solve these problems and potentially save hundreds of lives and millions of dollars in treatment?

(b) More than half a million cases of credit card fraud happen annually, costing banks and customers at least 10 billion dollars in the process. How do we detect fraud the instant it occurs and prevent further, irreparable damage?

(c) Take this one with a pinch of salt: Earth has a population of around 8 billion, and predicting a potential doomsday with any accuracy would allow us to better prioritize the colonization of Mars. We know the upper limit to Earth’s existence to be another 7.5 billion years, after which it will end up being consumed by our Sun. However, other dangers, including global warming, widespread nuclear annihilation, a terrifying global epidemic of a killer virus or an unwelcome tryst with a massive asteroid, could end all humanity on the planet much earlier. Could we predict any of this with some certainty and turbocharge Elon Musk’s Mars colonization dream?

Act III: Finally, what solution can you offer to the problem? How much money, how many work hours or how many lives is your solution going to save? Once again, building on our examples:

(a) Using radiographs, your novel deep-learning-based computer vision solution detects COVID-19 with 95% accuracy. Moreover, it distinguishes COVID-19 lung abnormalities from other, confounding ones 90% of the time. When deployed properly with digital X-rays, it can be used to triage high-risk patients for early, aggressive treatment. After factoring in other relevant COVID-19 population data, you translate these accuracies into the number of lives saved. For instance, your solution will prevent two out of every ten COVID-19 deaths caused by delays in initiating aggressive treatment protocols.

(b) Your new credit card cybersecurity system will reduce credit card fraud by anywhere between 5% and 10%. While the percentages may seem relatively small, they translate into preventing up to 1 billion dollars from being stolen.

(c) Your highly sophisticated, probabilistic doomsday prediction model has a 10-year margin of error. Since this will help fine-tune spending strategies and resource allocation around Mars colonization, it will save the government 50 billion dollars annually. On a lighter note, it will also make sure Elon Musk remains one of the hottest entrepreneurs on planet Earth (or planet Mars?).

Please note that the numbers quoted above are purely fictitious and serve only to illustrate our approach to storytelling.
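To make the Act III arithmetic for the fraud example concrete, here is a minimal Python sketch. It simply multiplies the illustrative annual loss figure from Act II (10 billion dollars, itself fictitious) by the claimed 5% and 10% reductions. The function name `dollars_protected` is hypothetical, introduced only for this example.

```python
def dollars_protected(annual_fraud_losses: float, reduction_fraction: float) -> float:
    """Return the dollars kept out of fraudsters' hands when fraud falls by reduction_fraction."""
    return annual_fraud_losses * reduction_fraction


# Illustrative (fictitious) figure from the example: at least 10 billion dollars lost per year
ANNUAL_FRAUD_LOSSES = 10_000_000_000

for reduction in (0.05, 0.10):
    saved = dollars_protected(ANNUAL_FRAUD_LOSSES, reduction)
    # Prints "5% reduction -> $500,000,000 protected", then the same line for 10%
    print(f"{reduction:.0%} reduction -> ${saved:,.0f} protected")
```

The 10% case yields the 1 billion dollars quoted in the fraud example; a headline dollar figure like this tends to land far better with a non-technical audience than the bare percentage.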

2. Use language that non-technical stakeholders can understand. Keep this tip front and center when preparing to communicate with stakeholders from other, less technical fields. An eternally safe strategy is to stay away from highly technical, “data-scientists-only” jargon such as the F1-score (something to do with F1 racing?), hyper-parameter tuning (hyper about what?) or R-squared (what even is this R thingy?). Instead, try hard to translate these metrics into language the other side will easily understand: for every dollar invested in your solution, how many dollars will the company profit? How many lives will we save? Or even, how many work hours will we protect from being lost?
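As a sketch of that translation, the snippet below turns the same confusion-matrix counts that feed an F1-score into a “profit per dollar invested” figure for a fraud detector. Everything here is an assumption for illustration: the counts, the average loss per fraud, the cost of manually reviewing a false alarm, the cost of the solution, and the helper names (`ConfusionCounts`, `profit_per_dollar`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_positives: int   # frauds the model caught
    false_positives: int  # legitimate transactions wrongly flagged
    false_negatives: int  # frauds the model missed


def f1_score(c: ConfusionCounts) -> float:
    """The 'data scientists only' view: harmonic mean of precision and recall."""
    precision = c.true_positives / (c.true_positives + c.false_positives)
    recall = c.true_positives / (c.true_positives + c.false_negatives)
    return 2 * precision * recall / (precision + recall)


def profit_per_dollar(c: ConfusionCounts, avg_fraud_loss: float,
                      review_cost: float, solution_cost: float) -> float:
    """The stakeholder view: dollars of profit for every dollar invested."""
    losses_prevented = c.true_positives * avg_fraud_loss  # money that is not stolen
    review_overhead = c.false_positives * review_cost     # staff time spent on false alarms
    # Missed frauds are left out: they would have happened without the model too
    profit = losses_prevented - review_overhead - solution_cost
    return profit / solution_cost


# Hypothetical numbers, for illustration only
counts = ConfusionCounts(true_positives=800, false_positives=400, false_negatives=200)
print(f"F1-score: {f1_score(counts):.2f}")
print(f"Profit per dollar invested: ${profit_per_dollar(counts, 500, 25, 100_000):.2f}")
```

With these made-up inputs the script reports an F1-score of 0.73 and $2.90 of profit per dollar invested; the second number is the one a non-technical stakeholder can act on.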

3. Use compelling data visualizations. Be it a politically stirring New York Times piece, a nerdy post on Nate Silver’s 538 blog or a scientifically provocative New Scientist article, great data stories usually come with carefully crafted data visualizations. You can find several excellent resources on making captivating data visualizations on the web, on Coursera or at your university library; Edward Tufte’s books have achieved something of a cult status on the subject.

While I would hate to deprive you of all the fun of learning data visualization, here are a few salient points to keep in mind: (a) choose your color maps meaningfully while being kind to people with color blindness, (b) avoid redundant labelling in thy plots and (c) pick the right kind of plot to emphasize the point you are trying to ferry across. A hack I often used in the beginning was to skim through the galleries of popular Python visualization packages. The additional benefit of choosing from seaborn’s and bokeh’s gallery pages is that each gallery specimen gives you the code that generates the plot; all you need to do is copy and paste the code into your programming environment, then ‘massage’ it to suit your exact needs!
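Here is a minimal matplotlib sketch of points (a) and (b), assuming matplotlib is installed. It uses matplotlib’s built-in `tableau-colorblind10` style for a color-blind-friendly palette, gives each line a distinct marker so the series differ by shape as well as color, and skips an x-axis label because the month names already say it all. The monthly loss figures and the output filename are made up purely for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up monthly fraud losses, in millions of dollars (illustration only)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
without_system = [850, 870, 860, 880, 875, 890]
with_system = [850, 860, 800, 790, 780, 770]

plt.style.use("tableau-colorblind10")  # palette designed to stay distinguishable for color-blind viewers

fig, ax = plt.subplots(figsize=(6, 3.5))
# Distinct markers mean the two lines differ by shape, not just by color
ax.plot(months, without_system, marker="o", label="Without new system")
ax.plot(months, with_system, marker="s", label="With new system")
ax.set_ylabel("Fraud losses (million USD)")  # no x-label: the month names speak for themselves
ax.set_title("Monthly credit card fraud losses")
ax.legend(frameon=False)
for side in ("top", "right"):
    ax.spines[side].set_visible(False)  # remove borders that carry no information
fig.tight_layout()
fig.savefig("fraud_losses.png", dpi=150)
```

For heatmaps, a perceptually uniform colormap such as `viridis`, or seaborn’s `colorblind` palette for categorical data, serves the same goal.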

4. Summarize your insights. All good things come to an end, and your data stories are no exception. One effective way to draw the curtains is to show a plot of the progress you made at regular intervals during your data science adventure, summarizing the key takeaways from your story.

5. Open the door to questions and feedback. A crucial, growth-mindset way to advance your storytelling skills is to solicit feedback. Not all comments will be helpful or even well-intentioned, but you are likely to find the seeds of new ways to improve your craft. Remember we agreed on deliberate practice at the beginning of this essay?

Embrace the elegance and power of storytelling with data, practice and evolve, and may the force be with you!!!


A new tech publication by Start it up (https://medium.com/swlh).
