Using “Star Trek” to help understand what “data storytelling” means — and how you can do it, too

Casey Doyle
Data Science at Microsoft
Jan 11, 2022

Photo by the author

It’s common for data scientists to hear about the need for “data storytelling.” But what does that mean? Put simply, it means — contrary to what is often said — that data doesn’t speak for itself: It requires humans to construct a narrative about it, to tell a story about it.

Telling stories comes naturally to human beings. This ability helps us make sense of the world around us. But too often, data scientists may become so focused on the details of their analysis — or so mindful of the challenges they have overcome — that they talk only about those aspects of their work, forgetting to put the work in the context of why it was likely commissioned in the first place: to solve a business problem.

This is not trivial. Data scientists who can’t communicate the business impact of their work — typically by describing the “So what?” and “Now what?” in addition to the “What” — may find their careers unintentionally limited, no matter how great their technical expertise. But constructing a relevant, meaningful narrative about the business impact of their work is only part of what is needed. To be successful, data scientists must understand all the parts of their story and how each part appeals — or does not appeal — to different audiences and stakeholders, and then deliver accordingly.

Any data science deliverable, whether it is primarily long-form writing, a presentation, a talk track built to accompany the unveiling of an ML model, or something else, can benefit from a defined narrative structure. In this article I share ways to think about building a data science storyline using an example of narrative structure from Star Trek, and I suggest ways to approach writing what may be its most necessary and significant, yet most often overlooked, part: the executive summary.

A data science storytelling framework

Just as Shakespeare wrote his stage plays in five acts, a data science story has five main structural components: data exploration, analysis, findings, conclusions, and recommendations. A “wrapper” that supplements this core, consisting of an introduction, appendixes, and (crucially) an executive summary, rounds out a complete data science storyline.

Figure 1: Data storytelling framework.

I start with the core five sections not only because they are central to a well-told data science story, but also because they must be constructed before the introduction, appendixes, and all-important executive summary can be put together.

The five core sections

To illustrate the five core sections of the data science storytelling framework, I follow a typical Star Trek storyline. In a classic episode from the original series, “The Devil in the Dark,” the Enterprise arrives at mining planet Janus VI to investigate the recently impaired production of a key mineral vital to the populations of other worlds. As the show unfolds, Kirk, Spock, McCoy, and their fellow crewmembers follow a story structure used in many Star Trek episodes that also serves to illustrate a meaningful data science narrative structure.

First, they explore the data as presented to them by the miners on the planet: Ever since reaching a particularly deep subterranean level, some equipment has been damaged and some miners have been killed, leaving only acid-charred remains. Some miners claim to have seen a “monster” lurking near the carnage. Fearing for their lives, the miners have been reluctant to venture back to the levels where the deaths are occurring, reducing production of the mineral. Also in that vicinity, the miners say they have discovered thousands of spherical silicon nodules — unique in form and composition, but of no commercial value — that they have not encountered elsewhere.

With these data points in mind, Kirk and Spock investigate the deep tunnels for clues to help them analyze the situation. They confirm that deaths are occurring at the lower levels when one of their own crewmembers is killed there. They confirm the presence of the silicon nodules and also notice that many have been smashed by the miners. They also observe that additional tunnels have recently been cut into the rock at the lower levels — but not by the miners. Ultimately, they encounter the creature glimpsed by the miners, and after a brief confrontation ends in a draw, Spock establishes telepathic communication with it, learning about the situation from the creature’s point of view — and even that the creature calls itself a Horta, indicating intelligence.

From their analysis, including the information gleaned from Spock’s telepathic communication, they come to their findings: The Horta is the last of her kind, a sentient being and natural maker of tunnels through deep rock who has laid a large number of eggs at the lower levels — the spherical silicon nodules — and is watching over them. Enraged by the miners destroying her progeny and doing what she can to stop them, she has been attacking and killing the miners and destroying their equipment.

Based on these findings, Kirk and Spock come to their conclusions: If they can stop the miners from destroying the Horta’s eggs, the killings will stop, and mineral production can resume. Furthermore, because the Horta is a natural burrower — and because thousands of her eggs are about to hatch — they could possibly reach an accord for the miners to make use of the tunnels naturally made by the Hortas in the course of their lives while otherwise leaving each other alone. This would provide the miners with even greater sources of the mineral as well as other valuable raw materials that would be too difficult to reach otherwise.

Finally, based on their conclusions, the Enterprise team makes their recommendations to the miners: Stop destroying the silicon nodules, make peace with the Horta, welcome her offspring when they emerge, and seek a mutually beneficial relationship in which the Hortas live in peace as they tunnel through the depths, making minerals more accessible to the miners along the way. With Spock acting as intermediary, the miners and the Horta come to an agreement, and then the Enterprise departs, with Kirk, Spock, McCoy, and other crewmembers having succeeded not only in resuming mineral production, but also increasing and expanding it while protecting and respecting the Horta and her coming offspring.

The importance of a complete narrative

Now imagine for a moment that the Enterprise crew did what many data scientists do: stop at their findings or, even less helpfully, keep their focus on how they did their analysis instead of moving on to conclusions and recommendations. The miners would learn only the following: (1) extensive subterranean cave exploration by Kirk and Spock revealed a creature that burrows tunnels in the deep rock and was recently driven to kill by the destruction of her eggs, and (2) Spock was able to establish telepathic communication with the creature by having the courage and fortitude to make physical and mental contact with it.

In this case, the creature’s existence and motivation, while interesting, provide an explanation for what the miners have experienced but no deeper insights that could result in a direction to take for the miners to remedy the situation. And the details of Spock’s telepathic communication — no matter how much of a significant personal risk and display of abilities for Spock — are irrelevant to the miners. It is only when Kirk and Spock explain their conclusions (the meaning of their findings, namely their opinion based on their findings that there is an opportunity for mutually beneficial cooperation) and make their recommendations (what they believe the miners should do based on their conclusions, namely to come to an agreement that gives both sides what they want) that the miners can take action to solve the problem they called on the Enterprise to help with in the first place.

What works as a narrative structure in Star Trek also works in data science. A complete data science storyline requires explanation not only of the What (the findings), but also of the So What (the conclusions) and the Now What (the recommendations). As for how much to focus on the How (the details of the analysis itself), it has its place and its audience, but that place is not everywhere and that audience is not everybody, as I explore next.

Creating a wrapper for your data science story

The five core components I’ve walked through above that are critical for a complete data science narrative can be further enhanced by the addition of an introduction, appendixes, and — most importantly — a well-crafted executive summary.

Although the finished data science deliverable, when finally presented, starts with an executive summary that is followed by an introduction, these two sections are not the parts of the overall narrative for the data scientist to construct first. That’s because the five core parts of the data science narrative must already be built so that there is something to say in the executive summary and introduction. In fact, the executive summary should be prepared last, as I describe shortly.
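The difference between the order in which the sections are presented and the order in which they are written can be sketched in a few lines of Python. This is only an illustration: the names `Section`, `PRESENTATION_ORDER`, and `writing_order` are hypothetical, and the relative order of the introduction and appendixes during writing is an assumption, since the framework specifies only that the five core sections come first and the executive summary comes last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    core: bool  # True for the five core sections, False for the wrapper


# Sections in the order the finished deliverable presents them.
PRESENTATION_ORDER = [
    Section("executive summary", core=False),
    Section("introduction", core=False),
    Section("data exploration", core=True),
    Section("analysis", core=True),
    Section("findings", core=True),
    Section("conclusions", core=True),
    Section("recommendations", core=True),
    Section("appendixes", core=False),
]


def writing_order(sections):
    """Return sections in drafting order: core first, executive summary last."""
    core = [s for s in sections if s.core]
    # Assumption: introduction before appendixes; the article does not say.
    other_wrapper = [
        s for s in sections if not s.core and s.name != "executive summary"
    ]
    summary = [s for s in sections if s.name == "executive summary"]
    return core + other_wrapper + summary


if __name__ == "__main__":
    for position, section in enumerate(writing_order(PRESENTATION_ORDER), start=1):
        print(position, section.name)
```

Keeping a single list and deriving the drafting order from it makes the point of this section visible: the executive summary is first for the reader but last for the writer.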

The introduction, in much the same way that each episode of Star Trek begins with a voiceover of Kirk speaking into his log to set the stage for the story to come, provides background on the business problem, situation, or question that led to the data science work being done, along with who commissioned it and any other key stakeholders involved. It also provides a high-level overview of the sections that follow it, spanning data exploration to recommendations, without going into specific details. (Those are found instead in the executive summary; see more on this below.)

The appendix, akin to the closing credits of a Star Trek episode, is the place for anything related to the work that doesn’t belong in one of the other sections. In this way it is typically not a formal part of the deliverable, but more of an ancillary component. As such, it is optional. It may also be a place for work or “roads not taken” that are related to the data science work but outside the flow of the overall narrative.

The executive summary

Aptly named, the executive summary is literally what is shared with executives and other decision-makers, and I describe it last while going into some detail about it for a reason: It is often the most challenging section of the data storytelling framework to write. That’s because it must distill the essence of the key information the decision-makers need — the What, So What, and Now What — without going into the How.

Many data scientists stumble here because they have been focused on the How for so long, and are so justifiably proud of the many obstacles they have overcome along the way, that they are eager to demonstrate to executives and key stakeholders their smarts, perseverance, and skill in getting to their results. So they focus on the How.

But if we pause to consider the point of view of the decision-maker, we can see how this good intention can lead to a significant missed opportunity. The decision-maker is likely far removed from data science work and probably does not have a background in data science. Moreover, the decision-maker is likely hearing from many different data scientists about a number of different projects on a regular basis. If those communications are primarily about process (the How), which the decision-maker is not well positioned to appreciate or possibly even understand, and if they lack what the decision-maker needs to do their job (the What, the So What, and especially the Now What), then there is a significant gap between what the decision-maker needs and what the data scientist is delivering.

Multiply this across many data scientists taking a similar approach, and the decision-maker may lose confidence in the ability of data science to provide practical help in making better business decisions, and data scientists may lose standing and influence with the decision-maker, who may then seek help elsewhere or may even reconsider organizational investments made in data science.

Once this dynamic is understood, it’s more straightforward to write an effective executive summary, and easy to see why it should be the last section to write even though it appears at the beginning of the finished deliverable. Necessarily brief, as its title indicates, the executive summary relays the highlights of the findings (the What), the conclusions (the So What), and the recommendations (the Now What) in no more than two or three sentences each. It does not include references to data sourcing, exploration, or the details of the techniques or processes of the analysis itself.
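For readers who like to see a guideline as a check, here is a minimal Python sketch of the "two or three sentences each" rule. Everything in it is an assumption made for illustration: the `ExecutiveSummary` class, the `count_sentences` and `overlong_parts` helpers, and the sentence splitter, which is deliberately naive and will miscount text containing abbreviations or decimals. The sample text paraphrases the Star Trek example used earlier in this article.

```python
import re
from dataclasses import dataclass

MAX_SENTENCES = 3  # upper end of "no more than two or three sentences each"


@dataclass
class ExecutiveSummary:
    what: str      # findings
    so_what: str   # conclusions
    now_what: str  # recommendations


def count_sentences(text):
    """Naively count sentences by splitting after ., !, or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def overlong_parts(summary, limit=MAX_SENTENCES):
    """Return the names of the parts that exceed the sentence limit."""
    parts = {
        "what": summary.what,
        "so_what": summary.so_what,
        "now_what": summary.now_what,
    }
    return [name for name, text in parts.items() if count_sentences(text) > limit]


summary = ExecutiveSummary(
    what=(
        "The creature is a Horta, a sentient tunneler guarding her eggs. "
        "The silicon nodules the miners destroyed are those eggs."
    ),
    so_what=(
        "If the eggs are left alone, the killings will stop and production "
        "can resume. Tunnels made by the Hortas could open up new deposits."
    ),
    now_what=(
        "Stop destroying the nodules. Make peace with the Horta. "
        "Agree to share the tunnels."
    ),
)

print(overlong_parts(summary))  # prints [] because every part is within the limit
```

Note that the structure has no field for the How. That omission is the design choice: details of sourcing, exploration, and technique belong in the other sections.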

The executive summary is primarily what is presented to the decision-maker, with all other portions of the work either clearly subordinated or included in an appendix for later reference if needed or requested.

In this way, the executive summary is meant primarily to help business decision-makers and other key stakeholders do their work, which is to decide whether to undertake the direction recommended by the data scientist based on the findings and conclusions presented. The executive summary may also be helpful for other audience members either as a useful overview of the material without diving deeply into its particulars, or as a reference to the material after it has been presented.

The other sections of the narrative framework and its wrapper — the introduction, data exploration, analysis, findings, conclusions, recommendations, and appendixes — are in their fullness primarily for other data scientists, data science managers, program and project managers, and other interested parties who want or need the details in these sections to do their work and advance the practice of data science as a discipline.

By understanding the different needs of various audience members for different parts of the deliverable and providing material accordingly, data scientists can give these audiences what they need to do their jobs while promoting excellence in data science.

For anyone looking to complete the Star Trek analogy, where does the executive summary fit in? It is akin to the pitch made to entertainment executives after the TV series is developed, and perhaps after the pilot episode has been produced, summarizing what will appeal to the audience for the series to earn viewership and, ultimately, revenue.

Data storytelling framework summary

The following table summarizes the parts of the data storytelling framework:

Table 1: Data storytelling components and descriptions.

Conclusion

It’s not difficult to see why data scientists sometimes encounter obstacles in making business impact with their work. Data science is challenging and complex, and it can be difficult to explain to non-specialists. But doing so is critical to realizing the ultimate success of data science, which depends on making business impact.

To get there, data scientists need to do three things: understand and include the five core parts of a data science narrative and the three wrapper sections (collectively, the executive summary, introduction, data exploration, analysis, findings, conclusions, recommendations, and any appendixes); prepare an effective executive summary based on the five core components; and present the relevant parts of their work to the right audiences in the form of an applicable, relevant data science narrative. Those who do will not only communicate about their work more effectively, they will also increase their ability to influence decision-makers and key stakeholders, leading to effective and lasting business impact on their organizations and enhanced career success along the way.

Casey Doyle is on LinkedIn.


Principal Data Scientist of a data storytelling program fostering thought leadership in information design and data visualization inside and outside Microsoft.