Investing in Data Fluency


The real problem facing companies isn’t a lack of highly qualified data scientists; it’s a lack of necessary exploratory data skills

“…we were accused of ‘muddying the water’, to which our response was that we were simply acknowledging that the water was, indeed, muddy.” — Peter J. Diggle, Amanda G. Chetwynd, Statistics and Scientific Method

Data are being collected faster than we know what to do with them. To an almost startling degree, unabated data collection has become a regular activity for companies. Amazon, Google, Facebook, and Microsoft are all collecting data on their users to inform their business decisions, develop better products, and reach new customers. In the information age, successful businesses will increasingly have to adopt data-driven practices and tools to be able to remain competitive.

Those that don’t, die. Remember travel agents? An entire profession was swallowed up by websites like Kayak, Travelocity, and Priceline; instead of Blockbuster and Hollywood Video, we now have Netflix and Hulu; and Amazon is rapidly replacing department stores. These transitions have all been accelerated by businesses using data to learn more about the customers they’re marketing to, the marketplaces they’re competing in, and the products they’re selling.

Collecting and storing all of these data has also required a significant increase in the availability of computational resources. But other than the meteoric rise of smartphone use, very few people are applying computation in a way that benefits their business or their lives. Nearly all work in the professional world involves a computer, but few employees (or recent college graduates) have been trained to think about how to answer a question or solve a problem using data and computation.

So how can businesses better take advantage of their data? Data science has become the knee-jerk answer. Organizations and executives are bombarded with a proliferation of products to buy and professionals to hire that promise to unlock, optimize, solve and otherwise make decisions with data. However, for all the LinkedIn and Medium posts proclaiming the data science revolution and age of AI, how are data scientists spending most of their time?

Data scientists frequently admit they spend most of their time on data preparation: finding, cleaning, wrangling, and asking questions of data. David Mimno, a Cornell professor of Computer and Information Science, wrote a blog post in 2015 that compares these activities to carpentry, a highly skilled trade requiring years of apprenticeship:

“…my impression is that it is not so much a single discipline as a vast array of specific skills. None of these are particularly difficult by themselves, but knowing which tool or method to use at each stage and carrying out each one cleanly and efficiently takes years of practice. Data carpentry, which I’ve been practicing in one way or another for about 15 years (though never as my official responsibility), is likewise not a single process but a thousand little skills and techniques.”

These “data carpentry” skills are well worth developing and investing in. Companies that do will likely discover the highest value in their data.

But it doesn’t necessarily take 15 years of practice. It takes knowing your products, your customers, and the landscape of your business. Revealing meaningful relationships, defining the essential measurements, and delivering evidence and opportunities don’t require fancy algorithms, machine learning, or p-values.

These skills are what we call data fluency: a set of foundational analytic skills for reading, exploring, and visualizing data, and the basis of any data-driven workflow. We propose that organizations rethink their analytics strategies to focus on developing fundamental skills in data management, analysis, and exploration (the activities currently occupying the majority of the data scientist’s time). An early investment in these skills builds an analytical mindset that empowers team members to be impactful and take ownership.

Consider what Cassie Kozyrkov, Google’s chief decision scientist, had to say about the necessity of qualified data analysts:

“An excellent analyst is not a shoddy version of the machine learning engineer; their coding style is optimized for speed — on purpose. Nor are they a bad statistician, since they don’t deal at all with uncertainty, they deal with facts. The primary job of the analyst is to say: ‘Here’s what’s in our data. It’s not my job to talk about what it means, but perhaps it will inspire the decision-maker to pursue the question with a statistician.’” — What Great Data Analysts Do — and Why Every Organization Needs Them

But don’t I need a data scientist? Probably not. It’s also worth noting that the notion of a “unicorn data scientist” — equally skilled in all things ML, AI, deep learning, and beyond — is neither empirically represented nor necessary to add value to an organization of any size.

The heavy lifting of statistics, experimentation, machine learning, and data engineering requires fine-tuned skills that are built throughout a career. By developing internal resources and investing in the current employees who know their business, organizations can identify and train analysts who can grow into data scientists, while retaining the business-critical domain knowledge that makes them best suited to be impactful.

An analytics and research community made up of diverse teams that share the same mission of making evidence-based decisions creates alignment in organizational strategy. This model enables individuals to improve and expand their skills while enhancing their contributions and visibility in the organization. The alternative is what businesses are currently seeing. As Whitney Johnson, author of Build an A-Team and Disrupt Yourself, wrote in HBR:

“When your employees (and maybe even you, as their manager) aren’t allowed to grow, they begin to feel that they don’t matter. They feel like a cog in a wheel, easily swapped out. If you aren’t invested in them, they won’t be invested in you, and even if they don’t walk out the door, they will mentally check out.”

What data fluency really means is approaching today’s information landscape in a different way: instead of expecting to hire that lone data scientist who’ll spend more than two-thirds of their time wrangling and preparing data, companies should invest in expanding the analytic skills of their current employees. By doing this, business leaders can align their professional development program with the company’s goals and, at the same time, increase retention by giving their employees a chance to learn and grow within their organization.

For a great resource to get started, check out our tutorials and links to interesting projects on Storybench, where we lay out examples of the foundations of analytical skills.

EDIT: This article was authored by Peter Spangler and Martin Frigaard (Martin Frigaard | Paradigm Data Group), but published under Peter Spangler’s account. Check out Martin’s work on Storybench and his LinkedIn profile.



Peter Spangler | Paradigm Data Group

We help organizations make better decisions by increasing their ability to tell stories with data and create evidence-based solutions to business problems.