The road from data to wisdom

Matt Triano
4 min read · Aug 23, 2018

Everyone starts out as a scientist. Everyone. You may think that science requires lab coats and equations and expensive equipment, but science is just a method for making predictions based on past observations. Babies aren’t born with knowledge of physics or gravity, but they see that when they drop food, throw toys, or splash water, those things fall down. Every time they repeat the experiment with new objects in new places, the same pattern emerges: everything has fallen down.

“Experiment note #76: water traveled upwards briefly before changing direction and falling down”

When our baby scientist understands this pattern, they will expect that they’ll see the same result if they repeat the experiment in the future, or stated differently, they know that things will always fall down. This is the core of the scientific method: observe, look for patterns, and perform more experiments that test the pattern. And eventually, after our baby scientist performs some costly experiments (like quietly dropping a favorite toy from their stroller), they’ll gain the wisdom that sometimes, it’s better not to drop things.

Data science has boiled this learning process down into the Data-Information-Knowledge-Wisdom (or DIKW) pyramid.

  • Data: observations or measurements of phenomena in the natural world.
  • Information: data that has been structured to show a pattern or framed in a context relevant to people.
  • Knowledge: information that has been organized and interpreted so that people can act on the information.
  • Wisdom: knowing when and how to act on that knowledge to achieve desired goals.

There is a lot of nuance that goes into using this framework to solve new data problems, but when it’s done correctly, it can impact everyone. Weather stations are constantly measuring the temperature, precipitation, wind speed, and a host of other factors. Data from individual weather stations are mapped to show the weather situation at any given time. These maps can be ordered by time, and viewing them in sequence shows weather systems moving, which allows us to predict what the weather will be like in the near future. This data product is so well integrated into my life that it takes me less than a minute to check my phone in the morning and know how to dress and whether I should bring an umbrella to the office.
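To make the weather example concrete, here is a minimal Python sketch of one pass up the pyramid. Everything in it is an assumption made for illustration: the `Reading` class, the station names, the measurements, and the 15 °C jacket threshold are all made up, and a real forecast is far more involved than this toy rule.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    """One raw observation from a hypothetical weather station."""
    station: str
    hour: int        # hour of day, 0-23
    temp_c: float    # air temperature in Celsius
    rain_mm: float   # rainfall during that hour


# Data: raw observations (made-up values for illustration only).
readings = [
    Reading("north", 6, 11.0, 0.0),
    Reading("south", 6, 12.0, 0.2),
    Reading("north", 7, 11.5, 0.6),
    Reading("south", 7, 12.5, 1.0),
    Reading("north", 8, 12.0, 1.4),
    Reading("south", 8, 13.0, 2.0),
]


def to_information(readings):
    """Information: structure the data as hour -> averages across stations."""
    hours = sorted({r.hour for r in readings})
    return {
        h: {
            "temp_c": mean(r.temp_c for r in readings if r.hour == h),
            "rain_mm": mean(r.rain_mm for r in readings if r.hour == h),
        }
        for h in hours
    }


def to_knowledge(info):
    """Knowledge: interpret the pattern. Is rainfall rising hour over hour?"""
    rain = [info[h]["rain_mm"] for h in sorted(info)]
    rain_increasing = all(a < b for a, b in zip(rain, rain[1:]))
    latest = info[max(info)]
    return {"rain_increasing": rain_increasing, "latest_temp_c": latest["temp_c"]}


def to_wisdom(knowledge):
    """Wisdom: decide how to act (toy rules; the thresholds are assumptions)."""
    advice = []
    if knowledge["rain_increasing"]:
        advice.append("bring an umbrella")
    if knowledge["latest_temp_c"] < 15:
        advice.append("wear a jacket")
    return advice or ["dress as usual"]


info = to_information(readings)
knowledge = to_knowledge(info)
advice = to_wisdom(knowledge)
print(advice)
```

Each function corresponds to one level of the pyramid, so a wrong decision at the top can be traced back to the level where the interpretation went wrong.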

The DIKW framework is easily adapted to guide a data science project. It bridges the divide between the raw, noisy data of the world and the timely, relevant, and interpretable output that lets us make data-driven decisions that produce better outcomes.

In practice, climbing the data science pyramid is often difficult. To determine the data you need to collect, you have to define the problem or question you want to investigate. Unless you’ve picked a trivial problem, looking at the data will expand your understanding of the problem, and you may have to tweak the problem definition or explore different data sources. While modeling your system, you may suspect that there’s a more natural way to represent your data, so you go back and do more feature engineering. You may finish your model and discover you were thinking about the problem wrong the entire time. But that’s all normal while working on a data science project. If you already understood the system when you started the project, then the project would have been unnecessary.

This general, iterative process is described by the Cross-Industry Standard Process for Data Mining (or CRISP-DM).

The CRISP-DM Life Cycle
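As a rough sketch, the six phases and the arrows between them in the commonly reproduced CRISP-DM diagram can be written down as a small lookup table in Python. The names `Phase`, `NEXT_PHASES`, `is_backward_step`, and `where_to_look` are my own for illustration and are not part of CRISP-DM itself. Real projects jump between phases more freely than the diagram suggests.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = "business understanding"
    DATA_UNDERSTANDING = "data understanding"
    DATA_PREPARATION = "data preparation"
    MODELING = "modeling"
    EVALUATION = "evaluation"
    DEPLOYMENT = "deployment"


# Arrows as drawn in the usual CRISP-DM diagram (illustrative encoding).
NEXT_PHASES = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_backward_step(current, nxt):
    """True when moving from `current` to `nxt` revisits an earlier phase."""
    order = list(Phase)
    return order.index(nxt) < order.index(current)


def where_to_look(current):
    """List the phases the diagram suggests checking next, by name."""
    return sorted(p.value for p in NEXT_PHASES[current])


print(where_to_look(Phase.MODELING))
print(is_backward_step(Phase.EVALUATION, Phase.BUSINESS_UNDERSTANDING))
```

The backward arrows are part of the process by design: going from modeling back to data preparation, or from evaluation back to business understanding, is an expected move and not a sign that something went wrong.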

When I was new to data science, I would often worry when a finding indicated that I needed to get more data and start over. I assumed that this was evidence that I was wasting time and that professional data scientists didn’t run into these dead ends. But when I learned about these conceptual frameworks in my introductory data mining course, I realized that refining your understanding of the problem is a main goal of data science. Before you can solve a problem, you have to ask the right question, and before you can ask the right question, you have to stop asking the wrong questions.

It’s been years since I initially learned about the CRISP-DM process and the DIKW pyramid, but whenever I get stuck on a project and don’t know what to do next, I tend to think back to CRISP-DM. Thinking about where I am in the CRISP-DM lifecycle always gives me some clues about what I should double check. Next time you get stuck, give it a try!
