Synthesizing Knowledge from Data

Building to Empower

Decision-First AI · Published in Comprehension 360
5 min read · Dec 2, 2015


We work in a time when data is billed as the answer to many issues. Big Data is supposedly the answer to many more. But just having access to a myriad of raw data is not enough to provide your company with the insight and knowledge it needs to succeed.

To create knowledge, your data must be synthesized. Synthesis is hard. It is much harder than analysis. And no matter what the sales guys tell you, synthesis is a hands-on process. It does not come canned with the latest BI tool, nor is it very effectively outsourced.

Five steps are necessary to fully transform data into knowledge. Each step is part of the process of synthesis, and each adds something to our data:

  • Identity & Definition create Usable Data
  • Qualification creates Facts
  • Context creates Information
  • Structure creates Insight
  • Prediction creates Knowledge

Each step refines, defines, and creates higher value from the data.

Identifying and Defining Usable Data

Data is a raw material. Like so many other raw materials, it must be harvested or collected. My preferred analogies are wheat harvesting and strip mining. Those processes require little upfront discretion. They are fast and efficient.

Depending on its source, your data will fall into one of two categories: a big mess or a small mess. Either way, it will be a mess. To begin creating value, we must first identify what we have. Is it wheat? Is it chaff?

Identity is not as straightforward as it may seem. We must define our data elements. How do we separate the wheat from the chaff? We must also consider that this data will likely be combined with other data. So it is not just a matter of wheat or chaff, but which wheat? From which field? Harvested at what time? And so on…

Definitions must be detailed and thoughtful. Many times I have witnessed warehouses filled with time stamps, which should be a huge improvement over warehouses where time stamps are sparse. But it is not an improvement when it is unclear what each time stamp really means. Is it a create date, a transfer date, an upload date? Did you collect the timezone? Which timezone: the central server's, the timezone where the activity occurred, GMT?
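One way to bake a definition into the data itself is to refuse to store a timestamp without its meaning and timezone. Here is a minimal Python sketch; the field names (`meaning`, `source_tz`) and the example values are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DefinedTimestamp:
    """A timestamp whose meaning and timezone are part of its definition."""
    value: datetime   # must be timezone-aware
    meaning: str      # e.g. "created", "transferred", "uploaded"
    source_tz: str    # whose clock: "server", "local activity", "GMT"

    def __post_init__(self):
        # A naive timestamp answers none of the questions above,
        # so we reject it rather than store ambiguous data.
        if self.value.tzinfo is None:
            raise ValueError("naive timestamps are not usable data")

ts = DefinedTimestamp(
    value=datetime(2015, 12, 2, 14, 30, tzinfo=timezone.utc),
    meaning="created",
    source_tz="GMT",
)
print(ts.meaning, ts.value.isoformat())
```

The point is not the dataclass itself but the discipline: a bare `timestamp` column is data; a timestamp that carries its own definition is usable data.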

Well-defined data is rarer than you might expect. It is much more valuable, too. Identity & Definition make Usable Data.

Qualifying Facts

Well-defined data is a great start, but is it accurate? How do you know? To answer that question, you must identify a system of record. This system may be available through an external processor, through manual audit, or through comparison and triangulation.

Using our wheat analogy, we might rely on a third-party audit. We might compare measurements from the thresher, hauling weights, or measurements at the storage silo. We might even look back at recordings of what was planted. Whatever gives us the strongest sense that our data is accurate.
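Triangulation like this can be sketched as a simple agreement check: if independent sources measure the same quantity and land within a tolerance of one another, the fact qualifies. The source names, tolerance, and tonnage figures below are illustrative, not from any real harvest:

```python
def triangulate(measurements, tolerance=0.05):
    """Qualify a fact by comparing independent measurements.

    measurements: dict mapping source name -> measured value.
    Returns the mean if every source agrees with it within the
    relative `tolerance`; returns None to flag an unqualified fact.
    """
    values = list(measurements.values())
    mean = sum(values) / len(values)
    if all(abs(v - mean) / mean <= tolerance for v in values):
        return mean
    return None

# Three independent weights for the same harvest, in tonnes:
harvest = {"thresher": 102.0, "hauling": 99.5, "silo": 100.5}
print(triangulate(harvest))                             # sources agree
print(triangulate({"thresher": 102.0, "silo": 80.0}))  # disagreement: None
```

A disagreement does not tell you which source is wrong; it tells you the fact is not yet qualified and needs an audit.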

Filtering can also come into play. Do we want to count the full harvest or remove the chaff? I prefer as little upfront filtering as possible, because future discrepancies too often trace back to poor filtering. Regardless, your filtering processes, whether performed before or after auditing, will need to be qualified (and defined) as well.

When well-identified, well-defined data is validated for its quality, we have created facts. Facts should be a requirement for all reporting. Reports built on unqualified data have little real value.

Adding Context to Create Information

Facts are important, but they are a very narrow view of what is going on. Facts require context to have greater value. Context within a data warehouse is very similar to the context we naturally provide in a typical conversation. At least, most of us…

We have all had that awkward experience where we are talking with someone who fails to provide enough context. Often it is an older relative who assumes we are on a first name basis with all of their friends. In these experiences, we quickly find ourselves lost and confused.

Context requires us to connect and link various facts and elements so that the end user is able to follow along with the story. Great data should always tell a story. Connecting the facts and elements of your data will provide that context.
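In warehouse terms, connecting facts usually means joining records on a shared key. A small sketch in plain Python, with invented field IDs and crop names purely for illustration:

```python
# Reference facts about each field, keyed by a shared field_id.
fields = {
    "F1": {"crop": "winter wheat", "planted": "2015-09-20"},
    "F2": {"crop": "spring wheat", "planted": "2015-04-15"},
}

# Harvest facts, collected separately, carrying only the key.
harvests = [
    {"field_id": "F1", "tonnes": 100.7},
    {"field_id": "F2", "tonnes": 88.2},
]

# Linking the two gives each fact its context: not just "100.7 tonnes"
# but "100.7 tonnes of winter wheat, planted in September, from F1".
information = [{**rec, **fields[rec["field_id"]]} for rec in harvests]
print(information[0])
```

The same join is what a star schema or a BI tool's relationships do at scale; the value comes from the linkage, not the tool.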

Once you have identified and defined your data, assured its accuracy, and added linkages to create context, you have created information. Information informs. It informs decisions. It informs strategy. It tells you the story of your people, your products, and/or your customers.

Structuring Insight

Stories are interesting and informative, but great stories have great structure. When a story lacks structure, it becomes confusing. We all know someone whose stories often devolve into a ramble. That is because their tale lacks structure, and things without structure quickly become disorganized.

Organizing your data is critical. Data is collected as individual records. Each record represents a single event, action, or the like. Once context has been added, related records are linked. Structure takes this a step further, grouping recurring context into data objects. Objects can then be grouped, segmented, and further connected.
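The move from linked records to data objects can be sketched as a grouping step. The event records and customer labels here are invented for illustration:

```python
from collections import defaultdict

# Individual records, as collected: one row per event.
events = [
    {"customer": "A", "action": "view",     "sku": "wheat-50kg"},
    {"customer": "A", "action": "purchase", "sku": "wheat-50kg"},
    {"customer": "B", "action": "view",     "sku": "flour-10kg"},
]

# Structure groups the recurring context (the customer) into one
# object per customer, so the story reads "A viewed, then purchased"
# instead of three unrelated rows.
customers = defaultdict(list)
for event in events:
    customers[event["customer"]].append((event["action"], event["sku"]))

print(dict(customers))
```

Once events live on customer objects, those objects can in turn be segmented and compared, which is where the insight comes from.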

Well-structured information creates insight. Insight creates true understanding of our customers and businesses. Organization and structure simplify the story, enabling us to build analogies, make changes, and attempt to alter the story-line.

Predicting Knowledge

Insight creates real value, but it still falls a step short. Insight creates understanding, but true knowledge comes from prediction. Prediction can take many forms, from statistical models and linear regression to simple brute-force logical models built into Excel spreadsheets. Whatever their form, your organization MUST capture them and store them in the data warehouse.

Why is this critical? One of the main reasons to record and store these predictions is so the organization can learn from them. If you know your customer you should be able to predict things about their activity and behavior. When those predictions succeed repeatedly, you have proved that knowledge. And when you fail, you will know where you need more data, better definitions, higher quality, and better structure.
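The learning loop described above can be sketched as a tiny prediction log: record the prediction, record the outcome when it arrives, and score the two against each other. The subjects and outcome labels are hypothetical:

```python
prediction_log = []

def record_prediction(subject, predicted):
    """Store a prediction in the warehouse so it can be scored later."""
    prediction_log.append(
        {"subject": subject, "predicted": predicted, "actual": None}
    )

def record_outcome(subject, actual):
    """Attach the observed outcome to the open prediction for a subject."""
    for p in prediction_log:
        if p["subject"] == subject and p["actual"] is None:
            p["actual"] = actual

def hit_rate():
    """Share of scored predictions that proved out."""
    scored = [p for p in prediction_log if p["actual"] is not None]
    hits = sum(1 for p in scored if p["predicted"] == p["actual"])
    return hits / len(scored) if scored else None

record_prediction("customer-A", "repurchase")
record_prediction("customer-B", "churn")
record_outcome("customer-A", "repurchase")
record_outcome("customer-B", "repurchase")
print(hit_rate())  # 0.5: one proved prediction, one pointing at a gap
```

The hit rate itself matters less than the misses: each one tells you where you need more data, better definitions, or better structure.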

Few organizations ever achieve this level of data management, but that does not stop them from making predictions. It does prevent those predictions from being anything more than a guessing game.

Synthesizing knowledge from data is not a complex process, but it is not an easy one either. Most importantly, it is a hands-on process that requires dedicated experts who know your business and your data. There are no shortcuts, but the rewards for success are well worth the effort.

Quintessentially is an article format created by Corsair's Institute to increase the reader's comprehension of key concepts by providing several distinct views on a central theme. For more articles from Data, Quintessentially, click here.

For more information on the author, visit his LinkedIn profile: George Earl


FKA Corsair's Publishing - Articles that engage, educate, and entertain through analogies, analytics, and … occasionally, pirates!