Truth is Lineage, Lineage is Truth

Bryon Jacob
Published in data.world
3 min read · Apr 19, 2024

Recently, I found myself at a dinner table surrounded by some really bright data minds — data scientists, data governance leaders, and experts in data security. As the evening progressed, we delved into the complex issue of providing business users with accurate answers to their questions. We discussed the intricacies involved in formulating these questions with precise definitions and the challenges in obtaining the correct answers. The conversation naturally led us to the role of governance and lineage in this process.

The Graph of Knowledge

Data lineage is a graph: nodes of knowledge connected by the relationships among them. When we seek answers within a corporate environment, the clarity and accuracy of those answers rely heavily on this network of interconnected information. Each piece of data, from terms and definitions to the columns and tables in our databases, is governed, or should be, by a specific set of people and policies. These people and policies ensure that the data’s integrity remains intact throughout its lifecycle.
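To make the graph idea concrete, here is a minimal sketch in Python. The node names (a report, a business term, a table, a column, a policy) and the `UPSTREAM` mapping are invented for illustration; a real catalog would store these relationships in its own model. The sketch only shows that "where did this answer come from?" becomes a graph traversal.

```python
from collections import deque

# Hypothetical lineage edges: each node maps to the nodes it is derived from.
UPSTREAM = {
    "report:quarterly_revenue": ["term:net_revenue", "table:sales.orders"],
    "term:net_revenue": ["policy:finance_definitions"],
    "table:sales.orders": ["column:sales.orders.amount"],
    "column:sales.orders.amount": [],
    "policy:finance_definitions": [],
}


def trace_lineage(node, upstream=UPSTREAM):
    """Return every node upstream of `node`, in breadth-first order."""
    seen = []
    queue = deque(upstream.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


# Walk from a report back to the terms, tables, columns, and policies behind it.
for ancestor in trace_lineage("report:quarterly_revenue"):
    print(ancestor)
```

Every node the traversal returns is something a person or policy should be governing.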

But what do business users truly seek? They desire the “right” answer. However, defining “right” isn’t always straightforward. In the context of business, “truth” is the answer that adheres to officially sanctioned definitions. It is the output that employs the correct data sources, approved specifically for the given use case, and queried in a manner that reflects the agreed-upon definitions and methodologies. “Truth” is a lineage problem!
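One way to picture “truth as a lineage problem” is as a check an answer must pass before it counts as right. The sketch below is purely illustrative: the `Answer` type, the `APPROVED_SOURCES` and `SANCTIONED_DEFINITIONS` records, and every name in them are assumptions, not part of any particular catalog product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """An answer plus the lineage it claims (hypothetical structure)."""
    use_case: str
    sources: frozenset
    definitions: frozenset


# Hypothetical governance records: sources approved per use case,
# and the definitions that are officially sanctioned.
APPROVED_SOURCES = {"board_reporting": {"table:finance.ledger"}}
SANCTIONED_DEFINITIONS = {"term:net_revenue_v2"}


def lineage_gaps(answer):
    """List the ways an answer strays from governed sources and definitions."""
    approved = APPROVED_SOURCES.get(answer.use_case, set())
    gaps = [f"unapproved source: {s}" for s in sorted(answer.sources - approved)]
    gaps += [
        f"unsanctioned definition: {d}"
        for d in sorted(answer.definitions - SANCTIONED_DEFINITIONS)
    ]
    return gaps


candidate = Answer(
    use_case="board_reporting",
    sources=frozenset({"table:finance.ledger", "table:scratch.export"}),
    definitions=frozenset({"term:net_revenue_v2"}),
)
print(lineage_gaps(candidate))
```

Under these assumptions, an answer is “right” only when the list of gaps is empty; the number it reports may be identical either way, but without the lineage nobody can tell.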

Lineage as the Pathway to Truth

The correctness of an answer in a business environment hinges on the definition of terms, and on the predefined agreements surrounding those terms — a concept deeply embedded in the fabric of data governance. This is what makes lineage an indispensable component of a data catalog. Understanding how analyses link back to these governed facts is crucial for unearthing the genuine “truth”.

The significance of lineage extends beyond traditional data scenarios and into the realm of artificial intelligence. With the increasing reliance on AI and large language models (LLMs), establishing trust in the outputs provided by these technologies becomes paramount. You can’t trust an LLM outright; you need solid lineage that connects every AI-generated answer back to verifiable facts, definitions, and policies. Without this lineage, the authenticity of the information remains questionable. Trust isn’t achieved. Adoption suffers.

More Than a Philosophical Concept

While the relationship between truth and lineage might appear philosophical, it is deeply pragmatic. In the world of data, truth and lineage are inextricably linked. Every “truth” we pursue is anchored in the lineage that traces its origins and validates its accuracy. This isn’t merely an intellectual exercise — it’s a practical approach to ensuring that the information we rely on is not only accurate but also verifiable.

In conclusion, as we advance further into the age of information and artificial intelligence, understanding and implementing robust lineage systems is not just beneficial — it’s essential. It grounds our data practices in reality, ensuring that the truths we operate by are reflections of a well-governed and meticulously verified data landscape. Truth is Lineage, Lineage is Truth.
