Putting the “unity” in Unity Catalog

Achieve unified governance that unlocks the full potential of the Databricks Data Intelligence Platform

Tosia Morris
Slalom Data & AI
May 9, 2024


Photo by krakenimages on Unsplash

By Tosia Morris, Maggie Davis, and Kasia Stein

In today’s data-driven business environment, well-managed data is no longer enough to stay competitive. The key to realizing the full potential of your data is ensuring not only that it is trusted, secure, and accurate, but also that it is valuable to everyone in your organization. This starts by empowering the people who rely on your data, tools, and technology.

However, while today’s advanced data catalog solutions offer benefits such as streamlined data discovery, improved collaboration, and AI-driven insights, many organizations still face data management challenges, including duplicated data, repetitive effort, and even questions about data quality and integrity. These problems can stem from a lack of shared agreement on, or understanding of, the data across technology and business teams, as well as from human error when large volumes of definitions and security tags are entered into the data catalog by hand.

Databricks Unity Catalog offers a unified governance layer designed to help organizations manage and secure their data and AI assets across multiple data platforms, including data lakes, data warehouses, and databases. Unity Catalog empowers users by providing a centralized platform for discovering, understanding, and securely accessing trusted data assets across the organization’s data landscape. The centralized metadata repository enables self-service data exploration and analysis while enforcing consistent governance policies, data quality rules, and access controls, ensuring users can leverage high-quality, compliant data to drive informed decision-making.

To further help organizations overcome the challenges of metadata management and build greater shared understanding of and trust in data, Slalom created a Unity Catalog Accelerator that automates business glossary management. As part of Databricks’ newly expanded Brickbuilder program, Slalom’s Unity Catalog Accelerator gives businesses a repeatable solution that cuts manual business metadata entry and tagging from weeks to minutes and empowers a broader set of data users to safely discover and analyze data at scale.

This accelerator works by leveraging a single business metadata template alongside Databricks Unity Catalog to facilitate a standardized and efficient approach to managing business metadata and tagging. This enables users to consistently apply changes and updates, ensuring continuity and reliability in governance efforts. Additionally, the accelerator provides robust features such as fault tolerance, disaster recovery, and automated deployment of new environments, enhancing resilience and operational efficiency. This integrated approach not only optimizes resource utilization but also promotes seamless collaboration and alignment across teams.
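The article does not publish the accelerator’s template format, so the following is only a minimal sketch of the general idea, assuming a hypothetical template in which each table entry carries a description, optional tags, and per-column descriptions and tags. It renders each entry as Databricks SQL statements (`COMMENT ON TABLE`, `ALTER TABLE ... ALTER COLUMN ... COMMENT`, and `SET TAGS`). The class and function names are illustrative and are not part of Slalom’s accelerator or any Databricks API.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """One field-level row of a hypothetical business metadata template."""
    name: str
    description: str
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class TableMetadata:
    """One table-level entry of the same hypothetical template."""
    full_name: str  # three-level name: catalog.schema.table
    description: str
    tags: dict[str, str] = field(default_factory=dict)
    columns: list[ColumnMetadata] = field(default_factory=list)


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_table(full_name: str) -> str:
    """Quote each part of catalog.schema.table separately."""
    return ".".join(quote_ident(part) for part in full_name.split("."))


def quote_str(value: str) -> str:
    """Build a Spark SQL string literal, backslash-escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def tags_clause(tags: dict[str, str]) -> str:
    """Render a clause such as SET TAGS ('domain' = 'sales')."""
    pairs = ", ".join(f"{quote_str(k)} = {quote_str(v)}" for k, v in tags.items())
    return f"SET TAGS ({pairs})"


def render_statements(table: TableMetadata) -> list[str]:
    """Turn one template entry into Unity Catalog comment and tag statements."""
    t = quote_table(table.full_name)
    stmts = [f"COMMENT ON TABLE {t} IS {quote_str(table.description)}"]
    if table.tags:
        stmts.append(f"ALTER TABLE {t} {tags_clause(table.tags)}")
    for col in table.columns:
        c = quote_ident(col.name)
        stmts.append(f"ALTER TABLE {t} ALTER COLUMN {c} COMMENT {quote_str(col.description)}")
        if col.tags:
            stmts.append(f"ALTER TABLE {t} ALTER COLUMN {c} {tags_clause(col.tags)}")
    return stmts


if __name__ == "__main__":
    # Hypothetical entry; the table, column, and tag names are illustrative only.
    orders = TableMetadata(
        full_name="main.sales.orders",
        description="One row per customer order",
        tags={"domain": "sales"},
        columns=[ColumnMetadata("customer_email", "Customer's contact email", {"sensitivity": "pii"})],
    )
    for stmt in render_statements(orders):
        print(stmt)
```

In a Databricks notebook, each generated statement could be executed with `spark.sql(stmt)`. Because these statements set comments and tag values rather than appending to them, re-applying the same template should leave the catalog in the same state, which is one simple way to make reruns after a partial failure safe. Tagging with `SET TAGS` requires a sufficiently recent Databricks Runtime or SQL warehouse.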

By empowering business and technical users to collaborate in Databricks with a shared understanding of their data, Slalom’s Unity Catalog business metadata accelerator helps organizations activate data in Unity Catalog to drive value for the whole organization, not just for data science and engineering users. Thoroughly populating metadata in Unity Catalog also pays off beyond data governance: Unity Catalog is foundational to Databricks’ latest generative AI features, such as LakehouseIQ.

The catalyst: Accelerating data governance at a global media organization

We recently partnered with a multinational media and communications company on their journey toward modernizing their data infrastructure. The organization aimed to speed up the delivery of trusted insights by leveraging existing tools within their Databricks Lakehouse. To achieve this, they sought to create a comprehensive registry of critical data elements, understandable terms, calculations, and sensitive data restrictions. Working closely with a cross-functional team of domain experts, data leaders, and business stakeholders, Slalom identified and prioritized critical data elements based on specific use cases. Recognizing that entering this information into Unity Catalog by hand would be inefficient, Slalom developed and deployed an automated accelerator to streamline the process, aligned with the enterprise’s sensitivity and privacy definitions.

The team devised a robust solution harnessing advanced technical capabilities to expedite the cataloging process and bolster data governance. Employing a domain-based approach, the tool rapidly organizes and manages data assets according to specific domains, facilitating efficient discovery and utilization. Through Unity Catalog data tagging, sensitive data elements were meticulously classified, enabling granular control over access and privacy. The implementation of the accelerator yielded meaningful results, with 30 tables and 750 fields cataloged in Unity Catalog. Notably, the solution provides field-level visibility into the most sensitive data, enhancing security and compliance measures, while its scalable architecture ensures adaptability for future expansions. By leveraging the existing functionalities within Unity Catalog, the accelerator optimized the organization’s Databricks investment, drastically reducing manual data entry time and accelerating data governance for both technical and business users.
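As a rough illustration of the domain-based organization and field-level visibility described above, the sketch below groups hypothetical template rows by business domain and lists the fields whose classification is treated as sensitive. The `FieldEntry` shape, the sensitivity labels, and the `SENSITIVE_LABELS` set are assumptions made for illustration, not the client’s actual classification scheme.

```python
from collections import defaultdict
from typing import NamedTuple


class FieldEntry(NamedTuple):
    """One row of a hypothetical field-level template."""
    domain: str       # business domain that owns the field, e.g. "sales"
    table: str        # three-level name: catalog.schema.table
    column: str
    sensitivity: str  # hypothetical label, e.g. "public", "internal", "pii"


# Assumption: which labels count as sensitive in this illustration.
SENSITIVE_LABELS = {"pii", "confidential"}


def summarize_by_domain(entries: list[FieldEntry]) -> dict[str, dict]:
    """Count tables and fields per domain and list sensitive fields by full name."""
    groups = defaultdict(lambda: {"tables": set(), "fields": 0, "sensitive": []})
    for e in entries:
        g = groups[e.domain]
        g["tables"].add(e.table)
        g["fields"] += 1
        if e.sensitivity in SENSITIVE_LABELS:
            g["sensitive"].append(f"{e.table}.{e.column}")
    # Domains in alphabetical order; sensitive fields sorted for stable output.
    return {
        domain: {
            "tables": len(groups[domain]["tables"]),
            "fields": groups[domain]["fields"],
            "sensitive": sorted(groups[domain]["sensitive"]),
        }
        for domain in sorted(groups)
    }


if __name__ == "__main__":
    # Hypothetical rows; not data from the engagement described in the article.
    rows = [
        FieldEntry("sales", "main.sales.orders", "order_id", "internal"),
        FieldEntry("sales", "main.sales.orders", "customer_email", "pii"),
        FieldEntry("hr", "main.hr.staff", "salary", "confidential"),
    ]
    for domain, info in summarize_by_domain(rows).items():
        print(domain, info)
```

In a live workspace, a similar field-level view could instead be read from Unity Catalog itself, for example by querying `system.information_schema.column_tags`, which lists the tags applied to columns.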

Building on the lessons and outcomes at this global media organization, this transformational solution served as the catalyst for the development of what is now the universally applicable Unity Catalog Accelerator.

Optimizing your data catalog: Key steps for success

Maximizing the value of data for your organization starts with empowering the people who rely on your data, tools, and technology. Here are four key steps you can take to get the most out of your data management solution.

  1. Define your objectives and requirements: Determine what specific challenges you aim to address, such as improving data discovery, enhancing data governance, or facilitating collaboration among teams.
  2. Ensure comprehensive data coverage: It’s essential that the data catalog covers a wide range of data sources and types within your organization, including structured and unstructured data, databases, data lakes, files, ML models, and any other relevant data repositories. A comprehensive data catalog provides users with a holistic view of available data assets, facilitating efficient discovery and utilization.
  3. Establish robust data governance practices: Data governance has the most impact when IT and business collaborate instead of operating in silos. Get cross-functional input when defining data standards, policies, and procedures for data classification, metadata management, access control, and data lineage tracking. Enforce these governance practices consistently across the organization so that data assets are trustworthy and reliable.
  4. Promote user adoption and collaboration: Provide training and support so users know how to leverage the data catalog effectively. Users who contribute up front are more likely to embrace new systems and ways of working.
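Step 3 calls for consistent data classification and access control. One way to keep the two aligned is to derive grants from a single classification-to-group policy rather than writing them by hand. The sketch below assumes hypothetical classification tiers and account group names and emits Unity Catalog `GRANT SELECT ON TABLE` statements; a table with an unknown tier raises an error instead of silently receiving no grants.

```python
# Hypothetical policy: classification tier -> account groups allowed to read it.
ACCESS_POLICY: dict[str, list[str]] = {
    "public": ["all_analysts", "data_engineers"],
    "internal": ["data_engineers"],
    "restricted": ["privacy_office"],
}


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_table(full_name: str) -> str:
    """Quote each part of catalog.schema.table separately."""
    return ".".join(quote_ident(part) for part in full_name.split("."))


def grant_statements(
    table_tiers: dict[str, str], policy: dict[str, list[str]] = ACCESS_POLICY
) -> list[str]:
    """Derive GRANT SELECT statements from each table's classification tier."""
    stmts = []
    for table in sorted(table_tiers):
        tier = table_tiers[table]
        if tier not in policy:
            # Fail loudly instead of leaving a misclassified table ungoverned.
            raise ValueError(f"no access policy for tier {tier!r} on {table}")
        for group in policy[tier]:
            stmts.append(f"GRANT SELECT ON TABLE {quote_table(table)} TO {quote_ident(group)}")
    return stmts


if __name__ == "__main__":
    # Hypothetical tables and tiers, for illustration only.
    tiers = {"main.sales.orders": "internal", "main.ref.countries": "public"}
    for stmt in grant_statements(tiers):
        print(stmt)
```

Reading a table in Unity Catalog also requires `USE CATALOG` and `USE SCHEMA` on its parent catalog and schema, so a fuller version would grant those as well.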

A true data-driven culture is built on a modern technology foundation that provides easy access to tools and data. At Slalom, we approach our work from a strategic perspective, designing your data governance roadmap to support your business goals. Our Unity Catalog workshop brings together business and technical leaders to establish a clear, shared definition of what the organization considers data and how it captures and uses that data. A key benefit of this process is that it does not require launching an enterprise data governance program, and it can easily be integrated with your existing data governance. By bringing cross-functional experts and stakeholders together to agree on data definitions, you begin breaking down data silos, ultimately broadening access to data and allowing everyone in the organization to derive meaningful insights.

Slalom is a next-generation professional services company creating value at the intersection of business, technology, and humanity. To learn more about our partnership with Databricks, visit our partner page.
