Discover, catalog and govern data with IBM Data Catalog

Susanna Tai
5 min read · Nov 17, 2017

For the last couple of months, we’ve blogged about the data challenges faced by many organizations today, particularly in the areas of access, collaboration and governance. In those posts, we presented a vision of how IBM Data Catalog will address these challenges with an intelligent asset catalog that offers end-to-end capabilities for the data lifecycle and rule-based governance.

Our team is super excited to let everyone know that IBM Data Catalog is now available in open beta!

What is IBM Data Catalog?

IBM Data Catalog provides a cloud-based enterprise metadata repository that lets you securely catalog your data sources and assets wherever they reside. It is the enterprise’s one-stop shop for data, where data scientists, data engineers and business analysts can easily find what they need and then quickly put that data to productive use through other applications and tools, such as Data Science Experience and Data Refinery, all working together in the integrated Watson Data Platform. Robust governance capabilities embedded in Data Catalog let you define and enforce policies, giving you the peace of mind that the right data are being accessed by the right people. Data Catalog also comes with a business glossary, which allows you to manage business terms and link them to data assets, policies and rules, providing the bridge between the business domain and technical assets.

Here are some of the key capabilities of IBM Data Catalog that make data simple and accessible.

Discover data

Users can easily discover catalogued data across multiple on-premises and cloud sources by searching with tags and filters and by previewing the data. Data scientists and business analysts no longer have to waste their time logging into different systems to search for and extract data. With Data Catalog, they can shop for the data that they need from a single, centralized portal.

Search and discover data through a catalog

Catalog data wherever they reside

To share assets, users can add local files or data assets from remote data sources to Data Catalog. For the latter, there are currently 30 pre-built data connectors to help you set up connections to commonly used on-premises and cloud data stores. When cataloguing remote data assets, only the metadata of the asset is captured in Data Catalog. The actual data remain in their source systems.

Pre-built connections to access remote data
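To make the "metadata only" idea concrete, here is a minimal Python sketch of what a catalog entry for a remote asset might hold. It is not Data Catalog's API or data model: the `Connection` and `CatalogEntry` classes, the `catalog_remote_table` helper and the example host are all hypothetical names made up for illustration. The point is simply that the entry records where the data lives and what it looks like, and never copies the rows.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Connection:
    """Hypothetical pointer to a remote data store (illustrative only)."""
    name: str
    store_type: str
    host: str


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: it describes the asset but holds no rows."""
    asset_name: str
    connection: Connection
    path: str
    columns: List[str]
    tags: Set[str] = field(default_factory=set)


def catalog_remote_table(connection: Connection, path: str,
                         columns: Iterable[str],
                         tags: Iterable[str] = ()) -> CatalogEntry:
    """Build a metadata-only entry; the data itself stays in the source system."""
    asset_name = path.rsplit("/", 1)[-1]  # last path segment as the display name
    return CatalogEntry(asset_name, connection, path, list(columns), set(tags))


# Example values are invented for this sketch.
warehouse = Connection(name="sales-warehouse", store_type="postgresql",
                       host="db.example.com")
entry = catalog_remote_table(warehouse, "public/orders",
                             ["order_id", "customer_email", "total"],
                             tags=["sales"])
print(entry.asset_name, entry.columns, sorted(entry.tags))
```

Anyone who finds `entry` in the catalog learns the asset's name, columns and tags, but reading the actual rows would still go through the connection to the source system.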

Automatically classify your data

As data assets are added to the catalog, they are automatically indexed and classified, making it easy for users such as data engineers, data scientists, data stewards and business analysts to find, understand, share and use the assets.

Data columns in the asset are automatically classified
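This post does not describe how the automatic classification works internally, so the following is only an illustration of the general idea, using simple pattern matching as a stand-in technique. The `PATTERNS` table, the `classify_column` function and the 0.8 threshold are assumptions made for this sketch, not part of Data Catalog.

```python
import re
from typing import Iterable, Optional

# Hypothetical classifiers: a label and a pattern that sample values must match.
PATTERNS = {
    "Email Address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US Phone Number": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
}


def classify_column(values: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Return the first label whose pattern matches at least `threshold`
    of the non-empty sample values, or None if nothing qualifies."""
    samples = [v for v in values if v]
    if not samples:
        return None
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in samples if pattern.match(v))
        if matches / len(samples) >= threshold:
            return label
    return None


print(classify_column(["a@example.com", "b@example.org"]))
print(classify_column(["555-123-4567", "(555) 123-4567"]))
print(classify_column(["hello", "world"]))
```

A real classifier would likely look at far more signals than a couple of regular expressions, but the shape is the same: sample a column, score it against known classes, and attach the winning label as metadata.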

Refine and analyze data

Data Catalog, Data Science Experience (DSX) and Data Refinery are all part of Watson Data Platform, designed to work seamlessly together in a common fabric. Once users have found the data that they need in the catalog, with a single click they can add it to a project, where they can refine and analyze the data using Data Refinery and DSX capabilities. Think of catalogs as the place where you share and find data, and projects as the workspaces where you collaborate with other users toward specific goals, for example, cleansing and shaping data to use in sales analysis.

Add a catalog data asset to a project to refine or analyze

Govern data

Governance policies and rules can be defined to control access to data in governed catalogs. Policy enforcement is automatic and enabled all the time, and leverages classifications assigned automatically or manually to data assets when evaluating whether or not a user can view or use the data. So while Data Catalog makes data easy to access, it is also underpinned by an intelligent and robust governance framework that ensures its users comply with corporate data governance policies.

Categories to organize governance policies
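As a rough mental model of classification-driven enforcement, the Python sketch below checks a user's roles against every policy that applies to an asset's classifications. Everything in it is hypothetical: the `Policy` and `Asset` classes, the `can_access` function, the role names and the deny-if-any-policy-fails rule are assumptions for illustration, not Data Catalog's actual policy engine.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: assets with this classification need one of these roles."""
    name: str
    classification: str
    allowed_roles: FrozenSet[str]


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset carrying classifications assigned automatically or manually."""
    name: str
    classifications: FrozenSet[str]


def can_access(user_roles: Iterable[str], asset: Asset,
               policies: List[Policy]) -> bool:
    """Deny access if any policy that matches the asset's classifications
    is not satisfied by at least one of the user's roles."""
    roles = set(user_roles)
    for policy in policies:
        if policy.classification in asset.classifications:
            if not (roles & policy.allowed_roles):
                return False
    return True


policies = [Policy("Restrict PII", "PII", frozenset({"data_steward"}))]
customers = Asset("customers", frozenset({"PII"}))
weather = Asset("weather", frozenset())

print(can_access({"business_analyst"}, customers, policies))  # blocked by the PII policy
print(can_access({"data_steward"}, customers, policies))      # role satisfies the policy
print(can_access({"business_analyst"}, weather, policies))    # no policy applies
```

Because the check keys off classifications rather than individual assets, a newly catalogued asset is covered by existing policies as soon as it is classified.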

Monitor governance policies

Through the Governance Dashboard, the Data Catalog administrator can view a summary of all active governance policies and their enforcement history.

Policy enforcement history in Governance Dashboard

Create a business glossary

Data Catalog’s business glossary provides a framework to capture and manage the enterprise’s common business vocabulary. Business terms can be added manually or imported from a CSV file or an XMI file from IBM Information Governance Catalog.

Manage terms in Business Glossary

Link business terms to assets, policies and rules

By simply mapping a business term to an asset or attribute classifier in the business glossary, users can automatically see all the policies, rules and assets that are related to that term, thus providing the link between the business domain and technical assets.

Assets, policies and rules linked to a business term
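One way to picture this linkage is as a lookup that goes from a term to its classifier and from the classifier to everything tagged with it. The Python sketch below is only an illustration of that idea; the `Glossary` class, its method names and the sample term, asset and policy names are hypothetical and do not reflect how Data Catalog stores these relationships.

```python
from collections import defaultdict
from typing import Dict, List


class Glossary:
    """Hypothetical glossary: terms map to classifiers, and classifiers
    index the assets and policies that use them."""

    def __init__(self) -> None:
        self.term_to_classifier: Dict[str, str] = {}
        self.assets_by_classifier = defaultdict(set)
        self.policies_by_classifier = defaultdict(set)

    def map_term(self, term: str, classifier: str) -> None:
        self.term_to_classifier[term] = classifier

    def tag_asset(self, asset: str, classifier: str) -> None:
        self.assets_by_classifier[classifier].add(asset)

    def tag_policy(self, policy: str, classifier: str) -> None:
        self.policies_by_classifier[classifier].add(policy)

    def related(self, term: str) -> Dict[str, List[str]]:
        """Return the assets and policies reachable from a term via its classifier."""
        classifier = self.term_to_classifier.get(term)
        if classifier is None:
            return {"assets": [], "policies": []}
        return {
            "assets": sorted(self.assets_by_classifier[classifier]),
            "policies": sorted(self.policies_by_classifier[classifier]),
        }


glossary = Glossary()
glossary.map_term("Customer Contact", "Email Address")
glossary.tag_asset("crm.customers", "Email Address")
glossary.tag_asset("support.tickets", "Email Address")
glossary.tag_policy("Restrict PII", "Email Address")

print(glossary.related("Customer Contact"))
```

A single mapping from term to classifier is enough to surface every asset and policy that shares the classifier, which is why one step in the glossary can reveal so many related items.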

Ready to explore?

IBM Data Catalog is currently available in Beta, and you can try all the features described above for free!

To sign up, go to http://ibm.com/cloud/data-catalog. Or, if you are already a Data Science Experience or IBM Data Refinery user, check out how to add Data Catalog from your app.

Once you’re signed in, here are some suggested first steps to get started:

  • Create a catalog
  • Add assets to the catalog

To learn more, check out our docs or watch our video tutorials on YouTube. If you have a question or need help, you can leave us a message in the chat box at the bottom right of the app and we’ll be there to assist.

So dive in, explore and let us know what you think!

Susanna Tai

Offering Manager, Watson Data Platform | Data Catalog