How to Import Data Using OpenMetadata: A Tutorial Guide

Asaki Sakamoto
7 min read · Sep 19, 2024

Understanding OpenMetadata and the Process of Importing Data

OpenMetadata is an open-source platform for managing metadata across a variety of data assets. It plays a central role in data governance, helping organizations understand their data landscape, maintain data quality, and stay compliant. One of its core capabilities is importing (ingesting) metadata from external sources. This guide walks through that import process step by step, first through the UI and then through the API.

What is OpenMetadata?

OpenMetadata provides a comprehensive framework for cataloging, managing, and exploring metadata. It supports a plethora of data sources, including databases, data lakes, and analytical tools. By fostering a unified approach to metadata management, OpenMetadata enhances collaboration among data teams, streamlining insights and analytics.

Key Features of OpenMetadata

  • Data Governance: OpenMetadata supports robust data governance frameworks that ensure consistent and fair use of data across the organization.
  • Centralized Metadata Repository: It serves as a single source of truth for metadata, reducing redundancy and inconsistency.
  • Integration Capabilities: With connectors for major databases and services, OpenMetadata allows for dynamic importing and syncing of metadata.
  • Visualization Tools: It provides visual components for data lineage and impact analysis, helping users understand the flow of data across systems.

Types of Data That Can Be Imported

Understanding the various data types that can be imported into OpenMetadata is essential for maximizing its functionality:

  1. Database Metadata: Information regarding the tables, columns, and relationships within relational databases.
  2. Data Pipeline Metadata: Metadata related to ETL (Extract, Transform, Load) processes and dataflow pipelines.
  3. Business Glossary: Imported terms, definitions, and business meanings that provide context for the data assets.
  4. Data Quality Metrics: Information regarding the quality of data based on defined metrics, which can be critical for data governance.
  5. Schema Information: Structural information about datasets, such as their formats and types.

Steps to Import Data into OpenMetadata

The import process into OpenMetadata involves several steps, from setting up the environment to executing the actual import. Below is a structured procedure for importing data.

Step 1: Set Up OpenMetadata

Before beginning the import, make sure OpenMetadata is installed, either locally or on a server. Docker is the simplest route. From the directory that contains the OpenMetadata docker-compose.yml file (published with the project's releases), run:

docker-compose up

Step 2: Access the OpenMetadata UI

Once the installation is complete, access the OpenMetadata user interface (UI) through your web browser. Typically, it runs at http://localhost:8585. Log in using the default credentials or the ones you set up during installation.
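
The containers can take a while to start, so it may help to wait until the UI address actually responds before logging in. The following is a minimal sketch that uses only the Python standard library. The function names, retry count, and delay are illustrative assumptions, not part of OpenMetadata.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status (for example 401).
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is not up yet.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30,
                     delay: float = 2.0) -> bool:
    """Call probe() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


# Usage (assumes the default address mentioned above):
#   wait_until_ready(lambda: http_probe("http://localhost:8585"))
```

Passing the probe in as a callable keeps the waiting logic separate from the network call, so the same loop can be reused for other readiness checks.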

Step 3: Configure Data Sources

To import data, OpenMetadata must first be configured with the data sources you want to import from. Menu labels differ between OpenMetadata versions (recent releases list sources as services under Settings), so adapt the names below to what your UI shows:

  1. Navigate to the ‘Ingest’ section in the UI.
  2. Click on ‘Add Data Source’.
  3. Fill out the required fields, including:
     • Name: Name of the data source.
     • Type: Type of database (e.g., MySQL, Postgres).
     • Connection Parameters: Host, Port, Database Name, Username, and Password.
  4. Test the connection to ensure accessibility.
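
If the connection test fails, it can be useful to check from the same machine whether the database host and port are reachable at all. The sketch below is an assumption-laden illustration: ConnectionParams is a hypothetical container that mirrors the form fields above, not an OpenMetadata class, and the check only opens a TCP connection without validating credentials.

```python
import socket
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConnectionParams:
    """Hypothetical container mirroring the form fields listed above."""
    name: str
    db_type: str
    host: str
    port: int
    database: str
    username: str
    password: str

    def redacted(self) -> dict:
        """Return the fields as a dict that is safe to log (password masked)."""
        data = asdict(self)
        data["password"] = "***"
        return data


def can_reach(params: ConnectionParams, timeout: float = 3.0) -> bool:
    """Check only that a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((params.host, params.port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP check does not prove that the username and password are valid. It only rules out network-level problems such as a wrong host, a wrong port, or a firewall.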

Step 4: Create an Ingestion Workflow

In OpenMetadata, an ingestion workflow defines which metadata is extracted from a source and when. To create a new ingestion workflow:

  1. From the side menu, select ‘Ingestion Workflows’.
  2. Click on ‘Create New Workflow’.
  3. Define workflow properties, such as:
  • Source: Select from your configured data sources.
  • Entities to Ingest: Specify which metadata entities you want to import (tables, columns, etc.).
  • Scheduling Options: Choose to run the ingestion immediately or on a defined schedule.
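
The three workflow properties above can be sanity-checked before anything is submitted. The sketch below models them with a hypothetical WorkflowSpec class. It is not OpenMetadata's own configuration schema, and it assumes that schedules are written as five-field cron expressions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkflowSpec:
    """Hypothetical model of the workflow properties listed above."""
    source: str
    entities: List[str] = field(default_factory=lambda: ["tables", "columns"])
    schedule: Optional[str] = None  # cron string, or None to run once immediately


def validate(spec: WorkflowSpec) -> List[str]:
    """Return a list of human-readable problems; an empty list means the spec looks sane."""
    problems = []
    if not spec.source.strip():
        problems.append("source must name a configured data source")
    if not spec.entities:
        problems.append("select at least one entity type to ingest")
    if spec.schedule is not None and len(spec.schedule.split()) != 5:
        problems.append("schedule must be a five-field cron expression")
    return problems
```

For example, validate(WorkflowSpec("mysql_prod", schedule="0 2 * * *")) returns an empty list, while a blank source or a schedule such as "daily" is reported as a problem.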

Step 5: Execute the Ingestion Job

Once your workflow is configured, you can start the ingestion and monitor the job through the UI, which shows logs and status updates that surface any issues.

  1. Navigate to the ingestion job you created.
  2. Click on ‘Run Job’ to execute the ingestion.
  3. Review logs to ensure successful completion. If errors arise, consult the specific log messages for troubleshooting.

Step 6: Validate the Imported Data

After the ingestion completes, validate the imported metadata to confirm that it is accurate:

  1. Explore the data in the OpenMetadata UI through the ‘Browse’ section.
  2. Inspect different metadata artifacts by clicking on them. Ensure relationships, data types, and descriptions are correctly represented.
  3. Ensure your data quality metrics (if any were imported) reflect realistic conditions and comply with defined quality standards.
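
Part of this validation can be scripted if you keep a list of the columns you expect. The sketch below compares an expected column-to-type mapping with what was actually imported. How you obtain the imported mapping (manual export, API call) is left open, and the function name is an illustrative assumption.

```python
from typing import Dict


def diff_columns(expected: Dict[str, str], actual: Dict[str, str]) -> dict:
    """Compare {column_name: data_type} mappings, ignoring type-name case."""
    missing = sorted(set(expected) - set(actual))
    unexpected = sorted(set(actual) - set(expected))
    type_mismatch = {
        name: (expected[name], actual[name])
        for name in sorted(set(expected) & set(actual))
        if expected[name].lower() != actual[name].lower()
    }
    return {
        "missing": missing,
        "unexpected": unexpected,
        "type_mismatch": type_mismatch,
    }
```

An empty result in all three keys means the imported columns match the expectation. Anything else points to the specific column worth inspecting in the UI.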

Using APIs for Data Import

OpenMetadata also offers an API that allows programmatic access for importing data. This can be advantageous for automation and batch processing. Here’s a quick guide on using the OpenMetadata APIs:

Step 1: Access the API Documentation

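OpenMetadata has a RESTful API. Access the API documentation served by your instance, for example at http://localhost:8585/api/docs (the exact path can differ between versions; some releases serve a Swagger UI at /swagger.html instead). Familiarize yourself with the endpoints available for ingestion.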

Step 2: Create a POST Request

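To automate data imports, send a POST request to the appropriate endpoint. For example, to register a new table, you might send a payload structured like the one below. Treat it as a simplified illustration: take the exact endpoint and field names from the API documentation for your version.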

{
  "entityType": "dataset",
  "name": "my_table",
  "description": "This table contains users' data.",
  "columns": [
    {"name": "id", "dataType": "int"},
    {"name": "name", "dataType": "string"},
    {"name": "email", "dataType": "string"}
  ],
  "relationship": {
    "type": "data_source",
    "id": "source_id"
  }
}

Use Postman or a similar tool to send this request with the required headers, including Authorization (if necessary), and the content type as application/json.
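
The same request can be built in a script instead of a GUI client. The following is a minimal sketch using Python's standard library. The URL in the usage comment is a placeholder, and the Bearer token scheme is an assumption that should be checked against how authentication is configured on your instance.

```python
import json
import urllib.request


def build_post(url: str, payload: dict, token: str = "") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the server expects a Bearer token in the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Usage (placeholder endpoint; take the real path from the API documentation):
#   req = build_post("http://localhost:8585/api/...", payload, token="<your token>")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect the headers and body before anything reaches the server.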

Step 3: Handle Responses

OpenMetadata responds with a status code and a response body, which you should handle according to the outcome of the request. For instance, a successful request returns a 2xx status code (typically 200 or 201), while a 400 status code indicates a problem with the structure of the submitted data.
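
One way to keep this handling consistent across scripts is to map status codes to a small set of outcomes. The sketch below uses standard HTTP semantics only. The outcome labels are illustrative assumptions, not values defined by OpenMetadata.

```python
def interpret_status(status: int) -> str:
    """Map an HTTP status code to a coarse outcome label."""
    if 200 <= status < 300:
        return "ok"            # request accepted
    if status in (401, 403):
        return "auth"          # missing or insufficient credentials
    if status == 409:
        return "conflict"      # the entity may already exist
    if 400 <= status < 500:
        return "bad_request"   # fix the payload before retrying
    if status >= 500:
        return "retry"         # server-side problem, retrying later may help
    return "unexpected"
```

Only the "retry" outcome is worth repeating automatically. The client-side errors need a corrected payload or corrected credentials first.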

Best Practices for Importing Data

When importing data into OpenMetadata, following best practices ensures optimal performance and maintainability:

  1. Incremental Ingestion: If possible, set up incremental ingestion jobs to avoid heavy loads during full ingestion, which can hinder performance.
  2. Data Quality Checks: Implement data quality checks before ingestion. Validate that the data meets the expectations of your data governance framework.
  3. Documentation: Maintain clear documentation of your ingestion processes, workflows, and schemas. This transparency matters in collaborative environments where team members need to look up how a workflow is set up.
  4. Error Handling and Recovery: Implement robust error handling and ensure your ingestion jobs can recover gracefully from failures. Logging detailed information can significantly aid in troubleshooting failures.
  5. Security Compliance: Pay attention to data privacy and compliance, especially if importing sensitive information. Ensure that your connections and data handling conform to regulations like GDPR.
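
As an illustration of the error-handling practice above, a scripted ingestion step can be wrapped in a retry loop with exponential backoff. This is a generic sketch, not an OpenMetadata feature. The function name, attempt count, and delays are assumptions to tune for your environment.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(job: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run job(); on failure wait base_delay, then 2x, 4x, ... before retrying."""
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            # Log enough detail to troubleshoot later.
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

The sleep function is injectable so the loop can be exercised quickly in tests. In real use, retry only operations that are safe to repeat.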

Conclusion

OpenMetadata gives organizations a single place to manage their metadata. Importing data into it takes attention to detail, but the process is structured: configure the data sources, define and run ingestion workflows, validate the results, and use the API where automation helps. Combined with the best practices above, this builds a reliable, holistic view of an organization’s data assets.
