Zafin Integrate & Orchestrate (IO) and Apache NiFi: Unlocking the Potential of Data Flow Automation

Engineering at Zafin
7 min read · Aug 12, 2024

By Jijo Lawrence, Architect III, Alok Sahi, Chief Architect, and Shahir A. Daya, Chief Technology Officer

In industries ranging from fast-paced tech startups to multi-national financial institutions, one thing remains consistent — transferring data efficiently is essential. In service of this need, Apache NiFi has emerged as a powerful solution to tackle complex data flow challenges. Let’s explore how NiFi automates data flow and why it’s an important tool in the Zafin Integrate & Orchestrate (IO) platform.

What is Apache NiFi?

Apache NiFi is an open-source data flow management system that automates the movement of data between sources and destinations. A top-level project of the Apache Software Foundation, NiFi provides a graphical interface for designing, controlling, and monitoring data flows in real time. Its visual, flow-based programming paradigm makes it possible to build data pipelines without complex coding, which makes it accessible to users with diverse technical backgrounds.

Key Features of Apache NiFi

Drag-and-Drop Interface: NiFi’s user-friendly interface allows users to design data flows by simply dragging and dropping processors onto the canvas. This graphical approach enhances productivity and reduces the learning curve for building data pipelines.

Figure 1: Drag-and-drop NiFi web UI

Processor Ecosystem: NiFi comes with a rich ecosystem of built-in processors for handling various data integration, transformation, enrichment, and routing tasks. Additionally, users can develop custom processors to extend NiFi’s functionality and integrate with third-party systems or APIs.

Figure 2: NiFi has a rich ecosystem of processors

Data Provenance: NiFi offers detailed data provenance capabilities, which track the lineage of each data element as it moves through the system. This feature is invaluable for auditing, troubleshooting, and ensuring data quality and compliance.

Figure 3: Data provenance in the NiFi UI
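
To make the idea of lineage concrete, here is a minimal Python sketch of a provenance log. It is a toy model, not NiFi's provenance repository: the class names, the component names, and the file names are hypothetical. The event type names (RECEIVE, CONTENT_MODIFIED, SEND) mirror event types that NiFi records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceEvent:
    flowfile_id: str   # identifier of the piece of data being tracked
    event_type: str    # e.g. RECEIVE, CONTENT_MODIFIED, SEND
    component: str     # hypothetical name of the step that raised the event
    details: str = ""


@dataclass
class ProvenanceLog:
    events: List[ProvenanceEvent] = field(default_factory=list)

    def record(self, flowfile_id, event_type, component, details=""):
        self.events.append(
            ProvenanceEvent(flowfile_id, event_type, component, details)
        )

    def lineage(self, flowfile_id):
        """Return the events for one piece of data, in the order recorded."""
        return [e for e in self.events if e.flowfile_id == flowfile_id]


log = ProvenanceLog()
log.record("flowfile-1", "RECEIVE", "ingest_file", "accounts.dat")
log.record("flowfile-2", "RECEIVE", "ingest_file", "rates.dat")
log.record("flowfile-1", "CONTENT_MODIFIED", "convert_encoding")
log.record("flowfile-1", "SEND", "publish_to_stream")

# Lineage of flowfile-1 only, ignoring events for other data.
lineage_types = [e.event_type for e in log.lineage("flowfile-1")]
```

Querying the log by identifier is what makes auditing and troubleshooting practical: every step that touched a given piece of data can be listed in order.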

Scalability and Extensibility: NiFi is designed to scale horizontally to handle large volumes of data efficiently. Load balancing distributes the data in a flow across the NiFi nodes in the cluster based on the configured load balancing strategy.

Figure 4: Configuring the Load Balance Strategy for a connection

Also, its modular architecture enables easy integration with external systems through custom processors and extensions, providing flexibility to adapt to diverse use cases.
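
Two of the load balance strategies NiFi offers on a connection are "Round robin" and "Partition by attribute". The Python sketch below illustrates the difference between them. It is a simplified simulation under assumptions: the node names and the `bank` attribute are hypothetical, and the CRC32 hash is a stand-in chosen for this example, not NiFi's internal partitioning algorithm.

```python
from itertools import cycle
from zlib import crc32

NODES = ["node-1", "node-2", "node-3"]  # hypothetical three-node cluster


def round_robin(flowfiles, nodes):
    """Hand flowfiles to nodes in turn, giving an even spread."""
    assignment = {node: [] for node in nodes}
    for flowfile, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(flowfile)
    return assignment


def partition_by_attribute(flowfiles, nodes, attribute):
    """Send every flowfile with the same attribute value to the same node."""
    assignment = {node: [] for node in nodes}
    for flowfile in flowfiles:
        key = str(flowfile[attribute]).encode("utf-8")
        node = nodes[crc32(key) % len(nodes)]  # stand-in hash, an assumption
        assignment[node].append(flowfile)
    return assignment


banks = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]
flowfiles = [{"id": i, "bank": bank} for i, bank in enumerate(banks)]

rr = round_robin(flowfiles, NODES)
pb = partition_by_attribute(flowfiles, NODES, "bank")
```

Round robin favours an even spread of work, while partitioning keeps related data together on one node, which matters when downstream steps need to see all records for a given key.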

Data Security: Security is a top priority in data management, and NiFi offers robust features to protect data. It supports TLS/SSL encryption, role-based access control (RBAC), and fine-grained authorization policies to safeguard sensitive information.

Real-Time Monitoring and Alerts: NiFi provides comprehensive monitoring capabilities, allowing users to track the performance and health of data flows in real time. Administrators can set up alerts to identify and address issues proactively, minimizing downtime and keeping data continuously available.
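
As an illustration of the kind of alert an administrator might define, the Python sketch below flags connections whose queues are approaching their back-pressure threshold. It is a sketch under assumptions: the `QueueStatus` class, the connection names, and the 80% warning ratio are hypothetical, and in practice the queue counts would come from NiFi's own status reporting rather than hard-coded values. The default of 10,000 objects matches NiFi's default back-pressure object threshold for a connection.

```python
from dataclasses import dataclass


@dataclass
class QueueStatus:
    name: str
    queued_count: int
    backpressure_threshold: int = 10_000  # NiFi's default object threshold


def check_alerts(queues, warn_ratio=0.8):
    """Return one alert message per queue at or above the warning ratio."""
    alerts = []
    for queue in queues:
        usage = queue.queued_count / queue.backpressure_threshold
        if usage >= warn_ratio:
            alerts.append(
                f"{queue.name}: queue at {usage:.0%} of back-pressure threshold"
            )
    return alerts


queues = [
    QueueStatus("ingest->convert", 9_000),    # hypothetical connection names
    QueueStatus("convert->publish", 1_200),
]
alerts = check_alerts(queues)
```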

Why did we choose NiFi?

NiFi is a strong fit for integrating banks’ core systems with Zafin’s SaaS platform. Zafin Integrate & Orchestrate (IO) handles this data integration by providing batch, change data capture (CDC), and streaming capabilities. We treat batch files as bounded streams, and NiFi lets Zafin stream incoming file data to applications such as Apache Flink for further processing and transformation. This has helped Zafin integrate with a wide variety of banks while keeping the integration interfaces and underlying structures of the Zafin SaaS platform intact.
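
Treating a batch file as a bounded stream means reading it record by record and handing each record downstream as soon as it is available, instead of loading the whole file first. The Python sketch below shows the idea with a generator. The pipe-delimited layout and the field names are hypothetical, chosen only for the example. In Zafin IO this role is played by NiFi and downstream applications such as Flink, not by Python code.

```python
import io
from typing import Iterator, TextIO


def stream_records(source: TextIO, delimiter: str = "|") -> Iterator[dict]:
    """Yield one record at a time from a delimited batch file.

    The stream is bounded: it ends when the file ends.
    """
    header = source.readline().rstrip("\n").split(delimiter)
    for line in source:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield dict(zip(header, line.split(delimiter)))


# An in-memory stand-in for an incoming batch file.
batch_file = io.StringIO("account_id|balance\nA1|100.50\nA2|20.00\n")

records = stream_records(batch_file)
first = next(records)   # available before the rest of the file is read
rest = list(records)    # the stream ends when the file is exhausted
```

Because the consumer sees the same record-at-a-time interface for files and for unbounded streams, the same downstream logic can serve both.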

NiFi’s role in Zafin IO

We use NiFi to add pre-processing and post-processing steps to our Zafin IO data pipelines, which helps ensure the quality and consistency of our data flow automation outcomes.

Key pre-processing uses of NiFi include:

  1. Data Ingestion: Bringing data into NiFi from various sources at the bank such as databases, files, messaging systems, or APIs.
  2. Data Validation: Checking the integrity, format, and quality of incoming data to ensure that it meets specified standards or requirements.
  3. Data Enrichment: Adding additional context or information to incoming data by joining it with reference data, performing lookups, or applying business rules.
  4. Data Transformation: Converting data from one format to another, standardizing data structures, or normalizing values to facilitate downstream processing.
  5. Routing and Prioritization: Routing data flexibly based on content, attributes, or dynamic conditions. This is particularly useful for the Zafin IO layer when data needs to be sent to different destinations depending on its characteristics.
  6. Data Filtering: Filtering out irrelevant or unwanted data based on specified criteria or conditions to reduce noise and focus on relevant information.
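
The pre-processing steps above can be sketched as a small pipeline. The Python below is only an illustration of how validation, enrichment, transformation, filtering, and routing fit together. The field names, the reference table, and the 10,000 routing threshold are hypothetical. In Zafin IO these steps are built from NiFi processors, not Python functions.

```python
REFERENCE_DATA = {"CHQ": "Chequing", "SAV": "Savings"}  # hypothetical lookup


def is_valid(record):
    """Validation: required fields present, known type, numeric balance."""
    try:
        float(record["balance"])
    except (KeyError, TypeError, ValueError):
        return False
    return bool(record.get("account_id")) and record.get("type") in REFERENCE_DATA


def enrich(record):
    """Enrichment: add a readable type name from reference data."""
    return {**record, "type_name": REFERENCE_DATA[record["type"]]}


def transform(record):
    """Transformation: normalize the balance from text to a number."""
    return {**record, "balance": float(record["balance"])}


def keep(record):
    """Filtering: drop closed accounts."""
    return record.get("status") != "CLOSED"


def route(record):
    """Routing: choose a destination based on the record's content."""
    return "high_value" if record["balance"] >= 10_000 else "standard"


def preprocess(records):
    routed = {"high_value": [], "standard": [], "invalid": []}
    for record in records:
        if not is_valid(record):
            routed["invalid"].append(record)
            continue
        record = transform(enrich(record))
        if keep(record):
            routed[route(record)].append(record)
    return routed


incoming = [
    {"account_id": "A1", "type": "CHQ", "balance": "15000.00", "status": "OPEN"},
    {"account_id": "A2", "type": "SAV", "balance": "250.00", "status": "OPEN"},
    {"account_id": "A3", "type": "SAV", "balance": "90.00", "status": "CLOSED"},
    {"account_id": "", "type": "CHQ", "balance": "10.00", "status": "OPEN"},
    {"account_id": "A5", "type": "XXX", "balance": "5.00", "status": "OPEN"},
]
routed = preprocess(incoming)
```

Invalid records are kept on their own route instead of being dropped silently, similar in spirit to sending data to a failure relationship in NiFi so it can be inspected later.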

We have created standard reusable components in Zafin IO that can apply filters, perform joins, enrich data with metadata, and execute complex processing logic on the fly. Zafin IO uses NiFi to standardize and cleanse data as it moves through the pipeline, ensuring consistency and accuracy across different systems. For example, Zafin IO uses NiFi to encrypt and decrypt files coming from a bank’s legacy systems, convert files from EBCDIC to ASCII, and sanitize the data.
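
To show what an EBCDIC-to-ASCII conversion involves, here is a minimal Python sketch using the standard library's `cp037` codec. This is not Zafin IO's custom NiFi processor. The choice of code page 037 is an assumption (mainframes use several EBCDIC code pages), and the sketch handles text only. Binary fields such as packed decimals, which are common in mainframe files, would need separate handling.

```python
EBCDIC_CODEC = "cp037"  # assumption: IBM code page 037; others exist


def ebcdic_to_ascii(data: bytes, codec: str = EBCDIC_CODEC) -> bytes:
    """Decode EBCDIC text bytes and re-encode them as ASCII.

    Characters with no ASCII equivalent are replaced with '?'.
    """
    text = data.decode(codec)
    return text.encode("ascii", errors="replace")


# Build an EBCDIC sample by encoding known text with the same codec.
sample = "ACCT 12345".encode(EBCDIC_CODEC)
converted = ebcdic_to_ascii(sample)

# The letter 'A' is byte 0xC1 in EBCDIC (cp037) but 0x41 in ASCII.
first_byte_before = sample[0]
first_byte_after = converted[0]
```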

Key post-processing uses of NiFi include:

  1. Data Transformation: Applying additional transformations or calculations to the processed data to conform to the bank’s needs, for example when the required batch output differs from what the Zafin SaaS platform provides out of the box.
  2. Data Publication: Publishing processed data to downstream systems, applications, or users for consumption or further analysis.
  3. Data Routing: Routing processed data to different destinations based on business rules or integration requirements.
  4. Data Archival: Archiving processed data for compliance or backup purposes.
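
As a sketch of the post-processing transformation step, the Python below reshapes processed records into a delimited batch file with a trailer line carrying the record count. The layout (pipe delimiter, header row, `TRAILER` line) and the field names are hypothetical, invented for this example. Each bank's actual output format would differ.

```python
import io
from typing import Iterable, List, TextIO


def write_batch(records: Iterable[dict], fields: List[str], out: TextIO) -> int:
    """Write records as a pipe-delimited batch file and return the count.

    The header and trailer layout is a hypothetical bank format.
    """
    out.write("|".join(fields) + "\n")
    count = 0
    for record in records:
        out.write("|".join(str(record[name]) for name in fields) + "\n")
        count += 1
    out.write(f"TRAILER|{count}\n")  # lets the receiver verify completeness
    return count


processed = [
    {"account_id": "A1", "fee": 2.5, "internal_flag": True},
    {"account_id": "A2", "fee": 0.0, "internal_flag": False},
]

output = io.StringIO()  # stand-in for the outgoing file
written = write_batch(processed, ["account_id", "fee"], output)
```

Selecting only the listed fields keeps internal attributes out of the file sent to the bank.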

A typical flow which involves NiFi inside Zafin IO:

Figure 5: Typical data flow showing how NiFi is used in Zafin IO

Key Benefits of NiFi at Zafin

  1. Reliability and Fault Tolerance: NiFi is designed to be highly reliable and fault-tolerant, with built-in mechanisms for data provenance, flow control, and error handling. By incorporating NiFi into the architecture, Zafin IO has been able to ensure that data is processed reliably and consistently, even in the event of system failures or network disruptions.
  2. Integration with Ecosystem: NiFi integrates with other components of the ecosystem, such as Redpanda and Flink. This interoperability enables the Zafin IO layer to draw on additional capabilities for data storage, processing, and analytics as needed. Using NiFi’s integration capabilities, Zafin has built a scalable and extensible data processing pipeline that meets the evolving needs of its customers.
  3. Improved Time to Value: Using custom and built-in processors, we have built reusable components that speed up data flow automation and integration with banks. A good example of a reusable custom NiFi processor is Zafin IO’s EBCDIC-to-ASCII converter.
  4. Improved Speed: We use NiFi’s built-in Apache Avro support for both streaming and writing files, which speeds up processing and saves storage space because Avro stores data in a compact binary format. For one of our clients, Zafin IO processes more than 1 TB of data every day, achieving a throughput of 20,000 records/second while ingesting the data, converting it from EBCDIC to ASCII, filtering out data that is not required, and transforming the schema to conform to the one required by the Zafin SaaS platform.
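
To put the throughput figures in perspective, the short Python calculation below works out what they would imply if the 20,000 records/second rate were sustained for a full day. That is an assumption made only for this back-of-the-envelope estimate: the figures above do not say whether the rate is a peak or an average, and 1 TB is taken as exactly 10^12 bytes even though the stated volume is "more than 1 TB".

```python
BYTES_PER_DAY = 1_000_000_000_000   # 1 TB, decimal; stated volume is higher
RECORDS_PER_SECOND = 20_000         # throughput quoted above
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400

# Records handled in a day if the rate were sustained (an assumption).
records_per_day = RECORDS_PER_SECOND * SECONDS_PER_DAY

# Average byte rate needed to move 1 TB in one day.
avg_bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY

# Average record size implied by the two figures together.
implied_record_size = BYTES_PER_DAY / records_per_day

print(f"{records_per_day:,} records/day")
print(f"{avg_bytes_per_second / 1e6:.2f} MB/s on average")
print(f"{implied_record_size:.0f} bytes/record on average")
```

Under those assumptions the figures correspond to roughly 1.7 billion records a day at a few hundred bytes each.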

Conclusion

Apache NiFi’s combination of visual data flow design, dynamic routing, built-in monitoring, extensive processor ecosystem, scalability, security, and community support makes it a highly valuable tool for organizations seeking to streamline and automate their data integration and processing workflows. These capabilities make NiFi a great fit for Zafin IO.

There is no one-size-fits-all solution. Depending on the characteristics of the data and the task to be performed, Zafin IO provides one of several options, including Apache NiFi, Apache Flink, and Redpanda Connect.

We hope you found this blog to be helpful. We are always learning, so if you have any suggestions or experiences with similar approaches, please share them in the comments section. We love to hear and learn from others.

Stay tuned for the next parts of our blog series on Zafin IO and NiFi:

  • Installing NiFi and the Zafin IO Local Development Environment
  • Role of the NiFi Registry in Release Management

Acknowledgments

We would like to thank Jijoe Vurghese for his review of this blog. His feedback helped improve the quality of this blog.

