How to collect data like a spy — Part 1
Apache NiFi lets you collect data from almost any source, process it, and deliver it to almost any destination using flow-based programming.
Apache NiFi History
The project was created by the United States National Security Agency (NSA) and was originally named NiagaraFiles. In 2014 the NSA released it as open-source software through its technology transfer program.
Flow Based Programming (FBP)
Apache NiFi (pronounced "nye fye") is based on flow-based programming (FBP), a programming paradigm that defines applications as networks of “black box” processes, which exchange data across predefined connections by message passing. These black box processes can be reconnected endlessly to form different applications without having to be changed internally.
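To make the idea concrete, here is a minimal sketch of flow-based programming in plain Python. It is not how NiFi is implemented, and every name in it (`run_process`, `run_network`, `split_words`, `drop_short`, `to_upper`) is made up for illustration. Each black box is a function that only sees the messages it receives, queues stand in for the predefined connections, and the same boxes are wired together in two different ways without being changed internally.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of a message stream


def run_process(func, inbox, outbox):
    """Run one black-box process: read a message, apply func, pass results on."""
    while True:
        msg = inbox.get()
        if msg is _DONE:
            outbox.put(_DONE)  # tell the next process the stream has ended
            return
        for result in func(msg):
            outbox.put(result)


def run_network(processes, messages):
    """Wire the processes into a line of queues and push messages through."""
    connections = [queue.Queue() for _ in range(len(processes) + 1)]
    threads = [
        threading.Thread(
            target=run_process,
            args=(func, connections[i], connections[i + 1]),
        )
        for i, func in enumerate(processes)
    ]
    for thread in threads:
        thread.start()
    for msg in messages:
        connections[0].put(msg)
    connections[0].put(_DONE)

    results = []
    while True:
        msg = connections[-1].get()
        if msg is _DONE:
            break
        results.append(msg)
    for thread in threads:
        thread.join()
    return results


# Three black boxes. Each yields zero or more output messages per input.
def split_words(line):
    yield from line.split()


def drop_short(word):
    if len(word) > 3:
        yield word


def to_upper(word):
    yield word.upper()


lines = ["move and track data", "like a parcel service"]

# The same boxes, connected in two different ways, form two applications.
app_one = run_network([split_words, drop_short, to_upper], lines)
app_two = run_network([split_words, to_upper], lines)

print(app_one)
print(app_two)
```

In NiFi the wiring is done by dragging connections between processors on a canvas rather than by passing a list to a function, but the principle is the same: changing the application means changing the connections, not the boxes.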
Apache NiFi Components/Processors
NiFi calls its components processors. More than 160 processors are available out of the box, and you can develop your own if required. The built-in processors cover:
- Common cloud services: AWS, Azure and Google
- Text/string manipulation
- Database read/write for many database technologies, such as Hadoop, SQL and NoSQL
- Reading, writing, searching, transforming and manipulating CSV, text, Excel, JSON and Avro files
- File ingress from S3, Azure, FTP, SFTP and text files
- HTTP web scraping
- Building of web services
- Data transformation: XLS to CSV, CSV to JSON, JSON to Avro, and many more
- Validation of CSV, XML and Avro files
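To give a feel for what one of these transformations does to the data, here is a rough sketch of a CSV to JSON conversion in plain Python. In NiFi you would configure a processor rather than write code, so this only illustrates the input and output. The function name `csv_to_json` and the sample sensor data are invented for the example.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Parse CSV text with a header row and return a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object keyed by the header names.
    # Values stay as strings because CSV carries no type information.
    return json.dumps([dict(row) for row in reader])


# Hypothetical sample data: two sensor readings.
sample = "sensor,reading\nA1,20.5\nB2,19.8\n"
converted = csv_to_json(sample)
print(converted)
```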
Similar to how parcel services move and track packages, Apache NiFi helps move and track data.
NiFi offers guaranteed data delivery at very high scale, supports high transaction rates, and provides full data provenance.
Benefits of Apache NiFi
- Single data-source agnostic platform for data collection
- Intuitive real-time user interface
- Powerful data security from source to storage
- Highly granular data collection and sharing capabilities
- Extremely scalable and extensible platform
Over the next few posts, I’ll show you how to build NiFi flows that:
- Collect and process CSV files
- Scrape web pages
- Call APIs and process the data they return
- Process social media data
- Collect IoT data from multiple sensors
Part One — How to collect data like a spy
Part Two — Getting NiFi up and running
Part Three — How to collect social media data like a pro
Part Four — Creating a database with AWS Athena
Part Five — Connecting RStudio to Athena
Part Six — Creating Maps of the Data in RStudio
Part Seven — Creating an interactive dashboard for your data