It’s Here! An Apache NiFi Simulator for Generating Random AND Realistic Time Series Data

Your Tool for Generating Random Data That Acts Like a Collection of Sensors

--

by Chris Herrera

Over the past few months, as our engineering team has been building out the Tempus IIoT/IoT Platform Accelerator, it has become evident that we needed a realistic, real-time data simulator to do more than load-test and stress our ingest. We also needed to exercise our stream computation, persistence, and batch systems with representative data and queries. Hence the origin of the Apache NiFi Simulator Bundle.

Why the Need for a Real Time IIoT Data Simulator?

The Apache NiFi Simulator Bundle is a processor that wraps the great work done by the TSimulus project and provides a utility that allows for generating random, realistic time series data.

An infographic of industries and sample datasets requiring this type of real time simulation capability is below.

This NiFi processor came out of a need to generate not just random data that looks like sensor output, but random data that acts like a real sensor. This allows a developer to test far more capabilities of an IIoT platform than throughput alone. The ability to apply noise and cyclical and seasonal components makes for a much more realistic test set, without having to find and acquire numerous public data sets (which are extremely limited in the verticals mentioned above).

How Does it Work?

The simulator is driven by a configuration file that defines the shape of the signal (patterns, noise, cycles, etc.). The simulator then converts that definition into time series values.
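To make the idea of a "shape" concrete, here is a minimal Python sketch that combines a baseline, a daily cycle, and Gaussian noise into one value per timestamp. This is not how TSimulus is implemented and it does not read a TSimulus configuration; the function names and default parameters (`sensor_value`, `generate`, `baseline`, `daily_amplitude`, `noise_sigma`) are assumptions made up for illustration.

```python
import math
import random
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400


def sensor_value(ts, rng, baseline=20.0, daily_amplitude=5.0, noise_sigma=0.3):
    """Return baseline + daily sinusoidal cycle + Gaussian noise for ts."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    # Cyclical component: one full sine period per day.
    cycle = daily_amplitude * math.sin(2 * math.pi * seconds_into_day / SECONDS_PER_DAY)
    # Noise component: small random jitter around the smooth signal.
    noise = rng.gauss(0.0, noise_sigma)
    return baseline + cycle + noise


def generate(start, steps, step_seconds, seed=42, **shape):
    """Generate (timestamp, value) pairs; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    series = []
    for i in range(steps):
        ts = start + timedelta(seconds=i * step_seconds)
        series.append((ts, sensor_value(ts, rng, **shape)))
    return series


if __name__ == "__main__":
    # Four samples, six hours apart (hypothetical start date).
    for ts, value in generate(datetime(2017, 6, 30), steps=4, step_seconds=21600):
        print(ts.isoformat(), round(value, 2))
```

The point of the sketch is that the value depends on the timestamp itself, which is why data generated this way behaves more like a sensor than a plain random number stream does.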

Additionally, this processor contains logic that allows the simulation to be run in real time, further making the simulation more realistic.

Using the NiFi Processor

As mentioned above, this processor wraps TSimulus; however, it adds the facility to generate the data in real time. Additionally, the processor outputs the data to a NiFi flowfile in CSV format, with each exported value on a new line.

Each line of the flowfile holds three comma-separated fields in the order name,ts,value.

The processor has two properties:

  1. Simulator Configuration File — This is the location on disk where the TSimulus configuration file is located
  2. Print Header — This is a boolean value which will indicate whether or not the name,ts,value header is printed in the flowfile
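If you want to inspect the flowfile content outside of NiFi, the following Python sketch parses name,ts,value lines whether or not Print Header is enabled. The function name `parse_flowfile` and the sample rows are hypothetical, and because the timestamp format is not specified here, the sketch keeps `ts` as a plain string rather than guessing at its encoding.

```python
import csv
import io

HEADER = ["name", "ts", "value"]


def parse_flowfile(text):
    """Parse name,ts,value CSV lines, skipping the header row if present."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row or row == HEADER:
            continue  # ignore blank lines and the optional header
        name, ts, value = row
        # ts stays a string: its format is an unknown in this sketch.
        records.append({"name": name, "ts": ts, "value": float(value)})
    return records


if __name__ == "__main__":
    # Made-up sample content, not actual processor output.
    sample = "name,ts,value\nsensor-a,1498853698000,21.5\nsensor-a,1498853699000,21.7\n"
    for record in parse_flowfile(sample):
        print(record)
```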

This processor is intended to be used with the new record-based processors in Apache NiFi. I suggest using the following Avro schema in the Schema Registry controller service and then using the record processors to further operate on the data.

Yes — Configuration Documentation is Provided

The configuration file documentation is located here. This will explain how to build the JSON configuration document.

NOTE: The configuration JSON document contains from and to elements, which define the bounds of the data set. The simulator will still run when the current time falls outside those bounds, but it is wise to set the to element to some point in the future, because the simulation tends to generate more realistic data when the timestamp being generated lies within that time window.
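A quick way to guard against this caveat is to check the window before starting the processor. The Python sketch below does that under two assumptions: that from and to are top-level keys of the JSON document, and that their values look like `2016-01-01 00:00:00.000`. Verify both against the configuration documentation. The helper name `window_contains` and the sample dates are hypothetical.

```python
import json
from datetime import datetime

# Assumed timestamp layout for the from/to elements; adjust if yours differs.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def window_contains(config, moment):
    """Return True if moment lies within the config's from/to bounds."""
    start = datetime.strptime(config["from"], TS_FORMAT)
    end = datetime.strptime(config["to"], TS_FORMAT)
    return start <= moment <= end


if __name__ == "__main__":
    # Hypothetical fragment; a real configuration contains more elements.
    config = json.loads(
        '{"from": "2016-01-01 00:00:00.000", "to": "2030-12-31 23:59:59.999"}'
    )
    if not window_contains(config, datetime.now()):
        print("Warning: current time is outside the from/to window")
```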

Additionally, at the moment, the frequency of the exports is not honored. The data will be generated in an aligned form at the schedule configured for the processor.

How to Get Started

NOTE: The processor was built against the Apache NiFi 1.3 Maven Archetype.

To build the library and get started, first clone the GitHub repository

git clone https://github.com/hashmapinc/nifi-simulator-bundle.git

Change directory into nifi-simulator-bundle

cd nifi-simulator-bundle

Execute a Maven clean install

mvn clean install

A build success message should appear

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13.271 s
[INFO] Finished at: 2017-06-30T15:14:58-05:00
[INFO] Final Memory: 20M/377M
[INFO] ------------------------------------------------------------------------

Navigate to the nifi-simulator-bundle-nar/target directory and copy the nifi-simulator-bundle-nar-1.0-SNAPSHOT.nar file to the Apache NiFi /lib folder.

Restart NiFi.

I Welcome Your Feedback!

Run the NiFi Simulator Bundle through its paces and let me know if you have any questions or comments as you start generating random, realistic time series data. You should also check out the WITSML Objects Library SDK. Please reach out for any assistance needed with your Apache NiFi, Time Series, and IIoT/IoT requirements or for a discussion and/or demonstration of our Tempus IIoT/IoT Platform Accelerator.

Feel free to share on other channels and be sure to keep up with all new content from Hashmap at https://medium.com/hashmapinc.

Chris Herrera is a Senior Enterprise Architect at Hashmap working across industries with a group of innovative technologists and domain experts accelerating the value of connected data for the open source community and customers. You can follow Chris on Twitter @cherrera2001 and connect with him on LinkedIn at linkedin.com/in/cherrera2001.
