Event Driven Architecture over Polling Architecture for File Transport using Java NIO

with Project-X and UltraStudio

Chanaka Lakmal
Apr 18, 2017

Overview

File transport allows files in the local file system to be read from and written to. A polling transport scans a directory or set of directories repeatedly at a fixed interval. This adds overhead and wastes system resources, since the entire set of directories and files is scanned on every interval even when nothing has changed. As a solution, the NIO file transport acts as a non-polling transport, which triggers an event if and only if a file or directory is created or modified within its monitoring scope.

NIO File Transport

The NIO File Transport of AdroitLogic Project-X is a practical, industrial-grade example of a non-polling file transport. Project-X and UltraStudio make it easy to develop integration flows with the NIO File Transport.

How?

NIO File Ingress Connector

The NIO File Ingress Connector is a simple connector that lets you fetch files as required. This ingress connector is used by the NIO File Transport, as described in the connector's documentation.


How does this actually work?

Java NIO (New I/O) is an alternative I/O API for Java, introduced in Java 1.4 as an alternative to the standard Java I/O and Java networking APIs, and it offers a different way of working with I/O. JDK 7 extended it (as NIO.2) with the java.nio.file package, which contains a file system change notification API called the WatchService API. This API lets you register a directory (or a set of directories) with a watch service, which then captures the creation, modification and deletion events for the files and directories inside them and makes those events available through a queue, similar to the Linux inotify API.
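As a minimal sketch of the WatchService API on its own (not the Project-X implementation), the snippet below registers a single directory, creates a file in it, and reads back the first batch of events. The class name WatchServiceBasics and the watchOnce helper are hypothetical; the java.nio.file calls are the standard JDK API. Note that only the registered directory itself is watched, not its subdirectories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;

final class WatchServiceBasics {

    /**
     * Hypothetical helper: registers dir with a new WatchService, runs action
     * (expected to change something inside dir), then waits up to
     * timeoutSeconds for the first batch of events and returns them as "KIND name".
     */
    static List<String> watchOnce(Path dir, Runnable action, long timeoutSeconds)
            throws IOException, InterruptedException {
        List<String> seen = new ArrayList<>();
        // try-with-resources closes the watch service, which also cancels its keys.
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            // Only dir itself is watched; its subdirectories are NOT included.
            dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
            action.run();
            // Block until a key is signalled or the timeout expires (null on timeout).
            WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (key != null) {
                for (WatchEvent<?> event : key.pollEvents()) {
                    // context() is the entry name relative to dir, e.g. "hello.txt".
                    seen.add(event.kind().name() + " " + event.context());
                }
                key.reset(); // re-arm the key so further events can be queued
            }
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("watch-demo");
        List<String> events = watchOnce(dir, () -> {
            try {
                Files.createFile(dir.resolve("hello.txt"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, 10);
        System.out.println(events);
    }
}
```

On Linux the JDK backs this with inotify, so the event usually arrives almost immediately; on platforms without a native notification mechanism the JDK may fall back to an internal polling implementation, which can take several seconds.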

In this event-based file transport, we provide the following parameters:

Root path : The root directory under which files are watched

Path pattern : A pattern that a file's path must match for the file to be picked up

Pattern syntax : The syntax of the path pattern (glob or regex)

When the watch service starts, it registers the root directory and all of its subdirectories, and then keeps an eye on every registered directory. Whenever a file or directory is created, modified or deleted, an event is triggered, but only for the event kinds we registered interest in. If the new entry is a directory, it is registered with the watch service so that changes inside it are detected as well; if it is a file, it is taken for processing.
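One way to sketch the registration step described above is shown below, assuming a single WatchService and a map from each WatchKey to the directory it watches (the map is needed because event contexts are names relative to that directory). RecursiveRegistrar, registerAll and onCreate are hypothetical names for illustration, not Project-X APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;

final class RecursiveRegistrar {
    private final WatchService watcher;
    // Which directory each key watches; event contexts are relative to it.
    private final Map<WatchKey, Path> keys = new HashMap<>();

    RecursiveRegistrar(WatchService watcher) {
        this.watcher = watcher;
    }

    /** Registers start and every directory below it (symlinks are not followed). */
    void registerAll(Path start) throws IOException {
        List<Path> dirs;
        try (Stream<Path> paths = Files.walk(start)) {
            dirs = paths.filter(p -> Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS))
                        .collect(Collectors.toList());
        }
        for (Path dir : dirs) {
            keys.put(dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE), dir);
        }
    }

    /**
     * Handles an ENTRY_CREATE event: a new directory is registered together with
     * its subtree; a new file is returned so the caller can process it.
     */
    Optional<Path> onCreate(WatchKey key, WatchEvent<?> event) throws IOException {
        Path child = keys.get(key).resolve((Path) event.context());
        if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
            registerAll(child);
            return Optional.empty();
        }
        return Optional.of(child);
    }

    Path directoryOf(WatchKey key) {
        return keys.get(key);
    }

    Set<Path> watchedDirectories() {
        return new HashSet<>(keys.values());
    }
}
```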

The watch service adds triggered events to a queue, and we process them in batches: for each signalled WatchKey, key.pollEvents() returns the events pending for that key. Each returned event has one of four StandardWatchEventKinds:

ENTRY_CREATE : A directory entry is created.

ENTRY_DELETE : A directory entry is deleted.

ENTRY_MODIFY : A directory entry is modified.

OVERFLOW : Indicates that events might have been lost or discarded.
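A possible shape for the event loop is sketched below. It assumes a key-to-directory map like the one built during registration and a hypothetical Handler callback interface standing in for the transport's own processing; EventDispatcher and its methods are illustrative names only. The calls to take(), pollEvents() and reset() are the standard WatchService and WatchKey API.

```java
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Map;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

final class EventDispatcher {

    /** Hypothetical callbacks standing in for the transport's real processing. */
    interface Handler {
        void created(Path path);   // a new directory should also be registered here
        void modified(Path path);
        void deleted(Path path);
        void overflow(Path dir);   // events for dir may have been lost
    }

    /** Routes one event to the matching callback. */
    static void dispatch(WatchEvent<?> event, Path dir, Handler handler) {
        WatchEvent.Kind<?> kind = event.kind();
        if (kind == OVERFLOW) {                    // OVERFLOW carries no file name
            handler.overflow(dir);
            return;
        }
        // For the ENTRY_* kinds the context is the entry's name relative to dir.
        Path child = dir.resolve((Path) event.context());
        if (kind == ENTRY_CREATE) {
            handler.created(child);
        } else if (kind == ENTRY_MODIFY) {
            handler.modified(child);
        } else if (kind == ENTRY_DELETE) {
            handler.deleted(child);
        }
    }

    /** Processes one key's batch; returns false if the key is no longer valid. */
    static boolean drain(WatchKey key, Path dir, Handler handler) {
        for (WatchEvent<?> event : key.pollEvents()) {
            dispatch(event, dir, handler);
        }
        return key.reset();                        // re-queue the key for future events
    }

    /** Blocking loop over all registered keys until none are left. */
    static void run(WatchService watcher, Map<WatchKey, Path> keys, Handler handler)
            throws InterruptedException {
        while (!keys.isEmpty()) {
            WatchKey key = watcher.take();         // waits until some key is signalled
            Path dir = keys.get(key);
            if (dir == null) {                     // not a key we track; stop watching it
                key.cancel();
                continue;
            }
            if (!drain(key, dir, handler)) {       // directory deleted or inaccessible
                keys.remove(key);
            }
        }
    }
}
```

Calling reset() after draining each key matters: until a key is reset it is not re-queued, so no further events for that directory would be delivered. reset() returns false once the key is invalid, for example because the watched directory was deleted.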

The path pattern and pattern syntax are used to filter the files we want. We can give a path pattern in either glob or regex syntax, which lets us scan several directories for several types of files.

If a pattern is given, then after receiving an event for a file we match the file's full path (directory path plus file name) against the path pattern. For this we use the getPathMatcher method of java.nio.file.FileSystem. We created a custom GRPattern class that holds the pattern details: the pattern syntax and the path pattern.
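The original GRPattern class is not shown in this article, so the following is only a guess at its shape, assuming it simply pairs the pattern syntax with the path pattern and delegates matching to FileSystem.getPathMatcher, which takes a string of the form syntax:pattern. The field and method names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

final class GRPattern {
    private final String patternSyntax;   // "glob" or "regex"
    private final String pathPattern;
    private final PathMatcher matcher;

    GRPattern(String patternSyntax, String pathPattern) {
        if (!"glob".equals(patternSyntax) && !"regex".equals(patternSyntax)) {
            throw new IllegalArgumentException("Unsupported pattern syntax: " + patternSyntax);
        }
        this.patternSyntax = patternSyntax;
        this.pathPattern = pathPattern;
        // getPathMatcher expects "syntax:pattern", e.g. "glob:**/*.xml"
        this.matcher = FileSystems.getDefault().getPathMatcher(patternSyntax + ":" + pathPattern);
    }

    /** Matches the path exactly as given, so pass the full path (directory plus file name). */
    boolean matches(Path file) {
        return matcher.matches(file);
    }

    String patternSyntax() {
        return patternSyntax;
    }

    String pathPattern() {
        return pathPattern;
    }
}
```

With glob syntax, * does not cross directory boundaries while ** does, so **/*.xml matches XML files at any depth when applied to the full path, whereas *.xml alone would not match an absolute path.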


OVERFLOW Event

OVERFLOW is a special kind of event: we do not have to register for it in order to receive it. Receiving it means that the event queue has overflowed and there was no room for the events being generated. Once the watch key is reset, event delivery continues, but one or more events may have been lost due to the overflow. This has to be handled manually, in one of the following ways:

  1. Record the system time and the directory when an OVERFLOW event is detected, then scan that directory for files created before that time. This is not a significant overhead: since the normal procedure has already detected the files created in that directory, only a small number of files are left to process. However, this approach works only if processed files are removed from the directory; otherwise they would be picked up again.
  2. Stop watching the directory involved in the overflow, register it again as if it had just been created, and start the procedure from the beginning for that directory only.
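The first approach might be sketched as follows, assuming processed files are removed from the watched directory and that the file system records a creation time. OverflowRecovery and filesCreatedBefore are hypothetical names. BasicFileAttributes.creationTime() may return the last-modified time (or the epoch) on file systems that do not record a creation time, so the cutoff is best treated as approximate.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class OverflowRecovery {

    /**
     * Lists regular files directly inside dir whose creation time is not after
     * overflowTime (the moment the OVERFLOW event was seen). Assumes files are
     * removed once processed, so anything still present may have been missed.
     */
    static List<Path> filesCreatedBefore(Path dir, Instant overflowTime) throws IOException {
        List<Path> missed = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                BasicFileAttributes attrs;
                try {
                    attrs = Files.readAttributes(entry, BasicFileAttributes.class);
                } catch (NoSuchFileException e) {
                    continue;                          // removed while we were scanning
                }
                Instant created = attrs.creationTime().toInstant();
                if (attrs.isRegularFile() && !created.isAfter(overflowTime)) {
                    missed.add(entry);
                }
            }
        }
        return missed;
    }
}
```

Files created after the cutoff are expected to arrive as normal ENTRY_CREATE events once the key has been reset, which is why the scan stops at the overflow time.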

Example of OVERFLOW event

Suppose a new directory is registered with the watch service and, at one moment, a huge number of files (say 10,000) are copied into it or created inside it. The watch service detects all of them and queues the events for processing. If dequeuing and processing the events is slower than enqueuing them, the queue fills up and an OVERFLOW event is delivered.


Advanced

There is one more special case in which some files can be missed. If files or directories are created at a high rate, some files may be created in a new directory before we manage to register that directory with the watch service, even though registration does not take long. This case also has to be handled manually, as follows:

We know which directory is being registered with the watch service, so once it is registered we can scan it for the files already created inside it. These are the files that would otherwise be missed because of the time the watch service takes to register that directory. Files created after registration are caught by the normal event flow.

To scan a particular directory, we can use the FileVisitor interface from the java.nio.file package. The interface has four methods, corresponding to the following situations:

preVisitDirectory : Invoked before a directory’s entries are visited.

postVisitDirectory : Invoked after all the entries in a directory are visited.

visitFile : Invoked on the file being visited.

visitFileFailed : Invoked when the file cannot be accessed.
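A sketch of the catch-up scan, assuming it runs right after a directory is registered and reuses a PathMatcher built from the path pattern. MissedFileScanner is a hypothetical name; it implements all four FileVisitor methods so the mapping to the list above is explicit. Because the directory is registered before it is scanned, a file created in between may be reported both by the scan and by an ENTRY_CREATE event, so downstream processing should tolerate duplicates.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MissedFileScanner implements FileVisitor<Path> {
    private final PathMatcher matcher;
    private final List<Path> files = new ArrayList<>();
    private final List<Path> directories = new ArrayList<>();

    private MissedFileScanner(PathMatcher matcher) {
        this.matcher = matcher;
    }

    /** Walks dir recursively, collecting matching files and every directory seen. */
    static MissedFileScanner scan(Path dir, PathMatcher matcher) throws IOException {
        MissedFileScanner scanner = new MissedFileScanner(matcher);
        Files.walkFileTree(dir, scanner);
        return scanner;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        // A real transport would also make sure dir is registered with the WatchService here.
        directories.add(dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (attrs.isRegularFile() && matcher.matches(file)) {
            files.add(file);                  // existed before the directory was watched
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) {
        // e.g. permission denied or the file vanished mid-scan: skip it, keep scanning
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
        return FileVisitResult.CONTINUE;      // nothing to do once a directory is finished
    }

    List<Path> files() {
        return Collections.unmodifiableList(files);
    }

    List<Path> directories() {
        return Collections.unmodifiableList(directories);
    }
}
```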

With this level of implementation, we can detect every file that already exists or is created later within the monitored part of the local file system, without polling and according to the user's requirements.


What is Project-X?


Project-X is the latest product of AdroitLogic. It is a modular framework that provides a base for any integration product. Its pluggable connector architecture allows any transport or API to be integrated into Project-X as you wish, while its pluggable processor architecture allows the same for adding integration processors.

What is UltraStudio?


UltraStudio is another recent product of AdroitLogic. It comes with hundreds of resources to kickstart your integration project, including many sample projects to help you understand the new development flow. With a wide range of connectors for JMS, HTTP, FIX, AS2, SFTP, AMQP and other protocols, and processors to manipulate the messages flowing through UltraESB-X, developing integration flows has become easier and more intuitive than ever.

Special Thanks

My special thanks go to the AdroitLogic team for their help and advice on this task.



Tech Enthusiast | Software Engineer @ WSO2 | Computer Science Engineering @ UoM | Rotaractor | Maliyadeva College