Apache Flux: Frictionless STORM topology management — Part 1

Introduction

Apache Flux is a framework for creating and deploying Apache Storm topologies. With Apache Flux, deploying topologies for real-time processing becomes less programmatic and more declarative.

One pain point is that the topology graph is wired up in Java code, so any change requires recompiling and repackaging the topology jar. Flux relieves this pain by letting you package all Storm components in a single fat jar and define the topology's configuration in an external text file. It uses YAML, a human-readable serialization format, to describe the topology as a whole.

At its core, Flux describes a topology as a directed graph of streams, where each stream carries data in one direction, from a source component (a spout or bolt) to a destination bolt. This makes everything "easier to manage as it gets more complex" and removes much of the ambiguity about how the various components relate to each other.

Integrating Flux in your project

The easiest way to use Flux is to add it as a Maven dependency; it is available from Maven Central. Using shell spouts and bolts additionally requires the Flux Wrappers dependency:

<dependencies>
    <!-- Flux include -->
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>flux-core</artifactId>
        <version>${storm.version}</version>
    </dependency>
    <!-- Flux Wrappers include -->
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>flux-wrappers</artifactId>
        <version>${storm.version}</version>
    </dependency>
    <!-- add user dependencies here... -->
</dependencies>

Additionally, we can use the Maven Shade plugin to build the fat jar and set Flux as its main class, as follows:

<!-- create a fat jar that includes all dependencies -->
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.4</version>
            <configuration>
                <createDependencyReducedPom>true</createDependencyReducedPom>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>org.apache.storm.flux.Flux</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

YAML Configuration

Flux topologies are defined in a YAML file. A topology definition consists of six parts: name, config, components, spouts, bolts and streams.

YAML Template

name: "flux-config-example"

config:
  topology.workers: 2

# definition of dependent components required
components:
  ...

# spout definitions
spouts:
  ...

# bolt definitions
bolts:
  ...

# stream definitions: coupling of bolts and spouts to form a streaming topology
streams:
  ...
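To make the template concrete, here is a minimal sketch with the placeholders filled in. It is modelled on the examples that ship with Flux and assumes Storm's test components (org.apache.storm.testing.TestWordSpout, which emits random words on a "word" field, and org.apache.storm.testing.TestWordCounter) plus LogInfoBolt from the Flux Wrappers dependency are on the classpath; the component ids are arbitrary. The components section is left out because none of these classes need extra constructor arguments.

```yaml
name: "flux-config-example"

config:
  topology.workers: 2

# spout definitions: TestWordSpout emits random words on the "word" field
spouts:
  - id: "word-spout"
    className: "org.apache.storm.testing.TestWordSpout"
    parallelism: 1

# bolt definitions
bolts:
  - id: "count"
    className: "org.apache.storm.testing.TestWordCounter"
    parallelism: 1
  - id: "log"
    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
    parallelism: 1

# stream definitions: each entry is one directed edge, from -> to
streams:
  - name: "word-spout --> count"
    from: "word-spout"
    to: "count"
    grouping:
      type: FIELDS      # tuples with the same "word" value go to the same task
      args: ["word"]
  - name: "count --> log"
    from: "count"
    to: "log"
    grouping:
      type: SHUFFLE     # distribute tuples randomly across "log" tasks
```

Changing the wiring, for example switching a grouping or raising a parallelism hint, only means editing this file; the jar does not need to be rebuilt.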

The same YAML configuration can also be tailored to different environments by keeping environment-dependent properties in a separate properties file for each environment. Some of the options available in the YAML configuration are:

  • Property substitution/filtering
  • Environment variable substitution/filtering
  • Static factory methods
  • Constructor arguments, references, properties and configuration methods
  • A wide range of topology configuration options
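The fragment below sketches how several of these options combine. ${...} placeholders are filled in from a properties file when the topology is launched with --filter <file>, and ${ENV-NAME} placeholders from environment variables when --env-filter is given. The classes com.example.ConnectionConfig and com.example.MyBolt, the property keys redis.port and my.bolt.parallelism, the environment variable REDIS_HOST and the method withBatchSize are all hypothetical, chosen only for illustration.

```yaml
# Hypothetical fragment: the com.example.* classes do not exist; substitute your own.
components:
  - id: "connectionConfig"
    className: "com.example.ConnectionConfig"
    # Constructor arguments, passed positionally: ConnectionConfig(String host, int port)
    constructorArgs:
      - "${ENV-REDIS_HOST}"    # environment variable, substituted only with --env-filter
      - ${redis.port}          # value from the properties file given with --filter
    # Properties: set through a public setter (setTimeoutMs) or public field
    properties:
      - name: "timeoutMs"
        value: 5000

bolts:
  - id: "my-bolt"
    className: "com.example.MyBolt"
    # References: pass the component defined above into the bolt's constructor
    constructorArgs:
      - ref: "connectionConfig"
    # Configuration methods: invokes withBatchSize(100) on the bolt after construction
    configMethods:
      - name: "withBatchSize"
        args:
          - 100
    parallelism: ${my.bolt.parallelism}
```

Pointing --filter at a different properties file per environment (say, one for development and one for production) is what lets a single YAML file serve several environments without being edited.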

Deploying the Topology

Topologies are still deployed with storm jar as before; the only difference is that the main class is now Flux. To run the topology locally, pass --local; to submit it to a remote cluster, pass --remote:

storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --remote flux_demo_config.yaml

Example Output

███████╗██╗     ██╗   ██╗██╗  ██╗
██╔════╝██║     ██║   ██║╚██╗██╔╝
█████╗  ██║     ██║   ██║ ╚███╔╝
██╔══╝  ██║     ██║   ██║ ██╔██╗
██║     ███████╗╚██████╔╝██╔╝ ██╗
╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝
+-         Apache Storm        -+
+-  data FLow User eXperience  -+
Version: 0.3.0
Parsing file: /Users/hsimpson/Projects/donut_domination/storm/shell_test.yaml
---------- TOPOLOGY DETAILS ----------
Name: shell-topology
--------------- SPOUTS ---------------
sentence-spout[1](org.apache.storm.flux.wrappers.spouts.FluxShellSpout)
---------------- BOLTS ---------------
splitsentence[1](org.apache.storm.flux.wrappers.bolts.FluxShellBolt)
log[1](org.apache.storm.flux.wrappers.bolts.LogInfoBolt)
count[1](org.apache.storm.testing.TestWordCounter)
--------------- STREAMS ---------------
sentence-spout --SHUFFLE--> splitsentence
splitsentence --FIELDS--> count
count --SHUFFLE--> log
--------------------------------------
Submitting topology: 'shell-topology' to remote cluster...
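For reference, a YAML file along the following lines would produce the topology summarized in the output above. It is a sketch modelled on the shell example that ships with Flux. The script names (randomsentence.js, splitsentence.py), the output field names and the use of "word" as the FIELDS grouping key are assumptions, since the output only shows component ids, classes, parallelism and grouping types.

```yaml
name: "shell-topology"
config:
  topology.workers: 1

spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    # FluxShellSpout(String[] command, String[] outputFields)
    constructorArgs:
      - ["node", "randomsentence.js"]   # assumed script name
      - ["sentence"]                    # assumed output field
    parallelism: 1

bolts:
  - id: "splitsentence"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    # FluxShellBolt(String[] command, String[] outputFields)
    constructorArgs:
      - ["python", "splitsentence.py"]  # assumed script name
      - ["word"]                        # assumed output field
    parallelism: 1
  - id: "log"
    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
    parallelism: 1
  - id: "count"
    className: "org.apache.storm.testing.TestWordCounter"
    parallelism: 1

streams:
  - name: "spout --> split"
    from: "sentence-spout"
    to: "splitsentence"
    grouping:
      type: SHUFFLE
  - name: "split --> count"
    from: "splitsentence"
    to: "count"
    grouping:
      type: FIELDS
      args: ["word"]    # assumed grouping key
  - name: "count --> log"
    from: "count"
    to: "log"
    grouping:
      type: SHUFFLE
```

Each entry under streams corresponds to one line in the STREAMS section of the output, and the [1] after each component id is its parallelism.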

Conclusion

This post introduced Apache Flux and its benefits in reducing the complexity of managing Storm topologies in code. In further posts, we will walk through the sample code.

Happy Coding!!! 🙂

Originally published at http://syednotes.wordpress.com on November 28, 2021.

syedroshan