Apache Flux: Frictionless Storm topology management, Part 1
Introduction
Apache Flux is a framework for creating and deploying Apache Storm topologies. With Flux, deploying topologies for real-time processing becomes less programmatic and more declarative.
One of the pain points of Storm is that the topology graph is written in Java code, so any change requires recompiling and repackaging the topology jar. Flux aims to relieve this pain by letting you package all Storm components in a single fat jar and define the topology's configuration in an external text file. It leverages YAML, a human-readable serialization format, to describe the topology as a whole.
At its core, a Flux definition describes a directed graph: a unidirectional data flow in which events and data always proceed in the same direction, from spouts through bolts. This makes everything “easier to manage as it gets more complex” and removes much of the ambiguity about how the various components relate to one another.
Integrating Flux in your project
The easiest way to use Flux is to add it as a Maven dependency; it is available in Maven Central. Using shell spouts and bolts requires the additional Flux Wrappers dependency:
<dependencies>
    <!-- Flux include -->
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>flux-core</artifactId>
        <version>${storm.version}</version>
    </dependency>
    <!-- Flux Wrappers include -->
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>flux-wrappers</artifactId>
        <version>${storm.version}</version>
    </dependency>
    <!-- add user dependencies here... -->
</dependencies>
Additionally, we can set Flux as the main class using the Maven Shade plugin, as follows:
<!-- create a fat jar that includes all dependencies -->
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.4</version>
            <configuration>
                <createDependencyReducedPom>true</createDependencyReducedPom>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>org.apache.storm.flux.Flux</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
YAML Configuration
Flux topologies are defined in a YAML file. A topology definition consists of 6 parts: a name, a config block, components, spouts, bolts, and streams.
YAML Template
name: "flux-config-example"
config:
  topology.workers: 2

# definition of dependent components required
components: ...

# spout definitions
spouts: ...

# bolt definitions
bolts: ...

# stream definitions
streams: ... # coupling of bolts and spouts to form a streaming topology
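To make the template more concrete, here is a minimal sketch of a filled-in definition, modeled on the word-count example in the Flux documentation. The ids ("spout-1", "bolt-1", "bolt-2") and the topology name are illustrative choices; the classes are Storm's bundled test spout and bolt plus the LogInfoBolt from flux-wrappers, and are assumed to be on the classpath.

```yaml
name: "yaml-topology"
config:
  topology.workers: 1

# spout definitions
spouts:
  - id: "spout-1"
    className: "org.apache.storm.testing.TestWordSpout"
    parallelism: 1

# bolt definitions
bolts:
  - id: "bolt-1"
    className: "org.apache.storm.testing.TestWordCounter"
    parallelism: 1
  - id: "bolt-2"
    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
    parallelism: 1

# stream definitions: each entry wires one component to another
streams:
  - name: "spout-1 --> bolt-1"   # descriptive label only
    from: "spout-1"
    to: "bolt-1"
    grouping:
      type: FIELDS
      args: ["word"]             # group tuples by the "word" field

  - name: "bolt-1 --> bolt-2"
    from: "bolt-1"
    to: "bolt-2"
    grouping:
      type: SHUFFLE
```

This definition has no components section because none of the spouts or bolts needs a helper object; the section can be left out when it is not needed.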
The YAML configuration can also be tailored to different environments by keeping environment-dependent properties in their respective properties files. Some of the options available in the YAML configuration are:
- Property Substitution/Filtering
- Environment variable substitution/Filtering
- Static factory methods
- Constructor Arguments, References, Properties and Configuration Methods
- A wide range of topology configurations
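A few of these options can be sketched in one fragment. The Kafka class names below follow the storm-kafka example in the Flux documentation and assume that module is on the classpath; the topic, ZooKeeper root, and consumer id are placeholders; com.example.MyBolt and its withFoo method are hypothetical. The property kafka.zookeeper.hosts is assumed to live in a properties file (for example a dev.properties passed with --filter), and ZK_ROOT is an assumed environment variable.

```yaml
# dev.properties (assumed file name) would contain a line such as:
#   kafka.zookeeper.hosts: localhost:2181

components:
  - id: "zkHosts"
    className: "org.apache.storm.kafka.ZkHosts"
    constructorArgs:
      # property substitution: replaced from the properties file
      - "${kafka.zookeeper.hosts}"

  - id: "spoutConfig"
    className: "org.apache.storm.kafka.SpoutConfig"
    constructorArgs:
      # reference to another component by id
      - ref: "zkHosts"
      # plain constructor arguments (placeholder values)
      - "myKafkaTopic"
      # environment variable substitution uses the ENV- prefix
      - "${ENV-ZK_ROOT}"
      - "myId"
    properties:
      # sets a public field or calls a setter after construction
      - name: "ignoreZkOffsets"
        value: true

bolts:
  - id: "bolt-1"
    # hypothetical class, shown only to illustrate configMethods
    className: "com.example.MyBolt"
    parallelism: 1
    configMethods:
      # calls withFoo("foo") on the instance after it is constructed
      - name: "withFoo"
        args:
          - "foo"
```

Keeping host names and similar values in a properties file means the same YAML definition can be reused across environments by swapping only the file passed to Flux.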
Deploying the Topology
Each topology is still deployed in the same manner, using storm jar; the only difference is that the main class is now Flux. The topology can be run locally by supplying --local as a parameter, or submitted to a remote cluster with --remote.
storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --remote flux_demo_config.yaml
Example Output
███████╗██╗     ██╗   ██╗██╗  ██╗
██╔════╝██║     ██║   ██║╚██╗██╔╝
█████╗  ██║     ██║   ██║ ╚███╔╝
██╔══╝  ██║     ██║   ██║ ██╔██╗
██║     ███████╗╚██████╔╝██╔╝ ██╗
╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝
+-         Apache Storm        -+
+-  data FLow User eXperience  -+
Version: 0.3.0
Parsing file: /Users/hsimpson/Projects/donut_domination/storm/shell_test.yaml
---------- TOPOLOGY DETAILS ----------
Name: shell-topology
--------------- SPOUTS ---------------
sentence-spout[1](org.apache.storm.flux.wrappers.spouts.FluxShellSpout)
---------------- BOLTS ---------------
splitsentence[1](org.apache.storm.flux.wrappers.bolts.FluxShellBolt)
log[1](org.apache.storm.flux.wrappers.bolts.LogInfoBolt)
count[1](org.apache.storm.testing.TestWordCounter)
--------------- STREAMS ---------------
sentence-spout --SHUFFLE--> splitsentence
splitsentence --FIELDS--> count
count --SHUFFLE--> log
--------------------------------------
Submitting topology: 'shell-topology' to remote cluster...
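The shell_test.yaml file itself is not shown in this post. As a rough sketch, a definition along the following lines would produce a summary like the one above. The component ids, classes, and groupings are taken from that output; the script names (randomsentence.js, splitsentence.py), the output field name, and the worker count are assumptions borrowed from the shell example in the Flux documentation.

```yaml
name: "shell-topology"
config:
  topology.workers: 1        # assumed value

# spout definitions
spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    # the shell spout constructor takes two arguments: String[], String[]
    constructorArgs:
      # command line to launch the script (assumed script name)
      - ["node", "randomsentence.js"]
      # output fields
      - ["word"]
    parallelism: 1

# bolt definitions
bolts:
  - id: "splitsentence"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    constructorArgs:
      # command line to launch the script (assumed script name)
      - ["python", "splitsentence.py"]
      # output fields
      - ["word"]
    parallelism: 1

  - id: "log"
    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
    parallelism: 1

  - id: "count"
    className: "org.apache.storm.testing.TestWordCounter"
    parallelism: 1

# stream definitions, matching the STREAMS section of the output
streams:
  - name: "spout --> split"
    from: "sentence-spout"
    to: "splitsentence"
    grouping:
      type: SHUFFLE

  - name: "split --> count"
    from: "splitsentence"
    to: "count"
    grouping:
      type: FIELDS
      args: ["word"]

  - name: "count --> log"
    from: "count"
    to: "log"
    grouping:
      type: SHUFFLE
```

The number in brackets after each component in the output, such as sentence-spout[1], corresponds to the parallelism value set here.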
Conclusion
This post aimed to introduce Apache Flux and its benefits in reducing the complexity of managing Storm topologies in code. In future posts, we will go through the sample code.
Happy Coding!!! 🙂
Published
Originally published at http://syednotes.wordpress.com on November 28, 2021.