Stream processing with Spring — Some basic ideas
Stream processing has recently become a major topic in software engineering. Although it has appeared in different forms and paradigms in the past, it has lately become a challenging and interesting problem space in distributed systems, primarily because of a constant and still growing influx of data. To the general public, streaming is synonymous with media streaming, such as video or audio delivered over the Internet. That is one way to look at it, but in recent years the term has acquired a more generic meaning: any kind of data that is constantly produced at large scale and sent to a different system is considered streaming.
Numerous domains need the streaming paradigm these days. Banking, finance, stock trading, weather forecasting, medical research, health care, telecommunications, law enforcement, media, entertainment and many other domains rely on streaming use cases to satisfy their business needs. Many use cases benefit from a streaming model, including predictive maintenance, fraud detection, QoS measurement, log analysis, high volume data ingestion, closed loop feedback, and partitioning and windowing of data.
Some basic features of modern streaming systems:
- High throughput/Low latency systems
- Ability to receive data from multiple sources using a common approach. Data might arrive over HTTP, TCP, or UDP, as JSON or even as simple plain text.
- Streams need to go through a pipeline and make use of a data flow architecture.
- Front-end ingester that is highly scalable to handle data from multiple sources
- Combination of fast and big data.
- Streams coming from various sources end up in a sink after optional processing, transformation, filtering etc.
Once data is in the sink, it can be analyzed to produce meaningful insights. The results can in turn be pipelined to produce other useful metrics, data formats and so on, depending on the use case.
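The source, optional processing, and sink stages described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class `PipelineSketch` and its methods `run` and `demo` are hypothetical names, the source is an in-memory list rather than a network endpoint, and none of this is Spring API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical, framework-free sketch: these names are not Spring APIs.
class PipelineSketch {

    // Pulls events from the source until it returns null, applies the optional
    // filter and transformation steps, and hands each surviving event to the sink.
    static <I, O> void run(Supplier<I> source,
                           Predicate<I> filter,
                           Function<I, O> transform,
                           Consumer<O> sink) {
        for (I event = source.get(); event != null; event = source.get()) {
            if (filter.test(event)) {
                sink.accept(transform.apply(event));
            }
        }
    }

    // Example wiring: log lines are filtered down to errors, upper-cased,
    // and collected into a list that stands in for the sink.
    static List<String> demo(List<String> lines) {
        Iterator<String> iterator = lines.iterator();
        Supplier<String> source = () -> iterator.hasNext() ? iterator.next() : null;
        List<String> sink = new ArrayList<>();
        run(source, line -> line.contains("error"), String::toUpperCase, sink::add);
        return sink;
    }

    public static void main(String[] args) {
        // prints [ERROR: DISK FULL]
        System.out.println(demo(List.of("ok: started", "error: disk full", "ok: done")));
    }
}
```

In a real streaming system each of these roles would typically run as a separate process connected by messaging middleware; the sketch only shows how the roles relate to each other.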
With the recent emergence of cloud computing and microservices, and the flexibility they offer to enterprises and other companies alike, streaming in the cloud has become yet another challenging area. Microservices have revolutionized the way we develop and deploy software in the cloud. At their core, microservices allow us to modularize a large system into finer-grained components with bounded contexts. In the venerable computer science textbook Structure and Interpretation of Computer Programs (SICP), authors Abelson and Sussman put it well many years ago: “Well-designed computational systems, like well-designed automobiles or nuclear reactors, are designed in a modular manner, so that the parts can be constructed, replaced and debugged separately”. Microservices can be seen as a timely and modern adaptation of that vision.
The various streaming components mentioned above can be abstracted out as small individual microservices deployed in the cloud and connected through a flow. This is where the Spring Cloud Stream (SCS) and Spring Cloud Data Flow (SCDF) projects come into play, backed by the first-class cloud and microservices support that they provide. Spring Cloud Stream applications are individual apps that can be deployed as microservices in the cloud. They are built on the excellent Spring Boot framework, which is one of the best tools out there for developing microservices on the JVM.
Spring Cloud Stream lets individual applications be developed as microservices. These services can be sources, sinks, processors, transformers, filters and so on, and are deployed on a target platform using the data flow approach. This gives users immense flexibility, as they can simply focus on application development. Once they choose a runtime platform (such as Cloud Foundry, Mesos, Kubernetes or YARN), they can run their applications there using Spring Cloud Data Flow.
Using Spring Cloud Data Flow, an organization can conceptually view its entire streaming data pipeline as something as simple as a UNIX-style pipes-and-filters architecture, while under the hood it performs complex distributed operations.
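To make the pipes-and-filters idea concrete, here is a minimal sketch in plain Java that turns a pipe-delimited definition into a single composed function. Everything in it is an assumption made for illustration: the class `PipeSketch`, the stage names `trim`, `upper` and `exclaim`, and the tiny parser are all hypothetical, and it does not reproduce the actual Spring Cloud Data Flow DSL or any distributed behavior.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: stage names and the parser are illustrative only and
// do not reproduce the Spring Cloud Data Flow DSL.
class PipeSketch {

    // A small registry of named filter stages, each a String -> String function.
    static final Map<String, Function<String, String>> STAGES = Map.of(
            "trim", String::trim,
            "upper", String::toUpperCase,
            "exclaim", s -> s + "!");

    // Turns a definition such as "trim | upper" into one composed function
    // that applies the stages from left to right.
    static Function<String, String> compile(String definition) {
        return Arrays.stream(definition.split("\\|"))
                .map(String::trim)
                .map(name -> {
                    Function<String, String> stage = STAGES.get(name);
                    if (stage == null) {
                        throw new IllegalArgumentException("Unknown stage: " + name);
                    }
                    return stage;
                })
                .reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        // prints HELLO!
        System.out.println(compile("trim | upper | exclaim").apply("  hello "));
    }
}
```

The point of the sketch is the shape of the abstraction: the pipeline is described as a short, readable chain of stage names, and the machinery that wires the stages together stays out of sight.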
Both projects are in active development and fast approaching GA as of this writing.
With these projects, Spring is once again delivering on the early promise that gave birth to the Spring Framework on the JVM in the first place: making developers’ lives easier. Developers can focus exclusively on the core business logic and let the framework do the hard work of deploying the software, delegating other responsibilities and so on.
There is much more to explore and discuss in this emerging landscape in Spring, especially the meaty technical details, which I hope to elaborate on in future short write-ups like this one.