
Part II: Data processing pipelines with Spring Cloud Data Flow on Oracle Cloud

This is the 2nd (and final) part of this blog series about Spring Cloud Data Flow on Oracle Cloud

In Part 1, we covered the basics and the infrastructure setup (Kafka, MySQL), and by the end of it we had a fully functional Spring Cloud Data Flow server on the cloud. Now it's time to put it to use!

In this part, you will

  • get a technical overview of the solution and look at some internal details (the whys and hows)
  • build and deploy a data flow pipeline on Oracle Application Container Cloud
  • and finally test it out…

Behind the scenes

Before we see things in action, here is an overview so that you understand what you will be doing and get a (rough) idea of why it works the way it does

At a high level, this is how things work in Spring Cloud Data Flow (you can always dive into the documentation for details)

  • You start by registering applications — these contain the core business logic and deal with how you would process the data e.g. a service which simply transforms the data it receives (from the messaging layer) or an app which pumps user events/activities into a message queue
  • You will then create a stream definition where you will define the pipeline of your data flow (using the apps which you previously registered) and then deploy them
  • (here is the best part!) once you deploy the stream definition, the individual apps in the pipeline get automatically deployed to Oracle Application Container Cloud, thanks to our custom Spring Cloud Deployer SPI implementation (this was briefly mentioned in Part 1)

At a high level, the SPI implementation needs to adhere to the contract/interface outlined by org.springframework.cloud.deployer.spi.app.AppDeployer and provide implementations for the following methods: deploy, undeploy, status and environmentInfo (a skeletal sketch follows the list below)

Thus the implementation handles the life cycle of the pipeline/stream processing applications

  • creation and deletion
  • providing status information
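
To make this concrete, here is a minimal, skeletal sketch of such an implementation. The class name, comments and placeholder values are illustrative only; the real ACCS deployer fills these methods with calls to the Application Container Cloud REST API.

import org.springframework.cloud.deployer.spi.app.AppDeployer;
import org.springframework.cloud.deployer.spi.app.AppStatus;
import org.springframework.cloud.deployer.spi.core.AppDeploymentRequest;
import org.springframework.cloud.deployer.spi.core.RuntimeEnvironmentInfo;

// Illustrative skeleton only: the actual deployer talks to the
// Application Container Cloud REST API inside each of these methods
public class AccsAppDeployer implements AppDeployer {

    @Override
    public String deploy(AppDeploymentRequest request) {
        // create the application on ACCS (upload the artifact, set environment
        // variables, service bindings etc.) and return a unique deployment id
        return request.getDefinition().getName();
    }

    @Override
    public void undeploy(String id) {
        // delete the corresponding application from ACCS
    }

    @Override
    public AppStatus status(String id) {
        // query ACCS and translate the result into the AppStatus model
        // (instance states are omitted in this sketch)
        return AppStatus.of(id).build();
    }

    @Override
    public RuntimeEnvironmentInfo environmentInfo() {
        // describe the deployer and the target platform
        return new RuntimeEnvironmentInfo.Builder()
                .spiClass(AppDeployer.class)
                .implementationName("accs-app-deployer")
                .implementationVersion("0.0.1")
                .platformType("Oracle Application Container Cloud")
                .platformApiVersion("n/a")
                .platformClientVersion("n/a")
                .platformHostVersion("n/a")
                .build();
    }
}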

Show time…!

App registration

We will start by registering our stream/data processing applications

As mentioned in Part 1, Spring Cloud Data Flow uses Maven as one of its sources for the applications which need to be deployed as a part of the pipelines which you build — more details here and here

You can use any Maven repo; we are using the Spring Maven repo since we will be importing their pre-built starter apps. Here is the manifest.json where this is configured

manifest.json for Data Flow server on ACCS
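
For orientation, the relevant part of that manifest looks roughly like the snippet below (the jar name, repository alias and elided options are illustrative; the screenshot above has the authoritative version). The key piece is the maven.remote-repositories property, which points the Data Flow server at the Spring repository.

{
  "runtime": {
    "majorVersion": "8"
  },
  "command": "java -jar spring-cloud-dataflow-server.jar --server.port=$PORT --maven.remote-repositories.springRepo.url=https://repo.spring.io/libs-snapshot ..."
}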

Access the Spring Cloud Data Flow dashboard — navigate to the application URL e.g. https://SpringCloudDataflowServer-mydomain.apaas.us2.oraclecloud.com/dashboard

Spring Cloud Data Flow dashboard

For the purpose of this blog, we will import two pre-built starter apps

http

  • Type — source
  • Role — pushes data to the message broker
  • Maven URL maven://org.springframework.cloud.stream.app:http-source-kafka:1.0.0.BUILD-SNAPSHOT

log

  • Type — sink
  • Role — consumes data/events from the message broker
  • Maven URL maven://org.springframework.cloud.stream.app:log-sink-kafka:1.0.0.BUILD-SNAPSHOT
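
If you prefer scripting over the dashboard, the equivalent registrations from the Spring Cloud Data Flow shell would look like the commands below (the shell itself is not covered in this series, so treat this as an optional alternative).

app register --name http --type source --uri maven://org.springframework.cloud.stream.app:http-source-kafka:1.0.0.BUILD-SNAPSHOT

app register --name log --type sink --uri maven://org.springframework.cloud.stream.app:log-sink-kafka:1.0.0.BUILD-SNAPSHOT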

There is another category of apps known as processor — this is not covered for the sake of simplicity

There are a bunch of these starter apps which make it super easy to get going with Spring Cloud Data Flow!

Importing applications

After app registration, we can go ahead and create our data pipeline. But, before we do that, let’s quickly glance at what it will do…

Overview of the sample pipeline/data flow

Here is the flow which the pipeline will encapsulate; you will see this in action once you reach the Test Drive section, so keep going!

  • http app -> Kafka topic
  • Kafka -> log app -> stdout

The http app will provide a REST endpoint for us to POST messages to it and these will be pushed to a Kafka topic. The log app will simply consume these messages from the Kafka topic and then spit them out to stdout — simple!

Create & deploy a pipeline

Let's start creating the stream. You can pick from the list of source and sink apps which we just imported (http and log)

Use the below stream definition; just replace KafkaDemo with the name of the Event Hub Cloud service instance which you set up in the Infrastructure setup section in Part 1

Stream definition

You will see a graphical representation of the pipeline (which is quite simple in our case)

Stream definition

Create (and deploy) the pipeline

Deploy the stream definition
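
For the shell-inclined, the create and deploy steps boil down to roughly the commands below. Note that the actual definition in the screenshot also carries the properties which bind the apps to your Event Hub instance (KafkaDemo); those are omitted here.

stream create --name DemoStream --definition "http | log"

stream deploy --name DemoStream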

The deployment process will be initiated, and its progress will be reflected on the console

Deployment in progress….

Go back to the Applications menu in Oracle Application Container Cloud to confirm that the individual app deployments have also been triggered

Deployment in progress…

Open the application details and navigate to the Deployments section to confirm that both apps have a service binding to the Event Hub instance specified in the stream definition

Service Binding to Event Hub Cloud

After the applications are deployed to Oracle Application Container Cloud, the state of the stream definition will change to deployed and the apps will also show up in the Runtime section

Deployment complete
Spring Cloud Data Flow Runtime menu

Connecting the dots..

Before we jump ahead and test out the data pipeline we just created, here are a couple of pictorial representations to summarize how everything connects logically

Individual pipeline components in Spring Cloud Data Flow map to their corresponding applications in Oracle Application Container Cloud — deployed via the custom SPI implementation (discussed above as well as in part 1)

Spring Cloud Data Flow pipeline to application mapping

.. and here is where the logical connection to Kafka is depicted

  • http app pushes to Kafka topic
  • the log app consumes from Kafka topic and emits the messages to stdout
  • the topics are auto-created in Kafka by default (you can change this) and the naming convention is the stream definition name (DemoStream) and the pipeline app name (http) separated by a dot, e.g. DemoStream.http
Pipeline apps interacting with Kafka

Test drive

Time to test the data pipeline…

Send messages via the http (source) app

POST a few messages to the REST endpoint exposed by the http app (check its URL from the Oracle Application Container Cloud console) — these messages will be sent to a Kafka topic and consumed by the log app

curl -X POST https://demostreamhttp-ocloud200.uscom-central-1.oraclecloud.com/ -H 'content-type: text/plain' -d test1

curl -X POST https://demostreamhttp-ocloud200.uscom-central-1.oraclecloud.com/ -H 'content-type: text/plain' -d test12

curl -X POST https://demostreamhttp-ocloud200.uscom-central-1.oraclecloud.com/ -H 'content-type: text/plain' -d test123

Check the log (sink) service

Download the logs for the log app to confirm. Navigate to the application details and check out the Logs tab in the Administration section (documentation here)

Check logs

You should see the same messages which you sent to the HTTP endpoint

Messages from Kafka consumed and sent to stdout

There is another way…

You can also validate this directly using Kafka (on Event Hub Cloud) itself. All you need is to create a custom Access Rule to open port 6667 on the Kafka Server VM on Oracle Event Hub Cloud (details here)

You can now inspect the Kafka topic directly by using the console consumer and then POSTing messages to the HTTP endpoint (as mentioned above)

kafka-console-consumer.bat --bootstrap-server <event_hub_kafka_IP>:6667 --topic DemoStream.http

Messages from Kafka topic

Un-deploy

If you un-deploy or destroy the stream definition, the corresponding applications will be deleted from Oracle Application Container Cloud
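
For completeness, the equivalent Data Flow shell commands (if you happen to use the shell rather than the dashboard) would be:

stream undeploy --name DemoStream

stream destroy --name DemoStream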

Un-deploy/destroy the definition

Quick recap

That’s all for this blog and it marks the end of this 2-part blog series!

  • we covered the basic concepts and deployed a Spring Cloud Data Flow server on Oracle Application Container Cloud, along with its dependent components: Oracle Event Hub Cloud as the Kafka-based messaging layer and Oracle MySQL Cloud as the persistent RDBMS store
  • we then explored some behind-the-scenes details and put our Spring Cloud Data Flow setup to use by building and deploying a simple data pipeline, along with some basic testing/validation

Don’t forget to…

  • check out the tutorials for Oracle Application Container Cloud — there is something for every runtime!
  • check out other blogs on Application Container Cloud

Cheers!

The views expressed in this post are my own and do not necessarily reflect the views of Oracle.
