Apache Kafka to Oracle NoSQL Database: Build A Streaming Data Pipeline on OCI using Goldengate
--
Building a Streaming Data Pipeline from Apache Kafka to Oracle NoSQL Database on OCI using GoldenGate for Big Data
Introduction:
In the bustling arena of today’s business world, the ability to process data in real-time has become an invaluable asset for making swift, informed decisions. This is where the true significance of data integration and streaming solutions comes into play — as an essential link that bridges the various components of a data pipeline.
In this blog post, I’m going to walk you through constructing a data pipeline from Apache Kafka to Oracle Cloud Infrastructure’s (OCI) NoSQL Database, utilizing the power of OCI GoldenGate for Big Data.
OCI offers a fully managed NoSQL Cloud Database, a robust alternative to Amazon DynamoDB, and a comprehensive Change Data Capture tool known as OCI GoldenGate. We’ll explore how to leverage these powerful tools to read data from an Apache Kafka topic and write it to an OCI NoSQL database.
Oracle’s GoldenGate seamlessly orchestrates data replication from source to sink through remote capture processes, known as Extract and Replicat. The Extract process captures the data and records it to trail files. These trail files are then leveraged to create a Replicat process, effectively applying the records to the sink.
Prerequisites:
Before we begin, ensure that you have the following resources set up:
- An Oracle Cloud Infrastructure account.
- An Apache Kafka 3.2 cluster running in your environment.
- OCI GoldenGate for Big Data installed and configured.
- Network connectivity from OCI GG to your Kafka environment and all appropriate network security lists, network security groups has the rules for ports 9092
- IAM privilege on your OCI User to create OCI GG and OCI NoSQL
Write Streaming Data to the Apache Kafka Topic and Creating the OCI Goldengate Setup
Step 1: Generate Sample Data and Write to Apache Kafka Topic
To Deploy the below Apache Kafka Topic and Producer on OCI, you can refer this article.
We need to create a script that generates sample sensor data and writes it to a Kafka topic called “sensors”. Here’s a shell script (sensor_data.sh
) that generates random data as JSON and writes to the topic
#!/bin/bash
topic="sensors"
kafka_server="<your-ip>:9092"
log_file="sensors.log"
KAFKA_HOME="/home/opc/kafka_build_323/kafka_2.13-3.2.3"
while true; do
# Generate random sensor_id, temperature, and humidity
sensor_id=$(printf "s%03d" $((RANDOM % 100 + 1)))
temperature=$(echo "scale=2; 50 + (100 - 50) * $RANDOM / 32767" | bc)
humidity=$(echo "scale=2; 20 + (80 - 20) * $RANDOM / 32767" | bc)
timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
# Create single-line JSON payload
payload="{\"schema\": {\"type\": \"struct\", \"fields\": [{\"type\": \"string\", \"optional\": false, \"field\": \"sensor_id\"}, {\"type\": \"float\", \"optional\": false, \"field\": \"temperature\"}, {\"type\": \"float\", \"optional\": false, \"field\": \"humidity\"}, {\"type\": \"string\", \"optional\": false, \"field\": \"timestamp\"}]}, \"payload\": {\"sensor_id\": \"$sensor_id\", \"temperature\": $temperature, \"humidity\": $humidity, \"timestamp\": \"$timestamp\"}}"
# Send payload to Kafka topic
echo "$payload" | $KAFKA_HOME/bin/kafka-console-producer.sh --bootstrap-server $kafka_server --topic $topic
# Append payload to log file
echo "$payload" >> $log_file
sleep 1
done
A sample JSON message which is being written to the Kafka topic is as below :
{"schema": {"type": "struct", "fields": [{"type": "string", "optional": false, "field": "sensor_id"}, {"type": "float", "optional": false, "field": "temperature"}, {"type": "float", "optional": false, "field": "humidity"}, {"type": "string", "optional": false, "field": "timestamp"}]}, "payload": {"sensor_id": "s076", "temperature": 54.23, "humidity": 65.45, "timestamp": "2023-05-10T12:26:47Z"}}
Step 2. Deploy OCI Goldengate for Big Data and Create a Connection for Apache Kafka, OCI NoSQL Database using the OCI Console
OCI GoldenGate for Big Data allows you to integrate and stream data from Kafka to OCI NoSQL. To deploy OCI Goldengate for Big data follow the guide here
2.a) Create Kafka OCI GG Connection:
Create a Connection to the Apache Kafka Bootstrap server in Step 1. Go to OCI Console > Oracle Database > Goldengate > Connections > Create Connection
Assign the connection to the OCI Big Data Deployment
2.b) Create OCI NoSQL Database OCI GG Connection:
First create an API key for your user using a private-public key pair
Create a Connection to the Oracle NoSQL Go to OCI Console > Oracle Database > Goldengate > Connections > Create Connection
Step 3. Create the Extract (Capture) Process for Kafka
Go to the OCI Goldengate Big Data Console URL
3.a) Create a New credential for Kafka
As per Oracle’s official documentation follow these steps and Add a credential:
- Open the navigation menu, and then click Configuration.
- On the Credentials page, click Add Credential (plus icon).
- Enter the following details in the fields provided, and then click Submit:
- For Credential Domain, enter
OracleGoldenGate
. - For Credential Alias, enter
kafka
. - For User ID: enter
kafka://
- For Password, enter a password.
- For Verify Password, enter the password again.
3.b) Add Kafka Extract
Parameter File:
-- Parameter file for Kafka extract.
EXTRACT KFH
EXTTRAIL KI
-- TODO: Create a credential store entry with a userid prefix kafka://
SOURCEDB USERIDALIAS kafka DOMAIN OracleGoldenGate
JVMOPTIONS CLASSPATH $OGG_HOME/ggjava/ggjava.jar:$THIRD_PARTY_DIR/kafka/*
-- Connection alias: KafkaICN
TRANLOGOPTIONS _CONNECTIONID ocid1.goldengateconnection.oc1.ap-seoul-1.amaaaaaap77apcqath3ac2jsznxxnxsxiicy27s3wtcph54d6v63xme2jjta
TABLE source.*;
The Kafka topic is being captured in real-time and records return to the trail file ‘KI’
Step 4. Create the Replicat (Apply) Process for OCI NoSQL
From OCI Goldengate Console, go to ‘Overview’ > Create Replicat
Parameters file:
REPLICAT NOSQLKFH
MAP source.sensors , TARGET source.kfhsensors;
Properties file:
# Properties file for Replicat NOSQLKFH
# NoSQL Handler Template
gg.handlerlist=nosql
gg.handler.nosql.type=nosql
gg.handler.nosql.connectionId=ocid1.goldengateconnection.oc1.ap-seoul-1.amaaaaaap77apcqadb7vzzgykj2twstqzn3yfmhxd57r3jnns3s54pgwahaq
gg.handler.nosql.compartmentID=ocid1.compartment.oc1..aaaaaaaak72343qs5qs7wvjvjxlqzfbahtdi2mritj3fwostqqbocvfnk7za
gg.handler.nosql.ddlHandling=CREATE,ADD,DROP
gg.handler.nosql.interactiveMode=true
gg.handler.nosql.storageGb=10
gg.handler.nosql.readUnits=50
gg.handler.nosql.writeUnits=50
gg.handler.nosql.mode=op
gg.classpath=$THIRD_PARTY_DIR/nosqlsdk/*
jvm.bootoptions=-Xmx512m -Xms32m
Step 5. Monitor the Data Streaming
5.a) Check the sensors.log file from Step 1, it should be running a producer script to write data to the Kafka ‘sensors’ topic
tail -f sensors.log
5.b) Check the Statistics of your Extract and Replicat
5.c) Query the OCI NoSQL Database to check the records from Kafka are being persisted to the NoSQL table in real-time
Go to OCI Console > Databases > Oracle NoSQL Database > Tables
Make sure you are in the same compartment which you specified in the CompartmentID in the parameter file in Step 4. above. You can now see a table here called ‘kfhsensors’
Query the table to check the records
We can see all the records being written to the Kafka topic is now persisting to OCI NoSQL Database
Conclusion
The synergy of Apache Kafka, Oracle NoSQL, and OCI GoldenGate provides a solid foundation for real-time data streaming. The integration of these powerful tools gives businesses a streamlined path to making more effective data-driven decisions, thus providing them a leg up in a competitive marketplace.
Embracing OCI’s suite of managed services, which includes the NoSQL Cloud Database and GoldenGate, allows you to scale effortlessly and maintain reliable, high-performance real-time data processing. This arrangement not only alleviates the complexity and overhead associated with managing your data infrastructure but also guarantees high availability and durability of your data.
To sum up, crafting a streaming data pipeline from Apache Kafka to Oracle NoSQL Database on OCI using GoldenGate is a highly efficient method to handle and interpret real-time data. Let the data revolution propel your business forward!
In a future blog post I will demonstrate how to fork the Apache Kafka topic to write one stream to OCI NoSQL Database and another stream to persist data to Oracle Object Storage.
Oracle Developers and Oracle OCI Free Tier
Join our Oracle Developers channel on Slack to discuss this and other topics!
Build, test, and deploy your applications on Oracle Cloud — for free! Get access to OCI Cloud Free Tier!