Real-time sync from Salesforce and SQL Server to Kafka

An actual customer use case, with code and instructions, to sync data from Salesforce and SQL Server to Kafka in real time using Grainite

Gautam Mulchandani
Grainite
4 min read · Aug 17, 2023


Introduction

In the past few blogs here and here, we covered how Grainite acts as intelligent middleware that enables real-time, bidirectional data sync between any source and sink. The secret sauce behind Grainite is that it natively converges a message queue, an event processing engine, and a streaming database into a single product. This lets Grainite not only hold the cursor state needed for bidirectional sync, but also perform the stateful transformations needed to match schema changes when necessary.

Customer story: Real-time data movement between multiple sources and a sink

A customer of ours wanted to sync data from their Salesforce application as well as from SQL Server to a Kafka topic. They plan to feed the data from this Kafka topic into a downstream application.

Using a cloud-based data movement vendor would have been less than ideal for a few reasons:

  • The activity-based pricing would have blown the customer’s budget, since the payload volume is significant.
  • Moving normalized data from the SQL Server database adds to the billable activity, since the data needs to be denormalized at the destination.
  • For now, the customer only plans to connect the sources to the sink. But if they later want to perform transformations on the incoming data, they would have to buy a separate tool such as dbt.

Real-time data sync with Grainite

Grainite is a converged streaming application platform. The key difference between Grainite and other vendors is that it enables real-time, bidirectional data synchronization and transformation between any source and sink.

Let’s dive into the details of this customer use case and the low-code solution that addressed their pain points. Anyone can download our code samples and achieve the same result using our free Docker image.

Use case details

This application shows how Grainite can stream changes from Salesforce and SQL Server to Kafka in real time. In this example, the data events are transferred directly to Kafka without any transformations.

A few points to note:

  1. Grainite comes packaged with connectors for major data sources such as SQL Server, Salesforce, and MongoDB, so the customer doesn’t have to write any connector code.
  2. Change events from SQL Server are pulled by Grainite tasks using SQL Server’s Change Data Capture (CDC) mechanism.
  3. Events from Salesforce can be pulled either with Salesforce Object Query Language (SOQL) or, if available, with Salesforce’s CDC mechanism. The choice is specified in the application YAML file.
  4. Since no transformations are performed in this example, there is no code to write in Grainite. The application YAML file contains all the configuration required to enable this streaming; a hedged sketch follows this list.
  5. Grainite maintains the cursors into the data sources and is responsible for sending events to Kafka in a reliable, exactly-once manner.
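
To make this concrete, here is a hypothetical sketch of what such an app.yaml could look like. Apart from the useCdc property and the task names used later in this post, the key names below are illustrative assumptions, not Grainite’s documented schema:

app_name: sfdc-sqlserver-to-kafka     # illustrative name

tasks:
  - name: sfdc_cdc                    # started later with: gx task start sfdc_cdc
    source: salesforce
    useCdc: true                      # set to false to poll with SOQL instead
    objects: [Account, Opportunity]   # illustrative object list
  - name: sqlserver_cdc               # started later with: gx task start sqlserver_cdc
    source: sqlserver
    tables: [dbo.orders]              # illustrative table list

sink:
  type: kafka
  topic: crm-events                   # illustrative topic name
  # passwords and private keys come from the secrets mechanism, not this file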

Additionally, Grainite embeds a database engine that is optimized for streaming. This enables powerful capabilities for transforming data streams in a stateful manner; transforms, filters, and streaming joins can be written in Python or Java.
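
As an aside, the idea behind a keyed, stateful transform can be illustrated in plain Python. This is a conceptual sketch only; it uses an ordinary dict for state and does not show Grainite’s actual handler API:

# Conceptual sketch of a stateful streaming join; not Grainite's API.
# Keyed state: the latest Salesforce record seen for each record id.
state = {}

def on_salesforce_event(event):
    # cache the latest version of this Salesforce record by its id
    state[event["Id"]] = event

def on_sqlserver_event(row):
    # enrich the CDC row with the cached Salesforce record, if one exists
    enriched = dict(row)
    enriched["account"] = state.get(row["account_id"])
    return enriched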

The application is available here. Before running it, your Grainite environment should be set up with version 2326.11 or above. Follow the steps here to get Grainite running on your VM or desktop.

Steps to get started

  • Get the connection properties for Salesforce, SQL Server, and Kafka. For SQL Server, ensure that CDC is enabled. Salesforce requires a separate license for CDC; if you don’t have one, set the useCdc property to false.
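For reference, CDC is enabled in SQL Server with the standard sys.sp_cdc_enable_db and sys.sp_cdc_enable_table procedures; the server, database, schema, and table names below are placeholders for your own environment:
sqlcmd -S <server> -d <database> -Q "EXEC sys.sp_cdc_enable_db"
sqlcmd -S <server> -d <database> -Q "EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'orders', @role_name = NULL"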
  • Clone the application repository and load the app into Grainite
git clone ….
gx load -c app.yaml
  • Grainite provides a way to load sensitive information, such as passwords and the private keys needed to connect to Salesforce, through its secrets mechanism. Load the secrets file of config params into Grainite; this way, you do not need to put passwords in the clear in the app.yaml file.
sh secrets.sh
  • Start the Salesforce task to pull data from Salesforce
gx task start sfdc_cdc
  • Start the SQL Server task to pull data from SQL Server
gx task start sqlserver_cdc

Grainite connects to Salesforce and SQL Server and starts pulling CDC events as soon as the tasks start.

  • Run gx mon to monitor the status of these tasks and check how they are progressing.
gx mon -c app.yaml

Data will show up in the Kafka topic as events are pulled from the sources and pushed to the sink.
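
To verify, you can tail the topic with Kafka’s standard console consumer; the broker address and topic name below are placeholders for your own values:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic crm-events --from-beginning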

Conclusion

By leveraging Grainite, this customer was able to easily set up a pipeline from SQL Server and Salesforce to Kafka. This was done in a low-code way: all the customer had to do was complete their app.yaml file with the necessary configuration settings.

This is just one example of how Grainite makes it easy to move data between any sources and sinks. Grainite comes pre-built with multiple extensions that can move data from sources such as Apache Kafka, Azure, Debezium, Salesforce, and SQL Server and replicate it to downstream databases and data warehouses. In addition to making replication real-time, Grainite automatically handles failures, retries, and scaling, and enables complex, stateful transformations.
