Replicating data from Salesforce to Google BigQuery using Mule 4

Edgar Moran
Another Integration Blog
5 min read · Dec 10, 2022



Recently I had the opportunity to present this topic at the Salesforce NYC Tour, and since I find it valuable, I would like to share it as a post as well.

Replication jobs are one of the most common tasks at companies that handle large volumes of data; they deal with how much data can be kept in specific systems and with the storage available.

In this POC I will explain how we can replicate data every 30 minutes from Salesforce to Google BigQuery.
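The 30-minute cadence itself is just a Mule scheduler. A minimal sketch (the flow names here are placeholders):

```xml
<!-- Trigger the replication every 30 minutes -->
<flow name="replication-scheduler-flow">
  <scheduler>
    <scheduling-strategy>
      <fixed-frequency frequency="30" timeUnit="MINUTES"/>
    </scheduling-strategy>
  </scheduler>
  <!-- hypothetical flow that performs the actual replication -->
  <flow-ref name="replicate-objects-flow"/>
</flow>
```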

First, we need to make sure we have a GCP account and a project created (in my case I called it MuleSoft). Then, in order to start using it, we need to create a service account from IAM & Admin > Service Accounts.

Next we create the service account and set its permissions (in this case we need to grant read and write permissions for Google BigQuery).

Finally, we generate a new key (a JSON key) for this sample. The service account will allow us to set up the connector and use it in Mule as well.
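Once the key is downloaded, the Mule application can reference it from the BigQuery connector configuration. This is only a sketch; the exact element and attribute names depend on the connector version you install, so verify them against the connector documentation:

```xml
<!-- BigQuery connector configuration using the service account JSON key.
     Attribute names are assumptions; check your connector's reference. -->
<bigquery:config name="BigQuery_Config">
  <bigquery:connection
    projectId="mulesoft"
    serviceAccountKeyFileName="service-account-key.json"/>
</bigquery:config>
```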

The control table

In order to keep track of the last replication date, the objects we want to replicate, and any fields we want to ignore, a control table will hold that information. A control table is better than a regular Object Store because the data persists even if the application is restarted or deleted.
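Conceptually, our control table holds one row per object to replicate; with illustrative column names, it looks something like this:

```
object_name         | last_replication_date | ignore_fields
--------------------+-----------------------+---------------------------
AcquiredAccount__c  | 2022-12-09T00:00:00Z  | attributes, LastViewedDate
```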

And for now we will replicate a single table.

From Salesforce we will use the Acquired Accounts object from an analytics trial account, since it contains at least 10K records we can use for this POC.

The Mule Application

These are the steps the Mule application performs during execution:

1. Read the control table from BigQuery, map the information, and start looping over each record (meaning each table to replicate).

2. Since we need the full set of fields to retrieve, we describe the SObject with the Salesforce connector's Describe SObject operation, and we also store in variables the fields to ignore and the last replication date, as sketched below.
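A sketch of that step, assuming vars.objectName and vars.ignoreFields were loaded from the control table (the operation name follows the Mule 4 Salesforce connector and may differ slightly in your version):

```xml
<!-- Describe the object to obtain its full field list -->
<salesforce:describe-sobject config-ref="Salesforce_Config" type="#[vars.objectName]"/>

<!-- Keep every field name that is not flagged as ignored -->
<ee:transform>
  <ee:variables>
    <ee:set-variable variableName="fieldList"><![CDATA[%dw 2.0
output application/java
---
(payload.fields map ((field) -> field.name))
  filter ((name) -> not (vars.ignoreFields contains name))]]></ee:set-variable>
  </ee:variables>
</ee:transform>
```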

3. Once the fields are collected, we use the Get Updated Objects operation. This operation allows us to set the start and end dates in UTC and retrieves the Salesforce IDs that have changed or were created since the last replication date.
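Roughly, and assuming the operation exposes the start and end times as parameters (verify the exact operation DSL against your connector version):

```xml
<!-- Ids of records created or updated since the last replication, in UTC -->
<salesforce:get-updated-range config-ref="Salesforce_Config"
  type="#[vars.objectName]"
  startTime="#[vars.lastReplicationDate]"
  endTime="#[now() >> 'UTC']"/>
```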

4. Once we get the initial set of records, we write the first record; this creates a CSV file under the /tmp directory so we have an initial file containing the headers.
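A minimal sketch with the File connector, building the path from the object name:

```xml
<!-- Writing just the first record creates /tmp/<object>.csv with headers -->
<file:write path="#['/tmp/' ++ vars.objectName ++ '.csv']" mode="OVERWRITE">
  <file:content>#[%dw 2.0 output application/csv --- [payload[0]]]</file:content>
</file:write>
```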

5. We order the columns by name, ascending; this way we make sure each value is written under the correct header.
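In DataWeave, sorting each record's keys can be done with orderBy, for example:

```xml
<ee:transform>
  <ee:message>
    <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
// Sort each record's keys alphabetically so values line up with the headers
payload map ((record) -> record orderBy ((value, key) -> key as String))]]></ee:set-payload>
  </ee:message>
</ee:transform>
```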

6. Then we create a batch job. The batch job appends the records to the CSV file, aggregating/committing 2K records at a time, as sketched below.
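A sketch of that batch job (job and step names are placeholders):

```xml
<batch:job jobName="append-to-csv-job">
  <batch:process-records>
    <batch:step name="appendStep">
      <batch:aggregator size="2000">
        <!-- Append 2,000 records at a time, without repeating the header -->
        <file:write path="#['/tmp/' ++ vars.objectName ++ '.csv']" mode="APPEND">
          <file:content>#[%dw 2.0 output application/csv header=false --- payload]</file:content>
        </file:write>
      </batch:aggregator>
    </batch:step>
  </batch:process-records>
</batch:job>
```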

7. Now the CSV has all the appended data, and once the batch process finishes we can start the actual replication.

The replication process starts by deleting the temporary table (AcquiredAccount__c_temp). This table matters because we first push all the data from our CSV file into it. The deletion ensures we don't end up with duplicate records sharing the same key, which in this case is the Salesforce ID, since it is a unique value.

Then we recreate the BigQuery table to get a clean version, read the file from our directory, and convert the file content into a Java object so it performs better in the Insert All operation.
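A sketch of the read-and-convert step that feeds Insert All:

```xml
<!-- Read the staged CSV back and convert it to plain Java structures -->
<file:read path="#['/tmp/' ++ vars.objectName ++ '.csv']" outputMimeType="application/csv"/>
<ee:transform>
  <ee:message>
    <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
payload]]></ee:set-payload>
  </ee:message>
</ee:transform>
```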

Next we insert all the data, and finally we execute the MERGE statement.
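Assuming the connector offers an operation that runs arbitrary standard SQL (the operation, dataset, and column names below are placeholders), the merge might look like:

```xml
<!-- Merge the temp table into the target, keyed on the Salesforce Id -->
<bigquery:query config-ref="BigQuery_Config">
  <bigquery:sql><![CDATA[
MERGE `mulesoft.salesforce_replica.AcquiredAccount__c` AS target
USING `mulesoft.salesforce_replica.AcquiredAccount__c_temp` AS source
ON target.Id = source.Id
WHEN MATCHED THEN
  UPDATE SET target.Name = source.Name
WHEN NOT MATCHED THEN
  INSERT (Id, Name) VALUES (source.Id, source.Name)
]]></bigquery:sql>
</bigquery:query>
```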

Finally, we update the last replication date in the control table and delete the temporary CSV file from our /tmp directory.
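As a sketch, again with placeholder operation and table names:

```xml
<!-- Stamp this run's end time in the control table, then remove the CSV -->
<bigquery:query config-ref="BigQuery_Config">
  <bigquery:sql><![CDATA[
UPDATE `mulesoft.salesforce_replica.control_table`
SET last_replication_date = CURRENT_TIMESTAMP()
WHERE object_name = 'AcquiredAccount__c'
]]></bigquery:sql>
</bigquery:query>
<file:delete path="#['/tmp/' ++ vars.objectName ++ '.csv']"/>
```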

Our table in GCP now has the information from Salesforce.

On the next run, any new change will be reflected in our table. This way we keep the information updated and append the data that gets created in Salesforce.

You can find the code of this POC in my GitHub repository.

Hope you like it, and I'm happy to hear any feedback or additional recommendations.
