Heroku Pattern: Cross-Org Data Synchronization

Salesforce Architects
Oct 27, 2020


Several years ago, we (Sam and Steve) built an orchestration system for a large enterprise customer. The system synchronized data across more than ten Salesforce organizations at scale. The original solution was a multicloud architecture based on Heroku Connect, with data routed from Heroku Connect through a pipeline built on AWS using Kinesis and Lambda.

Since that time we’ve both had conversations with customers about this architecture, and those conversations led to this post. We wanted to revisit that original architecture and redesign it for the current Salesforce platform. Starting from key design considerations, we describe the building blocks of this architecture.

Design considerations

In designing the original system, we quickly realized that there is a lot of complexity to consider in cross-org synchronization. What follows is a list of questions we knew we had to address as we worked through our design. At first this may seem overwhelming, but keep in mind as you read that we designed and built the system iteratively. We didn't start with all of the answers, or even all of these questions. Instead, we built a simple proof of concept and then added layers of capability over time.

Thinking about the data

Data Ownership and Security

  • How does data ownership change across the organizations?
  • Are objects public and read-only?
  • Are there complex role hierarchies to replicate?

Data Overlap

  • How many objects and record types need to be synchronized?
  • Are the object definitions the same between organizations?
  • Is there secondary data that must be synchronized?

Data Transformation

  • Do field values (e.g. units of measure, currencies, and so on) need to be translated across organizations?
  • Are the objects represented differently in each organization?
  • Do records need to be reconciled (e.g. duplicates consolidated) between organizations?

Taming chaos

Error Handling and Monitoring

  • What is the risk to the business if there are errors in synchronization?
  • How should potential data corruption or data loss be handled in the event of an integration failure?
  • How will these failures be detected and remediated?

Disaster Recovery

  • If there is a disaster, how is the integration itself recovered?
  • If the organizations are out of sync at the time of failure, how is this reconciled?

Keeping the lights on

System Administration

  • What tools will be available for managing the integration?
  • What skills will the users who manage the integration need?
  • What can be done to make the integration configuration-driven instead of code-driven?

Enabling the business

Cross-Org Business Processes

  • Are there cross-org business processes that will need to interact with the synchronized data in multiple places?
  • Is there any automation implemented in the organizations that could potentially cause conflicting or looping changes?
  • Are there business workflows that span users and roles across multiple organizations?

Asynchronous Nature of Integration

  • What happens if data is updated at the same time in multiple organizations?
  • How are conflicting changes managed?

Prioritized and Multispeed Traffic

  • Is there data that needs to be synchronized in a more timely manner than other data?
  • How does the integration respond to bulk loads of data?

Architectural patterns

Bidirectional data replication

When we began to build this system, we started with a proof of concept around the most basic replication pattern. Heroku Connect can replicate data from an object in a connected Salesforce organization to a table in the PostgreSQL database. If the data changes on the PostgreSQL side, Heroku Connect can be prompted to replicate the change back to the Salesforce organization.

Replicating data between multiple Salesforce organizations can be orchestrated by combining Heroku Connect with an instance of Heroku Postgres.

Data replication enabled with a shared database.

Heroku Connect replicates data between an object and a table. In order to replicate between multiple organizations, we need a way to take the change that was replicated to a Postgres table and apply that change to a corresponding downstream table. Then Heroku Connect can replicate the change to the corresponding downstream Salesforce org.

A PostgreSQL trigger can be used to propagate a change to a downstream table.
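
To make this concrete, here is a minimal sketch of such a trigger, applied from a small Python script using the psycopg2 driver. The schema, table, and column names (org_a, org_b, external_id__c) are hypothetical, and the upsert assumes the two orgs share a unique external ID field.

```python
# Minimal sketch (hypothetical names): copy account changes from the schema
# mapped to org A into the corresponding table mapped to org B.
import os
import psycopg2

DDL = """
CREATE OR REPLACE FUNCTION org_a.propagate_account_change()
RETURNS trigger AS $$
BEGIN
    -- Upsert the downstream row; a shared external ID keeps the orgs aligned.
    INSERT INTO org_b.account (external_id__c, name)
    VALUES (NEW.external_id__c, NEW.name)
    ON CONFLICT (external_id__c) DO UPDATE SET name = EXCLUDED.name;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS account_replication ON org_a.account;
CREATE TRIGGER account_replication
AFTER INSERT OR UPDATE ON org_a.account
FOR EACH ROW EXECUTE FUNCTION org_a.propagate_account_change();
"""

with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```

Heroku Connect would then pick up the new or updated row in org_b.account and push it to the second Salesforce org.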

This approach can be used to enable direct replication between organizations. However, as the number of tables and organizations increases, having individual triggers manage the replication logic can become unwieldy. One strategy for dealing with this is to centralize the replication logic. This can be done by adding two tables:

  • Change Queue table: A table that is responsible for tracking all the changes across all of the replication tables. To simplify the implementation, the change queue table can store a copy of the changed data as a serialized JSON string.
  • Replication Mapping table: A table that is responsible for housing the rules for how data should be replicated. For example, org_a.account replicates to org_b.account.

Organization-to-organization replication using a centralized change queue and replication rules.

Note: Any tables that don’t replicate back to a Salesforce org need to be in their own database schema.
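
A sketch of these two control tables might look like the following. The column names are illustrative, and a separate sync schema keeps them out of Heroku Connect's mappings.

```python
# Sketch of the control tables (illustrative names), created from Python.
import os
import psycopg2

DDL = """
CREATE SCHEMA IF NOT EXISTS sync;

-- One row per captured change; the payload is the changed row as JSON.
CREATE TABLE IF NOT EXISTS sync.change_queue (
    id           bigserial PRIMARY KEY,
    source_table text        NOT NULL,   -- e.g. 'org_a.account'
    operation    text        NOT NULL,   -- INSERT / UPDATE / DELETE
    payload      jsonb       NOT NULL,   -- serialized copy of the changed row
    processed_at timestamptz,            -- NULL until replicated downstream
    created_at   timestamptz NOT NULL DEFAULT now()
);

-- One row per replication rule, e.g. org_a.account -> org_b.account.
CREATE TABLE IF NOT EXISTS sync.replication_mapping (
    id           bigserial PRIMARY KEY,
    source_table text NOT NULL,
    target_table text NOT NULL,
    UNIQUE (source_table, target_table)
);
"""

with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)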

While this approach can work, it has some limitations. The database must be able to process all of the inbound and outbound triggers as well as the replication logic. Maintaining the replication logic in a procedural language on the database is problematic; it is difficult to test, expand, and scale.

We shifted the replication logic off the database and onto an application server, that is, a worker dyno. The trade-off was less development friction at the cost of additional complexity in communicating between tiers.

Shifting the replication logic to an application tier.

This approach offloads the responsibility of replication to the application tier. Because the replication rules rarely change, we can further reduce the burden on the Postgres database by relocating them to a Heroku Redis instance. The syncing app can be designed to replicate data either in batches or dynamically.

To enable an event-driven replication model, the PostgreSQL pg_notify() function can be used to send messages to the syncing app. The change_queue table is effectively a work queue for the syncing app.
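
As a sketch of that event-driven model (using the illustrative names from above and the psycopg2 driver), a trigger on the change queue can call pg_notify(), and the syncing worker can LISTEN for those notifications and work through the queue:

```python
# Sketch of an event-driven syncing worker (illustrative names): a trigger on
# sync.change_queue calls pg_notify(), and this worker dyno LISTENs for the
# notification, reads the queued change, and applies the mapped replication.
import json
import os
import select
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
conn.set_session(autocommit=True)
cur = conn.cursor()

# Notify the worker whenever a new change lands in the queue.
cur.execute("""
CREATE OR REPLACE FUNCTION sync.notify_change() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('change_queue', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS change_queue_notify ON sync.change_queue;
CREATE TRIGGER change_queue_notify
AFTER INSERT ON sync.change_queue
FOR EACH ROW EXECUTE FUNCTION sync.notify_change();
""")

cur.execute("LISTEN change_queue;")
while True:
    # Block until Postgres pushes a notification (or 60 seconds pass).
    if select.select([conn], [], [], 60) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        cur.execute(
            "SELECT source_table, operation, payload "
            "FROM sync.change_queue WHERE id = %s",
            (int(note.payload),),
        )
        source_table, operation, payload = cur.fetchone()
        # Look up sync.replication_mapping and apply the change downstream here.
        print(f"replicate {operation} from {source_table}: {json.dumps(payload)}")
```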

Data transformations with passed messages

Taking the core pattern above, we can evolve this design further by using a streaming message platform. In our original implementation we used AWS Kinesis, but today Apache Kafka on Heroku can provide that capability. A streaming message platform like Apache Kafka on Heroku is important to consider for three reasons:

  1. There’s a potential for spikes in volume where the data flowing into the integration can temporarily exceed the capacity of the synchronization and transformation processing.
  2. There may be multiple services that need to respond to the data changes.
  3. The transformation can be complex and may need to be segmented to better support scaling and maintainability of the pipeline.

Organization-to-organization replication with transformations of the records.

Here a monolithic transformation app could read a message from Kafka, transform the data, and then write it back out to Postgres. Transformation apps from multiple external services could subscribe to this message stream and handle the data in whatever ways may be needed.
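
A minimal sketch of such a worker might look like this. The topic, table, and field names are illustrative, the kafka-python client is assumed, and the Heroku Kafka SSL settings are omitted for brevity.

```python
# Sketch of a monolithic transformation worker: consume a change message,
# apply a simple transformation, and write the result back to Postgres.
import json
import os
import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "change-events",                          # illustrative topic name
    bootstrap_servers=os.environ["KAFKA_URL"],
    group_id="org-sync-transformer",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
pg = psycopg2.connect(os.environ["DATABASE_URL"])

for message in consumer:
    change = message.value
    # Example transformation: convert a unit of measure before writing downstream.
    if change.get("unit_of_measure") == "lbs":
        change["weight"] = round(change["weight"] * 0.4536, 2)
        change["unit_of_measure"] = "kg"
    with pg.cursor() as cur:
        cur.execute(
            "UPDATE org_b.product SET weight = %s, unit_of_measure = %s "
            "WHERE external_id__c = %s",
            (change["weight"], change["unit_of_measure"], change["external_id__c"]),
        )
    pg.commit()
```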

Multiple Kafka topics can be used to enable multistep transformations.

In this example, the logic for handling messages is segmented into four components. The first routes each message, determining which transformations need to be applied and what the final destination of the message should be. From there, multiple message transformers apply different layers of changes, from simple field mapping to more complex translations such as processing parent/child relationships. The last agent then writes out the final, transformed message.

In this pattern, a dedicated Kafka topic provides delivery for each segment of the pipeline. Not only does the code remain more modular, it also becomes possible to scale each component independently. This allows for message prioritization and helps optimize the cost of processing against actual demand.
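
For instance, the routing agent could be sketched as a small consumer that inspects each message and forwards it to the topic for the next transformation step. Topic names and routing rules here are illustrative, and the kafka-python client is again assumed.

```python
# Sketch of the routing agent: read raw change messages and forward each one
# to the Kafka topic for the transformation step it needs next.
import json
import os
from kafka import KafkaConsumer, KafkaProducer

ROUTES = {
    # source table      -> next transformation topic (illustrative)
    "org_a.account":     "transform-field-mapping",
    "org_a.opportunity": "transform-parent-child",
}

consumer = KafkaConsumer(
    "change-events",
    bootstrap_servers=os.environ["KAFKA_URL"],
    group_id="org-sync-router",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=os.environ["KAFKA_URL"],
    value_serializer=lambda msg: json.dumps(msg).encode("utf-8"),
)

for message in consumer:
    change = message.value
    # Decide which transformation the message needs and where it should end up.
    change["destination"] = change["source_table"].replace("org_a", "org_b")
    next_topic = ROUTES.get(change["source_table"], "transform-field-mapping")
    producer.send(next_topic, change)
```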

Claim Check pattern

A final evolution of this pattern is to employ the Claim Check pattern. With this approach, instead of the complete representation of the transformed data being passed through the message pipeline, the message body is stored independently, while a message header is sent through Kafka. With each change to the data, a new representation of that data can be stored, with the final representation being written out to Postgres.

Cross-org integration with the Claim Check pattern.

There are two key advantages to this. First, it reduces the size of the messages delivered through Kafka. Kafka performance is optimized for messages around 1 KB in size, and the default message size limit in Apache Kafka on Heroku is 1 MB. Smaller messages consume less of Kafka’s storage capacity, which can enable longer message retention if needed.

The second advantage is that this provides a more robust pipeline. If at any point, a part of the transformation pipeline fails, we don’t lose any of the data that was in transit. If we see data corruption happening, we can review the data as it was in the pipeline, and isolate any errors that may be causing it.

The message store could be implemented in a few ways, but typically it would be an S3 bucket or perhaps a secondary Postgres database. The critical need is a store that can handle the necessary throughput and hold a potentially large number of messages. Also, because this message store will contain complete representations of the data, it’s important to ensure it aligns with security policies, including encryption and access controls.
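
As a sketch of the producer side of this pattern (the bucket, topic, and key naming are assumptions, using boto3 and the kafka-python client), the full record goes into S3 while only a small claim-check header travels through Kafka:

```python
# Sketch of the claim-check producer: store the (potentially large) message
# body in S3 and send only a small header through Kafka.
import json
import os
import uuid
import boto3
from kafka import KafkaProducer

s3 = boto3.client("s3")
producer = KafkaProducer(
    bootstrap_servers=os.environ["KAFKA_URL"],
    value_serializer=lambda msg: json.dumps(msg).encode("utf-8"),
)

def publish_change(change: dict) -> None:
    claim_id = str(uuid.uuid4())
    s3.put_object(
        Bucket=os.environ["MESSAGE_STORE_BUCKET"],   # illustrative bucket
        Key=f"changes/{claim_id}.json",
        Body=json.dumps(change).encode("utf-8"),
        ServerSideEncryption="AES256",               # encrypt the stored body
    )
    # The claim check: just enough for downstream agents to fetch the body.
    producer.send("change-events", {
        "claim_id": claim_id,
        "source_table": change["source_table"],
        "operation": change["operation"],
    })
```

Each downstream agent reads the header, fetches the body by its claim ID, writes its transformed version back to the store, and passes the header along the pipeline.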

Heroku streaming data connectors

As we were writing this post, Heroku made the streaming data connector available as a beta offering. We did not evaluate it while writing this post, but the data connector is an exciting new Heroku capability with the potential to simplify some of the details of integrating Kafka with PostgreSQL.

Scaling and error handling

As the linchpin in this architectural pattern, the database has several responsibilities: accept inbound data from upstream Heroku Connect connections, house replication rules, process replication triggers, and fulfill queries made by syncing apps and downstream Heroku Connect connections. As the demand on the system increases, the database capacity will need to be increased.

This pattern can be scaled in two ways. The simplest is to scale vertically, using a more powerful database, but this isn’t always an option. The alternative is to scale horizontally by increasing the number of databases.

Vertically scaling the database

Heroku provides five different tiers of database plans. Given the sensitivity of Salesforce data, high database utilization, and low tolerance for database outages, we recommend using only databases in the Private and Shield tiers for cross-org synchronization workloads.

Calculating required storage

When evaluating the required database size, measure the historical growth of your existing Salesforce data. This can be done using a SOQL query similar to the one below.
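
As an illustration only (the object, the grouping, and the simple_salesforce library are assumptions; repeat the idea for each object you plan to replicate), an aggregate query over CreatedDate can show how quickly a given object has been growing:

```python
# Illustrative only: count Account records created in each of the last three
# years as a rough proxy for data growth.
import os
from simple_salesforce import Salesforce

sf = Salesforce(
    username=os.environ["SF_USERNAME"],
    password=os.environ["SF_PASSWORD"],
    security_token=os.environ["SF_TOKEN"],
)

result = sf.query(
    "SELECT CALENDAR_YEAR(CreatedDate) yr, COUNT(Id) total "
    "FROM Account "
    "WHERE CreatedDate = LAST_N_YEARS:3 "
    "GROUP BY CALENDAR_YEAR(CreatedDate)"
)
for row in result["records"]:
    print(row["yr"], row["total"])
```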

An estimate of the required Postgres database size can be calculated by manually creating the tables, loading a sample of the data to be replicated, and then invoking the Postgres object size functions to measure the table sizes. The sample table sizes, combined with the projected growth, can be used to calculate the required future database size.
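
A sketch of that measurement step, using the pg_total_relation_size() and pg_size_pretty() functions via psycopg2:

```python
# Sketch: after loading a representative data sample, measure how much space
# each mapped table consumes, largest first.
import os
import psycopg2

with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT schemaname, relname,
                   pg_size_pretty(pg_total_relation_size(relid)) AS total_size
            FROM pg_catalog.pg_statio_user_tables
            ORDER BY pg_total_relation_size(relid) DESC;
        """)
        for schema, table, size in cur.fetchall():
            print(f"{schema}.{table}: {size}")
```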

Projecting concurrent connections

Heroku Connect uses one database connection per mapping, plus another connection if any of the mappings are in read-write mode. Each Heroku Connect app also requires a dedicated schema, that is, one per organization. For example, synchronizing 20 objects in read-write mode from each of 10 organizations would need on the order of 10 × (20 + 1) = 210 connections. PostgreSQL performance has been observed to degrade beyond roughly 50 schemas.

Horizontally scaling the database

Scaling a data tier typically involves separating read operations from write operations by replicating writable primary nodes to read-only clones. Given the bidirectional nature of Heroku Connect, this strategy is unlikely to provide much benefit. Instead, a dedicated PostgreSQL database per Salesforce organization partitions the transaction volume across databases.

Scaling the data tier by having a dedicated Postgres instance per Salesforce organization.

Error handling

Failures are inevitable. The system must account for transactions that cannot be processed successfully. How this is done depends on the complexity of the system.

An integration system that houses the replication logic at the data tier can use tables dedicated to capturing failures. A cron job can be used to scan for errors and then send a notification when new error rows are detected.
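
A sketch of that scan, with illustrative table and column names, which could run from Heroku Scheduler or cron:

```python
# Sketch of a scheduled error scan: find failure rows that have not been
# reported yet, raise an alert, and mark them as notified.
import os
import psycopg2

with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, source_table, error_message, created_at "
            "FROM sync.replication_errors "
            "WHERE notified_at IS NULL ORDER BY created_at"
        )
        failures = cur.fetchall()
        if failures:
            # Replace the print with email, Slack, PagerDuty, etc.
            print(f"{len(failures)} replication failure(s) need attention")
            cur.execute(
                "UPDATE sync.replication_errors SET notified_at = now() "
                "WHERE notified_at IS NULL"
            )
```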

If the replication logic is hosted on an application tier, a dedicated queue or messaging topic can store transactions that could not be processed. Dead letter queues or retry queues may be appropriate depending on how errors are to be processed.
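
For example, a transformation worker might park anything it cannot process on a dead-letter topic for later inspection and replay. Topic names and the transformation step are placeholders, and the kafka-python client is again assumed.

```python
# Sketch: route messages that fail transformation to a dead-letter topic
# instead of dropping them.
import json
import os
from kafka import KafkaConsumer, KafkaProducer

def transform_and_write(change: dict) -> None:
    """Placeholder for a real transformation/write step."""
    raise NotImplementedError

consumer = KafkaConsumer(
    "transform-field-mapping",
    bootstrap_servers=os.environ["KAFKA_URL"],
    group_id="org-sync-field-mapper",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=os.environ["KAFKA_URL"],
    value_serializer=lambda msg: json.dumps(msg).encode("utf-8"),
)

for message in consumer:
    try:
        transform_and_write(message.value)
    except Exception as exc:
        # Park the failed message so it can be inspected and replayed later.
        producer.send("change-events-dead-letter",
                      {"error": str(exc), "original": message.value})
```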

Conclusion

When creating a synchronization system, it’s important to consider the balance between complexity and flexibility. The system we built was very flexible, but also highly complex. It needed to be that way because of the business needs. A simpler direct integration wasn’t feasible.

Taking an iterative approach to building a system like this can help manage that necessary complexity. As it evolves, the solution should provide the flexibility to pivot as business needs change. Mergers, changing compliance requirements, or new products could all lead to new integration needs.

When designing a cross-org synchronization architecture, it is best to start simple and add complexity only when needed. When business needs dictate a more complex approach, the Heroku platform’s flexibility makes it a powerful solution. An architecture that starts with simple bidirectional replication can evolve into a highly scalable message-based integration without needing to shift to an entirely different platform.

To learn more about the options available, see the Heroku documentation.

About the authors

Samuel Holloway

Samuel Holloway is a Regional Success Architect Director at Salesforce. He focuses on custom development, enterprise architecture, and scaling solutions across the Salesforce clouds. When he’s not building solutions at Salesforce, he enjoys tinkering with computer graphics and game engines. You can see what he’s up to on GitHub @sholloway.

Steve Stearns

Steve Stearns is a Regional Success Architect Director who has been with Salesforce for over 10 years and has worked extensively on building scalable integrations with Salesforce.
