Optimizing Data Synchronization for Downstream Systems in Azure Synapse Link

Nadeem Khan(NK)
LearnWithNK
Published in
5 min readSep 7, 2024

--

Keeping your source and downstream systems in sync is key to maintaining data accuracy and smooth operations. When using Azure Synapse Link, the strategy you choose for handling incremental records — like watermarking, truncating and loading, or migrating from Export to Data Lake — can make a big difference in your system’s efficiency. This article breaks down these options to help you find the best fit for your needs. Just a heads-up: we won’t be covering any strategies that use Spark Pool for downstream data processing.

Scenario 1: If a downstream process uses a watermarking approach for processing Incremental Records

Assumption

  • With the Spark pool option enabled in the Synapse Link Profile, the views on top of the incremental folder might become inaccessible, as the Spark job could delete them after capturing changes in the Delta Parquet files
  • The downstream system can then perform UPSERTs and DELETEs on the data.

Watermark column

You have to choose whatever works best. I am listing down a few general options with their pros and cons.

SinkCreatedOn

  • Pros: This is a metadata column that captures the timestamp for record creation or update, so it will always have a valid timestamp.
  • Cons: If you ever need to reinitialize the table, the column’s value will be overwritten with a new timestamp based on when it’s rewritten to the data lake. This means you could lose the history of the original column value.

createdon/modifiedon

  • Pros: This captures the exact time when a record is created or updated within D365.
  • Cons: If CreatedDatetime and ModifiedDatetime aren’t enabled for the table, a default date is populated for all rows. This makes it impossible to identify which rows to pick up incrementally.

Using SinkCreatedOn isn’t recommended unless the downstream system can handle duplicates. Instead, I suggest creating a configuration file that stores the watermark column for each table. This way, you can accommodate different watermark columns based on the specific business logic for each table..

Initial Sync to downstream system

Select only active records by filtering all deleted records.

SELECT Id, * FROM dbo.<table_name> WHERE Isdelete IS NOT NULL

Incremental Inserts and Updates to the downstream system

Using the watermark column, you can capture both inserts and updates

SELECT Id, * FROM dbo.<table_name> WHERE IsDelete is Null and watermarkcolumnval > @lastwatermarkcolumnval

Incremental Deletes to the downstream system

Based on the watermark column and Isdelete flag, you can capture deletes.

SELECT Id, * FROM dbo.<table_name> WHERE IsDelete is not Null and watermarkcolumnval > @lastwatermarkcolumnval

You can also combine INSERTs, UPDATEs, and DELETEs in a single query and let the downstream system determine which operation to perform based on the IsDelete column.

SELECT Id, * FROM dbo.<table_name> WHERE watermarkcolumnval > @lastwatermarkcolumnval

If a record is deleted in D365 FnO, Azure Synapse Link sets all columns from D365 FnO to NULL. Therefore, the downstream system must use the Id column to identify the deleted records.

Pros

The most reliable way to handle inserts, updates, and deletes.

Cons

If you’re migrating from Export to Data Lake, you’ll need to update the downstream system to store the Id column instead of RecId or _SysRowId for handling deletes.

Scenario 2: If a downstream cannot perform UPSERTs and Deletes

Assumption

The downstream system cannot perform UPSERTs and DELETEs.

Solution

Use the Truncate and Load approach to always capture active records. This way, all INSERTs, UPDATEs, and DELETEs will be handled automatically without needing extra logic.

SELECT * FROM dbo.<table_name> WHERE IsDelete Is Null

Pros

Suited for downstream systems that require only 2–3 refreshes per day.

Ideal for a migration strategy from Export to Data Lake to Azure Synapse Link. In the first phase, you can switch the downstream system to a Truncate and Load approach, then later update it to use the Id column.

Cons

This approach might become problematic if the downstream system requires frequent refreshes from D365. Additionally, refreshes may be slower since the downstream system has to process all the data each time

Scenario 3: If you are Migrating from Export to Data Lake and the downstream system uses Changefeed

Assumption

You are an Export to Data Lake user and planning your Migration to Azure Synapse Link

Solution

Create a Synapse Link Profile without enabling the Spark pool option. This will allow Synapse Link to preserve incremental records stored in timestamped folders and the initial export from selected tables in CSV format. Additionally, you’ll need a watermark column, similar to what’s described in Scenario 1

Initial Sync to downstream system

Select all the initial export data

  SELECT * FROM <container_name>.dbo.<table_name>

Incremental Inserts and Updates to the downstream system

You can select the latest version of a row based on the sysrowversion and watermark columns to fetch new inserts and updates.

WITH LatestRows AS (
SELECT
Id,
MAX(sysrowversion) AS LatestVersion
FROM
<container_name>.dbo.<table_name>
WHERE
IsDeleted IS Null
AND watermarkcolumnval > @lastwatermarkcolumnval -- Filter early by watermark
GROUP BY
Id
)
SELECT
t1.*
FROM
<container_name>.dbo.<table_name> t1
INNER JOIN
LatestRows t2 ON t1.Id = t2.Id AND t1.sysrowversion = t2.LatestVersion
WHERE
t1.IsDeleted IS Null;

Incremental Deletes to the downstream system

You can pull out deleted records (where IsDeleted is not NULL) from a table and retrieve the RECID of their most recent non-deleted version. Use the watermark column to fetch the newly deleted records.

WITH PreviousNonNullRecord AS (
SELECT
t1.Id,
t1.RECID,
MAX(t1.sysrowversion) AS LatestNonDeletedVersion
FROM
<container_name>.dbo.<table_name> t1
WHERE
t1.IsDeleted IS NULL -- Only non-deleted rows
GROUP BY
t1.Id,
t1.RECID
)
SELECT
t2.*,
p.RECID -- Fetch RECID from the most recent non-deleted record
FROM
<container_name>.dbo.<table_name> t2
LEFT JOIN
PreviousNonNullRecord p ON t2.Id = p.Id
WHERE
t2.IsDeleted IS NOT NULL -- Fetch only deleted records
AND t2.watermarkcolumnval > @lastwatermarkcolumnval -- Optional: Filter by watermark

Conclusion

Picking the right strategy for data synchronization in Azure Synapse Link really comes down to what your downstream system can handle and what your business needs. Whether you go with watermarking, Truncate and Load, or a timestamped incremental folder, it’s important to weigh the pros and cons of each. This way, you can keep your data syncs smooth and your data integrity intact during migrations.

If you found this guide helpful, don’t forget to clap 👏 and share it with your network! Have questions or experiences with D365 F&O data exports? Drop a comment below — I’d love to hear your thoughts and help clarify any confusion. Stay tuned for more in-depth blogs on Azure Synapse Link, where we’ll dive deeper into its inner workings and share key insights — especially if you’re migrating from Export to Data Lake. For the latest updates and expert tips on Microsoft’s evolving data solutions, follow me on Linkedin, Github and Medium.

Thanks for reading. Happy Learning.

--

--

Nadeem Khan(NK)
LearnWithNK

Lead Technical Architect specializing in Data Lakehouse Solutions with Azure Synapse, Python, and Azure tools. Passionate about data optimization and mentoring.