Understanding Azure Synapse Link: Initial Sync, Incremental Changes, and In-Place Updates

Nadeem Khan (NK) · Published in LearnWithNK · Sep 7, 2024

In the world of data management, where efficiency and accuracy are everything, especially with the massive amounts of information stored in data lakes, Azure Synapse Link shines. It’s like the bridge that effortlessly connects your operational and analytical data, making data movement smooth and giving you real-time insights.

In this blog, we're diving deep into how Azure Synapse Link works, covering everything from its initial sync to handling incremental changes and in-place updates, all while taking advantage of the reliable Delta Parquet format. It's important to understand how data flows from D365 F&O to the data lake via Synapse Link to get the most out of what Synapse Link has to offer.

Assumption: a Spark pool is enabled in the Synapse Link profile.

Initial Sync

Azure Synapse Link writes the initial sync data directly to the data lake in Delta Parquet format, skipping the need to create intermediary CSV files. The Delta Lake data is partitioned by year based on the 'createdon' column, which comes from the 'Created Datetime' field in D365 F&O.
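To make this concrete, here is a minimal PySpark sketch for reading one of these Delta tables from a Synapse notebook. The storage path and table name (custtable) are placeholders you would adapt to your own profile, not names that Synapse Link guarantees.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path: adjust to your storage account, container, and table.
delta_path = "abfss://container@storageaccount.dfs.core.windows.net/deltalake/custtable"

# Read the table that Synapse Link maintains in Delta Parquet format.
df = spark.read.format("delta").load(delta_path)

# Because the data is partitioned by year on 'createdon', a filter on that
# column lets Spark prune whole year partitions instead of scanning everything.
df.filter("createdon >= '2024-01-01'").show(10)
```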

After the initial sync, Azure Synapse Link uses two strategies to store the data in the data lake.

  • Append Only Mode
  • In-Place Updates

Append Only Mode: Incremental Changes

Based on the 'Time Interval' you set when creating the Synapse Link profile, Synapse Link adds the changed data to an incremental folder.

File format: CSV

Incremental Change Strategy

Azure Synapse Link uses Append-Only mode to store incremental changes, which works similarly to the Export to Data Lake Changefeed.

When are the first incremental changes captured?

Once the initial sync is complete, the incremental push logic takes over. It creates a folder with a datetime stamp as soon as any table experiences a change. Then, for the rest of the 'Time Interval' you've set in the Synapse Link profile, Azure Synapse Link keeps appending any new changes to that same datetime-stamped folder.

For the rest of this blog, I assume the 'Time Interval' is set to 60 minutes.

When is a new datetime-stamped incremental folder created?

Synapse Link keeps using the same incremental folder until the 60-minute cycle is up. After that, Azure Synapse Link creates a new datetime-stamped folder as soon as it detects new changes.

How is the incremental folder named?

The incremental folder in Azure Synapse Link is named using a specific datetime format to reflect when the folder was created. Typically, the folder name includes a timestamp based on the UTC timezone, formatted as yyyy-MM-ddTHH:mm:ss.fffZ.
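To illustrate the pattern, here is a small Python sketch that formats the current UTC time the same way. This only demonstrates the format string; the real folder names are generated by the service.

```python
from datetime import datetime, timezone

# Format the current UTC time as yyyy-MM-ddTHH:mm:ss.fffZ.
# %f produces microseconds (6 digits), so trim to milliseconds (3 digits).
now = datetime.now(timezone.utc)
folder_name = now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
print(folder_name)  # e.g. 2024-09-07T14:05:12.345Z
```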

Why aren't the datetime folders evenly spaced?

You might think that datetime folders with incremental changes would be evenly spread out since we’ve defined the ‘Time Interval’ in the Synapse Link profile to capture these changes, but that’s not always the case. A new folder only gets created when there’s something new to capture, which is why the folders aren’t evenly spaced out.

How are records stored within the datetime-stamped folders?

In every datetime-stamped folder, Synapse Link creates a subfolder for each table whose changes it captured in that cycle (the 60-minute window starting from the first change that created the folder).

Azure Synapse Link also sets up views in the Lake Database, following a naming convention like dbo.<table_name>_partitioned, based on the CSV files stored in the datetime-stamped folders. However, if you have the Spark pool option turned on in the Synapse Link profile, you might run into errors when querying these views, because the CSV files act as a staging area for the Spark jobs.
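If you want to inspect this folder layout yourself, the sketch below uses the azure-storage-file-datalake SDK to list the datetime-stamped folders and their per-table subfolders. The account URL, container name, and credential setup are assumptions to adapt to your environment.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account and container: point these at the storage attached
# to your Synapse Link profile.
service = DataLakeServiceClient(
    account_url="https://storageaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("container")

# Top level: one folder per datetime-stamped incremental cycle.
for folder in fs.get_paths(recursive=False):
    if folder.is_directory:
        print(folder.name)
        # Inside each cycle: one subfolder per table that changed.
        for table_dir in fs.get_paths(path=folder.name, recursive=False):
            print("  ", table_dir.name)
```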

In-Place Updates

Azure Synapse Link performs ‘In-Place’ updates to make sure your tables always reflect their current state, just like they do in D365, and stores this data in Delta Parquet format.

'In-Place' updates overwrite the existing row when a column value changes. For deleted records, all values from the source system are set to null, except for the Spark-generated fields such as Id, IsDelete, and SinkCreatedOn, which remain non-null. Additionally, the IsDelete field is set to 1.
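This soft-delete convention matters whenever you query the Delta tables directly. Here is a minimal PySpark sketch that filters deleted rows out, assuming a placeholder path:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Placeholder path to a Synapse Link-managed Delta table.
df = spark.read.format("delta").load(
    "abfss://container@storageaccount.dfs.core.windows.net/deltalake/custtable"
)

# Soft-deleted rows keep Id, SinkCreatedOn, etc., but have IsDelete = 1.
# Keep only rows that are not flagged as deleted.
active = df.filter(col("IsDelete").isNull() | (col("IsDelete") != 1))
active.show(10)
```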

File format: Delta Parquet

How are “In-Place” updates performed?

Azure Synapse Link runs 'Delta Lake Conversion' jobs every 60 minutes, based on the settings in your Synapse Link profile. These jobs take care of the 'In-Place' updates and create the Delta Parquet files: they read the datetime-stamped incremental CSVs and merge those changes into the Delta tables. Once the Delta Parquet files are ready, Synapse Link deletes the incremental changes.
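The conversion job itself is managed by the service and its internals are not exposed, but conceptually the step resembles a Delta merge. The sketch below is only an illustration of that idea; the paths, the Id join key, and the CSV read options are all assumptions.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Placeholder paths: one incremental cycle's CSVs for a table, and the
# target Delta table the service maintains.
changes = spark.read.option("header", True).csv(
    "abfss://container@storageaccount.dfs.core.windows.net/2024-09-07T14:05:12.345Z/custtable"
)
target = DeltaTable.forPath(
    spark,
    "abfss://container@storageaccount.dfs.core.windows.net/deltalake/custtable",
)

# Upsert the incremental changes into the Delta table, keyed on Id here
# purely for illustration.
(
    target.alias("t")
    .merge(changes.alias("c"), "t.Id = c.Id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```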

Why don't the Delta Lake Conversion Spark jobs run at even intervals?

You might expect the Delta Lake Conversion jobs to run every 60 minutes, since that's what we set in the Synapse Link profile. But here's the catch: these Spark jobs only kick in if there's an incremental change to convert to Delta Parquet. Because of that, the jobs don't always run on a perfectly regular schedule.

Azure Synapse Link sets up external tables in the Lake Database, following a naming convention like dbo.<table_name>, based on the Delta Parquet files stored in the deltalake folder.
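Once those tables exist, you can query them from a Synapse Spark notebook. A quick sketch with placeholder database and table names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder names. dbo.<table_name> is how the table appears to
# serverless SQL; from a Spark notebook the same lake database table is
# typically addressed as <database>.<table_name>.
df = spark.sql("SELECT * FROM lakedb.custtable LIMIT 10")
df.show()
```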

If you found this guide helpful, don't forget to clap 👏 and share it with your network! Have questions or experiences with D365 F&O data exports? Drop a comment below; I'd love to hear your thoughts and help clarify any confusion. Stay tuned for more in-depth blogs on Azure Synapse Link, where we'll dive deeper into its inner workings and share key insights, especially if you're migrating from Export to Data Lake. For the latest updates and expert tips on Microsoft's evolving data solutions, follow me on LinkedIn, GitHub, and Medium.

Thanks for reading. Happy Learning.
