Introducing Source Tracking in OrgFlow 2.0

Published in

OrgFlow

11 min read · Mar 1, 2023

In our recently released version 2.0 of OrgFlow, the biggest and most impactful new feature by far is the added support for source tracking, a feature that has been on our radar for quite a long time.

Ever since we started building OrgFlow, we’ve been aware of the huge potential performance improvements that source tracking could bring to OrgFlow, and how neatly it would fit into the product’s architecture and workflow. Limited metadata coverage and the lack of support for sandboxes forced us to hold off for a long time. But those shortcomings have been addressed over time to the extent that we are now able to work around the remaining limitations.

And so having bubbled its way up our backlog for a while, this feature recently made it to the top, and we have spent the past several months developing, testing and fine-tuning it before finally including it in our 2.0 release.

Even though the addition of source tracking to OrgFlow is more of a qualitative change than a functional one, its impact on the performance and safety of your overall Salesforce DevOps workflow is so significant that we wanted to highlight it with its own article.

What is source tracking?

Source tracking refers to the ability of Salesforce sandboxes and scratch orgs to maintain a detailed, machine-consumable record of metadata changes over time, at the individual metadata component level. Conceptually, source tracking is similar to the age-old setup audit trail, but source tracking is more reliable, more granular and more consistently structured to make it better suited for programmatic consumption and processing.

Initially introduced to support the metadata pull/push operations of Salesforce DX (which has since been expanded to the more general Salesforce CLI) to incrementally keep a scratch org in sync with a local directory on a developer’s workstation, source tracking has in fact always been available to third-party tools as well.

Originally only available for scratch orgs, source tracking was also made available for Developer and Developer Pro sandboxes starting with the Summer ’20 release. Over time, the number of metadata types supported by source tracking has steadily increased from a limited subset to nearly full coverage (with a little bit of lag). It is now considered by Salesforce to be mature enough that the newly released Salesforce DevOps Center depends on source tracking to drive its change detection.

From a technical standpoint, source tracking consists of a metadata “change tracking ledger” of sorts, in the form of the SourceMember object and its records. Each record of this object represents a single metadata change, and contains information about which metadata component was changed, the type of change (add/modify/delete), which user made the change, along with a revision number which continuously increases with each metadata change made in the org.

Figure: Source tracking keeps a record of changes to Salesforce metadata over time. (Diagram showing different changes made to a page layout over time along a Git commit graph.)

Every time a metadata component is added, modified or deleted, a new SourceMember record is added, with a higher revision number than all previous changes. (As is customary with Salesforce software, there are several exceptions and limitations; more on those later.)

External tools can query these SourceMember records (through the Tooling API) in order to understand the full extent of metadata changes between any given two revision numbers. This allows tools (such as OrgFlow) to make better decisions about metadata retrieval and deployment. For example, a tool might store/cache the highest seen revision number when retrieving metadata, and subsequently query the SourceMember object to determine which metadata changed since the previous retrieve, and retrieve only the changed subset rather than all the metadata in the org.
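
As a rough illustration of what such a query could look like, here is a minimal Python sketch. It is not OrgFlow's implementation. The field names (MemberType, MemberName, RevisionCounter, ChangedBy, IsNameObsolete) reflect our understanding of the SourceMember object and should be checked against the Tooling API reference for your API version; the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMemberRow:
    """One entry of the change ledger (shape assumed for illustration)."""
    member_type: str   # metadata type, e.g. "Layout"
    member_name: str   # component name within that type
    revision: int      # org-wide counter that increases with every change
    changed_by: str    # ID of the user who made the change
    is_deleted: bool   # True when the component no longer exists


def build_query(after_revision: int) -> str:
    """Build SOQL (for the Tooling API) selecting changes after a revision."""
    return (
        "SELECT MemberType, MemberName, RevisionCounter, ChangedBy, "
        "IsNameObsolete FROM SourceMember "
        f"WHERE RevisionCounter > {int(after_revision)} "
        "ORDER BY RevisionCounter"
    )


def parse_record(raw: dict) -> SourceMemberRow:
    """Convert one raw query result record into a typed row."""
    return SourceMemberRow(
        member_type=raw["MemberType"],
        member_name=raw["MemberName"],
        revision=int(raw["RevisionCounter"]),
        changed_by=raw["ChangedBy"],
        is_deleted=bool(raw["IsNameObsolete"]),
    )
```

A tool would send the string from build_query() to the Tooling API query endpoint with whatever HTTP client it already uses, and run each returned record through parse_record().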

How OrgFlow uses source tracking

OrgFlow utilizes source tracking data for three different purposes:

Dramatically speed up metadata retrieval
Increase accuracy of author attribution in Git history
Detect and avoid clobber during metadata deployment

Let’s look at each of these in turn.

Improved retrieve performance (partial retrieve)

The operation of syncing metadata changes from a Salesforce org to its backing Git branch is referred to in OrgFlow as an inbound flow, exposed through the env:flowin command. Prior to version 2.0, because OrgFlow had no notion of source tracking, an inbound flow would always entail a full retrieve of all relevant metadata in the org. OrgFlow would then compare the retrieved metadata with the metadata already in Git, to capture and commit the differences.

For customers with large orgs, and where a significant subset of all metadata was included in the OrgFlow stack, the full retrieve on every inbound flow could become a significant performance bottleneck, both for the individual developer’s inner loop and for automated CI/CD jobs such as scheduled nightly upstream back-promotion.

Starting with version 2.0, OrgFlow now utilizes source tracking to avoid performing a full retrieve and instead perform a partial retrieve whenever possible. OrgFlow continuously stores the highest retrieved revision number for each environment in its cloud-based state store, and retrieves only those metadata components which actually changed since the previous inbound flow.

This results in a dramatic improvement of overall performance, and also makes it more feasible to perform inbound flows much more frequently — even many times per day.

If only a handful of components have changed in your sandbox since the previous inbound flow, only those components are retrieved, quickly and efficiently, minimizing developers’ waiting time and resource usage. If nothing has changed at all, then no Metadata API calls are even made; the process instead exits quickly without consuming valuable build minutes in your CI/CD platform or needlessly wasting your Salesforce API limits.
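
The high-water-mark idea behind partial retrieve can be sketched in a few lines of Python. This is an illustration under assumptions, not OrgFlow's actual code: the Change and Plan types and the plan_partial_retrieve() function are hypothetical, and treating deleted components as a separate list (because there is nothing left to retrieve for them) is our own simplification.

```python
from typing import Iterable, List, NamedTuple


class Change(NamedTuple):
    component: str  # e.g. "Layout:Account-Account Layout"
    revision: int   # revision number recorded for this change
    deleted: bool   # True if the component was deleted


class Plan(NamedTuple):
    retrieve: List[str]  # components to fetch from the org
    delete: List[str]    # components to remove from the Git branch
    new_revision: int    # value to store for the next inbound flow


def plan_partial_retrieve(stored_revision: int, changes: Iterable[Change]) -> Plan:
    """Keep only the latest change per component made after stored_revision."""
    latest = {}
    for change in sorted(changes, key=lambda c: c.revision):
        if change.revision > stored_revision:
            latest[change.component] = change  # later revisions overwrite earlier ones
    retrieve = sorted(c.component for c in latest.values() if not c.deleted)
    delete = sorted(c.component for c in latest.values() if c.deleted)
    new_revision = max((c.revision for c in latest.values()), default=stored_revision)
    return Plan(retrieve, delete, new_revision)
```

When both lists in the returned plan are empty, the caller can stop right away without making any retrieve call, which corresponds to the quick-exit case described above.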

Improved author attribution

Ever since our first release, OrgFlow has had a unique feature we lovingly refer to as author attribution, where OrgFlow commits metadata changes to your Git repository grouped by author.

Figure: Author attribution reflects the original author of each change to each component in Git history. (Diagram depicting two different changes made to the same page layout by two different authors.)

Let’s say you have a sandbox environment connected to OrgFlow, and you have admins and developers making changes in this sandbox. Anna makes a change to the description of the Account object, and Christian makes a few changes to some FlexiPage components. On the next inbound flow, OrgFlow commits those changes in two separate commits (one for each author). The commit signatures will contain the name and email address from Salesforce of each respective user in the “authored by” part of the commit, while OrgFlow itself (or whatever you configure) will be in the “committed by” part.

Figure: OrgFlow’s author attribution in action; this commit is authored by Christian Pfeil and committed by OrgFlow. (A web page showing the details of a commit in a Git repository. Exact visual presentation depends on Git service provider and/or Git tooling used to view the commit.)

It’s important to note that neither Anna nor Christian needs an account in your Git provider for this to work: commit signatures in Git are completely independent of any authentication layer that may or may not be in place on top of your Git repository. However, with many Git providers, if the email address in a commit signature does happen to correspond to a user account in your Git provider, additional functionality may light up, such as the display of a profile photo on the commit, as seen in the example from Azure Repos above.

Prior to version 2.0, OrgFlow would get the author information from the lastModifiedById and lastModifiedByName properties from the listMetadata() operation in the Metadata API. These properties are not very reliable, however — for one thing, they cannot be used to attribute deletes, because when a component has been deleted there is no longer anywhere to read these properties from. Besides that, the values have also proven somewhat unreliable in general.

Enabled by the introduction of source tracking in version 2.0, OrgFlow now instead uses the source tracking data to drive author attribution. This ultimately results in much more accurate and extensive author tracking in your Git repository. Deletes will now be fully attributed to the Salesforce user who did the actual delete. Additionally, OrgFlow itself will no longer ever be attributed as the author of a change, because OrgFlow does not create source tracking data during deployments.
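
To illustrate the idea of one commit per author, here is a small Python sketch. It is not how OrgFlow is implemented; the AuthoredChange type, the commits_by_author() function and the example email addresses are made up for illustration. The only real interface it relies on is Git's standard GIT_AUTHOR_* and GIT_COMMITTER_* environment variables, which let a tool set the author and committer of a commit independently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class AuthoredChange(NamedTuple):
    path: str          # file in the Git working tree touched by the change
    author_name: str   # name of the Salesforce user who made the change
    author_email: str  # email address of that user


def commits_by_author(
    changes: Iterable[AuthoredChange],
    committer_name: str,
    committer_email: str,
) -> List[Dict]:
    """Group changed paths by author and describe one commit per author."""
    grouped = defaultdict(list)
    for change in changes:
        grouped[(change.author_name, change.author_email)].append(change.path)

    plans = []
    for (name, email), paths in grouped.items():
        plans.append({
            "paths": sorted(paths),
            # Environment a `git commit` process could be started with so that
            # author and committer differ.
            "env": {
                "GIT_AUTHOR_NAME": name,
                "GIT_AUTHOR_EMAIL": email,
                "GIT_COMMITTER_NAME": committer_name,
                "GIT_COMMITTER_EMAIL": committer_email,
            },
        })
    return plans
```

Because deletes also show up in source tracking data with a user attached, a deleted file can be grouped under its author in exactly the same way as an added or modified one.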

Avoiding clobber on deployment

It turns out source tracking is also very useful during outbound flow (i.e. syncing changes from an environment’s Git branch out to its corresponding Salesforce org). Specifically, source tracking has allowed us to add another very powerful feature in version 2.0: the ability to detect and avoid clobber of changes made in the target org.

Because OrgFlow now stores the highest committed source tracking revision number in its cloud-based state store from the last inbound flow, it can now fetch and inspect the source tracking data from the target org before starting a new deployment, as a safety check to ensure there have not been further changes made in the org which would be overwritten (clobbered) by the deployment.

If OrgFlow determines that the deployment might potentially cause clobber in the target org (or if it is unable to determine) then it can pause, show you a summary of the changes that would or might be clobbered, and let you choose how to proceed.

The clobber status of any given deployment evaluates to one of three outcomes: certain clobber, potential/unknown clobber or no clobber. Examples of when clobber is potential/unknown include:

Source tracking is not enabled in the target org
The deployment contains changes to metadata whose types are not source-tracked
There have been change set deployments since last inbound flow

OrgFlow will handle each of these outcomes differently by default, depending on runtime circumstances such as whether stdin is connected to a terminal or not (i.e. whether prompting for confirmation is possible or not). In case you wish to override the default clobber behavior, whether during manual use or in CI/CD jobs, we have also added a command-line argument to the env:flowout and env:flowmerge commands:

--clobber=auto|accept|abort
    How to handle code clobber; auto = accept potential/unknown clobber,
    abort on certain clobber (default: auto)
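
The three outcomes and the three modes can be pictured with the following Python sketch. It rests on several assumptions that go beyond what is stated here: that clobber is "certain" when a component in the deployment was also changed in the org since the last stored revision, that accept proceeds regardless of the outcome, and that abort stops on any clobber. Interactive prompting is left out entirely, and all names are hypothetical.

```python
from enum import Enum
from typing import Set


class Clobber(Enum):
    NONE = "none"
    POTENTIAL = "potential/unknown"
    CERTAIN = "certain"


def evaluate_clobber(
    deploying: Set[str],
    changed_in_org: Set[str],
    tracking_enabled: bool,
    untracked_types_in_deployment: bool,
    change_sets_since_flow_in: bool,
) -> Clobber:
    """Classify a deployment (assumed rules, see the text above)."""
    if tracking_enabled and deploying & changed_in_org:
        return Clobber.CERTAIN  # same component changed on both sides
    if (not tracking_enabled
            or untracked_types_in_deployment
            or change_sets_since_flow_in):
        return Clobber.POTENTIAL  # source tracking cannot rule clobber out
    return Clobber.NONE


def should_proceed(status: Clobber, mode: str = "auto") -> bool:
    """Decide whether to deploy for a given --clobber mode."""
    if mode not in ("auto", "accept", "abort"):
        raise ValueError(f"unknown clobber mode: {mode}")
    if status is Clobber.NONE or mode == "accept":
        return True
    if mode == "abort":
        return False
    # auto: accept potential/unknown clobber, abort on certain clobber
    return status is Clobber.POTENTIAL
```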

Adding the notion of clobber to OrgFlow, along with the ability to detect and avoid it using source tracking, results in a much safer Salesforce DevOps workflow overall. For one thing, you no longer have to make sure that you always do an inbound flow before an outbound flow.

Automatic and transparent

My favorite thing about source tracking support in OrgFlow is that we have managed to make it completely automatic and transparent to the user. As a customer, you do not have to do anything to enable or activate this feature — on every inbound flow, OrgFlow will automatically choose between partial (source-tracked) and full retrieve, depending on runtime circumstances.

As a general rule, partial retrieve is always preferred and will be chosen whenever possible and appropriate. That said, OrgFlow will fall back to full retrieve in the following cases:

Source tracking is not supported/enabled

Source tracking for sandboxes is still an optional feature that you must enable in your production org. Once enabled, all Developer and Developer Pro sandboxes created or refreshed after that point will have source tracking. Other org types do not support source tracking.

If you have not enabled the source tracking feature, or if you are flowing in from a production org or from a Partial Copy or Full sandbox (which do not support source tracking) then OrgFlow will automatically fall back to full retrieve.

The environment is flowed in for the first time

Until at least one inbound flow has been performed for an environment, OrgFlow will not have a stored revision number based on which it can query the source tracking data in the sandbox to determine subsequent changes. OrgFlow will therefore fall back to full retrieve.

Change set deployments are detected

The reliability of source tracking data in a sandbox of course depends on source tracking records being created for every metadata change. If metadata changes occur that are not reflected in source tracking, then retrieving based on source tracking will miss those changes.

Source tracking records are guaranteed to be created in the following scenarios:

Metadata is changed through the Salesforce UI
Metadata is changed by means of a Metadata API deployment where the client explicitly passes the options needed to trigger source tracking. This includes the Salesforce CLI and any tools that use the Salesforce CLI (or the SDT JavaScript library) behind the scenes to deploy metadata. The Salesforce Extension Pack for Visual Studio Code and the Illuminated Cloud plugin for IntelliJ are two examples.

However, if metadata is changed through a change set deployment, then no source tracking records are created. The same is true for Metadata API deployments where the client is not source-tracking aware and therefore does not pass the right magic headers.

For this reason, OrgFlow regularly inspects the deployment history in each connected sandbox, and stores the most recent known deployment operation ID. If OrgFlow detects that new change set deployments have happened since the previous inbound flow, it automatically falls back to full retrieve to ensure it does not “miss” the changes made by those change set deployments.

It is unfortunately not possible to tell from deployment history which API deployments created source tracking data and which did not. Therefore, OrgFlow assumes that all API deployments are source-tracking aware. We selected this approach so that customers can seamlessly use OrgFlow and Salesforce’s developer tools side-by-side, without constantly making OrgFlow revert to full retrieve.

The trade-off is that, if you make metadata changes through third-party tools that do not update source tracking data during deployment, then you will need to take steps to ensure OrgFlow performs a full retrieve on inbound flow (e.g. using the argument described next).
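
The check described in this section can be sketched as follows in Python. How a deployment is recognized as a change set deployment is not covered here, so the kind field below is a hypothetical stand-in, as are the type and function names. Treating an unknown or missing last-known ID as "everything is new" is our own conservative assumption.

```python
from typing import List, NamedTuple, Optional


class Deployment(NamedTuple):
    id: str    # deployment operation ID
    kind: str  # "change_set" or "api" (hypothetical classification)


def new_change_sets(
    history: List[Deployment], last_known_id: Optional[str]
) -> List[Deployment]:
    """Return change set deployments that happened after last_known_id.

    history is ordered oldest to newest. If last_known_id is missing from the
    history, the whole history is treated as new.
    """
    ids = [d.id for d in history]
    start = ids.index(last_known_id) + 1 if last_known_id in ids else 0
    return [d for d in history[start:] if d.kind == "change_set"]


def needs_full_retrieve(
    history: List[Deployment], last_known_id: Optional[str]
) -> bool:
    """API deployments are assumed to be source-tracking aware, so only
    change set deployments force the fallback to full retrieve."""
    return bool(new_change_sets(history, last_known_id))
```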

Full retrieve is explicitly requested

We have also added command-line arguments to allow the user to control which retrieve mode is used during an inbound flow. For the env:flowin command the argument is:

--retrieveMode=auto|partial|full
    Retrieve method; auto = partial if supported and safe,
    otherwise full (default: auto)

(The env:flowmerge command has similar arguments to control the retrieve mode of the source and target inbound flows.)

As mentioned before, auto is the default and employs the graceful fallback rules outlined in the above sections. The user can specify either partial or full to force the use of one method or the other. Specifying partial when source tracking is not supported will result in a preemptive failure.

Specifying full will cause OrgFlow to always fall back to performing a full retrieve for the inbound flow, regardless of whether the other conditions are met or not.
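
Put together, the fallback rules from the sections above amount to a small decision function. The Python sketch below is only an illustration with hypothetical names. One case is not spelled out here, namely what happens when partial is forced but there is no stored revision yet or change set deployments were detected; the sketch simply honors the request in that case, which is an assumption.

```python
from typing import Optional


def choose_retrieve_mode(
    requested: str,
    tracking_supported: bool,
    stored_revision: Optional[int],
    change_sets_detected: bool,
) -> str:
    """Return "partial" or "full" for a given --retrieveMode value."""
    if requested not in ("auto", "partial", "full"):
        raise ValueError(f"unknown retrieve mode: {requested}")
    if requested == "full":
        return "full"  # explicit request always wins
    if requested == "partial":
        if not tracking_supported:
            # Fail before doing any work, as described above.
            raise RuntimeError("partial retrieve requires source tracking")
        return "partial"
    # auto: partial only if supported and safe, otherwise full
    safe = (
        tracking_supported
        and stored_revision is not None  # not the first inbound flow
        and not change_sets_detected
    )
    return "partial" if safe else "full"
```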

Non-source-tracked metadata types

My second favorite thing about our source tracking support is that it is incremental; that is to say, OrgFlow has the ability to perform partial retrieve for metadata types that support source tracking, while performing full retrieve for those that do not, both in the same inbound flow!

To see which types support source tracking for a given Salesforce version, the Metadata Coverage Report is a really great tool. If you sort the list of types on the “source tracking” column, you can easily see which types support the feature and which do not.

It is up to you to choose which metadata types to include in your flow (using the .orgflowinclude file at the root of your Git repository). If you have chosen to include types that are known to not support source tracking, then OrgFlow will automatically split the retrieval up into two distinct subsets:

Partial retrieve (using source tracking) for the supported types
Full retrieve (the “classic” approach using listMetadata() and retrieve()) for any non-supported types

The two subsets are retrieved concurrently. In practice, though, the full retrieve subset will most likely be the performance bottleneck: by the time the partial retrieve has completed, Salesforce will usually still be putting the full-retrieve batches together for download. (The possible exception is the edge case where the partial retrieve yields a lot of changes and therefore takes longer.)
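
The split and the concurrent retrieval can be pictured with this Python sketch. It is a simplified illustration, not OrgFlow's code: the function names are hypothetical, and the two retrieve callables stand in for whatever actually talks to Salesforce.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Set, Tuple


def split_types(
    included: Iterable[str], source_tracked: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split included metadata types into (partial, full) retrieve subsets."""
    included = list(included)
    partial = [t for t in included if t in source_tracked]
    full = [t for t in included if t not in source_tracked]
    return partial, full


def retrieve_concurrently(
    partial_types: List[str],
    full_types: List[str],
    retrieve_partial: Callable[[List[str]], List[str]],
    retrieve_full: Callable[[List[str]], List[str]],
) -> List[str]:
    """Run both retrieves at the same time and combine their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        partial_future = pool.submit(retrieve_partial, partial_types)
        full_future = pool.submit(retrieve_full, full_types)
        # Total wall-clock time is governed by the slower of the two calls.
        return partial_future.result() + full_future.result()
```

Since the overall duration is determined by the slower subset, moving types out of the full subset (or into a separate stack) is what shortens the inbound flow.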

For very performance-sensitive customer scenarios, we therefore recommend that you either exclude metadata that does not yet support source tracking, or consider flowing such metadata in a secondary stack whose processing times won’t impact your primary workflow.

In any case, source tracking support typically only lags behind for newly introduced metadata types for 1–2 releases, so even if OrgFlow must go into full retrieve mode for some of your metadata today, it might not do so tomorrow.

That was a behind-the-scenes look at the new source tracking feature introduced in OrgFlow 2.0. We hope you found it interesting, and we’re very excited for you to start using it to make your Salesforce DevOps workflows faster and safer! As usual, if you have any questions or feedback, don’t hesitate to get in touch; you’ll find all the ways to contact us on our website.