Configuration data migration in Salesforce

Aneesh Bhat
Published in The Startup · 6 min read · Mar 9, 2020

Data migration from one environment to another (or one sandbox to another) has always been a daunting task in Salesforce. The sandbox clone feature that Salesforce introduced recently leaves much to be desired, since the configuration data in Prod keeps changing as more gets added over time, and a sandbox refresh does not bring all the data into Developer sandboxes in particular.

So the burning question still remains unanswered: how do you easily bring configuration data into (developer) sandboxes? By configuration data I mean custom object data, not Custom Settings or Custom Metadata.

Well, you have a couple of options:

Does that mean there’s no one easy way? Do not lose hope yet, as you have not run out of options. But seriously, I was hoping Salesforce itself would come up with one, as this requirement seems so trivial. Also, the data commands on offer have some silly limitations which render them almost unusable.

Instead of being fussy and waiting for them to come up with a better way, I thought of building one of my own. Of course this is not without limitations, but I figured it might be better than many of the other options available out there for handling such a simple requirement. It is really an extension of the data commands offered by Salesforce rather than a standalone tool. However, I have tried to iron out most of the limitations, though a few still remain.

To make the tool easier to use, I have also published an npm package, which can be installed with the command below:

npm install sfdx-migrate --save

What does the tool do?

The tool helps you export the data of multiple objects from a source org to a destination org with just two function calls. The uniqueness of the tool lies in the fact that the relationships between the objects are retained, regardless of how complex those relationships are.

The tool is simply an extension of the import/export data commands offered by Salesforce.

Why are the data commands not good enough?

The data commands come with some serious limitations that make it almost impossible to get the work done using them. Here are a few:

  • They expect you to specify the entire data hierarchy in a single SOQL query (using subqueries) to be able to import the data with relationships.
  • They have a limit of 200 records per transaction during import.
  • They expect you to create a plan definition file, which is used during the import to set up relationships and resolve dependencies.
  • Migrating data with a self-dependency (a parent-child relationship within the same object) does not work out of the box.
  • They seem to have been built with manual, one-off migrations in mind.
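To see what this means in practice, here is the shape of the plan definition file that the stock ‘sfdx force:data:tree:import’ command expects (a minimal two-object sketch; the file names are illustrative, and maintaining such a plan by hand for a deep hierarchy quickly becomes painful):

```json
[
  {
    "sobject": "Account",
    "saveRefs": true,
    "resolveRefs": false,
    "files": ["Account.json"]
  },
  {
    "sobject": "Contact",
    "saveRefs": false,
    "resolveRefs": true,
    "files": ["Contact.json"]
  }
]
```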

How does the tool work?

The tool internally uses the data commands but does some post-processing on the retrieved data to make it easier to import. It also splits the records into batches during import, both to resolve self-dependencies and to work around the 200-record limit. And all this without having to specify the relationships in a single SOQL query.
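The batching workaround can be sketched in a few lines (the batch size of 200 comes from the limit described above; the package’s actual internals may differ):

```javascript
// Split records into batches of at most `size` for import; 200 is the
// per-transaction limit of the sfdx tree import noted above.
function chunkRecords(records, size = 200) {
  const batches = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size));
  }
  return batches;
}

// 450 records end up as three batches of 200, 200 and 50.
console.log(chunkRecords(Array.from({ length: 450 })).map(b => b.length)); // [ 200, 200, 50 ]
```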

Let us now get into the implementation details. The tool works in two steps:

Export

The data to be migrated is exported using the ‘sfdx force:data:tree:export’ command, and after the export succeeds, some post-processing is run on the records. The gist of the entire process is captured in the steps below:

  • The objects to be exported are specified as SOQL queries in ‘queries.json’ (created in a location which is passed as a parameter to the import/export call), in order of dependency, with the parent object at the top followed by its children.
  • The relationships are specified with the __r.[API Name] notation in the SOQL.
  • For every entry in the ‘queries.json’ file, the export command is run and the results are stored in the ‘sfdx-out’ folder.
  • Once all the object data is exported to the ‘sfdx-out’ folder in the specified directory, the tool goes through each object’s data file and finds the objects it depends on, by looking for fields which end with __r and noting the type of the relationship along with the field used to identify the related record.
  • It then goes through the data files of those dependent objects to pull out references to the records referred to in the file of interest, which are in the format ‘@[ObjectAPIName]Ref[xyz]’. The API name of the field used to specify the relationship is used as the key to search the records, and the reference to the matching record is returned.
  • The returned references are used to replace the field values ending with __r in the file of interest. The field names are also changed to end with __c, which is what the import command expects.
  • A dependency.json file is created to store the dependencies between objects, which gets used during the import step.
  • The above steps are repeated for every object mentioned in the query definition file.
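The reference-resolution step above can be sketched as follows. The record and file shapes are assumptions based on the output of ‘sfdx force:data:tree:export’, not the package’s exact schema:

```javascript
// Exported parent file: each record carries a referenceId assigned by
// `sfdx force:data:tree:export`, e.g. "Product__cRef1".
const products = {
  records: [
    { attributes: { type: 'Product__c', referenceId: 'Product__cRef1' }, Name: 'Widget', Code__c: 'W-1' }
  ]
};

// Child records exported with a relationship query such as
// "SELECT Name, Product__r.Code__c FROM Price__c": the lookup arrives
// as a nested __r object keyed by the identifying field.
const prices = {
  records: [
    { attributes: { type: 'Price__c', referenceId: 'Price__cRef1' }, Name: 'Std', Product__r: { Code__c: 'W-1' } }
  ]
};

// Replace every Foo__r nested object with "@<referenceId>" of the
// matching parent record, and rename the field Foo__r -> Foo__c,
// which is the shape the import command expects.
function resolveRefs(childFile, parentFile) {
  for (const rec of childFile.records) {
    for (const field of Object.keys(rec)) {
      if (!field.endsWith('__r')) continue;
      const [key, value] = Object.entries(rec[field])[0]; // e.g. ['Code__c', 'W-1']
      const parent = parentFile.records.find(p => p[key] === value);
      if (parent) {
        rec[field.replace(/__r$/, '__c')] = '@' + parent.attributes.referenceId;
        delete rec[field];
      }
    }
  }
}

resolveRefs(prices, products);
console.log(prices.records[0].Product__c); // "@Product__cRef1"
```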

Import

After the export, all the data files in the sfdx-out folder will have been processed and stored in the sfdx-out-processed folder in the specified location, which gives you a chance to review whether all the references were resolved appropriately. The import process is then carried out in the following steps:

  1. For every object in the query definition file, records which do not depend on any object (including itself), i.e. records with no ‘@[ObjectAPIName]Ref[xyz]’ value, are identified.
  2. If the number of independent records is more than 200, the subset is split into chunks of 200, and for every chunk the steps below are performed:
  3. The ‘sfdx force:data:tree:import’ command is invoked on a chunk of 200 records.
  4. The command returns the IDs of the inserted records.
  5. Using dependency.json, the references in the dependent files are updated, which handles the self-dependency as well. The returned IDs are applied to the records in memory by adding an “Id” field on every record.
  6. The process is repeated till all the independent records are inserted.
  7. After steps 1–6 are performed, the self-dependencies would be resolved. Now records without an “Id” and with no ‘@[ObjectAPIName]Ref[xyz]’ are picked, and steps 2–7 are repeated till no such record remains.
  8. Steps 1–7 are performed on every object mentioned in the queries.json file.
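The loop above can be simulated in isolation. In this sketch the actual insert is mocked (the real tool calls ‘sfdx force:data:tree:import’ per chunk), and the record shape is an illustrative assumption:

```javascript
// Records with no "@...Ref..." placeholder are inserted first (in chunks
// of at most 200), the returned Ids are written back, "@Ref" values are
// swapped for real Ids, and the loop repeats until every record has an Id.
function importAll(records, insertChunk) {
  const isRef = v => typeof v === 'string' && v.startsWith('@');
  while (records.some(r => !r.Id)) {
    // Step 1: not yet inserted and no unresolved reference.
    const ready = records.filter(r => !r.Id && !Object.values(r).some(isRef));
    if (ready.length === 0) throw new Error('circular or missing reference');
    // Steps 2-4: insert in chunks of at most 200, collect returned Ids.
    for (let i = 0; i < ready.length; i += 200) {
      const chunk = ready.slice(i, i + 200);
      const ids = insertChunk(chunk); // stand-in for the sfdx import call
      chunk.forEach((r, j) => { r.Id = ids[j]; });
    }
    // Step 5: replace "@Ref" placeholders with the real Ids, which is
    // what unblocks self-referencing rows on the next pass.
    const byRef = Object.fromEntries(records.map(r => ['@' + r.ref, r.Id]));
    for (const r of records) {
      for (const [k, v] of Object.entries(r)) {
        if (isRef(v) && byRef[v]) r[k] = byRef[v];
      }
    }
  }
}

// Self-dependent sample: B's parent is A, within the same object.
const recs = [
  { ref: 'ObjRef1', Name: 'A', Parent__c: null },
  { ref: 'ObjRef2', Name: 'B', Parent__c: '@ObjRef1' },
];
let counter = 0;
importAll(recs, chunk => chunk.map(() => 'ID' + (++counter)));
// A is inserted in pass one, B in pass two with Parent__c resolved to A's Id.
console.log(recs.map(r => [r.Name, r.Id, r.Parent__c]));
```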

How to use the tool?

Detailed instructions on how to use the tool are given on the npm package details page and on the GitHub repository page as well. At a high level, one just needs to follow the steps below:

  • Install the aforementioned package.
  • Import it in the main file of your Node.js script.
  • Invoke the export function, specifying the root location of your project.
  • Log in to the source org.
  • Once the data is retrieved and processed successfully, invoke the import function, specifying the root location of your project.
  • Log in to the destination org.

Limitations

Since the tool is built on top of the sfdx data commands, a few limitations still remain. Here are a couple of them:

  • The maximum number of records that can be retrieved per object is 10000.
  • The tool still imports the records in batches of 200 so the import can take long if the data is huge.
  • The tool expects the data files to be in the same folder to be able to resolve dependencies, as this happens outside the org after export.

Although the tool has a few limitations, that should definitely not discourage anyone from using it: the limitations are quite trivial, and the tool is not designed to cater to huge data migrations anyway. The package will clearly save a developer a lot of time and effort. There might be a few initial niggles, which I plan to fix over time as and when I come across scenarios that do not work. In the meantime I will also publish the code in the repository for people to contribute and take it forward. This is surely a good idea until Salesforce comes up with one of its own.

You can find the code to the tool on Github.
