What’s LDV and Why Does it Matter for My Nonprofit or School? Part III

Salesforce Architects
9 min read · Aug 4, 2020


This is Part 3 of a three-part series on large data volumes (LDV), specifically in the nonprofit and higher education space.

In Part 1, we covered what we mean by LDV, and why the concept is important to understand. Part 2 described how to design an organization with LDV considerations in mind. In this post, we cover best practices for loading data into an LDV organization.

Data loading

Of course, you can expect any process involving large volumes of data to take some time to complete. Importing large volumes of data into Salesforce is no exception. Importing data into Salesforce often triggers other processes ― both system and user-created ― and these processes will slow the operation. Fortunately, by following best practices and preparing in advance, you can minimize such slowdowns and maximize the chances of a successful large-scale import. As Benjamin Franklin said, “By failing to prepare, you are preparing to fail.”

Best practices for large-scale data imports

  • Before you start a large-scale import or data migration, inform your Salesforce account team, customer success manager, or (if needed) Salesforce Customer Support. Ideally, do this during the planning stage, or at least two weeks prior to the activity, so that Salesforce teams will be ready to assist if you need help.
  • Clean up and preprocess the data in a staging area before migration. Prepopulate external IDs and the Salesforce GUID for lookup and primary key values before inserting. Do not query in real time to fetch these IDs.
  • Create a separate user profile for data migration, turn off validation rules for this profile, and then make sure the data is validated as part of preprocessing.
  • Prepare your data by ordering it by parent record ID to prevent locking (see the sketch after this list). Consider, for example, a situation in which one account has multiple contacts. If your data is unordered, the contacts could end up in separate batches. When running a parallel bulk import, each of the parallel processes will lock the parent account when loading a contact. If this happens concurrently, lock retries or even failures can occur. If instead you sort contacts by parent account, then all contacts belonging to the same account are highly likely to be in the same batch, minimizing potential lock issues.
  • Account for ownership. If you have inactive owners who may have left the organization, you may need to reactivate them to load the data or reassign ownership to someone else.
  • Use a staging environment to test loads prior to loading to production to identify issues early. Make sure to account for this step in the project timeline.
  • Deactivate all automations (including workflow rules, processes, trigger handlers, and triggers). If you are using Table-Driven Trigger Management (TDTM) with NPSP or EDA, turn off all trigger handlers that you can while importing data. If you have triggers that aren’t part of the TDTM framework, be sure to deactivate those triggers, too.
  • Plan for any post-processing or additional data loads required to complete the work that the temporarily deactivated automations would otherwise have handled.
  • Use the Bulk API, which by default runs in parallel mode, allowing a large number of records to be loaded asynchronously. Using this API you can migrate 50 million records within 24 hours or 100 million over a 48-hour weekend. Most data import tools can be configured to use either the SOAP-based API or the Bulk API; when possible, select the Bulk API.
  • Pay attention to the record load order (see Suggested load orders for NPSP and EDA).
  • If possible, avoid using upsert operations, which perform entire table scans before inserting.
  • Defer sharing calculations.
  • Import over the weekend or during an approved maintenance period.
  • Tune the batch size appropriately to get the best throughput. Start with 1000 per batch when using the Bulk API and gradually increase the size until you encounter locking or failures.
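To make the ordering and batching advice above concrete, here is a minimal sketch (not a definitive implementation) that sorts contact rows by their parent Account ID and submits them to the Bulk API in batches of 1,000. It assumes the third-party simple-salesforce Python library as the client, and the file names, field names, and credentials are placeholders; any Bulk API-capable tool can follow the same pattern.

```python
# Minimal sketch: order Contact rows by parent Account before a parallel Bulk API load,
# then submit in modest batches and check for failures. File names, field names, and
# credentials are placeholders.
import csv

from simple_salesforce import Salesforce  # third-party client; any Bulk API tool works

BATCH_SIZE = 1000  # starting point; increase gradually until you see lock retries or failures


def load_contacts(sf: Salesforce, path: str) -> None:
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # Sort by parent Account ID so contacts that share an account land in the same
    # batch, minimizing lock contention on the parent record during parallel processing.
    rows.sort(key=lambda r: r["AccountId"])

    # simple-salesforce splits the list into Bulk API batches of BATCH_SIZE records.
    results = sf.bulk.Contact.insert(rows, batch_size=BATCH_SIZE)
    failures = [r for r in results if not r.get("success")]
    print(f"Loaded {len(results) - len(failures)} contacts; {len(failures)} failures")


if __name__ == "__main__":
    sf = Salesforce(username="user@example.org", password="...", security_token="...")
    load_contacts(sf, "contacts.csv")
```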

Although it is not recommended, it is possible to load data with automations enabled. The entire operation will almost certainly take longer, and you will likely find that you will need to use smaller batch sizes.

Tools for loading data

Multiple tools are available for simplifying the task of loading data into Salesforce. It’s a good idea to familiarize yourself with the advantages and limitations of each tool so you can make an informed decision on selecting the best one for your specific situation and skill set. For more on choosing the right import tool, see the How To Import Data into Salesforce Series and the Data Management Trailhead module.

NPSP Data Loader

The NPSP Data Loader is an excellent tool for getting started with loading data into NPSP and Salesforce. Using a combination of a predefined Microsoft Excel template, the Salesforce Data Import Wizard, and the Data Import Object, this tool can significantly reduce the work required for a data load.

For an LDV organization, however, this tool may not be the best choice because during the import process it performs automations that can lead to performance issues. Also, since this tool uses the Data Import Wizard, it is only recommended for importing up to 50,000 records at a time.

If you decide to use the NPSP Data Loader, keep the following in mind:

  • You may need to reduce the batch size, since one import record may create up to two Contacts and up to two Accounts, as well as Affiliations, Addresses, Opportunities, Payments, Campaign/Campaign Members, and potentially GAU Allocations.
  • For LDV organizations, include First and Last Name in Contact Matching. Turn off Middle Name and Suffix, which prevent the compound name from being indexed.
  • Remember that the initial run of some scheduled jobs can be very slow. Rollup jobs, for example, need to calculate rollups for all Contacts and Accounts initially, but each subsequent run will only make updates if needed.

Data Import Wizard

The Data Import Wizard is the guided data loading tool built into Salesforce. It does have limitations that affect large data imports: it can process only up to 50,000 records at a time, and it supports only Account, Contact, Lead, and custom objects.

Since the Data Import Wizard cannot import Opportunity objects (used to manage donations), you’ll need to use another approach if your data includes opportunities.

Data Loader

Data Loader, a client application for the bulk import or export of data, is among the most commonly used tools for large-scale data loads for nonprofits and schools. It has a number of advantages over the Data Import Wizard for LDV:

  • It can be run from the command line, so it can be easily scheduled
  • It can insert or upsert up to 5,000,000 records at a time
  • It can use either the standard SOAP-based API or Bulk API

Even with Data Loader, large loads can still be time consuming, as you can only load one object at a time. Also, you’ll need to load the data in the recommended order, with all the relevant key fields to build the relationships.

dataloader.io

Powered by the MuleSoft Anypoint Platform, dataloader.io is a cloud-based tool that many nonprofits and schools find easier to use than Data Loader. With dataloader.io, you can use lookups to import information from an object and its relationships with other objects in a single operation; for example, you can upload Contacts with lookups to their parent Accounts. You can also store your upload settings so that they can be reused later.

A free version of dataloader.io, limited to 10,000 records per month, is available with Salesforce. You can upgrade to the Professional (100,000 records per month) or Enterprise (unlimited records per month) versions.

Third-party tools

Several third-party data loading tools are available on AppExchange, some of which offer free versions with limited functionality. Like Data Loader, many use the standard SOAP-based API or Bulk API. Some third-party tools feature additional capabilities that you may find helpful, such as initiating post-load processes once the data load completes.

An example data load

A typical data load process includes the following steps:

  1. Clean your data. Ideally data should be cleaned at the source. This includes removing duplicate records, validating data (e.g. address validation), and performing integrity checks.
  2. Produce external IDs for each of your data entities (such as Accounts and Contacts). Often, this can be the source system’s ID. (A minimal sketch of steps 1 and 2 follows this list.)
  3. Review automation and then disable it. You must understand what the automations in your organization are doing and how they work, especially if you are using NPSP or EDA. Can the functionality that the automation provides be initiated manually following the data load? Or will additional data load steps be required to replicate the results of the automation? In NPSP there are several scheduled jobs that you should disable during the data load and then run once the data load is complete. (Make sure your schedule accounts for the time these jobs may take to run following a large data load.)
  4. Load the data using one of the available data load tools. Follow the recommended data load order, using your External IDs to ensure referential integrity. If you opted not to use External IDs, after each object is loaded, you will need to use the Salesforce ID for each of the loaded records to build referential integrity.
  5. If needed, initiate any automation to generate additional records or update records.
  6. If needed, load additional data that may have been unavailable in the source system. For example, the data needed to relate a Contact to an Opportunity using the Opportunity Contact Role may not exist in the source system; you would need to create and load it separately.
  7. Re-enable all automation. You may need to retune your automation based on the amount of data you now have in the system.
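As an illustration of steps 1 and 2, the sketch below deduplicates and validates contact rows into a staging file and carries the source system’s primary key into a custom external ID field. The column names and the Legacy_Id__c field are hypothetical; adapt them to your own schema and cleansing rules.

```python
# Illustrative sketch of steps 1 and 2: dedupe and validate contact rows into a staging
# file, carrying the source system's primary key into a custom external ID field.
# Column names and the Legacy_Id__c field are hypothetical.
import csv


def prepare_contacts(source_path: str, staged_path: str) -> None:
    seen_emails = set()
    staged = []

    with open(source_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            email = row.get("email", "").strip().lower()
            # Basic integrity checks; real cleansing (address validation, etc.)
            # would plug in here, before the data ever reaches Salesforce.
            if not row.get("last_name") or "@" not in email:
                continue
            if email in seen_emails:  # drop duplicates on email
                continue
            seen_emails.add(email)
            staged.append({
                "Legacy_Id__c": row["source_id"],  # source system ID as the external ID
                "FirstName": row.get("first_name", ""),
                "LastName": row["last_name"],
                "Email": email,
            })

    with open(staged_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["Legacy_Id__c", "FirstName", "LastName", "Email"]
        )
        writer.writeheader()
        writer.writerows(staged)


if __name__ == "__main__":
    prepare_contacts("contacts_raw.csv", "contacts_staged.csv")
```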

Suggested load orders for NPSP and EDA

Loading records into NPSP and EDA in the recommended order ensures that referential integrity is maintained.

In the case of NPSP, for example, you need to load accounts and contacts into your organization before loading opportunities, since opportunities need to be related to an account and to a contact via the opportunity contact role, which you load next (see the sketch below for one way to resolve those lookups in bulk).
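The following is a minimal sketch of that lookup resolution, assuming a hypothetical Legacy_Id__c external ID field and the third-party simple-salesforce library: after Accounts are loaded, one query builds a map from external ID to Salesforce ID, which is then stamped onto each Opportunity row rather than queried record by record.

```python
# Hedged sketch: resolve Opportunity lookups from one bulk query per parent object
# rather than per-record queries. Legacy_Id__c and the CSV column names are hypothetical.
import csv

from simple_salesforce import Salesforce  # third-party client


def build_id_map(sf: Salesforce, sobject: str) -> dict:
    # One query per object: external ID -> Salesforce ID.
    records = sf.query_all(f"SELECT Id, Legacy_Id__c FROM {sobject}")["records"]
    return {r["Legacy_Id__c"]: r["Id"] for r in records}


def prepare_opportunities(sf: Salesforce, path: str) -> list:
    account_ids = build_id_map(sf, "Account")
    opportunities = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            opportunities.append({
                "Name": row["name"],
                "StageName": row["stage"],
                "CloseDate": row["close_date"],  # YYYY-MM-DD
                "Amount": row["amount"],
                "AccountId": account_ids[row["account_legacy_id"]],
            })
    # As with contacts, sort by parent account to reduce lock contention.
    opportunities.sort(key=lambda o: o["AccountId"])
    return opportunities
```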

Note that for some objects (marked with an asterisk), an Apex trigger is fired by default when data is loaded or updated. Review and disable these automations as needed when importing large amounts of data.
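For NPSP organizations using TDTM, the asterisked triggers are controlled by Trigger Handler records, so they can be switched off in bulk before the load and back on afterwards. The sketch below is an assumption-laden illustration: verify the namespaced API names (npsp__Trigger_Handler__c, npsp__Active__c) in your own org, and it again uses the third-party simple-salesforce library.

```python
# Hedged sketch: bulk-deactivate NPSP TDTM trigger handlers before a large load and
# reactivate them afterwards. Confirm the namespaced API names in your org first.
from simple_salesforce import Salesforce  # third-party client


def set_tdtm_handlers_active(sf: Salesforce, active: bool) -> None:
    handlers = sf.query_all("SELECT Id FROM npsp__Trigger_Handler__c")["records"]
    updates = [{"Id": h["Id"], "npsp__Active__c": active} for h in handlers]
    if updates:
        sf.bulk.npsp__Trigger_Handler__c.update(updates, batch_size=200)


# Before the load:                      set_tdtm_handlers_active(sf, False)
# After the load and post-processing:   set_tdtm_handlers_active(sf, True)
```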

For NPSP, we recommend loading records in the following order:

  1. User
  2. Account*
  3. Contact*
  4. Address*
  5. Lead
  6. Affiliation
  7. Relationship*
  8. Campaign
  9. Campaign Member
  10. General Accounting Unit
  11. Recurring Donation*
  12. Opportunity*
  13. Opportunity Contact Role
  14. Partial Soft Credit
  15. Payment
  16. GAU Allocation
  17. Deliverable
  18. Engagement Plan Template
  19. Engagement Plan
  20. Engagement Plan Task
  21. Level
  22. Activity

*Trigger fired on insert or update.

For EDA, we recommend loading records in the following order:

Accounts and Contacts

School Accounts

  1. Education Institutions (including other colleges and high schools)
  2. Departments
  3. Academic Programs
  4. Sports Organizations and Clubs
  5. Businesses

Student Accounts*

  1. Administrative
  2. Household

Contacts*

  1. Faculty
  2. Students
  3. Family

Program Plan

Plan Requirement

Student Information

Address*

  1. Campus Address
  2. Home Address

Affiliations*

  1. Primary Education Institution
  2. Primary Department
  3. Primary Academic Program

Relationships*

  1. Family Members
  2. Program Enrollment

Curriculum

Terms

  1. Past
  2. Current
  3. Future

Courses

Course Offerings

  1. Past
  2. Current
  3. Future

Student Progress

Course Connection (aka Course Enrollment)

  1. Link to Course Offering
  2. Program Enrollment
  3. Student Contact
  4. Program Enrollment Affiliation

Advisor Link

  1. Advisee Case
  2. Case Team
  3. Community Users

*Trigger fired on insert or update.

Additional resources

We’ve covered what LDV means, how LDV can affect NPSP and EDA performance, ways to design organizations for LDV, and best practices for loading large amounts of data. There’s much more to learn about LDV, and we encourage you to explore the topic further. Check out the Large Data Volumes module on Trailhead, Best Practices for Deployments with Large Data Volumes, and the following resources:

Working with limits

Data management

NPSP and EDA automation

About the Authors

Author Chris Rolfe

Chris Rolfe is a Customer Success Architect at Salesforce.org in EMEA. As a member of the advisory team, he ensures our EMEA customers in higher education and nonprofit areas are successful in their implementation of Salesforce technologies, enabling them to achieve their mission. Connect with Chris on LinkedIn.

Author Richard Booth

Richard Booth is a Customer Success Architect at Salesforce.org in EMEA. He helps nonprofit and higher education organizations make the best possible use of Salesforce technologies to deliver value and support their mission. Connect with Richard on LinkedIn.

Author Marie van Roekel

Marie van Roekel is a Customer Success Architect at Salesforce.org, based in the Netherlands. As a member of the advisory team, she works with nonprofit and educational institutions to ensure they are successful in their implementations of Salesforce technologies and best practices, enabling them to better achieve their mission.
