Data Model Design Considerations When Building Salesforce Applications — Part 1

Mohith Shrivastava
Salesforce Architects
8 min read · Oct 5, 2020


image of a succulent representing intelligent data design

The Salesforce Platform makes it easy to create a data model for your application using simple clicks. Still, as an architect, you should take time to carefully design your data model so that it aligns with current requirements and the application’s expected future state. A good data model design makes it easier to scale your applications, maintain them, and take advantage of Salesforce features, including application security features.

The impact of any poor design decisions you make when designing the data model is often not immediately noticeable. Rather, the weaknesses of a data model are exposed as the application scales due to an increase in:

  • Data volume
  • The number of processes operating on objects
  • The number of users
  • The number of system integrations and interfaces

A good data model design makes it easier to continuously refactor your application as requirements are added and extended.

As an architect, it’s important to clearly enumerate all the assumptions you are making when you design your data model. Having a document with all your assumptions spelled out validates and deepens your understanding of current requirements, future desired application state, and current limitations. The document also acts as a useful reference for future design decisions.

This blog series covers design considerations for creating a data model for applications on the Salesforce Platform.

This post covers data model design best practices that apply to any Salesforce application including managed packages. In Part 2 we explore data model design considerations with respect to a managed package.

Understanding objects provided out-of-the-box

Out-of-the-box, Salesforce provides an extensive set of standard objects in sales, service, marketing, pricing, quoting, billing, field service, commerce, and many more areas, with data models designed for verticals spanning healthcare, financial services, consumer goods, and manufacturing, among others. There’s a good chance that what you need is already present.

Take some time to carefully review these objects, their fields, and the relationships between them. Exploring the entity relationship diagrams (ERDs) for standard Salesforce objects is a great place to start. Understand the personas and the business processes the Salesforce data model supports. If you don’t know what Salesforce provides out-of-the-box, you’ll end up missing out on important features and capabilities.

For example, imagine you were unaware of the Case object and as a result you created your own custom case object for handling customer issues. With your custom case object you wouldn’t be able to take advantage of Case assignment rules, escalation rules, Web-to-Case, case teams, and built-in security features that come with the standard Case object.

As you review the available standard Salesforce objects, document any license dependencies. Also remember that the Salesforce Platform is always being improved. Using standard objects means you are likely to benefit from this continued innovation, whereas with custom objects you may not. If you do decide to build custom objects, I encourage you to take a look at the Cloud Information Model, an application-agnostic data model that can simplify integration.

Data volume considerations

As you are defining your data model, forecast the volume of data each object will likely accumulate over the next few years. Write down the growth rate assumptions you use for the calculation. Pay special attention to objects with large data volumes (e.g., objects that hold tens of millions of rows).

Large data volumes can lead to sluggish performance, including slower queries, search and list views, and sandbox refreshes. You can avoid these slow-downs by designing your data model to account for large data volumes from the start.

Here are some tips to keep in mind when working with large data volumes:

  • Be sure to test your reports, list views, and custom code by loading large volumes of data in full-copy sandboxes.
  • If query performance is slow, see whether indexing the filtered fields can help (see the SOQL sketch after this list).
  • Explore the Lightning Platform query optimizer to improve the performance of your SOQL queries.
  • Consider using big objects if you need to store large volumes of data purely for security and audit purposes.
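
As a concrete illustration of the indexing tip above, here is a minimal Apex/SOQL sketch. The object and field names (Invoice__c, Invoice_Number__c, Status__c) are hypothetical; the point is that a filter on an indexed field (such as a custom External ID field) keeps a query selective on a large object, while a broad filter on a non-indexed field may not.

```apex
// Hypothetical object: Invoice__c holds tens of millions of rows.

// Selective: Invoice_Number__c is a custom External ID field, so it is indexed.
List<Invoice__c> byNumber = [
    SELECT Id, Amount__c
    FROM Invoice__c
    WHERE Invoice_Number__c = 'INV-000123'
];

// Likely non-selective: a negative filter on a non-indexed field such as
// Status__c can force a scan of the whole object and time out at scale.
List<Invoice__c> open = [
    SELECT Id, Amount__c
    FROM Invoice__c
    WHERE Status__c != 'Closed'
];
```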

For more on working with large data volumes, read the three-part series on LDV, check out the Trailhead module on LDV, or review the best practices documentation.

If you can store the data outside of your Salesforce organization, consider using Salesforce Connect, a framework that enables you to view, search, and modify data in external data sources. This can be effective when you need small amounts of data at any one time or you want real-time access to the latest data, even if it’s not stored in your organization. Salesforce Connect provides seamless integration with the Lightning Platform. External objects are available to Salesforce tools, such as global search, lookup relationships, record feeds, and the Salesforce mobile app. External objects are also available to Apex, SOSL, SOQL queries, Salesforce APIs, and deployment via the Metadata API, change sets, and packages.
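
To make the external object idea more tangible, here is a hedged sketch. Order_Line__x is a hypothetical external object (note the __x suffix) exposed through Salesforce Connect; the rows stay in the external system and are fetched when the query runs.

```apex
// Hypothetical external object surfaced via Salesforce Connect.
// ExternalId is a standard field on every external object.
String orderNumber = '00012345';
List<Order_Line__x> lines = [
    SELECT ExternalId, OrderNumber__c, Amount__c
    FROM Order_Line__x
    WHERE OrderNumber__c = :orderNumber
];
```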

Normalization vs. denormalization

Normalization is a technique that involves organizing data into multiple tables to reduce data redundancy. Data redundancy (that is, repeating similar data in a single object) leads to additional consumption of data storage capacity.

With denormalization, you design your data model using as few objects as possible so that you can read in data with simple queries and fewer joins.

Your decision to choose between normalization and denormalization will depend on your user experience requirements, your analytics needs using standard Salesforce reporting, data security, and data volumes. Also note that Salesforce stores field attributes and metadata in various underlying tables, and what you see as an object is a virtual table. For SOQL statements, the Salesforce Platform generates optimized SQL that involves complex joins across the underlying data, metadata, and index tables. Hence, creating unnecessary objects (over-normalization) adds overhead.

Each technique has its advantages and disadvantages, and as an architect you’ll need to analyze the trade-offs and choose the approach that best suits your needs.

Let’s say your users need to enter multiple phone numbers or addresses on a record, and you know that there won’t be more than five different addresses or phone numbers. Here, it is better to favor denormalization over normalization: rather than creating separate objects for addresses and phone numbers, store them as fields on the object where users will be entering the data. This way, when you need to report on the data, fewer joins are required and read operations are faster.

In contrast, let’s say you have a requirement to store the number of cases per quarter each rep should target. Here, it is better to create an object with a lookup to the rep’s User record. Using a text field to store the user’s name or email instead of a lookup would break referential data integrity. You can model the fiscal quarter as a simple picklist, since even if the application is used for a decade the picklist needs at most 40 entries (four quarters per year for 10 years). An example normalized data model for this use case is shown below.

Illustration of how to use multiple objects and relationships in data model design to maintain referential data integrity
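
As a rough sketch of how such a normalized model could be queried, assume a hypothetical Case_Target__c object with a Rep__c lookup to User, a Fiscal_Quarter__c picklist, and a Target_Cases__c number field. Because the rep is a lookup rather than a text field, SOQL can traverse the relationship to the User record directly.

```apex
// Hypothetical normalized model: the lookup preserves referential integrity
// and lets the query pull related User fields without duplicating them.
List<Case_Target__c> targets = [
    SELECT Rep__r.Name, Rep__r.Email, Fiscal_Quarter__c, Target_Cases__c
    FROM Case_Target__c
    WHERE Fiscal_Quarter__c = 'FY2021 Q1'
    ORDER BY Rep__r.Name
];
```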

Choose the right data types for your object fields

Salesforce supports many different types of fields. It is essential to understand the limitations of each of these types from the perspective of reporting, encryption, and field conversion from one data type to another.

If you need to encrypt data, be sure to review which standard fields and custom fields can be encrypted.

When choosing data types, also consider reporting requirements. For example, you likely want to avoid using the picklist (multi-select) data type because it has limited reporting capabilities for filtering and grouping.

Lastly, carefully consider which fields you designate as external IDs with the External ID attribute. With external IDs, you can upsert data using an external system’s identifiers. Salesforce allows up to 25 external IDs on an object; use this limit wisely.
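
For example, here is a minimal sketch of an upsert keyed on an external ID. Product__c and its ERP_Code__c External ID field are hypothetical; the pattern lets an integration write records without ever knowing their Salesforce IDs.

```apex
// Hypothetical: ERP_Code__c is an External ID field holding the ERP system's key.
List<Product__c> products = new List<Product__c>{
    new Product__c(ERP_Code__c = 'ERP-1001', Name = 'Widget'),
    new Product__c(ERP_Code__c = 'ERP-1002', Name = 'Gadget')
};

// Upsert matches on the external ID: existing rows are updated,
// new rows are inserted, all in one call (allOrNone = false).
Database.upsert(products, Product__c.ERP_Code__c, false);
```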

Choose the right relationships between objects

Master-detail relationships offer two primary benefits:

  • They provide you with out-of-the-box rollup summary formula fields that can count and aggregate detail (child) records.
  • They provide tighter links between objects, where deleting a master (parent) record automatically deletes its detail (child) records.

However, you should understand the sharing and data skew implications of master-detail relationships before deciding to use them.

If a record on the master side of a master-detail relationship has too many detail records (e.g. thousands), you are likely to encounter UNABLE_TO_LOCK_ROW errors, because every edit to a detail record locks the master record. The more detail records a master has, the more likely it is that users will edit several of them concurrently and contend for that lock. A rough way to spot this kind of data skew is sketched below.
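
The object names in this aggregate query are hypothetical, and on a very large object the query itself may need to run asynchronously or be replaced by a Bulk API extract; it simply surfaces parents that own an unusually large number of children.

```apex
// Hypothetical master-detail pair: Invoice__c (master) and Invoice_Line__c (detail).
// Parents returned here own an unusually large number of children (data skew).
for (AggregateResult ar : [
        SELECT Invoice__c parentId, COUNT(Id) childCount
        FROM Invoice_Line__c
        GROUP BY Invoice__c
        HAVING COUNT(Id) > 10000]) {
    System.debug('Parent ' + ar.get('parentId') + ' owns ' + ar.get('childCount') + ' detail records');
}
```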

Also note that with master-detail relationships, child record security is governed by the parent records, so you cannot have a different security mechanism for child records.

Lookup relationships, in contrast, give you the flexibility to apply platform sharing capabilities to child records as needed. Lookup relationships also let you choose between a required lookup and an optional lookup; a lookup is optional when you select “Clear the value of this field” as the action to take if the looked-up record is deleted. With an optional lookup, Salesforce doesn’t lock the looked-up records. Instead, it only validates that the lookup values exist, avoiding additional locks.

Designing a data model for future-state analytics

One vital dimension that is often overlooked in data model design is analytics. Here, I am referring not just to the KPIs users will need to report on today, but to how easily they will be able to self-serve new KPIs in the future.

As an object’s records grow, you will see that the native reports and dashboards start to slow down. Here are some strategies that can help mitigate this challenge:

  • Use appropriate filters on your reports and dashboards.
  • Plan for data archiving. Usually, real-time reports and dashboards are required only on the latest data, so having an archival strategy helps to keep your objects light for native Salesforce reporting. You can archive historical data to Salesforce big objects or a data warehouse, depending on your data compliance needs.
  • Determine if indexing fields can help increase query performance.
  • Determine if skinny tables can help. Using skinny tables (with assistance from Salesforce Customer Support) can help you denormalize data, but note that skinny tables can lead to additional overhead if not used appropriately.

Suppose you need to simplify data for non-real-time reporting purposes. In this case, one common technique is to create an ETL job to aggregate data from multiple tables and store it in a single object used solely for reporting.
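
A minimal sketch of that aggregation step is below. Case_Summary__c is a hypothetical reporting-only object; in practice the job would typically run as scheduled Batch Apex or in an external ETL tool rather than as a one-off script.

```apex
// Summarize closed cases per owner and calendar quarter into a reporting object.
List<Case_Summary__c> summaries = new List<Case_Summary__c>();
for (AggregateResult ar : [
        SELECT OwnerId ownerId, CALENDAR_QUARTER(ClosedDate) qtr, COUNT(Id) total
        FROM Case
        WHERE IsClosed = true AND ClosedDate = LAST_N_DAYS:365
        GROUP BY OwnerId, CALENDAR_QUARTER(ClosedDate)]) {
    summaries.add(new Case_Summary__c(
        Owner__c = (Id) ar.get('ownerId'),
        Quarter__c = String.valueOf(ar.get('qtr')),
        Closed_Cases__c = (Integer) ar.get('total')
    ));
}
insert summaries;
```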

What’s next?

This blog post explored common but vital data model design considerations for building Salesforce applications. Part 2 of this series covers data model design considerations for managed package applications.

About the author

Author Mohith Shrivastava

Mohith Shrivastava works as a Developer Advocate at Salesforce. He currently holds Salesforce System Architect and Application Architect certifications. Mohith has a decade of experience architecting and building enterprise-scale products on Salesforce Customer 360 and the Salesforce Platform. You can follow him on Twitter at @msrivastav13.
