5 Things Architects Should Know About Data Cloud

Susannah Plaisted
Salesforce Architects
4 min read · Jul 13, 2023


Data Cloud Technical Capability Map

Salesforce Data Cloud helps you connect and harmonize large volumes of data to create customized experiences for your customers at scale. And if you’re a Salesforce architect, you need to know how Data Cloud works. As companies race to adopt generative AI, the importance of data is more tangible, and more urgent, than ever before. In this blog I’ll discuss five key concepts that architects need to know about Data Cloud now. Let’s dive in.

1. Data Cloud data is stored in a different type of database.

Data Cloud can handle petabyte-scale data because the storage layer is not the relational database we are used to working with. Data Cloud stores all its data in a data lakehouse. Under the hood, this data is stored in the Parquet file format in S3 buckets. Apache Parquet uses a columnar storage format which, unlike the row-oriented CSV format, is designed for large sets of complex data. On top of the columnar storage we leverage Apache Iceberg, an abstraction layer between the physical data files and how they are organized to form a table. Iceberg supports data processing frameworks like Apache Spark and Presto, as well as high-performance query services like Amazon Athena and Amazon EMR. All of this technology together is what supports record-level updates and SQL queries in the Data Cloud data lakehouse. The diagram above details the additional technology that powers Data Cloud.
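To make the columnar idea concrete, here is a minimal sketch using the open-source pyarrow library. This is purely illustrative (Data Cloud manages this storage layer for you), but it shows the property that matters: a reader can pull only the columns a query touches.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Build a small table; in Data Cloud this data lives in S3 at petabyte scale.
    table = pa.table({
        "email": ["ada@example.com", "grace@example.com"],
        "lifetime_value": [1200.50, 8700.00],
        "region": ["EMEA", "AMER"],
    })
    pq.write_table(table, "engagement.parquet")

    # Columnar layout lets the reader skip every other column in the file,
    # something a row-oriented CSV cannot do.
    ltv_only = pq.read_table("engagement.parquet", columns=["lifetime_value"])
    print(ltv_only)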

2. Data Cloud uses a new set of objects.

The Data Cloud objects that support data transformation.

You might have heard that Data Cloud supports structured and unstructured data. This is true. But the power of Data Cloud comes from enforcing structure on this data by transforming it. To support the transformation process, there are three new types of objects you should become familiar with (a conceptual sketch follows below).

  • Data Source Object (DSO) — The original data source in the original file format.
  • Data Lake Object (DLO) — The data after it’s been transformed and stored in the data lake in the Parquet format.
  • Data Model Object (DMO) — The data after it’s been mapped to the Salesforce metadata structure.

Data Model Objects are what you see represented on the Customer 360 Data Model. Want to learn more about Data Model Objects? This is a good place to get started: Customer 360 Data Model for Data Cloud.
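To make the three stages concrete, here is a purely conceptual Python sketch. In Data Cloud the transformation and mapping are configured declaratively, not written in code, and the field names below are hypothetical:

    import csv, io
    import pyarrow as pa
    import pyarrow.parquet as pq

    # 1. DSO: the source data in its original format -- here, raw CSV.
    dso_csv = "first_nm,last_nm,email_addr\nAda,Lovelace,ada@example.com\n"
    rows = list(csv.DictReader(io.StringIO(dso_csv)))

    # 2. DLO: the same records transformed and stored as Parquet in the lake.
    dlo = pa.table({name: [r[name] for r in rows] for name in rows[0]})
    pq.write_table(dlo, "individual_dlo.parquet")

    # 3. DMO: the lake data mapped onto the Salesforce metadata structure,
    #    for example onto fields of the Individual object in the Customer 360
    #    Data Model (illustrative names only).
    field_map = {"first_nm": "FirstName", "last_nm": "LastName", "email_addr": "Email"}
    dmo = dlo.rename_columns([field_map[c] for c in dlo.column_names])
    print(dmo.column_names)  # ['FirstName', 'LastName', 'Email']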

3. Data Cloud leverages industry-standard formats.

Remember how we spoke about Parquet and Apache Iceberg? These are industry-standard formats. We chose Parquet because it is an open-source format supported by other platforms, like Snowflake. But we also contributed back to the open-source community: traditional data lakehouses are built for batch processing, and we have added capabilities that support both batch and streaming events at scale. Iceberg is also a community-driven open-source format.
Because we use these industry-standard formats, we can architect features like live query, which will allow other data platforms, like Snowflake, to query the data in DMOs without actually moving or copying it.
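As an illustration of what open formats make possible, here is a hedged sketch of an external engine reading an Iceberg table in place using the open-source pyiceberg client. The catalog settings and table name are placeholders for illustration, not a documented Data Cloud endpoint:

    from pyiceberg.catalog import load_catalog

    # Connect to an Iceberg catalog; a Glue catalog is assumed here purely
    # for illustration.
    catalog = load_catalog("lake", **{"type": "glue"})
    table = catalog.load_table("customer_360.unified_individual")

    # Iceberg's metadata layer lets the reader plan the scan (partition and
    # file pruning) without copying data out of the lake first.
    arrow_table = table.scan(limit=10).to_arrow()
    print(arrow_table.num_rows)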

4. You can interact with Data Cloud data using Platform capabilities.

If we’re leveraging all this open-source technology, why choose Salesforce over rolling your own solution? For one, because we map all this data to our metadata structure and copy that metadata structure back to the lakehouse, we can act on this hyperscale data with the capabilities we know and love from the Salesforce Platform. This data, modeled as Data Model Objects (DMOs), is what supports functionality like identity resolution, segmentation, activation and, of course, Einstein services. There are also out-of-the-box features (like the Customer Data Profile) that allow you to visualize your Data Cloud data right from the core Salesforce Platform, without building a custom UI.
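For example, DMOs can be queried with ANSI SQL over REST. The sketch below is in the spirit of the Data Cloud Query API, but the host, endpoint path, and response shape are assumptions for illustration; check the official API reference before relying on them:

    import requests

    DATA_CLOUD_HOST = "https://example.my.salesforce.com"  # placeholder host
    ACCESS_TOKEN = "<token>"  # obtained via a standard OAuth flow

    # Standard DMOs use the ssot__ namespace and a __dlm suffix.
    resp = requests.post(
        f"{DATA_CLOUD_HOST}/api/v2/query",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={"sql": "SELECT ssot__FirstName__c FROM ssot__Individual__dlm LIMIT 10"},
    )
    resp.raise_for_status()
    print(resp.json())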

Conceptual model of where Data Cloud data sits within the larger Customer 360.

5. You can choose to send Data Cloud data back to the transactional database.

Information from Data Cloud does not automatically flow back to the source system (for example, Sales Cloud or Service Cloud), but there are certain components you can leverage to see Data Cloud data alongside transactional data. For example, you can see the segments a contact belongs to right from the Contact record. What if you need to push data derived from Data Cloud back to Sales Cloud, Service Cloud, or even Marketing Cloud? Leverage Data Actions, which allow you to send information using Platform Events or Flow.
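Downstream, something ultimately consumes that event payload; in practice you would subscribe to the Platform Event in Flow, Apex, or via the Pub/Sub API. As a purely illustrative sketch, with an invented payload shape and a webhook-style transport, a minimal receiver might look like this:

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class DataActionHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers["Content-Length"]))
            event = json.loads(body)
            # React to the event, e.g. update a record in the source system.
            # "ActionDeveloperName" is a hypothetical field for illustration.
            print("Data Action received:", event.get("ActionDeveloperName"))
            self.send_response(204)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), DataActionHandler).serve_forever()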

Conclusion

As a customer’s most trusted technical resource, architects need to truly understand how our products work so they can credibly advise on the optimal solution. I hope that this blog has provided insight into Salesforce Data Cloud and that you continue to learn more about this exciting technology, especially as the need to bring together different data sources to get a complete view of your customer becomes even more critical to fuel AI capabilities. Be sure to download the Data Cloud Technical Capability Map and visit the Template Gallery on architect.salesforce.com for even more diagrams.

