Salesforce Marketing Cloud — Data Extraction
What are the different ways to extract data from Salesforce Marketing Cloud and how to use them
There is normally two main ways to extract data from Salesforce Marketing Cloud in a programmatic manner. Either rely on its’ built in automation studio framework or rely on its’ SOAP API to query some of its’ datasets.
Setting up a data extract is based on two parts 1) A schedule 2) A data extract Activity
To setup the schedule, just drag and drop the “schedule” icon onto the starting source column and press configure.
Once you press configure, you will be prompted with the screen shown above, that lets you setup the starting time for the automation and the frequency at which to run it.
Data Extract Activity
Prompted with the below screen, create a new data extract
The properties need to be configured, with name, file naming pattern and extract types.
The data extract, needs to be setup as a tracking extract in order to allow for the extraction of the key entities. Additional configuration is required:
Once ready, the automation should look like the one shown above. Once this is setup, it will be necessary to create a file transfer activity as next step to transfer the file out to a FTP, based on the file pattern chosen during the data extract configuration.
Salesforce Marketing cloud allows to query the data available within its’ data extensions and some of its’ core data, through data views, using SQL queries.
This allows the extraction of core metrics, such as sends, clicks, bounce … and the merging of that information with additional data sources such as a customer classification table.
The queries can be defined within the automation studio and need to be in a syntax similar to Ms SQL Server 2005. The query will generate an output to a specific data extension, that needs to be created before the query runs. Marketing Cloud offers different options with respect to handling the data already present in the data extension: Append, Replace or Overwrite.
Once the query has run, the data will there before be stored in a data extension within Salesforce Marketing Cloud. There will still be the need to extract that specific dataset. For this a data extract and a transfer file activity will need to be set up ass additional steps within the automation.
In a similar manner to the normal data extracts, we need to configure the data extract activity, in this case however, it will be necessary to select the extract type as a “Data Extension Extract”. This will output the dataset to SFMC secure location. The file transfer activity that needs to follow will transfer the file from this specific location to your desired FTP location.
FuelSDK is the official SDK for Salesforce Marketing Cloud, with it it possible to query the different data extensions available and push or retrieve data. From my experience, pulling small amount of data from the API tends to be fine, but for large amount of information but for larger datasets, it is preferable to rely on the FTP integration.
High-level API call
At a high level, the following piece of code would pull the rows from a given
In order to setup the API call we need the following four parameters
- An Auth Stub object
- A Data Extension Name
- A search filter
While the Data Extension’s name is pretty straight forward to get the other parameters needs to explained in a little bit more details.
The auth Stub can be setup through the ET_Client object from FuelSDK.
When retrieving data the props object refers to the different columns that are intended to be retrieved. The props, in this case, is a list of string variables:
props = ["SubscriberKey", "email", "updated_date"]
Simple operators allow for comparison between the value contained in a column (Property) and some constant (Value or DateValue). They accept a range of comparison: equal, greaterThan, lessThan … The full list of SimpleOperators provided by Marketing cloud is available here.
The ComplexOperator ie: LogicalOperator, allows to combine different search filters together. Supported Logical Operator are OR and AND.
There is currently an open issue for the Python SDK, with respect to allowing the nesting of search queries using this operators.
The response object has two particular properties of interest
results properties will return the list of rows fetched within the specific query page, while the
more_results properties will return a boolean to indicate whether there are additional records to be fetched.
The following is an output from a results call, a List of DataExtensionObject, each representing a row of information:
Within the DataExtensionObject, the column names and values are including in a Properties well … object property, with nested another Property property containing a list of APIProperty. If you got lost, and are only looking to extract that information as a dictionary the following piece of code should help:
row.getMoreResults() allows in turn to get the extra pages of the results from Marketing cloud and returns
 when there is no other pages.
More from me on Hacking Analytics: