The Best External Databases for Salesforce Connect

A man wearing a button-down bird shirt writes the word ‘database’ on a mirror with a black marker.

As data grows many enterprises will need to think differently about how to architect their CRM for large data volumes. With an external database, Salesforce Connect is a great way to integrate outside systems or offload non-core or historical data. Salesforce Connect virtualizes data in external objects which solves the problems of storage or replicating large volumes of data via API.

With the big cloud vendors you can spin all manner of databases easily and at a relatively low cost, but what database should you choose for Salesforce Connect?

To answer this, let’s look at the core capabilities of cloud database vendors and offer some recommendations that we use and support.

The marketing lingo has evolved a lot in the last five years. Cloud databases, cloud data warehouses, data lakes, and the modern data stack all refer to the same trend but with nuances in architecture and capabilities. If you’re a Salesforce expert, this blog will help inform you of the leaders in the cloud data warehouse market.

Cloud Database Capabilities to Consider

Overall, for Salesforce Connect you need to have speed, scalability, and connectivity. While Salesforce Connect was originally used for OData integrations, connecting live to your legacy application database is generally a bad idea. Not least because of how this workload will disrupt its performance. The significant gains will come from centralizing enterprise data and creating better data processing steps.

Storage Costs

Salesforce storage costs are very high compared to today’s public cloud options, however, the premium reflects the proximity of storage to the core CRM. Storing related or external data in the CRM is very convenient and therefore costly, whereas using blob storage is super cheap but requires extra services to process it.

The new era of cloud database vendors has wrapped up these extra services and called them modern cloud data warehouses. As a result of this architecture, cloud vendors like Snowflake sell storage at the same price as the public cloud vendors. Snowflake charge $23/month for 1TB of storage, the same as an AWS S3 bucket.

Moving large volumes of data between enterprise systems is challenging. A cloud data warehouse allows you to take advantage of ultra-low-cost storage, and when paired with Salesforce Connect, makes access as convenient as on-platform data. It’s the best of both worlds.

ETL to ELT

The leading cloud data warehouses are architected with a complete separation of compute and storage so huge volumes of data can sit in storage without affecting compute operations. This has led to a new paradigm of data integration, ELT.

Bonus: Read more about ETL vs ELT

In summary, instead of limiting the amount of data extracted from source systems, new tools simply replicate and land raw data in the data warehouse. Tools like Fivetran, Stitch, and Airbyte make it incredibly simple and automated to replicate Salesforce data to a cloud data warehouse. Once this has happened, consolidation and modeling of data can be performed within the data warehouse itself using popular frameworks like debt.

Fast Query Response

Salesforce Connect will call out to the external database and return results to the external object. For something like a related list or report, this happens while the page loads. This means your database needs to be able to process the query at this time to ensure users are not waiting for a lagging component or report.

There are many databases available in the cloud, only a few vendors have designed new query engines from scratch for the cloud. For relational databases like Snowflake and BigQuery, the types of queries run are fairly easy to process and so perform well, but large tables may benefit from some indexing work. On the other hand, Rockset is a new database that indexes all columns by default and is optimized for millisecond query response. Given the expected Salesforce page load times, you don’t need to optimize for absolute speed, but it must be fast enough for a good user experience.

The best overall performing external database also needs to be scalable for operational use.

Scalable Compute

Modern cloud data warehouses have largely solved the scalability problem. When using a data warehouse as external storage, you need a technology that can scale to the workload created by CRM users and processes that run off it. With up to thousands of CRM users, scaling for concurrency is the most important factor.

Snowflake allows you to have your Salesforce Connect queries hit a dedicated compute resource. If you have many concurrent CRM users, Snowflake also has the option of auto-scaling horizontally. Say you had surges of queries, like a sales team doing their forecasts at the same time. Snowflake would automatically add compute resources up to the limits you set and then automatically switch them off as demand subsides. Snowflake charges you by the compute time and allows you to micro-manage that time.

On the other hand, BigQuery works in a more automated way. It will auto-scale vertically and horizontally as needed, but the data warehouse admins have less control over exactly how. BigQuery charges you per byte processed in a query and allows you to set limits on what queries will run.

In either case, these platforms allow you to manage costs, share workloads, and optimize performance for your Salesforce Connect use case.

Truly Cloud-Based

Before the latest generation of cloud data warehouses, ‘cloud databases’ were simply on-premise databases hosted in a public cloud. This means they live inside a cloud VPC and don’t have an HTTP API endpoint.

Platforms like Snowflake, BigQuery, and Rockset are all ‘true cloud services. All customers share the same service, which means they can be queried from a public endpoint. Apps like Omnata can query them directly from another cloud without middleware infrastructure. Middleware-free integration removes the complexity of a third party processing data and significantly reduces the cost of tools, people, and cyber-security overheads.

What Have We Left Out?

Heroku Postgres and Salesforce have a native integration, called Heroku Connect, that uses OData with external objects. Heroku Postgres is great for developers and recommended when you have an app that bridges both Heroku and Salesforce. However, it’s not as well-suited to being a central data repository. As a transactional database, it requires a lot of close attention to indexing compared to a modern, columnar, cloud data platform. Heavy workloads, large datasets, and many concurrent users could create challenges.

AWS Redshift became popular in the first iteration of cloud data warehouses. It is somewhat less known Redshift was licensed from Actian and is built on top of Postgres. For our purposes, Redshift isn’t truly cloud-based as it runs inside your AWS VPC. While initially groundbreaking in speed and scalability, more recently Snowflake and BigQuery have upped the standard for large datasets and concurrent users.

Databricks is another heavy hitter in the modern data platform space that is built on Apache Spark. Originally optimized for data processing and machine learning, they have made moves towards more general SQL-like capabilities with Databricks SQL. However, like AWS Redshift, Databricks is deployed inside a customer’s cloud VPC and so its API endpoint can only be accessed from within the instance. For our purposes, Databricks don’t offer a ‘true cloud service’ that can be queried directly from another cloud, but we’re hopeful that this will be on the roadmap.

What do we recommend? Omnata Uses and Supports Snowflake, BigQuery, and Rockset

Omnata’s native Salesforce app lets you integrate enterprise data from the leading cloud data platforms. To do this, you need database technology that can scale to handle large datasets and large numbers of concurrent users. You also need to reduce the costs of middleware and specialized integrators. This is achieved by removing the middleware layer by choosing a true cloud platform that can be queried directly.

Overall, many organizations are already centralizing enterprise data on platforms like Snowflake and BigQuery and these are your best bet to use with Salesforce Connect. However, if you have a narrower scope of real-time use cases then Rockset is a compelling option.

Does Your Company Already Use One of These Technologies?

These new technologies have made it much easier to bring on new teams, datasets, and use cases. You should spark up a conversation with your data team today, or talk to Omnata about how you can get value from these platforms.

--

--