Building a Modern SaaS Connectivity Platform

Angela Virlan
SailPoint Engineering Blog
7 min read · May 9, 2023

Author: Bob Potter

Connectivity is one of the core capabilities of our SaaS platform. It provides us with the ability to manage and secure identities across a wide variety of sources. Over the last few years, we have invested in creating a modern, cloud-native connectivity platform. This post covers why and how we built this platform.

First Things First: What Is Connectivity?

Our connectivity layer provides a common abstraction for accessing sources. These sources range from an on-premise Active Directory installation to a cloud-based HR system like Workday. For each source, we maintain a connector: a piece of software responsible for executing a standard set of operations against the target system. For the most part, these operations revolve around identity governance, such as listing accounts or updating roles and entitlements. There are additional use cases, such as accessing account activity data and pass-through authentication.

This abstraction, keeping connectors separate from the rest of our identity services, allows each side to focus on its own functionality without having to worry about the low-level details of how to talk to a Lightweight Directory Access Protocol (LDAP) server or the authentication mechanism used by a specific REST API. Like most successful systems, our connectivity implementation has grown organically over time to meet the needs of the day. We’ve since recognized an opportunity to develop a connectivity platform better suited to the cloud-native, SaaS-focused world of today. In particular, we wanted a solution that:

  • allows for rapid development of new connectors to keep pace with the explosion of SaaS services our customers are using
  • allows connectors, which are primarily developed in-house today, to be developed by a number of different parties, such as SailPoint customers and partners
  • provides a secure multi-tenant cloud-based connector runtime to reduce the burden of operating connectors on our customers
  • supports ingestion of real-time updates to reduce reliance on expensive bulk updates and allows our SaaS platform to react more quickly to changes in users’ identities
  • provides a better debugging/troubleshooting experience for connector authors and customers

The new platform we wanted to build needed to meet these goals while also being: secure, reliable, and performant.

Connectivity at SailPoint before SaaS Connectivity

Today, connectors are written in Java and run as part of a virtual appliance (VA) deployed in a customer’s environment. Connectors are primarily written in-house by a specialized team of engineers. To implement a new connector, an engineer must write a Java class which implements a common interface. The primary task of a connector is to take some input and use the API provided by the target system to retrieve or modify data in the target system.

To execute an operation on the target system, a service communicates with the VA, the VA talks to the target system, and the VA finally returns the responses back up to the service. Some form of on-premise connectivity is a necessity for most customers, who often need to manage accounts which reside in an internal system and would not consider exposing their services to the internet. Unfortunately, this adds additional latency and complexity for SaaS-based sources. Moreover, troubleshooting the operation of connectors in a customer’s environment is often challenging and lacks visibility.

Finally, it isn’t generally viable to have the VA in a customer’s environment receive real-time updates about changes in target systems via a push-based mechanism like webhooks. This would require a publicly accessible endpoint on the VA, which is a non-starter for almost all customers.

Enter Stage Left: SaaS Connectivity

When designing the new SaaS Connectivity platform, there were a few things we needed to figure out: How do our services communicate with connectors? How are connectors written and tested? How can we operate connectors in a secure, performant manner, and in a multi-tenant environment?

For the common abstraction layer, we chose to design a simple JSON-RPC style protocol with a set of standard commands which connectors may implement. The protocol supports simple request-response commands as well as streaming commands. Both are important for aggregations which often require retrieving a large amount of data from the target system. This protocol is independent of any transport layer, so commands and responses can be transmitted over HTTP or passed via messaging systems, such as AWS SQS. This protocol is also independent of any programming language.

Here’s an example of a command and its output. Not much to it!
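As a rough sketch of what such a command/response pair might look like in a JSON-RPC style protocol, modeled in TypeScript. The command names and field layout below are illustrative assumptions, not the platform's actual wire format:

```typescript
// Illustrative shapes for a JSON-RPC style connector protocol.
// The type names, command names, and fields are assumptions for this sketch.
interface ConnectorCommand {
  type: string; // e.g. "std:account:read"
  input: Record<string, unknown>;
}

interface ConnectorResponse {
  type: string;
  output: Record<string, unknown>;
}

// A service asks a connector to read a single account:
const command: ConnectorCommand = {
  type: "std:account:read",
  input: { identity: "john.doe" },
};

// The connector responds with the account's current state:
const response: ConnectorResponse = {
  type: "std:account:read",
  output: {
    identity: "john.doe",
    attributes: { displayName: "John Doe", groups: ["engineering"] },
  },
};
```

Because the messages are plain JSON, they can be carried over HTTP, dropped onto a queue such as SQS, or produced by a connector written in any language.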

How Do Our Internal Services Use This Platform?

The connectivity platform is used by multiple internal services with a variety of use cases and access patterns. These access patterns range from bulk ingestion of data to latency-sensitive commands such as pass-through authentication. As a result, we support a number of different response mechanisms: Kafka, SQS, and a synchronous HTTP API.

Kafka allows connector output for streaming commands to be durably persisted so that it can be consumed by one or more services. The synchronous API allows applications to execute commands and receive responses in real time. These response mechanisms are extensible and allow us to integrate with other services in exciting ways. For example, we’ve recently added the ability for command responses to be routed directly to our Temporal-based workflow engine.
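To make the routing idea concrete, here is a minimal sketch of how a service might pick among the three response mechanisms named above. The decision logic and the `CommandContext` shape are assumptions for illustration, not the platform's actual implementation:

```typescript
// Hypothetical routing of command responses to a delivery mechanism.
type ResponseMechanism = "kafka" | "sqs" | "http";

interface CommandContext {
  streaming: boolean;   // streaming commands emit many partial results
  synchronous: boolean; // the caller is blocked waiting on the result
}

function chooseMechanism(ctx: CommandContext): ResponseMechanism {
  if (ctx.synchronous) return "http"; // real-time request/response
  if (ctx.streaming) return "kafka";  // durable, multi-consumer stream
  return "sqs";                       // simple asynchronous completion
}
```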

So, How Are Connectors Built?

While the connectivity platform doesn’t require connectors to be written in a specific language, we chose TypeScript on Node.js as the initial language to target for implementing connectors. We made this decision for a few reasons:

  • TypeScript (and JavaScript) are broadly used and accessible languages with a mature runtime
  • SaaS services often provide a TypeScript or JavaScript API client, which makes it easier to work with the target system’s API.
  • The Node.js runtime and ecosystem are a natural fit for the networking-focused functionality that connectors require

To ease the development process, we’ve developed an open-source TypeScript SDK for authoring connectors and an open-source CLI tool for deploying and testing connectors on our runtime. The TypeScript SDK allows developers to build a new connector by implementing a predefined set of command handlers.

An example implementation of a predefined command in our SaaS Connectivity SDK
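The following self-contained sketch mimics the handler-registration pattern such an SDK enables. The types below are simplified stand-ins for illustration, not the actual open-source SDK's API:

```typescript
// Simplified stand-in for an SDK where connector authors register
// handlers for predefined commands. Not the real SDK API.
type Output = Record<string, unknown>;
type Handler = (input: Record<string, unknown>, send: (output: Output) => void) => void;

class Connector {
  private handlers = new Map<string, Handler>();

  // Register a handler for one of the predefined commands.
  command(name: string, handler: Handler): this {
    this.handlers.set(name, handler);
    return this;
  }

  // Run a command, collecting everything the handler streams via send().
  execute(name: string, input: Record<string, unknown>): Output[] {
    const handler = this.handlers.get(name);
    if (!handler) throw new Error(`unsupported command: ${name}`);
    const outputs: Output[] = [];
    handler(input, (o) => outputs.push(o));
    return outputs;
  }
}

// A connector author implements predefined commands against the target
// system's API (hard-coded data here in place of real API calls):
const connector = new Connector()
  .command("std:test-connection", (_input, send) => send({}))
  .command("std:account:list", (_input, send) => {
    for (const account of [{ id: "a1" }, { id: "a2" }]) {
      send({ identity: account.id, attributes: {} });
    }
  });
```

The streaming `send()` callback mirrors how aggregation commands can emit accounts one at a time rather than buffering an entire result set in memory.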

How Is Connector Behavior Verified?

In some ways, the operations implemented by a connector are relatively straightforward: retrieve accounts, modify accounts, add or remove entitlements, etc. However, opening connector development to more authors also means a greater chance for bugs. For example, data can be dropped or malformed, and update operations can be misapplied or only partially applied. The APIs provided by third-party external services can also be buggy or unreliable when used in certain ways.

A test suite written by a connector developer can ensure that the connector operates as the developer expects. It doesn’t necessarily ensure, though, that the connector correctly implements the contract required by the protocol or that it handles all of the edge cases correctly. To help mitigate these issues, we’ve developed a validation suite as well. This suite verifies that connector operations are implemented correctly and consistently. Since all connectors are expected to implement the same set of operations, we’re able to do this without requiring any specification information about the connector aside from credentials to communicate with the target system.

For each command supported by the protocol, we define a series of checks to ensure that the command operates successfully. For example, we verify that, if an account is updated by a command, a subsequent read of the account reflects those changes. This suite also allows for a form of regression testing against a target system’s API. Target systems can change the behavior of their API (either intentionally or unintentionally) and break a connector. Sometimes this manifests as command failures, which are noisy and easy to diagnose. Other times, API changes can subtly break a connector in a way that is not obvious during normal operation but can be caught by the validation suite.
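The update-then-read check described above could be sketched like this. The `Connector` interface and command shapes here are hypothetical stand-ins, not the validation suite's actual API:

```typescript
// Illustrative read-after-write validation check: after an update command,
// a subsequent read must reflect the change. All shapes are assumptions.
interface Connector {
  execute(command: string, input: Record<string, unknown>): Record<string, unknown>;
}

function checkUpdateThenRead(connector: Connector, identity: string): void {
  // Use a unique value so a stale read can't pass by accident.
  const newValue = `validation-${Date.now()}`;
  connector.execute("std:account:update", {
    identity,
    changes: [{ op: "set", attribute: "displayName", value: newValue }],
  });
  const account = connector.execute("std:account:read", { identity });
  const attrs = account.attributes as Record<string, unknown>;
  if (attrs.displayName !== newValue) {
    throw new Error("update was not reflected by a subsequent read");
  }
}
```

Because every connector implements the same commands, a check like this runs unchanged against any connector, needing only credentials for the target system.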

How Are Connectors Run?

As mentioned above, one of the primary difficulties we face with the existing connectivity solution is the requirement that connectors run in a virtual appliance managed by customers. At the heart of the new connectivity platform is the new connector runtime. The connector runtime is a service that operates connectors in our infrastructure and is also responsible for storing connector code, managing versions, etc.

We intend to write more about the connector runtime, but at a high level, we use Firecracker to dynamically provision a lightweight virtual machine (MicroVM) for each connector process. Each MicroVM is unique to a specific tenant and connector. Thus, a misbehaving connector can’t interfere with another customer’s connector or another connector for the same customer. The low overhead of Firecracker allows us to attain this fine-grained level of isolation. Additionally, all network traffic from a connector flows through a user-space network proxy, built using the netstack library from Google’s gVisor project. In the next section, we’ll learn how this opens up room for interesting enhancements around security and observability.

Future Enhancements for SaaS Connectivity

Now that we have the base functionality in place, there are some enhancements we want to make in the future. Up next, we want to support real-time ingestion of changes from sources via webhooks. This will help reduce the time between a change occurring in a target service and those changes appearing in our SaaS platform.

Additionally, there are some runtime-specific changes we want to make. First, we plan to implement trusted host filtering in our network proxy. This will ensure that a connector can only communicate with a set of trusted hosts, preventing connectors from accessing resources on the internet, intentionally or otherwise, that aren’t required as part of their normal operation. For example, a GitHub connector should only talk to api.github.com. Along these same lines, there are also opportunities to leverage information from our own network proxy. This would give us detailed telemetry that we can provide to developers and customers about a connector’s operational behavior. (e.g., How many connections is it opening to external systems? How much network traffic is it generating?)
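The host-filtering idea reduces to an allowlist check at the proxy. Here is a minimal sketch; the matching rules (exact host or a `*.` wildcard) are assumptions for illustration, not the proxy's actual behavior:

```typescript
// Hypothetical trusted-host check a network proxy might run before
// forwarding a connector's outbound connection.
function isHostAllowed(host: string, allowlist: string[]): boolean {
  const h = host.toLowerCase();
  return allowlist.some((entry) => {
    const e = entry.toLowerCase();
    if (e.startsWith("*.")) {
      // "*.github.com" matches "api.github.com" but not "github.com.evil.example".
      return h.endsWith(e.slice(1));
    }
    return h === e;
  });
}
```

A GitHub connector, for instance, might carry an allowlist of just `["api.github.com"]`, so any other destination is refused at the proxy.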

Try it out!

If you’re a SailPoint customer or partner, you can head on over to the SailPoint Developer Community to start working with the new connectivity platform today!
