Towards Fully Autonomous Secure Deployments

Hugo Haas
Salesforce Engineering
Jun 15, 2017
Image: Robot Driven Development (credit: @Alex Knight)

Salesforce’s business has grown rapidly over the past 18 years. We deliver three major releases of our CRM suite every year, and we roll out updates to our hosted software daily.

This business success and continued innovation pose many interesting challenges from an infrastructure perspective: the number of instances that Salesforce runs for its customers is increasing exponentially; the software that we deliver is becoming ever more complex to implement; and the number of engineers contributing to the code base and releasing changes is also growing.

The diagram above shows the growth in the number of instances and transactions over time.

Deploying Salesforce software has a number of constraints, including two stemming from Salesforce’s #1 value: Trust. First, access to production environments is very restricted and regimented. Second, very precise control over the rollout of our software is required because, for many Salesforce customers, Salesforce is core to running their business and they expect a very predictable release schedule that they align with on their end.

Over the years we built tools and processes to deploy within these constraints. A team of release engineers works with release managers to roll out the software with our tooling. This setup has a few downsides, and our current scale is making us revisit it:

  • The workload of those teams, and its complexity, grows with the number and complexity of the software deployments that we do, both of which are on an aggressively upward trend as discussed above.
  • This process means that the engineers developing the software are one step removed from its rollout and operation in production. Salesforce is big on service ownership, and this distance slows down the detection and fixing of bugs.

Thus, to keep up with our growth and continue to deliver to our customers with high quality and trust, we are now beginning a major shift in how we do deployments, moving towards a fully declarative model:

Software engineers developing our software describe, in a manifest kept alongside their code, how it is deployed on our infrastructure. They do this by listing the packaged artifacts that should be installed on the allocated target capacity. They also specify a rollout strategy suited to their software, which lets them, for example, deploy a new version while maintaining a quorum in a distributed system, along with a programmatic way to assess the health of the deployed software.
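To make the idea of a manifest more concrete, here is a minimal sketch in Python of what such a declaration could contain. It is purely illustrative: the class names (`Artifact`, `RolloutStrategy`, `Manifest`), the field names, and the `validate` helper are all assumptions made for this example, not the actual format used by our deployment system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Artifact:
    """A packaged artifact to install (hypothetical shape)."""
    name: str
    version: str


@dataclass(frozen=True)
class RolloutStrategy:
    """Cumulative fraction of hosts updated after each step,
    e.g. [0.01, 0.1, 1.0] for a canary, then 10%, then everything."""
    steps: List[float]


@dataclass(frozen=True)
class Manifest:
    """Desired state for one service, kept alongside its code."""
    service: str
    artifacts: List[Artifact]
    strategy: RolloutStrategy
    # Programmatic health signal: takes a host name, returns True if healthy.
    health_check: Callable[[str], bool]


def validate(manifest: Manifest) -> None:
    """Raise ValueError if the manifest cannot describe a complete rollout."""
    steps = manifest.strategy.steps
    if not manifest.artifacts:
        raise ValueError("manifest must list at least one artifact")
    if not steps or steps[0] <= 0 or steps[-1] != 1.0:
        raise ValueError("steps must be positive and end at 1.0 (100%)")
    if any(later <= earlier for earlier, later in zip(steps, steps[1:])):
        raise ValueError("steps must be strictly increasing")
```

A service owner would then declare something like `Manifest("search", [Artifact("search-server", "2.4.1")], RolloutStrategy([0.01, 0.1, 1.0]), health_check=my_probe)`, where every value shown is made up for illustration.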

The deployment system takes this as input and works to get production data centers to reflect the service owners’ desired state. This approach has the following benefits:

  • Deployments controlled directly by service owners: The rollout strategy lets service owners express their constraints, including rollout steps (e.g., canary, 10%, …) and schedule constraints, and have the system formulate and execute a conforming plan. Each service owner is directly in control of the rollout of their software in production, as opposed to it being the job of a separate team.
  • Faster deployments with increased confidence: The deployment system uses those rollout steps and health signals to proceed safely through the rollout, or to roll back and alert the service owner. This means that we can catch potential problems sooner, while rolling out with the combination of parallelism and safety that suits each service and change.
  • Increased trust: By fully automating the rollout, we further reduce the need for human access to production environments, thereby reducing the risk of human error and improving security overall.
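The health-mediated behavior described in the list above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not our implementation: the `rollout` function, its parameters, and the in-memory `versions` dictionary (standing in for actually installing artifacts on hosts) are all hypothetical.

```python
import math
from typing import Callable, Dict, Sequence


def rollout(
    hosts: Sequence[str],
    steps: Sequence[float],
    new_version: str,
    versions: Dict[str, str],
    is_healthy: Callable[[str], bool],
    alert: Callable[[str], None],
) -> bool:
    """Advance through cumulative rollout steps, checking health after each.

    Returns True if every host ends up on new_version, or False if a health
    check failed, in which case all hosts are restored and the owner alerted.
    """
    previous = dict(versions)  # snapshot of the last known-good state
    updated = 0
    for fraction in steps:
        target = math.ceil(fraction * len(hosts))
        for host in hosts[updated:target]:
            versions[host] = new_version  # stand-in for a real install
        updated = max(updated, target)
        # Gate the next step on the health of everything updated so far.
        if not all(is_healthy(host) for host in hosts[:updated]):
            versions.clear()
            versions.update(previous)  # roll back to the snapshot
            alert(f"rollout of {new_version} rolled back at step {fraction:.0%}")
            return False
    return True
```

With four hosts and steps of 25%, 50%, and 100%, a failure detected at the first step touches only one host before the rollback. A real system would also need to honor schedule constraints and update hosts within a step in parallel, which this sketch leaves out.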

This work is part of a major modernization of Salesforce’s infrastructure, and solving this problem at the scale of Salesforce is truly exciting. With the delivery of declarative, health-mediated deployments, we expect major gains in productivity, improvements in the quality of our software delivery, and even more rapid innovation for our customers.

We will continue to give updates on this project as we make progress and go into implementation details. In the meantime, if you’re interested in tackling the challenge of building a fully autonomous secure deployment system at scale, we’re actively hiring for this team.
