Under the hood: Teletraan deploy system

Baogang Song | Pinterest engineering lead, Infrastructure

Deployment sits at or near the top of the list of things developers worry about. A deployment is often the first time a new code change runs in the production environment, so a dependable and straightforward deploy tool is a crucial part of any developer’s arsenal.

A deploy system should support the following functionalities:

  • Rollback. This is the most important feature of any deploy tool. Having a time machine to go back to a certain previous state is priceless.
  • Hotfix. When a rollback is impractical or hard, a hotfix is easier to perform. It should also deploy faster and with higher priority than regular deploys.
  • Rolling deploy. Deploy shouldn’t interrupt service, but if absolutely necessary, the impact has to be minimal. It’s important to halt the deployment if a certain number of servers have failed to upgrade or a service SLA is violated.
  • Staging and testing. Deploying directly to production is riskier than deploying to a staging environment or canary first to verify that things work. Engineers often skip this best practice because of the overhead of creating a staging environment and integrating it with their tests. A good deploy system minimizes that overhead.
  • Visibility. Make sure it’s easy to find out which code changes are available to deploy and how many hosts are running new and older versions. It’s also important to easily track which code change was introduced, when and by whom, as well as the status of critical metrics and alarms during a deployment.
  • Usability. A simple user interface is key for the above functionalities.
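
The halt condition described under rolling deploy can be sketched in a few lines of Python. This is only an illustration of the idea, not Teletraan's implementation: the function name `rolling_deploy`, the `upgrade` callback and the `batch_size` and `max_failures` parameters are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def rolling_deploy(
    hosts: Iterable[str],
    upgrade: Callable[[str], bool],
    batch_size: int = 2,
    max_failures: int = 1,
) -> Tuple[List[str], List[str], bool]:
    """Upgrade hosts batch by batch and halt once failures exceed the budget.

    All names here are hypothetical. Returns (upgraded, failed, halted).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    pending = list(hosts)
    upgraded: List[str] = []
    failed: List[str] = []

    for start in range(0, len(pending), batch_size):
        # Only one batch is touched at a time, so the remaining hosts keep serving.
        for host in pending[start:start + batch_size]:
            if upgrade(host):
                upgraded.append(host)
            else:
                failed.append(host)

        # Check the failure budget between batches and stop the rollout if it is spent.
        if len(failed) > max_failures:
            return upgraded, failed, True

    return upgraded, failed, False
```

A real system would also watch service SLA metrics between batches. The sketch only counts failed upgrades.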

Introducing Teletraan

Teletraan, named after a character from the famous Transformers TV series, is our internal deploy system that supports all of the above functions. It was built by a small group of development tools engineers on the Cloud Engineering team, which drives reliability, speed, efficiency and security for the site and infrastructure.

Design overview

Teletraan follows the traditional client-server model with MySQL as the backend data storage.

Deploy agents are daemons that run on every host and periodically contact the Agent Service to get the latest instructions. During a deployment, an agent downloads and extracts the service build tar along with the service's deploy scripts, then executes them. These deploy scripts include PRE/POST-DOWNLOAD scripts, PRE/POST-RESTART scripts and the RESTART script itself, and are responsible for stopping and starting services.
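
As a rough sketch of how an agent might walk through these scripts, the Python below runs a fixed sequence of stages and stops at the first failure. The stage order is an assumption inferred from the script names, and `STAGES`, `run_deploy` and the callable-per-stage interface are hypothetical, not the agent's actual design.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed ordering, inferred from the script names; the real agent may differ.
STAGES = [
    "PRE_DOWNLOAD",
    "DOWNLOAD",
    "POST_DOWNLOAD",
    "PRE_RESTART",
    "RESTART",
    "POST_RESTART",
]


def run_deploy(
    steps: Dict[str, Callable[[], bool]],
) -> Tuple[List[str], Optional[str]]:
    """Run each registered stage in order.

    Returns (completed stages, name of the failed stage or None).
    Stages with no registered step are skipped.
    """
    completed: List[str] = []
    for stage in STAGES:
        step = steps.get(stage)
        if step is None:
            continue  # this service ships no script for the stage
        if not step():
            return completed, stage  # stop at the first failing stage
        completed.append(stage)
    return completed, None
```

In this sketch each step is a Python callable returning True on success. An actual agent would more likely invoke the scripts as subprocesses and report the failed stage back to the Agent Service.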

Teletraan Workers perform jobs in the background, such as transitioning deploy states based on deploy progress and performing auto deploys based on schedule.

Teletraan Service provides API support for Deploy Board and any RESTful calls. It’s responsible for most deploy-related actions, including deploy and rollback. It also creates and maintains service deploy configuration, answers deploy and agent status queries, enforces permission control and more.

Advanced features

In addition to the core functionalities listed above, Teletraan also supports several advanced features:

  • Pause and resume. It comes in handy when a developer wants to double check something before the code is fully deployed to the cluster.
  • Qualification. Once configured, a successful deploy triggers an acceptance test to qualify it. An accepted deploy can then be used for a later promotion or auto deploy to the next stage.
  • Auto Deploy. Automatically promote builds from one stage to another whenever a new build is available or based on cron job-like schedule settings. The system can automatically pause, roll back or override an auto deploy upon failures.
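
To make the link between qualification and auto deploy concrete, here is a minimal sketch of choosing a build to promote. It assumes that only accepted deploys from the previous stage are eligible and that the most recently finished one wins. The `Deploy` record and `pick_promotion_candidate` are hypothetical names, not Teletraan's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Deploy:
    """Hypothetical record of a finished deploy in the previous stage."""
    build_id: str
    finished_at: int  # Unix timestamp
    accepted: bool    # True if the acceptance test qualified this deploy


def pick_promotion_candidate(
    prev_stage_deploys: List[Deploy],
    current_build_id: Optional[str],
) -> Optional[Deploy]:
    """Return the newest accepted deploy, or None if there is nothing to promote."""
    accepted = [d for d in prev_stage_deploys if d.accepted]
    if not accepted:
        return None
    newest = max(accepted, key=lambda d: d.finished_at)
    # Skip promotion when the next stage already runs this build.
    if newest.build_id == current_build_id:
        return None
    return newest
```

A scheduler could call this whenever a new build lands or on a cron-like timer, then start a deploy in the next stage if a candidate is returned.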

Teletraan has helped us move faster and ship code more easily. We want to share these deploy tools with the world, and are planning to open-source Teletraan later this year. Keep an eye on the blog for updates.

Baogang Song is an engineering lead on the Internal Development Tools team, which is part of the Cloud Engineering team at Pinterest.

Acknowledgements: Teletraan was built by Jinru He, Nick DeChant and Baogang Song from the Internal Development Tools team.
