At Invoca, our engineering team has worked hard to transition to a decentralized service ownership structure. Each team has services and features that they own, from ideation all the way to deployment and monitoring. The autonomy of the teams to define and prioritize their respective roadmaps has been effective at ensuring reliability and stability while continuing to deliver new features. This structure gets difficult, however, when efforts span multiple teams and require tight coordination. We use working groups as a tool for cross-team initiatives that are otherwise having difficulty gaining momentum.
Wikipedia defines a working group…
We have a lot of metrics that power dashboards and alerts around our production infrastructure, including our Rails frontend servers. We believe visibility is critical to anticipating and acting on any reliability or performance issue. What we were lacking, though, was a complete picture of requests by Rails controller and action (in aggregate) that also included the average and upper-percentile response latency.
Those familiar with Rails will know the stats that show up at the end of each request in the standard Rails logger:
As part of our Journey to Continuous Deployment at Invoca, one of the roadblocks to more frequent and flexible deploys was database schema migrations. We had to lock users out during deploys that contained migrations, because users who read from or wrote to tables that were being migrated would get errors. While the lockout was effective at reducing schema incompatibility errors, our goal was to deploy daily and at any time of day. We needed a way to migrate our databases without a maintenance window.
We solved the migration problem by using, and later extending, a tool for MySQL from Percona called Online Schema Change. Using triggers and atomic table swapping, Online Schema Change lets you change table schemas without going out of service. We created a Ruby wrapper around the tool and extended the Rails database Rake tasks, which allowed us to keep using Rails migration files and reduced the number of developer workflow changes needed for migrations. …
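To give a rough idea of what a thin Ruby wrapper around the Percona tool could look like, here is a minimal sketch. It is not Invoca's actual wrapper: the `osc_command` helper, the database name, and the table name are all hypothetical. It only assembles the command line for `pt-online-schema-change` and defaults to the tool's `--dry-run` mode, so nothing is altered unless `execute: true` is passed.

```ruby
require "shellwords"

# Hypothetical helper (not Invoca's wrapper): builds the argument list for
# Percona's pt-online-schema-change CLI without running it.
def osc_command(database:, table:, alter:, execute: false)
  [
    "pt-online-schema-change",
    "--alter", alter,                     # the ALTER TABLE clause to apply
    execute ? "--execute" : "--dry-run",  # default to a dry run for safety
    "D=#{database},t=#{table}"            # DSN naming the database and table
  ]
end

# Illustrative names only; "app_production" and "calls" are assumptions.
cmd = osc_command(
  database: "app_production",
  table: "calls",
  alter: "ADD COLUMN duration_ms INT"
)

# Print a shell-escaped version of the command for inspection.
puts Shellwords.join(cmd)
```

Returning the arguments as an array rather than a single string means a caller could pass them straight to `system(*cmd)` without shell interpolation of the `ALTER` clause.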