Thumbtack, Google Cloud, and the future of infrastructure
Our post about Thumbtack’s transition to GCP data infrastructure just went out this morning:
> In early 2015, we began building data infrastructure at Thumbtack. At the time, the company's data was spread across a… (cloud.google.com)
This transition to GCP is the culmination of many months of work on our end. More importantly, it is representative of a broader sea change as organizations like ours shift toward a fully managed, serverless world.
In the old days, building a software company, particularly one with "big data", meant racks of physical machines and self-managed software deployments carrying a heavy operational load. When I began working on infrastructure, my first cluster ran vanilla Hadoop on a collection of 9 Mac Minis and a cheap Netgear router in my office:
When running a cluster in this configuration, your biggest challenge is not on the software side. It turns out that the rising heat from a heavy MapReduce workload will cause the top couple of machines in the stack to overheat, resulting in a rash of failed map tasks :)
Thankfully, we’ve come a long way from this sort of thing. Everything is abstracted away: the public clouds (GCP, AWS, and Azure) handle the details of complex, multi-region, large-scale infrastructure deployments and provide managed services on top of that infrastructure. For those of us who have brought up Hadoop clusters from the hardware up, Dataproc is absolutely magical.
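To make that contrast concrete, here is roughly what standing up, using, and tearing down a Hadoop cluster looks like with Dataproc today. The cluster name, region, and sizing below are illustrative placeholders, not our production configuration:

```shell
# Create an ephemeral Hadoop/Spark cluster in a couple of minutes.
# (Name, region, and worker count are placeholders for illustration.)
gcloud dataproc clusters create demo-cluster \
    --region=us-central1 \
    --num-workers=2

# Submit one of the stock Hadoop example jobs to the cluster.
gcloud dataproc jobs submit hadoop \
    --cluster=demo-cluster \
    --region=us-central1 \
    --jar=file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    -- pi 10 100

# Delete the cluster when the work is done; you pay only for what you used.
gcloud dataproc clusters delete demo-cluster --region=us-central1 --quiet
```

Compare that with weeks of racking hardware, configuring HDFS and YARN by hand, and babysitting overheating machines.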
Give it another few years, and infrastructure engineering will simply mean wiring together assortments of managed services. That's a fantastic end state, because we'll all be focusing on engineering work that is core to our respective businesses, and leaving the infrastructure to the cloud providers.