Year 3 of Life at a High Growth Startup
By Sunil Shah
As I approach my third anniversary at Mesosphere, an infrastructure software startup, it’s a good time to reflect on the growth of the company and its associated challenges, and to share some of the lessons I’ve learned along the way.
As a quick refresher, Mesosphere is an open-core company — that is, a company seeking to commercialise an existing open source software project. We’ve accomplished that by building a product called DC/OS (the Datacenter Operating System), built on the Apache Mesos open source project.
Mesos was created by one of our founders while he was a graduate student at UC Berkeley’s own AMPLab. It’s a cluster manager that allows for more efficient use of large clusters of computers, providing an abstraction for users who wish to run applications on a farm of physical (or virtual — if using the cloud) machines, deciding what applications run where.
Cluster managers like Mesos offer great value to large corporations who, like the famed tech giants of Silicon Valley, operate their own datacenters housing tens of thousands of machines. However, these businesses have neither the technical expertise to deploy a system like Mesos nor an understanding of the benefits of doing so. DC/OS, on the other hand, is a batteries-included distribution of Mesos that includes dozens of other components and makes it easy to set up and manage a cluster of machines (think Microsoft Windows, but across an entire datacenter).
This is an ambitious goal which requires a lot of engineering resources to accomplish.
When I last wrote about Mesosphere we had about 60 employees. Since then, our headcount growth has been remarkably rapid. We’re now at over 250 employees, with offices in China, Hamburg, London, New York, and San Francisco.
We’ve also taken on further funding to accomplish this goal, with a strategic $73.5M Series C round led by Hewlett Packard Enterprise and Microsoft.
Growing this quickly presents a number of challenges. Hiring for the infrastructure software space requires hard-to-find engineers who have expertise in multiple areas of computer science, including distributed systems, networking, operating systems, and so on. Once qualified candidates are found, interviewing takes up a significant amount of time — not just the time spent conducting the interview, but time spent coordinating with other members of the interview panel, preparing questions and compiling feedback. This often left the engineering team struggling to focus on the day-to-day rigours of actually writing code.
We’ve had to re-organise the engineering team four times since I joined, going from a flat organisational structure to adding a single engineering director, to adding multiple engineering managers, followed by the formation of multiple engineering organisations, each with its own engineering directors. The uncertainty this created for anxious engineers was challenging but, more often than not, there was a collective sigh of relief as we realised the new organisational structure was better aligned and communication between teams was easier.
These reorgs, coupled with changing product priorities, offered many opportunities for an engineer to work on something different as new projects frequently spun up.
In my case, I began working on a new developer tools integration that a customer indicated was critical to their deployment of DC/OS. This went well, so a team was formed around the product, and I eventually became the manager.
This was incredibly tough at first, in part because I was now managing more experienced peers. I remember meeting an experienced director of engineering at a networking event once and commenting on how difficult it was to actually complete every piece of inbound work. He gave me some sage advice: “As a manager you just have to come to terms with letting a certain amount of work fall off your list each day.”
A few months later, I took over as product manager for our developer tools product when our existing product manager was re-assigned. Suddenly my weekends and evenings disappeared under a barrage of emails, customer calls, community outreach, evangelism at conferences and meetups, and trying to release new developer tools integrations. The pressure was intense enough that, after a particularly nasty mountain biking crash, I felt obligated to dial into a strategy meeting from the emergency room!
This wasn’t sustainable, of course, and I quickly tried to evolve a more efficient way of working. One answer was to lose the product manager role, which came about after my manager realised that the demands on my time were unsustainable. The other was to develop processes around issue tracking and documentation that allowed me to spend my time on the things that mattered.
As the business grew and competition intensified (a clear validation of the market opportunity), we made some fundamental shifts in our product lineup. For one, we decided to open source a large part of DC/OS, to encourage community participation and to allow the small to medium businesses that were unable to become paying customers to benefit from the software we had built. We’ve also dramatically grown our sales, marketing and customer success teams. These teams are now working to grow our funnel of potential customers and to nurture sustainable relationships with the customers that we currently have.
As it stands today, many large enterprises depend upon Mesosphere’s DC/OS to run their datacenters, with deployments being used for internet of things infrastructure, telecommunications infrastructure, financial software, video games, and more.
I’ve found the experience worthwhile (albeit a little stressful at times), having picked up valuable skills in product management, customer support, recruiting and, of course, writing production-grade enterprise software! As Mesosphere matures, I’ve decided to move on to an organisation where I will be able to gain operational experience with systems like DC/OS. In August, I’ll be joining the Distributed Systems team at Yelp to help build a Mesos-based platform to run batch workloads at scale. I couldn’t be more excited to continue the journey towards datacenter nirvana.