Anand Parikh is the Manager of the Data Centers and Networking (DCN) division of Technical Program Managers at Facebook. He has been with the company for 5 years. He has a Civil Engineering degree from West Virginia University and spends his spare time hiking around California.
What is a TPM?
TPM stands for “technical program manager.” We are a team that coordinates business decisions and leads our infrastructure projects from design through implementation. The DCN team has grown from just a few people to a couple of dozen during my time here. We’re a global team with members in California, Ireland, and Hong Kong. Our focus is the “Technical” portion of “TPM,” which is why we have TPMs with varying areas of expertise:
- Network Engineering TPMs work cross-functionally with other teams to build out Facebook’s network. These TPMs work on data center networking, backbone networking, and edge deployments. Our goal is to connect the world, and it all begins with building out the network.
- Data Center TPMs manage our large data center build and turn up projects, playing a key role as Facebook continues to scale. They are responsible for aligning multiple teams to build and operate these massive data center buildings.
- Hardware TPMs strive to make the hardware more scalable, efficient, and effective. The TPMs work with many cross-functional teams to develop and deliver first-class, hyper-scale hardware.
- Capacity TPMs team up with engineers to determine how to scale and manage Facebook capacity to accommodate our users. They’re consistently challenged to draw on technical skills to drive programs, including knowledge of data centers, power, hardware, networking, and software.
- Engineering TPMs work with engineers on software efficiency, reliability, and quality to integrate into our back-end infrastructure while simultaneously helping to shape product vision.
What tools and technologies do TPMs use on a daily basis?
The goal of a TPM is to help engineers move their programs to completion more efficiently. The most effective way to do this is through face-to-face communication. Facebook is all about connecting people — no tools, documentation, or software could ever replace the face-to-face interactions that are so vital to our progress.
That being said, there are a number of internally developed tools we use to manage programs from start to finish. “Tasks” are used to assign responsibility to an owner of an action item, to follow up and to document the resolution of the requests.
Our “Projects Tool” and “Facebook Groups” allow us to communicate, to solicit feedback, and to track progress. These tools allow others to review and comment in an open forum to ensure that the best decisions are made. By internally dogfooding Facebook Groups, we are also able to better understand the pain points and use cases of our users.
What are some pain points in your day-to-day work?
When we talk about Facebook infrastructure, it includes a plethora of products that serve our community: Facebook, Messenger, WhatsApp, Groups, Instagram, Oculus, etc. All of these services rely on our infrastructure to provide sufficient capacity, efficiency, and reliability for hundreds of millions of daily users.
These unprecedented scaling challenges mean that we are constantly reevaluating our roadmap. So far, our infrastructure has evolved to support everything from a simple text News Feed to photos, video, 360 video, and now Live video. Going forward, we’ll need to figure out how to integrate emerging technologies like VR, which will introduce new variables and cause us to rethink our strategy.
The TPMs work together to develop our software, network, and hardware stacks in blocks that remain flexible so we can adapt for product growth. Sometimes it seems like we’re trying to hit a moving target due to the high rate of change. We have to reassess what we’re building for on a continual basis.
What challenges do you see that hinder the industry from advancing?
In the area of Data Centers and Networking, we encounter scenarios where our scaling demands cannot be met by existing technologies, or the new devices have yet to be widely adopted by the industry.
Let’s focus on one example: data center networks, that is, intra-data center network traffic. Just a year ago we were first adopting 40GbE technology. Now we already see the need for a 100GbE intra-DC network to allow our thousands of software engineers to build applications that scale and, in turn, build better product experiences for our users.
To satisfy our demands, we have to work very closely with the industry to adopt these new technologies and drive down the costs to an acceptable level. If faster networking gear can’t be produced in sufficient quantity, the rest of the industry will suffer as well.
Aside from data centers, we recognize that great content can only reach users through network connections. We work to ensure that our FBCDN Edge Egress capacity continues to grow to provide this access. The limited amount of fiber infrastructure in the ground also presents a constraint, which is why Facebook has been involved in fiber investments such as the Asia-Pacific Gateway cable. Our announcement of the Telecom Infra Project initiative at MWC also addresses the fact that many people globally do not have access to cellular networks (4G, LTE, etc.).
What are your biggest predictions for the year ahead?
One big launch we announced in late February was Live. The product began rolling out in the U.S. in 2015, and in less than four months it reached 30 countries on iOS. Over the coming months, we will work to improve the user experience and to enable our community to broadcast via Android as well.
Live is taking off: on average, people watch Live video three times longer than static content. This introduces a huge opportunity for TPMs to tackle improvements in hardware, software, data center, network, and FBCDN edge capacity. All the TPM disciplines have to work together to meet the demand. This is going to be big in the near future, and it has big challenges associated with it.