I hear Facebook has data; does it have data tools?

Julien Kervizic
Hacking Analytics
2 min read · Oct 11, 2018

Two years after I left Facebook, I still often get asked what kind of tools data people used there. Below, I try to summarize the main tools that were used to explore datasets at the time:

Scuba: Scuba offered the slice-and-dice functionality typically handled by a pivot table or some cube-like structure, just at a significantly larger scale and in real time. The downside was that the data displayed in the tool was not always fully accurate. A further description of the tool is available in the following paper
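To make "slice and dice" concrete, here is a minimal Python sketch of the kind of query such a tool answers: filter rows on some dimensions, group by another, and aggregate a metric. This is not how Scuba is implemented; the event fields (`country`, `platform`, `latency_ms`) and the `slice_and_dice` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical event rows; the field names are assumptions for this sketch.
events = [
    {"country": "US", "platform": "ios", "latency_ms": 120},
    {"country": "US", "platform": "android", "latency_ms": 100},
    {"country": "FR", "platform": "ios", "latency_ms": 100},
    {"country": "FR", "platform": "android", "latency_ms": 300},
    {"country": "US", "platform": "ios", "latency_ms": 80},
]


def slice_and_dice(rows, group_by, metric, where=None):
    """Average `metric` per value of `group_by`, over rows matching `where`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        # "Slice": skip rows that do not match every filter in `where`.
        if where and any(row[key] != value for key, value in where.items()):
            continue
        # "Dice": accumulate the metric per group.
        sums[row[group_by]] += row[metric]
        counts[row[group_by]] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Average latency per country, over all platforms.
by_country = slice_and_dice(events, "country", "latency_ms")

# Same breakdown, restricted to iOS events.
ios_by_country = slice_and_dice(
    events, "country", "latency_ms", where={"platform": "ios"}
)
```

A pivot table does the same thing on a spreadsheet-sized dataset; the appeal of a tool like Scuba was answering this sort of question interactively on far larger, fresher data.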

Dataswarm: A data workflow automation and scheduling platform, and a predecessor to Airflow. Like Airflow, it is centered around the concept of directed acyclic graphs (DAGs). At Facebook, Dataswarm was the way to automate anything that required batch processing in data pipelines. It allowed for running multi-step data pipelines, interacting with different platforms or programming languages through the concept of operators, and made the dependencies between processing steps explicit. Some are available here
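The DAG idea can be sketched in a few lines of Python. This is only an illustration of dependency-ordered execution using the standard library's `graphlib` (Python 3.9+), not Dataswarm's or Airflow's actual API; the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Shared state standing in for tables written by each pipeline step.
results = {}


def extract():
    results["raw"] = [1, 2, 3]


def transform():
    results["clean"] = [value * 10 for value in results["raw"]]


def load():
    results["total"] = sum(results["clean"])


# Each "operator" here is just a Python callable (an assumption of this sketch).
tasks = {"extract": extract, "transform": transform, "load": load}

# Explicit dependencies: each task maps to the set of tasks it must wait for.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_pipeline(tasks, dependencies):
    """Run every task once, only after all of its upstream tasks have run."""
    order = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


execution_order = run_pipeline(tasks, dependencies)
```

Declaring the dependencies as data, rather than hard-coding the call order, is what lets a scheduler decide what can run, what must wait, and what to retry when an upstream step fails.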

Deltoid: Deltoid, in its numerous iterations, offered a standard A/B testing platform for handling the reporting and analysis of the different experiences being tested at Facebook. It allowed for easy ingestion and analysis of different experiments based on allocation or…

Julien Kervizic
Hacking Analytics

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com