Conducto V1

Conducto V1 drove billions of dollars in revenue over a decade by enabling a few developers to continuously deliver and iterate upon flexible and scalable pipelines.

Jonathan Marcus
Conducto
3 min read · Jul 9, 2020

Conducto’s co-founders were early members of one of the top trading teams in the world. We were super successful for a decade. How did we get there? You probably expect that our data scientists (aka quants, aka machine learning researchers) were excellent and that our low-latency programmers were top-notch. They were, and they deserve a ton of credit. But underpinning our scalability and growth was the pipeline framework that later became Conducto.

What kinds of challenges did we solve?

Our goal was to gather a lot of data and make accurate, timely predictions about the future. The tasks will be very familiar to anyone who has done CI/CD or data science:

  • Continuous deployments. New code, new models, and new ideas were deployed every single day. Bare-metal servers had to be configured perfectly to optimize for every nanosecond. Some changes were rolled out all at once, others in canary fashion. We had tight resource constraints, so we ran multidimensional optimizations to balance traffic, latency, and compute loads.
  • Data acquisition, indexing, and cleaning. A data science pipeline is only as good as the data fed into it. We built pipelines to gather terabytes of data every day from hundreds of sources. New data was indexed for fast access and then analyzed to assess its quality. In addition to daily updates, we infrequently reprocessed our entire data lake to take advantage of new indexing and compression methods.
  • Machine learning pipelines. At their simplest, these pipelines are just “compute features → run ML algorithm → backtest.” In practice we computed many types of features, ran different algorithms, and backtested in multiple ways. All of these were done in parallel, branching at the latest possible step to minimize duplicate work. And oh yeah — a single pipeline could run millions of distinct tasks and crunch petabytes of data.
  • Distributed build and test. Our codebase was too big for make -j8. Building and testing on one machine took over an hour. We parallelized our builds and tests to run in ~5 minutes. We also had intensive QA that verified new code on all data we’d ever received, for extra assurance that our changes were okay.
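The fan-out idea behind the distributed build-and-test bullet above can be sketched in a few lines of plain Python. This is only an illustration of the pattern (the step names are hypothetical, not our actual build system): independent steps run across a worker pool instead of serially, which is the same idea that cut an hour-long single-machine run down to minutes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real build or test step; returns its name on success.
def run_step(name):
    return f"{name}: ok"

# Eight independent test shards (names are made up for the sketch).
steps = [f"test_shard_{i}" for i in range(8)]

# Fan the independent steps out across workers instead of running them one by one.
# pool.map preserves input order, so results line up with steps.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_step, steps))

print(results[0])  # -> "test_shard_0: ok"
```

A real system adds scheduling across machines, retries, and dependency tracking, but the core move is the same: find the independent work and run it concurrently.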

All of these problems required flexible solutions. Anything rigid would quickly become obsolete as our ideas advanced (along with those of our competitors!). The ability to quickly experiment and productionize was our lifeblood.

Enter Conducto V1

Solving these problems takes a mixture of business logic — written in Bash, C++, Python, and R — and pipeline tooling. At first we glued our modules together with DAG tools, Jenkins, cron, and ad hoc pipeline scripts, but the result was brittle. New features were hard to develop, and when a pipeline broke, the problem was hard to identify and reproduce.

We created Conducto V1 to solve this problem. We made pipelines-as-code so that our pipelines could be as flexible as the tasks they linked. We built the UI with side-by-side tree and node panels to debug our errors as quickly as possible. Our pipelines became an intuitive extension of our regular coding abilities.
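To make "pipelines-as-code" concrete, here is a minimal sketch in plain Python — an illustration of the idea, not Conducto's actual API, and the node and command names are invented. Because nodes are ordinary objects, a loop can generate an arbitrarily wide tree of tasks, which is what lets a pipeline be as flexible as the code it links.

```python
# Minimal pipelines-as-code sketch: a pipeline is just a tree of plain objects,
# so ordinary Python (loops, functions, conditionals) can build and reshape it.

class Node:
    def __init__(self, name, command=None, parallel=False):
        self.name, self.command, self.parallel = name, command, parallel
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

def build_ml_pipeline(features, algorithms):
    root = Node("pipeline")
    for f in features:
        branch = root.add(Node(f, parallel=True))
        for a in algorithms:
            # Hypothetical commands, just to show where real work would plug in.
            leaf = branch.add(Node(f"{f}/{a}", command=f"train --feature {f} --algo {a}"))
            leaf.add(Node(f"{f}/{a}/backtest", command="backtest"))
    return root

def count_tasks(node):
    # Count only nodes that run a command; container nodes are free.
    return (1 if node.command else 0) + sum(count_tasks(c) for c in node.children)

pipe = build_ml_pipeline(["price", "volume"], ["gbm", "nn"])
print(count_tasks(pipe))  # 2 features x 2 algos = 4 train tasks + 4 backtests = 8
```

Doubling the feature list doubles the tree with no change to the pipeline definition — that is the flexibility a static YAML or GUI-defined pipeline can't match.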

The Conducto UI (updated since V1) shows the whole pipeline and the details of the current node, side by side.

One tool to rule them all

We were just a handful of engineers, and we all had other work to do. But with Conducto V1 we were able to create, maintain, and improve pipelines for all of the above problems.

We were phenomenally productive, and we made our teammates even more productive. The tool spread from our team to the entire company, where it currently has hundreds of users.

Our problems were not finance-specific, and neither is this tool. It is just a better way of linking together the pieces of logic you already have. Get started today.

CEO and co-founder at Conducto. Former quant developer @JumpTrading. Likes board games, data science, and HPC infrastructure.