Orchestrate Your Data Science Project with Prefect 2.0

Make Your Data Science Pipeline Resilient Against Failures

Khuyen Tran
The Prefect Blog


Motivation

A typical data science pipeline has many components, such as loading data, processing data, training a model, and making predictions. As a project grows, the number of components and the dependencies between them grow as well.

If each component has an independent chance of failing, every component you add raises the likelihood that the entire pipeline fails on a given run. Sooner or later, your pipeline will fail.

Image by Author
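To make this compounding effect concrete, here is a minimal Python sketch. It assumes, purely for illustration, that every component fails independently with the same probability p. The pipeline then succeeds only if all n components succeed, so P(success) = (1 - p)^n and P(failure) = 1 - (1 - p)^n. The function name `pipeline_failure_probability` is hypothetical.

```python
def pipeline_failure_probability(p_fail: float, n_components: int) -> float:
    """Probability that at least one of n independent components fails.

    Illustrative assumption: every component fails independently
    with the same probability p_fail.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if n_components < 0:
        raise ValueError("n_components must be non-negative")
    # The run succeeds only if every single component succeeds.
    p_all_succeed = (1.0 - p_fail) ** n_components
    return 1.0 - p_all_succeed


# Same 1% failure rate per component, growing pipeline size.
for n in (1, 5, 10, 20):
    print(n, round(pipeline_failure_probability(0.01, n), 3))
```

Under these assumptions, a 1% failure rate per component means a 20-component pipeline fails on roughly 18% of its runs, even though each individual step looks reliable.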

Instead of trying to prevent every failure, we should write code so that when a failure does occur, our pipeline will:

  • fail gracefully
  • recover quickly
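As a rough illustration of these two goals in plain Python (this is not Prefect's API), the hypothetical `resilient` decorator below retries a step a fixed number of times to recover from transient errors. When it finally gives up, it logs a clear error and re-raises the exception so the failure is visible instead of silent. The names `resilient` and `load_data`, and the simulated network error, are assumptions made for this sketch.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


def resilient(retries: int = 2, delay_seconds: float = 0.0):
    """Hypothetical decorator: retry a step, then fail loudly with a log message."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # One initial attempt plus `retries` extra attempts.
            for attempt in range(1, retries + 2):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    logger.warning("%s failed on attempt %d: %s", func.__name__, attempt, exc)
                    if attempt > retries:
                        # Fail gracefully: record why we stopped, then re-raise.
                        logger.error("%s gave up after %d attempts", func.__name__, attempt)
                        raise
                    # Recover: wait briefly, then try the step again.
                    time.sleep(delay_seconds)
        return wrapper
    return decorator


# Simulated flaky step: fails on the first call, succeeds on the second.
calls = {"n": 0}


@resilient(retries=2)
def load_data():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("temporary network error")
    return [1, 2, 3]


data = load_data()
```

Prefect offers built-in retry options for tasks, so in a real project you would reach for those rather than hand-rolling a decorator like this one.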

How can we do that? This is where negative engineering comes in handy.

What is Negative Engineering?

Before talking about negative engineering, let’s talk about positive engineering. Positive engineering is writing code to achieve a certain objective. That objective could be:

  • training a good ML model
  • gaining insights from your data
