Introducing Kedro Hooks

Simplifying the process of extending the framework

The motivation for Hooks

Over the past few months, a number of semi-related problems have surfaced surrounding the architecture and capabilities of Kedro as a framework. Hooks are our answer: a single, consistent mechanism for extending Kedro at well-defined points in its execution.

User-centric design thinking

The design process of every Kedro feature is user-centric, and Hooks were no exception. In the ideation phase of this feature, we first collected all the major use cases in which users need to extend Kedro. These use cases ranged from pipeline visualisation and deployment to data validation.

[Figure: a user journey map of the Kedro execution timeline]

Lifecycle points in the execution timeline:
  1. After the data catalog is created
  2. Before a pipeline run
  3. Before a node run
  4. After a node run
  5. After a pipeline run

What are Hooks?

Hooks are a mechanism to allow a user to extend Kedro by injecting additional behaviour at certain lifecycle points in Kedro’s main execution. The following lifecycle points, known as Hook Specifications, are provided in kedro.framework.hooks:

  • after_catalog_created
  • before_pipeline_run
  • before_node_run
  • after_node_run
  • after_pipeline_run
  • on_node_error
  • on_pipeline_error
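
To make this concrete, here is a minimal sketch of what a Hook implementation looks like (the LoggingHooks class and its print statements are purely illustrative, not part of Kedro):

from typing import Any, Dict

from kedro.framework.hooks import hook_impl


class LoggingHooks:
    """An illustrative pair of Hook implementations that log node activity."""

    @hook_impl
    def before_node_run(self, node, inputs: Dict[str, Any]) -> None:
        # Called immediately before each node in the pipeline runs
        print(f"About to run node {node.name} with inputs {list(inputs)}")

    @hook_impl
    def on_node_error(self, error: Exception, node) -> None:
        # Called if a node raises an exception during execution
        print(f"Node {node.name} failed: {error!r}")

An implementation may declare any subset of the arguments defined by the corresponding Hook Specification; Kedro passes in only the ones you ask for.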

Examples of the additional behaviour you could inject at these lifecycle points include:

  • Adding a transformer after the data catalog is loaded.
  • Adding data validation to the inputs, before a node runs, and to the outputs, after a node has run. This makes it possible to integrate with other tools like Great Expectations.
  • Adding machine learning metrics tracking, e.g. using MLflow, throughout a pipeline run.
  • Adding pipeline monitoring with StatsD and Grafana.

How to use Hooks

The general process you should follow to add Hooks to your project is:

  1. Identify the Hook Specification(s) you need to use.
  2. Provide Hook implementations for those Hook Specifications.
  3. Register your Hook implementations in ProjectContext.

[Diagram: an overview of the Hooks registration process]

Example: Using Hooks to integrate Kedro with MLflow

The following section illustrates this process by walking through an example of using Hooks to integrate Kedro with MLflow, an open-source tool for adding model and experiment tracking to your Kedro pipeline. (Previous versions of Kedro required users to hard-code the MLflow integration logic inside their nodes, as previously described in this article.) In this example, we want to:

  • Log the parameters after the data splitting node runs.
  • Log the model after the model training node runs.
  • Log the model’s metrics after the model evaluating node runs.

Step 1: Identify what Hook Specifications we need to use

To identify which Hook Specifications are needed, we need to think about the lifecycle points in the Kedro execution timeline that we want to interact with.

  • We will need to start an MLflow run before the Kedro pipeline runs, by implementing the before_pipeline_run Hook Specification.
  • We want to add tracking logic after a model training node runs, so we need to implement the after_node_run Hook Specification.
  • After the Kedro pipeline runs, we also need to end the MLflow run, by implementing the after_pipeline_run Hook Specification.

Step 2: Provide Hook implementations

Having identified the necessary specifications, we need to implement them. In the Kedro project, we create a Python package called hooks in the same directory as the nodes and pipelines, and then create a module called hooks/model_tracking_hooks.py with the following content:
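
(The code below is a minimal sketch rather than a definitive implementation: the node names split_data, train_model and evaluate_model, the model and metrics dataset names, and the use of a scikit-learn model are assumptions based on a typical Kedro starter project, not part of the Hooks API.)

# hooks/model_tracking_hooks.py
from typing import Any, Dict

import mlflow
import mlflow.sklearn
from kedro.framework.hooks import hook_impl
from kedro.pipeline.node import Node


class ModelTrackingHooks:
    """Namespace that groups all model-tracking Hook implementations."""

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        # Start an MLflow run named after the Kedro run_id
        mlflow.start_run(run_name=run_params["run_id"])

    @hook_impl
    def after_node_run(
        self, node: Node, inputs: Dict[str, Any], outputs: Dict[str, Any]
    ) -> None:
        # The node and dataset names below are assumptions; adapt them
        # to the nodes in your own pipeline.
        if node.name == "split_data":
            # Log the parameters after the data splitting node runs
            mlflow.log_params(inputs["parameters"])
        elif node.name == "train_model":
            # Log the model after the model training node runs
            mlflow.sklearn.log_model(outputs["model"], "model")
        elif node.name == "evaluate_model":
            # Log the model's metrics (a dict of floats) after evaluation
            mlflow.log_metrics(outputs["metrics"])

    @hook_impl
    def after_pipeline_run(self) -> None:
        # End the MLflow run once the Kedro pipeline finishes
        mlflow.end_run()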

Step 3: Register Hook implementations in ProjectContext

After defining Hook Implementations with model-tracking behaviour, the next step is to register them in the ProjectContext in run.py as follows:
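
A minimal sketch of this registration, assuming a Kedro 0.16-style project template in which ProjectContext lives in run.py (the project name and version shown are placeholders):

# src/<your_package>/run.py
from kedro.framework.context import KedroContext

from .hooks.model_tracking_hooks import ModelTrackingHooks


class ProjectContext(KedroContext):
    project_name = "my-kedro-project"  # placeholder
    project_version = "0.16.0"  # placeholder Kedro version

    # Registering instances here tells Kedro to call these Hook
    # implementations at the corresponding lifecycle points.
    hooks = (ModelTrackingHooks(),)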

Step 4: Run the pipeline

Now we are ready to run the pipeline, which has been extended with MLflow's machine-learning tracking capability. Run the pipeline with kedro run and open the MLflow UI to see the tracking results.
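
For reference, the commands look like this (MLflow's local UI serves at http://localhost:5000 by default):

kedro run
mlflow ui

The screenshot below shows an example of a model tracking run.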

[Screenshot: a model tracking run in the MLflow UI. The parameters are those we used to run Kedro and the artifact is the model produced by that run.]

Find out more!

Further examples of using Hooks to implement data validation and pipeline monitoring can be found in our GitHub repo of Kedro examples.
