TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Kedro: A New Tool For Data Science

10 min read · Jun 4, 2019

Image by stan (licensed under CC BY-SA 2.0)

Any code that feeds some business decision process should be considered as production code

What is Kedro?

The team from Quantum Black explain Kedro

Is Kedro a workflow scheduler?

What do I need to know?

A fifteen minute spaceflight with Kedro

Getting set up

kedro info

Kedro workflow

Set up the project template

kedro install

Set up the data

companies:
  type: CSVLocalDataSet
  filepath: data/01_raw/companies.csv
reviews:
  type: CSVLocalDataSet
  filepath: data/01_raw/reviews.csv

kedro ipython

io.load('companies').head()
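
To make the catalog idea concrete, here is a minimal sketch in plain Python of what a name-to-file lookup does conceptually. It is not Kedro's implementation: `ToyCatalog` and its `load` method are hypothetical names invented for this illustration, the CSV rows are made up, and only the standard library is used (so rows come back as dictionaries rather than a DataFrame).

```python
import csv
import tempfile
from pathlib import Path


class ToyCatalog:
    """Hypothetical stand-in for a data catalog: maps dataset names to CSV files."""

    def __init__(self, entries):
        # entries mirrors the YAML shape: {"name": {"type": ..., "filepath": ...}}
        self._entries = entries

    def load(self, name):
        entry = self._entries[name]
        if entry["type"] != "CSVLocalDataSet":
            raise ValueError(f"unsupported dataset type: {entry['type']}")
        with open(entry["filepath"], newline="") as f:
            # Each row becomes a dict keyed by the CSV header.
            return list(csv.DictReader(f))


# Write a tiny illustrative CSV file (made-up rows, not the tutorial data).
tmp_dir = Path(tempfile.mkdtemp())
companies_path = tmp_dir / "companies.csv"
companies_path.write_text("id,name\n1,Acme\n2,Globex\n")

catalog = ToyCatalog(
    {"companies": {"type": "CSVLocalDataSet", "filepath": str(companies_path)}}
)
rows = catalog.load("companies")
print(rows[:1])
```

The point of the indirection is that code asks for a dataset by name and never hard-codes a file path; changing where the data lives only means editing the catalog entry.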

Create and run the pipeline

Data engineering pipeline

preprocessed_companies:
  type: CSVLocalDataSet
  filepath: data/02_intermediate/preprocessed_companies.csv
preprocessed_shuttles:
  type: CSVLocalDataSet
  filepath: data/02_intermediate/preprocessed_shuttles.csv

Data science pipeline

Combining the pipelines

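from kedro.pipeline import Pipeline, node

# The node functions (preprocess_companies, split_data, and so on) are assumed
# to be imported from the project's own node modules.


def create_pipeline(**kwargs):
    # Data engineering: clean the raw inputs and join them into one table.
    de_pipeline = Pipeline(
        [
            node(preprocess_companies, "companies", "preprocessed_companies"),
            node(preprocess_shuttles, "shuttles", "preprocessed_shuttles"),
            node(create_master_table, ["preprocessed_shuttles", "preprocessed_companies", "reviews"], "master_table"),
        ],
        name="de",
    )
    # Data science: split the master table, then train and evaluate a model.
    ds_pipeline = Pipeline(
        [
            node(split_data, ["master_table", "parameters"], ["X_train", "X_test", "y_train", "y_test"]),
            node(train_model, ["X_train", "y_train"], "regressor"),
            node(evaluate_model, ["regressor", "X_test", "y_test"], None),
        ],
        name="ds",
    )
    # Adding two pipelines yields one pipeline containing the nodes of both.
    return de_pipeline + ds_pipeline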
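
The node definitions above name their inputs and outputs, and those names are enough to work out a valid run order. The following is a rough sketch of that idea using the standard library's `graphlib` (Python 3.9 or later). It is an illustration only, not Kedro's own resolution code; the `nodes`, `producer`, and `graph` variables are names introduced here.

```python
from graphlib import TopologicalSorter

# (function name, inputs, outputs) taken from the create_pipeline definition.
nodes = [
    ("preprocess_companies", ["companies"], ["preprocessed_companies"]),
    ("preprocess_shuttles", ["shuttles"], ["preprocessed_shuttles"]),
    (
        "create_master_table",
        ["preprocessed_shuttles", "preprocessed_companies", "reviews"],
        ["master_table"],
    ),
    (
        "split_data",
        ["master_table", "parameters"],
        ["X_train", "X_test", "y_train", "y_test"],
    ),
    ("train_model", ["X_train", "y_train"], ["regressor"]),
    ("evaluate_model", ["regressor", "X_test", "y_test"], []),
]

# Which node produces each dataset name.
producer = {out: name for name, _, outs in nodes for out in outs}

# A node depends on whichever nodes produce its inputs. Inputs that no node
# produces (companies, shuttles, reviews, parameters) come from outside.
graph = {
    name: {producer[i] for i in ins if i in producer} for name, ins, _ in nodes
}

order = list(TopologicalSorter(graph).static_order())
print(order)
```

In this graph the two preprocessing nodes do not depend on each other, so either may come first; every later node has to wait for the outputs it consumes.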

Kedro runners

kedro run --parallel

Contributions welcome!

Acknowledgements

Published in TDS Archive

Written by Jo Stichbury

Technical content creator writing about data science and software. Old-school Symbian C++ developer, now accidental cat herder and goose chaser.
