New features available with Kedro

We’ve added new datasets and documentation improvements to the recent 0.18.4 release of Kedro.

“Data Represented in an Interactive 3-D Form” by Idaho National Laboratory is licensed under CC BY 2.0

Dataset enhancements

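The new release of Kedro (0.18.4) focuses on datasets, the components that handle input and output in data and machine-learning pipelines. For example, the following Data Catalog entries load a Spark DataFrame from Amazon S3 and save an image created with Matplotlib to Google Cloud Storage: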

```yaml
# Load a Spark DataFrame on S3
flight_patterns:
  type: spark.SparkDataSet
  filepath: s3a://your_bucket/data/01_raw/flight_patterns*
  credentials: dev_s3
  file_format: csv

# Save an image created with Matplotlib on Google Cloud Storage
results_plot:
  type: matplotlib.MatplotlibWriter
  filepath: gcs://your_bucket/data/08_results/plots/output_1.jpeg
  fs_args:
    project: my-project
  credentials: my_gcp_credentials
```
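The credentials values in these entries (dev_s3 and my_gcp_credentials) are names that Kedro looks up in a separate credentials file, conventionally conf/local/credentials.yml, so secrets stay out of the catalog. A minimal sketch of matching entries follows. The key, secret and keyfile path are placeholders, and the field layout is an assumption based on the s3fs (client_kwargs) and gcsfs (token) filesystem options; check it against the backend and dataset you actually use.

```yaml
# conf/local/credentials.yml (placeholder values; keep this file out of version control)
dev_s3:
  client_kwargs:
    aws_access_key_id: YOUR_ACCESS_KEY_ID
    aws_secret_access_key: YOUR_SECRET_ACCESS_KEY

my_gcp_credentials:
  # Assumed: path to a Google Cloud service-account key file, passed to gcsfs as `token`
  token: path/to/service-account-key.json
```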
The release also adds the following datasets and dataset options:

  • svmlight.SVMLightDataSet to work with svmlight/libsvm files using the scikit-learn library
  • video.VideoDataSet to read and write video files from a filesystem
  • video.video_dataset.SequenceVideo to create a video object from an iterable sequence to use with VideoDataSet
  • video.video_dataset.GeneratorVideo to create a video object from a generator to use with VideoDataSet
  • pandas.SQLQueryDataSet now takes the optional argument execution_options to reduce memory usage when dealing with large datasets.
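As a sketch of how some of these might be declared in catalog.yml: the entry names, file paths and the db_credentials key below are hypothetical placeholders (db_credentials is assumed to hold a SQLAlchemy connection string under `con`), and the load_args/save_args are assumptions drawn from the underlying libraries (scikit-learn's svmlight functions, pandas.read_sql_query and SQLAlchemy execution options) rather than a complete reference.

```yaml
# svmlight/libsvm file, loaded and saved via scikit-learn
svm_features:
  type: svmlight.SVMLightDataSet
  filepath: data/01_raw/features.svm
  load_args:
    zero_based: false
  save_args:
    zero_based: false

# Video file on local disk or any fsspec-supported filesystem
traffic_video:
  type: video.VideoDataSet
  filepath: data/01_raw/traffic.mp4

# Large query result read in chunks rather than all at once
large_table:
  type: pandas.SQLQueryDataSet
  sql: "SELECT * FROM table_a"
  credentials: db_credentials
  execution_options:
    stream_results: true   # ask SQLAlchemy not to pre-buffer results, if the driver supports it
  load_args:
    chunksize: 1000        # pandas then returns an iterator of DataFrames
```

SequenceVideo and GeneratorVideo don't appear here because they are Python objects you pass to VideoDataSet when saving, not catalog entries.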

Documentation improvements

To help new users get Kedro up and running faster, we’ve made changes to our documentation.

Contributions from the Kedro community

The release also includes configuration improvements, numerous bug fixes and minor enhancements made in response to reports from our users on Kedro’s Slack organisation. Take a look at the full release notes on GitHub for details. We’re proud that 14 of the PRs in this release were contributed by members of Kedro’s open-source community. We’d particularly like to thank the following GitHub users:

A recording of the October 2022 Kedro showcase online event


QuantumBlack, AI by McKinsey, helps companies use data to drive decisions. We combine business experience, expertise in large-scale data analysis and visualisation, and advanced software engineering know-how to deliver results.

QuantumBlack, AI by McKinsey

An advanced analytics firm operating at the intersection of strategy, technology and design. www.quantumblack.com @quantumblack