Introducing Wildebeest, a Python File-Processing Framework

Greg Gandenberger
Jul 30, 2020
Photo Credit: Gopal Vijayaraghavan cc

Introduction

ShopRunner has more than ten million product images that we use to train computer vision classification models. Moving and processing those files is painful without good tooling. Downloading them serially takes many days, and an occasional corrupted image can bring the whole process to a halt. Without good logging and error handling, it may then be necessary to restart the process from the beginning, only to have it stop again at the next error.

Basic Example

The following code uses a fairly minimal Wildebeest pipeline to download a list of images to the current working directory as PNGs, parallelizing across up to ten threads.

[Image: code listing for the Wildebeest pipeline described above]
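To make the pattern concrete, here is a minimal sketch that uses only the Python standard library rather than Wildebeest itself. It shows the general idea of running a load step and a write step for each input across up to ten threads while recording errors in a report instead of halting. The names run_pipeline, process_one, fake_load, and fake_write are hypothetical and are not part of Wildebeest's API; the loader and writer are stand-ins that do not touch the network or produce real PNG data.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path
import tempfile


def process_one(inpath, load_func, write_func, outdir):
    """Load one input and write it to outdir; return a report row instead of raising."""
    filename = inpath.rsplit("/", 1)[-1]
    outpath = Path(outdir) / (Path(filename).stem + ".png")
    row = {"inpath": inpath, "outpath": str(outpath), "error": None}
    try:
        data = load_func(inpath)
        write_func(data, outpath)
    except Exception as exc:
        # One bad file should not stop the run: record the error and move on.
        row["error"] = repr(exc)
    return row


def run_pipeline(inpaths, load_func, write_func, outdir, n_jobs=10):
    """Process all inputs on up to n_jobs threads; return one report row per input."""
    worker = partial(
        process_one, load_func=load_func, write_func=write_func, outdir=outdir
    )
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map returns results in the same order as inpaths.
        return list(pool.map(worker, inpaths))


def fake_load(inpath):
    """Stand-in for fetching and decoding an image (hypothetical, no network access)."""
    if "corrupt" in inpath:
        raise ValueError(f"cannot decode {inpath}")
    return inpath.encode()


def fake_write(data, outpath):
    """Stand-in for encoding and saving a PNG (hypothetical, writes raw bytes)."""
    Path(outpath).write_bytes(data)


urls = [
    "https://example.com/a.jpg",
    "https://example.com/corrupt.jpg",
    "https://example.com/b.jpg",
]
with tempfile.TemporaryDirectory() as outdir:
    report = run_pipeline(urls, fake_load, fake_write, outdir)
    written = sorted(p.name for p in Path(outdir).iterdir())

for row in report:
    print(row["inpath"], "->", row["error"] or "ok")
```

The corrupted input shows up as a row with a populated error field, while the other files are still written. In a real job, the stand-in loader and writer would be replaced by functions that download and save actual images.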

Additional Capabilities

You can do more with Wildebeest than just download images:

  • Do arbitrary processing on each file, for instance to resize each image before writing it to disk.
  • Add columns to the run report that record arbitrary properties of the file, such as the average brightness of each image.
  • Selectively skip files based on arbitrary criteria. For instance, you can skip an input file when a file already exists at the intended output location, which makes it easy to pick up where you left off after a failure.
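The three capabilities in the list above can be illustrated with another standard-library sketch. Again, this is not Wildebeest's API: run_pipeline, skip_existing, downsample, and mean_brightness are hypothetical names, and the "images" are just short lists of pixel values held in memory so that the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path
import tempfile

# Hypothetical in-memory "images": lists of grayscale pixel values (0-255).
FAKE_IMAGES = {"a.raw": [10, 20, 30, 40], "b.raw": [0, 100, 200, 100]}


def skip_existing(inpath, outpath):
    """Skip an input when its output file is already on disk."""
    return outpath.exists()


def downsample(pixels):
    """Stand-in for resizing: keep every other pixel."""
    return pixels[::2]


def mean_brightness(pixels):
    """Extra report column: average pixel value of the processed image."""
    return sum(pixels) / len(pixels)


def write_pixels(pixels, outpath):
    outpath.write_bytes(bytes(pixels))


def process_one(inpath, outdir, load_func, ops, write_func, skip_func, extra_fields):
    outpath = Path(outdir) / inpath
    row = {"inpath": inpath, "skipped": False, "error": None}
    try:
        if skip_func(inpath, outpath):
            row["skipped"] = True
            return row
        data = load_func(inpath)
        for op in ops:  # arbitrary processing steps, applied in order
            data = op(data)
        for name, func in extra_fields.items():  # custom report columns
            row[name] = func(data)
        write_func(data, outpath)
    except Exception as exc:
        row["error"] = repr(exc)
    return row


def run_pipeline(inpaths, n_jobs=10, **kwargs):
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(partial(process_one, **kwargs), inpaths))


with tempfile.TemporaryDirectory() as outdir:
    # Pretend an earlier, interrupted run already produced a.raw.
    (Path(outdir) / "a.raw").write_bytes(b"done earlier")
    report = run_pipeline(
        list(FAKE_IMAGES),
        outdir=outdir,
        load_func=FAKE_IMAGES.__getitem__,
        ops=[downsample],
        write_func=write_pixels,
        skip_func=skip_existing,
        extra_fields={"mean_brightness": mean_brightness},
    )

for row in report:
    print(row)
```

Because the skip check runs before loading, rerunning the same job after a failure only pays for the files that were not finished the first time.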

Conclusion

Wildebeest makes big data processing jobs fast and easy. To get started with it, you can read the docs, check out the code on GitHub, or install the package from PyPI.

Acknowledgements

Thanks to Michael Sugimura for feedback on an earlier draft and to the ShopRunner data science team for contributions to the library.

ShopRunner
