A Guide to Distributed TensorFlow: Part 1

How to set up efficient input data pipelines for deep learning using TFRecord and tf.data.Dataset API

Roshan Thaikkat
Oct 9, 2020

TL;DR

Deep Learning

Wouldn’t it be nice if you could just take your experimental neural network models and scale them up with massive amounts of data, without having to rely on anyone to do it for you?

Problem setup

The workflow we describe here trains a model on data pulled from cloud storage, with training distributed across a cluster managed by Google Kubernetes Engine (GKE).

TensorFlow

Input pipeline

Minimal example of a data preprocessing function.
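A minimal sketch of such a preprocessing function, assuming each asset's raw sensor readings arrive as a pandas DataFrame indexed by timestamp; the `window_size` parameter and the normalization steps are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def preprocess(raw_df: pd.DataFrame, window_size: int = 128) -> np.ndarray:
    """Clean one asset's raw time series and slice it into fixed-length windows."""
    # Fill short gaps, drop what remains, and standardize each sensor channel.
    df = raw_df.sort_index().interpolate(limit=5).dropna()
    values = (df.values - df.values.mean(axis=0)) / (df.values.std(axis=0) + 1e-8)

    # Slice into non-overlapping windows of shape (window_size, n_channels).
    n_windows = len(values) // window_size
    windows = values[: n_windows * window_size].reshape(n_windows, window_size, -1)
    return windows.astype(np.float32)
```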
This function distributes `fn(..)` across the cluster's workers, so the per-asset data instances are serialized in parallel. We're using Dask's vanilla map-and-gather strategy.
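A sketch of that distribution step, assuming a `dask.distributed.Client` connected to the cluster's scheduler and that `fn` maps an asset identifier to that asset's preprocessed windows (for example, loading the raw data from cloud storage and applying the preprocessing above); the scheduler address is a placeholder.

```python
from dask.distributed import Client


def map_fn(fn, asset_ids, scheduler_address="tcp://dask-scheduler:8786"):
    """Run `fn` over every asset in parallel on the Dask cluster.

    Plain map/gather pattern: submit one task per asset, then block
    until all results are back on the driver.
    """
    client = Client(scheduler_address)
    futures = client.map(fn, asset_ids)   # one future per asset
    results = client.gather(futures)      # list of per-asset window arrays
    client.close()
    return results
```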
Create batches of assets, distribute the workload via `map_fn(..)`, and save the results using `TFRecordWriter`.
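A sketch of the batching and writing step, using the `map_fn(..)` sketch above and a `serialize_example(..)` helper like the one sketched after the next caption; the bucket path and batch size are illustrative.

```python
import tensorflow as tf


def write_tfrecords(asset_ids, fn, out_dir="gs://my-bucket/tfrecords", batch_size=100):
    """Process assets in batches and persist each batch as one TFRecord shard."""
    for shard, start in enumerate(range(0, len(asset_ids), batch_size)):
        batch = asset_ids[start : start + batch_size]
        results = map_fn(fn, batch)  # distributed preprocessing (see sketch above)

        path = f"{out_dir}/shard-{shard:05d}.tfrecord"
        with tf.io.TFRecordWriter(path) as writer:
            for windows in results:          # one array of windows per asset
                for window in windows:
                    writer.write(serialize_example(window))
```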
Array serializing functions.
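A sketch of the serialization helpers, storing each window's raw bytes and shape in a `tf.train.Example` so the array can be reconstructed exactly on read; the feature names are assumptions.

```python
import numpy as np
import tensorflow as tf


def _bytes_feature(value: bytes) -> tf.train.Feature:
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_list_feature(values) -> tf.train.Feature:
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))


def serialize_example(array: np.ndarray) -> bytes:
    """Pack one window into a serialized tf.train.Example."""
    features = {
        "data": _bytes_feature(array.astype(np.float32).tobytes()),
        "shape": _int64_list_feature(array.shape),
    }
    example = tf.train.Example(features=tf.train.Features(feature=features))
    return example.SerializeToString()
```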
Load previously saved TFRecords.
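A sketch of the loading side with the `tf.data` API, assuming the feature layout from the serialization sketch above; the file pattern and batch size are illustrative.

```python
import tensorflow as tf

FEATURE_SPEC = {
    "data": tf.io.FixedLenFeature([], tf.string),
    "shape": tf.io.FixedLenFeature([2], tf.int64),
}


def parse_example(serialized: tf.Tensor) -> tf.Tensor:
    """Invert serialize_example: parse one record back into a float32 window."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    array = tf.io.decode_raw(parsed["data"], tf.float32)
    return tf.reshape(array, parsed["shape"])


def load_dataset(pattern="gs://my-bucket/tfrecords/shard-*.tfrecord", batch_size=32):
    """Build a tf.data pipeline over the saved shards."""
    files = tf.data.Dataset.list_files(pattern)
    dataset = files.interleave(
        tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE
    )
    return (
        dataset.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(batch_size)
        .prefetch(tf.data.AUTOTUNE)
    )
```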

The Next Step


