Summary of TensorFlow at Google I/O 2018

zong fan
May 28, 2018


At the latest Google I/O, seven TensorFlow talks were presented and a number of new features were released. I think this can be treated as a supplement to the TF Dev Summit 2018 held last March. Here is a brief summary of the important features that I think DL developers should take note of.

(Session List Link)

1. TensorFlow for JavaScript

(Video Link) (Homepage)

TensorFlow.js supports model training and deployment directly in JavaScript.

Advantages of In-browser ML:

  • No driver / install
  • Interactive
  • Sensors
  • Data stays on the client device

Capabilities:

  • Author models directly in browser
  • Import pre-trained models for inference
  • Re-train imported models

Pipeline:

Framework

Performance

Demos:

Emoji Scavenger Hunt (iOS/Android)

Human Pose Estimation

2. TensorFlow in production: TF Extended, TF Hub, and TF Serving

1). TensorFlow Hub

(Homepage)

A library to foster the publication, discovery, and consumption of reusable machine learning modules.

Module Features

  • each module contains weights and a graph
  • composable, reusable (common signatures), retrainable

Module usage

Instantiate a module from a local file path or a URL:

import tensorflow_hub as hub 
m = hub.Module("path/to/a/module_dir", trainable=True, tags={"train"}) # file path
features = m(images)
logits = tf.layers.dense(features, NUM_CLASSES)
prob = tf.nn.softmax(logits)

or, after setting TFHUB_CACHE_DIR, create a module from a URL:

export TFHUB_CACHE_DIR=/my_module_path
m = hub.Module("https://tfhub.dev/google/progan-128/1")

Module URL naming convention:

  • tfhub.dev: repository url
  • google: module publisher
  • progan-128: module name
  • 1: module version

Integrated with TensorFlow estimator:

review = hub.text_embedding_column("review", "https://tfhub.dev/google/universal-sentence-encoder/1")
features = {"review": np.array(["an arugula masterpiece", "inedible shoe leather", ...])}
labels = np.array([[1], [0], ...])
input_fn = tf.estimator.inputs.numpy_input_fn(features, labels, shuffle=True)
estimator = tf.estimator.DNNClassifier(hidden_units=hidden_units, feature_columns=[review])
estimator.train(input_fn, max_steps=100)

Available Modules

  • Industry standard: Inception, ResNet, and Inception-ResNet
  • Efficient: MobileNet
  • Cutting edge: NASNet and PNASNet (NASNet-Large cost 62,000+ GPU hours to train)

2). TensorFlow Serving

(Homepage)

A flexible, high-performance serving system for deploying machine learning models.

Features

  • Multiple models: served simultaneously; dynamic model loading/unloading
  • Isolation: separate loading and serving threads keep latency low during model version transitions
  • High throughput: dynamic request batching and a performance-conscious design

Architecture

  1. Servables: the objects that clients use to perform computation
  • Do not manage their own life cycle
  • Typical servables include a TensorFlow SavedModelBundle or a lookup table for embedding or vocabulary lookups
  • Versions: one or more versions of a servable can be loaded concurrently (supports gradual rollout and experimentation)
  • Streams: a sequence of versions of a servable, sorted by increasing version number
  • Models: a model is represented as one or more servables; a composite model is either multiple independent servables or a single composite servable. A large lookup table can be sharded across many TensorFlow Serving instances.

2. Loaders: manage a servable's life cycle and standardize the APIs for loading and unloading a servable, independent of the specific learning algorithm.

3. Sources: Plugin modules that find and provide servables.

  • provide zero or more servable streams; for each stream, a Source supplies one Loader instance per servable version
  • discover servables from arbitrary storage systems (RPC, etc.)

4. Aspired Versions: the set of servable versions that should be loaded and ready. When a Source gives a new list of aspired versions to the Manager, it supersedes the previous list for that servable stream, and the Manager unloads any previously loaded versions that no longer appear in the list.

5. Managers: handle the full lifecycle of servables: loading, serving, and unloading. They listen to Sources, track all versions, and may postpone loading until resources are ready, or postpone unloading until a newer version has finished loading.

New distributed serving use case:

REST API: seamlessly serve ML models in web/mobile RESTful microservices.
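
For example, a served model can be queried over HTTP with a JSON payload. A minimal sketch, assuming a model named half_plus_two is exported and served on the default REST port 8501 (the model name, host, and input values are illustrative assumptions):

import json
import requests  # third-party HTTP client, assumed to be installed

# TensorFlow Serving REST predict endpoint:
#   POST http://<host>:8501/v1/models/<model_name>:predict
url = "http://localhost:8501/v1/models/half_plus_two:predict"
payload = {"instances": [1.0, 2.0, 5.0]}  # illustrative inputs

response = requests.post(url, data=json.dumps(payload))
print(response.json())  # e.g. {"predictions": [...]}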

3). TensorFlow Extended

(Homepage)

TF Extended (TFX) is a TensorFlow-based general-purpose machine learning platform.

Features

  • Flexible: continuous training and updating -> higher accuracy and faster convergence
  • Portable:
      • with TensorFlow
      • with Apache Beam: batch and streaming data processing
      • with Kubernetes/Kubeflow: deployment of machine learning
  • Scalable: local <-> cloud
  • Interactive: visualization

Architecture

Tools released

Pipeline

  1. Use Facets to analyze data.

2. Use tf.Transform for feature transformations (see the sketch after this list).

3. Train with a TensorFlow Estimator.

4. Analyze the model with TensorFlow Model Analysis: sliced metrics.

5. Serve with TF Serving.
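
To illustrate step 2, tf.Transform lets you define a preprocessing_fn whose full-pass statistics (min/max, mean/variance, vocabularies) are computed over the training data and then baked into the serving graph. A minimal sketch, assuming tensorflow_transform is installed; the feature names are illustrative and not from the talk:

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # Feature names here are assumptions for illustration only.
    return {
        # Scale a numeric feature to [0, 1] using full-dataset statistics.
        "age_scaled": tft.scale_to_0_1(inputs["age"]),
        # Normalize another numeric feature to zero mean and unit variance.
        "hours_normalized": tft.scale_to_z_score(inputs["hours_per_week"]),
    }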

3. TensorFlow High-Level API

1). Colab

(Tutorial)

An easy way to learn and use TensorFlow.

Workshops: some exercises

2). APIs

  • tf.keras
  • tf.data: easy input pipelines
  • Eager execution: an imperative interface to TensorFlow, enabled with one command: tf.enable_eager_execution() (a combined sketch follows)
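
A minimal sketch of eager execution together with tf.keras, written against the TF 1.x API of that era; the toy tensors and layer sizes are illustrative assumptions:

import tensorflow as tf

tf.enable_eager_execution()  # imperative mode: ops run as soon as they are called

# Tensors now evaluate immediately, no Session required.
x = tf.constant([[2.0, 3.0]])
print(tf.matmul(x, tf.ones([2, 1])))  # -> tf.Tensor([[5.]], shape=(1, 1), ...)

# tf.keras layers and models are callable eagerly as well.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(1),
])
print(model(x))  # returns a concrete tensor immediately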

4. TensorFlow Lite for mobile developers

(Homepage)

Features

  • Cross-platform
  • Lightweight: core interpreter size ~75 KB; with all built-in ops ~400 KB

Architecture

  1. Converter
  • FlatBuffer based
  • faster to mmap
  • few dependencies
  • pre-fused activations and biases
  • weight truncation

2. Interpreter Core

  • static memory plan
  • static execution plan
  • fast load-time

3. Operation kernels

  • kernels specifically optimized for NEON on ARM

4. Hardware acceleration delegation

  • Direct GPU integration
  • Android Neural Networks API, HVX

5. Quantized training

  • Fine-tune weights
  • Estimate quantization parameters (a sketch follows below)
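
A minimal sketch of how quantized training was typically set up at the time with the tf.contrib.quantize graph rewriter; the quant_delay value and graph variables are illustrative assumptions:

import tensorflow as tf

# ... build the float model first, then rewrite the training graph ...
g = tf.get_default_graph()

# Insert fake-quantization ops so weights are fine-tuned while
# quantization parameters (ranges) are estimated during training.
tf.contrib.quantize.create_training_graph(input_graph=g, quant_delay=20000)

# ... define loss/optimizer and train as usual ...

# For export, rewrite the inference graph with matching quantization ops:
# tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)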

ML Kit: a newly announced machine learning SDK that exposes both on-device and cloud-powered APIs.

Ops and model support:

  • ~50 common ops
  • allows custom ops
  • currently limited to inference ops
  • supported models: MobileNet, Inception V3, ResNet50, SqueezeNet, DenseNet, Inception V4, SmartReply, and quantized versions of MobileNet and Inception V3

Usage:

Pipeline

  1. Convert to the TF Lite format:
  • use a frozen GraphDef or SavedModel and avoid unsupported operators; write custom operators for any missing functionality
  • visualize the model to check it
from tf.contrib.lite import convert_savedmodel
convert_savedmodel.convert(saved_model_dir="/path/to/model", output_tflite="model.tflite")

2. Write a custom op

TfLiteRegistration reg = {
    .invoke = [](TfLiteContext* context, TfLiteNode* node) {
      TfLiteTensor* a = &context->tensors[node->inputs->data[0]];
      a->data.f[0] = M_PI;
      return kTfLiteOk;
    }
};

3. C++ API

  • load model
std::unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  • register ops
tflite::ops::builtin::NeededOpsResolver minimal_resolver;
  • build interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, minimal_resolver)(&interpreter);
  • execution
// feed input
int input_index = interpreter->inputs()[0];
float* input = interpreter->typed_tensor<float>(input_index);
// ... fill in the input

// run inference
interpreter->Invoke();
// read output
// ...
  • Python API (a minimal sketch follows below)
  • Java API
  • Android app Gradle file
  • iOS CocoaPods
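
For the Python API, here is a minimal inference sketch, assuming a converted model.tflite file; the Interpreter class lived under tf.contrib.lite at the time (tf.lite in later releases), and the dummy input below is an illustrative assumption:

import numpy as np
import tensorflow as tf

# Load the converted model (the file name is an assumption for illustration).
interpreter = tf.contrib.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input matching the model's expected shape and dtype.
input_data = np.zeros(input_details[0]["shape"], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)

# Run inference and read the output tensor.
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
print(output.shape)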

5. Distributed TensorFlow training

(Homepage) (DistributionStrategy)

1) Data parallelism

Async parameter server

Sync allreduce architecture: the next round of computation waits until all workers have received the updated gradients.

Ring Allreduce Architecture: Fast

Use the parameter-server approach when you have many less-powerful devices such as CPUs; use sync allreduce when you have fast devices with strong communication links, such as GPUs and TPUs.

Input pipeline bottleneck

Solution: the tf.data.Dataset API; parallelize file reading and data transformation, and prefetch (dataset.prefetch(buffer_size=1)) to decouple the time data is produced from the time it is consumed (prepare the next batch while the accelerator is still training).

Initial vs. parallelized input pipeline:

def input_fn(batch_size):
    files = tf.data.Dataset.list_files(file_pattern)
    dataset = tf.data.TFRecordDataset(files, num_parallel_reads=40)  # number of CPU cores
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.repeat(NUM_EPOCHS)
    dataset = dataset.map(parser_fn, num_parallel_calls=40)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(buffer_size=1)
    return dataset

Fused transformation ops

dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=1000, count=NUM_EPOCHS))
dataset = dataset.apply(tf.contrib.data.map_and_batch(parser_fn, batch_size))

Multi-machine distributed training

Use the Estimator train_and_evaluate API, which uses the async parameter-server approach (a minimal sketch follows).
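
A minimal sketch of train_and_evaluate; model_fn, the input functions, and the step counts are illustrative assumptions:

import tensorflow as tf

# model_fn, train_input_fn and eval_input_fn are assumed to be defined elsewhere.
estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/model")

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)

# Runs locally on one machine; with a TF_CONFIG environment variable describing
# the cluster, the same code runs distributed with async parameter servers.
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)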

2). Model parallelism

3). Scaling to multiple GPUs in TensorFlow

distribution = tf.contrib.distribute.MirroredStrategy()  # mirrored strategy for multi-GPU distribution
run_config = tf.estimator.RunConfig(train_distribute=distribution)

classifier = tf.estimator.Estimator(
    model_fn=model_function,
    model_dir=model_dir,
    config=run_config)
classifier.train(input_fn=input_function)

Mirrored Strategy:

Implements the sync allreduce architecture, with model parameters mirrored across devices.

  • no change to model/training loop
  • no change to the input function (requires the tf.data.Dataset API)
  • seamless checkpoints and summaries

Support for TPUs, the Keras API, and multi-machine MirroredStrategy is in progress.
