Summary of TensorFlow at Google I/O 2018

May 28, 2018

At the latest Google I/O, seven TensorFlow talks were presented and several new TensorFlow features and functions were released. I think this can be treated as a supplement to the TensorFlow Dev Summit 2018 held last March. Here is a brief summary of the important features that I think deep learning developers should notice.

(Session List Link)

1. TensorFlow for JavaScript

(Video Link) (Homepage)

Supports JS for model training and deployment.

Advantages of In-browser ML:

No driver / install
Interactive
Sensors
Data stays on the client device

Capabilities:

Author models directly in browser
Import pre-trained model for inference
Re-train imported models

Pipeline:

Save: Keras models and the TF SavedModel format are accepted.
Convert: tfjs-converter
Includes graph optimization
Optimizes weights for browser caching
Supports 32+ tf/Keras layers and 90+ TF ops

Framework

Performance

Demos:

Emoji Scavenger Hunt (iOS/Android)

Human Pose Estimation

2. TensorFlow in production: TF extended, TF Hub, and TF serving

1). TensorFlow Hub

(Homepage)

A library to foster the publication, discovery, and consumption of reusable parts of machine learning models, called modules.

Module Features

each contains weights and graph
composable, reusable (common signature), retrainable

Module usage

instantiating a module through file path or URL

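import tensorflow as tf
import tensorflow_hub as hub

# Instantiate a module from a local directory (file path).
m = hub.Module("path/to/a/module_dir", trainable=True, tags={"train"})
features = m(images)  # images: a batch of input image tensors
logits = tf.layers.dense(features, NUM_CLASSES)  # NUM_CLASSES: number of target classes
prob = tf.nn.softmax(logits)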

or, after setting TFHUB_CACHE_DIR, create a module from a URL:

export TFHUB_CACHE_DIR=/my_module_path

m = hub.Module("https://tfhub.dev/google/progan-128/1")

Naming rules for a published module URL:

tfhub.dev: repository url
google: module publisher
progan-128: module name
1: module version
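To make the naming rules concrete, here is a minimal sketch that splits a module URL into those four parts. The helper name parse_module_url is hypothetical and is not part of tensorflow_hub; it only uses the Python standard library and assumes the URL follows the repository/publisher/name/version layout described above.

```python
from urllib.parse import urlparse


def parse_module_url(url):
    """Split a tfhub.dev-style URL into repository, publisher, name and version."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("expected <publisher>/<name>/<version>, got %r" % parsed.path)
    publisher, name, version = parts
    return {
        "repository": parsed.netloc,
        "publisher": publisher,
        "name": name,
        "version": int(version),
    }


info = parse_module_url("https://tfhub.dev/google/progan-128/1")
print(info)
```

Because the version is part of the URL, pinning a URL in code should keep loading the same published module version.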

Integrated with TensorFlow estimator:

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

review = hub.text_embedding_column("review", "https://tfhub.dev/google/universal-sentence-encoder/1")
features = {"review": np.array(["an arugula masterpiece", "inedible shoe leather", ...])}
labels = np.array([[1], [0], ...])
input_fn = tf.estimator.inputs.numpy_input_fn(features, labels, shuffle=True)
estimator = tf.estimator.DNNClassifier(hidden_units, [review])  # hidden_units: list of layer sizes
estimator.train(input_fn, max_steps=100)

Available Modules

Industry standard: Inception, ResNet and Inception-ResNet
Efficient: MobileNet
Cutting edge: NASNet and PNASNet (NASNet-Large cost 62000+ GPU hours)

2). TensorFlow Serving

(Homepage)

A flexible, high-performance serving system for machine learning model deployment.

Features

Multiple models: served simultaneously, with dynamic model loading/unloading
Isolation: loading/serving threads for low latency during model version transition.
High throughput: Dynamic request batching, performance conscious design.
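As a rough illustration of dynamic request batching, the sketch below groups queued requests so that the model is called once per batch instead of once per request. This is not TensorFlow Serving code: the function names and toy_model are hypothetical, and the sketch batches by size only, while a real server would typically also bound how long a request may wait.

```python
def batch_requests(pending, max_batch_size):
    """Group queued requests into batches of at most max_batch_size."""
    return [pending[i:i + max_batch_size]
            for i in range(0, len(pending), max_batch_size)]


def serve(pending, model_fn, max_batch_size=4):
    """Run the model once per batch and return results in request order."""
    results = []
    for batch in batch_requests(pending, max_batch_size):
        results.extend(model_fn(batch))
    return results


calls = []  # records the batch size of every model call


def toy_model(batch):
    # Stand-in for a real model: doubles each input.
    calls.append(len(batch))
    return [x * 2 for x in batch]


outputs = serve(list(range(10)), toy_model, max_batch_size=4)
print(outputs, calls)
```

Ten requests are served with three model calls here, which is where the throughput gain on accelerators comes from.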

Architecture

1. Servables: the underlying objects that clients use to perform computation.

Do not manage their own life cycle.
Include TensorFlow SavedModelBundle and lookup tables for embedding or vocabulary lookups.
Versions: one or more versions of a servable can be loaded concurrently (supporting gradual rollout and experimentation).
Streams: a sequence of versions of a servable, sorted by increasing version number.
Models: represented as one or more servables. A composite model is either multiple independent servables or a single composite servable. A large lookup table could be sharded across many TensorFlow Serving instances.

2. Loaders: manage a servable’s life cycle and standardize the APIs for loading and unloading a servable, independent of the specific learning algorithm.

3. Sources: Plugin modules that find and provide servables.

Provide zero or more servable streams. For each stream, a Source supplies one Loader instance for each version of the servable.
Can discover servables from arbitrary storage systems (RPC etc.)

4. Aspired Versions: the set of servable versions that should be loaded and ready. When a Source gives a new list of aspired versions to the Manager, it supersedes the previous list for that servable stream. The Manager unloads any previously loaded versions that no longer appear in the list.

5. Managers: handle the full life cycle of servables: loading, serving, and unloading. They listen to Sources and track all versions, and may postpone loading if resources are not ready, or postpone unloading until a newer version has finished loading.
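The interaction between aspired versions and the Manager can be sketched in a few lines of plain Python. ToyManager and its method are hypothetical names for illustration, not the TensorFlow Serving API; the sketch only shows how a new aspired list supersedes the old one and that new versions are loaded before stale ones are unloaded.

```python
class ToyManager:
    """Tracks which versions of each servable are loaded."""

    def __init__(self):
        self.loaded = {}  # servable name -> set of loaded versions

    def set_aspired_versions(self, name, aspired):
        """Apply a new aspired list; return (versions loaded, versions unloaded)."""
        aspired = set(aspired)
        current = self.loaded.setdefault(name, set())
        to_load = sorted(aspired - current)
        to_unload = sorted(current - aspired)
        # Load new versions first so the servable stays available,
        # then drop versions that are no longer aspired.
        current.update(to_load)
        current.difference_update(to_unload)
        return to_load, to_unload


manager = ToyManager()
first = manager.set_aspired_versions("model", [1])
second = manager.set_aspired_versions("model", [1, 2])  # gradual rollout: both loaded
third = manager.set_aspired_versions("model", [2])      # version 1 is retired
print(first, second, third, manager.loaded)
```

Keeping versions 1 and 2 loaded at the same time is what makes gradual rollout and experiments possible.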

NEW distributed serving use-case:

REST API: seamlessly serve ML in web/mobile RESTful microservices.

3). TensorFlow Extended

(Homepage)

TF Extended (TFX) is a TensorFlow-based general-purpose machine learning platform.

Features

Flexible: continuous training and updating -> higher accuracy and faster convergence.

Portable:

with TF

with Apache Beam: batch and streaming data processing

with Kubernetes/Kubeflow: deployment of machine learning.

Scalable: local <-> cloud
Interactive: visualization

Architecture

Tools released

TensorFlow Transform: consistent in-graph transformation in training and serving
TensorFlow Model Analysis: scalable, sliced and full-pass metrics.
TensorFlow Serving
Facets: visualization of datasets

Pipeline

1. Use Facets to analyze data.

2. Use tf.Transform for feature transformation.

3. Train with TensorFlow Estimator.

4. Analyze the model with TensorFlow Model Analysis: sliced metrics.

5. Serve with TF Serving.
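To show what "sliced metrics" in step 4 means, here is a small plain-Python sketch that computes accuracy per slice of a feature. It does not use the TensorFlow Model Analysis API; the function name, the "country" feature and the example records are all made up for illustration.

```python
from collections import defaultdict


def sliced_accuracy(examples, slice_key):
    """Accuracy per value of slice_key.

    examples: list of dicts with 'label', 'prediction' and feature keys.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        key = ex[slice_key]
        total[key] += 1
        correct[key] += int(ex["label"] == ex["prediction"])
    return {key: correct[key] / total[key] for key in total}


# Made-up evaluation records.
examples = [
    {"country": "US", "label": 1, "prediction": 1},
    {"country": "US", "label": 0, "prediction": 1},
    {"country": "JP", "label": 1, "prediction": 1},
    {"country": "JP", "label": 0, "prediction": 0},
]
per_slice = sliced_accuracy(examples, "country")
overall = sum(e["label"] == e["prediction"] for e in examples) / len(examples)
print(per_slice, overall)
```

The overall number can hide a slice that performs much worse than the rest, which is the reason to look at metrics per slice.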

3. TensorFlow High-Level API

1). Colab

(Tutorial)

An easy way to learn and use TensorFlow.

Workshops: some exercises

2). APIs

tf.keras
tf.data: easy input pipelines
Eager execution: imperative interface to TensorFlow with one command: tf.enable_eager_execution()

4. TensorFlow Lite for mobile developers

(Homepage)

Feature

Cross-platform

Lightweight: the core interpreter is about 75 KB; about 400 KB with all ops

Architecture

1. Converter

FlatBuffer based
faster to mmap
few dependencies
pre-fused activation and biases
weights truncation

2. Interpreter Core

static memory plan
static execution plan
fast load-time

3. operation kernels

kernels specifically optimized for NEON on ARM

4. Hardware acceleration delegation

Direct GPU integration
Android Neural Networks API, HVX

5. Quantized Training

Fine-tune weights
Estimate quantization parameters
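As an illustration of what "estimate quantization parameters" can mean, the sketch below derives a scale and zero point for 8-bit affine quantization from the minimum and maximum of a set of values. This is an assumption about the general technique, not the procedure TensorFlow's quantized training uses internally; all function names are hypothetical.

```python
def estimate_quant_params(values, num_bits=8):
    """Estimate (scale, zero_point) so that real = scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must contain 0.0 so that zero maps exactly.
    rmin = min(min(values), 0.0)
    rmax = max(max(values), 0.0)
    if rmax == rmin:
        return 1.0, 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a real value to an integer in [0, 2**num_bits - 1]."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate real value."""
    return scale * (q - zero_point)


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale, zero_point = estimate_quant_params(weights)
restored = [dequantize(quantize(w, scale, zero_point), scale, zero_point)
            for w in weights]
print(scale, zero_point, restored)
```

The round trip loses at most about one quantization step per value, which is the precision traded for a smaller, faster model.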

ML Kit: a newly announced machine learning SDK that exposes both on-device and cloud-powered APIs.

ops and model support:

~50 common ops
custom ops allowed
currently limited to inference ops
supported models: MobileNet, InceptionV3, ResNet50, SqueezeNet, DenseNet, InceptionV4, SmartReply, and quantized versions of MobileNet and InceptionV3

Usage:

Pipeline

1. Convert to TF Lite format:

Use a frozen GraphDef or SavedModel and avoid unsupported operators; write custom operators for any missing functionality.
Visualize the model to check the conversion

# As shown in the talk; the exact module path may differ between TensorFlow releases.
from tensorflow.contrib.lite import convert_savedmodel
convert_savedmodel.convert(saved_model_dir="/path/to/model", output_tflite="model.tflite")

2. Write a custom op

TfLiteRegistration reg = {
  .invoke = [](TfLiteContext* context, TfLiteNode* node) {
    // Write pi into the first element of the op's first input tensor.
    TfLiteTensor* a = &context->tensors[node->inputs->data[0]];
    a->data.f[0] = M_PI;
    return kTfLiteOk;
  }
};

3. C++ API

load model

std::unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromFile("model.tflite");

tflite::ops::builtin::NeededOpsResolver minimal_resolver;

build interpreter

std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, minimal_resolver)(&interpreter);

execution

// feed input
int input_index = interpreter->inputs()[0];
float* input = interpreter->typed_tensor<float>(input_index);
// … fill in the input

// run inference
interpreter->Invoke();
// read output
//…

Python API
Java API
Android APP gradle file
iOS CocoaPods

5. Distributed TensorFlow training

(Homepage) (DistributionStrategy)

1) Data parallelism

Async parameter server

Sync Allreduce Architecture: the next round of computation waits until all workers have received the updated gradients.

Ring Allreduce Architecture: Fast

Use a parameter server if you have many less-powerful devices, such as CPUs; use sync allreduce if you have fast devices with strong communication links, such as GPUs and TPUs.
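To see why the ring variant is attractive, here is a small single-process simulation of ring allreduce. It is a sketch of the general algorithm, not TensorFlow's implementation, and the function name and data layout are assumptions: each of n workers holds its gradient split into n chunks and only ever sends one chunk to its right-hand neighbor per step.

```python
def ring_allreduce(worker_chunks):
    """Sum gradients across workers arranged in a ring.

    worker_chunks[i][c] is chunk c (a list of floats) held by worker i;
    there are n workers and each holds n chunks.
    """
    n = len(worker_chunks)
    data = [[list(chunk) for chunk in chunks] for chunks in worker_chunks]

    # Phase 1, scatter-reduce: after n - 1 steps, worker i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, list(data[i][(i - step) % n]))
                 for i in range(n)]
        for src, c, payload in sends:
            dst = (src + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], payload)]

    # Phase 2, allgather: pass the completed chunks around the ring
    # until every worker has all of them.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, list(data[i][(i + 1 - step) % n]))
                 for i in range(n)]
        for src, c, payload in sends:
            dst = (src + 1) % n
            data[dst][c] = payload
    return data


grads = [
    [[1.0], [2.0], [3.0]],        # worker 0
    [[10.0], [20.0], [30.0]],     # worker 1
    [[100.0], [200.0], [300.0]],  # worker 2
]
reduced = ring_allreduce(grads)
print(reduced[0])
```

Each worker sends and receives only one chunk per step, so the per-worker traffic stays roughly constant as workers are added, instead of concentrating on a central server.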

Input pipeline bottleneck

Solution: use the tf.data.Dataset API. Parallelize file reading and data transformation, and prefetch (dataset.prefetch(buffer_size=1)) to decouple the time at which data is produced from the time at which it is consumed (the next batch is prepared while the accelerator is still training).

initial

parallelization

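import tensorflow as tf

def input_fn(batch_size):
    # file_pattern, parser_fn and NUM_EPOCHS are defined elsewhere.
    files = tf.data.Dataset.list_files(file_pattern)
    dataset = tf.data.TFRecordDataset(files, num_parallel_reads=40)  # read files in parallel (number of CPUs)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.repeat(NUM_EPOCHS)
    dataset = dataset.map(parser_fn, num_parallel_calls=40)  # parse records in parallel
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(buffer_size=1)  # prepare the next batch while the current one trains
    return dataset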
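The effect of the prefetch step above can be imitated in plain Python with a background thread and a bounded queue. This is only a conceptual sketch of the producer/consumer decoupling, not how tf.data is implemented; the prefetch and make_batches helpers are hypothetical.

```python
import queue
import threading


def prefetch(generator, buffer_size=1):
    """Produce items in a background thread, keeping up to buffer_size ready."""
    buf = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for item in generator:
            buf.put(item)  # blocks while the buffer is full
        buf.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()  # blocks only if the producer has fallen behind
        if item is sentinel:
            return
        yield item


def make_batches(num_batches):
    for i in range(num_batches):
        yield [i, i + 1]  # stands in for reading and parsing one batch


consumed = [batch for batch in prefetch(make_batches(3), buffer_size=1)]
print(consumed)
```

While the consumer is busy with one batch, the producer is already preparing the next, so the accelerator waits only when the input side is slower than training.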

Fused transformed ops

dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=1000, count=NUM_EPOCHS))
dataset = dataset.apply(tf.contrib.data.map_and_batch(parser_fn, batch_size))

Multi machine distributed training

Use the Estimator train_and_evaluate API, which uses the async parameter server approach.

2). Model parallelism

3). Scaling to multiple GPUs in TensorFlow

import tensorflow as tf

# Mirrored strategy for multi-GPU distribution.
distribution = tf.contrib.distribute.MirroredStrategy()
run_config = tf.estimator.RunConfig(train_distribute=distribution)

# model_function, model_dir and input_function are defined elsewhere.
classifier = tf.estimator.Estimator(
    model_fn=model_function,
    model_dir=model_dir,
    config=run_config)
classifier.train(input_fn=input_function)

Mirrored Strategy:

Implements the sync allreduce architecture, with model parameters mirrored across devices.
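What "mirrored" means in practice can be shown with a toy training loop in plain Python. This is a conceptual sketch, not MirroredStrategy itself: the one-parameter model y = w * x, the function names and the data are assumptions chosen for illustration. Each replica computes a gradient on its own shard, the gradients are averaged (the allreduce step), and every replica applies the same update.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for y = w * x on one replica's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def mirrored_step(replica_weights, shards, learning_rate):
    """One synchronous step: local gradients, allreduce (mean), identical update."""
    grads = [local_gradient(w, shard)
             for w, shard in zip(replica_weights, shards)]
    mean_grad = sum(grads) / len(grads)  # the value every replica sees after allreduce
    return [w - learning_rate * mean_grad for w in replica_weights]


# Two replicas, each with its own shard of data generated from y = 2x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
weights = [0.0, 0.0]  # one copy of the parameter per replica
for _ in range(200):
    weights = mirrored_step(weights, shards, learning_rate=0.01)
print(weights)
```

Because every replica starts from the same value and applies the same averaged gradient, the copies never drift apart, so no parameter server is needed to hold the authoritative weights.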