At the latest Google I/O, seven TensorFlow-related talks were presented and several new TensorFlow features and functions were released. I think they can be treated as a supplement to the TensorFlow Dev Summit 2018 held last March. Here is a brief summary of the important features that I think DL developers should notice.
1. TensorFlow for JavaScript
(Video Link) (Homepage)
Supports JS for model training and deployment.
Advantages of In-browser ML:
- No driver / install
- Interactive
- Sensors
- Data stays on the client device
Capabilities:
- Author models directly in browser
- Import pre-trained model for inference
- Re-train imported models
Pipeline:
- Save: Keras model or TF SavedModel format accepted.
- Convert: tfjs-converter
  - Includes graph optimization
  - Optimizes weights for browser caching
  - Supports 32+ tf/keras layers and 90+ TF ops
Framework
Performance
Emoji Scavenger Hunt (iOS/Android)
2. TensorFlow in production: TF Extended, TF Hub, and TF Serving
1). TensorFlow Hub
(Homepage)
A library to foster the publication, discovery, and consumption of reusable parts of machine learning models, called modules.
Module Features
- each contains weights and graph
- composable, reusable (common signature), retrainable
Module usage
Instantiate a module from a file path or URL:
import tensorflow as tf
import tensorflow_hub as hub
m = hub.Module("path/to/a/module_dir", trainable=True, tags={"train"})  # file path
features = m(images)
logits = tf.layers.dense(features, NUM_CLASSES)
prob = tf.nn.softmax(logits)
or set TFHUB_CACHE_DIR and then create a module from a URL:
export TFHUB_CACHE_DIR=/my_module_path
m = hub.Module("https://tfhub.dev/google/progan-128/1")
Naming rules for a published module URL:
- tfhub.dev: repository URL
- google: module publisher
- progan-128: module name
- 1: module version
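The four parts of the naming rule above can be pulled apart with the standard library. This is only an illustrative sketch: `ModuleHandle` and `parse_module_url` are hypothetical names, not part of `tensorflow_hub`, and the code assumes every URL follows the `<repository>/<publisher>/<name>/<version>` layout shown above.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ModuleHandle(NamedTuple):
    """Hypothetical container for the four parts of a module URL."""
    repository: str
    publisher: str
    name: str
    version: int


def parse_module_url(url: str) -> ModuleHandle:
    """Split a tfhub.dev-style URL into repository, publisher, name, version."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 3:
        raise ValueError("expected <publisher>/<name>/<version> in " + url)
    publisher, name, version = parts
    return ModuleHandle(parsed.netloc, publisher, name, int(version))


handle = parse_module_url("https://tfhub.dev/google/progan-128/1")
print(handle)
```

Keeping the version as an integer makes it easy to compare two handles of the same module and pick the newer one.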
Integration with TensorFlow Estimator:
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
review = hub.text_embedding_column("review", "http://tfhub.dev/google/universal-sentence-encoder/1")
features = {"review": np.array(["an arugula masterpiece", "inedible shoe leather", ...])}
labels = np.array([[1], [0], ...])
input_fn = tf.estimator.inputs.numpy_input_fn(features, labels, shuffle=True)
estimator = tf.estimator.DNNClassifier(hidden_units, [review])
estimator.train(input_fn, max_steps=100)
Available Modules
- Industry standard: Inception, ResNet and Inception-ResNet
- Efficient: MobileNet
- Cutting edge: NASNet and PNASNet (NASNet-Large cost 62000+ GPU hours)
2). TensorFlow Serving
(Homepage)
A flexible, high-performance serving system for machine learning model deployment.
Features
- Multiple models: simultaneously; dynamic model loading/unloading
- Isolation: loading/serving threads for low latency during model version transition.
- High throughput: Dynamic request batching, performance conscious design.
Architecture
1. Servables: objects that clients use to perform computation
- Do not manage their own life cycle.
- Include TensorFlow SavedModelBundle and lookup tables for embedding or vocabulary lookups.
- Versions: one or more versions of a servable can be loaded concurrently (supports gradual rollout and experimentation).
- Streams: a sequence of versions of a servable, sorted by increasing version number.
- Models: represented as one or more servables. A composite model is either multiple independent servables or a single composite servable. A large lookup table can be sharded across many TensorFlow Serving instances.
2. Loaders: manage a servable’s life cycle; standardize the APIs for loading and unloading a servable, independent of the specific learning algorithm.
3. Sources: plugin modules that find and provide servables.
- Provide zero or more servable streams. For each stream, a Source supplies one Loader instance for each version of the servable.
- Discover servables from arbitrary storage systems (RPC etc.)
4. Aspired Versions: the set of servable versions that should be loaded and ready. When a Source gives a new list of aspired versions to the Manager, it supersedes the previous list for that servable stream. The Manager unloads any previously loaded versions that no longer appear in the list.
5. Managers: handle the full life cycle of servables: loading, serving, and unloading. They listen to Sources and track all versions, and may postpone loading if not ready, or postpone unloading until a newer version has loaded.
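The interaction between aspired versions and the Manager can be illustrated with a toy model. The sketch below is not the TensorFlow Serving API: `ToyManager`, `set_aspired_versions` and `fake_loader` are hypothetical names, and the policy of loading new versions before unloading old ones is an assumption that mirrors the "postpone unloading until a newer version has loaded" behaviour described above.

```python
class ToyManager:
    """Toy stand-in for a serving Manager; not the TensorFlow Serving API."""

    def __init__(self):
        # servable name -> {version number: loaded servable object}
        self.loaded = {}

    def set_aspired_versions(self, name, aspired, load_fn):
        """Replace the aspired-version list for one servable stream.

        New versions are loaded first; versions missing from the new list
        are unloaded only afterwards, so some version can always serve.
        """
        current = self.loaded.setdefault(name, {})
        for version in sorted(aspired):
            if version not in current:
                current[version] = load_fn(name, version)
        for version in sorted(set(current) - set(aspired)):
            del current[version]

    def latest(self, name):
        """Return the highest loaded version number of a servable."""
        return max(self.loaded[name])


def fake_loader(name, version):
    # Hypothetical Loader: a real one would read a SavedModel from storage.
    return "{}-v{}".format(name, version)


manager = ToyManager()
manager.set_aspired_versions("ranker", [1, 2], fake_loader)
manager.set_aspired_versions("ranker", [2, 3], fake_loader)
print(sorted(manager.loaded["ranker"]))  # [2, 3]
```

The second call supersedes the first list: version 3 is loaded, and version 1 is dropped because it no longer appears in the aspired list.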
NEW distributed serving use-case:
REST API: seamlessly serve ML in web/mobile RESTful microservices.
3). TensorFlow Extended
(Homepage)
TF Extended (TFX) is a TensorFlow-based general-purpose machine learning platform.
Features
- Flexible: continuous training and updating -> higher accuracy and faster convergence.
- Portable:
  - with TF
  - with Apache Beam: batch and streaming data processing
  - with Kubernetes/Kubeflow: deployment of machine learning
- Scalable: local <-> cloud
- Interactive: visualization
Architecture
Tools released
- TensorFlow Transform: consistent in-graph transformation in training and serving
- TensorFlow Model Analysis: scalable, sliced and full-pass metrics.
- TensorFlow Serving
- Facets: visualization of datasets
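The idea of a consistent transformation in training and serving, as listed for TensorFlow Transform, can be sketched in plain Python. This is a conceptual illustration only and does not use the tf.Transform API: `analyze_z_score` is a hypothetical helper that makes one full pass over the training data and returns a function, and that same function is then applied at serving time so both paths share the same statistics.

```python
import math


def analyze_z_score(values):
    """Full pass over the training data; returns a reusable transform."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        raise ValueError("cannot z-score a constant feature")

    def transform(x):
        # The statistics are frozen here, so training and serving agree.
        return (x - mean) / std

    return transform


training_values = [2.0, 4.0, 6.0, 8.0]
transform = analyze_z_score(training_values)

train_features = [transform(v) for v in training_values]  # training path
served_feature = transform(5.0)                           # serving path
print(train_features, served_feature)
```

Because the serving path never recomputes the mean or standard deviation, there is no training/serving skew from the preprocessing step.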
Pipeline
1. Use Facets to analyze data.
2. Use tf.Transform for feature transformation.
3. Train with TensorFlow Estimator.
4. Analyze the model with TensorFlow Model Analysis: sliced metrics.
5. Serve with TF Serving.
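To make the "sliced metrics" step of the pipeline concrete, here is a small sketch of computing accuracy per slice instead of only over the whole dataset. It does not use TensorFlow Model Analysis; `sliced_accuracy`, the `country` feature and the example records are all made up for illustration.

```python
from collections import defaultdict


def sliced_accuracy(examples, slice_key):
    """Accuracy computed separately for each value of `slice_key`."""
    totals = defaultdict(lambda: [0, 0])  # slice value -> [correct, count]
    for ex in examples:
        bucket = totals[ex[slice_key]]
        bucket[0] += int(ex["label"] == ex["prediction"])
        bucket[1] += 1
    return {key: correct / count for key, (correct, count) in totals.items()}


# Hypothetical evaluation records.
examples = [
    {"country": "US", "label": 1, "prediction": 1},
    {"country": "US", "label": 0, "prediction": 0},
    {"country": "DE", "label": 1, "prediction": 0},
    {"country": "DE", "label": 0, "prediction": 0},
]

overall = sum(e["label"] == e["prediction"] for e in examples) / len(examples)
by_country = sliced_accuracy(examples, "country")
print(overall, by_country)
```

The overall accuracy of 0.75 hides that one slice is at 1.0 and the other at 0.5, which is the kind of gap sliced metrics are meant to expose.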
3. TensorFlow High-Level API
1). Colab
(Tutorial)
An easy way to learn and use TensorFlow.
Workshops: some exercises
2). APIs
- tf.keras
- tf.data: easy input pipelines
- Eager execution: imperative interface to TensorFlow, enabled with one command:
tf.enable_eager_execution()
4. TensorFlow Lite for mobile developers
(Homepage)
Features
- Cross-platform
- Light: core interpreter size: 75 KB; with all ops: 400 KB
Architecture
1. Converter
- FlatBuffer based
- faster to mmap
- few dependencies
- pre-fused activation and biases
- weights truncation
2. Interpreter Core
- static memory plan
- static execution plan
- fast load-time
3. operation kernels
- kernels specifically optimized for NEON on ARM
4. Hardware acceleration delegation
- Direct GPU integration
- Android Neural Networks API, HVX
5. Quantized Training
- Fine-tune weights
- Estimate quantization parameters
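As a rough illustration of what "estimate quantization parameters" can mean, the sketch below derives a scale and zero point for 8-bit affine quantization from the observed minimum and maximum of some weights. This is an assumption about the scheme (real value = scale * (q - zero_point), range widened to include 0), not the actual TensorFlow Lite training code, and all function names are hypothetical.

```python
def estimate_quant_params(values, num_bits=8):
    """Estimate (scale, zero_point) from the observed value range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # widen the range so 0.0 is representable
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, num_bits=8):
    """Map a real value to an unsigned integer code, clamped to range."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate real value."""
    return scale * (q - zero_point)


weights = [-1.0, -0.25, 0.0, 1.5, 3.0]
scale, zero_point = estimate_quant_params(weights)
restored = [dequantize(quantize(w, scale, zero_point), scale, zero_point)
            for w in weights]
print(scale, zero_point, restored)
```

Each restored weight differs from the original by at most half a quantization step, which is the error that fine-tuning with quantization in the loop lets the model adapt to.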
ML Kit: a newly announced machine learning SDK that exposes both on-device and cloud-powered APIs.
Ops and model support:
- ~50 common ops
- custom ops allowed
- currently limited to inference ops
- supported models: MobileNet, InceptionV3, ResNet50, SqueezeNet, DenseNet, InceptionV4, SmartReply, and quantized versions of MobileNet and InceptionV3
Usage:
Pipeline
1. Convert to TF Lite format:
- use a frozen GraphDef or SavedModel and avoid unsupported operators; write custom operators for any missing functionality.
- visualize the model to check it
from tensorflow.contrib.lite import convert_savedmodel
convert_savedmodel.convert(saved_model_dir="/path/to/model", output_tflite="model.tflite")
2. Write a custom op
TfLiteRegistration reg = {
  .invoke = [](TfLiteContext* context, TfLiteNode* node) {
    // write pi into the first element of the op's first output tensor
    TfLiteTensor* a = &context->tensors[node->outputs->data[0]];
    a->data.f[0] = M_PI;
    return kTfLiteOk;
  }
};
3. C++ API
- load model
std::unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
- register ops
tflite::ops::builtin::NeededOpsResolver minimal_resolver;
- build interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, minimal_resolver)(&interpreter);
- execution
// allocate tensor buffers before touching the inputs
interpreter->AllocateTensors();
// feed input
int input_index = interpreter->inputs()[0];
float* input = interpreter->typed_tensor<float>(input_index);
// … fill in the input
// run inference
interpreter->Invoke();
// read output
//…
- Python API
- Java API
- Android app Gradle file
- iOS CocoaPods
5. Distributed TensorFlow training
(Homepage) (DistributionStrategy)
1) Data parallelism
Async parameter server
Sync Allreduce Architecture: the next round of computation waits until all workers have received the updated gradients.
Ring Allreduce Architecture: fast
Use a parameter server when you have many less-powerful devices, such as CPUs; use Sync Allreduce when you have fast devices with strong communication links, such as GPUs and TPUs.
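To show why the ring variant of allreduce is attractive, here is a single-process simulation of the algorithm. It is a sketch of the general technique (a scatter-reduce phase followed by an all-gather phase), not TensorFlow's implementation; it assumes the gradient length is divisible by the number of workers, and `ring_allreduce` is a hypothetical name.

```python
def ring_allreduce(worker_grads):
    """Simulate ring allreduce (sum) over equally sized gradient lists."""
    n = len(worker_grads)
    size = len(worker_grads[0])
    assert size % n == 0, "sketch assumes length divisible by worker count"
    chunk = size // n
    data = [list(g) for g in worker_grads]

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    def exchange(chunk_for_worker, combine):
        # Snapshot all outgoing chunks first so sends happen "simultaneously".
        sends = []
        for w in range(n):
            c = chunk_for_worker(w)
            sends.append((c, [data[w][i] for i in span(c)]))
        for w, (c, payload) in enumerate(sends):
            dst = (w + 1) % n  # each worker only talks to its ring neighbour
            for offset, i in enumerate(span(c)):
                data[dst][i] = combine(data[dst][i], payload[offset])

    # Phase 1, scatter-reduce: afterwards worker w owns the full sum of
    # chunk (w + 1) % n.
    for step in range(n - 1):
        exchange(lambda w: (w - step) % n, lambda old, new: old + new)

    # Phase 2, all-gather: pass the finished chunks around the ring.
    for step in range(n - 1):
        exchange(lambda w: (w + 1 - step) % n, lambda old, new: new)

    return data


grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_allreduce(grads)
print(reduced)  # every worker ends with [111, 222, 333]
```

Each worker sends only one chunk per step to one neighbour, so the per-worker traffic stays roughly constant as more workers join the ring, which is why this layout suits devices with strong links.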
Input pipeline bottleneck
Solution: use the tf.data.Dataset API. Parallelize file reading and data transformation, and prefetch (dataset.prefetch(buffer_size=1)) to decouple the time data is produced from the time it is consumed (the next batch is prepared while the accelerator is still training).
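The decoupling that prefetching provides can be illustrated without TensorFlow. The sketch below is a plain-Python analogy, not how tf.data is implemented: a background thread fills a small bounded queue while the consumer works on the previous item. `prefetch` here is a hypothetical helper, and error handling in the producer is deliberately left out.

```python
import queue
import threading


def prefetch(iterable, buffer_size=1):
    """Yield items from `iterable`, produced ahead of time in a thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks once `buffer_size` items are waiting
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


# Hypothetical stream of batches standing in for parsed records.
batch_stream = ([i, i + 1] for i in range(0, 6, 2))
batches = list(prefetch(batch_stream, buffer_size=1))
print(batches)
```

With `buffer_size=1`, at most one batch is prepared ahead of the consumer, which matches the idea of preparing the next batch while the current one is being trained on.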
Initial pipeline with parallelization added:
def input_fn(batch_size):
    # file_pattern, parser_fn and NUM_EPOCHES are assumed to be defined elsewhere
    files = tf.data.Dataset.list_files(file_pattern)
    dataset = tf.data.TFRecordDataset(files, num_parallel_reads=40)  # number of CPUs
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.repeat(NUM_EPOCHES)
    dataset = dataset.map(parser_fn, num_parallel_calls=40)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(buffer_size=1)
    return dataset
Fused transformation ops
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=1000, count=NUM_EPOCHES))
dataset = dataset.apply(tf.contrib.data.map_and_batch(parser_fn, batch_size))
Multi-machine distributed training
Use the Estimator train_and_evaluate API, which uses the async parameter server approach.
2). Model parallelism
3). Scaling to multiple GPUs in TensorFlow
distribution = tf.contrib.distribute.MirroredStrategy()  # mirrored strategy for multi-GPU distribution
run_config = tf.estimator.RunConfig(train_distribute=distribution)
classifier = tf.estimator.Estimator(
    model_fn=model_function,
    model_dir=model_dir,
    config=run_config)
classifier.train(input_fn=input_function)
Mirrored Strategy:
Implements the Sync Allreduce architecture, with model parameters mirrored across devices.
- no change to the model or training loop
- no change to the input function (requires the tf.data.Dataset API)
- seamless checkpoints and summaries
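What "mirrored" means in practice can be shown with a tiny simulation. The sketch below is not `MirroredStrategy` itself: it assumes a one-parameter model y = w * x, splits each global batch evenly across replicas, averages the per-replica gradients as a stand-in for allreduce, and applies the same update to every copy. All names and the toy data are hypothetical.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def mirrored_step(replica_weights, global_batch, lr):
    """One synchronous step: local gradients, averaged, identical update."""
    n = len(replica_weights)
    shard = len(global_batch) // n
    grads = [
        gradient(replica_weights[r], global_batch[r * shard:(r + 1) * shard])
        for r in range(n)
    ]
    mean_grad = sum(grads) / n  # stand-in for the allreduce across devices
    return [w - lr * mean_grad for w in replica_weights]


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # generated by y = 2x
weights = [0.0, 0.0]  # one mirrored copy of w per replica
for _ in range(50):
    weights = mirrored_step(weights, data, lr=0.05)
print(weights)
```

Because every replica starts from the same value and applies the same averaged gradient, the copies never drift apart, which is why the model and training loop need no changes.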
Support for TPU, the Keras API, and multi-machine MirroredStrategy is in progress.