Tensorflow — day 1+2: High level APIs

Nguyễn Văn Lĩnh · Published in datatype · 7 min read · Feb 15, 2018

After two days of getting started with the Tensorflow framework, I am writing a recap of my experience. Of course, it follows the official Tensorflow tutorial, but I try to add figures and highlight the important points so that newcomers like me can understand things more easily.

Installing the simple way

Anaconda2 + Python 2.7 is my choice, since installing inside Anaconda creates an isolated environment: updating or removing packages will not affect the system Python libraries. Moreover, I also get to enjoy the newest Jupyter version.

Currently, the default version of Tensorflow is 1.4. In order to run all of the official Tensorflow tutorials, we need Tensorflow 1.5, so just upgrade.

In my case (Ubuntu, Python 2.7, GPU, Tensorflow 1.4 already installed by default with Anaconda2), it is quite simple: one command.

pip install --ignore-installed --upgrade \
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp27-none-linux_x86_64.whl

However, while CUDA 9.1 is the latest driver version, Tensorflow 1.5 can only use CUDA 9.0. Remember to downgrade/relink CUDA and recheck with

nvcc --version
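
After the upgrade, a quick sanity check (a minimal sketch, assuming the install above succeeded and a GPU driver is present) is to ask Tensorflow for its version and the GPU device it sees:

import tensorflow as tf

print(tf.__version__)             # should report 1.5.0 after the upgrade
print(tf.test.gpu_device_name())  # e.g. '/device:GPU:0' if CUDA is linked correctly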

Follow the first tutorial on the official Tensorflow website

I follow this link

However, it is a little tedious to learn a new framework (and coding style) by just running it from the command line and then starting to read the source code.

Briefly, this tutorial shows how to classify 3 types of iris flowers (Setosa, Versicolor, Virginica) in the legendary Iris dataset. The data contains 120 training and 30 testing samples. Each sample is described by 4 features: sepal length, sepal width, petal length, and petal width. Finally, the classifier is a dense network with two hidden layers.

This post tries to give a simple explanation of the provided source code. Of course, it is written with the high-level Tensorflow APIs, so we will get to know the premade estimator and compare it with the custom estimator coding approach.

Tensorflow notices

Firstly, Tensorflow frameworks DEFINE FIRST, RUN LATER (see the short sketch after the list below).

  • All variable definitions and computational operators need to be defined first. Calling them does not return values as it would in Numpy or plain Python.
  • The execution step then feeds the data into the model definition, runs it, and returns the output.
  • In the high-level API, you do not need to specify the explicit number of training samples or the matrix/tensor sizes in the neural network layers. Tensorflow arranges all of this automatically. You only need to provide the hyperparameters, such as the batch size, the number of training steps, and the number of hidden units.
  • All input data needs to be transformed/cast into the “tensor” form. Luckily, tf.feature_column also provides functions to transform categorical and other kinds of input features into numeric/hashed data.
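
A minimal sketch of this define-first, run-later style, using the classic graph-and-session API of Tensorflow 1.x:

import tensorflow as tf

# Define first: these lines only build graph nodes, nothing is computed yet.
a = tf.constant(3.0)
b = tf.constant(4.0)
total = a + b  # a Tensor object, not the number 7.0

# Run later: the session executes the graph and returns the actual value.
with tf.Session() as sess:
    print(sess.run(total))  # 7.0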

Secondly, there are 3 core structures that turn the raw input into tensors and carry them through the model:

  • tf.data.Dataset: takes the raw data and produces the “Tensor” datatype.
  • tf.feature_column: a general input feature handle that transforms many kinds of input data and informs the machine learning model of their types, so the model itself stays unchanged when handling new input data (a small sketch follows this list).
  • tf.estimator: defines the model and performs the train/evaluate/predict steps.
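
As a hedged sketch of the tf.feature_column point (the 'color' feature here is hypothetical, not part of the Iris data):

import tensorflow as tf

# A hypothetical categorical feature described by its vocabulary.
color = tf.feature_column.categorical_column_with_vocabulary_list(
    key='color', vocabulary_list=['red', 'green', 'blue'])

# Dense models such as DNNClassifier need dense inputs, so the categorical
# column is wrapped as a one-hot indicator column.
color_one_hot = tf.feature_column.indicator_column(color)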

If we define Food = Data, Box = Tensor, and Food type = data type, then this figure describes the relation among them.

The relation of the 3 main Tensorflow structures

[premade_estimator.py] Analysing the program flow

[premade_estimator.py] Running flow chart

There are 3 main steps; clearly, only tf.estimator is involved in two of them.

  • Data pipeline
  • Model definition
  • Model execution

Take a look at the data pipeline: after getting the whole data as tuples, we need to process it into training data (train_input_fn()) or testing data (eval_input_fn()).

(train_x, train_y), (test_x, test_y)
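
In premade_estimator.py this tuple comes from the helper in iris_data.py; a sketch of the call, as in the tutorial:

import iris_data

# Fetch the data as two (features, labels) pairs.
(train_x, train_y), (test_x, test_y) = iris_data.load_data()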

The train_input_fn() takes (train_x, train_y, args.batch_size) as input. Obviously, tf.data.Dataset.from_tensor_slices plays the most important role: it creates a TensorSliceDataset from a dict of features (train_x). As the training data has only 120 samples, we need to repeat it to get enough data for the batches.

# Source: https://github.com/tensorflow/models/blob/master/samples/core/get_started/iris_data.py
def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)

    # Return the dataset.
    return dataset

The eval_input_fn() is almost the same, except it needs to handle whether the data labels are available or not. This covers both predicting labels for new data and plain testing (where the labels are of course provided). As there is no training in this step, there is no need to shuffle or repeat the data; it is only batched.

# Source: https://github.com/tensorflow/models/blob/master/samples/core/get_started/iris_data.py
def eval_input_fn(features, labels, batch_size):
    """An input function for evaluation or prediction"""
    features = dict(features)
    if labels is None:
        # No labels, use only features.
        inputs = features
    else:
        inputs = (features, labels)

    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    # Batch the examples.
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)

    # Return the dataset.
    return dataset

Model definition: the high-level APIs with premade models from Tensorflow make the model definition a simple job; we only need to do 2 steps:

  • Tell it the type of the data input, i.e. what is inside the boxes. Here the input is numeric data, so we just call tf.feature_column.numeric_column with the feature names. It looks a bit strange, but remember that we already built dict(features) in the data pipeline, which means the data is accessed by key name (e.g. train_data['PetalLength']). So, along with the feature type, we need to provide its name (or key). Notice this is only the formal definition; no data is passed yet.
# Feature columns describe what the input looks like.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
  • Call the premade Tensorflow classifier; we only need to pass my_feature_columns to tell it the feature types and names, plus the hyperparameter settings.
# Build 2 hidden layer DNN with 10, 10 units respectively.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)

The model training part is also short with a premade model; the lambda notation allows us to pass input variables to a function (iris_data.train_input_fn) inside a function call (classifier.train). The actual data flows through this step into the model definition and sets the whole thing running.

# Train the Model.
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)

The evaluation part is elegant as well: just call classifier.evaluate.

# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_x, test_y, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
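
The same eval_input_fn (called with labels=None) also serves prediction on new samples. A sketch with purely illustrative feature values:

# Hypothetical new flowers to classify (values are illustrative only).
predict_x = {
    'SepalLength': [5.1, 5.9],
    'SepalWidth':  [3.3, 3.0],
    'PetalLength': [1.7, 4.2],
    'PetalWidth':  [0.5, 1.5],
}

predictions = classifier.predict(
    input_fn=lambda: iris_data.eval_input_fn(predict_x, labels=None,
                                             batch_size=args.batch_size))

for pred in predictions:
    class_id = pred['class_ids'][0]                # index of the most likely class
    probability = pred['probabilities'][class_id]  # its softmax probability
    print(class_id, probability)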

Premade estimator vs custom estimator

Finally, the custom estimator example helps us understand the whole tf.estimator procedure above. We continue following the official Tensorflow document.

[custom_estimator.py] Variables flowchart
  1. Build network structure
def my_model(features, labels, mode, params):
    # Use `input_layer` to apply the feature columns.
    net = tf.feature_column.input_layer(features, params['feature_columns'])
    for units in params['hidden_units']:
        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)
  • tf.feature_column.input_layer builds the input layer from the actual data features and the data “schema” (params['feature_columns'], i.e. our my_feature_columns). It creates the first layer of net, holding only the input data (how params is wired up is sketched after this list).
  • net = tf.layers.dense(net, …) adds another layer on top of the current net. As a high-level API, it automatically arranges the weight matrices and their initialization.
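
For context, a sketch of how this model function and its params dict are wired together, following the structure of the tutorial:

# The custom model function is passed to the generic Estimator,
# together with the params dict that my_model reads from.
classifier = tf.estimator.Estimator(
    model_fn=my_model,
    params={
        'feature_columns': my_feature_columns,
        # Two hidden layers of 10 nodes each.
        'hidden_units': [10, 10],
        # The model must choose between 3 classes.
        'n_classes': 3,
    })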

2. Data forwarding

# Compute logits (1 per class).
logits = tf.layers.dense(net, params['n_classes'], activation=None)
  • Call tf.layers.dense again to get the network output after forwarding the input data through the layers. Notice that the logits are raw numbers, not the class labels.

3. Prediction, evaluation

These parts are activated by the mode, e.g.:

assert mode == tf.estimator.ModeKeys.TRAIN

because we need to define all the steps ourselves, yet inside one function, the source code has to be organized into mode-specific parts.
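
As a sketch of one such part (following the tutorial's structure; predicted_classes is defined just below), the PREDICT branch packs the forwarded results into an EstimatorSpec and returns early:

if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {
        'class_ids': predicted_classes[:, tf.newaxis],
        'probabilities': tf.nn.softmax(logits),
        'logits': logits,
    }
    return tf.estimator.EstimatorSpec(mode, predictions=predictions)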

Indeed, since the data forwarding and the output come first, we can reuse this result for both the prediction and the evaluation tasks.

predicted_classes = tf.argmax(logits, 1)
  • Convert the logits into classes: each sample has 3 logits (one per Iris class), and the class with the highest logit is the predicted class.
  • From the logits we also define the loss function by comparing with the input labels. The loss is the criterion for training: it drives the gradients and backpropagation.
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
  • We still need to define the evaluation metric to know the performance of the current model on the data.
# Compute evaluation metrics.
accuracy = tf.metrics.accuracy(labels=labels,
                               predictions=predicted_classes,
                               name='acc_op')
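
A sketch of how this metric is returned in EVAL mode, again following the tutorial's structure:

metrics = {'accuracy': accuracy}
if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)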

4. Training network

After defining the loss, call a tf.train optimizer and minimize the loss :)

# Create training op.    
assert mode == tf.estimator.ModeKeys.TRAIN
optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
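
The model function then returns the loss and the training op wrapped in an EstimatorSpec, as in the tutorial:

return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)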

The rest of the source code is quite clear; you can recall it by looking at the figure.

Recap

  • Tensorflow takes a longer time to learn, and reading the source code takes effort. Nevertheless, it is more well structured, and the slightly “strange” define-first, run-later logic is more optimal in terms of the whole data processing pipeline.
  • The high-level APIs provide a lot of flexibility to interact with and transform the data, as well as to define and run the model.
  • It is also easier than Torch and PyTorch in terms of GPU/CUDA setup.
