A simple intro to Keras for TensorFlow 2
The Keras project’s aim was to create a simple programming interface for humans. The fact that TensorFlow now ships a version of Keras as part of its own library can be taken as evidence that this was a success.
This is a brief introductory guide to the Keras workflow and the three API styles for model design in particular. See this other post for a step-by-step tutorial.
The Keras workflow
In Keras, training an ML model should feel similar to using a scikit-learn classifier. The API provides methods to design, compile, fit and evaluate both simple and complex machine learning architectures. The beauty of Keras lies in its simplicity. The typical structure of a training process is:
- Design a model architecture
- Compile the model with an optimizer, loss and a choice of metrics
- Train the model for a given number of epochs
Let’s have a look at some real code to see this structure in action (from the TensorFlow website):
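A minimal sketch in the spirit of the TensorFlow beginner example — here with random stand-in data instead of a real dataset, so the three steps stay visible without a download:

```python
import numpy as np
import tensorflow as tf

# Stand-in data: 1000 random 28x28 "images" with labels 0-9
# (a real example would load a dataset such as MNIST here)
x_train = np.random.rand(1000, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# 1. Design a model architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# 2. Compile the model with an optimizer, loss and a choice of metrics
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 3. Train the model for a given number of epochs, then evaluate
model.fit(x_train, y_train, epochs=2, verbose=0)
loss, accuracy = model.evaluate(x_train, y_train, verbose=0)
```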
In the example above, a very simple model architecture was used. Depending on the level of complexity needed, there are three ways of designing a model (so-called API styles):
The Sequential API
In this approach, we first instantiate a tf.keras.models.Sequential object, then simply add all the layers needed to create the desired architecture. See the example below for a simple convolutional architecture.
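A sketch of such a sequential model, assuming 28×28 single-channel inputs (the layer sizes are chosen to match the summary shown further below):

```python
from tensorflow.keras import layers, models

model = models.Sequential()
# Input: 28x28 single-channel images
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.summary()
```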
After defining the input size, all subsequent sizes are inferred from the applied operations. In this example, the max-pooling layers halve the image size, whilst the convolutional layers shrink the x and y dimensions by ⌊kernel size / 2⌋ on each side (here one pixel per side), since no padding is used.
To verify the model structure, the model.summary() method can be used. It prints the following summary:
Model: "Keras Test CNN"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 3, 3, 64) 36928
_________________________________________________________________
flatten (Flatten) (None, 576) 0
_________________________________________________________________
dense (Dense) (None, 64) 36928
_________________________________________________________________
dense_1 (Dense) (None, 10) 650
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________
Looking at the “Output Shape” column, we can confirm our calculations, as each conv2d layer reduces the x and y dimensions by 2 (kernel size = 3) and each max_pooling2d layer halves them. Did you notice how the first dimension of the output shape is always None? This is the batch dimension; it is only determined by the batch size once data is fed to the model.
The last column, “Param #”, indicates the number of trainable parameters the layer adds to the model. This can be an important metric for estimating model complexity.
Although the sequential style is very convenient to use, it does not allow building more complex architectures such as residual networks. For more control, the functional API can be used.
The Functional API
Layers can be seen as functions or callables which take a tensor and return a tensor. Following this logic, a machine learning model can also be created by linking layers, feeding their return values into one another. By doing so, we create a directed graph along which the tensor data flows.
This allows the model designer to build residual connections, shared layers, and even multiple inputs or outputs. In the example below a simple residual network is defined.
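A minimal sketch of such a residual network in the functional style (the input shape, filter counts and head are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 3))

# Block 1: a single convolution
x = layers.Conv2D(32, 3, activation='relu', padding='same')(inputs)

# Block 2: two convolutions plus a residual (skip) connection
block_input = x
x = layers.Conv2D(32, 3, activation='relu', padding='same')(x)
x = layers.Conv2D(32, 3, activation='relu', padding='same')(x)
x = layers.add([x, block_input])  # sum of the block's input and its conv output

x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
```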
As you can see, the output of the residual block is the sum of its input and the result of the two convolutional layers. This would not be possible to implement with the sequential approach.
Note how a keras.Input object is defined, reminiscent of the old tf.placeholder from TensorFlow 1. In the end, the model instance is constructed from the input layer and the output layer, which sits at the leaf of the defined graph.
Subclassing
The most flexible option is subclassing. This OOP pattern allows you to implement your own layers and modify every behaviour of your model. This is done by writing a class that extends keras.layers.Layer and implements the necessary methods (typically __init__, build and call).
You can combine subclassing with a sequential or functional API style!
After implementing a custom layer, you can simply chain it into your functional or sequential style model and use the default methods for convenience!
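As a sketch, a hand-rolled dense layer could look like this (MyDense is a hypothetical name, and the sizes are illustrative):

```python
import tensorflow as tf
from tensorflow import keras

class MyDense(keras.layers.Layer):
    """A custom fully connected layer built by subclassing keras.layers.Layer."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily, once the input shape is known
        self.w = self.add_weight(shape=(int(input_shape[-1]), self.units),
                                 initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

# The custom layer chains into a sequential-style model like any built-in layer
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    MyDense(64),
    keras.layers.Dense(10, activation='softmax'),
])
```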
Conclusion
All in all, the Keras workflow makes it easy to design, train and evaluate models in TensorFlow. With the use of the functional API and subclassing, more complicated architectures can be realized.
Thanks for reading! If you enjoyed this, please clap a few times :)