GSoC 2018 — VAES III: Summary
This post briefly summarizes the work done during the GSoC period. It serves as a report of the implementations and a record of the discussions I had with my mentors. I hope it will help future contributors see what is done and what is yet to be completed.
As discussed in the previous post, Variational Auto Encoders require the encoded representations to be “close” to each other for meaningful samples to be generated. We enforce this by adding a KL divergence term to the loss function that penalizes the distribution of the encoded representation when it deviates from the standard normal distribution. For numerical reasons, the output mean and standard deviation vectors are assumed to be stored as natural logarithms. The forward function is fairly straightforward. The backward pass function had to be overloaded because gradients flow through two separate vectors. The overloaded function can be reused for other similarity measures. For more details, refer to Part II.
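To make the two-gradient point concrete, here is a minimal stand-alone sketch of the KL term against a standard normal. It is not the TMVA implementation: the function names are hypothetical, and it assumes the mean is stored as-is while only the standard deviation is stored as its natural logarithm, which may differ from the convention described in Part II.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper names; this is not the TMVA API.
// Assumption: `mean` holds the plain mean, `logStd` holds log(sigma).

// Forward pass: KL( N(mean, sigma^2) || N(0, 1) ), summed over all components.
double KLDivergenceForward(const std::vector<double>& mean,
                           const std::vector<double>& logStd)
{
    assert(mean.size() == logStd.size());
    double kl = 0.0;
    for (std::size_t i = 0; i < mean.size(); ++i) {
        const double variance = std::exp(2.0 * logStd[i]);
        kl += 0.5 * (mean[i] * mean[i] + variance - 2.0 * logStd[i] - 1.0);
    }
    return kl;
}

// Backward pass: the loss depends on two vectors, so two gradients come out.
void KLDivergenceBackward(const std::vector<double>& mean,
                          const std::vector<double>& logStd,
                          std::vector<double>& gradMean,
                          std::vector<double>& gradLogStd)
{
    assert(mean.size() == logStd.size());
    gradMean.resize(mean.size());
    gradLogStd.resize(logStd.size());
    for (std::size_t i = 0; i < mean.size(); ++i) {
        gradMean[i] = mean[i];                            // d KL / d mean
        gradLogStd[i] = std::exp(2.0 * logStd[i]) - 1.0;  // d KL / d log(sigma)
    }
}
```

Both gradients vanish when the mean is 0 and log(sigma) is 0, which is exactly the standard normal the penalty pulls towards.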
The second task I took up was the creation of Auto Encoders in TMVA. I defined a new class, MethodAE, which handles their creation. With MethodAE, the architecture of the network can be defined really easily.
Internally, a single TDeepNet object is created. One to-do is decoupling the Encoder and Decoder from the DeepNet object. This would make porting and inference easier, since the end goal is a generative model.
Refactor and Regression support
While creating MethodAE, I also found a couple of functions duplicated across the Method classes. These functions parse the training strategy string, so I moved them to a new file called StringUtils.h. The functions sit under the TMVA namespace.
I also added regression support for MethodDL and MethodAE. The missing GetMultiClassValues() functions were the cause of the exploding loss when training auto encoders. Two PRs have been submitted, one for the ROOT repository and the other for the tmvadnn repository.
Allowing arbitrary padding in convolution causes trouble during backpropagation, especially when the stride is greater than 1. All cases of backpropagation/transposed convolution can be covered if we allow only fixed padding types such as SAME and VALID, as TensorFlow does. However, arbitrary padding is still necessary if the user wants to retain the spatial scale or to concatenate tensors of different dimensions. So, I’ve implemented a padding layer that takes 4 arguments: the left, top, bottom and right paddings.
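The layer's own definition syntax is not reproduced here. As a stand-in, the following sketch shows what the padding operation does to the data: a row-major matrix is copied into a larger, zero-filled one. The function name `ZeroPad2D` and its signature are hypothetical and are not taken from the TMVA code; only the argument order (left, top, bottom, right) follows the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone helper, not the TMVA PaddingLayer2D code.
// `input` is a rows x cols matrix stored row-major in a flat vector.
std::vector<double> ZeroPad2D(const std::vector<double>& input,
                              std::size_t rows, std::size_t cols,
                              std::size_t left, std::size_t top,
                              std::size_t bottom, std::size_t right)
{
    assert(input.size() == rows * cols);
    const std::size_t paddedRows = rows + top + bottom;
    const std::size_t paddedCols = cols + left + right;

    // Start from an all-zero matrix of the padded size.
    std::vector<double> padded(paddedRows * paddedCols, 0.0);

    // Copy every input element to its shifted position.
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            padded[(r + top) * paddedCols + (c + left)] = input[r * cols + c];
        }
    }
    return padded;
}
```

For example, padding a 2x2 input with left = 1, top = 1, bottom = 0 and right = 2 gives a 3x5 matrix whose first row is all zeros.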
Currently, the matrix is copied into a padded matrix. This copy is a bottleneck, and more efficient implementations exist. One option would be to preallocate the padded matrix and manipulate pointers so that they point to either the “raw” input or the padded input. This is tricky for two reasons. First, all the data is stored as a flattened version of the matrix, so the pre-filled zeros sit between the indices of the raw input in the 1D array, which makes maintaining pointers to the raw input data hard. Second, since PaddingLayer2D inherits from GeneralLayer, it has its own copies of the data, and sharing data between two objects in the current design can be a little convoluted.
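One possible shape for the preallocation idea is sketched below. This is only an illustration under my own assumptions, not a design that exists in TMVA: the class `PaddedBuffer` and its methods are hypothetical. The padded storage is allocated and zero-filled once, and the raw input is addressed through an offset and a row stride into that same storage, so no second copy is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch; none of these names exist in TMVA.
class PaddedBuffer {
public:
    PaddedBuffer(std::size_t rows, std::size_t cols,
                 std::size_t left, std::size_t top,
                 std::size_t bottom, std::size_t right)
        : fRows(rows), fCols(cols), fLeft(left), fTop(top),
          fPaddedCols(cols + left + right),
          fData((rows + top + bottom) * (cols + left + right), 0.0) {}

    // Element (r, c) of the raw, unpadded input. Rows of the raw input are
    // not contiguous in the flat array: padding zeros sit between them.
    double& Raw(std::size_t r, std::size_t c)
    {
        assert(r < fRows && c < fCols);
        return fData[(r + fTop) * fPaddedCols + (c + fLeft)];
    }

    // Flat, row-major storage of the whole padded matrix.
    const std::vector<double>& Padded() const { return fData; }
    std::size_t PaddedCols() const { return fPaddedCols; }

private:
    std::size_t fRows, fCols, fLeft, fTop, fPaddedCols;
    std::vector<double> fData;
};
```

Writing through `Raw(r, c)` fills the interior of the padded matrix directly. The index arithmetic in `Raw` is the price: the raw rows are separated by padding zeros in the 1D array, which is the first difficulty mentioned above.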
Future Work and Challenges
There’s a fair amount of work yet to be done to fully support generative models in TMVA. Firstly, a transposed convolution layer has to be implemented. Although the math is a little “messed” up for transposed convolutions, they can be implemented as direct convolutions with some padding between the input elements. If a direct convolution has a stride greater than 1, it’s reasonable to think of its transpose as having a stride less than 1. To realize a stride less than 1, we space out the input elements. This again presents an implementation problem: padding with zeros between every input element and copying the result into a new matrix is inefficient, and performance would take a huge hit if done naively. Nevertheless, it’s a good idea to start with the naive implementation and then optimize it. Transposed convolutions are useful for backprop in convolutions, VAEs and GANs. I plan to continue contributing to TMVA after GSoC ends, and transposed convolutions are at the top of the to-do list.
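The equivalence described above can be checked in one dimension. The sketch below is a naive illustration, not TMVA code, and both function names are hypothetical. The first function is the usual scatter definition of a transposed convolution. The second spaces out the input with zeros, pads it, and runs a plain stride-1 convolution with the kernel flipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D helpers for illustration; not TMVA code.

// Reference definition: each input element scatters a scaled copy of the kernel.
std::vector<double> TransposedConv1DReference(const std::vector<double>& x,
                                              const std::vector<double>& w,
                                              std::size_t stride)
{
    assert(!x.empty() && !w.empty() && stride >= 1);
    std::vector<double> y((x.size() - 1) * stride + w.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = 0; j < w.size(); ++j) {
            y[i * stride + j] += x[i] * w[j];
        }
    }
    return y;
}

// Same result via a direct stride-1 convolution on a zero-spaced input.
std::vector<double> TransposedConv1DViaDirect(const std::vector<double>& x,
                                              const std::vector<double>& w,
                                              std::size_t stride)
{
    assert(!x.empty() && !w.empty() && stride >= 1);
    const std::size_t k = w.size();
    const std::size_t spacedSize = (x.size() - 1) * stride + 1;

    // Naive step: copy the input into a larger buffer with (stride - 1) zeros
    // between neighbouring elements and (k - 1) zeros on each side.
    std::vector<double> spaced(spacedSize + 2 * (k - 1), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        spaced[(k - 1) + i * stride] = x[i];
    }

    // Stride-1 sliding window with the kernel flipped.
    std::vector<double> y(spacedSize + k - 1, 0.0);
    for (std::size_t o = 0; o < y.size(); ++o) {
        for (std::size_t j = 0; j < k; ++j) {
            y[o] += spaced[o + j] * w[k - 1 - j];
        }
    }
    return y;
}
```

The `spaced` buffer is the inefficient copy mentioned above: most of its entries are zeros, and the inner loop still multiplies by them.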
Another thing that’s vital to the future of the DNN module is a graph-like layer design. Currently, all the layers are contiguous (each layer stacked on the other). Residual connections are used in many architectures today. Such connections require a design that supports a graph-like layer structure, so that a layer can take multiple inputs or feed multiple layers. Implementing this would lead to a more elegant and “permanent” solution to the encoder layer in VAEs.
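As a rough illustration of what “graph-like” could mean here, the sketch below stores, for every node, the indices of the nodes that feed it. It is a toy under my own assumptions and not a proposal taken from the TMVA code base: `Tensor`, `Node` and `Forward` are hypothetical names, and the nodes are assumed to be listed in topological order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Tensor = std::vector<double>;

// Hypothetical node type: an operation plus the indices of the nodes feeding it.
struct Node {
    std::vector<std::size_t> inputs;  // earlier nodes; empty means "use the graph input"
    std::function<Tensor(const std::vector<Tensor>&)> op;  // combines all inputs
};

// Evaluate the nodes in order and return the output of every node.
// Assumes the nodes are already topologically sorted.
std::vector<Tensor> Forward(const std::vector<Node>& graph, const Tensor& input)
{
    std::vector<Tensor> outputs;
    outputs.reserve(graph.size());
    for (std::size_t n = 0; n < graph.size(); ++n) {
        std::vector<Tensor> args;
        if (graph[n].inputs.empty()) {
            args.push_back(input);
        }
        for (std::size_t src : graph[n].inputs) {
            assert(src < n);  // only earlier nodes may be referenced
            args.push_back(outputs[src]);
        }
        outputs.push_back(graph[n].op(args));
    }
    return outputs;
}
```

A residual connection is then just a node whose `inputs` lists both a layer's output and that layer's own input, with an `op` that adds them.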
GSoC 2018 with CERN was amazing, all things considered. It’s such a rare opportunity to contribute to a deep learning library that is in its early stages of development. I learned a lot about the numerical computation issues when dealing with large tensors. I realized that design is one of the most important parts of development. I am looking forward to continued contributions to TMVA and all the learning that it offers.
A huge shout-out to my mentors Kim Albertsson and Lorenzo Moneta for guiding me throughout the development process and helping me out with debugging and other technical issues. This project would not have been possible without their help. I know it’s a clichéd thing to say, but it’s true.
I would also like to thank Manos Stergiadis for his help in decoding the convolution layer. Shout-out to my teammates Harshit, Ravi Kiran and Anushree.
The links to all my PRs are listed here:
KL Divergence: https://github.com/tmvadnn/root/pull/9
Regression Support (tmvadnn): https://github.com/tmvadnn/root/pull/13
Regression Support (ROOT): https://github.com/root-project/root/pull/2278
Padding Layer: https://github.com/tmvadnn/root/pull/15
Thank you :)