Demystifying Tree Convolution Networks for Query Plans

Johan Kok
5 min read · Oct 1, 2020


Image depicting multiple layers of tree convolution in a model. All puns intended.

TL;DR

Tree convolution networks are fantastic for learning representations from the logical plans of standard SQL queries. This post covers their concepts and shows how we can implement them on Tensorflow.

In his presentation at AIDB @ VLDB 2019, Ryan Marcus introduced the notion of an inductive bias and explained why deep learning networks with a strong inductive bias tend to perform better than their fully connected (FC) counterparts. The key idea is that inductive bias is introduced through the choice of network design. If we want a network to perform well on an image recognition task, placing convolution layers before our FC layers forces the FC layers to learn generalizable functions from the convolved data. Without this bias, our FC layers are prone to overfitting and will perform poorly on unseen data.

A good inductive bias is pivotal to the success of a wide range of deep learning models. We’ve covered the standard convolution-based networks, but there are also recurrent networks, attention-based networks and … surprise surprise … tree convolution networks (TCNNs). TCNNs shine at tasks that require learning representations from tree-like structures. For example, we can represent source code as abstract syntax trees, or SQL queries as logical plans. If we want a deep learning model that can associate a given query with certain runtime performance, TCNNs are a good direction to explore.

The Theory

Mou et al. [1] presented the anchoring paper for TCNNs. What sets TCNNs apart is the use of triangular kernels, rather than the rectangular kernels of image-based convolution, which are well suited to capturing hierarchical arrangements between parent and child nodes. Various works, such as Marcus et al. [2], have extended this concept by exploring how TCNNs can complement reinforcement learning algorithms for query plan selection. That paper has been the focal point of my recent work in developing pipelines for systematic query-resource forecasting and resource provisioning in the cloud.

In [3], Marcus et al. demonstrated how TCNNs can be used for cost prediction with Neo, an end-to-end pipeline that takes in an encoded query plan and returns a cost estimate.

Tree convolutions are nothing out of the ordinary. They share very similar principles with standard image-based or graph convolution: each presents the network with an inductive bias that teaches it to learn features from the spatial / relational patterns within the data. Yet TCNNs are especially important for models that must learn good representations of query plans, since the choice of operator ordering and execution within a plan correlates strongly with the final runtime performance of a query.

Implementing TCNNs

More attention will be devoted to explaining how we can implement TCNNs on popular deep learning frameworks. Literature on TCNNs is plentiful and I’ve cited some of it under references, so there is no point diving too deep into the theory here.

We know how TCNNs work … so … how do we get them working in our model? Last I checked, popular deep learning frameworks such as TensorFlow and PyTorch do not ship libraries for building models with TCNNs. They do, however, support standard convolution with rectangular kernels. That got me thinking whether we could complement these built-in features with some restructuring of the input data to bring TCNNs to life.

The biggest challenge is that each convolution layer shrinks the overall size of the tensor, whereas by the definition in [1], a TCNN needs to preserve the shape of the tensor after each convolution.

For now, I’ll present one approach that relies solely on TensorFlow’s tf.gather operator and some transformations of the data. We will use an N x K matrix as input, representing a query plan of N nodes and K features. By virtue of TCNNs, each transformation must return a matrix of the same N x K shape.

Step 1: Label-encode the query plan

We assign each node a unique integer (right). The new tree has the same structure as the original (left).

Starting with the original tree structure, convert it to a binary tree by adding null nodes to each parent with < 2 children.

Thereafter, traverse the tree and assign each node a unique integer. Reserve the value 0 for null nodes.
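
Here is a minimal sketch of this step. It assumes a hypothetical Node class with left and right attributes, where a missing child is simply None (playing the role of a null node), and it assigns labels breadth-first; any consistent ordering works, as long as the same order is used when stacking the embeddings in Step 3.

```python
from collections import deque

def label_encode(root):
    """Assign every real node a unique integer, breadth-first, starting at 1.
    Null children keep the reserved label 0."""
    labels, queue, next_label = {}, deque([root]), 1
    while queue:
        node = queue.popleft()
        if node is None:
            continue                       # padded null node, keeps label 0
        labels[node] = next_label
        next_label += 1
        queue.append(node.left)            # missing children are None
        queue.append(node.right)
    return labels
```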

Step 2: Build the reshape array

Next, navigate through the tree and stack each node 3-way. Reserve the top 3 indices for 0. We end up with a 3(N+1) x 1 vector.

TensorFlow’s tf.gather method has, in simplified form, the following signature:

tf.gather(params, indices, …) -> Tensor

This means that, by passing in an array of indices, we can reshape our input tensor into a new tensor that is primed for convolution. We build this array with a 3-way stacking process so that it can be passed through TCNN filters with a kernel size of 3 (since they are triangular).
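
As a quick illustration with a made-up two-feature tensor, gathering with a triple-structured index array lays the rows out in (parent, left child, right child) groups:

```python
import tensorflow as tf

# Toy tensor: row 0 is the reserved null node, row 1 the root, row 2 its left child.
features = tf.constant([[0., 0.],
                        [1., 1.],
                        [2., 2.]])

# Triples: [0, 0, 0] for the null node, [1, 2, 0] for the root (no right child),
# [2, 0, 0] for the leaf node #2.
indices = tf.constant([0, 0, 0, 1, 2, 0, 2, 0, 0])

print(tf.gather(features, indices))  # shape (9, 2): three (parent, left, right) groups
```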

We first set the top 3 entries to 0. We then begin with the root node #1 and create an entry of [1,2,0]. We then traverse node #1’s children. For node #2, we create a subsequent entry [2,4,0] (since node #2 only has a left child, we add a 0 for its right child). We repeat this process for all non-null nodes. As a result, we would expect our vector to be of shape 3(N+1) x 1, where N is the number of nodes in the original tree.
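
The sketch below builds such a reshape array. It assumes the label-encoded tree from Step 1 is stored as a dict mapping each node id to its (left, right) child ids, with 0 standing in for null children; that data structure is my own choice for illustration.

```python
def build_reshape_array(children):
    """children: {node_id: (left_id, right_id)}, with 0 marking a null child.
    Returns a flat list of indices of length 3 * (N + 1)."""
    indices = [0, 0, 0]                    # reserve the top triple for the null node
    for node_id in sorted(children):       # visit nodes in label (breadth-first) order
        left, right = children[node_id]
        indices.extend([node_id, left, right])
    return indices
```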

Step 3: Apply standard convolution with kernel size 3 & stride 3

Process of applying TCNN

Phase 1: Vertically stack the embeddings from the original tree in a breadth-first manner. Reserve the top row for 0s.

Phase 2: Apply tf.gather to the tensor with the reshape vector, as covered in Step 2. The reshape vector rearranges the rows of the original tensor by index.

Phase 3: Apply a standard convolution kernel with a stride of 3 and a kernel size of 3. The resultant matrix is condensed back to the shape of the original tensor. The value f refers to the number of convolution filters used.
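
Putting the three phases together, here is a minimal TensorFlow sketch of one tree-convolution layer. The function and variable names are my own, and a Keras Conv1D stands in for the standard convolution:

```python
import tensorflow as tf

def tree_conv_layer(node_features, reshape_indices, filters):
    """node_features: (N + 1, K) tensor, with row 0 all zeros (the null node).
    reshape_indices: int tensor of length 3 * (N + 1) holding [node, left, right] triples.
    Returns an (N + 1, filters) tensor, so further layers can be stacked."""
    # Phase 2: rearrange rows so every group of 3 consecutive rows holds a
    # parent followed by its left and right children.
    gathered = tf.gather(node_features, reshape_indices)           # (3 * (N + 1), K)

    # Phase 3: a kernel of size 3 with stride 3 sees exactly one
    # (parent, left, right) triple at a time -- the triangular kernel.
    gathered = tf.expand_dims(gathered, axis=0)                    # add a batch dimension
    out = tf.keras.layers.Conv1D(filters, kernel_size=3, strides=3,
                                 activation="relu")(gathered)      # (1, N + 1, filters)
    return tf.squeeze(out, axis=0)                                 # (N + 1, filters)
```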

Step 4: Reuse the reshape vector for each convolution layer

Repeat Step 3 for multiple layers of convolution in the model. We can reuse the reshape vector for each instance of convolution.
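
Because each layer returns one row per node (plus the null row), the same reshape vector works at every depth. A short usage sketch, reusing the tree_conv_layer function from the previous step (the layer widths are arbitrary):

```python
x = node_features                                   # (N + 1, K), stacked as in Phase 1
for filters in (64, 128, 256):                      # hypothetical layer widths
    x = tree_conv_layer(x, reshape_indices, filters)
# x now has shape (N + 1, 256); pool or flatten it before the fully connected layers.
```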

Open-source code

Lastly, I would like to share an implementation of TCNNs in PyTorch. The implementation covered in this article is very similar to the one in that code base. Do check out the link for more information.

References

[1] Mou, L., Li, G., Zhang, L., Wang, T., & Jin, Z. (2014). Convolutional neural networks over tree structures for programming language processing. arXiv preprint arXiv:1409.5718.

[2] Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. (2020). Bao: Learning to Steer Query Optimizers. arXiv preprint arXiv:2004.03814.

[3] Marcus, R., Negi, P., Mao, H., Zhang, C., Alizadeh, M., Kraska, T., & Tatbul, N. (2019). Neo: A learned query optimizer. arXiv preprint arXiv:1904.03711.


Johan Kok

Data scientist and ex data engineer at Grab Taxi, Singapore. Passionate about data science, programming and making ideas come to life.