# Poly-fit using TensorFlow and Go

## Problem Statement

Given a set of data points we want to find a polynomial that best describes it and do so using TensorFlow and Go.

## Description

There are several applications of polynomial fit (or poly-fit for short) in engineering and other disciplines. A typical scenario for using poly-fit is to identify trends in the data and make future predictions. It can also be used to de-trend the data by subtracting predicted value from the raw data points.

Poly-fit is very common and is supported by several higher level languages such as Matlab. This post, however, is about achieving the same using TensorFlow and Go. The primary reason to use TensorFlow is to leverage its strong matrix computation facilities. In conjunction with Go we can then achieve a binary that can be shipped with minimal dependencies.

Furthermore, many functions in TensorFlow seamlessly enable hardware acceleration and the system is portable across several platforms including edge-computing devices such as Raspberry PI.

Here is a matrix inversion benchmark performed using a binary produced by TensorFlow and Go. As you can see the GPU acceleration is automatically enabled and the same binary just runs faster in presence of a GPU.

Go has its own advantages for building software systems, however, I won’t go into those details here. Let’s now look at various components required to perform polynomial fit in TensorFlow and Go.

## TensorFlow Graph

One of the modes of operation in TensorFlow is the so-called deferred execution mode. A `graph`

is first built describing nodes and edges, where nodes denote the operation and edges denote the flow of the data between nodes. Execution of the graph is then done in the subsequent step, hence the name “deferred”.

This approach has several advantages. First, TensorFlow can run computations, defined in graph, in an optimized way depending on the hardware. Second, the graph is portable and language agnostic allowing one to design the graph in one language (say Python) and execute it in another (say Go).

So let’s define the graph with key inputs and output that we would need to perform polynomial fit. We can do this in Python since expressibility is high and makes designing graph easy. The math behind polynomial fit is described here. The `matlab`

code defined in the link easily translates to a TensorFlow graph. Let’s define key operations:

Matrix inverse:

import tensorflow as tf# inputs

buff = tf.placeholder(dtype=tf.float64, name="buffOp")

shape = tf.placeholder(dtype=tf.int64, name="shapeOp")# inverse of matrix

inv = tf.reshape(buff, shape=shape)

inv = tf.linalg.inv(inv)# output

inv = tf.reshape(inv, shape=[-1], name="invOp")

As you can see the steps above define matrix input in the form of `float64`

array along with the shape definition in the form of `int64`

array. Inverse is then defined over those inputs and finally an output is produced in the form of `float64`

array.

Similarly, we need a few more operations in the graph.

`QR`

decomposition:

# qr decomposition

qr = tf.reshape(buff_1, shape=shape_1)

q, r = tf.linalg.qr(qr, full_matrices=False)

q = tf.reshape(q, shape=[-1])

r = tf.reshape(r, shape=[-1])# output

tf.identity(q, name="qrdecomp_q_OP")

tf.identity(r, name="qrdecomp_r_OP")

Matrix multiplication:

# matrix multiply

mat_x = tf.reshape(buff_1, shape=shape_1)

mat_y = tf.reshape(buff_2, shape=shape_2)

mat_xy = tf.matmul(mat_x, mat_y)# output

tf.shape(mat_xy, out_type=tf.int64, name="mulShapeOp")

tf.reshape(mat_xy, shape=[-1], name="mulOp")

Finally, we can export such graph as a protocol buffer file:

`# finally save the graph to be used in Go code later`

graph = tf.Session().graph_def

tf.io.write_graph(graph, "./model", "graph.pb", as_text=False)

## Go Interface

Now that we have the graph, we can import it in Go code and deploy on systems as a binary. Let’s first define the interface:

`// Operator is a matrix operator`

**type **Operator **interface **{

io.Closer

New(buf []float64, size ...int) (*Mat, error)

Transpose(mat *Mat) (*Mat, error)

Mul(x, y *Mat) (*Mat, error)

Inv(mat *Mat) (*Mat, error)

Qr(mat *Mat) (*Mat, *Mat, error)

}

We define a 1:1 correspondence between nodes in the graph and the methods in Go interface. The implementation of `Operator`

interface will invoke TensorFlow Go API and feed the contents of `*Mat`

to the `placeholder`

values.

A matrix type is defined simply as:

**type **Mat **struct **{

Buf []float64

Size []int64

}

## Go Implementation

In order to implement the `Operator`

we need to access the nodes defined in the graph using `string`

literals. Accessing nodes via `string`

literals is often a source of nasty runtime errors, so it is best keep these private to a package and define once as constants.

**const **(

// inputs

*inputBuffer1 *= "buff1"

*inputShape1 *= "shape1"

*inputBuffer2 *= "buff2"

*inputShape2 *= "shape2"

// ops

*invOp *= "inv"

*qrOpQ *= "qrdecomp_q"

*qrOpR *= "qrdecomp_r"

*transposeOp *= "transposeOp"

*mulOp *= "mul"

*mulShapeOp *= "mulShape"

)

Now we have most of the pieces required to import and execute TF graph in Go. Various methods defined by `Operator`

interface can now be implemented using three steps:

- Convert input
`*Mat`

to tensors and prepare a`feed`

dictionary - Execute call to
`session`

run - Pack output into
`*Mat`

To convert `*Mat`

to tensors we use:

import(

tf "github.com/tensorflow/tensorflow/tensorflow/go"

)func(op *Op) Mul(x, y *Mat) (*Mat, error) {

bufT, err := tf.NewTensor(x.Buf)

iferr != nil {

returnnil, err

}

...

}

Matrices can then be collected using a map:

`feeds := make(`**map**[tf.Output]*tf.Tensor)

feeds[op.GetGraph().Operation(*inputBuffer1*).Output(0)] = bufT

And fed to TF session as follows:

`out, err := op.GetSession().Run(`

feeds,

[]tf.Output{

op.GetGraph().Operation(*mulOp*).Output(0),

op.GetGraph().Operation(*mulShapeOp*).Output(0),

},

nil,

)

The steps above were shown only to communicate the workflow. Full code can be found online here.

## PolyFit Implementation

Finally, we can now begin to put together higher level code for performing polynomial fit on input data. The function signature could look something as follows:

`// Fit fits a polynomial of order order on input data x and y.`

**func **Fit(x, y []float64, order int) ([]float64, error) {...}

In order to implement it we have to setup a system of linear equations based on the order of the fit. This is done by expanding input `x`

as a matrix `[x^n, x^n-1, ..., 1]`

for every observation in `x`

.

`m := make([]float64, len(x)*(order+1))`

k := 0

**for **i := **range **x {

**for **j := order; j >= 0; j-- {

m[k] = math.Pow(x[i], float64(j))

k++

}

}

We can then build a matrix `M`

from array `m`

. `op`

implements the `Operator`

interface.

`M, err := op.New(m, len(x), order+1)`

Then perform following operations to arrive at polynomial coefficients `C`

. I am ignoring errors for this post, but error checking should be performed at each step. You can relate these steps to the ones shown in this document.

`Y, _ := op.New(y, len(y), 1)`

Q, R, _ := op.Qr(M)

R, _ = op.Inv(R)

Q, _ = op.Transpose(Q)

Z, _ := op.Mul(Q, Y)

C, _ := op.Mul(R, Z)

## Evaluating Polynomial

Polynomial evaluation is very straightforward. We simply need to iterate over all the coefficients raising input elements to appropriate power.

`// Val evaluates input x over polynomial p.`

**func **Val(p, x []float64) []float64 {

y := make([]float64, len(x))

**for **i := **range **x {

**for **j := **range **p {

y[i] += p[j] * math.Pow(x[i], float64(len(p)-j-1))

}

}

**return **y

}

## Building Binary

Finally let’s look at a binary produced with all this code in it. We went through all these steps to be able to arrive at a binary that is portable, shippable and has as few dependencies as possible. Running `ldd`

on the binary shows following dependencies:

`ubuntu-64 $ ldd a.out`

linux-vdso.so.1 (0x00007ffca97c8000)

libtensorflow.so => /usr/local/lib/libtensorflow.so (0x00007f4e1beef000)

libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4e1bcd0000)

libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4e1b8df000)

libtensorflow_framework.so => /usr/local/lib/libtensorflow_framework.so (0x00007f4e1ac0e000)

libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4e1aa0a000)

libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4e1a66c000)

librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4e1a464000)

libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f4e1a0db000)

libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4e19ec3000)

/lib64/ld-linux-x86-64.so.2 (0x00007f4e22a49000)

We achieved a binary that has external dependencies on two dynamically linked TensorFlow libraries: `libtensorflow.so`

and `libtensorflow_framework.so`

! As long as version of these `.so`

files matches the TF API used at the time of building the graph we are good.

Similarly, the system is now also portable on ARM based Raspberry PI for any potential edge-computing tasks:

`raspberry-pi $ ldd a.out`

linux-vdso.so.1 (0x7efcb000)

/usr/lib/arm-linux-gnueabihf/libarmmem.so (0x76ecb000)

libtensorflow.so => /usr/local/lib/libtensorflow.so (0x720ac000)

libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0x72083000)

libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x71f44000)

libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0x71f31000)

libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0x71eb2000)

librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0x71e9b000)

libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0x71d53000)

libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0x71d26000)

/lib/ld-linux-armhf.so.3 (0x76ee1000)

## Example Use Case

Here is an example of how these steps work to produce a 2-nd order fit on input scatter data.

## Summary

TensorFlow is great for building Go binaries with sophisticated data processing capabilities. In order to achieve it we have to do following:

- Define graph in Python
- Build an interface in Go to import such graph and feed data at runtime
- Ensure dynamic dependencies are satisfied on the target machine

I hope this post helped to communicate my workflow with TensorFlow and Go. I am passionate about edge computing and always looking for ways to build binaries that are lightweight, portable and can run on devices like Raspberry PI.

## Photo info: Rolling hills of California.

I captured this image a few years ago near Livermore California. That area has fascinating landscapes which aremostly dry and golden in color. However, winter months bring much needed rain and the landscape transforms to lush green… Captured using Canon 5D Mk II and a long lens, then stitched together in a panorama.