A data scientist’s take on Go

An Experience Report

Raanan Hadar
Sep 23, 2020

Background, motivation and significance:

Since a lot of successful cloud software is written in Go, the language has huge potential for solving a very significant problem aptly called “the Infinite Loop of Sadness”[1]. I predict that if Go were to break out into data science, it would enable an extremely powerful synergy that could boost productivity between data scientists, data engineers, operations and business, not unlike what Node.js did for full stack development.

The tragedy of my relationship with Go is that I love its design, philosophy and community, but I don’t use it as a major day-to-day language because most data scientists don’t treat it as a tool of the trade. This is because Go has some issues, which are discussed here.

Some really smart people have put their time and minds into this[2][3][4], so others think it’s a significant problem too! I believe this report is worth your time: in my exploration, I have singled out and discussed some subtle yet important details that refine the problem and should ultimately lead to a better solution.

Scope and purpose of this work:

I chose to focus on low-level numerical libraries for applications such as signal and image processing. This is where I believe Go has the best potential to shine, as it can rapidly produce light, performant and easy-to-deploy software.

For now, I have chosen not to look into areas such as Deep Learning research and your run-of-the-mill Machine Learning (think scikit-learn), where Python has too much of a head start for Go to realistically compete.

Rather than proposing a design in one go, pardon the pun 😄, the purpose of this work is to identify and explain the problem[5]. My goal is to get the ball rolling. Understanding the problem better is a worthwhile goal in itself.

What is previously known about the problem:

My initial suspect for the source of the problem was the lack of a proper multi-dimensional slice data type in Go, similar to what [2] and [3] suggested. In [4], Griesemer, one of the creators of Go, suggested that this is more of a syntax issue.

Approach

Throughout this investigation, I took special care to focus mostly on the problem and to filter out any discussion of solutions until the very end. This helps in reaching a less biased understanding of the problem.

I took the wisdom I gained from the community gods who tackled this problem before me, and went on to gain more experience and report on it by implementing a toy matrix library similar to gonum and simulating the most common use-cases that arise in day-to-day work, borrowing heavily from [2]. Here is what I learned…

What does the user need?

The user will need to perform map/reduce/filter operations on data. The data can have different internal implementations and memory layouts; one common example is multi-dimensional data. Efficient and convenient access in multiple dimensions is the primary problem addressed by [2] and [3].

New to this experience report is another critical requirement: in numerical computing it is extremely common to index into a variable using several different types of index:

  • By position: this classic method was the one most discussed by [2][3][4], and the index can be multi-dimensional.
  • By index mask: given a mask containing an array of a few integer indices, this allows us to operate only on the masked slice of the data.
  • By boolean mask: similar to the above, with the difference that the mask is boolean and has a length equal to the number of elements in the data structure. This allows us to operate only on a masked slice of the data that is the result of a logical filtering operation.

This should be achieved while having a regular and readable syntax. This is a quintessential part of the user’s workflow and is done very frequently in algorithms. Go was designed to be a programming language that a CS major can pick up quickly and effectively. Most data scientists are graduating from languages such as MATLAB, Python and R and are expecting this behavior.

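Illustrating index types, in Python (assuming a NumPy array x, where all three forms share the same bracket syntax): x[1, 2] selects by position, x[[0, 2]] by index mask, and x[x > 0] by boolean mask.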

Here is my attempt at achieving something similar in Go for a 2D matrix. For the sake of keeping this post short, I omitted the boolean mask, because it has the same implications as the index mask:

Let’s use this in a simple program:
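The matrix type and the simple program described above can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the original listing: the names Matrix, NewMatrix, At, Set and AtMask are hypothetical (only SetMask is named in the text), and the index mask is assumed to hold flat, row-major indices.

```go
package main

import "fmt"

// Matrix is a hypothetical dense, row-major 2D matrix of float64 values.
type Matrix struct {
	rows, cols int
	data       []float64
}

// NewMatrix allocates a rows x cols matrix filled with zeros.
func NewMatrix(rows, cols int) *Matrix {
	return &Matrix{rows: rows, cols: cols, data: make([]float64, rows*cols)}
}

// At returns the element at row i, column j (access by position).
func (m *Matrix) At(i, j int) float64 { return m.data[i*m.cols+j] }

// Set assigns v to the element at row i, column j (access by position).
func (m *Matrix) Set(i, j int, v float64) { m.data[i*m.cols+j] = v }

// AtMask returns the elements at the given flat (row-major) indices.
func (m *Matrix) AtMask(mask []int) []float64 {
	out := make([]float64, len(mask))
	for k, idx := range mask {
		out[k] = m.data[idx]
	}
	return out
}

// SetMask assigns v to every element whose flat index is listed in mask.
func (m *Matrix) SetMask(mask []int, v float64) {
	for _, idx := range mask {
		m.data[idx] = v
	}
}

func main() {
	m := NewMatrix(2, 3)

	// Access by position.
	m.Set(0, 1, 4.5)
	fmt.Println(m.At(0, 1)) // 4.5

	// Access by index mask.
	m.SetMask([]int{0, 5}, 1)
	fmt.Println(m.AtMask([]int{0, 1, 5})) // [1 4.5 1]
}
```

Note that each kind of index needs its own pair of method names here, which is the point the rest of this report builds on.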

We can see that the syntax is very similar to gonum. If I wanted a single uniform interface, similar to other languages, this is perfectly feasible in Go with an operator pattern that combines an interface{} and a variadic function. It’s not pretty, but it is effective (Dear Go gods, this is an MVP, please don’t judge harshly 🙏):
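One way such a uniform accessor could look is sketched below. The details are assumptions made for illustration rather than the original code: the position is passed as an interface{} and resolved with a type switch, [2]int stands for a {row, column} position, []int for an index mask of flat indices, and []bool for a boolean mask. The variadic part is assumed to be the assigned values, which keeps the foo.Set(pos, bar) argument order.

```go
package main

import "fmt"

// Matrix is a hypothetical dense, row-major 2D matrix (an assumption of this sketch).
type Matrix struct {
	rows, cols int
	data       []float64
}

// indices resolves any supported position type to flat row-major indices.
func (m *Matrix) indices(pos interface{}) []int {
	switch p := pos.(type) {
	case [2]int: // by position: {row, column}
		return []int{p[0]*m.cols + p[1]}
	case []int: // by index mask: flat indices
		return p
	case []bool: // by boolean mask: one flag per element
		if len(p) != len(m.data) {
			panic("boolean mask length must equal the number of elements")
		}
		var idx []int
		for i, keep := range p {
			if keep {
				idx = append(idx, i)
			}
		}
		return idx
	default:
		panic(fmt.Sprintf("unsupported position type %T", pos))
	}
}

// At returns the elements selected by pos.
func (m *Matrix) At(pos interface{}) []float64 {
	idx := m.indices(pos)
	out := make([]float64, len(idx))
	for k, i := range idx {
		out[k] = m.data[i]
	}
	return out
}

// Set assigns vals to the elements selected by pos. A single value is
// broadcast to every selected element; otherwise one value per element.
func (m *Matrix) Set(pos interface{}, vals ...float64) {
	idx := m.indices(pos)
	if len(vals) != 1 && len(vals) != len(idx) {
		panic("Set needs one value, or one value per selected element")
	}
	for k, i := range idx {
		if len(vals) == 1 {
			m.data[i] = vals[0]
		} else {
			m.data[i] = vals[k]
		}
	}
}

func main() {
	m := &Matrix{rows: 2, cols: 2, data: make([]float64, 4)}
	m.Set([2]int{0, 1}, 7)                      // by position
	m.Set([]int{2, 3}, 1, 2)                    // by index mask
	m.Set([]bool{true, false, false, false}, 9) // by boolean mask
	fmt.Println(m.At([]int{0, 1, 2, 3}))        // [9 7 1 2]
}
```

The price of the single signature is that type errors surface at run time (as a panic in this sketch) instead of at compile time.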

  • This is a valid example where methods can benefit from additional type parameters, but currently type parameters for methods are not part of [6]. As you can see, I’ve managed without it.
  • In summary, we have implemented a getter/setter interface that met all of my requirements while maintaining a relatively compact signature: foo.At(pos) and foo.Set(pos,bar) for multiple types of inputs.
  • My interim conclusion is that I agree with Griesemer’s claim in [7]: As far as functionality goes, I can implement all of my requirements in Go: It is good enough. I do not think a new data type is needed.

So what is the problem?

My main problem is that code implementing such functionality has to use a syntax that is inconsistent with everyday Go syntax:

1. The difference in syntax effectively creates a different dialect of the language, which goes against Go’s design goal of having one good way to achieve something.

2. As the keywords used in foo[index] = bar are more specific and powerful, the increased semantic meaning leads to an efficient expression that requires fewer keywords and is more readable.

  • When I see the statement foo.SetMask(mask,bar), I parse it in my mind in the following way: what variable? -> foo -> what operator? -> SetMask -> what does the first argument in SetMask do? -> where -> where? -> mask -> what does the second argument in SetMask do? -> what is assigned -> what is assigned? -> bar
  • This interpretation stands in contrast to most Go code, which has a different syntax altogether: foo[index] = bar, which is parsed in the following way: what variable? -> foo -> where? -> index -> what is assigned? -> bar

Simpler, right? The second form is much more elegant and readable. Notice that for such a simple statement, the first form takes 6 “questions” to parse, while the second takes only 3.

Needless to say, this does not scale well for more complex algorithms. Based on [2], here is a simple illustration of how the syntax becomes confusing for multi-dimensional cases:
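A naive matrix multiplication is a convenient way to see the effect. The sketch below is my own illustration in the spirit of [2], with a hypothetical Matrix type and accessor names; the bracketed form in the comment is not valid Go today and only shows the kind of notation a built-in multi-dimensional index would allow.

```go
package main

import "fmt"

// Matrix is a hypothetical dense, row-major 2D matrix (an assumption of this sketch).
type Matrix struct {
	rows, cols int
	data       []float64
}

// NewMatrix allocates a rows x cols matrix filled with zeros.
func NewMatrix(rows, cols int) *Matrix {
	return &Matrix{rows: rows, cols: cols, data: make([]float64, rows*cols)}
}

// At returns the element at row i, column j.
func (m *Matrix) At(i, j int) float64 { return m.data[i*m.cols+j] }

// Set assigns v to the element at row i, column j.
func (m *Matrix) Set(i, j int, v float64) { m.data[i*m.cols+j] = v }

// Mul returns the product a*b, written purely with accessor methods.
func Mul(a, b *Matrix) *Matrix {
	if a.cols != b.rows {
		panic("dimension mismatch")
	}
	c := NewMatrix(a.rows, b.cols)
	for i := 0; i < a.rows; i++ {
		for j := 0; j < b.cols; j++ {
			for k := 0; k < a.cols; k++ {
				// With index syntax this line could read:
				//   c[i, j] += a[i, k] * b[k, j]
				c.Set(i, j, c.At(i, j)+a.At(i, k)*b.At(k, j))
			}
		}
	}
	return c
}

func main() {
	a := &Matrix{rows: 2, cols: 2, data: []float64{1, 2, 3, 4}}
	b := &Matrix{rows: 2, cols: 2, data: []float64{5, 6, 7, 8}}
	fmt.Println(Mul(a, b).data) // [19 22 43 50]
}
```

The inner statement nests three accessor calls inside a fourth, whereas the indexed form in the comment reads like the textbook formula.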

It’s interesting to observe that multi-dimensional slices and accessing an array by multiple parameter types, though different in functionality, are two facets of the very same problem: the syntax for indexing into variables with multiple arguments is very different from, and less efficient than, the standard Go syntax.

Are index operator methods “good enough”?

I originally started this exploration with the mindset of understanding the problem first, not pushing for a specific solution. Once I understood the problem well enough, I considered following up with a proposal of my own, knowing it would probably crash and burn in front of the Go community 😆

In [4], Griesemer reflected on [2] and proposed an alternative solution: index operator methods.

Griesemer’s solution seemed to click. He just needed confirmation that he was solving the right problem. As he said in the lecture:

I’ve found that these index operators were so effective and in fact so cheap to implement, syntactic sugar really, in addressing this specific problem, that I was wondering whether is it really maybe all that we need?

I did not originally plan this, but now that I understand the problem better (thanks Russ Cox!), I think that Griesemer’s solution is sound. It solves the heart of the problem detailed in this experience report by providing a standard syntax for operating on indexes that is coherent with the rest of the Go language. This experience also highlights a use-case where type parameters on methods could bring added benefits, but that is a different discussion related to generics. In summary, I would be extremely happy if it were implemented as proposed.

It is good enough for me.

Thank you

References

[1] Josh Wills, DataEngConf2016, “Data Engineering and Data Science: Bridging the Gap”, https://www.youtube.com/watch?v=EtYv7zPyS2A

[2] Brendan Tracey et al., (November 17, 2016), “Proposal: Multi-dimensional slices”, https://go.googlesource.com/proposal/+/master/design/6282-table-data.md

[3] Ian Lance Taylor, (November 14, 2015), “Proposal: spec: strided slices”, https://github.com/golang/go/issues/13253

[4] Robert Griesemer, dotGo2016, “Prototype your design!”, https://www.youtube.com/watch?v=vLxX3yZmw5Q

[5] Russ Cox, GopherCon2017, “The Future of Go”, https://www.youtube.com/watch?v=EtYv7zPyS2A

[6] Ian Lance Taylor and Robert Griesemer, (September 09, 2020), “Type Parameters — Draft Design”, https://go.googlesource.com/proposal/+/refs/heads/master/design/go2draft-type-parameters.md

[7] Robert Griesemer, (September 1, 2020), GitHub, https://github.com/golang/go/issues/41129
