Index Operator Methods

A Go2 Proposal

Raanan Hadar
Oct 2, 2020

Would you consider yourself a novice, intermediate, or experienced Go programmer?

The short answer: Sadly, I am not doing my day-to-day programming in Go. The long answer: I learned Go in 2009, read most good Go books and have been writing Go for the fun of it. The only programming convention I went to was a Go convention. I watched more Go talks than talks about any other programming language and I keep recommending Go to other programmers. I have been a huge fan of the language, its design and its community for many years. In short, one can safely argue that I have experience with Go.

Unfortunately, most data scientists do not program in Go. One may infer that data scientists are bad Go programmers and their proposals are therefore irrelevant. I would argue instead that the fact that most data scientists do not program in Go is all the more reason to take this proposal seriously.

What other languages do you have experience with?

My day to day languages are Python and Matlab. I have experience with a dozen more languages that I don’t use in my day to day. My favourite programming language is Go. Yes, more than the first two.

Would this change make Go easier or harder to learn, and why?

  • This change will improve readability.
  • This change will increase the number of data scientists using Go, so there will be more people to teach Go to others.

Hence, this will make Go easier to learn.

Has this idea, or one like it, been proposed before?

Yes. Unfortunately, never officially. The credit goes to Robert Griesemer[1]. There were also two other proposals: one by Brendan Tracey and the Gonum team[2] and the other by Ian Lance Taylor[3]. The latter two suggested introducing new data types to the language.

If so, how does this proposal differ?

The original suggestion of index operator methods in 2016 by Griesemer was at a time when Go was not open to changes. It was suggested and not officially proposed.

Griesemer noted that he was surprised that index operator methods really helped with readability. I speculate that he was not sure if it solved the right problem or whether his solution was enough.

This proposal is the result of a journey of trying to understand the problem better[4]. I started with past proposals. Then, because I could not refute the arguments presented by Griesemer in [5], I put aside all notion of proposing a solution and went to research the problem by writing an experience report[6]. Only after understanding the problem better did I realize that Griesemer’s direction was the best solution and that, given the additions below, it was sufficient to satisfy all of the use cases I raised in my experience report.

There are two additions compared to Griesemer. First, the argument to the index method in my proposal is not limited to integers; it follows a declared signature, just like any other Go method. This matters because slicing via index operator methods that take an integer slice or a boolean slice is also extremely powerful. Secondly, I have added the new range data type, which was not in Griesemer’s proposal. This addition makes it possible to implement custom slicing operations.

I would like to state the obvious here: the suggested change to the language spec is originally Griesemer’s[5]. I could not have done a better job and I neither care about nor want the credit. I’ve been asked to formalize a proposal so that the GitHub issue will not be closed, so here we are.

Who does this proposal help, and why?

  1. Devs, *Ops and Business:

Data scientists commonly write bad software, which leads to friction with other roles and prevents businesses from delivering high quality data products on schedule. This is a huge and significant problem which was aptly nicknamed the “Infinite Loop of Sadness”[7][8]. Go can solve the infinite loop of sadness in multiple ways:

  • Go is designed to create high quality software that scales.
  • Creating a unified programming language for all roles in the loop can create an extremely powerful synergy, break silos and improve cooperation, similar to how node.js was the enabler that gave birth to full stack development.

2. Data scientists:

Data scientists don’t write bad software because they are lazy but because their current tools could be better:

  • When making their code ‘production worthy’, data scientists often resort to using specialized tools to accelerate their code. Numba or Matlab Coder are common examples. This requires manually rewriting the code in a specialized static dialect. Data scientists writing in Go can write simple and performant code once and effectively skip this step.
  • Languages such as Python and Matlab create huge packages that are complex to deploy and maintain and are easy to break. They also often introduce breaking changes: Python 3.5 is still commonly used, yet it differs in breaking ways from 3.6 and above. Version management in Python for data science often requires deploying big and complicated package management platforms such as Anaconda. For stand-alone deployment, Matlab requires the entirety of the Matlab library. It is not uncommon for container images with Python or Matlab to reach GB scale, and Matlab images can often reach 10GB! This often makes such languages very challenging to deploy to hardware that has limited resources.
  • This can lead to long CI/CD iterations, which slow down everyday development.
  • In summary, Go’s speed, quick compile times and single binary deployment can be a huge boon for data scientists’ daily workflow and will result in better software.

3. The Go community:

  • There is an explosion of data scientists in 2020.
  • There is a huge amount of Go software that is used by data scientists every day.
  • Sadly, most data scientists find it difficult to contribute to open source Go projects because Go is not their day-to-day language. I argue that more data scientists using Go will create a huge influx of open source Go contributors.

Is this change backward compatible?

To my understanding, it is backward compatible.

Show example code before and after the change:

As these examples are in the realm of numerical computing, they might be unfamiliar to some of the readers. Each example is first introduced with a short explanation, allowing the reader to better understand the application:

  1. Beginning with a 2 dimensional tensor implemented in a standard numerical library in Go such as gonum/mat or gorgonia/tensor, accessing the element i,j is implemented as follows:
foo.At(i,j)

// With an index operator method we can achieve
// something more idiomatic:
foo[i,j]
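To make the comparison concrete, here is a minimal sketch of the kind of accessor pair that exists today. The Matrix type and NewMatrix constructor are hypothetical stand-ins written for this example, not the gonum/mat or gorgonia/tensor types, and bounds checking is omitted for brevity. The comments show the proposed syntax that each call would correspond to.

```go
package main

import "fmt"

// Matrix is a hypothetical dense, row-major 2-D matrix used only for
// illustration; it is not the gonum/mat or gorgonia/tensor type.
type Matrix struct {
	rows, cols int
	data       []float64
}

// NewMatrix allocates a rows x cols matrix filled with zeros.
func NewMatrix(rows, cols int) *Matrix {
	return &Matrix{rows: rows, cols: cols, data: make([]float64, rows*cols)}
}

// At returns element (i, j). Under the proposal, foo[i, j] would call a
// getter like this one.
func (m *Matrix) At(i, j int) float64 {
	return m.data[i*m.cols+j]
}

// Set stores v at (i, j). Under the proposal, foo[i, j] = v would call a
// setter like this one.
func (m *Matrix) Set(i, j int, v float64) {
	m.data[i*m.cols+j] = v
}

func main() {
	foo := NewMatrix(2, 3)
	foo.Set(1, 2, 4.5)        // proposed: foo[1, 2] = 4.5
	fmt.Println(foo.At(1, 2)) // proposed: foo[1, 2]
}
```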

2. Another common method in numerical computing libraries is setting by a mask. Setting by a mask performs two steps consecutively:

  • Slicing using the mask indices: unlike a standard slice in Go, a mask is often not contiguous. For example, for a one dimensional vector, a mask can be defined as a simple integer slice: mask := []int{1,3,5}
  • We assign a new value to each sliced element.

This common use case enables a batch assignment to sparse elements in the matrix, using a single expression.

The end result is a basic method in Go. To better help readers who are less familiar with numerical computing, I have provided an example implementation in the following gist. The end result has the following syntax:

foo.SetMask(mask,bar)

// Implementing this as an index operator method,
// numerical computing libraries in Go will be able to achieve this
// with the following syntax:
foo[mask] = bar
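For readers who want to see the two steps spelled out, the following is a rough sketch of a SetMask method on a one dimensional vector. The Vector type is hypothetical, and the sketch assumes that bar is a slice holding one value per mask index; a real library might also accept a scalar.

```go
package main

import "fmt"

// Vector is a hypothetical 1-D numeric type used only for illustration.
type Vector []float64

// SetMask assigns vals[k] to v[mask[k]] for every k. The mask does not
// have to be contiguous. It returns an error, and leaves v untouched, when
// the lengths differ or a mask index is out of range.
func (v Vector) SetMask(mask []int, vals []float64) error {
	if len(mask) != len(vals) {
		return fmt.Errorf("mask has %d indices but %d values were given", len(mask), len(vals))
	}
	for _, idx := range mask {
		if idx < 0 || idx >= len(v) {
			return fmt.Errorf("mask index %d out of range [0, %d)", idx, len(v))
		}
	}
	for k, idx := range mask {
		v[idx] = vals[k]
	}
	return nil
}

func main() {
	foo := make(Vector, 6)
	mask := []int{1, 3, 5}
	bar := []float64{10, 20, 30}
	if err := foo.SetMask(mask, bar); err != nil { // proposed: foo[mask] = bar
		fmt.Println("error:", err)
		return
	}
	fmt.Println(foo)
}
```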

3. The last example is a multi-dimensional slicing operation for a 3 dimensional numerical matrix data type. This operation is also extremely common in numerical computing. Since slicing syntax is not extended to higher dimensions in Go for non-standard data types, numerical libraries in Go have to improvise an alternative syntax.

Chewxy, the author of gorgonia/tensor, suggested in [5] an implementation with a slicing range type and a constructing method such as S(). For example, S(1,5) is the equivalent of 1:5 and nil is the equivalent of :

When I first published the following example, it did not include this short exposition. Amusingly, the first response by a user in golang-nuts deemed this example invalid Go. This serves as principal evidence of how distant this looks from what a Go programmer expects Go code to look like:

import (
        G "gorgonia.org/gorgonia"
        . "gorgonia.org/tensor"
)

// foo is a 3 dimensional tensor in gorgonia/tensor
foo.Slice(G.S(1,2), nil, G.S(1,5))

// versus:
foo[1:2,:,1:5]
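To show why libraries end up with this shape of API, here is a simplified sketch of the S() and nil convention on a 2 dimensional matrix. All names (Span, S, Matrix, bounds) are hypothetical and written for this example; this is not how gorgonia implements slicing. In particular, the sketch copies the selected elements, whereas a real tensor library would more likely return a view.

```go
package main

import "fmt"

// Span is a hypothetical half-open interval [Start, End), standing in for
// the slicing range type that libraries construct with S().
type Span struct{ Start, End int }

// S builds a Span; S(1, 5) plays the role of 1:5.
func S(start, end int) *Span { return &Span{Start: start, End: end} }

// Matrix is a hypothetical row-major 2-D matrix used only for illustration.
type Matrix struct {
	rows, cols int
	data       []float64
}

// bounds resolves a possibly nil Span against a dimension of length n.
// A nil Span selects the whole dimension, like a bare ":".
func bounds(s *Span, n int) (int, int, error) {
	if s == nil {
		return 0, n, nil
	}
	if s.Start < 0 || s.End > n || s.Start > s.End {
		return 0, 0, fmt.Errorf("span %d:%d is invalid for length %d", s.Start, s.End, n)
	}
	return s.Start, s.End, nil
}

// Slice copies the selected rows and columns into a new Matrix.
func (m *Matrix) Slice(r, c *Span) (*Matrix, error) {
	r0, r1, err := bounds(r, m.rows)
	if err != nil {
		return nil, err
	}
	c0, c1, err := bounds(c, m.cols)
	if err != nil {
		return nil, err
	}
	out := &Matrix{rows: r1 - r0, cols: c1 - c0}
	for i := r0; i < r1; i++ {
		out.data = append(out.data, m.data[i*m.cols+c0:i*m.cols+c1]...)
	}
	return out, nil
}

func main() {
	foo := &Matrix{rows: 3, cols: 4, data: []float64{
		0, 1, 2, 3,
		4, 5, 6, 7,
		8, 9, 10, 11,
	}}
	sub, err := foo.Slice(S(1, 3), nil) // proposed: foo[1:3, :]
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sub.rows, sub.cols, sub.data)
}
```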

What is the cost of this proposal? (Every language change has a cost).

As you can quickly deduce, programmers are bound to overload the index operator method with multiple type parameters.

This will surely lead to a next generation Go numerical library that will need to implement a complex type checking mechanism. Developers of this library will probably have a hard time at first. However, as this set of parameters is well defined (integers, integer slices, boolean slices and possibly filter functions), this will converge. Writing a function like fmt.Println() is similar: it’s not trivial as there are many input types, but there is a lot of power in the end result. Ultimately, these libraries can hide this complexity from their users behind an elegant and powerful interface, with Go having its numpy moment.
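As a rough idea of what such type checking could look like, the sketch below emulates an overloaded getter with an ordinary method that takes an interface{} argument and switches on its dynamic type. The Vector type and its Index method are hypothetical names chosen for this example, and the set of accepted types simply mirrors the list above.

```go
package main

import (
	"errors"
	"fmt"
)

// Vector is a hypothetical 1-D numeric type used only for illustration.
type Vector []float64

// Index selects elements of v according to the dynamic type of idx:
// an int, an integer slice, a boolean mask or a filter function.
func (v Vector) Index(idx interface{}) (Vector, error) {
	out := Vector{}
	switch ix := idx.(type) {
	case int:
		if ix < 0 || ix >= len(v) {
			return nil, fmt.Errorf("index %d out of range [0, %d)", ix, len(v))
		}
		out = append(out, v[ix])
	case []int:
		for _, i := range ix {
			if i < 0 || i >= len(v) {
				return nil, fmt.Errorf("index %d out of range [0, %d)", i, len(v))
			}
			out = append(out, v[i])
		}
	case []bool:
		if len(ix) != len(v) {
			return nil, errors.New("boolean mask must have the same length as the vector")
		}
		for i, keep := range ix {
			if keep {
				out = append(out, v[i])
			}
		}
	case func(float64) bool:
		for _, x := range v {
			if ix(x) {
				out = append(out, x)
			}
		}
	default:
		return nil, fmt.Errorf("unsupported index type %T", idx)
	}
	return out, nil
}

func main() {
	foo := Vector{10, 20, 30, 40}
	byMask, _ := foo.Index([]bool{true, false, true, false})
	byFilter, _ := foo.Index(func(x float64) bool { return x > 25 })
	_, err := foo.Index("oops")
	fmt.Println(byMask, byFilter, err)
}
```

The default branch is where the cost shows up: an unsupported argument is only caught at run time, which is the same managed risk as any other interface{} parameter.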

Secondly, I argue that this is very similar to how interface{} can be abused as a function argument in Go1. This is syntactic sugar to an existing managed risk in Go’s design choices. As Go developers learn early on to respect and not to abuse the former, I predict that they will learn the same with the latter.

The biggest cost is the engineering cost required to implement the new range data type. It mostly involves changing the standard Go toolkit not to panic at the different syntax.

How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?

  • Since Go method names have to start with a letter, this will require changes to parsers to permit [] and []= as valid method names.
  • Adding the range data type is probably going to affect vet and gofmt.

What is the compile time cost?

Most of the implementation is syntactic sugar. The parsing of an additional data type might be slightly more complicated, but I believe this incurs little to no compile time cost relative to other features.

What is the run time cost?

Assuming that the programmer overloads the method in both scenarios: The implementation is syntactic sugar, so there is minimal runtime cost.

Can you describe a possible implementation?

Griesemer implemented an initial prototype in [1].

Do you have a prototype? (This is not required.)

I humbly refer you to the above.

How would the language spec change?

The first change is the introduction of index operator methods. They are implemented by enabling the optional declaration of two additional methods, [] and []=, for example:

type Matrix[T Numeric] struct ...
func (m *Matrix[T]) [] (...) T ... // getter
func (m *Matrix[T]) []= (...) ... // setter

Now there is one big problem left to solve: we still need to enable slicing behaviour similar to foo[:] = bar while also supporting multiple slicing operations such as foo[1:5,:] = bar

My proposed solution is to introduce a new data type in Go called range. It has the following qualities:

  1. Valid range values are the arguments used today in Go slice expressions. For example: 5, :, 1: and 1:5 are all valid range values.
  2. It can only be used as an argument to an index operator method. This will prevent notation abuse and help maintain orthogonality (see discussion of orthogonality).
  3. range will have 2 methods: start() and end():
func (r range) start() (int, error)
func (r range) end() (int, error)

Which will perform as follows:

5.start() will return 5, nil
5.end() will return 5, nil
1:5.start() will return 1, nil
1:5.end() will return 5, nil
:.start() will return 0, Error
:.end() will return 0, Error
1:.start() will return 1, nil
1:.end() will return 0, Error
:5.start() will return 0, Error
:5.end() will return 5, nil

Optionally, it is debatable whether to add a len() method to range.
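The behaviour listed above can be emulated in today’s Go with an ordinary struct, which may help when reasoning about the semantics. In this sketch every name (Rng, ErrOpen and the constructors At, Between, From, To and All) is hypothetical: the constructors stand in for the literal forms 5, 1:5, 1:, :5 and :, and the exported Start and End methods mirror the proposed start() and end().

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOpen is a hypothetical error reported when a bound was omitted.
var ErrOpen = errors.New("bound not specified")

// Rng emulates the proposed range type. The real name cannot be used as
// an identifier because range is a keyword.
type Rng struct {
	lo, hi       int
	hasLo, hasHi bool
}

// Constructors standing in for the literal forms 5, 1:5, 1:, :5 and :.
func At(i int) Rng           { return Rng{lo: i, hi: i, hasLo: true, hasHi: true} }
func Between(lo, hi int) Rng { return Rng{lo: lo, hi: hi, hasLo: true, hasHi: true} }
func From(lo int) Rng        { return Rng{lo: lo, hasLo: true} }
func To(hi int) Rng          { return Rng{hi: hi, hasHi: true} }
func All() Rng               { return Rng{} }

// Start returns the lower bound, or 0 and ErrOpen if it was omitted.
func (r Rng) Start() (int, error) {
	if !r.hasLo {
		return 0, ErrOpen
	}
	return r.lo, nil
}

// End returns the upper bound, or 0 and ErrOpen if it was omitted.
func (r Rng) End() (int, error) {
	if !r.hasHi {
		return 0, ErrOpen
	}
	return r.hi, nil
}

func main() {
	cases := []struct {
		form string
		r    Rng
	}{
		{"5", At(5)}, {"1:5", Between(1, 5)}, {":", All()}, {"1:", From(1)}, {":5", To(5)},
	}
	for _, c := range cases {
		s, sErr := c.r.Start()
		e, eErr := c.r.End()
		fmt.Printf("%-4s start=%d (%v) end=%d (%v)\n", c.form, s, sErr, e, eErr)
	}
}
```

An index operator method receiving such a value would treat the error as "use the default for this dimension", typically 0 for an omitted start and the dimension length for an omitted end.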

This will enable implementing slicing on custom data types, for example:

type Matrix[T Numeric] struct ...
func (m *Matrix[T]) [] (i, j range) T ... // getter
func (m *Matrix[T]) []= (i, j range, x T) ... // setter

4. Most importantly, with range being a reserved keyword in Go, we avoid the introduction of new types that may have been used in someone’s code and thus, this implementation maintains Go1 compatibility!

Orthogonality: how does this change interact or overlap with existing features?

I argue that this improves on Go’s orthogonality. I believe this is one of the main benefits resulting from this proposal. Currently, the common way to perform getting, setting and slicing in Go is:

foo = bar[i] // getting
foo[i] = bar // setting
foo[:] // slicing

For complex data types that require multiple arguments, such as numerical libraries, we have a different syntax:

foo.At(i,j) // getting
foo.Set(i,j,bar) // setting
foo.Slice(nil) // slicing

When it comes to readability, this doesn’t scale well for multiple dimensions and complex data types.

The difference in syntax effectively creates a different dialect of the language, which goes against Go’s design goal of having one good way to achieve something. As the keywords used in foo[i] = bar are more specific and powerful, the increased semantic meaning leads to an efficient expression that requires fewer keywords and is more readable.

Implementing index operator methods will encourage the use of a single syntax that is more expressive and efficient: it separates index arguments from functional arguments and will improve on Go’s orthogonality by ‘eliminating a correlated vector’ from the language space.

The addition of the range data type overlaps with the range keyword which is commonly used in for loops:

  • This has a huge benefit of permitting the implementation of custom slicing operations while allowing for Go1 compatibility.
  • One may argue that ‘overloading’ this keyword has its downsides as far as orthogonality.
  • I argue that the use in for loops and the use as index operator method argument is extremely confining syntax wise, meaning that the two uses cannot appear in the same statement and are too distinct to cause confusion.

In summary, I believe that the advantages far outweigh the disadvantages and that, overall, this proposal improves the orthogonality of Go.

Is the goal of this change a performance improvement?

Yes. By improving readability you improve programmer performance, which is often the biggest bottleneck.

If so, what quantifiable improvement should we expect?

Development performance will improve in both efficiency (software will be written faster) and effectiveness (there will be more data scientists using Go).

How would we measure it?

A couple of easy suggestions: scrape GitHub for data science related Go projects and visualize them over time, scrape Stack Overflow for questions regarding Go in data science, and count Slack users in channels such as #data-science in the Gophers workspace. Also, monitor the usage of key packages such as Gonum and Gorgonia.

Does this affect error handling?

Not to my understanding.

Is this about generics?

This does not directly affect Generics. However, this proposal can benefit from whichever Generics implementation is chosen for Go2. For example, the current Generics draft does not support type parameters for methods, which index operator methods could benefit from.

What about full operator overloading?

Lastly, I am breaking the proposal template here, but I think it’s an important question that has been asked multiple times.

The core Go dev team said on multiple occasions that full operator overloading introduces too many complexities into the language to be realistically considered. Do I want it? Yes, but ultimately, full operator overloading is nice to have. I believe that this proposal offers a good trade-off that addresses the crux of the problem.

The proposed solution is very similar in approach to what Go did with pointers: Is full pointer arithmetic beneficial? Yes, but it would no longer be in the spirit of Go.

Thank you so much for your time!

References

[1] Robert Griesemer, dotGo2016, “Prototype your design!”, https://www.youtube.com/watch?v=vLxX3yZmw5Q

[2] Brendan Tracey et al., (November 17, 2016), “Proposal: Multi-dimensional slices”, https://go.googlesource.com/proposal/+/master/design/6282-table-data.md

[3] Ian Lance Taylor, (November 14, 2015), “Proposal: spec: strided slices”, https://github.com/golang/go/issues/13253

[4] Russ Cox, GopherCon2017, “The Future of Go”, https://www.youtube.com/watch?v=EtYv7zPyS2A

[5] Robert Griesemer, (September 1, 2020), Github, https://github.com/golang/go/issues/41129

[6] Raanan Hadar, (September 23, 2020), “A data scientist’s take on Go: An Experience Report”, https://medium.com/@rhadar/a-data-scientists-take-on-go-ed408c00ac45

[7] Josh Wills, DataEngConf2016, “Data Engineering and Data Science: Bridging the Gap”, https://www.youtube.com/watch?v=EtYv7zPyS2A

[8] Daniel Whitenack, (February 24, 2017) “Sustainable Machine Learning Workflows”, https://medium.com/pachyderm-data/sustainable-machine-learning-workflows-8c617dd5506d
