Sum/Union/Variant Type in Go and Static Check Tool of switch-case handling

haya14busa
7 min readDec 6, 2016

--

I’ll introduce how to represent sum/union/variant(-like) type in Go first.
Then, I’ll introduce ‘gosumcheck’, which is a static lint tool which checks all possible cases of type-switch.

This is a post for Hatena Engineer Advent Calendar 2016 (Japanese).

Sum Type in Go

So, what is sum/union/variant type? From Wikipedia,

In computer science, a tagged union, also called a variant, variant record, discriminated union, disjoint union, or sum type, is a data structure used to hold a value that could take on several different, but fixed, types.
https://en.wikipedia.org/wiki/Tagged_union

Example from Wikipedia: (binary tree)

datatype tree = Leaf
| Node of (int * tree * tree)

In this example, “tree” is sum type and it could be “Leaf” or “Node”. OK, then, how can we represent sum type in Go?

Take a look at Go FAQ about variant types.

We considered adding variant types to Go, but after discussion decided to leave them out because they overlap in confusing ways with interfaces. What would happen if the elements of a variant type were themselves interfaces?

Also, some of what variant types address is already covered by the language. The error example is easy to express using an interface value to hold the error and a type switch to discriminate cases. The syntax tree example is also doable, although not as elegantly.

— https://golang.org/doc/faq#variant_types

Yes, Go doesn’t have variant(sum) types, but we can use “interface” to represent sum-like types instead. Since FAQ mentions the syntax tree as an example, let’s take a look at go/ast implementation.

type “Node” represents AST Node type and all node types must implement the Node interface. “Node” interface requires “Pos() token.Pos” and “End() token.Pos” which returns position of node. So, to represent sum type as interface, add common methods to the interface. It’s useful for users and it prevents that unexpected type will be assigned to the interface.

It’s more interesting to see the implementation of “Expr” and “Stmt” type. They embed “Node” type to represent that they are one of “Node” type and they also have unexported “dummy” method to each interface (“exprNode()’” and “stmtNode()”) to represent each data type. So, if a sum type doesn’t have any common useful methods, you can use dummy methods.

“internal” interface and “public” interface

And one more thing! If an interface has unexported methods, external package cannot create types which implements the interface. For example, external packages cannot create their own “Expr” types. Let’s call this pattern as “internal” interface. For internal interface, we can list up every types which implements the interface because they must be in the same package.

On the other hand, “Node” type only has public method, so external packages can create their own node. Let’s call this pattern as “public” interface. In this case, if go/ast doesn’t expect node which defined in external package and there were public method which accepts “Node” type as an argument, such methods must be not safe and unexpected behavior may happen.

Go Playground: https://play.golang.org/p/dPZ5UQU98S

Just to be on the safe side, it might be better to add an internal method to sum types even though there are public common methods. In addition, the “internal” interface is useful to check every possible cases are handled. I’ll explain it later.

How to use sum types(, or interface type) in Go

We use “interface” to represent sum types, so you can use sum types effectively if you know some basic way to work with interface. Let me introduce some tips.

type switch

You can use “type switch” to discover the dynamic type of an interface variable. You can also use type assertion, but for sum types, “type switch” might be more useful in most cases.

go/ast/#Inspect Example:

// Inspect the AST and print all identifiers and literals.
ast.Inspect(f, func(n ast.Node) bool {
var s string
switch x := n.(type) {
case *ast.BasicLit:
s = x.Value
case *ast.Ident:
s = x.Name
}
if s != "" {
fmt.Printf("%s:\t%s\n", fset.Position(n.Pos()), s)
}
return true
})

Ensure a type implements expected interface at compile time

Sometimes, we may forget to add required methods to a type. In such cases, the type doesn’t implement the expected interface but compilation will succeed. To check types implements expected interface, we can use the following pattern.

// Ensure MyNode implements Node interface at compile time.
var _ Node = &MyNode{}
// You can also check by following line instead.
var _ Node = (*MyNode)(nil)

Go Playground: https://play.golang.org/p/2PKx_jLYGk

Decoding JSON of Sum Type

By the way, I am working at the Mackerel team. Mackerel(https://mackerel.io/) is server monitoring service. I’ll use Mackerel API client in Go (mackerelio/mackerel-client-go) as an example of sum type usage. But of course, it’s just a practical example, so you can ignore the details about Mackerel to understand it.

Sometimes, JSON structure for REST API contains “type” or similar fields. You have to see the value of “type” fields to grasp the structure. For such cases, we cannot decode JSON easily.

Mackerel has API for monitoring. (https://mackerel.io/api-docs/entry/monitors). There are several monitoring types, such as “connectivity”, “host metric”, etc… and “type” field represents monitor type. Monitoring types in Go is defined here.

type Monitor interface {
MonitorType() string
MonitorID() string
MonitorName() string
isMonitor()
}
// MonitorConnectivity represents connectivity monitor.
type MonitorConnectivity struct {
ID string `json:"id,omitempty"`
Name string `json:"name,omitempty"`
Type string `json:"type,omitempty"`
IsMute bool `json:"isMute,omitempty"`
NotificationInterval uint64 `json:"notificationInterval,omitempty"`
Scopes []string `json:"scopes,omitempty"`
ExcludeScopes []string `json:"excludeScopes,omitempty"`
}
// ...

GET /api/v0/monitors returns a list of monitor configurations. Example:

{
"monitors": [
{
"id": "2cSZzK3XfmG",
"type": "connectivity",
"isMute": false,
"scopes": [],
"excludeScopes": []
},
{
"id" : "2cSZzK3XfmG",
"type": "host",
"isMute": false,
"name": "disk.aa-00.writes.delta",
"duration": 3,
"metric": "disk.aa-00.writes.delta",
"operator": ">",
"warning": 20000.0,
"critical": 400000.0,
"scopes": [
"SomeService"
],
"excludeScopes": [
"SomeService: db-slave-backup"
],
"notificationInterval": 60
}
]
}

The corresponding method of Go client is FindMonitors()

func (c *Client) FindMonitors() ([]Monitor, error)

There are 2 ways to decode JSON and get list of Monitor.

1) json.RawMessage

  • decode to a list of json.RawMessage
  • decode json.RawMessage to get only “type” value. var typeData struct { Type String 'json:"type"'}
  • decode json.RawMessage to monitor types using “type” value.
import "encoding/json"func decodeMonitorFromRawMessage(rawmes []byte) (monitorI, error) {
var typeData struct {
Type string `json:"type"`
}
if err := json.Unmarshal(rawmes, &typeData); err != nil {
return nil, err
}
var m monitorI
switch typeData.Type {
case monitorTypeConnectivity:
m = &MonitorConnectivity{}
case monitorTypeHostMeric:
m = &MonitorHostMetric{}
case monitorTypeServiceMetric:
m = &MonitorServiceMetric{}
case monitorTypeExternalHTTP:
m = &MonitorExternalHTTP{}
case monitorTypeExpression:
m = &MonitorExpression{}
}
if err := json.Unmarshal(rawmes, m); err != nil {
return nil, err
}
return m, nil
}

2) mitchellh/mapstructure

  • decode to list of map[string]interface{}
  • get “type” value
  • convert map[string]interface{} to each monitor types by using mitchellh/mapstructure
func decodeMonitorFromMap(mmap map[string]interface{}) (monitorI, error) {
typ, ok := mmap["type"]
if !ok {
return nil, errors.New("`type` field not found")
}
var m monitorI
switch typ {
case monitorTypeConnectivity:
m = &MonitorConnectivity{}
case monitorTypeHostMeric:
m = &MonitorHostMetric{}
case monitorTypeServiceMetric:
m = &MonitorServiceMetric{}
case monitorTypeExternalHTTP:
m = &MonitorExternalHTTP{}
case monitorTypeExpression:
m = &MonitorExpression{}
}
c := &mapstructure.DecoderConfig{
TagName: "json",
Result: m,
}
d, err := mapstructure.NewDecoder(c)
if err != nil {
return nil, err
}
if err := d.Decode(mmap); err != nil {
return nil, err
}
return m, nil
}

You can see full implementation and benchmark result here.

$ go test -v -run="^$" -bench=Monitor_JSON_ | prettybench
benchmark iter time/iter
--------- ---- ---------
BenchmarkMonitor_JSON_mapstructure-4 1000 1.98 ms/op
BenchmarkMonitor_JSON_rawmessage-4 1000 1.40 ms/op

Before running benchmark, I suspect that the first json.RawMessage method might be slow because it needs to decode JSON byte 3 times for each monitor JSON. However, benchmark shows that it’s faster than the second mapstructure method, though both are fast enough I guess. From the benchmark result and the fact that encoding/json is standard library, I decided to use the json.RawMessage method to decode monitors JSON.

gosumcheck — check all cases are handled in type-switch statically

I introduced that what is sum type and how to represent it in Go, useful idioms to work with sum type as interface type and how to work with JSON. However, we can go further.

One of the best part of sum types is that, in functional language like Haskell, Scala, etc… the compiler can verify that all possible cases are handled for pattern match of sum type. We may miss some cases and when we add a new type to sum type, we must find all pattern matches (or type-switch statement). It’s better to use computers to check these cases instead of carefully finding such cases by human.

Yes, I created a static analysis tool to do that for Golang!

Installation

go get -u github.com/haya14busa/gosum/cmd/gosumcheck

Example

The result contains false positive results, but it finds fairly possible bugs!

How It Works

The idea is basically same as “guru implements” and “implements” of godoc analysis. We can find types which implements the interface with the help of go/types package. gosumcheck gathers all possible types and outputs missing types which is not handles by “case” statements. It’s simple but it just works well!!

In the middle of this post, I introduced “public” interface and “internal” interface. “public” interface is basic interface type which only has public method and any external packages can define their own types which implements the interface. I guess “error” interface is most famous example. On the other hand, “internal” interface requires unexported methods, so all types which implement the interface must be in the same package.

You may notice it already. “internal” interface is suited for “gosumcheck” because all types which implement the interface are fixed and in one package.

But you can also use “gosumcheck” for “public” interface. Although any external package may add types which implement the interface, we can get all the dependent packages when running “gosumcheck” as a linter. So, we can list up possible types, though it may just unexpectedly implement the interface.

Heuristics for suppressing false positive results as much as possible

Since, type switch is not always used for sum types and there are cases that we don’t have to handle all types, the result might be messed up with false positive results. To address this problem, “gosumcheck” uses “cover rate” of type switch to calculate confidence of outputs. I assumed that we usually don’t miss lots of cases but miss just a few cases. So, if a cover rate is nearly 100%, gosumcheck reports rest of missing cases. On the other hand, a cover rate is low, e.g. under 50%, it assumes that the programmer intentionally didn’t list up lots of cases.

If you want gosumcheck to report more cases, please specify -min_confidence flag. (e.g. $ gosumcheck -min_confidence=0.1 ./...)

I guess we can improve outputs quality by using other heuristics, so if you have some idea, please open a pull-request or post idea to the issue tracker. https://github.com/haya14busa/gosum

--

--