Effective text parsing in golang

Tobias Schmidt
Jun 10, 2017


Extracting words from a sentence is a common task in programming. One could solve it with a regexp or by parsing the string character by character. Go's strings package makes this much simpler with the Fields function.

The Fields function

func Fields(s string) []string

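The Fields function takes a string and splits it around each run of one or more consecutive white space characters (as defined by unicode.IsSpace). It returns the resulting words as a slice of strings. If the string is empty or contains only white space, the slice is empty.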

Example usage

One good use case for the Fields function is parsing an HTTP request line. An HTTP request line consists of the HTTP method, followed by the request URL and the HTTP version, e.g.:

"GET /api/v1/endpoint HTTP/1.1"

To extract the HTTP method (GET in the example above) in Go, it is tempting to simply take the first three bytes by slicing the request line:

s := "GET /api/v1/endpoint HTTP/1.1"
m := s[:3] // "GET", but only because this method happens to be three bytes long

Unfortunately, not all HTTP methods consist of three letters, so this approach fails for methods like POST or DELETE. The string therefore has to be parsed in a way that detects the length of each word automatically. That would normally require a regexp, or scanning the string character by character until a white space character occurs. Luckily, Go's strings package offers a solution to this problem.
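Here is a rough sketch of the character-by-character approach, and of why fixed-length slicing breaks for POST. The helper firstWord is hypothetical and not from the original text. Unlike Fields, it does not skip leading white space: for " GET" it would return an empty string.

```go
package main

import (
	"fmt"
	"unicode"
)

// firstWord is a hypothetical helper that scans s rune by rune and returns
// everything before the first white space character.
func firstWord(s string) string {
	for i, r := range s { // i is the byte offset of rune r
		if unicode.IsSpace(r) {
			return s[:i]
		}
	}
	return s // no white space: the whole string is one word
}

func main() {
	post := "POST /api/v1/endpoint HTTP/1.1"
	fmt.Println(post[:3])        // POS (fixed-length slicing truncates the method)
	fmt.Println(firstWord(post)) // POST
}
```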

Extracting the first word from the example above is as simple as:

s := "GET /api/v1/endpoint HTTP/1.1"
ss := strings.Fields(s) // ["GET" "/api/v1/endpoint" "HTTP/1.1"]
m := ss[0]              // "GET"
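The snippet above indexes ss[0] directly, so it panics if the line is empty. A slightly fuller sketch checks the number of fields first. The helper parseRequestLine and its error message are hypothetical. Note that Fields is more lenient than a strict request line parser would be, because it also accepts tabs and repeated spaces between the parts.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRequestLine is a hypothetical helper that splits an HTTP request line
// such as "GET /api/v1/endpoint HTTP/1.1" into method, target and version.
func parseRequestLine(line string) (method, target, version string, err error) {
	parts := strings.Fields(line)
	// Check the length first: indexing parts on an empty or malformed
	// line would otherwise panic or return the wrong fields.
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("malformed request line %q: want 3 fields, got %d", line, len(parts))
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	for _, line := range []string{
		"GET /api/v1/endpoint HTTP/1.1",
		"DELETE /api/v1/endpoint/42 HTTP/1.1",
		"",
	} {
		method, target, version, err := parseRequestLine(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(method, target, version)
	}
}
```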


Parsing words with different separators

Not all strings and data formats separate words with white space. A prominent example is the CSV (Comma-Separated Values) format, in which values are separated by commas. Here is a short CSV example:

"this,is,a,csv"

The Fields function cannot extract the second value of this CSV (is). Fields only splits on white space characters, so it would treat the whole string as a single word. Go's FieldsFunc instead lets us split a string into its values using a separator of our choice:

func FieldsFunc(s string, f func(rune) bool) []string

This works through the second parameter, a function that takes a rune and returns a bool. FieldsFunc calls this function on the runes (Unicode code points) of the string, and every rune for which it returns true is treated as a separator. Like Fields, FieldsFunc splits at runs of consecutive separators. Because the function can test any condition, you can even use multiple separators at once.

To extract the "a" from the comma-separated string above, we could use:

s := "this,is,a,csv"
// Treat every comma as a separator.
ss := strings.FieldsFunc(s, func(r rune) bool {
	return r == ','
})
a := ss[2] // "a"


Summary

The Fields functions can be a huge timesaver. Fields covers the common case of words separated by white space. FieldsFunc adds flexibility by letting you decide, rune by rune, what counts as a separator.
