Go Walkthrough: bytes + strings packages

Ben Johnson
Aug 8, 2016 · 12 min read

In the previous post we covered byte streams, but sometimes we need to work with bounded, in-memory byte slices instead. While working with a list of bytes seems simple enough, there are a lot of edge cases and common operations that make using the bytes package worthwhile. We’re also going to lump in the strings package in this post since its API is nearly identical, although for use with strings.

This post is part of a series of walkthroughs to help you understand the Go standard library better. While generated documentation provides a wealth of information, it can be difficult to understand packages in a real world context. This series aims to provide context for how standard library packages are used in everyday applications. If you have questions or comments, you can reach me on Twitter.

A brief aside on bytes vs strings

Rob Pike has an excellent, thorough post on strings, bytes, runes, and characters in Go, but for the sake of this post I’d like to provide more concise definitions from an application developer standpoint.

Byte slices represent a mutable, resizable, contiguous list of bytes. That’s a mouthful, so let’s unpack what it means.

Given a slice of bytes:

buf := []byte{1,2,3,4}

It’s mutable so you can update elements:

buf[3] = 5  // []byte{1,2,3,5}

It’s resizable so you can shrink it or grow it:

buf = buf[:2]           // []byte{1,2}
buf = append(buf, 100) // []byte{1,2,100}

And it’s contiguous so each byte exists one after another in memory:

1|2|3|4

Strings, on the other hand, represent an immutable, fixed-size, contiguous list of bytes. That means that you can’t update a string — you can only create new ones. This is important from a performance standpoint. In high performance code, constantly creating new strings adds a lot of load on the garbage collector.

From an application development perspective, strings tend to be easier to use when working with UTF-8 data, they can be used as map keys whereas byte slices cannot, and most APIs use strings for arguments containing character data. On the other hand, byte slices work well when you’re dealing with raw bytes such as processing byte streams. They are also good to use when you need to avoid allocations and can reuse them.
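Here is a small sketch of those trade-offs. Converting a string to a []byte makes a mutable copy, and the original string is left untouched. Only the string can be used as a map key. The helper name upperFirst is purely illustrative, and it assumes the first character is ASCII.

```go
package main

import "fmt"

// upperFirst returns a copy of s with its first byte upper-cased, assuming
// that byte is an ASCII letter. Strings can't be modified in place, so we
// copy into a []byte, mutate it, and convert back to a new string.
func upperFirst(s string) string {
	b := []byte(s) // allocates a mutable copy of the string's bytes
	if len(b) > 0 && b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b) // allocates a new, immutable string
}

func main() {
	s := "gopher"
	t := upperFirst(s)
	fmt.Println(s, t) // gopher Gopher: the original string is unchanged

	// Strings are comparable, so they work as map keys; []byte does not.
	counts := map[string]int{}
	counts[t]++
	fmt.Println(counts) // map[Gopher:1]
}
```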

Adapting strings & slices for streams

One of the most important features of the bytes and strings packages is that they provide a way to use in-memory byte slices and strings as an io.Reader and io.Writer.

In-memory readers

Two of the most underused tools in the Go standard library are the bytes.NewReader and strings.NewReader functions:

func NewReader(b []byte) *Reader
func NewReader(s string) *Reader

These functions return an io.Reader implementation that wraps around your in-memory byte slice or string. But these aren’t just readers: they implement all the read-related interfaces in the io package, including io.Reader, io.ReaderAt, io.WriterTo, io.Seeker, io.ByteReader, io.ByteScanner, & io.RuneScanner.
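The extra interfaces matter in practice. The sketch below checks a few of them at compile time. It then shows something a plain network stream can't do: read the data, seek back to the start, and read it again. The helper readTwice is hypothetical, and the code assumes Go 1.16+ for io.ReadAll. bytes.Reader behaves the same way.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Compile-time checks that *strings.Reader satisfies several io interfaces.
var (
	_ io.Reader      = (*strings.Reader)(nil)
	_ io.ReaderAt    = (*strings.Reader)(nil)
	_ io.Seeker      = (*strings.Reader)(nil)
	_ io.WriterTo    = (*strings.Reader)(nil)
	_ io.ByteScanner = (*strings.Reader)(nil)
	_ io.RuneScanner = (*strings.Reader)(nil)
)

// readTwice reads all of s, rewinds with Seek, and reads it again.
func readTwice(s string) (string, string, error) {
	r := strings.NewReader(s)
	first, err := io.ReadAll(r)
	if err != nil {
		return "", "", err
	}
	if _, err := r.Seek(0, io.SeekStart); err != nil {
		return "", "", err
	}
	second, err := io.ReadAll(r)
	if err != nil {
		return "", "", err
	}
	return string(first), string(second), nil
}

func main() {
	a, b, err := readTwice("foobar")
	fmt.Println(a, b, err) // foobar foobar <nil>
}
```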

I frequently see code where byte slices or strings are written to a bytes.Buffer and then the buffer is used as a reader:

var buf bytes.Buffer
buf.WriteString("foo")
http.Post("http://example.com/", "text/plain", &buf)

However, this approach incurs heap allocations, which will be slow and use additional memory. A better option is to use a strings.Reader:

r := strings.NewReader("foobar")
http.Post("http://example.com", "text/plain", r)

This approach also works when you have multiple strings or byte slices by using the io.MultiReader:

r := io.MultiReader(
	strings.NewReader("HEADER"),
	bytes.NewReader([]byte{0,1,2,3,4}),
	myFile,
	strings.NewReader("FOOTER"),
)

In-memory writer

The bytes package also includes an in-memory implementation of io.Writer called Buffer. It implements most of the read and write interfaces in io, though notably not io.Seeker, io.ReaderAt, or io.Closer. There’s also a helper method called WriteString() for writing a string to the end of the buffer.

I use Buffer extensively in unit tests for capturing log output from services. You can pass it as an argument to log.New() and then verify the output later:

var buf bytes.Buffer
myService.Logger = log.New(&buf, "", log.LstdFlags)
myService.Run()
if !strings.Contains(buf.String(), "service failed") {
	t.Fatal("expected log message")
}

However, in production code, I rarely use Buffer. Despite its name, I don’t use it to buffer reads and writes since there’s a package called bufio specifically for that purpose.

Package organization

At first glance the bytes and strings packages appear large but they are really just a collection of simple helper functions. We can group them into a handful of categories:

  • Comparison functions
  • Inspection functions
  • Prefix/suffix functions
  • Replacement functions
  • Splitting & joining functions

Once we understand how the functions group together, the large API seems much more approachable.

Comparison functions

When you have two byte slices or strings you may need to ask one of two questions. First, are these two objects equal? Second, which one comes before the other when sorted?

Equality

The Equal() function answers our first question:

func Equal(a, b []byte) bool

This function only exists in the bytes package because strings can be compared with the == operator.

Although checking for equality seems easy, one common mistake is to use strings.ToUpper() to perform case-insensitive equality checks:

if strings.ToUpper(a) == strings.ToUpper(b) {
	return true
}

This is flawed because it allocates two new strings. A better approach is to use EqualFold():

func EqualFold(s, t []byte) bool
func EqualFold(s, t string) bool

The term “Fold” refers to Unicode case folding. It covers the regular uppercase & lowercase rules for A-Z as well as rules for other scripts, such as treating φ and ϕ as the same letter.

Comparison

To determine the sort order for two byte slices or strings, we’ll use Compare():

func Compare(a, b []byte) int
func Compare(a, b string) int

This function returns -1 if a is less than b, 1 if a is greater than b, and 0 if a and b are equal. This function exists in the strings package only for symmetry with the bytes package. Russ Cox even calls out in the function’s comments that “Basically no one should use strings.Compare.” Instead, use the built-in ==, <, and > operators.

Typically you’ll want to know if a byte slice is less than another byte slice for the purpose of sorting. The sort.Interface requires this for its Less() method. To convert from the ternary return value of Compare() to the boolean required by Less(), we simply check for equality with -1:

type ByteSlices [][]byte

func (p ByteSlices) Less(i, j int) bool {
	return bytes.Compare(p[i], p[j]) == -1
}

Inspection functions

The bytes & strings packages provide several ways to find data within your byte slices and strings.

Counting

If you are validating input from a user, it’s important to verify that certain bytes exist (or don’t exist). You can use the Contains() function to check for the existence of a subslice or substring:

func Contains(b, subslice []byte) bool
func Contains(s, substr string) bool

For example, you may not allow input with certain off-color words:

if strings.Contains(input, "darn") {
	return errors.New("inappropriate input")
}

If you need to obtain the exact number of times a subslice or substring occurs, you can use Count():

func Count(s, sep []byte) int
func Count(s, sep string) int

Another use for Count() is to return the number of runes in a string. If you pass an empty slice or blank string as the sep argument, Count() returns the number of runes + 1. This is different from len(), which returns the number of bytes. The distinction is important when dealing with multi-byte Unicode characters:

strings.Count("I ❤ ☃", "")  // 6
len("I ❤ ☃") // 9

The first line above may seem odd because there are 5 runes, but remember that Count() returns the rune count plus one.

Indexing

Asserting contents is important but sometimes you’ll need to find the exact position of a subslice or substring. You can do this using the index functions:

Index(s, sep []byte) int
IndexAny(s []byte, chars string) int
IndexByte(s []byte, c byte) int
IndexFunc(s []byte, f func(r rune) bool) int
IndexRune(s []byte, r rune) int

There are multiple index functions for different use cases. Index() finds a multi-byte subslice. IndexByte() finds a single byte within a slice. IndexRune() finds a Unicode code point within a UTF-8 interpreted byte slice. IndexAny() works like IndexRune() but searches for multiple code points at the same time. Finally, IndexFunc() allows you to pass in a custom function to evaluate each rune in your byte slice until it finds a match.

There’s also a matching set of functions that search from the end of a byte slice or string and return the position of the last match:

LastIndex(s, sep []byte) int
LastIndexAny(s []byte, chars string) int
LastIndexByte(s []byte, c byte) int
LastIndexFunc(s []byte, f func(r rune) bool) int

I don’t use the index functions much because I find that I typically need to build something more complex such as a parser.

Prefixing, suffixing, & trimming

Working with content at the beginning and end of a byte slice or string is a special case of inspection, but it’s important enough to warrant its own section.

Checking for prefixes & suffixes

Prefixes come up a lot in programming. For example, HTTP paths are typically grouped by functionality with common prefixes. Another example is special characters at the beginning of a string such as “@” for mentioning a user.

The HasPrefix() and HasSuffix() functions allow you to check for these situations:

func HasPrefix(s, prefix []byte) bool
func HasPrefix(s, prefix string) bool
func HasSuffix(s, suffix []byte) bool
func HasSuffix(s, suffix string) bool

These functions may seem too simple to bother with, but one common mistake I see is when developers forget to check for zero-length values:

if str[0] == '@' {
	return true
}

This code looks simple enough, but if str is blank the program will panic. The HasPrefix() function includes this validation for you:

if strings.HasPrefix(str, "@") {
	return true
}
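Here is the “@” mention check wrapped in a small function. The name isMention is hypothetical. It shows that HasPrefix simply returns false on an empty string, where indexing would panic.

```go
package main

import (
	"fmt"
	"strings"
)

// isMention reports whether s begins with "@". Unlike s[0] == '@',
// strings.HasPrefix returns false for an empty string instead of panicking.
func isMention(s string) bool {
	return strings.HasPrefix(s, "@")
}

func main() {
	fmt.Println(isMention("@gopher"))                // true
	fmt.Println(isMention(""))                       // false, no panic
	fmt.Println(strings.HasSuffix("main.go", ".go")) // true
}
```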

Trimming

The term “trimming” in the bytes and strings packages refers to removing bytes or runes from the beginning and/or end of a byte slice or string. The most generic function for this is Trim():

func Trim(s []byte, cutset string) []byte
func Trim(s, cutset string) string

This will remove any runes in cutset from the beginning and end of your string. You can also trim from just the beginning or just the end of your string using TrimLeft() and TrimRight(), respectively.

But generic trimming isn’t very common. Most of the time you want to trim white space characters, and you can use TrimSpace() for this:

func TrimSpace(s []byte) []byte
func TrimSpace(s string) string

You might think that trimming with a “ \n\t” cutset is enough, but TrimSpace() will trim all Unicode-defined white space. This includes not only the space, newline, and tab characters but also more unusual Unicode white space characters.

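TrimSpace() is essentially a wrapper around TrimFunc(), which evaluates leading and trailing runes with a function to decide what to trim. Ignoring an ASCII fast path added in later Go releases, it boils down to:

func TrimSpace(s string) string {
	return TrimFunc(s, unicode.IsSpace)
}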

This makes it simple to create your own whitespace trimmer for only trailing characters:

strings.TrimRightFunc(s, unicode.IsSpace)

Finally, if you want to trim exact prefixes or suffixes instead of character sets, there are the TrimPrefix() and TrimSuffix() functions:

func TrimPrefix(s, prefix []byte) []byte
func TrimPrefix(s, prefix string) string
func TrimSuffix(s, suffix []byte) []byte
func TrimSuffix(s, suffix string) string

These go hand in hand with the HasPrefix() and HasSuffix() functions if you want to replace a prefix or suffix. For example, I use them to implement Bash-style home directory expansion for paths in my config files:

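import (
	"errors"
	"os/user"
	"path/filepath"
	"strings"
)

// expandHome replaces a leading "~/" in path with the user's home directory.
func expandHome(path string) (string, error) {
	// Look up user's home directory.
	u, err := user.Current()
	if err != nil {
		return "", err
	} else if u.HomeDir == "" {
		return "", errors.New("home directory does not exist")
	}
	// Replace tilde prefix with home directory.
	if strings.HasPrefix(path, "~/") {
		path = filepath.Join(u.HomeDir, strings.TrimPrefix(path, "~/"))
	}
	return path, nil
}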

Replacement functions

Basic replacement

Swapping out subslices or substrings is sometimes necessary. For the simplest cases, the Replace() function is all you need:

func Replace(s, old, new []byte, n int) []byte
func Replace(s, old, new string, n int) string

It swaps out any instance of old with new in your string. Set n to a non-negative number to limit the number of replacements, or to -1 to replace every instance. This function is good if you have a simple placeholder in a user-defined template. For example, you may want to let users specify “$NOW” and have it replaced with the current time:

now := time.Now().Format(time.Kitchen)
println(strings.Replace(data, "$NOW", now, -1))

If you have multiple mappings, then you’ll need to use strings.Replacer. This works by passing old/new pairs to strings.NewReplacer():

r := strings.NewReplacer("$NOW", now, "$USER", "mary")
println(r.Replace("Hello $USER, it is $NOW"))
// Output: Hello mary, it is 3:04PM
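One benefit of a Replacer over chained Replace() calls is that it makes a single pass and never rescans text it has already substituted. The sketch below uses that to swap two words. Two chained Replace() calls would turn both words into the same one. The variable name swap is illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// swap exchanges "cat" and "dog" in one pass over the input.
var swap = strings.NewReplacer("cat", "dog", "dog", "cat")

func main() {
	fmt.Println(swap.Replace("the cat chased the dog")) // the dog chased the cat
}
```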

Case replacement

You may assume that casing is simple — upper & lower case — but Go works with Unicode and Unicode is never that simple. There are 3 types of casing: upper, lower, and title case.

Uppercase and lowercase are straightforward for most languages, and you can use the ToUpper() and ToLower() functions:

func ToUpper(s []byte) []byte
func ToUpper(s string) string
func ToLower(s []byte) []byte
func ToLower(s string) string

However, some languages have different rules for casing. Turkish, for example, uppercases its i as İ. For these special-case languages, there are special versions of these functions:

strings.ToUpperSpecial(unicode.TurkishCase, "i")

Next we have title case and the ToTitle() function:

func ToTitle(s []byte) []byte
func ToTitle(s string) string

You may be surprised, however, when you use ToTitle() and all your characters are uppercased:

println(strings.ToTitle("the count of monte cristo"))
// Output: THE COUNT OF MONTE CRISTO

That’s because in Unicode, title case is a specific type of casing and not a way to capitalize the first character in each word. For the most part, title case and upper case are the same, but a few code points differ. For example, the code point ǉ (U+01C9; yes, that’s one code point) is uppercased as Ǉ (“LJ”) but title cased as ǈ (“Lj”).

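What you’re probably looking for is the Title() function. Note that since Go 1.18 it is deprecated, because its word-boundary rules don’t handle Unicode punctuation properly. The suggested replacement is golang.org/x/text/cases.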

func Title(s []byte) []byte
func Title(s string) string

This outputs the expected result:

println(strings.Title("the count of monte cristo"))
// Output: The Count Of Monte Cristo

Mapping runes

One other function for replacing data in a byte slice or string is Map():

func Map(mapping func(r rune) rune, s []byte) []byte
func Map(mapping func(r rune) rune, s string) string

This function lets you pass in a function that evaluates every rune and returns its replacement. If the function returns a negative value, the rune is dropped entirely. Admittedly, I didn’t even know this function existed until I started writing this post, so I can’t give any personal anecdote.
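Here is a small sketch of Map() that relies on the negative-return rule to drop every non-digit from a phone number. The helper name digitsOnly is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// digitsOnly keeps only the decimal digits in s. Returning -1 from the
// mapping function removes that rune from the output.
func digitsOnly(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsDigit(r) {
			return r
		}
		return -1
	}, s)
}

func main() {
	fmt.Println(digitsOnly("(555) 123-4567")) // 5551234567
}
```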

Splitting & joining functions

Many times we have delimited strings that we need to break apart. For example, Unix PATH lists join directories with colons, and the CSV file format is essentially just fields of data delimited by commas.

Substring splitting

For simple subslice or substring splitting, we have the Split() functions:

func Split(s, sep []byte) [][]byte
func SplitAfter(s, sep []byte) [][]byte
func SplitAfterN(s, sep []byte, n int) [][]byte
func SplitN(s, sep []byte, n int) [][]byte
func Split(s, sep string) []string
func SplitAfter(s, sep string) []string
func SplitAfterN(s, sep string, n int) []string
func SplitN(s, sep string, n int) []string

These break up byte slices or strings by a delimiter and return the subslices or substrings. The “After” functions include the delimiter at the end of the substrings. The “N” functions limit the number of splits that can occur:

strings.Split("a:b:c", ":")       // ["a", "b", "c"]
strings.SplitAfter("a:b:c", ":") // ["a:", "b:", "c"]
strings.SplitN("a:b:c", ":", 2) // ["a", "b:c"]

Splitting data is a very common operation; however, it’s typically done in the context of a file format such as CSV or in the context of path splitting. For these operations, I use the encoding/csv or path/filepath packages instead.

Categorical splitting

Sometimes you want to specify delimiters as a set of runes instead of a sequence of runes. The best example of this is breaking apart words separated by variable-length whitespace. Simply calling Split() with a space delimiter will give you empty substrings if you have multiple contiguous spaces. Instead, you can use the Fields() function:

func Fields(s []byte) [][]byte
func Fields(s string) []string

This will consider consecutive whitespace characters a single delimiter:

strings.Fields("hello   world")     // ["hello", "world"]
strings.Split("hello   world", " ") // ["hello", "", "", "world"]

The Fields() function is just a simple wrapper around FieldsFunc() which lets you pass a function to evaluate each rune as a delimiter:

func FieldsFunc(s []byte, f func(rune) bool) [][]byte
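Here is a sketch of FieldsFunc() that treats any rune that isn’t a letter or number as a delimiter, so punctuation and whitespace both split words. The helper name words is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// words splits s on any rune that is not a letter or number.
// Consecutive delimiters never produce empty fields.
func words(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	})
}

func main() {
	fmt.Printf("%q\n", words("  foo1;bar2,baz3...")) // ["foo1" "bar2" "baz3"]
}
```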

Joining

Instead of breaking apart delimited data, we can join it together using the Join() function:

func Join(s [][]byte, sep []byte) []byte
func Join(a []string, sep string) string

One common mistake I’ve seen is when developers try to implement join by hand. It looks something like:

var output string
for i, s := range a {
	output += s
	if i < len(a) - 1 {
		output += ","
	}
}
return output

The flaw in this code is that it creates a large number of allocations. Because strings are immutable, every += allocates a new string and copies everything built so far. The strings.Join() function, on the other hand, computes the final length up front, writes into a single buffer, and returns it as a string. This minimizes heap allocations.

Miscellaneous functions

There are two functions I couldn’t find a category for, so they’re lumped in here at the bottom. First, the Repeat() function allows you to generate a repeated byte slice or string. Honestly, the only time I can remember using this is to make a line to separate content in the terminal:

println(strings.Repeat("-", 80))

The other function is bytes.Runes(), which returns a slice of all runes in a UTF-8 interpreted byte slice. (There is no strings counterpart; []rune(s) does the same for a string.) I’ve never needed to use this since a for...range loop over a string does the same thing but without the allocations.

Conclusion

Byte slices and strings are fundamental primitives in Go. They are the in-memory representations for series of bytes and runes. The bytes and strings packages provide a ton of useful helper functions as well as adapters to the io.Reader and io.Writer interfaces.

It’s easy to overlook many of the useful tools in these packages because of the API’s size but I hope this post has helped you to understand everything these packages have to offer.

Love the post? Hate it? Drop me a line on Twitter.

Go Walkthrough

A series of walkthroughs to help you understand the Go standard library better.
