Published in Golicious

Comparing ioutil.ReadFile and bufio.Scanner

I just finished my first review meeting in a Golang learning group at work. Today I want to share a very interesting discussion we had there about bufio.Scanner.

I don’t want to dive deep into the learning group and what we do there. But to get you on board, let me explain the task we had to solve:
Read text from a file

Sounds simple, right? To be honest, for me there was only a “single solution” to the task: the ioutil.ReadFile function (note that as of Go 1.16 the ioutil package is deprecated and this function has moved to os.ReadFile). Interestingly, though, one of my colleagues used bufio.Scanner to read the file content.

My first thought was “Hey, this looks handy. Nice idea”. But before I could think it through, another colleague jumped in and said that this was a “misuse” of bufio.Scanner and that the solution is probably less efficient than reading the file with ioutil.ReadFile.

Why did he say that? Is it true? Let’s find out!

Before I compare something, let me show you the two different solutions.

The ioutil.ReadFile solution:

// The error is ignored for brevity; check it in real code.
fileContentByteSlice, _ := ioutil.ReadFile("/tmp/data")
fileContent := string(fileContentByteSlice)

The bufio.Scanner solution:

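file, _ := os.Open("/tmp/data") // error ignored for brevity
defer file.Close()
var fileContent string
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	// Text() returns the current line without its trailing newline.
	fileContent = fileContent + scanner.Text()
}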

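At the end of the day, both solutions work: in both cases fileContent holds the text of the file. The one difference is that scanner.Text() drops the line terminators, so the Scanner version returns the lines joined together without newlines.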
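To make the newline behavior concrete, here is a minimal sketch that runs the same kind of Scanner loop over an in-memory string instead of /tmp/data. The helper name scanAll and its keepNewlines parameter are made up for this illustration, and it uses a strings.Builder rather than string concatenation:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanAll is a hypothetical helper: it joins every line of input the way
// the Scanner loop above does. With keepNewlines set, it writes a "\n"
// back after each line.
func scanAll(input string, keepNewlines bool) (string, error) {
	var sb strings.Builder // avoids re-copying the string on every append
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		sb.WriteString(scanner.Text()) // line terminator already stripped
		if keepNewlines {
			sb.WriteByte('\n')
		}
	}
	// Err reports read failures and over-long lines (default limit: 64 KiB).
	return sb.String(), scanner.Err()
}

func main() {
	input := "first line\nsecond line\n"

	joined, err := scanAll(input, false)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Printf("%q\n", joined) // "first linesecond line"

	restored, _ := scanAll(input, true)
	fmt.Println(restored == input) // true
}
```

Note that putting "\n" back is only an approximation of the original bytes: a file that uses "\r\n" line endings, or one without a final newline, would not round-trip exactly.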

Comparison

Because I don’t want to simply trust my gut feeling, I thought it made sense to compare the two with hard facts, using testing.Benchmark and the BenchmarkResult it returns. As a test file I used a text file with 10000 words, generated with an online text generator.

package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"os"
	"testing"
)

func main() {
	res := testing.Benchmark(ioutilReadFile)
	fmt.Printf("ioutilReadFile: %s\n%#[1]v\n", res)
	res = testing.Benchmark(bufioScanner)
	fmt.Printf("bufioScanner: %s\n%#[1]v\n", res)
}

func ioutilReadFile(b *testing.B) {
	fileContentArray, _ := ioutil.ReadFile("/tmp/data") // error ignored for brevity
	fileContent := string(fileContentArray)
	fmt.Println(fileContent[0]) // use the result so the read is not dead code
}

func bufioScanner(b *testing.B) {
	file, _ := os.Open("/tmp/data") // error ignored for brevity
	defer file.Close()
	var fileContent string
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		fileContent = fileContent + scanner.Text()
	}
	fmt.Println(fileContent[0])
}

The output is the following:

ioutilReadFile:
testing.BenchmarkResult{N:1000000000, T:87090, Bytes:0, MemAllocs:0x8, MemBytes:0x20750, Extra:map[string]float64{}}
bufioScanner:
testing.BenchmarkResult{N:1000000000, T:332705, Bytes:0, MemAllocs:0xdb, MemBytes:0x355070, Extra:map[string]float64{}}

Nice numbers, right? But what does this tell us?

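The T stands for the “total time taken”. ioutilReadFile took 87090 nanoseconds while bufioScanner took 332705 nanoseconds, almost four times as long. Because the benchmark functions never loop over b.N, these values describe a single execution of each function. Even though we are talking about nanoseconds, this shows us that bufioScanner is slower!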

The MemBytes is the “total number of bytes allocated” (in hexadecimal, because %#v prints unsigned integers that way). This means the ioutilReadFile version allocated 132944 bytes (0x20750) while the bufioScanner version allocated 3494000 bytes (0x355070), roughly 26 times as much. Even without a conversion to a more human-readable unit like kilobytes or megabytes, we immediately see that the bufioScanner version allocates far more memory than the ioutilReadFile version!
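If you would rather not convert the hexadecimal values by hand, a few lines of Go do it. This is just a small sketch; hexToDecimal is a helper name I made up, and it relies on strconv.ParseInt accepting the "0x" prefix when the base argument is 0:

```go
package main

import (
	"fmt"
	"strconv"
)

// hexToDecimal is a hypothetical helper that parses a value such as
// "0x20750". Base 0 tells ParseInt to infer the base from the prefix.
func hexToDecimal(s string) (int64, error) {
	return strconv.ParseInt(s, 0, 64)
}

func main() {
	// The two MemBytes values from the benchmark output.
	for _, v := range []string{"0x20750", "0x355070"} {
		n, err := hexToDecimal(v)
		if err != nil {
			fmt.Println("parse failed:", err)
			continue
		}
		fmt.Printf("%s = %d bytes (about %.1f KiB)\n", v, n, float64(n)/1024)
	}
}
```

For the two values from the output, this gives 132944 and 3494000 bytes.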

Without comparing more numbers, we can say:
Using bufio.Scanner, as we do, is inefficient!
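A side note on the measurement itself: a more conventional benchmark repeats the work b.N times, which the functions above do not do, so accessors such as NsPerOp are not meaningful for them. The following is a sketch of that conventional shape, not the code that produced the output above. The helper names (writeSample, readWhole, readScanned, bench) and the temporary sample file that stands in for /tmp/data are assumptions made for this example:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"testing"
)

// writeSample creates a temporary file with the given number of short
// lines and returns its path. It stands in for /tmp/data.
func writeSample(lines int) (string, error) {
	f, err := os.CreateTemp("", "sample-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(strings.Repeat("lorem ipsum dolor sit amet\n", lines))
	return f.Name(), err
}

// readWhole reads the file in one call.
func readWhole(path string) (string, error) {
	data, err := os.ReadFile(path)
	return string(data), err
}

// readScanned mirrors the Scanner loop from the article, including the
// string concatenation that is being measured.
func readScanned(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	var content string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		content += scanner.Text()
	}
	return content, scanner.Err()
}

// bench wraps a read function in a benchmark that repeats it b.N times.
func bench(path string, read func(string) (string, error)) func(*testing.B) {
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := read(path); err != nil {
				b.Fatal(err)
			}
		}
	}
}

func main() {
	path, err := writeSample(1000)
	if err != nil {
		fmt.Println("cannot create sample:", err)
		return
	}
	defer os.Remove(path)

	cases := []struct {
		name string
		read func(string) (string, error)
	}{
		{"os.ReadFile", readWhole},
		{"bufio.Scanner", readScanned},
	}
	for _, c := range cases {
		res := testing.Benchmark(bench(path, c.read))
		fmt.Printf("%-14s %d ns/op, %d B/op, %d allocs/op\n",
			c.name, res.NsPerOp(), res.AllocedBytesPerOp(), res.AllocsPerOp())
	}
}
```

The absolute numbers will differ from machine to machine and from the ones shown above, since the sample file is different; what matters is the ratio between the two rows.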

Okay, my colleague was right with that assumption. But he also spoke of a “misuse” of bufio.Scanner in our case. Is there another, better use case for bufio.Scanner?

When to use bufio.Scanner

There are two use cases I can think of.

The first one is if you are interested in each individual line of a file. Imagine the file contains a key-value pair on each line and you want to put those pairs into a slice. Then it would make sense to use bufio.Scanner: read each line, put it into the slice, and do whatever you want to do with that slice afterward. But be aware that you can still use ioutil.ReadFile and split the lines yourself.
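As a rough sketch of this first use case, the following collects key-value pairs into a slice. The "key=value" layout, the pair type, and the parsePairs helper are all assumptions made for this example, and the input comes from a string rather than a file so the snippet runs on its own:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// pair holds one key-value line. The "key=value" layout is an assumption
// made for this sketch; no particular file format is given in the article.
type pair struct {
	Key, Value string
}

// parsePairs scans input line by line and collects every well-formed pair.
// Lines without "=" are skipped.
func parsePairs(input string) ([]pair, error) {
	var pairs []pair
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		key, value, ok := strings.Cut(scanner.Text(), "=")
		if !ok {
			continue
		}
		pairs = append(pairs, pair{Key: key, Value: value})
	}
	return pairs, scanner.Err()
}

func main() {
	pairs, err := parsePairs("host=localhost\nport=8080\n")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, p := range pairs {
		fmt.Printf("%s -> %s\n", p.Key, p.Value)
	}
}
```

For a real file you would pass the opened *os.File to bufio.NewScanner instead of a strings.Reader; the loop stays the same.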

The second one is if you are not interested in the whole file content but only in a single line! Same scenario as above: imagine the file contains a key-value pair on each line and you want to find a specific key to print its value. In this case, there is no need to save every line (and therefore no need to allocate memory for the whole content); you only compare each returned line with the search term.

func bufioScanner(b *testing.B) {
	file, _ := os.Open("/tmp/data") // error ignored for brevity
	defer file.Close()
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		// Only the current line is held; nothing is accumulated.
		fileContent := scanner.Text()
		if strings.Contains(fileContent, "search") {
			// Do something with it
		}
	}
}

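Still, there is more “overhead” compared to the ioutil.ReadFile version: the search takes longer (141055 versus 87090 nanoseconds) and makes more allocations (110 versus 8), although it allocates fewer bytes in total (66312 versus 132944). And it is far less than saving the whole content with the Scanner and doing the search afterward: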

bufioScanner-Search:
testing.BenchmarkResult{N:1000000000, T:141055, Bytes:0, MemAllocs:0x6e, MemBytes:0x10308, Extra:map[string]float64{}}
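The loop above keeps reading after a match and converts every line to a string. If only the first match matters, a variant can stop early and compare against scanner.Bytes(), which does not allocate. This is a sketch under those assumptions, not a measured improvement; findLine and findInString are hypothetical helpers, and the string input stands in for the opened file:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// findLine returns the first line of r that contains needle and stops
// reading as soon as it finds one.
func findLine(r io.Reader, needle string) (string, bool, error) {
	want := []byte(needle) // convert once, outside the loop
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		// Bytes() exposes the scanner's buffer without allocating; the
		// slice is only valid until the next call to Scan.
		if bytes.Contains(scanner.Bytes(), want) {
			return scanner.Text(), true, nil // copy only the matching line
		}
	}
	return "", false, scanner.Err()
}

// findInString is a small convenience wrapper for in-memory input.
func findInString(input, needle string) (string, bool, error) {
	return findLine(strings.NewReader(input), needle)
}

func main() {
	line, ok, err := findInString("a=1\nsearch=me\nsearch=again\n", "search")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println(ok, line) // true search=me
}
```

Whether this actually lowers the allocation count for a given file is something to confirm with a benchmark rather than assume.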

I found the discussion in our learning group very interesting, which is why I thought it made sense to write about it.

I hope I could give you an overview of the two solutions and explain when to use which.
