Read files in Go

Introduction
My teammates at Wiredcraft have begun to use tools written in Go, and even to use Go directly in our projects, like building the voter registration system for the Myanmar elections. As a fan of strongly-typed programming languages, I think it is time to check out this young language. Go (also referred to as Golang), like other programming languages, offers many standard libraries to help users handle files by buffer, position, or line. But unlike Java or Node.js, Go doesn't ship a standard asynchronous I/O library; instead, you can easily use Go's concurrency features to implement non-blocking I/O. As a Go beginner, I am going to list some methods for reading files and pulling out the information you want in Go.
Examples
You can get the demo code from my GitHub repo query_file_demo.
1. OK, let's start with the simplest way: the ioutil lib.
func SimpleReader(path string) string {
	f, err := ioutil.ReadFile(path)
	CheckError(err)
	lines := strings.Split(string(f), "\n")
	re := regexp.MustCompile(`\bslowpoke\b`)
	var result string
	for _, line := range lines {
		if re.MatchString(line) {
			result = line
		}
	}
	return result
}
Here strings.Split() converts the loaded content into a slice of strings, and regexp.MatchString() matches the regular expression against each element of the slice. But you should not read a file with ioutil.ReadFile() when the file is too large to load into memory at once; you can see why from the Go lib's source code:
func ReadFile(filename string) ([]byte, error) {
	...
	return readAll(f, n+bytes.MinRead)
}
Here n is the size of the file; you will get bytes.ErrTooLarge if the file overflows the buffer.
2. Then you can try the bufio lib:
func Scanner(path string) string {
	f, err := os.Open(path)
	CheckError(err)
	defer f.Close()

	var result string
	scanner := bufio.NewScanner(f)
	re := regexp.MustCompile(`\bslowpoke\b`)
	for scanner.Scan() {
		s := scanner.Text()
		if re.MatchString(s) {
			result = s
		}
	}
	return result
}
Again, you can read the source code of NewScanner function:
const (
	// MaxScanTokenSize is the maximum size used to buffer a token
	// unless the user provides an explicit buffer with Scanner.Buffer.
	// The actual maximum token size may be smaller as the buffer
	// may need to include, for instance, a newline.
	MaxScanTokenSize = 64 * 1024

	startBufSize = 4096 // Size of initial allocation for buffer.
)

func NewScanner(r io.Reader) *Scanner {
	return &Scanner{
		r:            r,
		split:        ScanLines,
		maxTokenSize: MaxScanTokenSize,
	}
}
Here MaxScanTokenSize is the maximum size of each token (a line, with the default ScanLines split function). Since the scanner reads one line at a time, you can read a file as big as you want, as long as no single line exceeds MaxScanTokenSize (you can raise that limit with scanner.Buffer()). If you do not want to read the file line by line, you can switch to file.Read() or io.CopyBuffer(); these APIs let you choose the buffer size used when reading the file.
3. Finally, if you want to use Go concurrency to make the job faster, you can use channels and goroutines.
func ChannelReader(path string) string {
	workers := 10
	f, err := os.Open(path)
	CheckError(err)
	defer f.Close()

	jobs := make(chan string)
	results := make(chan string)
	complete := make(chan bool)

	go func() {
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			jobs <- scanner.Text()
		}
		close(jobs)
	}()

	for i := 0; i < workers; i++ {
		go grepLine(jobs, results, complete)
	}

	// Close results once every worker has signalled on complete, so the
	// range below knows when to stop.
	go func() {
		for i := 0; i < workers; i++ {
			<-complete
		}
		close(results)
	}()

	var result string
	for r := range results {
		result = r
	}
	return result
}

func grepLine(jobs <-chan string, results chan<- string, complete chan<- bool) {
	re := regexp.MustCompile(`\bslowpoke\b`)
	for j := range jobs {
		if re.MatchString(j) {
			results <- j
		}
	}
	complete <- true
}
This code starts 10 worker goroutines for the grep job. A separate goroutine waits for every worker to send on the complete channel and then closes results, which lets the main goroutine range over results while workers are still sending; without that, a worker blocked on results <- j could never send on complete, and the program would deadlock as soon as more than one line matched.
Benchmark
Go's `testing` package not only supports automated testing of Go packages but also includes benchmark tools. Just write your benchmark functions and run the command below; the -benchmem flag adds memory consumption to the results.
go test -bench=. -benchmem
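A benchmark function is just a Benchmark-prefixed function that loops b.N times over the code you want to measure. A self-contained sketch (grepLines is a stand-in for SimpleReader so this compiles on its own; in the real repo you would benchmark the repo's functions, and the Benchmark* functions would live in a _test.go file run by go test rather than by testing.Benchmark):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// grepLines is a small stand-in for SimpleReader, returning the
// last line that matches the regex.
func grepLines(content string) string {
	re := regexp.MustCompile(`\bslowpoke\b`)
	var result string
	for _, line := range strings.Split(content, "\n") {
		if re.MatchString(line) {
			result = line
		}
	}
	return result
}

// The benchmark shape: do the measured work b.N times.
func BenchmarkGrepLines(b *testing.B) {
	data := strings.Repeat("pikachu\n", 100) + "slowpoke\n"
	for i := 0; i < b.N; i++ {
		grepLines(data)
	}
}

func main() {
	// testing.Benchmark lets us run one benchmark without the test runner;
	// `go test -bench=. -benchmem` would pick it up from a _test.go file.
	fmt.Println(testing.Benchmark(BenchmarkGrepLines))
}
```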
Here is the result from my computer. Since I am using a small file for the test, the numbers (and the bench directory in my repo) mainly demonstrate how to benchmark your functions.
testing: warning: no tests to run
PASS
BenchmarkChannelReader-4 grep line: 79,slowpoke,79,12,360,63,98,1
1000 1712159 ns/op 490096 B/op 1149 allocs/op
BenchmarkScanner-4 79,slowpoke,79,12,360,63,98,1
1000 1766001 ns/op 77130 B/op 846 allocs/op
BenchmarkSimpleReader-4 grep line: 79,slowpoke,79,12,360,63,98,1
1000 1791945 ns/op 112651 B/op 37 allocs/op
ok github.com/chopperlee2011/query_file_demo/bench 5.816s
Conclusion
I am glad to find that Go provides convenient and fast libraries for this kind of concurrent work, which can help developers break through technical bottlenecks they hit when building apps in other languages; and the performance of Go tools and frameworks such as Gin and NSQ really impresses me. If you want to dig deeper into Go, I think the Gopher Academy website is a good place to start, and you can meet more gophers in their Slack channel. I hope you enjoy it there.