Effortless CSV Parsing in Golang: A Hands-On Approach

Shailesh B
6 min read · Apr 9, 2024


Handling big CSV files can take a lot of time, but with Go it can become much quicker. Go is a compiled language, and its goroutines and channels make it easy to process records as they are read instead of loading an entire file into memory first. This helps developers simplify data-heavy tasks and turns the once burdensome job of managing large CSV datasets into a smoother, more time-saving process.

1. Setting up the project and installing the required packages

Let's start by setting up the Go project. First, we initialize a new module in the current directory using the go mod init moduleName command. The module name is typically your repository path. I am going to run the following command:

go mod init github.com/shaileshhb/go-file-parser

Once the command runs, it creates a go.mod file. Next, we install the package that we will use to read the CSV file. Running go get also records the dependency in go.mod and creates a go.sum file containing checksums of the downloaded modules.

go get -u github.com/gocarina/gocsv

Now we are ready to start writing code. Create a main.go file in the project directory.

2. Opening and Reading a CSV File

To read the CSV file, we first have to open it. There are multiple ways to open a file; here we will use the OpenFile function from the os package:

readFilePath := "process.csv"

// Open the CSV file for reading
readFile, err := os.OpenFile(readFilePath, os.O_RDONLY, os.ModePerm)
if err != nil {
	panic(err)
}
defer readFile.Close()

The OpenFile function takes three parameters:

  • readFilePath: The path to the file you want to open.
  • os.O_RDONLY: A flag indicating that the file should be opened for reading only.
  • os.ModePerm: The permission bits (0777). They are only used when the call creates a new file (for example, when os.O_CREATE is passed), so for a read-only open of an existing file this value is ignored.

OpenFile returns the opened file (*os.File) and an error. Because the call can fail, we need to handle the error. Here the program simply panics, which stops execution, but in a real-world program you would usually return the error to the caller. Finally, defer readFile.Close() makes sure the file is closed once the surrounding function returns.

3. Defining Structs to Match CSV Columns

Since we are using the github.com/gocarina/gocsv package, there are certain rules for defining the struct that gets filled while parsing the file.
Each field in the struct has a csv tag containing the name of the corresponding column in the CSV file.
For example, say we have a CSV file with a column named 'Full Name'. We would define the struct as follows:

type User struct {
	FullName string `csv:"Full Name"`
}

The CSV file that I am going to use has the following columns:
Organization Name, LinkedIn, Website, Total Funding Amount, Total Funding Amount Currency, Headquarters Location. So my struct looks like this:

type Industry struct {
	CompanyName                string `csv:"Organization Name"`
	LinkedIn                   string `csv:"LinkedIn"`
	Website                    string `csv:"Website"`
	TotalFundingAmount         int    `csv:"Total Funding Amount"`
	TotalFundingAmountCurrency string `csv:"Total Funding Amount Currency"`
	HeadquartersLocation       string `csv:"Headquarters Location"`
}

4. Parsing CSV Data into Structs using the UnmarshalToChan() Function

UnmarshalToChan enables memory-efficient processing of large datasets by streaming records into channels instead of loading them all at once.
Calling UnmarshalToChan is simple. It takes two arguments: the first is the input (any io.Reader, such as the file we opened) and the second is the channel. The channel's element type must be the struct we just defined.

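readChannel := make(chan Industry, 1)
// UnmarshalToChan sends each decoded record to readChannel. It blocks while
// the channel is full, so something must be receiving concurrently.
if err := gocsv.UnmarshalToChan(readFile, readChannel); err != nil {
	panic(err)
}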

Here, readChannel is the channel through which we receive our data: as each record is read, it is decoded into an Industry value and sent on the channel, so the whole file never has to sit in memory at once. The first argument is the file we opened previously. UnmarshalToChan returns an error, so once again we need to handle it. When the input is exhausted, UnmarshalToChan closes the channel, so a consumer can simply range over it.
The package also lets us supply our own CSV reader, which we can configure with a few options. This can be done as follows:

gocsv.SetCSVReader(func(r io.Reader) gocsv.CSVReader {
	reader := csv.NewReader(r)
	reader.Comma = ','
	reader.LazyQuotes = true
	reader.FieldsPerRecord = -1
	return reader
})

Here is a breakdown of the function:
gocsv.SetCSVReader: This function sets a custom CSV reader for parsing CSV data. The setting is package-wide, so it applies to the gocsv decoding calls made after it.

  1. func(r io.Reader) gocsv.CSVReader:
  • The function takes an io.Reader as an argument and returns a gocsv.CSVReader.
  • It is a function literal (anonymous function) that specifies how the CSV reader should be configured.

  2. csv.NewReader(r):
  • Creates a new CSV reader using the encoding/csv package from the Go standard library; it takes an io.Reader as its parameter.

  3. reader.Comma = ',':
  • Sets the character used to separate fields in the CSV file. Here it is the standard comma (,), which is also the default.

  4. reader.LazyQuotes = true:
  • Makes the CSV reader tolerant of malformed quoting: a quote may appear inside an unquoted field, and a non-doubled quote may appear inside a quoted field. Without it, such input causes a parse error.

  5. reader.FieldsPerRecord = -1:
  • FieldsPerRecord is the number of expected fields per record. If FieldsPerRecord is positive, Read requires each record to have the given number of fields. If FieldsPerRecord is 0, Read sets it to the number of fields in the first record, so that future records must have the same field count. If FieldsPerRecord is negative, no check is made and records may have a variable number of fields.
  • Setting FieldsPerRecord to -1 means each record in the CSV file may have a different number of fields. This is useful for handling irregular CSV files where records might have varying numbers of fields.

Putting everything together so far, the code should look something like this:

package main

import (
	"encoding/csv"
	"io"
	"os"

	"github.com/gocarina/gocsv"
)

type Industry struct {
	CompanyName                string `csv:"Organization Name"`
	LinkedIn                   string `csv:"LinkedIn"`
	Website                    string `csv:"Website"`
	TotalFundingAmount         int    `csv:"Total Funding Amount"`
	TotalFundingAmountCurrency string `csv:"Total Funding Amount Currency"`
	HeadquartersLocation       string `csv:"Headquarters Location"`
}

func main() {
	readChannel := make(chan Industry, 1)

	readFilePath := "process.csv"

	// Open the CSV file for reading
	readFile, err := os.OpenFile(readFilePath, os.O_RDONLY, os.ModePerm)
	if err != nil {
		panic(err)
	}
	defer readFile.Close()

	readFromCSV(readFile, readChannel)
}

func readFromCSV(file *os.File, c chan Industry) {
	gocsv.SetCSVReader(func(r io.Reader) gocsv.CSVReader {
		reader := csv.NewReader(r)
		reader.Comma = ','
		reader.LazyQuotes = true
		reader.FieldsPerRecord = -1
		return reader
	})

	// Stream the CSV rows into the channel from a separate goroutine
	go func() {
		err := gocsv.UnmarshalToChan(file, c)
		if err != nil {
			panic(err)
		}
	}()
}

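Here I have moved the code that reads from the CSV file into a readFromCSV function and wrapped the UnmarshalToChan call in an anonymous goroutine; apart from that, everything looks the same. The goroutine is needed because UnmarshalToChan blocks whenever the channel is full, so it has to run concurrently with the code that receives the records.
The only missing part now is reading from the channel, which can be done with a for range loop as follows: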

for r := range readChannel {
	fmt.Println("========================================")
	fmt.Println(r)
	fmt.Println("========================================")
	fmt.Println()
}

Here I am just printing each record as it is read, but you can process the records however your use case requires.
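As one hypothetical example of processing instead of printing, the sketch below totals the funding amount per currency while draining the channel. fundingByCurrency is an illustrative name, and the records in main are made-up samples standing in for rows read by gocsv:

```go
package main

import "fmt"

type Industry struct {
	CompanyName                string `csv:"Organization Name"`
	LinkedIn                   string `csv:"LinkedIn"`
	Website                    string `csv:"Website"`
	TotalFundingAmount         int    `csv:"Total Funding Amount"`
	TotalFundingAmountCurrency string `csv:"Total Funding Amount Currency"`
	HeadquartersLocation       string `csv:"Headquarters Location"`
}

// fundingByCurrency drains c and totals TotalFundingAmount per currency.
// It returns once the producer closes the channel.
func fundingByCurrency(c <-chan Industry) map[string]int {
	totals := make(map[string]int)
	for r := range c {
		if r.TotalFundingAmountCurrency == "" {
			continue // skip rows without a currency
		}
		totals[r.TotalFundingAmountCurrency] += r.TotalFundingAmount
	}
	return totals
}

func main() {
	c := make(chan Industry, 1)
	go func() {
		defer close(c)
		// Made-up sample records.
		c <- Industry{CompanyName: "A", TotalFundingAmount: 100, TotalFundingAmountCurrency: "USD"}
		c <- Industry{CompanyName: "B", TotalFundingAmount: 50, TotalFundingAmountCurrency: "USD"}
		c <- Industry{CompanyName: "C", TotalFundingAmount: 70, TotalFundingAmountCurrency: "EUR"}
	}()
	fmt.Println(fundingByCurrency(c)) // map[EUR:70 USD:150]
}
```

Because only the running totals are kept, memory use stays flat no matter how many rows the file has.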

The entire code is below

package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"time"

	"github.com/gocarina/gocsv"
)

type Industry struct {
	CompanyName                string `csv:"Organization Name"`
	LinkedIn                   string `csv:"LinkedIn"`
	Website                    string `csv:"Website"`
	TotalFundingAmount         int    `csv:"Total Funding Amount"`
	TotalFundingAmountCurrency string `csv:"Total Funding Amount Currency"`
	HeadquartersLocation       string `csv:"Headquarters Location"`
}

// 5.57ms -> 600 records (read)
func main() {
	now := time.Now()
	readChannel := make(chan Industry, 1)

	readFilePath := "process.csv"

	// Open the CSV file for reading
	readFile, err := os.OpenFile(readFilePath, os.O_RDONLY, os.ModePerm)
	if err != nil {
		panic(err)
	}
	defer readFile.Close()

	count := 0
	readFromCSV(readFile, readChannel)

	// Print the records as they arrive; the loop ends when the channel is closed
	for r := range readChannel {
		fmt.Println("========================================")
		fmt.Println(r)
		fmt.Println("========================================")
		fmt.Println()

		count++
	}

	fmt.Println(time.Since(now), count)
}

func readFromCSV(file *os.File, c chan Industry) {
	gocsv.SetCSVReader(func(r io.Reader) gocsv.CSVReader {
		reader := csv.NewReader(r)
		reader.Comma = ','
		reader.LazyQuotes = true
		reader.FieldsPerRecord = -1
		return reader
	})

	// Stream the CSV rows into the channel from a separate goroutine
	go func() {
		err := gocsv.UnmarshalToChan(file, c)
		if err != nil {
			panic(err)
		}
	}()
}

I have pushed this code here; you can use it for reference. The repository also includes a sample CSV file that you can use for testing.
Here are the links for the package that I have used:
Go package: https://pkg.go.dev/github.com/gocarina/gocsv
GitHub: https://github.com/gocarina/gocsv

Thanks for reading through the blog. Please let me know if I have missed something or if something I have done is incorrect, so that I can make those changes and learn from my mistakes.
