How to chunk files into smaller files in macOS
Problem Statement: I have a very large CSV file containing millions of records that needs to be chunked into smaller files. These files will later be ingested into a database.
Options
- Create a program to read the file line by line and create a new file
- Use third-party tools to split the file
- Use the macOS native tool split to chunk the files
Decision
Option #3: Use the macOS native tool split to chunk the files
Rationale
- Writing a program to split the file costs time and effort
- Third-party tools are not my favourite option
SPLIT command
usage:
split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]
The pricedata.csv file is approximately 4.41 GB and contains ~26 million records. For ingestion into the database, I would like to split it into files of 1 million records each and then process them.
split -l 1000000 pricedata.csv
In less than 2 minutes, it splits the file into 26 smaller files. By default, split names the chunks with the prefix x followed by a two-letter alphabetical suffix: xaa, xab, xac, and so on.
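To see the default naming without waiting on a multi-gigabyte file, here is a minimal sketch on a tiny stand-in file. The file name sample.csv, the scratch directory, and the chunk size of 4 lines are assumptions chosen for illustration only; they are not part of the original workflow.

```shell
# Work in a scratch directory so the generated chunks don't clutter anything.
workdir=$(mktemp -d)
cd "$workdir"

# Build a 10-line sample file (a hypothetical stand-in for pricedata.csv).
seq 1 10 > sample.csv

# Split into chunks of at most 4 lines each, using the default prefix "x".
split -l 4 sample.csv

# List the chunks and their line counts.
# Expect three files: xaa and xab with 4 lines each, xac with the remaining 2.
ls x??
wc -l x??
```

The last chunk simply holds whatever is left over, so a file whose record count is not an exact multiple of the chunk size ends with one shorter file.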
To give the chunks a different name, pass a prefix as the last argument:
split -l 1000000 pricedata.csv ppd-
Passing ppd- makes split use it as the file name prefix and append the alphabetical suffixes to it, producing ppd-aa, ppd-ab, and so on.
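The prefix form can be tried on a small file as well. The sketch below is illustrative only: sample.csv and the scratch directory are hypothetical, and the -a 3 option (the sufflen flag from the usage line above) is added just to show how the suffix length can be changed. It ends with a quick check that the chunks, concatenated in name order, reproduce the original file.

```shell
# Work in a scratch directory with a 10-line sample file (hypothetical stand-in).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > sample.csv

# Split into 4-line chunks, using the prefix "ppd-" and 3-letter suffixes.
split -l 4 -a 3 sample.csv ppd-

# The chunks are now named ppd-aaa, ppd-aab, ppd-aac.
ls ppd-*

# Sanity check: the glob expands in sorted order, so concatenating the
# chunks should give back exactly the original file.
cat ppd-* | cmp - sample.csv && echo "chunks match the original"
```

Running the same kind of comparison against the real chunks before ingestion is a cheap way to confirm that no records were lost or reordered.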