Analyze Large CSV Files in Minutes With Docker Compose
Imagine you have to analyze a massive CSV file. What do you do if Excel or similar tools get a nervous breakdown handling the huge file size? With small files, investigating or comparing CSV files is a walk in the park. But with huge files, it’s a pain if not impossible.
Recently, I had to compare two CSV files with different information but a common identifier, each of them with ~500k entries.
My developer’s heart said: Shouldn’t I use a database?
My head said: Installing a full DB just for that use case?
I mean, I just wanted to check a few small things and then get rid of it again.
The solution — Docker
If you swiftly need to build a small and temporary infrastructure, Docker might be your friend. To get started, just pull the latest database container (Postgres in my case), and you are ready to go.
Part 1 — Docker Compose
Install Docker Compose.
Save the following snippet as docker-compose.yml
.
# Use postgres/example user/password credentials
version: '3.1'services:
db:
image: postgres
container_name: postgres-test
environment:
POSTGRES_PASSWORD: example
ports:
- 15432:5432
Part 2 — Start the Container
Run a docker-compose up
.
The container starts running:
Have the database running:
Part 3 — Access the Database and Load Your CSV
I can recommend DBeaver for that task.
Create a new connection to your dockerized database.
Important: Use the Port which is defined in the docker file: 15432
The username is: postgres
The password is (see docker-compose.yml
) : example
Part 4 — Import Your CSV Files
DBeaver offers a nice feature called “Flat files” connections.
Create one by pointing to the folder that contains your CSV files.
In the DBeaver UI, you should now be able to see the connection and all CSV files listed under „Tables“. It should look similar to this:
To import these files into your Docker database:
- Create a new table within the “Public” schema without any columns (Make sure it is persisted!).
- Right-click on that table and choose “Import Table Data”.
- Select the “Flat files” connection and choose your file(s) from there.
- Adapt the mapping in the next step if necessary.
- Finish the import.
That’s it. Enjoy analyzing your data!