Initializing a PostgreSQL Database with a Dataset using Docker Compose: A Step-by-step Guide

Asuarezaceves
4 min readJun 25, 2023

--

Deploying and initializing a PostgreSQL database doesn’t always have to be a manual process. With Docker Compose, we can not only automate the process but also make it reproducible and portable. This guide will walk you through setting up a PostgreSQL database with a predefined dataset using Docker Compose. A more complicated example can be found here:

Prerequisites:

  1. Docker installed on your local machine or server.
  2. Docker Compose installed on your local machine or server.
  3. Download the predefined dataset in CSV format.
  4. Basic knowledge of Docker and SQL.

Starting our project

  1. Create a new directory

Let’s call the directory postgres_csv_init. You can create it using the following command:

# Create a new directory and cd into it
mkdir postgres_csv_init
cd postgres_csv_init

# Create a Docker Compose file
touch docker-compose.yml

#Create a .env file
touch .env

# Create a Docker Compose file
touch docker-compose.yml
#

Add Environment Variables to .env file

POSTGRES_PASSWORD=initexample
POSTGRES_USER=initexample
POSTGRES_HOST=postgres
POSTGRES_DB=initexample

Step 1: Create Your Docker Compose File

First, create a docker-compose.yml file in your project's root directory. This file will define the services that make up your application. In this case, we're going to use the official PostgreSQL Docker image.

Your docker-compose.yml file should look something like this:

version: '3.1'
services:
db:
image: postgres
restart: always
env_file:
- .env
volumes:
- ./dataset:/docker-entrypoint-initdb.d
ports:
- 5432:5432

This configuration tells Docker Compose to:

  • Use the official postgres image from Docker Hub.
  • Restart the PostgreSQL container if it fails.
  • Use the .env we created earlier.
  • Mount the ./dataset directory from the host machine to the /docker-entrypoint-initdb.d directory inside the Docker container.
  • Map port 5432 inside Docker as port 5432 on the host machine.

Step 2a: Put your CSV in the correct place

As we saw in the first step, we are mounting ./dataset, lets put a file into the ./dataset directory. It will be accessible to the PostgreSQL container at the path /docker-entrypoint-initdb.d/inflation.csv.

# create the data directory and cd into it
mkdir dataset && cd dataset
# Pull the example file
git clone curl -O -L https://raw.githubusercontent.com/alfredosa/cube-js/main/dataset/inflation.csv

Step 2b: Create Your Database Initialization Script

Based on the information in that file, let’s set up our DDL and our Copy command. Next, create a SQL script that will create your database table and import your dataset.

In the ./dataset directory, create a new file named init.sql:

CREATE TABLE inflation_data (
RegionalMember TEXT,
Year INT,
Inflation DECIMAL,
Unit_of_Measurement TEXT,
Subregion TEXT,
Country_Code TEXT
);

COPY inflation_data
FROM '/docker-entrypoint-initdb.d/inflation.csv'
DELIMITER ','
CSV HEADER;

This script will run when the container starts because PostgreSQL’s Docker image is configured to run any .sql files in the /docker-entrypoint-initdb.d directory.

Step 4: Start Your PostgreSQL Container

Once your setup is ready, start your PostgreSQL container with Docker Compose:

docker compose up

Docker Compose will start a PostgreSQL container, run your init.sql script, and import your inflation.csv dataset into your inflation_data table.

Conclusion:

That’s it! You have successfully spun up a PostgreSQL database container and initialized it with a predefined dataset using Docker Compose. This approach provides a reproducible and portable way of setting up databases, ideal for development, testing, and continuous integration environments.

OPTIONAL: Visualize your data with Metabase

In your docker-compose.yml file, add this extra bit:

version: '3.8'
services:
postgres:
image: postgres:latest
env_file:
- .env
volumes:
- ./data:/var/lib/postgresql/data
networks:
- initexample

metabase:
image: metabase/metabase:latest
depends_on:
- postgres
ports:
- 3000:3000
networks:
- initexample

networks:
initexample:
driver: bridge

In the docker-compose.yml file above, we added the Metabase service. The depends_on option ensures that Metabase will not start until the PostgreSQL service is started.

We also created a network called initexample and added both services to it. This allows the services to communicate with each other.

Now, you can start your Docker Compose using the following command:

docker-compose up

Access Metabase

Once the Docker Compose is up and running, you can access Metabase at http://localhost:3000 in your web browser.

Connect Metabase to PostgreSQL

When you first open Metabase, it will ask you to setup a connection to your database. Here are the settings you need to use:

  • Database type: PostgreSQL
  • Host: postgres
  • Port: 5432
  • Database name: initexample
  • Username: initexample
  • Password: initexample

Please note that you should replace ‘initexample’ with whatever you set for POSTGRES_DB, POSTGRES_USER, and POSTGRES_PASSWORD in your .env file.

Now, you should be able to explore your PostgreSQL data using Metabase!

Constantly working on things like this, to impove and learn granularity of services.

--

--