Mastering PostgreSQL with Docker: A Step-by-Step Tutorial

Jihwan
4 min read · Oct 30, 2023


PostgreSQL + Docker. Image by author.

What is Docker?

Docker is a powerful container management tool. It lets developers package an application’s code, runtime, and dependencies into a single unit called a “container”, which can run consistently across different environments. Instead of manually setting up software like databases, servers, or programming languages, you can simply download pre-packaged Docker images and get started quickly.

Why choose PostgreSQL?

When considering databases, two popular options come to mind: MySQL and PostgreSQL. Both have their merits, but it’s essential to understand their differences to determine the best fit for your project.

MySQL

  • Known for its ease of installation and lightweight nature, MySQL is a good choice for smaller projects or applications that don’t require advanced database features.
  • Concurrency: It relies on lock-based concurrency control for some write operations, which can reduce write performance in high-concurrency scenarios.
  • JSON Support: It offers JSON support, but it’s not as robust as PostgreSQL’s.

PostgreSQL

  • If your system demands more intricate database relationships or you frequently use complex join queries, PostgreSQL stands out. It boasts an array of join functions, including the efficient hash join for combining large datasets.
  • Concurrency: It uses MVCC (Multi-Version Concurrency Control) to improve concurrent performance, so readers don’t block writers and vice versa.
  • JSON Support: It offers advanced JSON support and the JSONB data type, which allows JSON data to be indexed (see the short sketch after this comparison).

Comparison of MySQL and PostgreSQL. Image by author.
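
To make the JSONB point concrete, here is a minimal sketch of what indexing and querying JSON data looks like in practice. It uses psycopg2 against a hypothetical events table, and the connection details match the admin user and test database we create later in this tutorial; adjust them for your own instance.

import psycopg2

# Hypothetical 'events' table with a JSONB column and a GIN index on it
conn = psycopg2.connect("postgresql://admin:admin@localhost:5432/test")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, payload jsonb);")
cur.execute("CREATE INDEX IF NOT EXISTS events_payload_idx ON events USING GIN (payload);")
cur.execute("INSERT INTO events (payload) VALUES (%s::jsonb);", ('{"symbol": "TSLA", "type": "split"}',))
conn.commit()

# The @> containment operator can use the GIN index for fast lookups
cur.execute("SELECT id, payload FROM events WHERE payload @> %s::jsonb;", ('{"symbol": "TSLA"}',))
print(cur.fetchall())

cur.close()
conn.close()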

Running PostgreSQL with Docker

Process

Step 1: Pull and Run the PostgreSQL Image

Pull the official PostgreSQL image from Docker Hub and start a container. The docker run command downloads the image automatically if it isn’t available locally. Replace $USER_PASSWORD with the password you want for the default postgres superuser (or export it as an environment variable first).

$ docker run -d -p 5432:5432 --name postgres -e POSTGRES_PASSWORD=$USER_PASSWORD postgres

Step 2: Verify the Container

Ensure the PostgreSQL container is running. The -a flag lists all containers, including stopped ones, so check that the postgres container’s STATUS column shows “Up”.

$ docker ps -a 

Step 3: Access the Container Shell

Use docker exec with the ‘-it’ flags to interact with the container. This starts a new shell inside the running container.

$ docker exec -it postgres /bin/bash

Step 4: Set up PostgreSQL Inside the Container

# Access the PostgreSQL prompt:
root@32e33cbec30c:/# psql -U postgres

# Create a new user named 'admin'
postgres=# CREATE USER admin PASSWORD 'admin' SUPERUSER;

# Create a new database owned by 'admin'
postgres=# CREATE DATABASE test OWNER admin;

# Connect to the 'test' database
postgres=# \c test admin

# Define and create a table to store stock data
test=# CREATE TABLE stock (
    id serial PRIMARY KEY,
    name VARCHAR(32),
    symbol VARCHAR(32),
    date date,
    close_price float,
    high_price float,
    low_price float
);

# Confirm that the table has been created
test=# \d stock
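
You can also confirm the setup from outside the container. Below is a minimal host-side check using psycopg2 (installable with pip install psycopg2-binary); it assumes the container’s port 5432 is published to localhost as in Step 1.

import psycopg2

# Connect from the host to the 'test' database as the 'admin' user created above
conn = psycopg2.connect(host="localhost", port=5432, dbname="test", user="admin", password="admin")
cur = conn.cursor()

# List the columns of the 'stock' table to confirm it was created
cur.execute("""
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'stock'
    ORDER BY ordinal_position;
""")
for name, dtype in cur.fetchall():
    print(name, dtype)

cur.close()
conn.close()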

Uploading Data from Polygon to Database

In this guide, we’ll fetch US stock data from Polygon and upload it to a PostgreSQL database. First, make sure you’ve obtained an API key from the Polygon website.

Step 1: Fetch Stock Data from Polygon

import pandas as pd
from datetime import datetime, date, timedelta
import requests

# Initialize your Polygon API key
POLYGON_API_KEY = ""

def get_stocks(symbols, start_date, end_date):
    # Initialize an empty dataframe with the expected columns
    columns = ['symbol', 'name', 'high_price', 'low_price', 'close_price', 'date']
    df = pd.DataFrame(columns=columns)

    for symbol in symbols:
        url = f"https://api.polygon.io/v2/aggs/ticker/{symbol['symbol']}/range/1/day/{start_date}/{end_date}?adjusted=true&sort=asc&apiKey={POLYGON_API_KEY}"
        res = requests.get(url).json()

        if 'results' in res:  # Ensure the 'results' key exists in the response
            for stock in res['results']:
                date_str = datetime.fromtimestamp(int(stock['t']) / 1000).strftime("%Y-%m-%d")

                # Append the row to the dataframe
                df.loc[len(df)] = [res['ticker'], symbol['name'], stock['h'], stock['l'], stock['c'], date_str]

    return df

start_date = date.today() - timedelta(days=7)
end_date = date.today()

df = get_stocks([{'symbol': "TSLA", "name": "Tesla"},
                 {'symbol': "AAPL", "name": "Apple"},
                 {'symbol': "META", "name": "Facebook"},
                 {'symbol': "AMZN", "name": "Amazon"},
                 {'symbol': "NFLX", "name": "Netflix"}],
                start_date,
                end_date)
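
Before loading anything into the database, a quick sanity check that the request succeeded and the dataframe looks right:

# Quick sanity check: number of rows fetched and a preview of the data
print(f"{len(df)} rows fetched")
print(df.head())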

Step 2: Connect to the Database using SQLAlchemy

import pandas as pd
from sqlalchemy import create_engine

# psycopg2 must be installed; SQLAlchemy uses it as the default driver for postgresql:// URLs
# Connect to the 'test' database created in Step 4 as the 'admin' user
conn_string = 'postgresql://admin:admin@localhost:5432/test'

db = create_engine(conn_string)
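
As a quick check that the connection details are correct, you can run a trivial query through the engine (a small sketch; sqlalchemy.text is available in SQLAlchemy 1.4 and later):

from sqlalchemy import text

# Open a short-lived connection and confirm the server responds
with db.connect() as conn:
    print(conn.execute(text("SELECT version()")).scalar())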

Step 3: Write Data to the Database

Push the fetched stock data (stored in the dataframe) into the stock table of your PostgreSQL database.

# Append the rows to the 'stock' table created earlier
# (if_exists='replace' would drop the table and recreate it with inferred column types)
df.to_sql('stock', con=db, if_exists='append', index=False)
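
To confirm that the rows landed in the database, read the table back into a dataframe:

# Read the data back to verify the upload
check = pd.read_sql("SELECT * FROM stock ORDER BY date, symbol", con=db)
print(check.head())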

[Optional] Mounting a Volume in PostgreSQL with Docker

If you stop a container, your data remains intact. However, if you remove the container, you risk losing all of its data. To avoid this, it’s wise to create a volume to store your database files, which ensures the data persists even if the container is deleted. There are two primary methods to handle data persistence:

1. Save Data to a Docker Volume

Using Docker’s built-in volume management, you can store PostgreSQL data:

# Stop and remove the current container
$ docker stop postgres
$ docker rm postgres

# Create a new Docker volume
$ docker volume create pgdata

# Run the PostgreSQL container with the volume attached
$ docker run -d -p 5432:5432 --name postgres -e POSTGRES_PASSWORD=$USER_PASSWORD -v pgdata:/var/lib/postgresql/data postgres

To find out where the Docker Volume is located:

$ docker volume list
$ docker volume inspect pgdata

2. Save Data to your Local Machine

For those who prefer not to use Docker’s volume system and would rather map the data directly to a folder on the host machine:

# Create a directory on the host
$ mkdir ~/pgdata

# Run the PostgreSQL container with the local directory attached
$ docker run -d -p 5432:5432 --name postgres -e POSTGRES_PASSWORD=$USER_PASSWORD -v ~/pgdata:/var/lib/postgresql/data postgres

# Inspect the local directory
$ cd ~/pgdata
$ ls -l
