Docker quick-start for non-engineers

Roman Gailit · Published in Rock Your Data · Jan 6, 2023

If you are not a software or data engineer, you may think of Docker as something “for more tech-heavy roles.” But in practice, non-engineering roles can easily benefit from some of the features Docker provides too!

Docker may be handy when you need to:

  1. Quickly run a sandbox Linux CLI to test something “highly experimental” without the risk of destroying your whole environment;
  2. Spin up an instance of a specific version of a specific DB to run SQL for testing, debugging, or adding a new feature. Or maybe upload raw data and do a brief SELECT using your favorite DB tool?
  3. Or even develop an MVP of an ETL solution locally, with a DB and an orchestrator such as Airflow or Prefect? Are you sure you aren’t an engineer?
  4. In general, have a bunch of popular data products up and running locally, secured against any destruction your experiments may cause.

Important things to know

Image

The blueprint of a future Docker Container. An Image consists of an operating system and layers of additional apps. Layers are described as code using a text file called a Dockerfile.

A Dockerfile should always start with the declaration of a base image: FROM.
Then you may describe additional layers of the image; for example: set up a work directory using WORKDIR, copy files into this work directory with COPY, or run CLI commands with RUN. An example of a Dockerfile:

# Start from the Ubuntu 20.04 base image
FROM ubuntu:20.04
# Set the working directory inside the image
WORKDIR /app
# Copy everything from the build context into /app
COPY . .
# Update packages; -y answers the upgrade prompt automatically
RUN apt-get update && apt-get upgrade -y
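
To turn a Dockerfile into an Image, you build it. A minimal sketch, assuming the Dockerfile above sits in your current directory; my-ubuntu is just a hypothetical name for the resulting Image:

docker build -t my-ubuntu .

The -t flag tags the Image with a name, and the trailing dot tells Docker to use the current directory as the build context.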

Container

An app running in an isolated environment, started from a previously built Image.

Volume

A directory you connect to the container. A Volume may be attached to one specific container or shared between multiple containers. You need a Volume to store config files with sensitive data, database files, or case-specific files you don’t want to propagate to every container.

The Volume’s files won’t be deleted after stopping a container, so you can maintain a long-term environment that you can change over time.
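
A minimal sketch of both flavors; my_data and local_folder are hypothetical names:

# A named Volume, managed by Docker and shareable between containers
docker volume create my_data
docker run -it -v my_data:/app/data ubuntu bash

# A bind mount: a specific local folder connected to the container
docker run -it -v $(pwd)/local_folder:/app/data ubuntu bash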

Network

A virtual local network that enables containers to connect to your computer and to each other. There are three main Network modes:

  1. bridge — containers on the same network can interact with each other, like a web server and a database;
  2. host — the container shares your computer’s network directly;
  3. none — no network access for the container.
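
For example, a user-defined bridge network lets containers reach each other by name. A minimal sketch; my_net, db, and app are hypothetical names:

docker network create my_net
# Start Postgres on that network (it requires a password to boot)
docker run -d --name db --network my_net -e POSTGRES_PASSWORD=root postgres:13
# Any container on the same network can reach it by the hostname "db"
docker run -it --name app --network my_net ubuntu bash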

Docker Hub

The official Docker image registry. Published Images are stored in Docker Hub, and docker run pulls them from there automatically if they aren’t available locally. There are also other public Docker registries.
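
You can also pull an Image from Docker Hub explicitly before running it:

# Download the Image without starting a container
docker pull postgres:13
# List the Images available locally
docker images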

Docker hands-on

The first thing you need to do is install the Docker engine.
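
Once it is installed, a quick way to verify that everything works:

# Print the installed version
docker --version
# Run a tiny test container that prints a greeting and exits
docker run hello-world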

Run Linux CLI example

Let’s try something simple: run a container with Linux.

docker run -it ubuntu bash
  1. -it means we want to run the container in interactive mode (-i), with a terminal attached (-t);
  2. ubuntu is the Image you run;
  3. bash is the entry point of the container (the command that runs after the container is initialized).

From here, you can do everything you usually do in a Linux CLI, like installing new libs using pip, managing files and folders, or even deleting all of Ubuntu’s files, the whole OS included, using the command rm -rf /. Please try such fun inside a container only, NOT inside your primary OS.
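
Note that a container’s filesystem is disposable: exit, start a new container, and your changes are gone. You can see it for yourself, continuing from the command above:

# Inside the container: create a file, then leave
touch /i_was_here
exit

# Start a fresh container from the same Image: the file is gone
docker run -it ubuntu bash
ls /i_was_here   # No such file or directory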

Run DB example

Let’s try something more complex, such as running a DB. We will need a Volume here to store the DB files.

docker run -it \
-e POSTGRES_USER="root" \
-e POSTGRES_PASSWORD="root" \
-e POSTGRES_DB="dockerized_db" \
-v $(pwd)/dockerized_db:/var/lib/postgresql/data \
-p 5431:5432 \
postgres:13
  1. -e {VAR_NAME} parameters like POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB allow us to set up environment variables we use later to access the DB from a DB app like DBeaver or DataGrip;
  2. -v {LOCAL_PATH}:{CONTAINER_PATH} sets up the Volume we use to store essential DB files;
  3. -p {LOCAL_PORT}:{CONTAINER_PORT} routes the local machine’s port to the container’s port and therefore lets us access the DB (here, local port 5431 maps to the container’s 5432);
  4. postgres:13 — the name of the Image with the version specified in the tag part, using the image_name:tag format.

Run the container using the multiline command above, and then access it with the credentials we set, as in the following example.

Example connection to the containerized DB from DataGrip
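
If you prefer the terminal, you can also check the connection with psql, assuming it is installed on your machine; note the local port 5431 we mapped and the credentials from the command above (psql will prompt for the password, root):

psql -h localhost -p 5431 -U root -d dockerized_db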

Docker Compose

Running one app is great, but what if we need more? Opening multiple terminals and using them to run multiple docker containers doesn’t seem right.

Let’s look at the way to run multiple interconnected containers: Docker Compose. We will run Postgres and PgAdmin together to access the DB from PgAdmin’s web interface.

First, we need to create a docker-compose.yaml file. It will store instructions on how to build all the containers we need:

services:
  pgdatabase:
    image: postgres:13
    environment:
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=root
      - POSTGRES_DB=dockerized_db
    volumes:
      - "./dockerized_db:/var/lib/postgresql/data:rw"
    ports:
      - "5432:5432"
  pgadmin:
    image: dpage/pgadmin4
    environment:
      - PGADMIN_DEFAULT_EMAIL=root@root.com
      - PGADMIN_DEFAULT_PASSWORD=root
    ports:
      - "8080:80"

In the first part (the pgdatabase service) we described the same configuration we already used to run Postgres.

In the second part (the pgadmin service) we described the config of PgAdmin: credentials and, again, port routing.

To run these containers together, we first need to stop the previous container running in the terminal. Use ctrl+C to stop it.

PS: You can always check the list of all active containers using the docker ps command. To stop a container, use docker stop {CONTAINER_ID}.

PS: Alternatively, you can see all the containers and run/stop them using Docker Desktop UI.

Then run our two containers from the YAML config with the docker-compose up command.

Before running this, make sure you are in the folder with the YAML file so the command can find the config.
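
The full sequence, where /path/to/project stands for wherever you saved the file:

cd /path/to/project    # hypothetical location of docker-compose.yaml
docker-compose up -d   # -d runs both containers in the background
docker-compose down    # stops and removes them when you are done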

Open PgAdmin in your browser at localhost:8080 and use it to connect to the DB with the credentials described in the YAML file. Note that PgAdmin runs inside its own container, so when adding the server, use the service name pgdatabase as the host (port 5432), not localhost.


CC BY-NC-SA 4.0
