Notes on cleaning up Docker Swarm nodes

Ray Lee | 李宗叡

Published in Learn or Die

Aug 13, 2024


# Preface

Notes on the details of cleaning up Docker Swarm nodes.

# The Problem

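- One day I noticed that one node in the Swarm had no container running on it, even though it should have been running one.
- Running `docker service update serviceName` showed that one container had failed to update, with the error message `service update paused: update paused due to failure or early termination of task db7p7ej7wname6kepoexn0qpn`.
- Searching for this error turned up the key clue: the node had most likely run out of disk space.
- Running `df -h` confirmed it: the disk was completely full.
- After some digging, I found the cause. Every new commit builds a new image and deploys a new container, but the old images are never removed automatically, so over time they filled up the disk.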
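The manual `df -h` check can also be scripted, so that a filling disk is noticed before deployments start failing. Below is a minimal POSIX shell sketch; the function name `check_disk_usage`, the 90% default threshold, and the default path `/var/lib/docker` (Docker's default data directory on Linux) are my own assumptions, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): report whether the
# filesystem holding a path is at or above a usage threshold, i.e. a
# scripted version of eyeballing `df -h`.
# Usage: check_disk_usage [path] [threshold_percent]
# Returns 0 if below the threshold, 1 if at/above it, 2 if usage can't be read.
check_disk_usage() {
  path="${1:-/var/lib/docker}"   # assumed: Docker's default data-root on Linux
  threshold="${2:-90}"           # assumed: warn at 90% by default

  # `df -P` uses the POSIX output format: one header line, then exactly one
  # line per filesystem, with the usage percentage (e.g. "87%") in column 5.
  usage=$(df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

  # Guard against an empty or non-numeric value (e.g. the path doesn't exist).
  case "$usage" in
    ''|*[!0-9]*)
      echo "ERROR: could not read disk usage for $path" >&2
      return 2
      ;;
  esac

  if [ "$usage" -ge "$threshold" ]; then
    echo "WARNING: filesystem for $path is ${usage}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: filesystem for $path is ${usage}% full"
  return 0
}
```

On a node, something like `check_disk_usage /var/lib/docker 90 || docker system df` would print the warning and then show how much space images, containers, local volumes, and build cache are using, including how much of it is reclaimable.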

# Solution

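Running `docker system prune --force --all` (`--force` skips the confirmation prompt) removes the following:
- stopped containers
- unused images (with `--all`, every image not used by at least one container, not just dangling ones)
- build cache
- networks not used by at least one container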
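If removing every unused image at once feels too aggressive (for example, if you want recent images kept around for a quick rollback), `docker system prune` also accepts `--filter "until=<duration>"`, which only removes containers, images, and networks created before that cutoff. Volumes are never touched unless `--volumes` is passed. A minimal sketch, assuming a hypothetical `prune_node` wrapper with a `DRY_RUN` switch and a 24h default retention (none of which come from the original post):

```shell
#!/bin/sh
# Hypothetical wrapper (not from the original post) around `docker system prune`.
# Usage: prune_node [retention]   e.g. prune_node 72h
# Set DRY_RUN=1 to print the command instead of running it.
prune_node() {
  retention="${1:-24h}"   # assumed default: keep anything created in the last 24 hours

  # Build the argument list once, so the dry run prints exactly what would run.
  set -- docker system prune --all --force --filter "until=${retention}"

  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

`DRY_RUN=1 prune_node 72h` only prints the command, while `prune_node 72h` actually runs it. The same `--filter` flag could also be appended to the `command:` of the prune service, if you prefer that trade-off.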
SSHing into every node to run the prune by hand each time is inefficient, so instead you can start a service whose only job is to do this:

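```yaml
## Compose file format version (for docker stack deploy)
version: "3.8"

services:
  system-prune:
    image: alpinelinux/docker-cli
    ## Mount the host's Docker socket so the container can talk to the host's Docker daemon
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    ## The cleanup command
    command: docker system prune --all --force
    deploy:
      ## global mode starts one container on every node
      mode: global
      restart_policy:
        ## any (the default) restarts the task even after a successful exit
        condition: any
        ## Wait 24h before each restart, so the prune runs roughly once a day
        delay: 24h
```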

