Inspired by scyto’s gist, Drallas’s gist, Funky Penguin’s Cookbook, and Brandon Lee’s blog.
VM setup
- 3 nodes
- Ubuntu 24.04
- 2 vCPU
- 4G RAM
- 84G OS drive
- 16G /var/lib/docker
- 60G Ceph drive
- Extra vNIC on the VXLAN network for backend cluster traffic (static IP via Netplan):
```yaml
network:
  version: 2
  ethernets:
    ens18:
      dhcp4: false
      addresses:
        - 192.168.1.21/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses:
          - 192.168.1.1
      match:
        macaddress: bc:24:11:3a:90:49
      set-name: ens18
    ens19:
      dhcp4: false
      addresses:
        - 10.0.0.21/24
      mtu: 1450 # reduced MTU because of VXLAN
      match:
        macaddress: bc:24:11:88:45:24
      set-name: ens19
```
- Add the private interface IPs to /etc/hosts on each host, and include the `swarm` VIP as well:

```
192.168.1.20 swarm
10.0.0.21 swarm1
10.0.0.22 swarm2
10.0.0.23 swarm3
```
- Firewall rule for the cluster network:

```bash
sudo ufw allow from 10.0.0.0/24 to any comment "cluster comms"
```
- Enable required kernel modules (some of which are blocked on my hardened Ubuntu24 template):
modules="br_netfilter ceph ip_vs netfs overlay" for module in $modules; do sudo rm -f /etc/modprobe.d/$module-blacklist.conf sudo modprobe $module echo $module | sudo tee -a /etc/modules-load.d/docker-swarm.conf done
Microceph
- All nodes:
```bash
# install ceph
sudo snap install microceph
sudo snap refresh --hold microceph
# config firewall
sudo ufw allow from 192.168.1.0/24 to any proto tcp port 3300,6789,6800:7300,7443 comment "microceph public"
```
- On `swarm1`:

```bash
sudo microceph cluster bootstrap --cluster-network 10.0.0.0/24 --public-network 192.168.1.0/24
# generate/capture join tokens
sudo microceph cluster add swarm2
sudo microceph cluster add swarm3
```
- Other nodes:

```bash
sudo microceph cluster join <token>
```
- All nodes:

```bash
sudo microceph disk add --all-available --wipe
```
- cephfs
- On one node:

```bash
# check status
sudo ceph -s
# create pools
sudo ceph osd pool create cephfs_meta
sudo ceph osd pool create cephfs_data
# create cephfs share
sudo ceph fs new cephfs cephfs_meta cephfs_data
sudo ceph fs ls
```
- On all clients:

```bash
# install mount handler
sudo apt install ceph-common
# link keyring and config
sudo ln -s /var/snap/microceph/current/conf/ceph.keyring /etc/ceph/ceph.keyring
sudo ln -s /var/snap/microceph/current/conf/ceph.conf /etc/ceph/ceph.conf
# create mountpoint and mount fs
sudo mkdir /cephfs
echo ":/ /cephfs ceph name=admin,_netdev 0 0" | sudo tee -a /etc/fstab
sudo systemctl daemon-reload
sudo mount -a
```
Docker setup
- Install Docker on each node
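The doc doesn't pin down an install method; here is a minimal sketch using Docker's upstream apt repository (Ubuntu's `docker.io` package would also work):

```bash
# add Docker's official apt repository and install the engine (assumed method)
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```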
- Initialize Swarm
- On `swarm1`:

```bash
sudo docker swarm init --advertise-addr 10.0.0.21
sudo docker swarm join-token manager
```
- On the other two:

```bash
sudo docker swarm join --token [TOKEN] 10.0.0.21:2377
```
- On `swarm1`, verify the nodes are joined:

```bash
sudo docker node ls
```
Keepalived setup
- All nodes:
```bash
sudo apt-get install keepalived
# configure firewall
sudo ufw allow proto vrrp from 192.168.1.0/24 to 224.0.0.18 comment "keepalived"
```
- Create monitoring script on all (managing) nodes:
```bash
#!/usr/bin/env -S bash -eu
# /usr/local/bin/node_active_ready_check.sh
status=$(docker node ls --format "{{.Status}} {{.Availability}}")
if [[ "$status" == *"Ready"* && "$status" == *"Active"* ]]; then
    echo "Node is active and ready."
    exit 0
else
    echo "Node is not active or not ready."
    exit 1
fi
```
- Make it executable:

```bash
sudo chmod 755 /usr/local/bin/node_active_ready_check.sh
```
- Create `/etc/keepalived/keepalived.conf` on `swarm1`:

```
global_defs {
    router_id DOCKER_INGRESS
    vrrp_startup_delay 5
    max_auto_priority
    script_user root
}

vrrp_track_process track_docker {
    process dockerd
}

vrrp_script node_active_ready_check {
    script "/usr/local/bin/node_active_ready_check.sh"
    interval 2
    timeout 5
    rise 3
    fall 3
}

vrrp_instance docker_swarm {
    state MASTER
    interface ens18
    virtual_router_id 20
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass hunter2
    }
    virtual_ipaddress {
        192.168.1.20/24
    }
    track_process {
        track_docker weight -25
    }
    track_script {
        node_active_ready_check
    }
}
```
- Copy the same file to `swarm2`, but change `state MASTER` to `state BACKUP` and `priority 100` to `priority 90`. Repeat for `swarm3`, but drop the priority to 80.
- Starting from `swarm1`, run `sudo systemctl enable --now keepalived` on each node.
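For the copy step above, here is a sketch that avoids hand-editing each node, assuming SSH access to the other nodes and passwordless sudo (adjust hostnames and paths to taste):

```bash
# derive BACKUP variants of swarm1's config and push them to the other nodes (sketch)
sed -e 's/state MASTER/state BACKUP/' -e 's/priority 100/priority 90/' /etc/keepalived/keepalived.conf |
    ssh swarm2 'sudo tee /etc/keepalived/keepalived.conf > /dev/null'
sed -e 's/state MASTER/state BACKUP/' -e 's/priority 100/priority 80/' /etc/keepalived/keepalived.conf |
    ssh swarm3 'sudo tee /etc/keepalived/keepalived.conf > /dev/null'
```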
Portainer agent
(Assuming that Portainer is being managed from another system)
- Create `portainer-agent.yml`:

```yaml
services:
  portainer_agent:
    image: portainer/agent:2.21.4
    ports:
      - "9001:9001"
    networks:
      - portainer_agent_network
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    deploy:
      mode: global

networks:
  portainer_agent_network:
    driver: overlay
    attachable: true
```
- Deploy (from one node):
```bash
sudo docker stack deploy -c ./portainer-agent.yml portainer-agent
```
- Register the new environment in Portainer using `192.168.1.20:9001` as the endpoint.