Implementing High Availability and Persistent Storage with Docker Swarm and GlusterFS

Docker Swarm enables the deployment of stacks, but it lacks built-in volume management. This becomes a problem when assets, such as a website's files, must be distributed and kept synchronized across multiple nodes so that the replicas never drift apart. GlusterFS (GFS) fills that gap by keeping a replicated volume synchronized across the Docker Swarm.

Enabling Docker Swarm

To set up a Docker Swarm, follow these steps:

1. Initialize Swarm on the manager node (e.g., Raspberry Pi):

   docker swarm init

2. Add worker nodes to the swarm using the token obtained from the previous step:

   docker swarm join --token <long token> <manager-ip:port>

3. Check the swarm members:

   docker node ls
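
If the join command is lost, the manager can print it again at any time. This is an optional aside (not one of the original steps), using standard Docker CLI commands:

   # print the join command and token for additional workers
   docker swarm join-token worker

   # the equivalent for joining additional managers
   docker swarm join-token manager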

Setting Up GlusterFS

GlusterFS ensures synchronized volumes across nodes:

1. Install GlusterFS on each node:

   sudo add-apt-repository ppa:gluster/glusterfs-11
   sudo apt update && sudo apt install glusterfs-server -y

2. Start and enable the GlusterFS service:

   sudo systemctl start glusterd && sudo systemctl enable glusterd

3. Check the peer status and pool list (each node must first be probed into the trusted storage pool; see the sketch after this list):

   sudo gluster peer status
   sudo gluster pool list

4. Create a local directory on each machine for GlusterFS to write data:

   sudo mkdir -p /mnt/nodirectwritedatahere/gfsbrick

5. Create a replicated GlusterFS volume from those directories (the bricks):

   sudo gluster volume create replicated_volume replica 2 \
   srv00.facundoitest.space:/mnt/nodirectwritedatahere/gfsbrick \
   srv03.facundoitest.space:/mnt/nodirectwritedatahere/gfsbrick force

6. Start the GlusterFS volume:

   sudo gluster volume start replicated_volume

7. Check the volume status:

   sudo gluster volume status
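
Before the volume can be created in step 5, the nodes have to know each other: each peer must be probed into the trusted storage pool, otherwise `gluster peer status` reports no peers. A short sketch, assuming it is run from srv00 and using the hostnames from this setup:

   # from srv00, add the other node to the trusted storage pool
   sudo gluster peer probe srv03.facundoitest.space

   # confirm the peer shows up as connected
   sudo gluster peer status
   sudo gluster pool list

The original notes also lower the heartbeat so a failed peer is detected faster than the default `network.ping-timeout` of 42 seconds:

   sudo gluster volume set replicated_volume network.ping-timeout "5"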

Mounting GlusterFS Volumes

Mount the GlusterFS volume to a local directory on each worker node:

1. Mount the volume:

   sudo mount -t glusterfs srv00.facundoitest.space:/replicated_volume /mnt/swarm/

2. Edit `/etc/fstab` to ensure the volume is mounted at startup:

   srv00.facundoitest.space:/replicated_volume   /mnt/swarm   glusterfs   defaults,_netdev   0   0

   That entry alone is not enough: systemd processes fstab before the glusterd service has started, so the replicated GlusterFS volume is not ready at boot. Docker then complains that the directory does not exist, even though a later `mount -a` works perfectly. (Another workaround might be a `sleep && mount -a` in crontab; untested.) The entry below fixes this by handing the mount over to systemd, which mounts it on first access:

   # systemd mounts the GlusterFS replicated volume after the glusterd service is up
   # 'noauto' keeps the mount out of the normal boot-time fstab processing; the x-systemd.automount unit takes control and mounts it on demand
   srv04.facundoitest.space:/replicated_volume   /mnt/swarm   glusterfs   defaults,_netdev,noauto,x-systemd.automount   0   0
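
To confirm that replication works before deploying anything, create a few empty files on the mounted volume on one node and check that they appear on the others (a slight variation of the test in the original notes below, writing directly to the mount point):

   # on one node
   sudo touch /mnt/swarm/test_{0..9}.txt

   # on any other node, the same ten files should be listed
   ls /mnt/swarm/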

Deploying Services

Now, you can deploy services using the synchronized volume:

1. Create a `docker-compose.yml` file for the desired service (e.g., homepage):

   version: "3.3"
   services:
     homepage:
       image: ghcr.io/benphelps/homepage:latest
       restart: always
       ports:
         - 3080:3000
       volumes:
         - /mnt/swarm/homepage:/app/config # make sure this config directory exists on the GlusterFS mount; everything written to /app/config ends up in /mnt/swarm/homepage
       deploy:
         replicas: 2
         restart_policy:
           condition: on-failure
         placement:
           constraints: [node.role != manager] # do not schedule replicas on the swarm manager

2. Deploy the service:

   docker stack deploy -c docker-compose.yml myservice
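
After deploying, the stack can be checked from the manager. A quick sketch using standard Swarm commands (the stack name `myservice` is the one used above; Swarm names the service `<stack>_<service>`, so `myservice_homepage` here):

   # list the services in the stack and their replica counts
   docker stack services myservice

   # show on which nodes each replica task is running
   docker service ps myservice_homepage

Thanks to Swarm's ingress routing mesh, the published port 3080 answers on every node, the manager included.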

Additional Resources

For more in-depth information and guides on implementing Docker Swarm with GlusterFS, refer to these resources:

- [Tutorial: Create a Docker Swarm with Persistent Storage Using GlusterFS](https://thenewstack.io/tutorial-create-a-docker-swarm-with-persistent-storage-using-glusterfs/)

- [Setup 3-Node High Availability Cluster with GlusterFS and Docker Swarm](https://medium.com/running-a-software-factory/setup-3-node-high-availability-cluster-with-glusterfs-and-docker-swarm-b4ff80c6b5c3)

- [Deploying a Docker Stack Across a Docker Swarm Using a Docker Compose File](https://towardsaws.com/deploying-a-docker-stack-across-a-docker-swarm-using-a-docker-compose-file-ddac4c0253da)

- [How to Create a Redundant Storage Pool using GlusterFS on Ubuntu](https://www.cyberciti.biz/faq/howto-glusterfs-replicated-high-availability-storage-volume-on-ubuntu-linux/)

- [GlusterFS Brick Naming Conventions](https://docs.gluster.org/en/main/Administrator-Guide/Brick-Naming-Conventions/#set-up-a-gluster-volume)

- [Creating a Redundant Storage Pool Using GlusterFS on Ubuntu 20.04](https://www.digitalocean.com/community/tutorials/how-to-create-a-redundant-storage-pool-using-glusterfs-on-ubuntu-20-04#step-2-setting-up-software-sources-on-each-machine)

- [Adding New Bricks to an Existing GlusterFS Replicated Volume](https://www.cyberciti.biz/faq/howto-add-new-brick-to-existing-glusterfs-replicated-volume/)

By combining Docker Swarm and GlusterFS, you can achieve high availability and synchronize volumes effectively for your distributed applications.

Adding a New Node (Brick) to an Existing Replicated Volume

To extend your GlusterFS volume with additional storage capacity, you can add a new node (brick) to the existing replicated volume. Follow these steps:

1. Install GlusterFS:

Make sure GlusterFS is installed on the new node and is using the same version as the existing nodes.

2. Create the Working Directory:

Create a directory on the new node where GlusterFS will write data. This directory should be located in a path similar to the existing bricks.

3. Test Connectivity:

Ensure that the new node can communicate with the GlusterFS daemon on the other nodes by testing the peer connection:

   sudo gluster peer probe newNodeName.mydomain.tld

4. Add the New Brick:

Add the new brick to the existing replicated volume using the following command:

   sudo gluster volume add-brick replicatedVolume replica n newNodeName.mydomain.tld:/nodirectwritedata/gfs_vol

Replace `replicatedVolume` with the name of your existing volume, `n` with the desired replica count (including the new node), and `newNodeName.mydomain.tld` with the hostname or IP address of the new node.
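
Put together with the names used earlier in this setup, growing from two to three replicas might look like the following sketch (the new node's hostname is hypothetical):

   # run from an existing member of the pool; srv05 is a hypothetical new node
   sudo gluster peer probe srv05.facundoitest.space

   # grow the replica count from 2 to 3 by adding the new brick
   sudo gluster volume add-brick replicated_volume replica 3 \
   srv05.facundoitest.space:/mnt/nodirectwritedatahere/gfsbrick force

   # confirm the layout and watch self-heal bring the new brick up to date
   sudo gluster volume info replicated_volume
   sudo gluster volume heal replicated_volume info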

Resources: https://www.cyberciti.biz/faq/howto-add-new-brick-to-existing-glusterfs-replicated-volume/

Removing an Inactive, Unsynced Brick

Removing an inactive brick that hasn't been synchronized can be useful when a node becomes unavailable. Here's how to do it:

1. Unsynced Brick Removal:

To remove an unsynced brick from the volume, use the following command:

   sudo gluster volume remove-brick replicated_volume replica 3 srv00.facundoitest.space:/mnt/nodirectwritedatahere/gfsbrick force

In this example, `replicated_volume` is the name of the volume, `replica 3` indicates the new desired replica count, `srv00.facundoitest.space` is the hostname of the inactive node, and `/mnt/nodirectwritedatahere/gfsbrick` is the path to the unsynced brick. The `force` flag is needed to proceed.
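
If the node is permanently gone, it can also be dropped from the trusted storage pool once no volume references a brick on it anymore. A hedged follow-up using the same hostname:

   # remove the dead node from the trusted storage pool
   sudo gluster peer detach srv00.facundoitest.space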

By following these steps, you can efficiently expand your GlusterFS volume with new nodes or remove inactive bricks as needed.


The original, unformatted notes:

The Docker Swarm orchestrator enables the deployment of stacks, but it has no built-in volume management. So if you need to share, for example, the assets of a website and keep them up to date across all the nodes (so that changes made on one replica don't leave the rest outdated), Swarm doesn't handle that by itself and you need something like NFS or (as in this case) GlusterFS (GFS).

Once the engine is installed on all the machines:

enable swarm on the manager (Raspberry Pi)

docker swarm init

add the hosts to the swarm with the command it prints back

docker swarm join --token <long token> <swarm.manager.ip:port>

list the swarm members

docker node ls

The workers can be of different architectures; in that case you have to use tags and manifests so the manager understands that not all of them get the same image. Some will use AMD64, others ARM64, others x86, others armv7, and so on. In this particular case the manager is armhf/armv7 (a Raspberry Pi 2) and the workers are AMD64 VMs.
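
One way to keep services on matching hardware is a placement constraint on the node's reported architecture; this is only a sketch with the standard Docker CLI (the service name is made up; multi-arch manifests, as mentioned above, are the other approach):

# see which architecture each node reports
docker node inspect --format '{{.Description.Hostname}} {{.Description.Platform.Architecture}}' $(docker node ls -q)

# pin a hypothetical service to the AMD64 workers only
docker service create --name homepage-amd64 --constraint 'node.platform.arch == x86_64' ghcr.io/benphelps/homepage:latest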

At this point the swarm itself is ready. What remains is to install GlusterFS to keep the volumes synchronized. That way every worker has a folder with exactly the same content; when declaring the deployment you point the container/service volume there and there are no discrepancies between the replicas of the service.

Install GlusterFS

sudo add-apt-repository ppa:gluster/glusterfs-11 (check which versions are maintained as PPAs here: https://launchpad.net/~gluster and the releases at https://docs.gluster.org/en/main/release-notes/)

sudo apt update && sudo apt install glusterfs-server -y

start and enable it

sudo systemctl start glusterd && sudo systemctl enable glusterd
sudo gluster peer status
sudo gluster pool list

create the local directory on every machine for GlusterFS to write into

sudo mkdir -p /mnt/nodirectwritedatahere/gfsbrick

now the magic: once the directories exist on each worker, they have to be linked to the volume

sudo gluster volume create <volume name> replica <# of replicas> worker01:/path/to/local/dir worker02:/path/to/local/dir force

in the real case it was

sudo gluster volume create replicated_volume replica 2 srv00.facundoitest.space:/mnt/nodirectwritedatahere/gfsbrick srv03.facundoitest.space:/mnt/nodirectwritedatahere/gfsbrick force

start the volume

sudo gluster volume start replicated_volume

check the status

sudo gluster volume status

to tune the heartbeat

sudo gluster volume set replicated_volume network.ping-timeout "5"

mount the virtual GFS volume on the local directory on each worker

sudo mount -t glusterfs srv00.facundoitest.space:/replicated_volume /mnt/swarm/

edit fstab so it gets mounted automatically at boot

srv00.facundoitest.space:/replicated_volume 			/mnt/swarm 				glusterfs 	defaults,_netdev 	0	0

if files are created in the directory where the GFS volume is mounted, they should show up on the rest of the workers. Test it by creating 10 empty files.

sudo touch /mnt/swarm/homepage/test_{0..9}.txt

finally, the only thing left is to deploy some service using the GlusterFS-synchronized folder as its volume. For example, homepage by Ben Phelps (https://github.com/benphelps/homepage)

the docker-compose.yml ends up like this:

facundo@raspberrypi:~/homepage $ cat docker-compose.yml
version: "3.3"
services:
  homepage:
    image: ghcr.io/benphelps/homepage:latest
    restart: always
    ports:
      - 3080:3000
    volumes:
      - /mnt/swarm/homepage:/app/config # Make sure your local config directory exists. /app/config is that folder, not /app; in other words, everything that shows up in config gets created there, in swarm/homepage
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role != manager] # this condition keeps it from running on a swarm manager

now if you hit workerX:port the service should respond, and manager:port as well, since the swarm itself (via its ingress routing mesh) acts as the load balancer.
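
A quick check (the hostnames are the ones from this setup; 3080 is the port published in the compose file):

# any node in the swarm should answer, worker or manager
curl -I http://srv03.facundoitest.space:3080/
curl -I http://srv04.facundoitest.space:3080/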

Resources: I found this one late, but it has almost everything: https://thenewstack.io/tutorial-create-a-docker-swarm-with-persistent-storage-using-glusterfs/

https://medium.com/running-a-software-factory/setup-3-node-high-availability-cluster-with-glusterfs-and-docker-swarm-b4ff80c6b5c3

a good article by Brandi (don't remember the last name), on Medium: https://towardsaws.com/deploying-a-docker-stack-across-a-docker-swarm-using-a-docker-compose-file-ddac4c0253da

one from nixCraft: https://www.cyberciti.biz/faq/howto-glusterfs-replicated-high-availability-storage-volume-on-ubuntu-linux/

docs.gluster: https://docs.gluster.org/en/main/Administrator-Guide/Brick-Naming-Conventions/#set-up-a-gluster-volume

DigitalOcean on Gluster: https://www.digitalocean.com/community/tutorials/how-to-create-a-redundant-storage-pool-using-glusterfs-on-ubuntu-20-04#step-2-setting-up-software-sources-on-each-machine

adding new bricks to an existing GFS, by nixCraft: https://www.cyberciti.biz/faq/howto-add-new-brick-to-existing-glusterfs-replicated-volume/
