Kubernetes + Postgres Cluster From Scratch on Rocky 8
Co-authored by Brian Pace
I was excited to hear that Kubernetes 1.22 was recently released with better support for cgroup v2 and alpha support for Linux swap. These changes potentially resolve two of my chief complaints about running Postgres under Kubernetes. Obviously it will take some time before we see uptake in the wild on these features, but I wanted to become familiar with them.
For what it's worth, I also want to eventually play with the new alpha seccomp support in Kubernetes v1.22, but will have to save that for another day. In the meantime, if you are interested in Seccomp with PostgreSQL, you can see my presentation from FOSDEM that discusses a Postgres extension I wrote, unsurprisingly called pgseccomp.
Disclaimer
I am not a Kubernetes Admin by profession, but rather an old database-curmudgeon(tm), so please be careful if you try this at home, and don't flame me if I don't approach everything in the canonical way.
The Goal
Specifically, I set out to build a three-node v1.22 Kubernetes cluster with one main control-plane node and two worker-only nodes. However, I found that virtually every "recipe" I came across for doing this (using equivalent distributions, e.g. CentOS 8) would result in a non-working cluster, even when not trying to enable swap and/or cgroup v2.
And of course, once the cluster is up and running, my next desire was to install the Crunchy Data Operator v5 and deploy PostgreSQL starting from the examples.
So I enlisted some help from my friend and colleague Brian Pace, and documented my own successful recipe below.
The Journey
First, I started with a mostly vanilla virtual machine image created with Rocky Linux 8.4 installed from ISO.
- Mostly defaults
- Server install, with desktop
- Enable networking
- Enable setting time from network
- Create local user as an admin
After initial setup and rebooting into a working Rocky 8 base instance, I shut down the VM and made three copies of the qcow2 image. From there I created three VMs, each using one of the base-image copies.
On each of my three kube node VMs, I noted the MAC address for the network interface, and set up DHCP Static Mappings for the kube nodes by MAC address. I also set the desired hostnames -- kube01, kube02, and kube03.
Left TODO
Note that, in addition to seccomp, I have also punted on enabling the firewall, and on running with selinux in enforcing mode. I hope to tackle both of those later.
The Recipe
Without further ado, what follows is my recipe.
Caution: steps below with the preceding comment "... on main control-plane node only" should only be run on the control-plane node (in my case, kube01), and the ones with the preceding comment "... from worker-only nodes" should only be run on the worker-only nodes. Also note that this setup lives in a lab; a single-node control-plane configuration is never recommended for production or other critical environments.
Basic Node Setup
Unless otherwise stated, each of the steps should be performed on each host with the necessary host name, ip, etc. modifications. Start from Rocky Linux 8 fresh server install as outlined above, and note the MAC address for the network interface. Set up DHCP Static Mappings for the kube nodes by MAC address, and then change the following variable values to suit your setup:
### variables setup ###
# IP Address for main control-plane node
MC_IP=<your-node-IP-here>
# POD network subnet
POD_NETWORK_CIDR="10.244.0.0/16"
# Hostname for the current node
MYHOSTNAME=kube01
#MYHOSTNAME=kube02
#MYHOSTNAME=kube03
# My username
LCLUSER=jconway
Local user setup
Next, install an ssh public key for the local user:
mkdir /home/${LCLUSER}/.ssh
vi /home/${LCLUSER}/.ssh/authorized_keys
# paste desired ssh public key and save
# ssh will not be happy if permissions are not correct
chmod 700 /home/${LCLUSER}/.ssh
chmod 600 /home/${LCLUSER}/.ssh/authorized_keys
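Depending on how you ran the commands above (as root from the console versus as ${LCLUSER}), the files may end up owned by root. A hedged cleanup step to make sure the local user owns their own .ssh directory:
# only needed if the files above were created as root
sudo chown -R ${LCLUSER}:${LCLUSER} /home/${LCLUSER}/.ssh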
Node setup
Update the system to pull in the latest fixes, and set the hostname:
sudo dnf update
# If needed, reset the desired hostname
# This may be required, for example, if the current
# host VM was cloned from a base image
sudo hostnamectl set-hostname ${MYHOSTNAME}
Kubernetes specific setup
Allow running with swap on
OUTFILE=/etc/sysconfig/kubelet
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
EOF'
Put selinux in permissive mode for now (but this should be fixed later!)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
Set up the required sysctl params -- these persist across reboots
OUTFILE=/etc/sysctl.d/k8s.conf
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF'
sudo sysctl --system
systemd doesn't use cgroup v2 by default; configure the system to use it by adding systemd.unified_cgroup_hierarchy=1 to the kernel command line
sudo dnf install -y grubby && \
sudo grubby \
--update-kernel=ALL \
--args="systemd.unified_cgroup_hierarchy=1"
Turn on controllers that are off by default; the cpu controller, at least, seems to be required for the kubelet service to function
OUTFILE=/etc/systemd/system.conf
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
DefaultCPUAccounting=yes
DefaultIOAccounting=yes
DefaultIPAccounting=yes
DefaultBlockIOAccounting=yes
EOF'
A reboot seems to be the only way to make the /etc/systemd/system.conf changes take effect, and a reboot is of course needed for the kernel command line change anyway
sudo reboot
After the reboot we need to redo the variables setup
# IP Address for main control-plane node
MC_IP=<your-node-IP-here>
# POD network subnet
POD_NETWORK_CIDR="10.244.0.0/16"
Verify setup
# show swap is on
swapon --show
# check for type cgroup2
mount -l|grep cgroup
# check for cpu controller
cat /sys/fs/cgroup/cgroup.subtree_control
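For a quick pass/fail version of the same checks (my reading of what success looks like after the steps above: a cgroup2 mount on /sys/fs/cgroup and the cpu controller enabled), something like this works:
# expect both messages to be printed
mount -l | grep -q cgroup2 && echo "cgroup v2 mounted"
grep -qw cpu /sys/fs/cgroup/cgroup.subtree_control && echo "cpu controller enabled"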
Install docker-ce
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf -y remove runc
sudo dnf -y install docker-ce --nobest
Tell docker to use systemd for cgroup control
sudo mkdir -p /etc/docker
OUTFILE=/etc/docker/daemon.json
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF'
Enable and start the docker service
sudo systemctl enable --now docker
sudo systemctl status docker
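As an extra sanity check that docker actually picked up the systemd cgroup driver and sees cgroup v2 (grepping the docker info output, which on recent versions reports both):
# should report "Cgroup Driver: systemd" and "Cgroup Version: 2"
sudo docker info 2>/dev/null | grep -i cgroup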
Disable the firewall for now (but fix later!); the required ports are listed in the upstream Kubernetes documentation and in your CNI plugin's docs.
sudo systemctl stop firewalld
sudo systemctl disable firewalld
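When I do get around to fixing this properly, the plan is to keep firewalld running and just open the documented ports. A hedged sketch of what that might look like (port list taken from the upstream Kubernetes docs; your CNI may need more, e.g. Weave Net also uses 6783/tcp and 6783-6784/udp):
# control-plane node
sudo firewall-cmd --permanent --add-port=6443/tcp        # kube-apiserver
sudo firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd
sudo firewall-cmd --permanent --add-port=10250/tcp       # kubelet API
sudo firewall-cmd --permanent --add-port=10257/tcp       # kube-controller-manager
sudo firewall-cmd --permanent --add-port=10259/tcp       # kube-scheduler
# worker-only nodes
sudo firewall-cmd --permanent --add-port=10250/tcp       # kubelet API
sudo firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort services
# either case
sudo firewall-cmd --reload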
Create kubernetes repo file (note that the "el7" is intentional)
OUTFILE=/etc/yum.repos.d/kubernetes.repo
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF'
Install Kubernetes
sudo dnf -y install kubelet kubeadm kubectl --disableexcludes=kubernetes
Make systemd the kubelet cgroup driver
sudo mkdir -p /var/lib/kubelet
OUTFILE=/var/lib/kubelet/config.yaml
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
EOF'
Enable and start the kubelet service (note that the kubelet service fails until the init/join step is run)
sudo systemctl enable --now kubelet
sudo systemctl status kubelet
Init Kubernetes on main control-plane node only. Don't forget to capture the "kubeadm join ..." output
sudo kubeadm init --pod-network-cidr=${POD_NETWORK_CIDR} --apiserver-advertise-address=${MC_IP} --kubernetes-version stable-1.22 --ignore-preflight-errors="Swap"
Enable root to run kubectl on main control-plane node only
sudo bash -c 'mkdir -p $HOME/.kube'
sudo bash -c 'cp -i /etc/kubernetes/admin.conf $HOME/.kube/config'
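While you are at it, copy the admin kubeconfig into the local user's home directory on the control-plane node; the scp in the operator section below assumes it is there (this isn't part of the standard kubeadm output, just a convenience step):
# LCLUSER is from the variables setup; re-set it if needed after the reboot
sudo cp /etc/kubernetes/admin.conf /home/${LCLUSER}/admin.conf
sudo chown ${LCLUSER}: /home/${LCLUSER}/admin.conf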
Install networking on main control-plane node only
sudo kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
Join the kube cluster from the worker-only nodes. Note that you should use the actual token emitted during the init on the main control-plane node. If you forgot to record it, run the following command on the main control-plane node:
kubeadm token create --print-join-command
sudo kubeadm join ${MC_IP}:6443 --token wze441.4v1fexq9ak1ew8eq \
--discovery-token-ca-cert-hash sha256:44bda0ab055d721ae00a1ab8f0d45b0bf5690501209c26a810bf251688891f84 \
--ignore-preflight-errors="Swap"
View state of play on main control-plane node
sudo kubectl get nodes
sudo kubectl get deployment,svc,pods,pvc,rc,rs --all-namespaces
sudo kubectl get deployment,svc,pods,pvc,rc,rs --all-namespaces -o wide|less
Installing the Crunchy pgo Operator
At this point, you should have a fully functional Kubernetes cluster ready for you to enjoy. The next step we want to take is to install something useful, so I am going to start with the Crunchy pgo operator.
Install the kube cluster configuration on your local machine (in my case, my desktop)
scp ${MC_IP}:/home/jconway/admin.conf $HOME/.kube/kube01.config
export KUBECONFIG=$HOME/.kube/kube01.config
kubectl get nodes
The output of the last command there should look something like this
NAME     STATUS   ROLES                  AGE   VERSION
kube01   Ready    control-plane,master   88d   v1.22.2
kube02   Ready    <none>                 88d   v1.22.2
kube03   Ready    <none>                 88d   v1.22.2
Grab the pgo operator examples repo from github
cd ${HOME}
git clone git@github.com:CrunchyData/postgres-operator-examples.git
cd postgres-operator-examples
kubectl apply -k kustomize/install
The output of the last command there should look something like this
namespace/postgres-operator unchanged
customresourcedefinition.apiextensions.k8s.io/postgresclusters.postgres-operator.crunchydata.com configured
serviceaccount/pgo configured
clusterrole.rbac.authorization.k8s.io/postgres-operator configured
clusterrolebinding.rbac.authorization.k8s.io/postgres-operator configured
deployment.apps/pgo configured
Install an appropriate storage class
kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
kubectl patch storageclass openebs-hostpath -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
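To confirm the storage class landed and is marked as the default (the postgres example below will request PVCs from it):
# openebs-hostpath should show "(default)" next to its name
kubectl get storageclass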
Deploy a Crunchy PostgreSQL Pod
kubectl apply -k kustomize/postgres
kubectl get pods -n postgres-operator
The output of the last command there should look something like this
NAME                     READY   STATUS    RESTARTS   AGE
hippo-instance1-pd9w-0   3/3     Running   0          10m
hippo-repo-host-0        1/1     Running   0          10m
pgo-69949584b9-65bqw     1/1     Running   0          10m
Exec into the Postgres pod to explore as desired
kubectl exec -it -n postgres-operator -c database hippo-instance1-pd9w-0 -- bash
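Once inside the database container, a quick hedged sanity check (in my experience with these images, psql connects over the local socket without further setup):
psql -c 'select current_user, version();'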
The operator creates a default user with the same name as the cluster (hippo in this case). To get that user's password you can execute the following:
export PGPASSWORD=$(kubectl get secret hippo-pguser-hippo -n postgres-operator -o jsonpath={.data.password} | base64 --decode)
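The same secret also carries the rest of the connection details; assuming the PGO v5 user secret key names (user, host, port, dbname), you can pull them the same way:
export PGUSER=$(kubectl get secret hippo-pguser-hippo -n postgres-operator -o jsonpath={.data.user} | base64 --decode)
export PGDATABASE=$(kubectl get secret hippo-pguser-hippo -n postgres-operator -o jsonpath={.data.dbname} | base64 --decode)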
Create a NodePort service to allow for connectivity from outside of the Kubernetes cluster.
cat hippo-np.yaml
apiVersion: v1
kind: Service
metadata:
  name: hippo-np
spec:
  type: NodePort
  selector:
    postgres-operator.crunchydata.com/cluster: hippo
    postgres-operator.crunchydata.com/role: master
  ports:
  - protocol: TCP
    port: 5432
    targetPort: 5432
    nodePort: 30032
kubectl apply -f hippo-np.yaml -n postgres-operator
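A quick check that the service exists and is listening on the expected node port:
# should show TYPE NodePort and PORT(S) 5432:30032/TCP
kubectl get svc hippo-np -n postgres-operator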
Finally connect to PostgreSQL
$ export PGSSLMODE=require
$ psql -h kube02 -p 30032 -U hippo
psql (13.3, server 13.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
hippo=> select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
Cleanup postgres instance only (CAUTION!)
kubectl delete -k kustomize/postgres
Complete cleanup (CAUTION!)
Note: Before removing the Operator, all postgres clusters that were created by the Operator should be deleted first. Failing to do so will leave the PostgresCluster resources stuck, because their finalizers can no longer be processed once the Operator is gone.
kubectl get postgrescluster --all-namespaces
kubectl delete -k kustomize/install
Summary
In this article I showed you how to create a Kubernetes cluster from scratch, including getting it to run with cgroup v2 and swap turned on.
I also showed the basic deployment of the Crunchy Data pgo operator and a PostgreSQL pod.
I hope in the future to run some tests to understand how this setup behaves versus the more common (but in my opinion likely less resilient) environment with cgroup v1 and swap turned off. Stay tuned!