Multi-Cluster Postgres Federation on Kubernetes with Postgres 14 Using GitOps
One of the reasons why I like deploying Postgres on Kubernetes with PGO, the open source Postgres Operator, is that it makes it easier to set up distributed database systems. One example is using Postgres for edge computing, and you can build an edge computing architecture with both Postgres and Debezium on Kubernetes.
You can also do some fun things with logical replication and many read/write nodes. Last year I posted an example showing how you could set up an active-active federated Postgres cluster on Kubernetes. With the release of version 5 of the open source PGO, I thought it'd be neat to update the example to show how you can create an active-active federated Postgres cluster using GitOps.
Even better, we'll do this with Postgres 14, so we can take advantage of some of its logical replication improvements.
Set Up the Postgres Clusters
Similar to the previous example, let's create three Postgres clusters called hippo-east, hippo-central, and hippo-west. We can do so with the following manifests:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-east
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.3-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-central
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.3-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-west
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.3-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
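If you're following along, one way to apply these is to commit the manifests to your GitOps repository and let your tooling (for example Argo CD or Flux) sync them, or apply them directly with kubectl. A quick sketch; the file name postgres.yaml is my own, and the postgres-operator namespace matches the rest of this post:
# Apply the three PostgresCluster manifests. In a GitOps workflow your
# tooling would apply the committed file for you.
kubectl apply -n postgres-operator -f postgres.yaml

# Watch for each cluster's primary Pod to come up.
kubectl get pods -n postgres-operator \
  -l postgres-operator.crunchydata.com/role=master -w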
Note that in each of the manifests, we are telling PGO to create a Kubernetes Secret for the postgres superuser. This allows us to log in as postgres. This is for convenience: in a production cluster you would lock this down a bit further. We also enable monitoring to ensure that certain Kubernetes introspection functions are available via pgnodemx.
For convenience, let's set the password for the postgres user to be the same in each of these clusters. PGO can set custom passwords so long as the verifier attribute in the user Secret is set (setting the password attribute is purely for convenience). Let's set the password to datalake. Here is an example password/verifier pair, encoded in base64, that can be used to authenticate with the password datalake:
PASSWORD='ZGF0YWxha2U='
VERIFIER='U0NSQU0tU0hBLTI1NiQ0MDk2OkZBblNrVXJMM2pINkJwMTdQNUZUdVE9PSRCZWNMVTBZWHpCVXduVGszYjNnaElaNy96UzJYdkQ4d1g1MEh6NUpMMHE0PTpweHVFdG96WVRacEFjNndJTlY1M3NDcjRBZnhrOExMZm9yNU12a1lwMjFzPQ=='
You can then set the postgres user to use the password datalake by executing the following commands:
kubectl patch secret -n postgres-operator hippo-east-pguser-postgres -p \
  "{\"data\":{\"password\":\"${PASSWORD}\",\"verifier\":\"${VERIFIER}\"}}"
kubectl patch secret -n postgres-operator hippo-central-pguser-postgres -p \
  "{\"data\":{\"password\":\"${PASSWORD}\",\"verifier\":\"${VERIFIER}\"}}"
kubectl patch secret -n postgres-operator hippo-west-pguser-postgres -p \
  "{\"data\":{\"password\":\"${PASSWORD}\",\"verifier\":\"${VERIFIER}\"}}"
You can verify the password was set by connecting to one or all of the Postgres clusters. Using the port-forward technique, run the following:
PG_CLUSTER_NAME="hippo-east"
PG_CLUSTER_PRIMARY_POD=$(kubectl get pod -n postgres-operator -o name \
  -l postgres-operator.crunchydata.com/cluster=$PG_CLUSTER_NAME,postgres-operator.crunchydata.com/role=master)
kubectl -n postgres-operator port-forward "${PG_CLUSTER_PRIMARY_POD}" 5432:5432
In a separate terminal, run the following:
PGPASSWORD="datalake" psql -h localhost -U postgres -c 'SELECT version();'
which should yield something similar to:
                                            version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
We now need to provide a way for each Postgres cluster to identify itself. This can be accomplished using some of the helper functions available in pgnodemx, which comes with PGO. For PGO v5, the helper function to accomplish this looks like this:
CREATE EXTENSION IF NOT EXISTS pgnodemx;
CREATE OR REPLACE FUNCTION get_node_name()
RETURNS text
AS $$
  SELECT val FROM kdapi_setof_kv('labels') WHERE key='postgres-operator.crunchydata.com/cluster';
$$ LANGUAGE SQL SECURITY DEFINER IMMUTABLE;
If you have trouble adding it, try dropping and recreating the pgnodemx extension.
You will need to run the above in the postgres database on hippo-east, hippo-central, and hippo-west. You can verify the function works by executing get_node_name(), for example on hippo-east:
SELECT get_node_name();
which should yield:
 get_node_name
---------------
 hippo-east
(1 row)
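If you're curious which Kubernetes labels pgnodemx exposes (get_node_name() just picks one of them out), you can list them all. A quick sketch, assuming the port-forward from earlier is still pointed at hippo-east:
# List every Kubernetes label that pgnodemx surfaces for this Pod
PGPASSWORD="datalake" psql -h localhost -U postgres \
  -c "SELECT key, val FROM kdapi_setof_kv('labels');"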
(Just for fun, if you're on Crunchy Bridge you can use the equivalent function:
CREATE OR REPLACE FUNCTION get_node_name()
RETURNS text
AS $$
  SELECT regexp_replace(pg_read_file('/etc/hostname'), '\n', '');
$$ LANGUAGE SQL SECURITY DEFINER IMMUTABLE;
).
With the basis of the Postgres clusters set up, let's now create the schema that lets us build out an active-active federated cluster!
Creating the Schema
We will use a schema similar to the one we created in the previous example, though to add a Postgres 14 twist, we'll include a multirange data type (and a Postgres 13 twist: gen_random_uuid() is available without the pgcrypto extension):
CREATE TABLE hippos (
  id uuid DEFAULT gen_random_uuid() NOT NULL,
  node_name text,
  value datemultirange,
  created_at timestamptz
) PARTITION BY LIST (node_name);
CREATE TABLE hippo_default PARTITION OF hippos (PRIMARY KEY (id)) DEFAULT;
CREATE TABLE hippo_east PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-east');
CREATE TABLE hippo_central PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-central');
CREATE TABLE hippo_west PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-west');
CREATE OR REPLACE FUNCTION add_node_name()
RETURNS trigger AS $$
BEGIN
  UPDATE hippos
  SET node_name = get_node_name()
  WHERE node_name IS NULL;
  RETURN NULL;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER add_node_name
AFTER INSERT ON hippos
FOR EACH STATEMENT
EXECUTE FUNCTION add_node_name();
What do we have here? Each partition represents the data that is stored in each Postgres cluster. This means that the hippo-east Postgres cluster will only write data to the hippo-east partition, the hippo-central cluster will only write to the hippo-central partition, and the hippo-west cluster will only write to the hippo-west partition.
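Here's a quick sanity check you can run (a sketch, assuming a port-forward to hippo-east as before): insert a row without a node_name and confirm the trigger routes it into the local partition. The transaction is rolled back so the counts later in this post still line up:
PGPASSWORD="datalake" psql -h localhost -U postgres <<'SQL'
BEGIN;
-- node_name is omitted; the statement-level trigger fills it in afterwards
INSERT INTO hippos (value, created_at)
VALUES (datemultirange(daterange(CURRENT_DATE, CURRENT_DATE + 1)), clock_timestamp());
-- tableoid shows which partition the row ended up in (hippo_east here)
SELECT tableoid::regclass AS partition, node_name FROM hippos;
ROLLBACK;
SQL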
You can actually bootstrap this data with PGO from a SQL file, but for the purposes of this example I wanted to spell it out.
By setting up our table in this way, we will ensure that data in each active instance is only written to its designated area locally, and then can be distributed to the other nodes. Let's look at how we can distribute the data.
Distributing Writes
We can distribute the data to each instance using logical replication. Unlike physical replication, Postgres logical replication allows for subsets of data to be sent between Postgres instances, and also allows data to be sent between writable nodes.
To use logical replication, you need to set up publications on the source servers, and subscriptions on the destinations.
Let's set up the publications. First, on hippo-east, create the following publication:
CREATE PUBLICATION pub_hippo_east FOR TABLE hippo_east;
Next, set up the publication on hippo-central:
CREATE PUBLICATION pub_hippo_central FOR TABLE hippo_central;
Finally, set up the publication on hippo-west:
CREATE PUBLICATION pub_hippo_west FOR TABLE hippo_west;
We can now set up the subscriptions between all of the instances. We're going to do something special here: because PGO v5 comes with TLS enabled by default, we can use stronger TLS verification for our connections.
First, let's create the subscriptions on hippo-east:
CREATE SUBSCRIPTION sub_hippo_east_hippo_central
CONNECTION 'dbname=postgres host=hippo-central-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_central;
CREATE SUBSCRIPTION sub_hippo_east_hippo_west
CONNECTION 'dbname=postgres host=hippo-west-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_west;
Note the sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full section. We need the TLS CA to perform a full verification of the source instance's identity (verify-full). The TLS CA is mounted at /pgconf/tls/ca.crt, so we can reference it from there. We can get the host name either from the user Secret or from the list of standard Kubernetes FQDNs.
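For example, you can read the hostname straight out of the user Secret that PGO manages (a quick sketch):
# The pguser Secret carries the connection details, including the host
kubectl get secret -n postgres-operator hippo-central-pguser-postgres \
  -o jsonpath='{.data.host}' | base64 -d; echo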
Let's set up the remaining subscriptions. First, set them up on hippo-central:
CREATE SUBSCRIPTION sub_hippo_central_hippo_east
CONNECTION 'dbname=postgres host=hippo-east-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_east;
CREATE SUBSCRIPTION sub_hippo_central_hippo_west
CONNECTION 'dbname=postgres host=hippo-west-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_west;
And finally on hippo-west:
CREATE SUBSCRIPTION sub_hippo_west_hippo_east
CONNECTION 'dbname=postgres host=hippo-east-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_east;
CREATE SUBSCRIPTION sub_hippo_west_hippo_central
CONNECTION 'dbname=postgres host=hippo-central-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_central;
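Once the subscriptions are created, you can check on each node that they are connected and receiving changes; pg_stat_subscription should show one row per subscription. A quick sketch, again using the port-forward:
# received_lsn advances as changes stream in from each remote primary
PGPASSWORD="datalake" psql -h localhost -U postgres \
  -c "SELECT subname, received_lsn, last_msg_receipt_time FROM pg_stat_subscription;"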
With the partitions, publications, and subscriptions in place, now we can start the fun part: adding in the data!
Multiranges on Multi-Nodes!
The schema in place uses Postgres 14's new multirange data type. In essence, a multirange is an ordered list of noncontiguous ranges. For example:
SELECT datemultirange(
  daterange(CURRENT_DATE, CURRENT_DATE + 2),
  daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
);
may yield something like:
                  datemultirange
---------------------------------------------------
 {[2021-10-12,2021-10-14),[2021-10-17,2021-10-20)}
(1 row)
Postgres will detect if your ranges overlap and will merge them within the multirange. For example:
SELECT datemultirange(
  daterange(CURRENT_DATE, CURRENT_DATE + 2),
  daterange(CURRENT_DATE + 1, CURRENT_DATE + 8)
);
may yield something like:
       datemultirange
---------------------------
 {[2021-10-12,2021-10-20)}
(1 row)
As a final bonus, you can access the individual ranges using the unnest function:
SELECT unnest(datemultirange(
  daterange(CURRENT_DATE, CURRENT_DATE + 2),
  daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
));
may yield something like:
         unnest
-------------------------
 [2021-10-12,2021-10-14)
 [2021-10-17,2021-10-20)
(2 rows)
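Multiranges also work with the range operators you'd expect, such as containment and union. A couple of quick illustrations (again via the port-forward, though any Postgres 14 instance will do):
PGPASSWORD="datalake" psql -h localhost -U postgres <<'SQL'
-- Does the multirange contain tomorrow's date?
SELECT datemultirange(
  daterange(CURRENT_DATE, CURRENT_DATE + 2),
  daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
) @> (CURRENT_DATE + 1) AS contains_tomorrow;

-- Union two multiranges into one
SELECT datemultirange(daterange(CURRENT_DATE, CURRENT_DATE + 2)) +
       datemultirange(daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)) AS combined;
SQL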
Anyway, I digress. Let's now insert some actual data into our instances. First, log into hippo-east and run a query like this:
INSERT INTO hippos (value, created_at)
SELECT
  datemultirange(
    daterange(CURRENT_DATE + n, CURRENT_DATE + n + (100 * random())::int),
    daterange(CURRENT_DATE + n + 100, CURRENT_DATE + n + 100 + (200 * random())::int)
  ),
  clock_timestamp()
FROM generate_series(1,25) n;
This creates a bunch of randomly generated datemultirange values. Notice the use of clock_timestamp() -- it returns the actual clock time, vs. CURRENT_TIMESTAMP, which returns the time at the start of the transaction. Go ahead and execute this SQL on hippo-central and hippo-west as well.
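If you want to see the difference between the two for yourself, here's a quick sketch you can run anywhere:
PGPASSWORD="datalake" psql -h localhost -U postgres <<'SQL'
BEGIN;
-- CURRENT_TIMESTAMP is frozen at the start of the transaction...
SELECT CURRENT_TIMESTAMP AS txn_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(2);
-- ...while clock_timestamp() keeps moving
SELECT CURRENT_TIMESTAMP AS txn_start, clock_timestamp() AS wall_clock;
COMMIT;
SQL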
Log back into hippo-east. Let's see if the data propagated:
TABLE hippos;
You should see all 75 results! For a more consolidated view:
SELECT node_name, count(*)
FROM hippos
GROUP BY node_name;
should yield:
   node_name   | count
---------------+-------
 hippo-east    |    25
 hippo-central |    25
 hippo-west    |    25
(3 rows)
Run the query on hippo-central and hippo-west and see how your writes have propagated throughout your active-active cluster!
Scaling Up: Adding A New Instance
Let's say we want to add another node to our cluster. No problem. Let's call this node hippo-south. First, create hippo-south from a Kubernetes custom resource:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-south
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.3-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
Similar to before, set the password of the postgres account to datalake:
PASSWORD='ZGF0YWxha2U='
VERIFIER='U0NSQU0tU0hBLTI1NiQ0MDk2OkZBblNrVXJMM2pINkJwMTdQNUZUdVE9PSRCZWNMVTBZWHpCVXduVGszYjNnaElaNy96UzJYdkQ4d1g1MEh6NUpMMHE0PTpweHVFdG96WVRacEFjNndJTlY1M3NDcjRBZnhrOExMZm9yNU12a1lwMjFzPQ=='
kubectl patch secret -n postgres-operator hippo-south-pguser-postgres -p \
  "{\"data\":{\"password\":\"${PASSWORD}\",\"verifier\":\"${VERIFIER}\"}}"
Once hippo-south is initialized, log in and add the get_node_name() helper function:
CREATE EXTENSION IF NOT EXISTS pgnodemx;
CREATE OR REPLACE FUNCTION get_node_name()
RETURNS text
AS $$
  SELECT val FROM kdapi_setof_kv('labels') WHERE key='postgres-operator.crunchydata.com/cluster';
$$ LANGUAGE SQL SECURITY DEFINER IMMUTABLE;
Create the hippos data structure on the hippo-south instance. Note the addition of the hippo-south partition:
CREATE TABLE hippos (
  id uuid DEFAULT gen_random_uuid() NOT NULL,
  node_name text,
  value datemultirange,
  created_at timestamptz
) PARTITION BY LIST (node_name);
CREATE TABLE hippo_default PARTITION OF hippos (PRIMARY KEY (id)) DEFAULT;
CREATE TABLE hippo_east PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-east');
CREATE TABLE hippo_central PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-central');
CREATE TABLE hippo_west PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-west');
CREATE TABLE hippo_south PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-south');
CREATE OR REPLACE FUNCTION add_node_name()
RETURNS trigger AS $$
BEGIN
  UPDATE hippos
  SET node_name = get_node_name()
  WHERE node_name IS NULL;
  RETURN NULL;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER add_node_name
AFTER INSERT ON hippos
FOR EACH STATEMENT
EXECUTE FUNCTION add_node_name();
Now, set up hippo-south so it can be added to the cluster. This involves creating a publication on hippo-south and subscribing it to the current nodes:
CREATE PUBLICATION pub_hippo_south FOR TABLE hippo_south;
CREATE SUBSCRIPTION sub_hippo_south_hippo_east
CONNECTION 'dbname=postgres host=hippo-east-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_east;
CREATE SUBSCRIPTION sub_hippo_south_hippo_central
CONNECTION 'dbname=postgres host=hippo-central-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_central;
CREATE SUBSCRIPTION sub_hippo_south_hippo_west
CONNECTION 'dbname=postgres host=hippo-west-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_west;
We can test that hippo-south is in the cluster. Try running our verification query:
SELECT node_name, count(*)
FROM hippos
GROUP BY node_name;
This should yield:
   node_name   | count
---------------+-------
 hippo-east    |    25
 hippo-central |    25
 hippo-west    |    25
(3 rows)
Excellent. Let's insert some data into hippo-south:
INSERT INTO hippos (value, created_at)
SELECT
  datemultirange(
    daterange(CURRENT_DATE + n, CURRENT_DATE + n + (100 * random())::int),
    daterange(CURRENT_DATE + n + 100, CURRENT_DATE + n + 100 + (200 * random())::int)
  ),
  clock_timestamp()
FROM generate_series(1,25) n;
Now we have one final step: making the other instances aware of hippo-south. Log into hippo-east and execute:
CREATE TABLE hippo_south PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-south');
CREATE SUBSCRIPTION sub_hippo_east_hippo_south
CONNECTION 'dbname=postgres host=hippo-south-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_south;
On hippo-central:
CREATE TABLE hippo_south PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-south');
CREATE SUBSCRIPTION sub_hippo_central_hippo_south
CONNECTION 'dbname=postgres host=hippo-south-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_south;
And on hippo-west:
CREATE TABLE hippo_south PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-south');
CREATE SUBSCRIPTION sub_hippo_west_hippo_south
CONNECTION 'dbname=postgres host=hippo-south-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_south;
You can verify that hippo-east, hippo-central, and hippo-west are receiving writes from hippo-south by executing:
SELECT node_name, count(*)
FROM hippos
GROUP BY node_name;
which should yield:
   node_name   | count
---------------+-------
 hippo-east    |    25
 hippo-south   |    25
 hippo-central |    25
 hippo-west    |    25
(4 rows)
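You can also look at this from the publishing side: each node's pg_stat_replication view lists the logical replication connections consuming its publication. A sketch, with the port-forward pointed at whichever node you want to inspect:
# Each subscriber shows up as a walsender; application_name matches the
# subscription name on the remote side
PGPASSWORD="datalake" psql -h localhost -U postgres \
  -c "SELECT application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;"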
Next Steps
The above guide should get you on your way to building out Postgres clusters that can accept writes across multiple nodes and share them with each other. By using PGO, the open source Postgres Operator, you can also set up high availability, backups, and monitoring to make your cluster production ready.
Furthermore, you can bootstrap a lot of the schema work by telling PGO to set up a database according to a SQL file you provide. This makes setting up such clusters reproducible across environments, which is the GitOps way!
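For example, you could put the schema SQL from this post into a ConfigMap and point your PostgresCluster spec's database initialization option at it. A sketch only: the file and ConfigMap names below are my own, and the exact spec field depends on your PGO version, so check the PGO docs.
# Store the bootstrap SQL (schema, functions, trigger) in a ConfigMap that
# the PostgresCluster spec can reference for database initialization
kubectl create configmap -n postgres-operator hippo-init-sql \
  --from-file=init.sql=schema.sql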
Kubernetes and PGO make it easier to set up distributed Postgres clusters and allow for designing and scaling different topologies for managing your data, from active-active federation to edge computing.