PostgreSQL 14 on Kubernetes (with examples!)
Last week PostgreSQL 14 became generally available to the public. The release contains a lot of cool stuff, including features I can appreciate from my application development background, like better JSON support and multirange types. There are also a lot of little things that make PostgreSQL 14 a really cool release, and I encourage you to explore the release notes to find a feature that will make your life easier.
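To give a quick taste of those two features, here is a minimal sketch using plain PostgreSQL 14 syntax (nothing Kubernetes-specific, and the values are made up):

-- PostgreSQL 14 jsonb subscripting: access jsonb fields with array-style syntax
SELECT ('{"node": "rhino1", "stat": 0.42}'::jsonb)['stat'];

-- multirange types: bundle several ranges into one value and test containment
SELECT '{[1,3), [7,10)}'::int4multirange @> 8;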
Crunchy Bridge, our managed service available on Amazon, Microsoft, and Google, made PostgreSQL 14 available only hours after it was announced. We wanted to bring the PostgreSQL 14 experience just as quickly to PGO, the open source Postgres Operator for Kubernetes, and we're pleased to announce that it is already generally available there.
Given some of the distributed computing improvements in PostgreSQL 14, I thought a quick demo of getting PostgreSQL 14 up and running on Kubernetes would be a great way to show off these new features!
Setting up PostgreSQL 14
After installing PGO (I do recommend the quickstart!), you can get up and running with PostgreSQL 14 on Kubernetes with the following manifest:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 5Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 5Gi
However, let's create three separate Postgres clusters. You can use the manifests below, or build off of a manifest from the Postgres Operator examples repo:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 5Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 5Gi
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: rhino1
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 5Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 5Gi
  users:
    - name: postgres
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: rhino2
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 5Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 5Gi
  users:
    - name: postgres
You should have your PostgreSQL 14 clusters up and running fairly quickly! This manifest does expose the postgres superuser on the rhino1 and rhino2 Postgres clusters. This is mainly a convenience to show off part of our example: in a production environment, you should be sure to set appropriate roles and privileges when using foreign data wrappers.
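As a minimal, hedged sketch of what that could look like on the rhino clusters (the fdw_reader role name and password are purely illustrative, and you would run this after creating the schema shown later in this post):

-- a dedicated read-only role for foreign data wrapper access, instead of postgres
CREATE ROLE fdw_reader LOGIN PASSWORD 'change-me';
GRANT USAGE ON SCHEMA rhino1 TO fdw_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA rhino1 TO fdw_reader;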
Anyway, we can confirm that we're running PostgreSQL 14 with this quick command (I'm assuming you've deployed your clusters to the postgres-operator namespace):
kubectl -n postgres-operator exec -it -c database \
$(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master' -o name) -- \
psql -c 'SELECT version();'
which should yield something similar to:
version
--------------------------------------------------------------------------------------------------------
PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
(1 row)
Cool.
There are many ways you can connect to your database, but for this example, we will use the kubectl exec pattern shown above.
Example: PostgreSQL 14 Async Foreign Table Queries
One of the many really cool features of PostgreSQL 14 is the ability to perform queries on multiple foreign tables in parallel. This capability is enabled by setting the async_capable option to 'true' on either the foreign server or a foreign table definition. This is already implemented for the postgres_fdw.
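To make the two forms concrete, here is a hedged sketch (the server and table names are placeholders, not objects created in this post):

-- per foreign server: applies to every foreign table that uses this server
ALTER SERVER my_server OPTIONS (ADD async_capable 'true');

-- per foreign table: enable (or override) it for a single table
ALTER FOREIGN TABLE my_schema.my_table OPTIONS (ADD async_capable 'true');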
We already created two Postgres clusters named rhino1 and rhino2. These are going to contain data that we will query from the hippo Postgres cluster. First, we need to add some data to each of the rhino Postgres clusters. Log into rhino1:
kubectl -n postgres-operator exec -it -c database \
$(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=rhino1,postgres-operator.crunchydata.com/role=master' -o name) -- psql
and execute the following SQL:
CREATE SCHEMA IF NOT EXISTS rhino1;
CREATE TABLE IF NOT EXISTS rhino1.stats (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
data jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO rhino1.stats (data, created_at)
SELECT
jsonb_build_object('sample', x, 'node', 'rhino1', 'stat', random()),
clock_timestamp()
FROM generate_series(1,2000000) x;
This creates a schema with some randomly generated data. Let's do something similar on rhino2:
kubectl -n postgres-operator exec -it -c database \
$(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=rhino2,postgres-operator.crunchydata.com/role=master' -o name) -- psql
Execute the following SQL on rhino2:
CREATE SCHEMA IF NOT EXISTS rhino2;
CREATE TABLE IF NOT EXISTS rhino2.stats (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
data jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO rhino2.stats (data, created_at)
SELECT
jsonb_build_object('sample', x, 'node', 'rhino2', 'stat', random()),
clock_timestamp()
FROM generate_series(1,2000000) x;
Great! Now, before we set up hippo to perform parallel foreign table queries, we need to capture the password for the postgres user on rhino1 and rhino2 (I will also add the obligatory "in production, you would not use the postgres user but rather set up a special purpose user for accessing the foreign tables"). You can grab the passwords and store them in a couple of environment variables:
postgres_rhino1=$(kubectl -n postgres-operator get secrets rhino1-pguser-postgres -o jsonpath='{.data.password}' | base64 -d)
postgres_rhino2=$(kubectl -n postgres-operator get secrets rhino2-pguser-postgres -o jsonpath='{.data.password}' | base64 -d)
# if you want to display the passwords on your terminal, uncomment the lines below
# echo $postgres_rhino1
# echo $postgres_rhino2
Alright, let's log into the hippo Postgres cluster:
kubectl -n postgres-operator exec -it -c database \
$(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master' -o name) -- psql
First, let's ensure that the postgres_fdw extension is enabled:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
Next, let's set up the definitions for how we can connect to the rhino clusters:
CREATE SERVER IF NOT EXISTS rhino1
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'rhino1-primary.postgres-operator.svc', dbname 'postgres', sslmode 'verify-full', sslrootcert '/pgconf/tls/ca.crt', async_capable 'true');
CREATE SERVER IF NOT EXISTS rhino2
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'rhino2-primary.postgres-operator.svc', dbname 'postgres', sslmode 'verify-full', sslrootcert '/pgconf/tls/ca.crt', async_capable 'true');
There are a couple of things to note here:
- We know the names of our hosts based on how PGO sets up services.
- This example also demonstrates how to use Postgres' sslmode verify-full when connecting between Postgres instances. verify-full provides the strongest level of protection for connecting to a Postgres instance, and PGO lets you do this out-of-the-box!
- The other key bit is the async_capable 'true' option. This enables parallel scans on all foreign tables created for that server. We set it at the server level for convenience, since this option is not enabled by default in PostgreSQL 14.
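If you want to double-check the options that were recorded for each foreign server, one way (assuming you are a superuser or own the servers) is to query the pg_foreign_server catalog:

SELECT srvname, srvoptions FROM pg_foreign_server;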
Next, create the "user mappings" that tell Postgres how hippo can authenticate to rhino1 and rhino2:
CREATE USER MAPPING IF NOT EXISTS FOR postgres
SERVER rhino1
OPTIONS (user 'postgres', password '$postgres_rhino1');
CREATE USER MAPPING IF NOT EXISTS FOR postgres
SERVER rhino2
OPTIONS (user 'postgres', password '$postgres_rhino2');
Note that $postgres_rhino1 and $postgres_rhino2 are placeholders for your actual credentials.
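To confirm the mappings exist, you can query the pg_user_mappings view (the password option is only shown to the mapping owner or a superuser):

SELECT srvname, usename, umoptions FROM pg_user_mappings;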
Now we can start creating foreign tables. For convenience, let's use the IMPORT FOREIGN SCHEMA functionality:
CREATE SCHEMA IF NOT EXISTS rhino1;
IMPORT FOREIGN SCHEMA rhino1
FROM SERVER rhino1
INTO rhino1;
CREATE SCHEMA IF NOT EXISTS rhino2;
IMPORT FOREIGN SCHEMA rhino2
FROM SERVER rhino2
INTO rhino2;
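If you would like to verify what was imported, one option is to list the foreign tables through information_schema (psql's \det+ metacommand works as well):

SELECT foreign_table_schema, foreign_table_name, foreign_server_name
FROM information_schema.foreign_tables
ORDER BY 1, 2;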
Now we can finally test the ability to execute queries against foreign tables in parallel! Here is a simple query to test the functionality (the example is a bit contrived):
EXPLAIN ANALYZE SELECT avg(x.stat) FROM (
SELECT data['stat']::float AS stat FROM rhino1.stats
UNION ALL
SELECT data['stat']::float AS stat FROM rhino2.stats
) x;
In my Kubernetes environment, this yields the following plan:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=424.80..424.81 rows=1 width=8) (actual time=13700.998..13701.000 rows=1 loops=1)
-> Append (cost=100.00..412.00 rows=5120 width=8) (actual time=2.937..13410.193 rows=4000000 loops=1)
-> Async Foreign Scan on stats (cost=100.00..193.20 rows=2560 width=8) (actual time=1.394..5337.504 rows=2000000 loops=1)
-> Async Foreign Scan on stats stats_1 (cost=100.00..193.20 rows=2560 width=8) (actual time=1.760..4807.722 rows=2000000 loops=1)
Planning Time: 1.019 ms
Execution Time: 13704.199 ms
Cool -- you can see the PostgreSQL 14 feature on the lines that say "Async Foreign Scan"!
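If you want to see the difference the option makes, one (hedged) way to compare is to switch async_capable off on both servers and re-run the EXPLAIN ANALYZE above; the plan should then show plain "Foreign Scan" nodes instead:

ALTER SERVER rhino1 OPTIONS (SET async_capable 'false');
ALTER SERVER rhino2 OPTIONS (SET async_capable 'false');
-- re-run the EXPLAIN ANALYZE query from above, then re-enable async scans
ALTER SERVER rhino1 OPTIONS (SET async_capable 'true');
ALTER SERVER rhino2 OPTIONS (SET async_capable 'true');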
Next Steps
As mentioned at the top of the blog post, there are a lot of features to unpack in PostgreSQL 14. You can easily explore Postgres 14 in Kubernetes with PGO while having Postgres clusters that are production ready. I definitely encourage you to try out Postgres 14 with PGO or Crunchy Bridge and see which features help make your data management easier!