Upgrading PostgreSQL from 9.4 to 10.3 with pglogical
I recently helped a customer upgrade a PostgreSQL instance from 9.4 on RHEL to
10.x on Ubuntu. While it initially sounded daunting, the use of
pglogical
and some planning
actually made it rather straightforward. While there’s nothing new or original
in this post, I still felt compelled to write it up for anyone else that might
find the info useful as an example in their own endeavors.
pglogical
is a logical replication system implemented entirely as a PostgreSQL
extension. Fully integrated, it requires no triggers or external programs. This
makes it faster than Slony,
Londiste,
et al. It is also (roughly) the basis upon which
logical replication
in PostgreSQL 10 core is built.
Installing pglogical
Instructions for installing pglogical and the needed packages can be found here. It will need to be installed on both the source (old Pg version) and destination servers (new Pg version).
Configuring pglogical
Tweaking the Cluster Configuration
You will need to adjust the
postgresql.conf
file to accommodate pglogical
. On both the source and destination servers,
execute the following commands:
echo "include 'pglogical.conf'" >> $PGDATA/postgresql.conf
echo "wal_level = 'logical'" >> $PGDATA/pglogical.conf
echo "max_worker_processes = 10" >> $PGDATA/pglogical.conf
echo "max_replication_slots = 10" >> $PGDATA/pglogical.conf
echo "max_wal_senders = 10" >> $PGDATA/pglogical.conf
echo "shared_preload_libraries = 'pglogical'" >> $PGDATA/pglogical.conf
NOTE: If you already have one or more values in shared_preload_libraries
,
simply append pglogical
to the list of values already there.
Ensure the Presence of Primary Keys
Logical replication doesn’t work without primary keys. The below SQL commands will help you identify which schema/table pairs do not have a primary key:
SELECT
n.nspname as schema,
c.relname as table
FROM
pg_class c
JOIN
pg_namespace n
ON n.oid = c.relnamespace
WHERE
c.relkind = 'r'
AND NOT EXISTS (
SELECT 1
FROM pg_constraint con
WHERE con.conrelid = c.oid
AND con.contype = 'p'
)
AND n.nspname <> ALL (
ARRAY [
'pg_catalog',
'sys',
'dbo',
'information_schema'
]
);
Create the pglogical Extension
On both the source and destination Pg instances, create the pglogical
extension in every database you wish to replicate by running the following
command as a database superuser:
CREATE EXTENSION pglogical;
NOTE: On PostgreSQL 9.4 only you will first need to
CREATE EXTENSION pglogical_origin;
Running pglogical
Ensure Global Objects Are Copied
The pglogical
tool runs at the database level which means that global objects
like roles are not copied. Therefore, you need to ensure these objects are
created yourself.
On the source PostgreSQL server:
pg_dumpall -g -f globals.sql
Then copy globals.sql
to the destination server and run:
psql -f globals.sql
Prepare the Destination Schema
At this time, pglogical
does not replicate DDL, so it is necessary to ensure
that both the source and destination have matching schema object definitions
before attempting to replicate.
As such, for each source database that you want to replicate, you need to run a ‘schema only’ dump:
pg_dump -Fc -s -f dbname_schema.dmp dbname
Now copy the dbname_schema.dmp
file(s) to the destination server, and run for
each database:
pg_restore -d dbname dbname_schema.dmp
Create a Replication User
We’ll need a user that has the replication
permission for this all to work.
create the following user on both the source and destination PostgreSQL
instances:
CREATE ROLE pglogical LOGIN REPLICATION SUPERUSER ENCRYPTED PASSWORD 'secret';
Tweak the pg_hba.conf
on both the source and destination Pg instances to
allow the replication user to connect:
local replication pglogical trust
host replication pglogical 0.0.0.0/0 md5
local dbname pglogical trust
host dbname pglogical 0.0.0.0/0 md5
NOTE: Make sure to edit 0.0.0.0/0
to match your actual CIDR or IP address and
dbname
to match the db you wish to replicate.
Create your Publication
Now, we’re ready to actually setup and start the replication. First, we need to SIGHUP the postmaster so it sees all the config changes we made on both the source and target Pg instances:
pg_ctl -D $(ps -efw|grep -- "[p]ost.*-D"|cut -d\- -f2|cut -d" " -f2) reload
On the source Pg instance, we need to create a publication to "push" the data to the new instance:
SELECT pglogical.create_node(node_name := 'dbname_provider', dsn := 'host=127.0.0.1 port=5432 dbname=test user=pglogical');
Adjust the port=
and dbname=
parameters to match your source Pg
instance. If replicating more than one database, repeat this command for each
database, changing dbname
and dbname_provider
accordingly.
Add Your Tables to the Publication
Now that we have a publication channel, we need content to publish. Let’s add that now:
-
Add all of your tables:
SELECT pglogical.replication_set_add_all_tables('default', '{public}'::text[]);
-
Add all of your sequences:
SELECT pglogical.replication_set_add_all_sequences(set_name := 'default', schema_names := '{public}'::text[], synchronize_data := true )
You should change public
in both the above if you are using a different schema
for your objects. If you are using multiple schemas, simply repeat the above and
change public
appropriately.
NOTE: The nextval
of sequences will be synced roughly every 60 to 70 seconds.
Create your Subscription
Now that we have a publication channel and its content defined, we need to setup a subscriber on the target PostgreSQL instance to consume the channel:
SELECT pglogical.create_node(node_name := 'subscriber', dsn := 'host=127.0.0.1 port=5432 dbname=test user=pglogical');
Adjust the dbname=
parameter to match your target Pg instance. If
replicating more than one database, repeat this command for each database.
Now, tell the subscriber what to subscribe to:
SELECT pglogical.create_subscription(subscription_name := 'subscription', provider_dsn := 'host=172.28.173.18 port=5432 dbname=test user=pglogical', replication_sets := '{default}'::text[] );
Adjust host=
, port=
, and dbname=
parameters to match your source
PostgreSQL instance. If replicating more than one database, repeat this command
for each database, changing dbname
and subscription_name
accordingly.
Conclusion
At this point, data should be replicating and (if not already) it will catch up
to ‘current’ quickly. Once caught up, replication will maintain sync between the
source and target instances in almost real time. You can easily determine the
current state of pglogical
by issuing this SQL on the subscriber:
SELECT subscription_name, status FROM pglogical.show_subscription_status();
If the query returns initializing
then it is copying the original source data
to the destination. If the query returns replicating
then the initial
synchronization has completed and replicating is now happening in real time as
data changes.
When ready, you can simply stop any applications pointing at the source Pg instance, wait a few minutes to ensure replication drains any outstanding items, force an update of your sequences:
SELECT pglogical.synchronize_sequence( seqoid ) FROM pglogical.sequence_state;
and then re-point your applications at the target instance.
Post-upgrade, if you wish to clean everything up, you can use the following steps:
-
Remove the subscription:
SELECT pglogical.synchronize_sequence( seqoid ) FROM pglogical.sequence_state;
-
Remove the subscriber:
SELECT pglogical.drop_node('subscriber', true);
-
Remove the extension:
DROP EXTENSION pglogical CASCADE;
-
Remove the user:
DROP ROLE pglogical;
-
Remove any
pglogical
lines inpg_hba.conf
-
Remove
$PGDATA/pglogical.conf
-
Reload PostgreSQL
-
Remove the OS packages using
yum
orapt
The original copy of this article can be found at https://hunleyd.github.io/posts/upgrading-postgresql-from-9.4-to-10.3-with-pglogical/