Synchronous Replication in PostgreSQL
PostgreSQL has supported streaming replication and hot standbys since version 9.0 (2010), and synchronous replication since version 9.1 (2011). Streaming replication (and in this case we're referring to "binary" streaming replication, not "logical") sends the PostgreSQL WAL stream over a network connection from primary to a replica. By default, streaming replication is asynchronous: the primary does not wait for a replica to indicate that it wrote the data. With synchronous replication, the primary will wait for any or all replicas (based on synchronous replication mode) to confirm that they received and wrote the data.
Depending on your business requirements, you may only need the default asynchronous behavior, or you may need to configure one or more synchronous replicas. Fortunately, PostgreSQL lets you choose and provides options for tuning the consistency and performance (latency) behavior depending on your requirements.
See the PostgreSQL documentation for more details on streaming replication.
Preparation
Name your instances. Life will be simpler. You do that by setting a
configuration parameter in each instance's postgresql.conf
file. We'll see why
it makes things easier later using the pg_stat_replication
table's contents on
the primary. Since you use the name of a replica to configure it as synchronous,
if each replica has a unique name, you can configure individual replicas as
synchronous or asynchronous.
For example, on one of the replicas in postgresql.conf
(Note: If your cluster is managed or created by Patroni or
Crunchy HA PostgreSQL,
it will manage the contents of postgresql.conf
, so make the changes in the
Patroni config, which will generate the postgresql.conf
file that is used by
the servers in your cluster)
cluster_name = 'replica2' # added to process titles if nonempty
Creating a Replica
The first step in creating a replica is to clone the primary. There are several
ways to do that, but the straightforward way is to use pg_basebackup
, which
clones a running primary PostgreSQL instance. For the simple test case for this
article, to create two replicas, as user postgres
/usr/lib/postgresql/12/bin/pg_basebackup -Xs -D ~/12/replica1 -R -p 5433 -h localhost -U replicant
/usr/lib/postgresql/12/bin/pg_basebackup -Xs -D ~/12/replica2 -R -p 5443 -h localhost -U replicant
These connect to the primary as a client as the user replicant which must exist
as a user in the primary DB instance and have REPLICATION
privileges.
Note the -Xs
argument which will open a second connection to the primary to
capture WAL changes to the DB as it operates normally. The -R
argument will
tell pg_basebackup
to create the recovery configuration in the data directory
for the new replica.
Other ways to clone a primary are:
- pgBackrest to backup the primary and restore to the new replica location.
- OS level backups or snapshots, but you must be certain that you get a consistent copy of the primary. A safe way to do this is to stop the instance first.
Example replica settings
(Note that synchronous replication is independent of replication slots. Either can be used with or without the other.)
On a replica:
Note that on the replica's configuration, including the recovery section, there is no indication that it's sync or async. Whether a replica is synchronous or async is determined by the primary's configuration.
This was generated by pg_basebackup
to clone the primary and create a replica.
cat /var/lib/postgresql/12/replica2/postgresql.auto.conf
Do not edit this file manually!
- It will be overwritten by the
ALTER SYSTEM
command. - On PostgreSQL versions prior to 12, this information is stored in the
recovery.conf
file - Recovery settings generated by pgBackRest restore on 2020-02-13 13:11:08
standby_mode
on # only on pg versions <12. Replaced by thestandby.signal
file in pg12
recovery_target_timeline 'latest'
recovery_target_action = 'promote'
primary_conninfo = 'user=replication_user passfile=''/var/lib/postgresql/.pgpass'' port=5433 host=''localhost'' user=''replicant'''
Make a replica synchronous
Once you have streaming replication working, on the primary add a replica name
to synchronous_standby_names
in postgresql.conf
:
synchronous_standby_names = 'replica2'
and tell the postgresql primary to reload the config (e.g.
- pg_ctl reload -D $PGDATA
or the method your HA support uses). Now the
instance named replica2
is a synchronous replica. That's all. Really.
Changing a replica from synchronous back to the default asynchronous is similar.
Just remove that replica's name from the list in synchronous_standby_names
in
the primary's config and tell the primary to reload its configuration.
Reviewing and checking the current replication cluster
Back on the primary:
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
28488 | 10 | replicant | walreceiver | 127.0.0.1 | | 43221 | 2020-08-25 08:25:22.658642-07 | | streaming | 0/2E000060 | 0/2E000060 | 0/2E000060 | 0/2E000060 | | | | 0 | async | 2020-08-25 08:25:22.400688-07
15936 | 24794 | replicant | replica2 | 127.0.0.1 | | 50772 | 2020-08-25 08:25:06.760228-07 | | streaming | 0/2E000060 | 0/2E000060 | 0/2E000060 | 0/2E000060 | | | | 0 | sync | 2020-08-25 08:25:56.915357-07
(2 rows)
- Note that we have a sync and async replica.
- Note that one of the replicas has
cluster_name
unset/defaulting, sowalreceiver
is its name. - Recall this setting in
postgresql.conf
on the primary.
synchronous_standby_names = 'replica2'
Life will be much easier if the replicas don't all default to cluster_name
unset, where all the replicas will have the default name walreceiver
.
Changing a replica to/from synchronous
Changing a replica from synchronous to asynchronous, or vice versa, is easy.
Just add the replica name to synchronous_standby_names
in the primary's
postgresql.conf
and tell PostgreSQL to reload the configuration; no DB restart
needed.
How synchronous is it? Waiting for storage
How synchronous is it? The value of synchronous_commit
on the primary
determines this.
synchronous_commit = on # the default
In order of increasing "safety" (durability), and increasing latency, values of
synchronous_commit
on the primary:
- When we set
synchronous_commit = off
, aCOMMIT
does not wait for the transaction record to be flushed to disk. - When we set
synchronous_commit = local
, aCOMMIT
waits until the transaction record is flushed to the local disk. - When we set
synchronous_commit = on
, aCOMMIT
will wait until the server(s) specified bysynchronous_standby_names
confirm that the transaction record was safely written to disk.- Note: When
synchronous_standby_names
is empty, this setting behaves same assynchronous_commit = local
.
- Note: When
- When we set
synchronous_commit = remote_write
, aCOMMIT
will wait until the server(s) specified bysynchronous_standby_names
confirm write of the transaction record to the operating system but has not necessarily reached the disk on the replica. - When we set
synchronous_commit = remote_apply
, aCOMMIT
will wait until the server(s) specified bysynchronous_standby_names
confirm that the transaction record was applied to the replica's database.
How much data might not be replicated in a case where the replica loses
connectivity with the primary, with the faster and less durable options? That
depends on more PostgreSQL settings - wal_writer_delay
and
wal_writer_flush_after
. The first one flushes WAL after a specified time
period (200ms default), the second one flushes if the specified number of WAL
files are created since the last flush. If you set synchronous_commit
to
off
, then these two settings will limit how much WAL remains uncommitted.
Setting synchronous behavior in a session (client)
Since the synchronous commit behavior is related to a transaction, it can be changed by a client for the session and during a session, so a client can set different values for each statement. You can set the synchronous behavior at any of these levels:
- Single statement / transaction -
SET LOCAL synchronous_commit =
- Session - `SET synchronous_commit = ``
- User -
ALTER USER someuser SET synchronous_commit =
- Database -
ALTER DATABASE SET synchronous_commit =
- And of course cluster wide by updating
postgreql.conf
Adding priority or quorum to the list of synchronous replicas
In addition to specifying how synchronous a remote replica, you can also create
a list of synchronous replicas by priority - FIRST
, or a quorum of replicas -
ANY
.
Quorum is an important aspect of distributed computing. You may already know what it is, but if not, here’s a simplified explanation. In this case we are concerned with consistency of the DB data across multiple DB nodes. When a number of nodes - that you choose - all have the same data committed, then the cluster is considered to be in a consistent state. The number of nodes can be all the nodes in the cluster, or it can be a subset of the nodes. The number of nodes is the quorum number and is chosen depending on your business requirements for data consistency. In this case, the nodes “vote” for quorum by replying to the primary that they have received and applied the replicated data. It’s common to have an odd number of nodes in a cluster and then define quorum as the majority of nodes with consistent copies of the DB data (e.g. - 2 of 3 nodes or 3 of 5 nodes). Quorum is used for other purposes in distributed computing. A common case is electing a new primary from a cluster of nodes when the current primary fails or is unavailable.
There are more options in the primary's postgresql.conf
setting of
synchronous_standby_names
to support priority and simple quorum.
For the priority case, the FIRST
keyword:
synchronous_standby_names = ‘FIRST num (standby_name [, …])’
The synchronous commit will wait for a reply from at least num number of standbys listed in the order of priority.
For the quorum case, the ANY
keyword:
synchronous_standby_names = ‘ANY num (standby_name [, …])’
The same rules as above apply. So, for example setting
synchronous_standby_names = 'ANY 2 (\*)'
will cause synchronous commit to wait
for reply from any 2 standby servers. Double check your syntax and test that the
settings implement your business rules for consistency.
Configure so you don't wait forever
With synchronous replication, you've built in a dependency that a transaction is not committed on the primary until it's written to the synchronous replica, so depending on the configuration options above, your primary can hang forever if the replica (or quorum of replicas) is not reachable by the primary. Obviously, you're dependent on the connection between primary and replicas.
If you have a single synchronous replica and it is unavailable, your primary
will wait for it to return, and will block until it does. To avoid that you want
to have at least two replicas and use the FIRST
or ANY
options to
synchronous_standby_names
described above. You could disable synchronous
replication by commenting out synchronous_standby_names
but then of course,
you don't have a synchronous replica.
The TL;DR
To convert a streaming binary replica to synchronous, add its name to the
primary's postgresql.conf
setting synchronous_standby_names
, and reload the
primary.
To convert a synchronous replica to asynchronous, remove its name from the
primary's postgresql.conf
setting synchronous_standby_names
, and reload the
primary.
These will be much easier if you've added a unique name to the replica's
cluster_name
setting in its postgresql.conf
References
As always, the PostgreSQL documentation is the place to look for more information - https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION