PostgreSQL/Replication
Replication is the process of transferring data changes from one or many databases (master) to one or many other databases (standby) running on one or many other nodes. The purposes of replication are:
- High Availability: If one node fails, another node replaces it and applications can continue to work.
- Scaling: The workload may be too high for a single node. Therefore it is spread across several nodes.
Concepts
PostgreSQL offers a set of largely independent concepts for use in replication solutions. They can be picked and combined - with only a few restrictions - depending on the use case.
Events
- With Trigger-Based Replication a trigger (per table) starts the transfer of changed data. This technique is outdated and rarely used today.
- With Log-Based Replication the information that describes the data changes - and which is created and stored in WAL files anyway - is transferred.
Shipping
- Log-Shipping Replication (or File-based Replication) denotes the transfer of completely filled WAL files (16 MB each) from master to standby. This technique is not very elegant and will be replaced by Streaming Replication over time.
- Streaming Replication denotes the transfer of log records (single change information) from master to standby over a TCP connection.
Primary parameter: 'primary_conninfo' in recovery.conf on standby server.
Format
- In Physical Format the transferred WAL records have the same structure as in the WAL files. They reflect the structure of the database files, including block numbers, VACUUM information, and more.
- The Logical Format is a decoding of WAL records into an abstract format which is independent of PostgreSQL versions and hardware platforms.
Primary parameter: 'wal_level=logical' in postgresql.conf on the master server.
Synchronism
- In Asynchronous Replication data is transferred to the other node without waiting for confirmation of its receipt.
- In Synchronous Replication the data transfer waits - in the case of a COMMIT - for confirmation of its successful processing on the standby.
Primary parameter: 'synchronous_standby_names' in postgresql.conf on the master server.
Standby Mode
- Hot: In Hot Standby Mode the standby server runs in 'recovery mode', accepts client connections, and processes their read-only queries.
- Warm: In Warm Standby Mode the standby server runs in 'recovery mode' and doesn't allow clients to connect.
- Cold: Although it is not an official PostgreSQL term, Cold Standby Mode can be associated with a standby server that is not running, combined with the log-shipping technique. The WAL files are transferred to the standby but not processed until the standby starts up.
Primary parameter: 'hot_standby=on/off' in postgresql.conf on the standby server.
Architecture
In contrast to the above categories, the two different architectures are not strictly distinct from each other; e.g., if you focus on a single replication channel of a Multi-Master architecture, you will see a Master/Standby replication.
- The Master/Standby (or Primary/Slave) architecture denotes a situation where one or many standby nodes receive change data from one master node. In such situations standby nodes may replicate the received data to further nodes, so they act as master and standby at the same time.
- The Multi-Master architecture denotes a situation where one or many standby nodes receive change data from many master nodes.
Configuration
Most of the configuration is done in the 'postgresql.conf' file (some values on the master site, others on the standby site), whereas security configuration is stored in 'pg_hba.conf' (master site), and some important decisions are derived from the existence of 'recovery.conf' (standby site) and its values. The great number of possible combinations of concepts and their correlation to values within the config files may be confusing at first. Therefore we reduce our explanations to the minimal set of values.
Shipping: WAL vs. Streaming
As they are necessary for recovery after a crash, WAL files are generated anyway. If they are used to ship information to a standby server, it is necessary to add some more information to the files. This is activated by choosing a higher value for wal_level.
# WAL parameters in MASTER's postgresql.conf
wal_level = 'archive' | 'hot_standby'   # choose one
archive_mode = 'on'                      # activate the feature
archive_command = 'scp ...'              # the transfer-to-standby command
When you switch the shipping technique from WAL files to streaming, you do not have to deactivate WAL generation and transfer. For safety reasons you may want to keep transferring WAL files anyway (to a platform different from the standby server). Therefore you can retain the above parameters in addition to the streaming replication parameters.
The streaming activities are initiated by the standby server. When it finds the file 'recovery.conf' during its start-up, it assumes that it is necessary to perform a recovery. In our case of replication it uses nearly the same techniques as in the recovery-from-crash situation. The parameters in 'recovery.conf' instruct it to start a so-called WAL receiver process within its instance. This process connects to the master server and initiates a WAL sender process over there. Both exchange information in an endless loop, whereby the standby server stays in 'recovery mode'.
The authorization at the operating system level should be done by exchanging ssh keys.
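A minimal sketch of such a key exchange; the OS user 'postgres' and the hostname 'standby.example.com' are assumptions for illustration:

# executed as OS user postgres on the master (user name and hostname are made up)
ssh-keygen -t rsa                           # create a key pair, accept the defaults
ssh-copy-id postgres@standby.example.com    # append the public key to the standby's authorized_keys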
# Parameters in the STANDBY's recovery.conf
standby_mode = 'on'    # activates standby mode
# How to reach the master:
primary_conninfo = 'user=<replication_dbuser_at_master> host=<IP_of_master_server> port=<port_of_master_server> sslmode=prefer sslcompression=1 krbsrvname=...'
# This file can be created by the pg_basebackup utility, see below.
On the master site there must be a privileged database user with the special REPLICATION attribute:
CREATE ROLE <replication_dbuser_at_master> REPLICATION ...;
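For illustration, a complete command might look as follows; the role name and password are made up:

-- hypothetical example; choose your own role name and password
CREATE ROLE repl_user REPLICATION LOGIN PASSWORD 'secret';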
And the master must accept connections from the standby in general, and with a sufficient number of processes.
# Allow connections from standby to master in MASTER's postgresql.conf
listen_addresses = '*'    # the interfaces to listen on: '*' or the master's own IP address(es)
max_wal_senders = 5       # no more replication processes/connections than this number
Additionally, authentication of the replication database user must be possible. Please note that the keyword all for the database name does not cover replication connections. replication is a keyword of its own and must be given explicitly.
# One additional line in MASTER's pg_hba.conf
# Allow the <replication_dbuser> to connect from standby to master
host    replication    <replication_dbuser>    <IP_of_standby_server>/32    trust
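The trust method accepts the connection without a password; for anything beyond a test setup you will probably prefer a password-based method, e.g.:

# password authentication instead of trust (the role must have a password then)
host    replication    <replication_dbuser>    <IP_of_standby_server>/32    md5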
Now you are ready to start. First, you must start the master. Second, you must transfer the complete databases from the master to the standby. And at last you can start the standby. Like the replication itself, the transfer of the databases is initiated at the standby site.
pg_basebackup -h <IP_of_master_server> -D main --xlog-method=stream --checkpoint=fast -R
The pg_basebackup utility transfers everything to the directory 'main' (which must be empty). In this case it uses the streaming method, it initiates a checkpoint at the master site to enforce consistency of database files and WAL files, and due to the -R flag it generates the previously mentioned recovery.conf file.
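After the base backup has finished, the standby can be started on its copy of the data; a minimal sketch, assuming 'main' is the standby's data directory:

# start the standby instance on the freshly created data directory
pg_ctl -D main start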
Format: Physical vs. Logical
The decoding of WAL records from their physical format into a logical format was introduced in PostgreSQL 9.4. The physical format contains - among other things - block numbers and VACUUM information, and it depends on the character encoding used by the databases. In contrast, the logical format is independent of all these details - conceptually even of the PostgreSQL version. Decoded records are offered to registered streams for consumption.
This logical format offers some great advantages: transfer to databases at different major release levels, on different hardware architectures, and even to another writing master. Thus multi-master architectures are possible. Additionally, it is not necessary to replicate the complete cluster: you can pick single database objects.
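You can observe logical decoding at the SQL level with the contrib output plugin test_decoding; a minimal sketch, assuming wal_level=logical, a free replication slot, and a made-up table t:

-- create a logical replication slot using the test_decoding output plugin
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');
-- perform some change (the table t is a made-up example)
INSERT INTO t VALUES (1);
-- fetch the decoded changes from the slot
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);
-- clean up
SELECT pg_drop_replication_slot('demo_slot');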
In release 9.5 the feature is not delivered with core PostgreSQL. You must install some extensions:
CREATE EXTENSION btree_gist;
CREATE EXTENSION bdr;
As the feature is relatively new, we don't offer details and refer to the documentation. There is also an important project, Bi-Directional Replication (BDR), which is based on this technique.
Synchronism: synchronous vs. asynchronous
The default behaviour is asynchronous replication. This means that transferred data is processed at the standby server without any synchronization with the master, even in the case of a COMMIT. In contrast to this behaviour, the master of a synchronous replication waits for the successful processing of COMMIT statements at the standby before it confirms them to its client.
Synchronous replication is activated by the parameter 'synchronous_standby_names'. Its value identifies those standby servers for which the synchronicity shall take place. A '*' indicates all standby servers.
# MASTER's postgresql.conf file
synchronous_standby_names = '*'
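Instead of '*' you can list individual standbys. The names refer to the application_name that a standby reports in its primary_conninfo; the name 'standby1' below is made up:

# in the MASTER's postgresql.conf: wait for this (made-up) standby
synchronous_standby_names = 'standby1'

# in the STANDBY's recovery.conf: report the matching name
primary_conninfo = 'application_name=standby1 user=... host=... port=...'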
Standby Mode: hot vs. warm
As long as the standby server is running, it will continuously handle incoming change information and store it in its databases. If there is no necessity to process requests from applications, it shall run in warm standby mode. This behaviour is enforced in the standby's postgresql.conf file.
# postgresql.conf on standby server
hot_standby = off
If the standby shall allow client connections, it must start in hot standby mode. In this mode read-only access by clients is possible - write actions are denied.
# postgresql.conf on standby server
hot_standby = on
To generate enough information on the master site for the standby's hot standby mode, the master's WAL level must also be hot_standby.
# postgresql.conf on master server
wal_level = hot_standby
Typical Use Cases
We present some typical combinations of the above-mentioned concepts and show their advantages and disadvantages.
Warm Standby with Log-Shipping
In this situation a master sends information about changed data to a standby using completely filled WAL files (16 MB each). The standby continuously processes the incoming information, which means that the changes made on the master are seen at the standby over time.
To build this scenario, you must perform steps which are very similar to those of Backup with PITR:
- Take a physical backup exactly as described in Backup with PITR and transfer it to the standby.
- At the master site postgresql.conf must specify wal_level=archive, archive_mode=on, and a copy command to transfer WAL files to the standby site.
- At the standby site the central step is the creation of a recovery.conf file with the line standby_mode='on'. This is a sign to the standby to perform an 'endless recovery process' after its start.
- recovery.conf must contain some more definitions: restore_command and archive_cleanup_command (see the sketch after this list).
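A minimal sketch of both files; the archive directory /var/lib/postgresql/walarchive and the standby hostname are made up:

# postgresql.conf on the MASTER
wal_level = archive
archive_mode = on
# copy each finished WAL file to the (made-up) archive directory on the standby
archive_command = 'scp %p postgres@standby.example.com:/var/lib/postgresql/walarchive/%f'

# recovery.conf on the STANDBY
standby_mode = 'on'
# fetch the next WAL file from the (made-up) archive directory
restore_command = 'cp /var/lib/postgresql/walarchive/%f %p'
# remove WAL files that are no longer needed
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/walarchive %r'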
With this parametrisation the master will copy its completely filled WAL files to the standby. The standby processes the received WAL files by copying the change information into its database files. This behaviour is nearly the same as a recovery after a crash. The difference is that the recovery mode does not finish after processing the last WAL file; instead the standby waits for the arrival of the next WAL file.
You can copy the arising WAL files to many servers and activate warm standby on each of them. In this way you get many standbys.
Hot Standby with Log-Shipping
This variant offers a very valuable feature in comparison with the warm standby scenario: applications can connect to the standby and send read requests to it while it runs in standby mode.
To achieve this situation, you must increase wal_level to hot_standby at the master site. This leads to some additional information in the WAL files. And on the standby site you must add hot_standby=on in postgresql.conf. After its start the standby will not only process the WAL files but also accept and respond to read requests from clients.
The main use case for hot standby is load balancing. If there is a huge number of read requests, you can reduce the master's load by delegating them to one or more standby servers. This solution scales very well across a great number of standby servers working in parallel.
Both scenarios warm/hot standby with log-shipping have a common shortcoming: the unit of transfer is always a complete 16 MB WAL file. Depending on the frequency of changes at the master site, it can take a long time until the transfer is started. The next chapter shows a technique which does not have this deficiency.
Hot Standby with Streaming Replication
The use of files to transfer information from one server to another - as shown in the above log-shipping scenarios - has several shortcomings and is therefore a little outdated. Direct communication between programs running on different nodes is more complex but offers significant advantages: the speed of communication is considerably higher and in many cases the amount of transferred data is smaller. In order to gain these benefits, PostgreSQL has implemented the streaming replication technique, which connects master and standby servers via TCP. This technique adds two additional processes: the WAL sender process at the master site and the WAL receiver process at the standby site. They exchange information about data changes in the master's database.
The communication is initiated by the standby site and must run with a database user with REPLICATION privileges. This user must be created at the master site and authorized in the master's pg_hba.conf file. The master must accept connections from the standby in general and with a sufficient number of processes. The authorization at the operating system level should be done by exchanging ssh keys.
Master site:
============
-- SQL
CREATE ROLE <replication_dbuser_at_master> REPLICATION ...;

# postgresql.conf: allow connections from standby to master
listen_addresses = '*'    # the interfaces to listen on: '*' or the master's own IP address(es)
max_wal_senders = 5       # no more replication processes/connections than this number
# make hot standby possible
wal_level = hot_standby

# pg_hba.conf: one additional line (the 'all' entry doesn't apply to replication)
# Allow the <replication_dbuser> to connect from standby to master
host    replication    <replication_dbuser>    <IP_of_standby_server>/32    trust

Standby site:
=============
# recovery.conf (this file can be created by the pg_basebackup utility, see below)
standby_mode = 'on'    # activates standby mode
# How to reach the master:
primary_conninfo = 'user=<replication_dbuser_at_master_server> host=<IP_of_master_server> port=<port_of_master_server> sslmode=prefer sslcompression=1 krbsrvname=...'

# postgresql.conf: activate hot standby
hot_standby = on
Now you are ready to start. First, you must start the master. Second, you must transfer the complete databases from the master to the standby. And at last you start the standby. Like the replication activities, the transfer of the databases is initiated at the standby site.
pg_basebackup -h <IP_of_master_server> -D main --xlog-method=stream --checkpoint=fast -R
The pg_basebackup utility transfers everything to the directory 'main' (which must be empty). In this case it uses the streaming method, it initiates a checkpoint at the master site to enforce consistency of database files and WAL files, and due to the -R flag it generates the previously mentioned recovery.conf file.
The activation of the 'hot' standby is done exactly as in the previous use case.
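Once both servers are running, you can verify on the master that a WAL sender process is active; pg_stat_replication is a standard system view:

-- executed on the master: one row per connected standby
SELECT client_addr, state, sync_state FROM pg_stat_replication;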
An Additional Tool
If you have to manage a complex replication use case, you may want to check the open source project 'repmgr'. It helps you monitor the cluster of nodes or perform a failover.
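For a first impression, repmgr offers commands like the following; the exact syntax depends on the repmgr version, so treat this as a sketch:

# show the current state of all nodes in the cluster
repmgr cluster show
# promote a standby to be the new master after a failover
repmgr standby promote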