Sunday, October 3, 2010

Setting up Streaming Replication and Hot-Standby features of Postgres 9.0 on Indian Kanoon

Postgres is a database software that powers a large number of production databases around the world including Indian Kanoon. Postgres 9.0 was recently released after more than a year and half of development. The version number was jumped from 8.4 (the previous release) to 9.0 considering the inclusion of two important features namely Streaming Replication and Hot Standby.

Streaming replication, as the name implies, allows a database administer to replicate a primary database on multiple secondary nodes. So when the primary database dies due to hardware failure or software error, one of the secondary nodes can become the primary database to serve production queries. Previously, to replicate a postgres database, one has to copy the base directoy and then periodically copy the Write-Ahead-Log (WAL) of the database. Then the secondary node would run in a continuous recovery process reading the WAL. As the secondary database was still recovering, it could not be used for querying. As such previous standby nodes were called "Warm-Standby".

The WAL shipping happened at time outs or when a WAL segment was full. And so the secondary database could be significantly behind the master in seeing the WAL. Streaming replication fills that gap now. It does not wait anymore for the filling of WAL segments as it "streams them directly" to the secondary server on a TCP connection. Note that currently streaming replication only allows asynchronous transaction, which means that a database transaction need not wait for the replication to finish. So there is a small window of time in which a transaction committed on primary database is not replicated on the secondary and the primary crashes.

The Hot-Standby feature allows read-only queries on a standby node. Previously, a standby was not good for anything other than waiting for the primary to crash. Now the standby can serve production queries as well while keeping the secondary database up to date with the primary. This is tremendously useful as read-only database queries can be load balanced on primary as well as secondary.

I had lot of fun trying to play around with making these two features work for Indian Kanoon databases. I followed the wiki doggedly for setting up streaming replication. However, unable to replicate a dog, I would miss something or the other and the thing will not turn up. In the end the Indian Kanoon database was successfully replicated in a hot-standby mode on an another server in the same data center. It has to be noted that data transfer with in the data-center was clocking close to half a gigabit per second that reminds me of the old 100Mbps networking cards that are obsolete now.

Here are the final steps I took:

1. Modified postgresql.conf on the primary node with following changes:
a. wal_level = hot_standby # sets up wal replaying on a TCP socket
b. max_wal_senders = 1 # max number of walsender processes
c. wal_keep_segments = 128 # in logfile segments, 16MB each
d. listen_addresses = '*' # opens up port 5432 for postgres database server

2. Modify pg_hba.conf to include the ACL for secondary node. Add the following line:

host replication postgres secondary_node/32 trust

3. Copying the base database. On the primary node:
a. psql -c "SELECT pg_start_backup('label', true)"
b. rsync -avz --progress ${PGDATA} secondary:/home/postgres/
c. psql -c "SELECT pg_stop_backup()"

4. Setting the secondary node:
a. Modify postgresql.conf to include "hot_standby = on"
b. Created a recovery.conf on the secondary node with following changes:
i) standby_mode = 'on'
ii) primary_conninfo = ''
iii) trigger_file = '/path_to/trigger'
c. Start the database and check the server log