Some findings around replication following an earlier blog post.
I have a simple streaming replication setup with 1 primary and 1 standby DB server. My reference DB is about 42GB in size.
A full vacuum will break the replication. Remember, PostgresSQL is recreating every table in the full mode. This wont fit through any default sized xlogs.
A regular vacuum works fine though.
In a complete disaster scenario where both primary and standby are corrupted I have to rely on the nightly pg_dump.
The import on the primary DB server (AWS m2.4xl instance) takes about 3 hours to restore with the default parameters.
We can significantly speed it up by utilizing more than 1 default CPU for the data and index creation using the j parameter. Constraints cannot be created in parallel though. sudo -u postgres pg_restore -j 8 -v -d dbname dumpfilename
This way I could reducte the restore time to 70 minutes.
I still have to get the standby in sync with pg_basebackup, takes another 60 minutes. This time I could reduce to 30 minutes by tar and copying the complete DB files over to the standby.
(But pg_basebackup works while the primary server is in use, provided you have enough xlogs to cover the until you want to link the standby)
The system I work on we deploy almost solely on the Amazon AWS platform. Even I try to design the architecture in a way not to be locked-in too much into Amazon, I make use of the Amazon tools and products as much as possible (EC2, VPC, S3, SNS). PostgreSQL is our reference DB and the only DB product in production environments, still we run dedicated instances with PostgreSQL. I am quite delighted about AWS recent offering staring RDS with PostgreSQL. While is is still in BETA and I did not started yet with a conclusive test and migration plan, I need to maintain our existing instances.
There are plenty of books and tutorials about setting up PostgreSQL replication with on-board tools, without going into the details I share the express setup in this tutorial based on Streaming Replication which is part of PostgreSQL since version 9.0. I highly recommend to review the parameters and settings from the below tutorial as your project might have different requirements.
In part 1 of this tutorial we created an EJB using Netbeans and Visual Paradigm, in part 2 a little ZK application to read data using the EJB. In part 3 we will move away from the Derby DB to PostgreSQL and Oracle DB and challenge ourselves with identifier more than 30 characters, which is an issue for Oracle (yes, it is 2011). We will add columns with more than 30 characters and play with a few different column types (the ones showing up in a normal DB layout).
Add new fields to the ERD this_is_a_very_long_remark_field floatcol numbercol
pgAdmin is the best GUI you can use to administrate PostgreSQL, unffortunately the Ubuntu default packages still offer only PostgreSQL 8.4 and an older version of pgAdmin III that does not support PostgreSQL 9.0.x. Thanks to Martin Pitt who maintains the latest packages you can run and maintain the latest PostgreSQL versions.
If you run Maverick:
sudo apt-get remove pgadmin3 (if any old versions, make sure you dont create trouble with existing PostgreSQL 8.x versions)
I didnt notice for 7 days that the latest version of PostgreSQL DB was released on September 20th 2010 ! Congratulations and a big Thank You to the PostgreSQL team ! Is is wonderful to have such a great product out there.