33 hours to restore a Postgres DB

Having migrated our DB to a new machine, I was left with a pretty good machine unused. I decided to rebuild it and try out Postgres 9.0.

I downloaded and installed CentOS 6.0 x86_64, built Postgres 9.0.4 from SRPMS (makes me feel better building it myself), installed it and did initdb, dumped the primary DB to our NAS using psql 9.0 (pg_dump [db] | gzip > /path/to/file.gz) and began the restore.

Not much of a story but it took 33 hours to restore the DB, with the bulk of the time spent building indices. When the restore was complete I did an analyze verbose which took another 2 hours, for a total of 35 hours, not counting the time it took to create the dump file – about 14 hours, but this was with the DB under heavy load, made heavier by the dump, so I don’t know how long it would really take. I’m glad I didn’t have to do this to migrate, but I’m not relishing the idea of doing a real upgrade to 9.0 if it’s going to take this long. For the restore I disabled autovacuum and set fsync = off. It’s an ext4 filesystem on a 24-disk RAID 10 of 10krpm SAS drives.

DB size on disk after a clean restore was 1156 GB, as per pg_database_size(), versus 1237 GB on disk for the one that was migrated via bit-copy, so there’s a sizable chunk of crud accumulated in there.

Also, in doing this I discovered the logger command. One of the things that I’ve always found annoying about doing a pg restore is that the output is just a bunch of statements like this:

SET
SET
SET
CREATE SCHEMA
ALTER SCHEMA
CREATE SCHEMA
ALTER SCHEMA
SET
CREATE TABLE
ALTER TABLE
CREATE FUNCTION
ALTER FUNCTION

This is just what’s returned from the SQL commands, but it’s not very informative, and most importantly it doesn’t show you the timestamp so you have no idea how long anything’s taking. With logger, you can just pipe the output to it and it’ll be logged with syslog. So my restore command looked like this:

zcat /nfs/db/pgdump-90.sql.gz | psql -Upguser db 2> ~/restore.20110723-1916.log | logger -t PGRESTORE

This made it trivial to get the messages out of /var/log/messages with grep:

[root@link log]# grep PGRESTORE messages | head
Jul 24 03:36:43 link PGRESTORE: ALTER TABLE
Jul 24 03:36:43 link PGRESTORE: SET
Jul 24 03:36:44 link PGRESTORE: ALTER TABLE
Jul 24 03:36:44 link PGRESTORE: ALTER TABLE
Jul 24 03:38:45 link PGRESTORE: ALTER TABLE
Jul 24 03:39:13 link PGRESTORE: ALTER TABLE
Jul 24 03:39:13 link PGRESTORE: ALTER TABLE
Jul 24 03:39:13 link PGRESTORE: ALTER TABLE
Jul 24 03:39:13 link PGRESTORE: SET
Jul 24 03:39:13 link PGRESTORE: ALTER TABLE

In addition to having the timestamps it eliminates the accumulation of the random log files that accumulate like droppings from myriad one-off scripts. I modified about 20 cron jobs to pipe their output to logger rather than junk files in /tmp or my homedir. I can’t believe I never heard of logger before! Hooray for logger!

One Reply to “33 hours to restore a Postgres DB”

Comments are closed.

%d bloggers like this: