Exchange (OWA) CAS crashes with 503 error – again

This just started happening again, with these errors appearing in the event viewer:

Log Name: System
Source: Microsoft-Windows-WAS
Date: 9/18/2011 11:16:33 AM
Event ID: 5011
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: exch2010fe1
Description:
A process serving application pool 'MSExchangeOWAAppPool' suffered a
fatal communication error with the Windows Process Activation Service.
The process id was '3760'. The data field contains the error number.

Log Name: System
Source: Microsoft-Windows-WAS
Date: 9/17/2011 6:47:07 AM
Event ID: 5009
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: exch2010fe1
Description:
A process serving application pool 'MSExchangeOWAAppPool' terminated
unexpectedly. The process id was '3108'. The process exit code was
'0x800703e9'.

Log Name: Application
Source: Application Error
Date: 9/17/2011 6:46:30 AM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: exch2010fe1
Description:
Faulting application name: w3wp.exe, version: 7.5.7600.16385, time
stamp: 0x4a5bd0eb
Faulting module name: KERNELBASE.dll, version: 6.1.7600.16385, time
stamp: 0x4a5bdfe0
Exception code: 0xe053534f
Fault offset: 0x000000000000aa7d
Faulting process id: 0x%9
Faulting application start time: 0x%10
Faulting application path: %11
Faulting module path: %12
Report Id: %13

After reviewing the IIS logs and the event logs, I think it has to do with the WebReady document viewer – the feature in OWA that renders .doc and .xls attachments in the browser rather than forcing you to open Word or Excel. My guess is that users were attempting to open corrupted files, and that was crashing the app pool. I’ve disabled WebReady in EMC (Server Configuration -> Client Access) and I’ll see what happens.
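For reference, the same setting can be flipped from the Exchange Management Shell – a sketch assuming the default OWA virtual directory name, so adjust -Identity to match your CAS:

Set-OwaVirtualDirectory -Identity "owa (Default Web Site)" `
    -WebReadyDocumentViewingOnPublicComputersEnabled $false `
    -WebReadyDocumentViewingOnPrivateComputersEnabled $false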


64-bit Firefox 6 on CentOS 6 x86_64 (or Fedora) with Flash 11

So I’ve gone back to Linux from Mac, because the SSD issues I had with my Macbook Pro basically made VMware unusable. A Win7 VM would grind both guest and host to a halt on the 7200 RPM OEM SATA drive, and the SSD wouldn’t work in the Mac, so I put the SSD back in my HP and installed CentOS 6 x86_64. Not really ideal, but at least it works.

CentOS 6 ships with Firefox 3.6.9, which is really old by now – fine for a server, but I wanted FF6. I grabbed the Firefox 6.0.1 .tar.bz2 from VoxCast, extracted it and copied it to /usr/lib64/firefox-6. Then I ran yum remove firefox to remove 3.6.9 and avoid any conflicts. I tried the instructions from my older post on this subject, but for whatever reason they didn’t work – I’m guessing because I’m not using the distro’s Firefox RPM.
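For reference, the manual install boiled down to something like this (the exact tarball name and download location will vary):

# extract the tarball (it unpacks to ./firefox) and move it into place
tar xjf firefox-6.0.1.tar.bz2
mv firefox /usr/lib64/firefox-6
# remove the distro's Firefox 3.6.9 so the two don't conflict
yum remove firefox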

I followed these instructions to get the 64-bit Flash 11 plugin installed, but it still wasn’t working.

I ended up creating a symlink:

ln -s /usr/lib64/mozilla/plugins/libflashplayer.so /home/evan/.mozilla/plugins/libflashplayer.so

When I started up Firefox after creating the symlink, Flash worked.
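One caveat: if the per-user plugin directory doesn’t exist yet, the ln -s will fail, so create it first:

mkdir -p ~/.mozilla/plugins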

Additionally, to set Firefox 6 as the default browser, run gnome-default-applications-properties, select Custom, and paste /usr/lib64/firefox-6/firefox %s in the Command: field.

Update: I upgraded my home desktop PC from FC12 to FC14 last night and used the above procedure to install Firefox 6.0.2 on it with Flash 11. So, in case anyone was wondering, the above works for Fedora as well.

Macbook Pro locks up with SSD installed.

A few weeks ago I switched from my trusty old HP nc8430 to a Macbook Pro (MC118LL/A) that was left spare when another employee left. I mostly enjoyed using Linux but I was tired of dealing with weird quirks like having X lock up, essentially forcing me to do a hard reboot.

To transition, I copied my documents from Linux to Mac, then turned off the Linux laptop. Surprisingly I found I didn’t need to turn Linux back on at all.
Continue reading “Macbook Pro locks up with SSD installed.”

Logging RT username in Apache access_log

RT has its own internal accounting & tracking system for logging activity, but I was interested in something more granular, like seeing who looked at which tickets. I figured it wouldn’t be that hard to log this in Apache. I was half right: it wasn’t “hard,” but it took me a long time to find the right place to do it. I did finally get it, though.
Continue reading “Logging RT username in Apache access_log”

Benchmarking disk IO on ext3 vs ext4 vs xfs with fio

With the old database phased out, I figured this was a good time to benchmark IO on it before either repurposing it or mothballing it. In the past I’ve used dd for elementary sequential read/write testing, but I recently found fio, which is much more versatile. Written by Jens Axboe of Linux block-layer fame, fio lets you control concurrency, blocksize and many other parameters, and it reports a lot more data, most notably IOPS.

I created four fio config files – basically the same as the two below – covering direct vs. buffered disk access and sequential vs. random writes. All use an 8k blocksize since that’s what Postgres uses by default.


[root@link ~]# cat random-write-test.fio
[random-write]
rw=randwrite
size=5G
direct=1
directory=/msa70/fio/data/
numjobs=5
group_reporting
name=random-write-direct
bs=8k
runtime=10
[root@link ~]# cat sequential-write-test-buf.fio
[sequential-write]
rw=write
size=5G
direct=0
directory=/msa70/fio/data/
numjobs=5
group_reporting
name=sequential-write-buffered
bs=8k
runtime=10
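
The other two config files aren’t shown above; going by the job names in the output, they presumably just flip the rw= and direct= settings, something like:

# random-write-test-buf.fio
[random-write]
rw=randwrite
size=5G
direct=0
directory=/msa70/fio/data/
numjobs=5
group_reporting
name=random-write-buf
bs=8k
runtime=10

# sequential-write-test.fio
[sequential-write]
rw=write
size=5G
direct=1
directory=/msa70/fio/data/
numjobs=5
group_reporting
name=sequential-write-direct
bs=8k
runtime=10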

First I ran fio against a freshly created ext3 volume on our HP MSA70 with 24x 10k RPM SAS disks in RAID 10. The mkfs command used was mke2fs -j -m0 /dev/cciss/c0d0. The controller card is an HP P800. Here’s the output:

fio – ext3 full results:

[root@link ~]# time fio random-write-test.fio random-write-test-buf.fio sequential-write-test.fio sequential-write-test-buf.fio
random-write-direct: (g=0): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
random-write-direct: (g=0): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
random-write-buf: (g=1): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
random-write-buf: (g=1): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
sequential-write-direct: (g=2): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
sequential-write-direct: (g=2): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
sequential-write-buffered: (g=3): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
sequential-write-buffered: (g=3): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
fio 1.57
Starting 20 processes
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 1 (f=1): [________________W___] [19.8% done] [0K/0K /s] [0 /0  iops] [eta 03m:14s]
random-write-direct: (groupid=0, jobs=5): err= 0: pid=2447
  write: io=596856KB, bw=56946KB/s, iops=7118 , runt= 10481msec
    clat (usec): min=67 , max=944937 , avg=645.32, stdev=4647.69
     lat (usec): min=67 , max=944937 , avg=645.46, stdev=4647.69
    bw (KB/s) : min=  144, max=47152, per=22.71%, avg=12933.03, stdev=4985.93
  cpu          : usr=0.26%, sys=1.48%, ctx=74614, majf=0, minf=134
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/74607/0, short=0/0/0
     lat (usec): 100=12.22%, 250=80.24%, 500=6.59%, 750=0.20%, 1000=0.16%
     lat (msec): 2=0.07%, 4=0.01%, 10=0.10%, 20=0.05%, 50=0.08%
     lat (msec): 100=0.10%, 250=0.14%, 500=0.04%, 750=0.01%, 1000=0.01%
random-write-buf: (groupid=1, jobs=5): err= 0: pid=2452
  write: io=6377.3MB, bw=472902KB/s, iops=59112 , runt= 13809msec
    clat (usec): min=8 , max=4490.8K, avg=77.54, stdev=3751.21
     lat (usec): min=8 , max=4490.8K, avg=77.66, stdev=3751.21
    bw (KB/s) : min=    7, max=438688, per=34.13%, avg=161414.56, stdev=63682.89
  cpu          : usr=2.27%, sys=24.01%, ctx=1596, majf=0, minf=134
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/816288/0, short=0/0/0
     lat (usec): 10=0.06%, 20=83.59%, 50=16.19%, 100=0.04%, 250=0.04%
     lat (usec): 500=0.03%, 750=0.01%, 1000=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.03%, 50=0.01%
     lat (msec): 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
     lat (msec): 2000=0.01%, >=2000=0.01%
sequential-write-direct: (groupid=2, jobs=5): err= 0: pid=2457
  write: io=2832.0KB, bw=286105 B/s, iops=34 , runt= 10136msec
    clat (usec): min=75 , max=1506.6K, avg=157500.69, stdev=96237.55
     lat (usec): min=75 , max=1506.6K, avg=157500.81, stdev=96237.54
    bw (KB/s) : min=    6, max=  444, per=19.58%, avg=54.61, stdev=13.65
  cpu          : usr=0.00%, sys=0.00%, ctx=572, majf=0, minf=140
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/354/0, short=0/0/0
     lat (usec): 100=4.80%, 250=33.05%, 500=0.56%
     lat (msec): 50=1.13%, 100=18.36%, 250=25.99%, 500=9.04%, 750=4.24%
     lat (msec): 1000=1.41%, 2000=1.41%
sequential-write-buffered: (groupid=3, jobs=5): err= 0: pid=2462
  write: io=255624KB, bw=19933KB/s, iops=2491 , runt= 12824msec
    clat (usec): min=7 , max=9411.7K, avg=1443.28, stdev=33091.95
     lat (usec): min=7 , max=9411.7K, avg=1443.38, stdev=33091.95
    bw (KB/s) : min=   31, max=43687, per=29.58%, avg=5896.30, stdev=2696.81
  cpu          : usr=0.05%, sys=0.89%, ctx=600, majf=0, minf=136
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/31953/0, short=0/0/0
     lat (usec): 10=13.51%, 20=77.52%, 50=8.68%, 100=0.07%, 250=0.08%
     lat (usec): 500=0.03%, 750=0.01%
     lat (msec): 4=0.01%, 10=0.01%, 20=0.02%, 50=0.01%, 250=0.01%
     lat (msec): 750=0.02%, 2000=0.01%, >=2000=0.03%

Run status group 0 (all jobs):
  WRITE: io=596856KB, aggrb=56946KB/s, minb=58313KB/s, maxb=58313KB/s, mint=10481msec, maxt=10481msec

Run status group 1 (all jobs):
  WRITE: io=6377.3MB, aggrb=472902KB/s, minb=484251KB/s, maxb=484251KB/s, mint=13809msec, maxt=13809msec

Run status group 2 (all jobs):
  WRITE: io=2832KB, aggrb=279KB/s, minb=286KB/s, maxb=286KB/s, mint=10136msec, maxt=10136msec

Run status group 3 (all jobs):
  WRITE: io=255624KB, aggrb=19933KB/s, minb=20411KB/s, maxb=20411KB/s, mint=12824msec, maxt=12824msec

Disk stats (read/write):
  cciss!c0d0: ios=798/24542244, merge=0/937067, ticks=15866/140262853, in_queue=140448875, util=89.71%

real    18m35.082s
user    0m4.328s
sys     5m16.479s
[root@link ~]#

That’s a lot of data; the most interesting numbers are these:

random-write-direct: (groupid=0, jobs=5): err= 0: pid=2447
  write: io=596856KB, bw=56946KB/s, iops=7118 , runt= 10481msec
random-write-buf: (groupid=1, jobs=5): err= 0: pid=2452
  write: io=6377.3MB, bw=472902KB/s, iops=59112 , runt= 13809msec
sequential-write-direct: (groupid=2, jobs=5): err= 0: pid=2457
  write: io=2832.0KB, bw=286105 B/s, iops=34 , runt= 10136msec
sequential-write-buffered: (groupid=3, jobs=5): err= 0: pid=2462
  write: io=255624KB, bw=19933KB/s, iops=2491 , runt= 12824msec

I then reformatted the same RAID 10 LUN as ext4 with mkfs.ext4 -m0 /dev/cciss/c0d0 and ran the same tests again:
fio – ext4 full results:

[root@link ~]# time fio random-write-test.fio random-write-test-buf.fio sequential-write-test.fio sequential-write-test-buf.fio
random-write-direct: (g=0): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
random-write-direct: (g=0): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
random-write-buf: (g=1): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
random-write-buf: (g=1): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
sequential-write-direct: (g=2): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
sequential-write-direct: (g=2): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
sequential-write-buffered: (g=3): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
sequential-write-buffered: (g=3): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
fio 1.57
Starting 20 processes
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 5 (f=5): [_______________WWWWW] [21.5% done] [0K/126.1M /s] [0 /15.9K iops] [eta 02m:30s]
random-write-direct: (groupid=0, jobs=5): err= 0: pid=2379
  write: io=589392KB, bw=58933KB/s, iops=7366 , runt= 10001msec
    clat (usec): min=77 , max=2522.4K, avg=781.82, stdev=9768.99
     lat (usec): min=77 , max=2522.4K, avg=781.95, stdev=9768.99
    bw (KB/s) : min=  515, max=44176, per=22.05%, avg=12996.62, stdev=5332.75
  cpu          : usr=0.26%, sys=2.86%, ctx=73715, majf=0, minf=134
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/73674/0, short=0/0/0
     lat (usec): 100=2.54%, 250=92.13%, 500=4.76%, 750=0.14%, 1000=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.07%, 20=0.04%, 50=0.07%
     lat (msec): 100=0.09%, 250=0.11%, 500=0.03%, 750=0.01%, 1000=0.01%
     lat (msec): 2000=0.01%, >=2000=0.01%
random-write-buf: (groupid=1, jobs=5): err= 0: pid=2384
  write: io=6707.1MB, bw=670727KB/s, iops=83840 , runt= 10241msec
    clat (usec): min=7 , max=1582.5K, avg=57.04, stdev=1910.33
     lat (usec): min=7 , max=1582.5K, avg=57.15, stdev=1910.33
    bw (KB/s) : min=   30, max=518784, per=25.08%, avg=168240.26, stdev=84472.16
  cpu          : usr=2.51%, sys=25.56%, ctx=35766, majf=0, minf=134
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/858614/0, short=0/0/0
     lat (usec): 10=2.14%, 20=91.72%, 50=5.11%, 100=0.31%, 250=0.15%
     lat (usec): 500=0.19%, 750=0.15%, 1000=0.09%
     lat (msec): 2=0.08%, 4=0.03%, 10=0.01%, 20=0.01%, 50=0.01%
     lat (msec): 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
     lat (msec): 2000=0.01%
sequential-write-direct: (groupid=2, jobs=5): err= 0: pid=2389
  write: io=839760KB, bw=83959KB/s, iops=10494 , runt= 10002msec
    clat (usec): min=70 , max=300244 , avg=473.09, stdev=3815.84
     lat (usec): min=71 , max=300244 , avg=473.22, stdev=3815.84
    bw (KB/s) : min= 8198, max=30528, per=20.27%, avg=17014.88, stdev=1677.78
  cpu          : usr=0.32%, sys=3.15%, ctx=104986, majf=0, minf=140
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/104970/0, short=0/0/0
     lat (usec): 100=73.60%, 250=26.18%, 500=0.01%, 750=0.01%
     lat (msec): 50=0.01%, 100=0.01%, 250=0.18%, 500=0.02%
sequential-write-buffered: (groupid=3, jobs=5): err= 0: pid=2394
  write: io=1730.7MB, bw=163683KB/s, iops=20460 , runt= 10827msec
    clat (usec): min=7 , max=7649.9K, avg=220.41, stdev=12535.19
     lat (usec): min=7 , max=7649.9K, avg=220.52, stdev=12535.19
    bw (KB/s) : min=   27, max=331055, per=35.92%, avg=58788.88, stdev=27863.44
  cpu          : usr=0.42%, sys=5.54%, ctx=4119, majf=0, minf=136
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/221525/0, short=0/0/0
     lat (usec): 10=50.94%, 20=47.34%, 50=1.45%, 100=0.18%, 250=0.05%
     lat (usec): 500=0.01%, 750=0.01%, 1000=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
     lat (msec): 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
     lat (msec): 2000=0.01%, >=2000=0.01%

Run status group 0 (all jobs):
  WRITE: io=589392KB, aggrb=58933KB/s, minb=60347KB/s, maxb=60347KB/s, mint=10001msec, maxt=10001msec

Run status group 1 (all jobs):
  WRITE: io=6707.1MB, aggrb=670726KB/s, minb=686824KB/s, maxb=686824KB/s, mint=10241msec, maxt=10241msec

Run status group 2 (all jobs):
  WRITE: io=839760KB, aggrb=83959KB/s, minb=85974KB/s, maxb=85974KB/s, mint=10002msec, maxt=10002msec

Run status group 3 (all jobs):
  WRITE: io=1730.7MB, aggrb=163683KB/s, minb=167611KB/s, maxb=167611KB/s, mint=10827msec, maxt=10827msec

Disk stats (read/write):
  cciss!c0d0: ios=52/276341, merge=0/553601, ticks=224/1731709, in_queue=1735311, util=94.47%

real    0m42.218s
user    0m1.812s
sys     0m19.028s
[root@link ~]#

Summary numbers:

random-write-direct: (groupid=0, jobs=5): err= 0: pid=2379
  write: io=589392KB, bw=58933KB/s, iops=7366 , runt= 10001msec
random-write-buf: (groupid=1, jobs=5): err= 0: pid=2384
  write: io=6707.1MB, bw=670727KB/s, iops=83840 , runt= 10241msec
sequential-write-direct: (groupid=2, jobs=5): err= 0: pid=2389
  write: io=839760KB, bw=83959KB/s, iops=10494 , runt= 10002msec
sequential-write-buffered: (groupid=3, jobs=5): err= 0: pid=2394
  write: io=1730.7MB, bw=163683KB/s, iops=20460 , runt= 10827msec

This is just one run of each test, so it’s not very scientific, but it looks pretty clear that ext4 outperforms ext3. (I actually did run the tests several times; I didn’t collect the results, but they were pretty consistent.) What the fio numbers don’t show, but the time output does, is that each ext3 test took over four minutes to run while each ext4 test took about 10-12 seconds. In the ext3 tests, each “Laying out IO file(s) (1 file(s) / 5120MB)” line took nearly a minute; in the ext4 tests they were essentially instant.

This is nothing groundbreaking – I was just playing around with fio and figured I’d post the results in case anyone else was curious. I did notice some oddities, though, like the fact that random buffered writes appear to be faster than even sequential buffered writes, on both ext3 and ext4 – on ext4, 670 MB/s and 83k IOPS for random buffered writes. That doesn’t seem to make much sense; my guess is that with a 10-second runtime the buffered tests are mostly measuring the page cache rather than the disks, but I’ll leave the rest of the tea-leaf reading to someone else.
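If I revisit this, one thing I’d add between the buffered runs is flushing the page cache so each test starts cold:

sync                               # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches  # then drop the page cache, dentries and inodes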

Addendum: I ran the tests with XFS, just for fun. Sequential buffered writes were much faster than on either ext3 or ext4, and sequential direct writes landed between the two, but random writes were slower than both – note that the buffered random-write job below ran for four minutes despite the 10-second runtime limit.

fio – xfs full results:

[root@link ~]# mkfs.xfs -f -b size=4k /dev/cciss/c0d0
meta-data=/dev/cciss/c0d0        isize=256    agcount=4, agsize=107506378 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=430025510, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=209973, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@link ~]# mount -v -t xfs /dev/cciss/c0d0 /msa70/
/dev/cciss/c0d0 on /msa70 type xfs (rw)
[root@link ~]# mkdir -p /msa70/fio/data
[root@link ~]# time fio random-write-test.fio random-write-test-buf.fio sequential-write-test.fio sequential-write-test-buf.fio
random-write-direct: (g=0): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
random-write-direct: (g=0): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
random-write-buf: (g=1): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
random-write-buf: (g=1): rw=randwrite, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
sequential-write-direct: (g=2): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
sequential-write-direct: (g=2): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
sequential-write-buffered: (g=3): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
...
sequential-write-buffered: (g=3): rw=write, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
fio 1.57
Starting 20 processes
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
random-write-buf: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-direct: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
sequential-write-buffered: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 5 (f=5): [_______________WWWWW] [91.3% done] [0K/143.4M /s] [0 /17.1K iops] [eta 00m:26s]
random-write-direct: (groupid=0, jobs=5): err= 0: pid=2662
  write: io=416920KB, bw=41688KB/s, iops=5210 , runt= 10001msec
    clat (usec): min=83 , max=1592.6K, avg=1013.62, stdev=10137.19
     lat (usec): min=83 , max=1592.6K, avg=1013.77, stdev=10137.19
    bw (KB/s) : min=    7, max=23184, per=22.46%, avg=9363.32, stdev=2852.49
  cpu          : usr=0.31%, sys=1.61%, ctx=106802, majf=0, minf=134
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/52115/0, short=0/0/0
     lat (usec): 100=0.06%, 250=4.55%, 500=87.85%, 750=4.23%, 1000=0.70%
     lat (msec): 2=2.01%, 4=0.13%, 10=0.24%, 20=0.07%, 50=0.01%
     lat (msec): 100=0.03%, 250=0.06%, 500=0.03%, 750=0.02%, 2000=0.02%
random-write-buf: (groupid=1, jobs=5): err= 0: pid=2667
  write: io=6557.9MB, bw=27876KB/s, iops=3484 , runt=240892msec
    clat (usec): min=7 , max=237984K, avg=1631.10, stdev=257906.92
     lat (usec): min=7 , max=237984K, avg=1631.22, stdev=257906.92
    bw (KB/s) : min=    0, max=628832, per=1222.74%, avg=340851.04, stdev=70996.06
  cpu          : usr=0.13%, sys=1.33%, ctx=24945, majf=0, minf=134
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/839397/0, short=0/0/0
     lat (usec): 10=18.02%, 20=79.39%, 50=2.19%, 100=0.18%, 250=0.07%
     lat (usec): 500=0.03%, 750=0.01%, 1000=0.04%
     lat (msec): 2=0.02%, 4=0.01%, 10=0.02%, 20=0.01%, 50=0.01%
     lat (msec): 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 2000=0.01%
     lat (msec): >=2000=0.01%
sequential-write-direct: (groupid=2, jobs=5): err= 0: pid=2677
  write: io=519056KB, bw=51895KB/s, iops=6486 , runt= 10002msec
    clat (usec): min=81 , max=242925 , avg=825.67, stdev=3980.13
     lat (usec): min=81 , max=242925 , avg=825.81, stdev=3980.13
    bw (KB/s) : min=   59, max=20029, per=19.03%, avg=9873.41, stdev=2128.50
  cpu          : usr=0.23%, sys=1.43%, ctx=129921, majf=0, minf=140
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/64882/0, short=0/0/0
     lat (usec): 100=60.60%, 250=38.63%, 500=0.01%
     lat (msec): 10=0.10%, 50=0.11%, 100=0.23%, 250=0.32%
sequential-write-buffered: (groupid=3, jobs=5): err= 0: pid=2682
  write: io=7688.1MB, bw=770776KB/s, iops=96347 , runt= 10215msec
    clat (usec): min=6 , max=1727.2K, avg=53.67, stdev=1633.96
     lat (usec): min=6 , max=1727.2K, avg=53.78, stdev=1633.96
    bw (KB/s) : min= 5340, max=704608, per=21.28%, avg=164049.10, stdev=87065.00
  cpu          : usr=2.15%, sys=19.54%, ctx=16284, majf=0, minf=136
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/984185/0, short=0/0/0
     lat (usec): 10=41.97%, 20=56.42%, 50=1.10%, 100=0.23%, 250=0.10%
     lat (usec): 500=0.05%, 750=0.02%, 1000=0.03%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.02%, 20=0.01%, 50=0.01%
     lat (msec): 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 2000=0.01%

Run status group 0 (all jobs):
  WRITE: io=416920KB, aggrb=41687KB/s, minb=42688KB/s, maxb=42688KB/s, mint=10001msec, maxt=10001msec

Run status group 1 (all jobs):
  WRITE: io=6557.9MB, aggrb=27876KB/s, minb=28545KB/s, maxb=28545KB/s, mint=240892msec, maxt=240892msec

Run status group 2 (all jobs):
  WRITE: io=519056KB, aggrb=51895KB/s, minb=53140KB/s, maxb=53140KB/s, mint=10002msec, maxt=10002msec

Run status group 3 (all jobs):
  WRITE: io=7688.1MB, aggrb=770776KB/s, minb=789274KB/s, maxb=789274KB/s, mint=10215msec, maxt=10215msec

Disk stats (read/write):
  cciss!c0d0: ios=8/836060, merge=0/549215, ticks=11/36458428, in_queue=36468678, util=98.88%

real    4m31.948s
user    0m2.692s
sys     0m24.351s
[root@link ~]#


XFS abridged:

random-write-direct: (groupid=0, jobs=5): err= 0: pid=2662
  write: io=416920KB, bw=41688KB/s, iops=5210 , runt= 10001msec
random-write-buf: (groupid=1, jobs=5): err= 0: pid=2667
  write: io=6557.9MB, bw=27876KB/s, iops=3484 , runt=240892msec
sequential-write-direct: (groupid=2, jobs=5): err= 0: pid=2677
  write: io=519056KB, bw=51895KB/s, iops=6486 , runt= 10002msec
sequential-write-buffered: (groupid=3, jobs=5): err= 0: pid=2682
  write: io=7688.1MB, bw=770776KB/s, iops=96347 , runt= 10215msec


VMware vSphere 5’s new licensing model.

After reading up on the new VMware licensing & pricing model, I understand the uproar. Limiting vRAM is a reasonable constraint, but 32 GB per socket for Enterprise? 48 GB for “Enterprise Plus”? If you have a dual-CPU server with 144 GB (easily configurable last year), with 4.1 you’d only need 2 Enterprise licenses to use all 144 GB, since in 4.1 an “Enterprise” license covered 1 CPU (up to 6 cores) and up to 256 GB of memory on the host.

But with 5.0 you’ll either have to buy 5 Enterprise licenses (160 GB) or 3 Enterprise Plus licenses (144 GB) just to use the full 144 GB. I guess VMware has done away with memory overcommit as a selling point? They used to tell us it was recommended to go up to 2:1 so we could safely put ~140 GB of VMs on a 72 GB machine – with the new model that’s completely gone.

To put it in monetary terms, on the 144 GB machine from above, the cost under 4.1 would be $2875 * 2 = $5750. Under 5.0, sticking with Enterprise would be $2875 * 5 = $14,375, or $3495 * 3 = $10,485 for Enterprise Plus. A gigantic price jump. I haven’t read up on the features of vSphere 5, but I don’t think any feature can make up for nearly doubling the cost at minimum, plus the loss of a major selling point (memory overcommit). I mean, you can still overcommit, as long as you’re willing to pay for the overcommitted memory. Also, the above numbers are per-host, so if you’ve got a 5-host cluster you’re looking at a roughly $25,000 price hike.
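The general formula is just licenses = ceil(vRAM needed / per-license vRAM entitlement); a quick shell sketch for plugging in your own numbers:

vram=144    # GB of RAM you want usable on the host
echo "Enterprise:      $(( (vram + 31) / 32 )) licenses"    # 32 GB vRAM each -> 5
echo "Enterprise Plus: $(( (vram + 47) / 48 )) licenses"    # 48 GB vRAM each -> 3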

VMware has long been one of my favorite products, but this is making me consider alternatives. Almost 100% of the feedback I’ve read about this change has been negative. It seems like a huge mistake on VMware’s part.

Migrating a PostgreSQL DB to a new machine without doing a dump & restore

Long ago, before I stepped into my role as de facto DBA in my current job, dump & restore of our Postgres database (dumping the entire DB to a text file, formatting the data disk, and restoring the data) was a pretty regular event. This was needed because we didn’t regularly vacuum the DB, which in turn was because vacuums took forever (which ultimately turned out to be due to crappy underlying hardware). We started doing monthly global vacuums (they took so long that monthly was about all we could handle) and discovered we no longer needed to do dump & restores. When we finally upgraded to Postgres 8.2, with its built-in autovacuum, things got even better, since we no longer had to manage vacuuming via cron jobs.

In the years between then and now our Postgres DB has ballooned to such a size that a D&R isn’t something we could feasibly do in a scheduled maintenance window. The last time I did one was when we bade farewell to the decrepit database in September, 2007. At that point our database consumed 730 GB on disk. Looking through my email archives, we began the dump at 6:30 PM on a Friday night and it completed at 3:48 AM Saturday. The restore started around 9 AM and ran until around 1 PM (I assume it went so much faster than the dump due to the new DB being significantly better hardware-wise). Building indices took until 9:27 PM Saturday. We then ran a global “ANALYZE” to generate DB stats; the analyze ran from 10 PM until 1 AM Sunday. We then had most of Sunday to process the backlog of data that accumulated since Friday afternoon when we took the database offline.

So, with a 730 GB DB, the entire procedure took 9 hours (dump) + 4 hours (restore) + 8 hours (index rebuild) + 3 hours (analyze) – about 24 hours in all.

However, as I said, in the years since then our database has grown as we house more and more data; it’s currently at about 1220 GB. It might have been possible to do the migration via dump & restore in the scheduled window, but I wasn’t looking forward to it. Instead, I decided to try a different option: copying the data files directly from the old server to the new one. If this worked, it would eliminate almost all the overhead, and the move would be complete in however long it took to copy the data from one host to the other.

Reasons

We had a couple reasons for doing this upgrade. Performance wasn’t really one of them, though; we were pretty confident that the performance of the DB was as good as we were likely to get with platter disks, and our SAN doesn’t currently have SSDs. The old DB has dual 2-core Xeon 5160 CPUs @ 3.0 GHz, 32 GB of memory, and a RAID 5 OS volume. The database resided on an HP MSA70 with 24x 10k RPM 146 GB SAS drives (+1 hot spare) in RAID 10, for a 1.6 TB logical volume. At the time I debated RAID 6 vs RAID 10, but in the end I opted for the performance of RAID 10 over the capacity of RAID 6, and it worked out well.

One reason we decided to upgrade was that the drives in the old DB were starting to die and the warranty on them had expired, with each disk costing about $300 to replace. That was a pretty big liability, and I expected the disks to start dying more frequently.

Another reason for upgrading was the benefits of having the data on the SAN, especially snapshotting. We’d been doing daily backups for a while, but dumping the DB while it’s in use takes forever and degrades performance the whole time it’s running. A snapshot isn’t a perfect backup solution, but at least it’s another option.

Another reason for moving the DB to the SAN was DR: if we set up SAN-to-SAN replication to another site, then with the DB on the SAN we get it replicated for free.

And probably the biggest reason: we were up to 1.25 TB used out of 1.6 TB – over 80% full. We’d probably have been good for another few months, but to me 80% is pretty full.

Prerequisites

In order for this to work, Postgres on both machines has to be the same major version – point releases within it are compatible on disk. In my case, the source DB (server A) was running 8.2.5 (the newest at the time the box was built), so I built 8.2.18 (from the source RPM) on the target (server B). Server B is pretty beefy: an HP DL360 G7 with dual 6-core Xeon X5660 CPUs @ 2.8 GHz, 96 GB of PC3-10600 ECC memory, and a QLogic QLE-4062 iSCSI HBA connected to a 4 TB volume on our Compellent SAN (tier 1, RAID 10 across 32x 15k RPM FC disks). Both machines also need to be the same architecture, x86_64 in my case. It might work across Intel/AMD, but I’m not sure about that; fortunately I didn’t have to worry about it.
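A quick sanity check worth running on both boxes before copying anything – the major versions and pg_controldata’s “Catalog version number” should match (paths assume the stock RHEL layout):

psql --version                                         # same major version (8.2.x) on both
pg_controldata /var/lib/pgsql/data | grep -i catalog   # on-disk catalog versions must match
uname -m                                               # same architecture (x86_64 here)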

Moving the data

When we did dump & restores, we dumped directly to a commonly-mounted NAS, which worked well: we wouldn’t start the restore until the dump was complete, and we didn’t want to consume disk on the target with a gigantic dump file (not to mention that reading the dumpfile from the same disks we were writing to causes contention that slows everything down). There wasn’t really any need to use a NAS as an intermediary in this case, though; it would just double the time needed to get the data from A to B.

I created an NFS export on B:

/data/pgsql             10.0.0.35(rw,no_root_squash)

And mounted it on A with these options in /etc/fstab:

10.0.0.36:/data/pgsql  /mnt/gannon/nfs nfs     rw,rsize=32768,wsize=32768,nfsvers=3,noatime,udp        0     0

I tried TCP vs UDP mounts and found UDP was faster.

I then copied the data over with my favorite Unix tool, rsync:

time rsync -atp --delete --progress /var/lib/pgsql/data /mnt/gannon/nfs/ > /home/evan/rsync.log

Dry run

In January I did a dry run of the procedure. I had tried copying the data over without stopping postgres on A, so as not to cause a service interruption, but it didn’t work; the data was changing too rapidly, and by the time the rsync completed, the files it had copied earliest had already been modified again. I ended up scheduling some downtime for a weekend and did the copy. With the above NFS settings I was able to transfer data at around 50 MB/s over our gigabit switches; the whole thing took 3-4 hours. When it came up, everything seemed to be fine. I was pretty happy, because 3-4 hours is a whole lot better than 24+.

Day of Reckoning

I finally did the real migration this past weekend. I started it at 8 PM and (after an rsync snafu) completed it around 4 AM. The snafu was caused by my use of the “--delay-updates” flag, which I later learned copies modified files to a “.~tmp~” directory and, once all of them are copied, moves them into place in an attempt to make the operation more “atomic.” I didn’t realize this was what was happening, and I got a little freaked out when I saw the disk usage on the target growing 100 GB larger than the source. I cancelled the rsync and ran it again, stupidly dropping the --delay-updates flag, which with the --delete flag caused it to delete all the stuff in .~tmp~ that it had already copied over. It deleted like 300 GB of stuff before I freaked out and cancelled it again. I cursed myself a few times, then manually moved the contents of .~tmp~ up to the parent to salvage what had already been transferred, and ran the rsync once again to move the remaining data. So it probably would have been done much sooner had I not cancelled it and then deleted a bunch of my progress.

You may notice that the rsync flags above don’t include -z. With huge binary files being transferred over a fast LAN, I don’t think there’s much reason to use -z; in fact, when I added -z the throughput plummeted.

After copying the data, I moved the DB’s virtual IP to the new machine, moved the cron jobs over, started postgres on B, and everything worked. I finished all of my cleanup around 6 AM Saturday, though like I said, I would have been done much sooner had I not deleted a bunch of my progress. Even so, this is a lot better than the dump & restore scenario, and it has the added benefit of being reversible: I’m planning to upgrade postgres on A to 8.2.18 and leave it as a standby server, so if a problem arises with B, the data can be moved back relatively quickly.

Conclusion

Well, I don’t have any great insight to put here, but so far this has worked out. My next DB project is upgrading from 8.2 to either 8.4 or the 9.x series, but that’s going to require a lot more planning, since client drivers will likely need to be updated and I’m not sure whether queries will need to be altered.

The end (I hope).

Remount all NFS mounts to another cluster IP

mount | grep 10.0.0.73 | cut -f3 -d\  | xargs mount -o remount,rw,rsize=32768,wsize=32768,nfsvers=3,bg,intr,tcp,retrans=0,addr=10.0.0.74

Not much to say. This seems to work even when the disk is in use. Using it with Isilon to remount off of a quiesced node.
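
For anyone picking the one-liner apart: mount lists the active mounts, grep keeps the ones served by the old node, cut pulls out the mount point (the third space-separated field), and xargs feeds each one back to mount, with the addr= option repointing it at the new cluster IP. If more than one mount matches, adding -n1 makes xargs run a separate mount per path:

# remount each matching NFS mount against the new node, one at a time
mount | grep 10.0.0.73 | cut -f3 -d\  | \
  xargs -n1 mount -o remount,rw,rsize=32768,wsize=32768,nfsvers=3,bg,intr,tcp,retrans=0,addr=10.0.0.74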