Goodbye, pg_dump

I’ve been a Postgres user and administrator for a while. Over the years, my views on backups have evolved.

Originally, like most people, I started out with good old pg_dump. With a reasonably small database (under 50 GB), dumping to a flat text file is a fine option. I’d generally do something like pg_dump -Upostgres dbname | gzip > dbname.sql.gz to compress it on the fly and save space. For years this seemed perfect: the entire database dumped in a single transaction into a single file that can be restored anywhere.
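Restoring is just the mirror image, against a freshly created database:

gunzip -c dbname.sql.gz | psql -Upostgres dbname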

But as my databases grew larger and larger, the time it took to do a pg_dump grew as well. At a previous job, the database grew to nearly 2 TB and the pg_dump took nearly 18 hours. By that point we’d already stretched the pg_dump schedule from daily to weekly, then to three times a month, and finally to semi-monthly. Not only was it slow, but since it operated in a single transaction, it wreaked havoc with normal database operation: queries that needed locks on tables the dump was holding would pile up behind it.

Moving the database from a physical RAID to a volume on our SAN gave us the opportunity to use LUN snapshotting rather than pg_dump (I just remembered I already wrote about that here). This let us move to a monthly pg_dump and more frequent snapshot-level backups that took up very little space. This was ideal on the Compellent, since snapshots would auto-expire after however long you specified.

When I started at Yodle we were doing nightly pg_dumps, and pretty soon we ran into the same problems I’d seen at Didit, with the dump itself interfering with normal DB operation. When I started, the dump would kick off at midnight and run until 7-8 AM; after a few months it would still be running at noon. We discussed moving to WAL archiving and writing a base backup to NFS, but that would require a pretty massive amount of space, and as anybody who uses “enterprise storage” knows, that’s not something you want to burn on backups. We discussed building a whitebox file server for backups, but nobody was really in love with that option – we were trying to reduce our reliance on physical machines as much as possible. We talked about pushing it all to S3, but that seemed rather difficult.

When I attended NYC PgDay earlier this year, there was lots of discussion about WAL-E. I’d never heard of WAL-E, so I looked it up and was impressed. Basically, WAL-E handles archiving of WAL to S3, but first compresses and PGP-encrypts it. It also handles pushing the base backup to S3, also compressed and PGP-encrypted. This was just what we were looking for. We set it up and, amazingly, it worked perfectly. After a few weeks (and after confirming we could restore from the WAL-E backups) we moved our pg_dump to weekly, on the weekend, when it doesn’t interfere with any user processes. We do a WAL-E base backup every 3-4 days or so and retain 3 of them. We retain all the WAL, so we can restore the DB to any point within the last ~10 days if needed. The best part is that it’s faster than pg_dump, and since the base backup doesn’t operate in a transaction (it’s a filesystem-level backup rather than an application-level backup), it doesn’t interfere with user queries. There’s of course elevated IO during this time, but our SAN has more than enough bandwidth.
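For reference, the moving parts are pretty minimal. This is a sketch based on WAL-E’s documented usage – the envdir directory holds the AWS credentials and the WALE_S3_PREFIX variable, and the data directory path here is just an example:

# postgresql.conf: compress, encrypt, and ship each completed WAL segment to S3
wal_level = archive
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# from cron every few days: push a new base backup, then prune to the 3 most recent
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/pgsql/9.2/data
envdir /etc/wal-e.d/env wal-e delete --confirm retain 3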

We set up some basic monitoring of S3 (check the age of the most recent WAL segment and log it in Zabbix) just to ensure the backups are actually happening, and we’re at the point where we’re discussing moving pg_dump to monthly, or simply not doing it at all. Overall, WAL-E has been a huge win for us, enabling better, faster backups that don’t interfere with the DB itself and that, while not free, aren’t ridiculously expensive. And since it’s all in its own S3 bucket, you can tweak the bucket settings (e.g. enable RRS) to save money, and Amazon tells you exactly how much your backups are costing you.
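The check itself is nothing fancy. Something along these lines (bucket name hypothetical; assumes s3cmd is configured) prints the age, in seconds, of the newest archived WAL segment, which we feed to Zabbix:

#!/bin/bash
# age, in seconds, of the most recently archived WAL segment
NEWEST=$(s3cmd ls s3://my-backup-bucket/wal_005/ | sort | tail -n 1 | awk '{print $1" "$2}')
echo $(( $(date +%s) - $(date -d "$NEWEST" +%s) ))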

Linux ‘date’ stuff I didn’t know until recently.

I’ve been using Linux for a long time but I had no idea you could do this.

evan@evbox:~$ date -d yesterday
Thu Jul 25 12:44:39 EDT 2013
evan@evbox:~$ date -d '42 days ago'
Fri Jun 14 12:44:51 EDT 2013
evan@evbox:~$ date -d '65 minutes ago'
Fri Jul 26 11:41:18 EDT 2013

As epoch:

evan@evbox:~$ date -d '65 minutes ago' +%s
1374853307

Amazing.
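It works in reverse, too – prefix an epoch timestamp with @ and GNU date converts it back:

evan@evbox:~$ date -d @1374853307
Fri Jul 26 11:41:47 EDT 2013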

Digital Ocean – First Impressions

For the past few years I’ve been hosting this site on an old desktop in my basement, on my FiOS connection. This was one of the things I really liked when I switched from Cablevision to Verizon – they don’t block port 80 inbound, so I didn’t have to pay for separate hosting. My “server” was an old AMD desktop with 1 GB of RAM and a SATA drive. My site was slow, but I could live with that. I configured Nginx to cache the static assets, which sped most things up to “OK” levels, but it was never fast.

This setup had a bunch of problems, though, and the biggest one was power – namely, that it goes out in my house all the time. I probably have 4 or 5 brief outages each month, and my old box doesn’t come back up properly on reboot (some BIOS conflict with an eSATA disk I have hooked up to it). Plus, when my basement became a huge bathtub during Sandy, my site was down for about a month, but that wasn’t really a big concern at the time.

Anyway, via a “Promoted Tweet” on Twitter I found Digital Ocean, a VPS provider with rates starting at $5/month for an SSD-backed VM. They also had a promo at the time for a $10 credit, so I figured I’d give it a try.

Account creation was simple, and I didn’t need to enter my CC until I actually created a server (“droplet” in their parlance). Server creation was pretty trivial: select the OS image (I chose CentOS 6.4, but they offer Ubuntu, Arch, Debian and Fedora as well), the size (512 MB of RAM up through 16 GB), and the region (San Francisco, New York, or Amsterdam), then enter a hostname and your SSH pubkey. In about 60 seconds your server is ready to go, public IP and everything. My VM has a 20 GB disk, and the base OS install was about 900 MB. I installed Apache, Nginx, MySQL and some other stuff, then dumped my WordPress DB, uploaded it to the new VM, and copied the entire Apache docroot over as well. Within about 30 minutes of spinning up the VM I had everything running on the new box, and I made the DNS changes shortly after that. Pretty straightforward.

It’s only been a couple of days, but so far I’m really liking the performance. My site doesn’t get a lot of traffic to begin with, but since I cache most things to disk, and the disk is SSD, it’s really quick. I’ll keep an eye on it, but so far this looks like a great choice for small website hosting. The only thing left is to set up some sort of offsite backup, but I can just cron an rsync to my home machine for now.
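Something like this in crontab on the home machine would probably cover it (hypothetical paths and host alias; assumes key-based SSH and MySQL credentials in ~/.my.cnf on the droplet):

# nightly: pull the docroot, then grab a compressed DB dump from the droplet
0 3 * * * rsync -az --delete droplet:/var/www/html/ /backups/site/html/
30 3 * * * ssh droplet 'mysqldump wordpress | gzip' > /backups/site/wordpress.sql.gz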


XFS write speeds: software RAID 0/5/6 across 45 spindles

We’re currently building a new storage server to store low-priority data (tertiary backups, etc.). One of the requirements for the project is that it needs to be on cheap storage (as opposed to expensive enterprise SAN/NAS). After some research we decided to build a Backblaze pod. Backblaze used 3 TB Hitachi drives in their system, but the ones they listed in their blog post are discontinued, and the reviews for all the other 3 TB+ drives were terrible, so we went with Samsung ST2000DL004 2 TB 7200 RPM drives. Like Backblaze, we’re going with software RAID, but I figured a good first step would be to figure out which RAID level we want, and whether we want the mdadm/LVM mish-mosh Backblaze uses or something simpler. For my testing I created a RAID6 of all 45 drives with a single XFS volume on top (XFS’s size limit is ~8 exabytes vs. ext4’s 16 TB). Ext4 might offer some performance advantages, but splitting the pod into multiple sub-16 TB volumes would add management overhead that probably isn’t worth it in our case.

So, this is just a simple benchmark comparing RAID0 (stripe with no parity) as a baseline, RAID5 (stripe with 1 parity disk), and RAID6 (stripe with 2 parity disks) across 45 total spindles. For all tests I used Linux software RAID (mdadm).

To test, I have 3 scripts: makeraid0.sh, makeraid5.sh, and makeraid6.sh. Each one does what its name implies. The RAID0 uses 43 disks, the RAID5 44, and the RAID6 45, so there are 43 “data” disks in each test. The system is a Protocase “Backblaze-inspired” pod with a Core i3 540 CPU, 8 GB memory, CentOS 6.3 x64, and 45x 2 TB drives. We’re only using this box for backups, and it gives us about 79 TB usable, which is still plenty, so 2 TB drives (instead of 3 TB) aren’t a big problem.

makeraid?.sh for array creation (the RAID6 version shown here):

#!/bin/bash

mdadm --create /dev/md0 --level=raid6 -c 256K --raid-devices=45 \
/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
/dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj \
/dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo \
/dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt \
/dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy \
/dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad \
/dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai \
/dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan \
/dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas

Filesystem:

[root@Protocase ~]# mkfs.xfs -f /dev/md0
meta-data=/dev/md0               isize=256    agcount=79, agsize=268435392 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=21000267072, imaxpct=1
         =                       sunit=64     swidth=2752 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@Protocase ~]# mount /dev/md0 /raid0/
[root@Protocase ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdat2            289G  3.2G  271G   2% /
tmpfs                 3.9G  260K  3.9G   1% /dev/shm
/dev/sdat1            485M   62M  398M  14% /boot
/dev/md0               79T   35M   79T   1% /raid0

RAID0

[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.1944 s, 416 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.1922 s, 416 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 24.7665 s, 423 MB/s

RAID5

[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.2239 s, 416 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 24.7427 s, 424 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 24.2434 s, 433 MB/s

RAID6

[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.9032 s, 390 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.5255 s, 395 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.4338 s, 397 MB/s

I found it pretty strange that RAID5 seemed to outperform RAID0, but I tested several times and RAID5 consistently averaged 10-15 MB/s faster than RAID0. Maybe a bug in the kernel? I tried other dd block sizes ranging from 60 KB to 4 MB, but the results were pretty consistent. In the end it looks like I’m going to go with a RAID6 of 43 drives + 2 hot spares, which still yields ~400 MB/s throughput and 75 TB usable:

#!/bin/bash

mdadm --create /dev/md0 --level=raid6 -c 256K -n 43 -x 2 \
/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
/dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj \
/dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo \
/dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt \
/dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy \
/dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad \
/dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai \
/dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan \
/dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
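To make sure the array reassembles itself cleanly at boot, it’s also worth capturing the array definition in mdadm.conf and adding the mount to fstab – a minimal sketch, mount options to taste:

# persist the array definition for boot-time assembly
mdadm --detail --scan >> /etc/mdadm.conf

# and mount the filesystem automatically
echo '/dev/md0  /raid0  xfs  defaults,noatime  0 0' >> /etc/fstab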

Update: A coworker suggested looking into a write-intent bitmap to improve resync speeds (it helps after an unclean shutdown or when a disk is briefly removed and re-added; it doesn’t speed up a full rebuild onto a new drive). After adding an internal bitmap with a 256 MB chunk size, write performance didn’t degrade much, so this looks like a good addition to the configuration:

[root@Protocase ~]# mdadm -G --bitmap-chunk=256M --bitmap=internal /dev/md0
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.8157 s, 406 MB/s
[root@Protocase ~]# rm -fv /raid0/zeros.dat
removed `/raid0/zeros.dat'
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.4233 s, 397 MB/s
[root@Protocase ~]# rm -fv /raid0/zeros.dat
removed `/raid0/zeros.dat'
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.2593 s, 399 MB/s
[root@Protocase ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdat2            289G  3.2G  271G   2% /
tmpfs                 3.9G   88K  3.9G   1% /dev/shm
/dev/sdat1            485M   62M  398M  14% /boot
/dev/md0               75T  9.8G   75T   1% /raid0
[root@Protocase ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdas[44](S) sdar[43](S) sdaq[42] sdap[41] sdao[40] sdan[39] sdam[38] sdal[37] sdak[36] sdaj[35] sdai[34] sdah[33] sdag[32] sdaf[31] sdae[30] sdad[29] sdac[28] sdab[27] sdaa[26] sdz[25] sdy[24] sdx[23] sdw[22] sdv[21] sdu[20] sdt[19] sds[18] sdr[17] sdq[16] sdp[15] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
      80094041856 blocks super 1.2 level 6, 256k chunk, algorithm 2 [43/43] [UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
      bitmap: 2/4 pages [8KB], 262144KB chunk

unused devices: <none>

Reorganizing photos in 1 line with exiftool

A few years ago I wrote a utility in Java to find all the JPG files in a directory and move them into a date-based directory structure like /YYYY/MM/DD/, based on the date the photo was taken, extracted from the EXIF metadata in the file. Well, apparently that was a huge waste of time: I just discovered that exiftool, an awesome Perl utility I’ve used for years to edit and extract metadata on the command line, can do this natively. My entire program can be replaced with this one simple command:

$ exiftool -r '-FileName<CreateDate' -d /targetDir/%Y/%Y-%m/%Y-%m-%d/%Y-%m-%d.%%f.%%e /media/EOS_DIGITAL/

This moves the files directly off the SD card mounted at /media/EOS_DIGITAL/ into the proper structure under /targetDir/. (Writing to FileName renames the files in place; exiftool’s -o option makes it copy instead.)
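Since one typo here would rename an entire card’s worth of photos, exiftool’s dry-run feature is worth knowing about: write to TestName instead of FileName and it just prints what each file would be renamed to, without touching anything:

$ exiftool -r '-TestName<CreateDate' -d /targetDir/%Y/%Y-%m/%Y-%m-%d/%Y-%m-%d.%%f.%%e /media/EOS_DIGITAL/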

Slow HTTP downloads through Cisco ASA 5500

Recently we noticed weird behavior when downloading files from certain sites. The transfer would start out fast (around 10 MB/s), then after a couple of seconds plummet to around 9 KB/s. It didn’t happen for every file or every site – downloads from S3 buckets, for instance, were still fast. But some downloads that used to saturate our pipe, like the Sun JDK and ISOs from rit.edu, were now getting all cRAzY.

After some poking around I decided to test HTTP versus FTP to see if it could be an application/protocol-level issue. The easiest way to do this was to find a file available via both FTP and HTTP and download it over each protocol, which is where mirrors.rit.edu came in handy. I grabbed a CentOS netinstall ISO with cURL both ways, and it was much slower via HTTP than via FTP:

[evan@boba 16:07:03 ~]$ curl -O ftp://mirrors.rit.edu/pub/centos/6/isos/x86_64/CentOS-6.2-x86_64-netinstall.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  227M  100  227M    0     0   9.8M      0  0:00:22  0:00:22 --:--:-- 7816k
[evan@boba 16:07:33 ~]$ rm CentOS-6.2-x86_64-netinstall.iso 
[evan@boba 16:07:39 ~]$ curl -O http://mirrors.rit.edu/centos/6/isos/x86_64/CentOS-6.2-x86_64-netinstall.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  227M  100  227M    0     0  5686k      0  0:00:40  0:00:40 --:--:-- 6269k

22 seconds via FTP at an average of 9.8 MB/s, versus 40 seconds over HTTP at 5.6 MB/s (and that was one of the better HTTP runs).

This was affecting all machines on our network, and it had nothing to do with the per-machine iptables rules (verified by flushing all rules). The only thing I could think of that might affect all machines, but only HTTP and not FTP, would be something like packet inspection. Well, it turns out that HTTP packet inspection is on by default on the ASA, so I disabled it as described here:

Zeus# conf t
Zeus(config)# policy-map global_policy
Zeus(config-pmap)# class inspection_default
Zeus(config-pmap-c)# no inspect http
Zeus(config-pmap-c)# write mem
Building configuration...

Since then HTTP transfers have been consistently fast.
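To confirm the change took, show service-policy should no longer list HTTP inspection under the inspection_default class:

Zeus# show service-policy | include http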

Using rrdtool to generate server load & bandwidth graphs

I’ve been using MRTG and routers2.cgi for years to graph the various aspects of a server that warrant monitoring. I’ve long known that they used something called rrdtool to do… well, something, but never had a need or desire to figure out exactly what that was.

But, having just moved my site to a new server, I was curious how it would handle the load. I didn’t want to set up a behemoth like Nagios or Zabbix – those are full monitoring/alerting suites, and I just wanted graphing. As I said, in the past I’ve used MRTG or routers2.cgi for this, but both were overkill in this case. Since both of them use rrdtool under the hood, I figured that was a good place to look.

The two metrics I want to record are server load and in/out bandwidth. The first step is to create the RRDs (round-robin databases), which was done with these commands:

# rrdtool create /mrtg/load.rrd --start N DS:load1:GAUGE:600:0:100 DS:load5:GAUGE:600:0:100 DS:load15:GAUGE:600:0:100 RRA:AVERAGE:0.5:2:800

# rrdtool create /mrtg/eth1.rrd --start N DS:in:COUNTER:600:0:10000000000 DS:out:COUNTER:600:0:10000000000 RRA:AVERAGE:0.5:2:800

A good explanation of what these various fields mean is here. In short, each “DS:” section defines a “column” (for fellow RDBMS users) in the database. The first RRD has 3 “columns,” named load1, load5, and load15, each of which contains GAUGE data. The second contains two COUNTER fields, representing the bytes in/out for interface eth1.
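Annotated, the create commands break down like this:

# DS:load1:GAUGE:600:0:100
#    |     |     |   |  '-- maximum accepted value
#    |     |     |   '----- minimum accepted value
#    |     |     '--------- heartbeat: max seconds between updates
#    |     '--------------- data source type
#    '--------------------- data source name
#
# RRA:AVERAGE:0.5:2:800
#     keep 800 rows, each the AVERAGE of 2 primary data points
#     (0.5 is the xfiles factor: how much of a row may be unknown;
#     with the default 300 s step, that's one row per 10 minutes, ~5.5 days of history)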

To actually get the data I poll snmpd via this bash script:

#!/bin/bash

rrdupdate /mrtg/load.rrd N:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost laLoad.1`:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost laLoad.2`:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost laLoad.3`

rrdupdate /mrtg/eth1.rrd N:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost ifInOctets.3`:\
`/usr/bin/snmpget -v 2c -c public -Oqv localhost ifOutOctets.3`

I have that run every 5 minutes via cron. Then, to generate the actual graphs, I run this script via cron as well:

#!/bin/bash

rrdtool graph /var/www/html/graphs/load.png \
        -N \
        -E \
        --start now-30hours \
        --title "Load Averages" \
        --width 300 \
        --height 200 \
        --x-grid MINUTE:60:HOUR:2:HOUR:4:0:%H \
        -u 1.0 \
        --lower-limit 0 \
        --vertical-label "Load Avg" \
        --full-size-mode \
        -a PNG \
        'DEF:load1=/mrtg/load.rrd:load1:AVERAGE' \
        'VDEF:load1last=load1,LAST' \
        'DEF:load5=/mrtg/load.rrd:load5:AVERAGE' \
        'DEF:load15=/mrtg/load.rrd:load15:AVERAGE' \
        'AREA:load15#33CC33:15 Min Load Avg ' \
        'LINE1:load1#0000ff:1 Min Load Avg ' \
        'GPRINT:load1:AVERAGE:Load1 Avg\:%3.2lf' \
        'GPRINT:load1last:Drawn at %Y-%m-%d, %H\:%M:strftime'
#       'LINE1:load5#ff00ff:5 Min Load Avg '

rrdtool graph /var/www/html/graphs/eth1.png \
        -N \
        -E \
        --start now-30hours \
        --title "eth1 traffic" \
        --width 300 \
        --height 200 \
        --x-grid MINUTE:60:HOUR:2:HOUR:4:0:%H \
        -u 1000000 \
        --lower-limit 0 \
        --vertical-label "bps" \
        --full-size-mode \
        -a PNG \
        'DEF:eth1in=/mrtg/eth1.rrd:in:AVERAGE' \
        'CDEF:eth1inbits=eth1in,8,*' \
        'VDEF:eth1last=eth1in,LAST' \
        'DEF:eth1out=/mrtg/eth1.rrd:out:AVERAGE' \
        'CDEF:eth1outbits=eth1out,8,*' \
        'AREA:eth1inbits#33CC33:eth1 in ' \
        'LINE1:eth1outbits#0000ff:eth1 out' \
        'GPRINT:eth1last:Drawn at %Y-%m-%d, %H\:%M:strftime'
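Both scripts run out of cron (names hypothetical; the updater every 5 minutes to match the RRD step, the graphing script on the same schedule):

*/5 * * * * /usr/local/bin/rrd-update.sh
*/5 * * * * /usr/local/bin/rrd-graph.sh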

The final graphs look decent, though not very fancy; I’ll play around with them a bit more:

[eth1 traffic graph]
[load average graph]

Load balancing in EC2 with Nginx and HAProxy

We wanted to set up a load-balanced web cluster in AWS for expansion. My first inclination was to use ELB, but I soon learned that ELB doesn’t let you allocate a static IP, requiring you to refer to it only by DNS name. This would be OK except for the fact that our current DNS provider, Dyn, requires IP addresses when using their GSLB (geo-based load balancer) service.

Rather than let this derail the whole project, I decided to look into the software options available for load balancing in EC2. I’ve been a fan of hardware load balancers for a while, sort of looking down on software-based solutions without any real rationale, but in this case I really had no choice, so I figured I’d give them a try.

My first stop was Nginx. I’ve used it before in a reverse-proxy scenario and liked it. The problem I had with it here is that it doesn’t support active polling of nodes – the ability to send requests to the webserver and mark the node as up or down based on the response. As far as I can tell, using multiple upstream servers in Nginx lets you specify max_fails and fail_timeout, but a “fail” is only detected when a real request comes in. I don’t want to risk losing a real request – I like active polling.
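For reference, the passive checking I’m describing looks like this in an upstream block (addresses hypothetical):

upstream webcluster {
    # a backend is only marked down after 3 failed *real* requests,
    # then retried after 30s -- stock Nginx has no active probe
    server 10.0.0.11:80 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:80 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://webcluster;
    }
}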

Installing Sun (Oracle) JDK 1.5 on an EC2 instance

I’m currently working on moving a Tomcat-based application into EC2. The code was written for Java 5.0. While Java 6 would probably work, I’d like to keep everything as “same” as possible, since EC2 presents its own challenges. I spun up a couple of t1.micro instances and copied everything over, including the Java 5 JDK, jdk-1_5_0_22-linux-amd64.rpm. Installing from RPM was easy, but the EC2 instance defaults to using OpenJDK 1.6:

[root@ec2 ~]# java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.10) (amazon-52.1.9.10.40.amzn1-x86_64)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

There were a couple of things I had to do to get the system to accept the Sun JDK as its “real” java.

Alternatives

Red Hat’s “alternatives” system is designed to allow a system to have multiple versions of a program installed and make it easy to choose which one you want to run. Unfortunately I’ve found the syntax a bit strange and always have to Google it, so I figured I’d document it here for posterity.

So here’s the default:

[root@ec2 ~]# alternatives --config java

There is 1 program that provides 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number: 

Here’s how to add Sun Java, assuming the java binary is in /usr/java/jdk1.5.0_22/jre/bin/java (which is where the RPM puts it).

[root@ec2 ~]# alternatives --install /usr/bin/java java /usr/java/jdk1.5.0_22/jre/bin/java 1
[root@ec2 ~]# alternatives --config java
There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
   2           /usr/java/jdk1.5.0_22/jre/bin/java

Enter to keep the current selection[+], or type selection number: 2
[root@ec2 ~]# java -version
java version "1.5.0_22"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_22-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_22-b03, mixed mode)

Yay! Unfortunately this doesn’t help with the other problem I had with Tomcat: EC2 instances also set the JAVA_HOME var to OpenJDK (/usr/lib/jvm/jre). Fortunately that’s an easy fix too.

Setting JAVA_HOME

The JAVA_HOME var is set in /etc/profile.d/aws-apitools-common.sh. Comment out this line:

export JAVA_HOME=/usr/lib/jvm/jre

Create a new file, /etc/profile.d/sun-java.sh, and put this in it:

export JAVA_HOME=/usr/java/jdk1.5.0_22/jre

Also in that file, I added the following to instruct the JVM to process all dates in America/New_York, since that’s the timezone all of our other servers use, and reading log files is easier when all the dates are in the same tz:

export TZ=America/New_York

(I found I had to do this even after pointing /etc/localtime to the correct zoneinfo – Java was stuck on UTC while the rest of the system was using America/New_York.)