Amazon has a pretty decent writeup on how to do this (here), but in trying to establish Postgres replication across regions, I found some weird behavior where I could connect to the port directly (telnet to 5432) but psql (or pg_basebackup) didn’t work. tcpdump showed this:
16:11:28.419642 IP 10.121.11.47.35039 > 10.1.11.254.postgresql: Flags [P.], seq 9:234, ack 2, win 211, options [nop,nop,TS val 11065893 ecr 1811434], length 225 16:11:28.419701 IP 10.121.11.47.35039 > 10.1.11.254.postgresql: Flags [P.], seq 9:234, ack 2, win 211, options [nop,nop,TS val 11065893 ecr 1811434], length 225 16:11:28.421186 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [.], ack 234, win 219, options [nop,nop,TS val 1811520 ecr 11065893,nop,nop,sack 1 {9:234}], length 0 16:11:28.425273 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1811522 ecr 11065893], length 1375 16:11:28.425291 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556 16:11:28.697397 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1811590 ecr 11065893], length 1375 16:11:28.697438 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556 16:11:29.241311 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1811726 ecr 11065893], length 1375 16:11:29.241356 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556 16:11:30.333438 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1811999 ecr 11065893], length 1375 16:11:30.333488 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556 16:11:32.513418 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1812544 ecr 11065893], length 1375 16:11:32.513467 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556 16:11:36.881409 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1813636 ecr 11065893], length 1375 16:11:36.881460 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556
After quite a bit of Google and mucking in network ACLs and security groups, the fix ended up being this:
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1500
(The above two commands need to be run on both OpenSwan boxes.)