Using OpenSWAN to connect two VPCs in different AWS regions

Amazon has a pretty decent writeup on how to do this (here), but in trying to establish Postgres replication across regions, I found some weird behavior where I could connect to the port directly (telnet to 5432) but psql (or pg_basebackup) didn’t work. tcpdump showed this:

16:11:28.419642 IP 10.121.11.47.35039 > 10.1.11.254.postgresql: Flags [P.], seq 9:234, ack 2, win 211, options [nop,nop,TS val 11065893 ecr 1811434], length 225
16:11:28.419701 IP 10.121.11.47.35039 > 10.1.11.254.postgresql: Flags [P.], seq 9:234, ack 2, win 211, options [nop,nop,TS val 11065893 ecr 1811434], length 225
16:11:28.421186 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [.], ack 234, win 219, options [nop,nop,TS val 1811520 ecr 11065893,nop,nop,sack 1 {9:234}], length 0
16:11:28.425273 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1811522 ecr 11065893], length 1375
16:11:28.425291 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556
16:11:28.697397 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1811590 ecr 11065893], length 1375
16:11:28.697438 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556
16:11:29.241311 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1811726 ecr 11065893], length 1375
16:11:29.241356 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556
16:11:30.333438 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1811999 ecr 11065893], length 1375
16:11:30.333488 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556
16:11:32.513418 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1812544 ecr 11065893], length 1375
16:11:32.513467 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556
16:11:36.881409 IP 10.1.11.254.postgresql > 10.121.11.47.35039: Flags [P.], seq 2:1377, ack 234, win 219, options [nop,nop,TS val 1813636 ecr 11065893], length 1375
16:11:36.881460 IP 10.1.96.20 > 10.1.11.254: ICMP 10.121.11.47 unreachable - need to frag (mtu 1422), length 556

After quite a bit of Google and mucking in network ACLs and security groups, the fix ended up being this:

iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1500

(The above two commands need to be run on both OpenSwan boxes.)

Create CloudWatch alerts for all Elastic Load Balancers

I manage a bunch of ELBs but we were missing an alert on a pretty basic metric: how many errors the load balancer was returning.  Rather than wade through the UI to add these alerts I figured it would be easier to do it via the CLI.

Assuming aws-cli is installed and the ARN for your SNS topic (in my case, just an email alert) is $arn:

for i in `aws elb describe-load-balancers | grep LoadBalancerName | 
perl -ne 'chomp; my @a=split(/s+/); $a[2] =~ s/[",]//g ; print "$a[2] ";' ` ; 
do aws cloudwatch put-metric-alarm --alarm-name "$i ELB 5XX Errors" --alarm-description 
"High $i ELB 5XX error count" --metric-name HTTPCode_ELB_5XX --namespace AWS/ELB 
--statistic Sum --period 300 --evaluation-periods 1 --threshold 50 
--comparison-operator GreaterThanThreshold --dimensions Name=LoadBalancerName,Value=$i 
--alarm-actions $arn --ok-actions $arn ; done

That huge one-liner creates a CloudWatch notification that sends an alarm when the number of 5XX errors returned by the ELB is greater than 50 over 5 minutes, and sends an “ok” message via the same SNS topic. The for loop creates/modifies the alarm for every ELB.

More info on put-metric-alarm available in the AWS docs.

Amazon knows customer service.

I bought the latest Green Day album for $5. After downloading and playing it I realized I was listening to a censored version of the album. I didn’t think there was much recourse since it was an MP3 download, and there’s no way to “return” it. But I figured it was worth a try anyway. The worst that could happen is they say no. I started a chat session and to my surprise, they refunded my purchase price within 5 minutes! That’s what I call service.

You are now connected to Snow from Amazon.com.
Me:I didn’t realize the album I purchased was censored. The uncensored version is this one, which is the one I want instead: http://www.amazon.com/%C2%A1UNO-Explicit/dp/B009B50QOU/
Snow:Hi there, Evan ! My name is Snow, and I would be happy to help you with this issue.
This will take just a few moments to look into. 🙂
Me:Hello SNow
Snow:I am sorry about the confusion with that order,Evan . I have it refunded for you so you can get the version you had meant to. I also sent a confirmation of your refund to your email address associated with your Amazon account.
Me:Wow, that’s great. Thanks so much.
Snow:What that leaves now is just reordering from that link above.
🙂
You are super welcome! I have gotten the wrong album on accident as well! It’s never good to hear Green Day censored. 🙂
Me:Haha. It is such a great price too this week, $5 for the album. But when all the curses were bleeped out I was sad and realized I must have bought the wrong version.
Thanks again for your help!
Snow:You are very welcome ! I am glad I could help! Thank you for shopping with Amazon!

I realize $5 is nothing to a company like Amazon, and a happy customer is worth a lot more than that, but this was still a great experience overall.

Load balancing in EC2 with Nginx and HAProxy

We wanted to setup a loadbalanced web cluster in AWS for expansion. My first inclination was to use ELB for this, but I soon learned that ELB doesn’t let you allocate a static IP, requiring you to refer to it only by DNS name. This would be OK except for the fact that our current DNS provider, Dyn, requires IP addresses when using their GSLB (geo-based load balancer) service.

Rather than let this derail the whole project, I decided to look into the software options available for loadbalancing in EC2. I’ve been a fan of hardware load balancers for a while, sort of looking down at software-based solutions without any real rationale, but in this case I really had no choice so I figured I’d give it a try.

My first stop was Nginx. I’ve used it before in a reverse-proxy scenario and like it. The problem I had with it was that it doesn’t support active polling of nodes – the ability to send requests to the webserver and mark the node as up or down based on the response. As far as I can tell, using multiple upstream servers in Nginx allows you to specify max_fails and fail_timeout, however a “fail” is determined when a real request comes in. I don’t want to risk losing a real request – I like active polling.
Continue reading “Load balancing in EC2 with Nginx and HAProxy”

Integrating Amazon Simple Email Service with postfix for SMTP smarthost relaying.

So, we’ve outgrown the 500 outbound messages/day limit imposed by Google Apps’s Standard tier. A wise friend suggested SendGrid, but I figured it was worth looking into what options Amazon provides. I found SES and am in the process of setting it up. Hopefully I can set it up as a drop-in replacement, obviating the need for code changes to use it. SES is attractive for us because:

Free Tier
If you are an Amazon EC2 user, you can get started with Amazon SES for free. You can send 2,000 messages for free each day when you call Amazon SES from an Amazon EC2 instance directly or through AWS Elastic Beanstalk. Many applications are able to operate entirely within this free tier limit.

Note: Data transfer fees still apply. For new AWS customers eligible for the AWS free usage tier, you receive 15 GB of data transfer in and 15 GB of data transfer out aggregated across all AWS services, which should cover your Amazon SES data transfer costs. In addition, all AWS customers receive 1GB of free data transfer per month.

Free to try? Sounds good.

After signing up, the first thing I did was download the Perl scripts. Create a credentials file with your AWS access key ID and Secret Key (credentials can be found here when logged in). The credentials file (aws-credentials) should look like this:

AWSAccessKeyId=022QF06E7MXBSH9DHM02
AWSSecretKey=kWcrlUX5JEDGM/LtmEENI/aVmYvHNif5zB+d9+ct

Make sure to chmod 0600 aws-credentials. To ensure it’s working, run:

$ ./ses-get-stats.pl -k aws-credentials -s

If it doesn’t return anything it should be working correctly.

Next, you need to add at least one verified email address:

$ ./ses-verify-email-address.pl -k aws-credentials --verbose -v support@example.com

Amazon will send a verification message to support@example.com with a link you need to click to verify the address. Once you click, it’s verified. It’s important to note that initially your account will only be able to send email to verified addresses. According to this thread, you need to submit a production access request to send to unverified To: addresses. I did this and got my “approval” email about 30 minutes later.

To send a test email:

$ ./ses-send-email.pl --verbose -k aws-credentials -s "Test from SES" -f support@example.com evan@example.com
This is a test message from SES.

(Press ctrl-D to send.)

The next step is integrating the script with sendmail/postfix. The first thing I did was move my scripts to /opt/ (out of /root/) and attempt to run them with absolute pathnames (rather than ./ses-send-email.pl) and I got perl @INC errors:

[root@web2 ~]$ mv amazon-email/ /opt/
[root@web2 ~]$ /opt/ses-get-stats.pl -k aws-credentials -s
-bash: /opt/ses-get-stats.pl: No such file or directory
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k aws-credentials -s
Can't locate SES.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /opt/amazon-email/ses-get-stats.pl line 23.
BEGIN failed--compilation aborted at /opt/amazon-email/ses-get-stats.pl line 23.

The problem is that SES.pm isn’t in perl’s include path. To solve this, I tried adding the directory to the PERL5LIB environment var:

[root@web2 amazon-email]$ PERL5LIB=/opt/amazon-email/
[root@web2 amazon-email]$ echo $PERL5LIB
/opt/amazon-email/
[root@web2 amazon-email]$ cd
[root@web2 ~]$ export PERL5LIB
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k aws-credentials -s
Cannot open credentials file . at /opt/amazon-email//SES.pm line 54.
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k /opt/amazon-email/aws-credentials -s
Timestamp               DeliveryAttempts        Rejects Bounces Complaints
2011-04-27T20:27:00Z    1                       0       0       0
[root@web2 ~]$

This worked for setting all users’ PERL5LIB … but didn’t allow postfix to send the message. After a couple more attempts at doing this “the right way,” I just ended up dropping a symlink to SES.pm in /usr/lib/perl5/site_perl and the @INC error went away.

After following Amazon’s instructions for editing main.cf and master.cf, I still was unable to send mail through Postfix, even though I could send directly through the perl scripts. I kept getting this error:

Apr 28 11:26:32 web2 postfix/pipe[27226]: A2AD33C9A6: to=, relay=aws-email, delay=0.35, delays=0.01/0/0/0.34, dsn=5.3.0, status=bounced (Command died with status 1: "/opt/amazon-email/ses-send-email.pl". Command output: Missing final '@domain' )

Google led me to this blog post which led me to this other blog post which illuminated the problem: apparently the Postfix pipe macro ${sender} uses the user@hostname of the mail sender. Since the hostname of an EC2 machine is usually something crazy like dom11-22-33-44.internal, this is not likely a validated sending email address. So the solution proposed by Ben Simon was to create a regex to map user@internal to user@realdomain.com and have postfix map everything. This didn’t work for me or the bashbang.com guys, who changed it to map from user@internal to validuser@realdomain.com. I found that you can eliminate the need for the mapping entirely by changing the master.cf entry to this:

  flags=R user=mailuser argv=/opt/amazon-email/ses-send-email.pl -r -k /opt/amazon-email/aws-credentials -e https://email.us-east-1.amazonaws.com -f support@example.com ${recipient}

The only difference between the above line and Amazon’s suggestion is that this replaces “-f ${sender}” with “support@example.com” which is a validated email address.

After this I was able to relay email successfully through SES. Whew!

Update 5/26/2011: We’ve been relaying through SES without issues for a few weeks now. I recently ran ses-get-stats.pl to see how many messages we’re actually sending and it’s a lot lower than expected. I’m still glad we moved to SES though, since it has no hard cap like Google Apps does:

$ /opt/amazon-email/ses-get-stats.pl -k /opt/amazon-email/aws-credentials -q
SentLast24Hours Max24HourSend   MaxSendRate
317             10000           5

Relaying through Google Apps using Sendmail to bypass EC2 spam blockage

Update 3 May 2011: I’ve subsequently modified our EC2 systems to relay SMTP mail through Amazon’s SES which doesn’t have the 500 messages per day limit that Google Apps does.

A few months ago I moved a site into EC2. I didn’t want to move the existing IMAP server (ugh) so I moved the email to Google Apps. There are only about 10 mailboxes so we went with “Standard” edition (free). Once we completed the move to EC2 we discovered that emails from our webserver were bouncing due to our EC2 IP address being listed in a spam RBL. This sucked, so I looked into relaying the mail from the EC2 webserver through our Google Apps account. Fortunately this turned out to be pretty easy.

This wiki page on scalix.com has a procedure for setting up SMTP relaying in Ubuntu with TLS & auth. I’m not running Ubuntu so the paths were different but it was basically the same procedure:

  • Create the file /etc/mail/client-info with these contents: AuthInfo:smtp.gmail.com "U:bounces@example.com" "I:bounces@example.com" "P:superpassword", where “example.com” is your Google Apps domain, “bounces” is a valid account, and the password is the account’s password. Mail relayed with these credentials will show “bounces@example.com” in the From: field of the message.
  • In /etc/mail, run makemap hash client-info < client-info
  • Edit /etc/mail/sendmail.mc, adding or uncommenting these lines:
    define(`SMART_HOST', `smtp.gmail.com')dnl
    define(`confAUTH_MECHANISMS', `EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
    FEATURE(`authinfo', `hash /etc/mail/client-info')dnl
    
  • Recompile sendmail.cf: m4 sendmail.mc > sendmail.cf . I got this error: “/etc/mail/sendmail.mc:10: m4: Cannot open /usr/share/sendmail-cf/m4/cf.m4” when running the command, but I resolved it by doing yum install sendmail-cf
  • Restart sendmail.

Once this was done I sent myself a test message from the command line and received it; I checked the SMTP headers and sure enough it went through Google’s mail server. One nice side effect is that all the mail sent by the webserver appears in the “Sent” folder for the Google Apps username provided in the client-info file. Hopefully this will resolve the spam issues, since the mail is now coming from Google’s IP block.

Amazon EC2 – ext3 mkfs takes 30+ minutes?

I’ve been playing around with Amazon EC2 for a new project I’m working on and so far I’m really impressed. One thing I’ve noticed, however, is that it takes forever to create an ext3 filesystem on a new volume. For example, the below command took over 30 minutes to create the filesystem on a 300 GB volume:

# mke2fs -j -m0 /dev/sdf1
mke2fs 1.40.4 (31-Dec-2007)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
39321600 inodes, 78642183 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
2400 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

It took about 30 seconds do do everything up to the writing of the superblocks. Not sure why this takes so long, but it’s happened for every EBS volume I’ve formatted ext3. Annoying. Initially I thought it was hanging, and ended up terminating an instance that wouldn’t shutdown or let me cancel the operation. The terminated instance is still being displayed in the UI with a status of “terminated” and I can’t find any way to remove it from the list.