How (the hell) do you set up Splunk Cloud on Linux?

This took me way longer than I would’ve thought, mostly due to horrible documentation. Here’s my TL;DR version:

  1. Sign up for Splunk Cloud
  2. Download and install the forwarder binary from here.
  3. Log in here and note the URL of your Splunk instance:

    splunk_cloud
    In the above picture, assume the URL is https://prd-p-jxxxxxxxx.splunk6.splunktrial.com.

  4. Make sure your instances can connect to port tcp/9997 on your input host. Your input host is the hostname from above with “input-” prepended to it. So in our example, the input host is input-prd-p-jxxxxxxxx.splunk6.splunktrial.com. To ensure you can connect, try telnet input-prd-p-jxxxxxxxx.splunk6.splunktrial.com 9997. If it can’t connect you may need to adjust your firewall rules / Security groups to allow outbound tcp/9997

Below are the actual commands I used to get data into our Splunk Cloud trial instance:

$ curl -O http://download.splunk.com/products/splunk/releases/6.2.0/universalforwarder/linux/splunkforwarder-6.2.0-237341-linux-2.6-amd64.deb
$ sudo dpkg -i splunkforwarder-6.2.0-237341-linux-2.6-amd64.deb
$ sudo /opt/splunkforwarder/bin/splunk add forward-server input-prd-p-jxxxxxxxx.splunk6.splunktrial.com:9997
This appears to be your first time running this version of Splunk.
Added forwarding to: input-prd-p-jxxxxxxxx.splunk6.splunktrial.com:9997.
$ sudo /opt/splunkforwarder/bin/splunk add monitor '/var/log/postgresql/*.log'
Added monitor of '/var/log/postgresql/*.log'.
$ sudo /opt/splunkforwarder/bin/splunk list forward-server
Splunk username: admin
Password:
Active forwards:
	input-prd-p-jxxxxxxxx.splunk6.splunktrial.com:9997
Configured but inactive forwards:
	None
$ sudo /opt/splunkforwarder/bin/splunk list monitor
Monitored Directories:
		[No directories monitored.]
Monitored Files:
	/var/log/postgresql/*.log
$ sudo /opt/splunkforwarder/bin/splunk restart

Can I create an EC2 MySQL slave to an RDS master?

No.

Here’s what happens if you try:

mysql> grant replication slave on *.* to 'ec2-slave'@'%';
ERROR 1045 (28000): Access denied for user 'rds_root'@'%' (using password: YES)
mysql> update mysql.user set Repl_slave_priv='Y' WHERE user='rds_root' AND host='%';
ERROR 1054 (42S22): Unknown column 'ERROR (RDS): REPLICA SLAVE PRIVILEGE CANNOT BE GRANTED OR MAINTAINED' in 'field list'
mysql>

Note: this is for MySQL 5.5, which is unfortunately what I’m currently stuck with.

The m3.medium is terrible

I’ve been doing some testing of various instance types in our staging environment, originally just to see if Amazon’s t2.* line of instances is usable in a real-world scenario. In the end, I found that not only are the t2.mediums viable for what I want them to do, but they’re far better suited than the m3.medium, which I wouldn’t use for anything that you ever expect to reach any load.

Here are the conditions for my test:

  • Rails application (unicorn) fronted by nginx.
  • The number of unicorn processes is controlled by chef, currently set to (CPU count * 2), so a 2 CPU instance has 4 unicorn workers.
  • All instances are running Ubuntu 14.04 LTS (AMI ami-864d84ee for HVM, ami-018c9568 for paravirtual) with kernel 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64.
  • The test used loader.io to simulate 65 concurrent clients hitting the API (adding products to cart) as fast as possible for 600 seconds (10 minutes).
  • The instances were all behind an Elastic Load Balancer, which routes traffic based on its own algorithm (supposedly the instances with the lowest CPU always gets the next request).

The below charts summarize the findings.

average nginx $request_time
average nginx $request_time

This chart shows each server’s performance as reported by nginx. The values are the average time to service each request and the standard deviation. While I expected the m3.large to outperform the m3.medium, I didn’t expect the difference to be so dramatic. The performance of the t2.medium is the real surprise, however.

#	_sourcehost	_avg	_stddev
1	m3.large	6.30324	3.84421
2	m3.medium	15.88136	9.29829
3	t2.medium	4.80078	2.71403

These charts show the CPU activity for each instance during the test (data as per CopperEgg).

m3.large
m3.large
t2.medium
t2.medium
m3.medium
m3.medium

The m3.medium has a huge amount of CPU steal, which I’m guessing accounts for its horrible performance. Anecdotally, in my own experience m3.medium far more prone to CPU steal than other instance types. Moving from m3.medium to c3.large (essentially the same instance with 2 cpus) eliminates the CPU steal issue. However, since the t2.medium performs as well as the c3.large or m3.large and costs half of the c3.large (or nearly 1/3 of the m3.large) I’m going to try running most of my backend fleet on t2.medium.

I haven’t mentioned the credits system the t2.* instances use for burstable performance, and that’s because my tests didn’t make much of a dent in the credit balance for these instances. The load test was 100x what I expect to see in normal traffic patterns, so the t2.medium with burstable performance seems like an ideal candidate. I might add a couple c3.large to the mix as a backstop in case the credits were depleted, but I don’t think that’s a major risk – especially not in our staging environment.

Edit: I didn’t include the numbers, but the performance seemed to be the consistent whether on hvm or paravirtual instances.

Setting hostname in an EC2 instance from the name tag

# pip install awscli
# HOSTNAME=`aws ec2 describe-tags --region us-east-1 --filters Name=resource-id,Values=`curl http://169.254.169.254/latest/meta-data/instance-id 2> /dev/null` Name=key,Values=Name --output text --query 'Tags[*].Value'`
# hostname $HOSTNAME
# hostname > /etc/hostname

Load balancing in EC2 with Nginx and HAProxy

We wanted to setup a loadbalanced web cluster in AWS for expansion. My first inclination was to use ELB for this, but I soon learned that ELB doesn’t let you allocate a static IP, requiring you to refer to it only by DNS name. This would be OK except for the fact that our current DNS provider, Dyn, requires IP addresses when using their GSLB (geo-based load balancer) service.

Rather than let this derail the whole project, I decided to look into the software options available for loadbalancing in EC2. I’ve been a fan of hardware load balancers for a while, sort of looking down at software-based solutions without any real rationale, but in this case I really had no choice so I figured I’d give it a try.

My first stop was Nginx. I’ve used it before in a reverse-proxy scenario and like it. The problem I had with it was that it doesn’t support active polling of nodes – the ability to send requests to the webserver and mark the node as up or down based on the response. As far as I can tell, using multiple upstream servers in Nginx allows you to specify max_fails and fail_timeout, however a “fail” is determined when a real request comes in. I don’t want to risk losing a real request – I like active polling.
Continue reading “Load balancing in EC2 with Nginx and HAProxy”

Installing Sun (Oracle) JDK 1.5 on an EC2 instance

I’m currently working on moving a Tomcat-based application into EC2. The code was written for Java 5.0. While Java 6 would probably work, I’d like to keep everything as “same” as possible, since EC2 presents its own challenges. I spun up a couple of t1.micro instances and copied everything over, including the Java 5 JDK, jdk-1_5_0_22-linux-amd64.rpm. Installing from RPM was easy, but the EC2 instance defaults to using OpenJDK 1.6:

[root@ec2 ~]# java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.10) (amazon-52.1.9.10.40.amzn1-x86_64)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

There were a couple of things I had to do to get the system to accept the Sun JDK as its “real” java.

Alternatives

Red Hat’s “alternatives” system is designed to allow a system to have multiple versions of a program installed and make it easy to choose which one you want to run. Unfortunately I’ve found the syntax a bit strange and always have to Google it, so I figured I’d document it here for posterity.

So here’s the default:

[root@ec2 ~]# alternatives --config java

There is 1 program that provides 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number: 

Here’s how to add Sun java, assuming the java binary is in /usr/java/jdk1.5.0_22/jre/bin/java (where the RPM puts it).

[root@ec2 ~]# alternatives --install /usr/bin/java java /usr/java/jdk1.5.0_22/jre/bin/java 1
[root@ec2 ~]# alternatives --config java
There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
   2           /usr/java/jdk1.5.0_22/jre/bin/java

Enter to keep the current selection[+], or type selection number: 2
[root@ec2 ~]# java -version
java version "1.5.0_22"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_22-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_22-b03, mixed mode)

Yay! Unfortunately this doesn’t help with the other problem I had with Tomcat, which was that EC2 instances set the JAVA_HOME var to OpenJDK as well (/usr/lib/jvm/jre). Fortunately this is an easy fix as well.

Setting JAVA_HOME

The JAVA_HOME var is set in /etc/profile.d/aws-apitools-common.sh. Comment out this line:

export JAVA_HOME=/usr/lib/jvm/jre

Create a new file, /etc/profile.d/sun-java.sh, and put this in it:

export JAVA_HOME=/usr/java/jdk1.5.0_22/jre

Also in that file I added the following to instruct the JVM to process all dates in America/New_York, since that’s the timezone all of our other servers use, and it makes reading log files easier when all dates are in the same tz:

export TZ=America/New_York

(I found I had to do this even after pointing /etc/localtime to the correct zoneinfo – Java was stuck on UTC even after the rest of the system was using America/New_York.)

Integrating Amazon Simple Email Service with postfix for SMTP smarthost relaying.

So, we’ve outgrown the 500 outbound messages/day limit imposed by Google Apps’s Standard tier. A wise friend suggested SendGrid, but I figured it was worth looking into what options Amazon provides. I found SES and am in the process of setting it up. Hopefully I can set it up as a drop-in replacement, obviating the need for code changes to use it. SES is attractive for us because:

Free Tier
If you are an Amazon EC2 user, you can get started with Amazon SES for free. You can send 2,000 messages for free each day when you call Amazon SES from an Amazon EC2 instance directly or through AWS Elastic Beanstalk. Many applications are able to operate entirely within this free tier limit.

Note: Data transfer fees still apply. For new AWS customers eligible for the AWS free usage tier, you receive 15 GB of data transfer in and 15 GB of data transfer out aggregated across all AWS services, which should cover your Amazon SES data transfer costs. In addition, all AWS customers receive 1GB of free data transfer per month.

Free to try? Sounds good.

After signing up, the first thing I did was download the Perl scripts. Create a credentials file with your AWS access key ID and Secret Key (credentials can be found here when logged in). The credentials file (aws-credentials) should look like this:

AWSAccessKeyId=022QF06E7MXBSH9DHM02
AWSSecretKey=kWcrlUX5JEDGM/LtmEENI/aVmYvHNif5zB+d9+ct

Make sure to chmod 0600 aws-credentials. To ensure it’s working, run:

$ ./ses-get-stats.pl -k aws-credentials -s

If it doesn’t return anything it should be working correctly.

Next, you need to add at least one verified email address:

$ ./ses-verify-email-address.pl -k aws-credentials --verbose -v support@example.com

Amazon will send a verification message to support@example.com with a link you need to click to verify the address. Once you click, it’s verified. It’s important to note that initially your account will only be able to send email to verified addresses. According to this thread, you need to submit a production access request to send to unverified To: addresses. I did this and got my “approval” email about 30 minutes later.

To send a test email:

$ ./ses-send-email.pl --verbose -k aws-credentials -s "Test from SES" -f support@example.com evan@example.com
This is a test message from SES.

(Press ctrl-D to send.)

The next step is integrating the script with sendmail/postfix. The first thing I did was move my scripts to /opt/ (out of /root/) and attempt to run them with absolute pathnames (rather than ./ses-send-email.pl) and I got perl @INC errors:

[root@web2 ~]$ mv amazon-email/ /opt/
[root@web2 ~]$ /opt/ses-get-stats.pl -k aws-credentials -s
-bash: /opt/ses-get-stats.pl: No such file or directory
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k aws-credentials -s
Can't locate SES.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /opt/amazon-email/ses-get-stats.pl line 23.
BEGIN failed--compilation aborted at /opt/amazon-email/ses-get-stats.pl line 23.

The problem is that SES.pm isn’t in perl’s include path. To solve this, I tried adding the directory to the PERL5LIB environment var:

[root@web2 amazon-email]$ PERL5LIB=/opt/amazon-email/
[root@web2 amazon-email]$ echo $PERL5LIB
/opt/amazon-email/
[root@web2 amazon-email]$ cd
[root@web2 ~]$ export PERL5LIB
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k aws-credentials -s
Cannot open credentials file . at /opt/amazon-email//SES.pm line 54.
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k /opt/amazon-email/aws-credentials -s
Timestamp               DeliveryAttempts        Rejects Bounces Complaints
2011-04-27T20:27:00Z    1                       0       0       0
[root@web2 ~]$

This worked for setting all users’ PERL5LIB … but didn’t allow postfix to send the message. After a couple more attempts at doing this “the right way,” I just ended up dropping a symlink to SES.pm in /usr/lib/perl5/site_perl and the @INC error went away.

After following Amazon’s instructions for editing main.cf and master.cf, I still was unable to send mail through Postfix, even though I could send directly through the perl scripts. I kept getting this error:

Apr 28 11:26:32 web2 postfix/pipe[27226]: A2AD33C9A6: to=, relay=aws-email, delay=0.35, delays=0.01/0/0/0.34, dsn=5.3.0, status=bounced (Command died with status 1: "/opt/amazon-email/ses-send-email.pl". Command output: Missing final '@domain' )

Google led me to this blog post which led me to this other blog post which illuminated the problem: apparently the Postfix pipe macro ${sender} uses the user@hostname of the mail sender. Since the hostname of an EC2 machine is usually something crazy like dom11-22-33-44.internal, this is not likely a validated sending email address. So the solution proposed by Ben Simon was to create a regex to map user@internal to user@realdomain.com and have postfix map everything. This didn’t work for me or the bashbang.com guys, who changed it to map from user@internal to validuser@realdomain.com. I found that you can eliminate the need for the mapping entirely by changing the master.cf entry to this:

  flags=R user=mailuser argv=/opt/amazon-email/ses-send-email.pl -r -k /opt/amazon-email/aws-credentials -e https://email.us-east-1.amazonaws.com -f support@example.com ${recipient}

The only difference between the above line and Amazon’s suggestion is that this replaces “-f ${sender}” with “support@example.com” which is a validated email address.

After this I was able to relay email successfully through SES. Whew!

Update 5/26/2011: We’ve been relaying through SES without issues for a few weeks now. I recently ran ses-get-stats.pl to see how many messages we’re actually sending and it’s a lot lower than expected. I’m still glad we moved to SES though, since it has no hard cap like Google Apps does:

$ /opt/amazon-email/ses-get-stats.pl -k /opt/amazon-email/aws-credentials -q
SentLast24Hours Max24HourSend   MaxSendRate
317             10000           5

Relaying through Google Apps using Sendmail to bypass EC2 spam blockage

Update 3 May 2011: I’ve subsequently modified our EC2 systems to relay SMTP mail through Amazon’s SES which doesn’t have the 500 messages per day limit that Google Apps does.

A few months ago I moved a site into EC2. I didn’t want to move the existing IMAP server (ugh) so I moved the email to Google Apps. There are only about 10 mailboxes so we went with “Standard” edition (free). Once we completed the move to EC2 we discovered that emails from our webserver were bouncing due to our EC2 IP address being listed in a spam RBL. This sucked, so I looked into relaying the mail from the EC2 webserver through our Google Apps account. Fortunately this turned out to be pretty easy.

This wiki page on scalix.com has a procedure for setting up SMTP relaying in Ubuntu with TLS & auth. I’m not running Ubuntu so the paths were different but it was basically the same procedure:

  • Create the file /etc/mail/client-info with these contents: AuthInfo:smtp.gmail.com "U:bounces@example.com" "I:bounces@example.com" "P:superpassword", where “example.com” is your Google Apps domain, “bounces” is a valid account, and the password is the account’s password. Mail relayed with these credentials will show “bounces@example.com” in the From: field of the message.
  • In /etc/mail, run makemap hash client-info < client-info
  • Edit /etc/mail/sendmail.mc, adding or uncommenting these lines:
    define(`SMART_HOST', `smtp.gmail.com')dnl
    define(`confAUTH_MECHANISMS', `EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
    FEATURE(`authinfo', `hash /etc/mail/client-info')dnl
    
  • Recompile sendmail.cf: m4 sendmail.mc > sendmail.cf . I got this error: “/etc/mail/sendmail.mc:10: m4: Cannot open /usr/share/sendmail-cf/m4/cf.m4” when running the command, but I resolved it by doing yum install sendmail-cf
  • Restart sendmail.

Once this was done I sent myself a test message from the command line and received it; I checked the SMTP headers and sure enough it went through Google’s mail server. One nice side effect is that all the mail sent by the webserver appears in the “Sent” folder for the Google Apps username provided in the client-info file. Hopefully this will resolve the spam issues, since the mail is now coming from Google’s IP block.

Amazon EC2 – ext3 mkfs takes 30+ minutes?

I’ve been playing around with Amazon EC2 for a new project I’m working on and so far I’m really impressed. One thing I’ve noticed, however, is that it takes forever to create an ext3 filesystem on a new volume. For example, the below command took over 30 minutes to create the filesystem on a 300 GB volume:

# mke2fs -j -m0 /dev/sdf1
mke2fs 1.40.4 (31-Dec-2007)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
39321600 inodes, 78642183 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
2400 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

It took about 30 seconds do do everything up to the writing of the superblocks. Not sure why this takes so long, but it’s happened for every EBS volume I’ve formatted ext3. Annoying. Initially I thought it was hanging, and ended up terminating an instance that wouldn’t shutdown or let me cancel the operation. The terminated instance is still being displayed in the UI with a status of “terminated” and I can’t find any way to remove it from the list.