Reorganizing photos in 1 line with exiftool

A few years ago I wrote a utility in Java to find all JPG files in a directory and move them into a date-based directory structure like /YYYY/MM/DD/ based on the date the photo was taken, extracted from the EXIF metadata in the file. Well, apparently that was a huge waste of time, as I just discovered that exiftool, an awesome Perl utility I’ve used for years to edit/extract metadata on the command line, can also do this natively. So my entire program can be replaced with this simple command:

$ exiftool -r '-FileName<CreateDate' -d /targetDir/%Y/%Y-%m/%Y-%m-%d/%Y-%m-%d.%%f.%%e /media/EOS_DIGITAL/

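This will move the files directly off the SD card mounted at /media/EOS_DIGITAL/ into the proper structure in /targetDir/. Writing the FileName tag renames (moves) each file; add -o . to the command if you want exiftool to copy instead and leave the originals on the card.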
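To make the -d format string less opaque, here is a minimal shell sketch of how it expands for a single photo. exiftool does all of this itself; the function name exif_target_path and the sample file IMG_1234.JPG are made up purely for illustration.

```shell
# Hypothetical helper: mimics how the -d format in the command above expands.
# %Y, %m and %d come from CreateDate; %%f and %%e become the original
# file name and extension.
exif_target_path() {
    create_date=$1            # EXIF-style timestamp: "YYYY:MM:DD HH:MM:SS"
    file=$2                   # path of the source photo

    y=${create_date%%:*}      # year: everything before the first colon
    rest=${create_date#*:}
    m=${rest%%:*}             # month
    rest=${rest#*:}
    d=${rest%% *}             # day: everything before the first space

    base=${file##*/}          # strip the directory part
    name=${base%.*}           # what exiftool calls %f
    ext=${base##*.}           # what exiftool calls %e

    printf '/targetDir/%s/%s-%s/%s-%s-%s/%s-%s-%s.%s.%s\n' \
        "$y" "$y" "$m" "$y" "$m" "$d" "$y" "$m" "$d" "$name" "$ext"
}

exif_target_path '2011:11:17 10:30:00' /media/EOS_DIGITAL/DCIM/100CANON/IMG_1234.JPG
# prints /targetDir/2011/2011-11/2011-11-17/2011-11-17.IMG_1234.JPG
```

So every photo ends up in a per-day directory, with the date prefixed to its original name.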

Graphing SSH dictionary attacks with HighCharts

After my 10-year-old basement Linux server died this week from a power outage, I took the sad step of giving up on it. It’s died before and I’ve patched it back together with a new power supply here or an addon PCI SATA card there, but I finally decided to throw in the towel since I had a newer old computer that had been idle for several years. The one that died was an Athlon K7 750 MHz with 512 MB ram. The new one is an Athlon 2 GHz (3200+) with 1 gig. For my uses, specs don’t really matter that much, but it’s nice to have more power for free.

I put CentOS 6 on it and configured Samba and copied all the data off the old machine and was back up and running within a few hours. Since I forward ports through my FiOS router to this box I did my standard lockdown procedure, including adding myself to the AllowUsers in sshd_config. Afterwards I took a look in /var/log/secure and saw the typical flood of dictionary attacks trying to get in as root or bob or tfeldman or jweisz. I have iptables configured to rate-limit SSH connections to 2 per 5 seconds per IP so the box doesn’t get DoSed out of existence, but some stuff does make it through to sshd.
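For anyone curious what that rate limit might look like, one common approach uses the iptables "recent" module. This is only a sketch under assumptions, not a copy of my actual ruleset: the list name SSH, the INPUT chain, and port 22 are all assumed. The helper (a made-up name) prints the rules instead of applying them, so you can review them before running anything as root.

```shell
# Hypothetical helper: prints (does not apply) iptables rules that allow at
# most $1 new SSH connections per $2 seconds from a single IP.
# Assumptions: sshd listens on port 22, rules go in the INPUT chain, and the
# "recent" list is named SSH.
print_ssh_ratelimit_rules() {
    max_new=${1:-2}                 # new connections allowed per window
    window=${2:-5}                  # window length in seconds
    hitcount=$((max_new + 1))       # the connection that gets dropped
    cat <<EOF
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --set --name SSH
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds $window --hitcount $hitcount --name SSH -j DROP
EOF
}

print_ssh_ratelimit_rules 2 5
```

The first rule records every new connection's source IP; the second drops the connection once that IP has shown up three times within five seconds, which leaves two per window.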

Looking through /var/log/secure, I got to thinking it would be interesting if there was some way to visualize the attacks in a handy graph. Then I remembered, oh, wait, I can do that.

I wrote a perl script to parse out the attacks from /var/log/secure and insert them into a Postgres DB. This turned out to be pretty easy. Then I thought it would be more interesting to tie the IP of each attack to its originating country. I’ve used MaxMind’s GeoIP DB pretty extensively before, but I was looking for something free. That’s when I remembered that MaxMind has a free GeoIP DB: GeoLiteCity. I grabbed it and yum-installed the Perl lib and added the geo data to the attack DB. Rather than worry about normalizing the schema I just shoved the info into the same table. Life is easier this way, and it’s just a for-fun project.
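The parsing step is the easy part. As a rough illustration (this is not parse-secure.pl, just an awk sketch with a made-up function name and made-up sample log lines), pulling the timestamp, username and source IP out of sshd's "Failed password" lines looks something like this:

```shell
# Hypothetical helper: reads /var/log/secure-style lines on stdin and prints
# "timestamp<TAB>user<TAB>ip" for every failed SSH password attempt.
# Assumes the usual sshd message shape:
#   ... sshd[PID]: Failed password for [invalid user] NAME from IP port N ssh2
parse_failed_logins() {
    awk '/sshd\[[0-9]+\]: Failed password for / {
        for (i = 1; i <= NF; i++) {
            # The user is the word before "from", the IP is the word after it.
            if ($i == "from" && $(i + 2) == "port") {
                print $1 " " $2 " " $3 "\t" $(i - 1) "\t" $(i + 1)
                break
            }
        }
    }'
}

# Sample input (invented for the example):
parse_failed_logins <<'EOF'
Nov 17 03:12:45 lunix2011 sshd[1234]: Failed password for invalid user bob from 1.2.3.4 port 51234 ssh2
Nov 17 03:12:47 lunix2011 sshd[1235]: Failed password for root from 5.6.7.8 port 40022 ssh2
Nov 17 03:12:50 lunix2011 sshd[1236]: Accepted password for evan from 10.0.0.2 port 4000 ssh2
EOF
```

Only the two failed attempts are printed; the successful login is ignored. From there it is a matter of looking up each IP and inserting a row.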

So I got that all working and parsed it against the existing /var/log/secures via

[root@lunix2011 ~]# zcat /var/log/secure-20111117.gz | perl parse-secure.pl 

I wrote ssh.php to see what’s in the table:

[Screenshot: ssh.php list of hacking attempts]

So now that the data was all in place, time to move on to the graphs, which is what I really wanted to do. Last time I wanted to graph data programmatically I used JPGraph, which does everything in PHP and is super versatile. But I wanted something… cooler. Maybe something interactive. A little Googling turned up Highcharts which is absolutely awesome, and does everything in JavaScript. I basically modified some of their example charts and pumped my data into them and got the charts below.

Pie chart of attacks grouped by country for the past 30 days:

[Chart: pie chart by country]

Bar graph of attacks per day:

[Chart: bar graph of daily attacks]

So, that’s that. Code is in github if anyone wants to play around with it. I’ve cronned parse-secure.pl to run every 5 minutes so the data gets updated automatically.

Logging RT username in Apache access_log

RT has its own internal accounting & tracking system for logging activity, but I was interested in even more granular stuff, like seeing who looked at which tickets. I figured it wouldn’t be that hard to log this in Apache. Well, I was kind of right, in that it wasn’t “hard,” but it took me a long time to find the right place to do it. I did finally get it though.

Integrating Amazon Simple Email Service with postfix for SMTP smarthost relaying.

So, we’ve outgrown the 500 outbound messages/day limit imposed by Google Apps’s Standard tier. A wise friend suggested SendGrid, but I figured it was worth looking into what options Amazon provides. I found SES and am in the process of setting it up. Hopefully I can set it up as a drop-in replacement, obviating the need for code changes to use it. SES is attractive for us because:

Free Tier
If you are an Amazon EC2 user, you can get started with Amazon SES for free. You can send 2,000 messages for free each day when you call Amazon SES from an Amazon EC2 instance directly or through AWS Elastic Beanstalk. Many applications are able to operate entirely within this free tier limit.

Note: Data transfer fees still apply. For new AWS customers eligible for the AWS free usage tier, you receive 15 GB of data transfer in and 15 GB of data transfer out aggregated across all AWS services, which should cover your Amazon SES data transfer costs. In addition, all AWS customers receive 1GB of free data transfer per month.

Free to try? Sounds good.

After signing up, the first thing I did was download the Perl scripts. Create a credentials file with your AWS access key ID and Secret Key (credentials can be found here when logged in). The credentials file (aws-credentials) should look like this:

AWSAccessKeyId=022QF06E7MXBSH9DHM02
AWSSecretKey=kWcrlUX5JEDGM/LtmEENI/aVmYvHNif5zB+d9+ct

Make sure to chmod 0600 aws-credentials. To ensure it’s working, run:

$ ./ses-get-stats.pl -k aws-credentials -s

If it doesn’t return anything it should be working correctly.

Next, you need to add at least one verified email address:

$ ./ses-verify-email-address.pl -k aws-credentials --verbose -v support@example.com

Amazon will send a verification message to support@example.com with a link you need to click to verify the address. Once you click, it’s verified. It’s important to note that initially your account will only be able to send email to verified addresses. According to this thread, you need to submit a production access request to send to unverified To: addresses. I did this and got my “approval” email about 30 minutes later.

To send a test email:

$ ./ses-send-email.pl --verbose -k aws-credentials -s "Test from SES" -f support@example.com evan@example.com
This is a test message from SES.

(Press ctrl-D to send.)

The next step is integrating the script with sendmail/postfix. The first thing I did was move my scripts to /opt/ (out of /root/) and attempt to run them with absolute pathnames (rather than ./ses-send-email.pl) and I got perl @INC errors:

[root@web2 ~]$ mv amazon-email/ /opt/
[root@web2 ~]$ /opt/ses-get-stats.pl -k aws-credentials -s
-bash: /opt/ses-get-stats.pl: No such file or directory
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k aws-credentials -s
Can't locate SES.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /opt/amazon-email/ses-get-stats.pl line 23.
BEGIN failed--compilation aborted at /opt/amazon-email/ses-get-stats.pl line 23.

The problem is that SES.pm isn’t in perl’s include path. To solve this, I tried adding the directory to the PERL5LIB environment var:

[root@web2 amazon-email]$ PERL5LIB=/opt/amazon-email/
[root@web2 amazon-email]$ echo $PERL5LIB
/opt/amazon-email/
[root@web2 amazon-email]$ cd
[root@web2 ~]$ export PERL5LIB
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k aws-credentials -s
Cannot open credentials file . at /opt/amazon-email//SES.pm line 54.
[root@web2 ~]$ /opt/amazon-email/ses-get-stats.pl -k /opt/amazon-email/aws-credentials -s
Timestamp               DeliveryAttempts        Rejects Bounces Complaints
2011-04-27T20:27:00Z    1                       0       0       0
[root@web2 ~]$

This worked for setting all users’ PERL5LIB … but didn’t allow postfix to send the message, presumably because Postfix runs its pipe commands with its own stripped-down environment rather than a login shell’s. After a couple more attempts at doing this “the right way,” I just ended up dropping a symlink to SES.pm in /usr/lib/perl5/site_perl and the @INC error went away.

After following Amazon’s instructions for editing main.cf and master.cf, I still was unable to send mail through Postfix, even though I could send directly through the perl scripts. I kept getting this error:

Apr 28 11:26:32 web2 postfix/pipe[27226]: A2AD33C9A6: to=, relay=aws-email, delay=0.35, delays=0.01/0/0/0.34, dsn=5.3.0, status=bounced (Command died with status 1: "/opt/amazon-email/ses-send-email.pl". Command output: Missing final '@domain' )

Google led me to this blog post which led me to this other blog post which illuminated the problem: apparently the Postfix pipe macro ${sender} uses the user@hostname of the mail sender. Since the hostname of an EC2 machine is usually something crazy like dom11-22-33-44.internal, this is not likely a validated sending email address. So the solution proposed by Ben Simon was to create a regex to map user@internal to user@realdomain.com and have postfix map everything. This didn’t work for me or the bashbang.com guys, who changed it to map from user@internal to validuser@realdomain.com. I found that you can eliminate the need for the mapping entirely by changing the master.cf entry to this:

  flags=R user=mailuser argv=/opt/amazon-email/ses-send-email.pl -r -k /opt/amazon-email/aws-credentials -e https://email.us-east-1.amazonaws.com -f support@example.com ${recipient}

The only difference between the above line and Amazon’s suggestion is that this replaces “-f ${sender}” with “-f support@example.com”, which is a validated email address.

After this I was able to relay email successfully through SES. Whew!

Update 5/26/2011: We’ve been relaying through SES without issues for a few weeks now. I recently ran ses-get-stats.pl to see how many messages we’re actually sending and it’s a lot lower than expected. I’m still glad we moved to SES though, since it has no hard cap like Google Apps does:

$ /opt/amazon-email/ses-get-stats.pl -k /opt/amazon-email/aws-credentials -q
SentLast24Hours Max24HourSend   MaxSendRate
317             10000           5

Blocking comment spammers by IP

I use Akismet to block comment spam, but it still annoys me that it even exists. Last night I put a simple IP ban into my httpd config. But who to block?

I used grep & Perl to get a rough guess of which IPs were submitting the most comments (working on the assumption that one IP address submits many spam comments). It took me about 20 minutes to write this mess but it does what I wanted to do:

[root@lunix ~]# zgrep POST /var/log/httpd/evanhoffman-access_log-201008??.gz | grep comment | perl -ne 'chomp; $_ =~ m/(?:\d{1,3}\.){3}\d{1,3}/; print "$&\n";' | perl -e '%a = (); while (<>) { chomp; $a{$_} += 1; } while (my ($key, $value) = each (%a)) { if ($value > 1) { print "$value\t=>\t$key\n";}}'
2 => 218.6.9.140
180 => 91.201.66.34
2 => 213.5.67.41
2 => 188.187.102.74
[root@lunix ~]#

That’s pretty hard to read. Here’s a quick explanation of each piece:

zgrep POST /var/log/httpd/evanhoffman-access_log-201008??.gz

Use zgrep to search for the string “POST” in all of the gzipped Apache logs for August. Pipe the results (the matching lines) to the next part:

grep comment

grep for the string “comment”. This isn’t really scientific, but I feel safe in assuming that if “POST” and “comment” both appear in the HTTP request, it’s probably someone posting a comment. Pipe the matches to…

perl -ne 'chomp; $_ =~ m/(?:\d{1,3}\.){3}\d{1,3}/; print "$&\n";'

This is a perl one-liner that uses a regular expression to match an IP address in a given line and print it out. The original regex I used was \d+\.\d+\.\d+\.\d+; this one is slightly fancier but does the same work in this case. It’s worth noting that this will only print out the first match in the given line, but since the requester’s IP (REMOTE_ADDR) is the first field in Combined Log Format, that’s fine in this case.
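The "first match only" behavior is easy to check with a small awk equivalent. This is just a sketch, not what I actually ran: the function name first_ip and the sample log line are invented, and the pattern is the simpler \d+\.\d+\.\d+\.\d+ form written with awk character classes.

```shell
# Hypothetical helper: prints the first IPv4-looking string on each input line.
# match() finds the leftmost match, so a second address later in the line
# (for example in the referrer) is ignored.
first_ip() {
    awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) {
        print substr($0, RSTART, RLENGTH)
    }'
}

# Sample Combined Log Format line (invented) with a second IP in the referrer:
echo '91.201.66.34 - - [05/Aug/2010:10:00:00 -0400] "POST /wp-comments-post.php HTTP/1.1" 200 512 "http://10.9.8.7/" "Mozilla"' | first_ip
# prints 91.201.66.34
```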

The output (the IPs from which comment posts have been made) is piped to…

perl -e '%a = (); while (<>) { chomp; $a{$_} += 1; } while (my ($key, $value) = each (%a)) { if ($value > 1) { print "$value\t=>\t$key\n";}}'

This is another perl one-liner. Basically, it maintains a hash of String=>count pairs, so each time it sees a string it increments a “counter” for that line. Then when it’s done receiving input (i.e. all the data has been processed) it prints out the contents of the hash for keys that have a value > 1 (i.e. IPs that have POSTed more than 1 comment).
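In hindsight the same tally doesn't need a hash at all. Here is a sketch using sort and uniq instead (a swapped-in technique, not the one-liner above; the function name count_repeat_ips is made up). It takes one IP per line on stdin and, as a bonus, sorts the worst offenders to the top:

```shell
# Hypothetical helper: reads one IP per line and prints "count<TAB>=><TAB>ip"
# for every IP seen more than once, highest count first.
count_repeat_ips() {
    sort |                                   # group identical IPs together
        uniq -c |                            # prefix each IP with its count
        awk '$1 > 1 { print $1 "\t=>\t" $2 }' |  # keep only repeat posters
        sort -rn                             # biggest count first
}

# Sample input (invented):
printf '%s\n' 91.201.66.34 218.6.9.140 91.201.66.34 213.5.67.41 91.201.66.34 213.5.67.41 | count_repeat_ips
```

With that input it reports 91.201.66.34 three times and 213.5.67.41 twice, and drops 218.6.9.140, which only appears once.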

The output shows pretty clearly where the spam is coming from:

2 => 218.6.9.140
180 => 91.201.66.34
2 => 213.5.67.41
2 => 188.187.102.74

180 submits from 91.201.66.34. Out of curiosity I looked up that IP in whois:

[root@lunix ~]# whois 91.201.66.34
[Querying whois.ripe.net]
[whois.ripe.net]
% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See http://www.ripe.net/db/support/db-terms-conditions.pdf

% Note: This output has been filtered.
%       To receive output for a database update, use the "-B" flag.

% Information related to '91.201.64.0 - 91.201.67.255'

inetnum:        91.201.64.0 - 91.201.67.255
netname:        Donekoserv
descr:          DonEkoService Ltd
country:        RU
org:            ORG-DS41-RIPE
admin-c:        MNV32-RIPE
tech-c:         MNV32-RIPE
status:         ASSIGNED PI
mnt-by:         RIPE-NCC-END-MNT
mnt-by:         MNT-DONECO
mnt-by:         MNT-DONECO
mnt-lower:      RIPE-NCC-END-MNT
mnt-routes:     MHOST-MNT
mnt-routes:     MNT-PIN
mnt-domains:    MHOST-MNT
source:         RIPE # Filtered

organisation:   ORG-DS41-RIPE
org-name:       DonEko Service
org-type:       OTHER
address:        novocherkassk, ul stremyannaya d.6
e-mail:         admin@pinspb.ru
mnt-ref:        MNT-PIN
mnt-by:         MNT-PIN
source:         RIPE # Filtered

person:         Metluk Nikolay Valeryevich
address:        korp. 1a 40 Slavy ave.,
address:        St.-Petersburg, Russia
e-mail:         nm@internet-spb.ru
phone:          +7 812 4483863
fax-no:         +7 901 3149449
nic-hdl:        MNV32-RIPE
mnt-by:         MNT-PIN
source:         RIPE # Filtered

% Information related to '91.201.66.0/23AS21098'

route:          91.201.66.0/23
descr:          Route MHOST IDC
origin:         AS21098
mnt-by:         MHOST-MNT
source:         RIPE # Filtered

[root@lunix ~]#

Not much info other than the IP is based in Russia. Well, anyway, I IP blocked 91.0.0.0/8 (sorry, Russia), so if you’re in that subnet you’re probably seeing a 403 now.

Edit: It occurred to me that I can accomplish the same thing while being less draconian if I wrap the Deny in a <Limit></Limit> clause. This way everyone can still see the site but certain IP ranges won’t be able to POST anything:

<Limit POST PUT DELETE>
Order Allow,Deny
Allow from all
Deny from 218.6.9.
Deny from 173.203.101.
Deny from 122.162.28.
Deny from 91.
Deny from 213.5
</Limit>