Blocking comment spammers by IP

I use Akismet to block comment spam, but it still annoys me that the spam even exists. Last night I put a simple IP ban into my httpd config. But who to block?

I used grep and Perl to get a rough guess of which IPs were submitting the most comments (working on the assumption that one IP address submits many spam comments). It took me about 20 minutes to write this mess, but it does what I wanted:

[root@lunix ~]# zgrep POST /var/log/httpd/evanhoffman-access_log-201008??.gz | grep comment | perl -ne 'chomp; $_ =~ m/(?:\d{1,3}\.){3}\d{1,3}/; print "$&\n";' | perl -e '%a = (); while (<>) { chomp; $a{$_} += 1; } while (my ($key, $value) = each (%a)) { if ($value > 1) { print "$value\t=>\t$key\n";}}'
2 =>
180 =>
2 =>
2 =>
[root@lunix ~]#

That’s pretty hard to read. Here’s a quick explanation of each piece:

zgrep POST /var/log/httpd/evanhoffman-access_log-201008??.gz

Use zgrep to search for the string “POST” in all of the gzipped Apache logs for August. Pipe the results (the matching lines) to the next part:

grep comment

grep for the string “comment”. This isn’t really scientific, but I feel safe in assuming that if “POST” and “comment” both appear in the HTTP request, it’s probably someone posting a comment. Pipe the matches to…
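The "not really scientific" part can be tightened a little. As a rough sketch, assuming the logs are in Combined Log Format (so the request line is the first double-quoted field), an awk filter can require that both strings appear in the request line itself rather than anywhere on the line, such as in the referrer. The function name comment_posts is made up for illustration.

```shell
# Hypothetical helper: keep only log lines whose request line is a POST
# to a path containing "comment".
# Assumption: Combined Log Format, where splitting on double quotes
# makes field 2 the request line, e.g. POST /some/path HTTP/1.1
comment_posts() {
    awk -F'"' '$2 ~ /^POST / && $2 ~ /comment/'
}
```

It would slot in as `zcat /var/log/httpd/evanhoffman-access_log-201008??.gz | comment_posts`, replacing the zgrep and grep stages.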

perl -ne 'chomp; $_ =~ m/(?:\d{1,3}\.){3}\d{1,3}/; print "$&\n";'

This is a Perl one-liner that uses a regular expression to match an IP address in a given line and print it out. The original regex I used was \d+\.\d+\.\d+\.\d+; this one is slightly fancier but does the same work in this case. It’s worth noting that this will only print out the first match in the given line, but since the requester’s IP (REMOTE_ADDR) is the first field in Combined Log Format, that’s fine in this case.
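For comparison, here is a sketch of the same extraction step in awk. Both function names are made up for illustration. The first relies on the assumption that the client address is always the first whitespace-separated field; the second mirrors the simpler regex and prints only the leftmost IPv4-looking match on each line.

```shell
# Hypothetical helper: print the first whitespace-separated field of
# each line. Assumption: that field is the client address.
client_ips() {
    awk '{ print $1 }'
}

# Hypothetical helper: print the leftmost IPv4-looking string on each
# line. Lines with no match print nothing.
first_ipv4() {
    awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) {
        print substr($0, RSTART, RLENGTH)
    }'
}
```

Like the Perl version, first_ipv4 does no range checking on the octets, so something like 999.1.1.1 would also match.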

The output (the IPs from which comment posts have been made) is piped to…

perl -e '%a = (); while (<>) { chomp; $a{$_} += 1; } while (my ($key, $value) = each (%a)) { if ($value > 1) { print "$value\t=>\t$key\n";}}'

This is another Perl one-liner. Basically, it maintains a hash of string => count pairs: each time it sees a line, it increments the counter for that line’s contents. Then when it’s done receiving input (i.e. all the data has been processed) it prints out the contents of the hash for keys that have a value > 1 (i.e. IPs that have POSTed more than 1 comment).
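The same counting step can also be sketched with standard tools instead of a hash. This is not what I ran, just an equivalent under the assumption that the input is one IP per line; count_repeats is a made-up name. Unlike the Perl hash, whose iteration order is arbitrary, this version prints the busiest IPs first.

```shell
# Hypothetical helper: read one IP per line on stdin and print
# "count<TAB>=><TAB>ip" for every IP seen more than once,
# highest count first.
count_repeats() {
    sort | uniq -c | sort -rn |
        awk '$1 > 1 { printf "%s\t=>\t%s\n", $1, $2 }'
}
```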

The output shows pretty clearly where the spam is coming from:

2 =>
180 =>
2 =>
2 =>

180 submissions from a single IP. Out of curiosity I looked up that IP in whois:

[root@lunix ~]# whois
% This is the RIPE Database query service.
% The objects are in RPSL format.
% The RIPE Database is subject to Terms and Conditions.
% See

% Note: This output has been filtered.
%       To receive output for a database update, use the "-B" flag.

% Information related to ' -'

inetnum: -
netname:        Donekoserv
descr:          DonEkoService Ltd
country:        RU
org:            ORG-DS41-RIPE
admin-c:        MNV32-RIPE
tech-c:         MNV32-RIPE
status:         ASSIGNED PI
mnt-by:         RIPE-NCC-END-MNT
mnt-by:         MNT-DONECO
mnt-by:         MNT-DONECO
mnt-lower:      RIPE-NCC-END-MNT
mnt-routes:     MHOST-MNT
mnt-routes:     MNT-PIN
mnt-domains:    MHOST-MNT
source:         RIPE # Filtered

organisation:   ORG-DS41-RIPE
org-name:       DonEko Service
org-type:       OTHER
address:        novocherkassk, ul stremyannaya d.6
mnt-ref:        MNT-PIN
mnt-by:         MNT-PIN
source:         RIPE # Filtered

person:         Metluk Nikolay Valeryevich
address:        korp. 1a 40 Slavy ave.,
address:        St.-Petersburg, Russia
phone:          +7 812 4483863
fax-no:         +7 901 3149449
nic-hdl:        MNV32-RIPE
mnt-by:         MNT-PIN
source:         RIPE # Filtered

% Information related to ''

descr:          Route MHOST IDC
origin:         AS21098
mnt-by:         MHOST-MNT
source:         RIPE # Filtered

[root@lunix ~]#

Not much info other than that the IP is based in Russia. Well, anyway, I blocked that IP range (sorry, Russia), so if you’re in that subnet you’re probably seeing a 403 now.

Edit: It occurred to me that I can accomplish the same thing while being less draconian if I wrap the Deny in a <Limit></Limit> clause. This way everyone can still see the site but certain IP ranges won’t be able to POST anything:

<Limit POST>
Order Allow,Deny
Allow from all
Deny from 218.6.9.
Deny from 173.203.101.
Deny from 122.162.28.
Deny from 91.
Deny from 213.5
</Limit>
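If the list grows, the Deny lines could be generated from the counts rather than typed by hand. Here is a minimal sketch, assuming input in the count, =>, IP format printed by the counting one-liner; the name deny_lines and the default threshold of 10 are arbitrary choices for illustration. It emits the three-octet prefix form used above, so each line blocks the whole surrounding /24, which may be broader than you want.

```shell
# Hypothetical helper: read "count<TAB>=><TAB>ip" lines on stdin and
# print one "Deny from a.b.c." line per distinct /24 prefix whose IP
# count is at least the threshold in $1 (default 10, an arbitrary pick).
# Lines without a four-octet address in the third field are skipped.
deny_lines() {
    awk -v min="${1:-10}" '
        $1 >= min && split($3, o, ".") == 4 {
            print "Deny from " o[1] "." o[2] "." o[3] "."
        }' | sort -u
}
```

The output would still need to be reviewed by hand before pasting it into the httpd config.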
