SemrushBot is an annoying web crawler that has proven to completely disregard robots.txt policies while hammering web servers by recursively following every link on a site without delay, outright ignoring repeated 403 Forbidden responses.
Folklore claims that SemrushBot helps your site generate revenue from ads, but the question is whether that revenue outweighs the money spent accommodating SemrushBot's rampant behaviour and the morbidly increased server load it causes.
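On the IP layer, with the iptables string match (this only sees plain HTTP on port 80, since the user agent of an HTTPS request is encrypted; the string match also requires an --algo option):
iptables -t mangle -A INPUT -p tcp --dport 80 -m string --string 'SemrushBot' --algo bm -j DROP
That is an awful solution, but it gets rid of this pest without the requests even hitting the application layer!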
If you are okay with your frontend being hammered by this total garbage, then the SemrushBot user agent can be blocked in Apache2.
Enable the rewrite module:
a2enmod rewrite
and include the following in the virtual hosts:
<IfModule mod_rewrite.c>
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} sosospider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} BaiduSpider [NC]
    # Still allow robots.txt and the 403 error document itself,
    # or else serving the 403 page triggers this rule again and loops
    RewriteCond %{REQUEST_URI} !^/robots\.txt$
    RewriteCond %{REQUEST_URI} !^/403\.shtml$
    RewriteRule ^.* - [F,L]
</IfModule>
which is a bad solution because a 403 Forbidden is meaningless to the greatness that is SemrushBot, and every request still reaches Apache2.
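How mod_rewrite combines these conditions is easy to misread: [OR] joins a condition with the next one, while conditions without it are implicitly ANDed, so the rule fires when any user-agent condition matches and the URI is neither robots.txt nor the 403 page. The Python sketch below models that evaluation for a hypothetical rule set blocking SemrushBot, sosospider and BaiduSpider. The function name is made up, and it only approximates mod_rewrite (the [NC] flag is the only flag modelled).

```python
import re

# Hypothetical model of the RewriteCond chain: the user-agent conditions are
# joined by [OR]; the two REQUEST_URI conditions are ANDed with that group.
BLOCKED_AGENTS = ["SemrushBot", "sosospider", "BaiduSpider"]
EXEMPT_URIS = [r"^/robots\.txt$", r"^/403\.shtml$"]


def is_forbidden(user_agent: str, request_uri: str) -> bool:
    """Return True if the modelled rule set would answer 403 Forbidden."""
    # [NC] means "no case", i.e. a case-insensitive match.
    agent_match = any(re.search(p, user_agent, re.IGNORECASE) for p in BLOCKED_AGENTS)
    # A leading "!" negates a condition: it holds when the URI does NOT match.
    not_exempt = all(not re.search(p, request_uri) for p in EXEMPT_URIS)
    return agent_match and not_exempt


if __name__ == "__main__":
    semrush = "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
    print(is_forbidden(semrush, "/index.html"))  # True: blocked
    print(is_forbidden(semrush, "/robots.txt"))  # False: robots.txt stays reachable
    print(is_forbidden("Mozilla/5.0 (X11; Linux x86_64)", "/index.html"))  # False
```

The exemption for the error document matters because, without it, fetching the 403 page itself would satisfy the conditions again.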
Blocking with Varnish may be a good compromise between having your Apache2 hammered and matching the string SemrushBot on the IP layer:
sub vcl_recv {
    # Block user agents.
    if (req.http.User-Agent ~ "SemrushBot") {
        return (synth(403, "Forbidden"));
    }
    # ...
}
An even better method would be to use fail2ban to block SemrushBot by reading the Varnish log on the frontend or the Apache2 log files on the backend. Since fail2ban bans the offending hosts at the firewall, this prevents either of them from getting hammered with requests.
For Varnish, copy /etc/fail2ban/filter.d/apache-badbots.conf to /etc/fail2ban/filter.d/varnish-badbots.conf, thereby duplicating the Apache2 configuration (this works because varnishncsa writes the same NCSA log format as Apache2), and edit /etc/fail2ban/filter.d/varnish-badbots.conf to add SemrushBot to the list of custom bad bots:
badbotscustom = EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider|SemrushBot
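Then change the failregex line so that the bot name may appear anywhere inside the quoted user agent, because SemrushBot announces itself as, for example, Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html) and therefore does not start with the bot name:
failregex = ^<HOST> -.*(GET|POST|HEAD).*HTTP.*"[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$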
Finally, add the following to the jail configuration (for example /etc/fail2ban/jail.local):
[varnish-badbots]
enabled  = true
port     = http,https
filter   = varnish-badbots
logpath  = /var/log/varnish/varnishncsa.log
maxretry = 1
Then restart fail2ban.
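fail2ban ships a fail2ban-regex tool for testing a filter against a real log, for example fail2ban-regex /var/log/varnish/varnishncsa.log /etc/fail2ban/filter.d/varnish-badbots.conf, and that is the authoritative check. For a quick offline experiment, the Python sketch below emulates the filter's substitutions by hand. The <HOST> replacement is a simplification of the real tag, the bot lists are shortened to a few sample entries, and the log line is made up. It compares a regex that needs the bot name right after an opening quote with one that accepts it anywhere inside the last quoted field (the user agent).

```python
import re

# Simplified stand-in for fail2ban's <HOST> tag; the real expansion is more elaborate.
HOST = r"(?P<host>\S+)"

# A few sample entries only; apache-badbots.conf ships a much longer "badbots" list.
BADBOTS = r"Atomic_Email_Hunter/4\.0|atSpider/1\.0"
BADBOTSCUSTOM = r"EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider|SemrushBot"

# The bot name must come right after an opening quote.
ANCHORED = r'^<HOST> -.*(GET|POST|HEAD).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s).*?$'
# The bot name may appear anywhere inside the last quoted field (the user agent).
ANYWHERE = r'^<HOST> -.*(GET|POST|HEAD).*HTTP.*"[^"]*(?:%(badbots)s|%(badbotscustom)s)[^"]*"$'


def expand(failregex: str) -> re.Pattern:
    """Emulate fail2ban's <HOST> tag and %(name)s interpolation."""
    pattern = failregex.replace("<HOST>", HOST)
    pattern = pattern % {"badbots": BADBOTS, "badbotscustom": BADBOTSCUSTOM}
    return re.compile(pattern)


def banned_host(failregex: str, line: str):
    """Return the host a failregex would extract from a log line, or None."""
    match = expand(failregex).search(line)
    return match.group("host") if match else None


# Hypothetical varnishncsa line in NCSA combined format.
LINE = ('46.229.168.68 - - [01/Jan/2020:00:00:00 +0000] "GET / HTTP/1.1" 403 0 "-" '
        '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"')

if __name__ == "__main__":
    print(banned_host(ANCHORED, LINE))  # None: the user agent starts with "Mozilla"
    print(banned_host(ANYWHERE, LINE))  # 46.229.168.68
```

With the bot name anchored right after the quote, the Semrush line is not matched at all, so no ban would ever be issued for it.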
To check that the bots are being banned, tail /var/log/syslog and look for:
fail2ban.jail[18168]: INFO Jail 'varnish-badbots' started
which indicates that the varnish-badbots jail has started, hopefully followed by lines similar to:
NOTICE [varnish-badbots] Ban 46.229.168.68
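Rather than eyeballing the log, the Ban notices can be tallied per address. The following Python sketch is one hypothetical way to do it: the regular expression assumes messages of the NOTICE [jail] Ban address form shown above, and the script name is arbitrary. Depending on the logtarget setting in fail2ban.conf, the messages may end up in /var/log/fail2ban.log instead of syslog; for the current state of a jail, fail2ban-client status varnish-badbots lists the banned addresses directly.

```python
import re
import sys
from collections import Counter

# Assumed message shape, as in "NOTICE [varnish-badbots] Ban 46.229.168.68".
# "Unban" and "Restore Ban" lines do not match and are therefore not counted.
BAN_RE = re.compile(r"NOTICE\s+\[(?P<jail>[^\]]+)\]\s+Ban\s+(?P<ip>\S+)")


def count_bans(lines, jail="varnish-badbots"):
    """Count Ban notices per IP address for a single jail."""
    bans = Counter()
    for line in lines:
        match = BAN_RE.search(line)
        if match and match.group("jail") == jail:
            bans[match.group("ip")] += 1
    return bans


if __name__ == "__main__":
    # Usage: python3 count_bans.py /var/log/syslog [more log files...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for ip, count in count_bans(log).most_common(10):
                print(f"{count:5d} {ip}")
```

Without arguments the script does nothing, so it is safe to import count_bans from elsewhere.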
Redirect POP3S (port 995) traffic passing through this machine to a custom port for SSLsniff:
iptables -t nat -A PREROUTING -p tcp --destination-port 995 -j REDIRECT --to-ports 4995
sslsniff -a -c /usr/share/sslsniff/certs/wildcard -s 4995 -w /dev/stdout
Example session, in which the client's POP3 credentials show up in cleartext:
1385227016 INFO sslsniff : Added OCSP URL: ocsp.ipsca.com
1385227016 INFO sslsniff : Certificate Ready: *
sslsniff 0.8 by Moxie Marlinspike running...
1385227031 DEBUG sslsniff : Read from Server (mail.net.hu) : +OK POP3 PROXY server ready <7575E80698581E88C26B60701C2C67717034A020@smtp.mail.net.hu>
1385227032 DEBUG sslsniff : Read from Client (mail.net.hu) : USER harry
1385227032 DEBUG sslsniff : Read from Server (mail.net.hu) : +OK Password required
1385227032 DEBUG sslsniff : Read from Client (mail.net.hu) : PASS secretpassword
After successive updates and corrections, /etc/passwd might not get updated, so that scanning it for users reveals multiple variants of the nologin shell path, such as:
systemd-coredump:x:998:998:systemd Core Dumper:/:/sbin/nologin
rslsync:x:999:999::/home/rslsync:/sbin/nologin
sshd:x:107:65534::/var/run/sshd:/usr/sbin/nologin
where the correct path seems to be /usr/bin/nologin and, as one might imagine, the other paths do not even exist, leaving an opportunity for an attacker to slide a shell into place.
A solution to batch-change the shell of all users that have a nologin shell, correcting the path, would be the following (set NOLOGIN to whichever nologin path actually exists on the system, for example /usr/bin/nologin):
NOLOGIN=/usr/sbin/nologin
for user in $(awk -F ':' '$7 ~ /nologin$/ { print $1 }' /etc/passwd); do usermod -s "$NOLOGIN" "$user"; done
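Before or after running such a loop, it may be worth auditing which accounts point at a nologin path that is actually missing, since on distributions with a merged /usr several of these paths can resolve through symlinks and are then perfectly valid. A minimal Python sketch, assuming the standard seven-field /etc/passwd layout; the function name is hypothetical, and the existence check is a parameter so it can be tried on sample lines.

```python
import os


def stale_nologin_shells(passwd_lines, exists=os.path.exists):
    """Yield (user, shell) for accounts whose nologin shell path is missing.

    `exists` defaults to os.path.exists, which follows symlinks, so a path
    such as /sbin/nologin on a merged-/usr system counts as present.
    """
    for line in passwd_lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        user, shell = fields[0], fields[6]
        if os.path.basename(shell) == "nologin" and not exists(shell):
            yield user, shell


if __name__ == "__main__" and os.path.exists("/etc/passwd"):
    with open("/etc/passwd", encoding="utf-8", errors="replace") as passwd:
        for user, shell in stale_nologin_shells(passwd):
            print(f"{user}: {shell} does not exist")
```

An empty report after running the usermod loop is a reasonable sign that every nologin account now points at a path that exists.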