A GUIDE TO BLOCKING BAD BOTS WITH .HTACCESS FILES
- Category : Server Administration
- Posted on : Mar 15, 2018
- By : Radcliff S.
Bad bots are a problem for every webmaster. Whether it’s comment spam, drive-by hacking attempts, or DDoS attacks, you’ve probably seen the damage automated traffic can cause.
In this blog post, we’ll be delving into an easy way of stopping common bad bots, using .htaccess files and mod_rewrite. If you’re using the Apache web server, an afternoon of setting up a hardened .htaccess file can save you many headaches down the road.
If you’re not already aware, a .htaccess file is a hidden file (hence the dot in front of it) that gives Apache web servers instructions on how to handle traffic hitting the folder it lives in, and folders below it. It’s a plain text file, which you can create with any text editor in the folder you want to protect.
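One thing worth checking before you start: Apache only honours mod_rewrite directives in a .htaccess file if the main server configuration allows it. The fragment below is a minimal sketch of what that might look like, assuming a document root of /var/www/example (a hypothetical path, so substitute your own). On shared hosting this is normally already set up for you.

```apache
# Goes in the main server or virtual host configuration, NOT in .htaccess.
# "/var/www/example" is a hypothetical document root used for illustration.
<Directory "/var/www/example">
    # mod_rewrite directives in .htaccess need the FileInfo override.
    AllowOverride FileInfo
    # Per-directory rewriting also needs symlink following enabled.
    Options +FollowSymLinks
</Directory>
```

If the rules in this guide seem to have no effect at all, a missing AllowOverride setting is one of the first things to rule out.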
BLOCKING BAD USER AGENTS
First off, we might want to block some generic bad bots, or user agents clearly indicative of an automated program. Here’s how we do that:
```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

## Automated HTTP libraries
RewriteCond %{HTTP_USER_AGENT} ^.*(dav.pm/v|libwww-perl|urllib|python-requests|python-httplib2|winhttp.winhttprequest|lwp-request|lwp-trivial|fasthttp|Go-http-client|Java|httplib|httpclient|Zend_Http_Client).*$ [NC]
RewriteRule .* - [F,L]

## Commonly seen in DDoS attacks
RewriteCond %{HTTP_USER_AGENT} ^.*(CtrlFunc|w00tw00t|Apachebench).*$ [NC]
RewriteRule .* - [F,L]
</IfModule>
```
Usually, if a bot’s developer doesn’t bother changing their bot’s user agent from the default, they’re up to no good. You’ll commonly see these kinds of bots probing for phpMyAdmin, for example. But we can do more.
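The word “usually” matters: your own uptime checks or maintenance scripts may well use one of these libraries. A minimal sketch of an exemption follows, assuming a trusted address of 203.0.113.10 (a placeholder from the documentation address range, not something from this guide) and a shortened library list for readability.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On

## Do not apply the block to a trusted address.
## 203.0.113.10 is a placeholder; replace it with your own monitoring host.
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
## Shortened example list; the full list from the rule above works the same way.
RewriteCond %{HTTP_USER_AGENT} (libwww-perl|python-requests|Go-http-client) [NC]
RewriteRule .* - [F,L]
</IfModule>
```

Conditions without an [OR] flag are ANDed together, so the request is only refused when it comes from somewhere other than the trusted address and carries one of the listed user agents. To use this with the full rule, the REMOTE_ADDR condition would go directly in front of the existing user agent condition rather than in a separate block.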
INTRO TO BLOCKING HTTP HEADERS
Many bots use valid HTTP user agents, masquerading as a legitimate web browser. Fortunately for us, many of them are still based on the same automated libraries, and often get their HTTP headers slightly wrong, or send different ones from what a human would send. It’s hard to filter these because the same goes for legitimate, good bots (like Google), but let’s block the ones we can:
```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

## There is no user agent at all
RewriteCond %{HTTP_USER_AGENT} ^\s*$
RewriteRule .* - [F,L]

## There is no host header
RewriteCond %{HTTP_HOST} ^$
RewriteRule .* - [F,L]
</IfModule>
```
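The same pattern extends to other headers. As an illustration only, the sketch below refuses POST requests that arrive without an Accept header, which browsers send when submitting a form. Whether this is safe for your site is an assumption you’d need to check against your own logs first, since some legitimate API clients and webhooks leave the header out.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On

## Assumption: anything POSTing to this site should send an Accept header.
## Check your access logs before enabling; webhooks and API clients may not.
RewriteCond %{REQUEST_METHOD} =POST
RewriteCond %{HTTP_ACCEPT} ^$
RewriteRule .* - [F,L]
</IfModule>
```

Limiting the check to POST keeps it away from crawlers, which overwhelmingly use GET, so the odds of hitting a good bot are lower than with a site-wide rule.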
ADVANCED BLOCKING: WORDPRESS
The next part of this guide assumes you’re running WordPress, since we can’t account for all software. The same approach can be adapted to other applications (you should seriously think about doing so!), and it’s some of the most effective filtering in this entire guide.
The following assumes wp-login.php lives in the same folder as the .htaccess file you’re creating:
```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

## We can employ much more strict filtering on login & comment pages, which only humans should ever access.
## We don't need to worry about accidentally filtering good bots here.
## All modern human user agents should contain the string "Mozilla/5.0"
RewriteCond %{THE_REQUEST} ^.*wp-login [OR]
RewriteCond %{THE_REQUEST} ^.*wp-comment
RewriteCond %{HTTP_USER_AGENT} !^.*Mozilla/5\.0.*$ [NC]
RewriteRule .* - [F,L]

## And if we're POSTing to these pages (i.e. clicking a submit button) we should have a referer, too.
RewriteCond %{THE_REQUEST} ^.*wp-login [OR]
RewriteCond %{THE_REQUEST} ^.*wp-comment
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* - [F,L]
</IfModule>
```
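Adapting the WordPress rules to other software mostly means swapping in the right paths. Here’s a rough sketch for an application whose login and registration forms live under /account/login and /account/register. Both paths are hypothetical, so check where your own software keeps its forms.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On

## Hypothetical paths: replace with your application's login/registration URLs.
## Require a browser-like user agent on pages only humans should visit.
RewriteCond %{REQUEST_URI} ^/account/(login|register) [NC]
RewriteCond %{HTTP_USER_AGENT} !Mozilla/5\.0 [NC]
RewriteRule .* - [F,L]

## Form submissions to the same pages should carry a referer.
RewriteCond %{REQUEST_URI} ^/account/(login|register) [NC]
RewriteCond %{REQUEST_METHOD} =POST
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* - [F,L]
</IfModule>
```

This version matches on REQUEST_URI rather than THE_REQUEST, which anchors the check to the start of the path instead of matching the string anywhere in the request line. Either approach should work; this one is just a little less likely to trip on a query string that happens to contain the same text.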
BONUS ROUND: BLOCK HTTP/1.0
HTTP/1.0 is an old version of HTTP. Humans haven’t used it since the days of Netscape, but many bots, both good and bad, still do. Common search engines like Google tend not to. We can turn this to our advantage, but it needs to be done carefully and tested extensively, as it can block some good bots, or cause false positives on servers with a proxy in front of Apache, since some reverse proxies talk to the backend over HTTP/1.0 by default.
If you feel daring, uncomment the version of this rule you prefer:
```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

## Block all HTTP/1.0 requests site-wide.
# RewriteCond %{THE_REQUEST} HTTP/1\.0$
# RewriteRule .* - [F,L]

## OR, block all HTTP/1.0 POST requests site-wide (far less likely to break legitimate things)
# RewriteCond %{THE_REQUEST} HTTP/1\.0$
# RewriteCond %{REQUEST_METHOD} POST
# RewriteRule .* - [F,L]
</IfModule>
```
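If even the POST-only version feels too broad, a narrower middle ground is possible. The sketch below is one way it could look, assuming a standard WordPress install where logins go through wp-login.php and comments through wp-comments-post.php. It refuses HTTP/1.0 only for form submissions to those two files.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On

## Refuse HTTP/1.0 only for POSTs to the WordPress login and comment handlers.
## Assumes the default WordPress file names; adjust if yours differ.
RewriteCond %{SERVER_PROTOCOL} ^HTTP/1\.0$
RewriteCond %{REQUEST_METHOD} =POST
RewriteCond %{REQUEST_URI} (wp-login|wp-comments-post)\.php [NC]
RewriteRule .* - [F,L]
</IfModule>
```

SERVER_PROTOCOL holds just the protocol version of the request, so it can stand in for matching the end of THE_REQUEST. The proxy caveat still applies: if a reverse proxy in front of Apache speaks HTTP/1.0 to the backend, this rule would block every login and comment, so test it before relying on it.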
Adapting these rules to your own software and website setup can drastically cut down on comment spam, and even help protect your website from hacking. It’s not a panacea, but it’ll help make life a little easier.