HostSEO Blog

Stories and News from IT Industry, Reviews & Tips | Technology Blog


A GUIDE TO BLOCKING BAD BOTS WITH .HTACCESS FILES

One of the issues facing all webmasters is bad bots. Whether it’s comment spam, drive-by hacking attempts, or DDoS attacks, you’ve probably seen the issues some automated traffic can cause.

In this blog post, we’ll be delving into an easy way of stopping common bad bots, using .htaccess files and mod_rewrite. If you’re using the Apache web server, an afternoon of setting up a hardened .htaccess file can save you many headaches down the road.

If you’re not already aware, a .htaccess file is a hidden file (hence the dot in front of it) that gives Apache web servers instructions on how to handle traffic hitting the folder it lives in, and folders below it. It’s a plain text file, which you can just create in a folder.

BLOCKING BAD USER AGENTS

First off, we might want to block some generic bad bots, or user agents clearly indicative of an automated program. Here’s how we do that:

Usually, if a bot’s developer doesn’t bother changing their bot’s user agent from the default, they’re up to no good. You’ll commonly see these kinds of bots probing for phpmyadmin, for example. But we can do more.

INTRO TO BLOCKING HTTP HEADERS

Many bots use valid HTTP user agents, masquerading as a legitimate web browser. Fortunately for us, many of them are still based on the same automated libraries, and often get their HTTP headers slightly wrong, or send different ones from what a human would send. It’s hard to filter these because the same goes for legitimate, good bots (like Google), but let’s block the ones we can:

ADVANCED BLOCKING: WORDPRESS

The next part of this guide assumes you’re running WordPress. It can be adapted to any other software (you should seriously think about doing so!), and it’s some of the most effective filtering in this entire guide. Unfortunately, we can’t account for all software.

The following assumes the wp-login.php lives in the same folder as the .htaccess file you’re creating:

BONUS ROUND: BLOCK HTTP/1.0

HTTP/1.0 is an old version of the HTTP protocol. Humans haven’t used it since the days of netflix, but many bots, both good and bad, still do. Common search engines like Google tend not to. We can turn this to our advantage, but it needs to be done carefully, and tested extensively, as it can block some good bots, or have false positives on servers using a proxy in front of Apache.

If you feel daring, uncomment the version of this rule you prefer:

Adapting these rules to your own software and website setup can drastically cut down on comment spam, and even help protect your website from hacking. It’s not a panacea, but it’ll help make life a little easier.

Subscribe Now

10,000 successful online businessmen like to have our content directly delivered to their inbox. Subscribe to our newsletter!

Archive Calendar

SatSunMonTueWedThuFri
 12345
6789101112
13141516171819
20212223242526
27282930 

Born in 2004 ... Trusted By Clients n' Experts

SEO Stars

They never made me feel silly for asking questions. Help me understand how to attract more people and improve my search engine ranking.

Read More

Emily Schneller Manager at Sabre Inc
SEO Stars

Took advantage of Hostseo's superb tech support and I must say, it is a very perfect one. It is very fast, servers reliability is incredible.

Read More

Leena Mäkinen Creative producer
SEO Stars

We're operating a worldwide network of servers with high quality standards requirements, we’ve choose hostseo to be our perfect partner.

Read More

Ziff Davis CEO at Mashable
SEO Stars

It’s very comfortable to know I can rely about all technical issues on Hostseo and mostly that my website and emails are safe and secured here.

Read More

Isaac H. Entrepreneur
SEO Stars

With hostseo as a hosting partner we are more flexible and save money due to the better packages with great pricing, free SEO n' free SSL too!

Read More

Madeline E. Internet Professional