Dog Cow Board Member
Joined: 18 Nov 2008
Posts: 378
|
Posted: Tue Apr 07, 2009 3:00 pm Post subject: Stupid Bots 0.2.0 |
|
|
Stupid Bots is a term I've used for a while to cover non-human visitors to my web site, specifically visitors which are NOT search engines that I would use, such as Google, Yahoo, Cuil and MSN.
Stupid bots could be email harvesters, spam bots trying to sign up accounts, people trying the latest open source software vulnerability, and other general stupidity.
After examining my access logs by eye for the past few months and, most recently, writing a PHP script to filter them, I've drawn a few conclusions and put methods in place to block these so-called Stupid Bots.
The point here is to prevent them from sucking down my site's bandwidth. I don't care about blocking spam, RFI or vulnerability probes as such, because none of them are successful at the moment. However, that could change at any time, and I'm sure it's not the case for everyone! The whole purpose of Stupid Bots is to show a nice white page with a one-line error message and a 403 or 400 HTTP error, which takes only a few hundred bytes of bandwidth.
Here's what I'm blocking:
- All HTTP/1.0 connections which are not Yahoo or Twiceler
- Unusual characters in the requested path, such as the asterisk, colon, and semicolon, since these are used a lot in RFI and related annoyances
- Proxy connections
- Any user agent which matches the list of nasty user agents I've compiled from what these bots use
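The checks in the list above could be sketched roughly like this. It is only an illustration: the function name sb_check_request(), the mapping of checks to 403 or 400, and the two user agent entries are assumptions and not the contents of stupid_bots.php. The proxy check is left out because the post does not say how proxies are detected.

```php
<?php
// Sketch only: names, status mapping and list entries are assumptions.

/**
 * Returns 0 to let the request through, or the HTTP status to send.
 */
function sb_check_request($protocol, $uri, $user_agent, $bad_agents)
{
    $ua = strtolower(trim($user_agent));

    // HTTP/1.0 is refused unless the agent is Yahoo or Twiceler.
    if ($protocol === 'HTTP/1.0'
        && strpos($ua, 'yahoo') === false
        && strpos($ua, 'twiceler') === false) {
        return 403;
    }

    // Asterisk, colon or semicolon anywhere in the requested URI.
    if (preg_match('/[*:;]/', $uri)) {
        return 400;
    }

    // Plain substring match against the list of unwanted user agents.
    foreach ($bad_agents as $bad) {
        if ($bad !== '' && strpos($ua, strtolower($bad)) !== false) {
            return 403;
        }
    }

    return 0;
}

// Placeholder entries, not the author's list.
$bad_agents = array('examplebadbot', 'emailharvester-example');

// Only act when running under a web server (REQUEST_URI is unset on the CLI).
if (isset($_SERVER['REQUEST_URI'])) {
    $status = sb_check_request(
        isset($_SERVER['SERVER_PROTOCOL']) ? $_SERVER['SERVER_PROTOCOL'] : '',
        $_SERVER['REQUEST_URI'],
        isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '',
        $bad_agents
    );
    if ($status !== 0) {
        // Short status line and a one-line body keep the response tiny.
        header('HTTP/1.1 ' . $status . ($status === 400 ? ' Bad Request' : ' Forbidden'));
        exit('Blocked.');
    }
}
```

Keeping the decision in a function that only returns a status code makes it easy to try against lines copied from an access log before letting it loose on real visitors.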
In addition, though it will not be released here, I have a timer put in place. Since my robots.txt file forbids crawling faster than 1 page every 8 seconds, I'm blocking all bots which access more than a certain number of pages in a certain number of seconds. This has also been successful, as all repeated connections get a 403 error for 5 minutes.
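Since the timer itself is not being released, here is only a guess at the general idea: a sliding window per visitor, followed by a timed block. The function name sb_rate_check() and the thresholds (more than 5 pages in 40 seconds) are made up for the example. Only the 5 minute (300 second) block comes from the description above. The caller would have to store $state per IP address somewhere, which is not shown.

```php
<?php
// Sketch only: names and thresholds are assumptions, storage is left to the caller.

/**
 * $state is array('hits' => list of timestamps, 'blocked_until' => timestamp).
 * Returns array(bool $allowed, array $new_state).
 */
function sb_rate_check($state, $now, $max_hits, $window, $block_for)
{
    // Still inside an earlier block: refuse without recording anything.
    if ($now < $state['blocked_until']) {
        return array(false, $state);
    }

    // Keep only the hits inside the sliding window, then record this one.
    $hits = array();
    foreach ($state['hits'] as $t) {
        if ($t > $now - $window) {
            $hits[] = $t;
        }
    }
    $hits[] = $now;

    // Too many pages in the window: start a block and forget the hits.
    if (count($hits) > $max_hits) {
        return array(false, array('hits' => array(), 'blocked_until' => $now + $block_for));
    }

    return array(true, array('hits' => $hits, 'blocked_until' => 0));
}

// Example run: six requests in six seconds, then two later ones.
$state   = array('hits' => array(), 'blocked_until' => 0);
$results = array();
foreach (array(0, 1, 2, 3, 4, 5, 100, 306) as $t) {
    list($ok, $state) = sb_rate_check($state, $t, 5, 40, 300);
    $results[] = $ok;
}
// The sixth request trips the limit, the one at t=100 is still blocked,
// and the one at t=306 is allowed again.
```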
Here are some samples from my access log. Everything blocked by Stupid Bots will give either a 400 or 403 HTTP error, so that's what to look for.
Blocking HTTP/1.0 Connections
Note that Twiceler is using HTTP/1.0, but I allow it in since it is a legit search robot.
Quote: |
194.8.75.145 - - [06/Apr/2009:20:12:40 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/1.22 (compatible; MSIE 2.0; Windows 95)"
194.8.75.145 - - [06/Apr/2009:20:12:40 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/1.22 (compatible; MSIE 2.0; Windows 95)"
194.8.75.145 - - [06/Apr/2009:20:12:40 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Opera/7.54 (Windows NT 5.1; U) [pl]"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Opera/7.54 (Windows NT 5.1; U) [pl]"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Crazy Browser 2.0.0 Beta 1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Crazy Browser 2.0.0 Beta 1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; KKman2.0)"
194.8.75.145 - - [06/Apr/2009:20:12:42 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
208.36.144.7 - - [06/Apr/2009:20:12:45 -0400] "GET /downloads/?mode=download&file_id=185 HTTP/1.0" 302 0 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
|
Blocking RFI attempts
It wouldn't have worked anyway, but at least we are not wasting bandwidth on this connection.
Quote: |
98.238.87.86 - - [06/Apr/2009:20:22:26 -0400] "GET /gallery/showphoto.php?pic_id=http://mypregnancy.hostinginfive.com/index.html? HTTP/1.1" 400 27 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
98.238.87.86 - - [06/Apr/2009:20:22:29 -0400] "GET /gallery/showphoto.php?pic_id=http://mypregnancy.hostinginfive.com/index.html? HTTP/1.1" 400 27 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
|
Blocking spammers
Note the improper use of the HTML entity for the ampersand. A real browser would send the ampersand, not the entity.
Quote: |
140.239.56.38 - - [06/Apr/2009:18:32:42 -0400] "GET /mygui/join.php HTTP/1.1" 200 14812 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
140.239.56.38 - - [06/Apr/2009:18:32:42 -0400] "GET /mygui/user.php?mode=visicap&amp;id=e295662e7d5c4ae65bb4147eee07e42a HTTP/1.1" 400 27 "-" "Mozilla/4.0 (compatible;)"
140.239.56.38 - - [06/Apr/2009:18:32:45 -0400] "POST /mygui/join.php HTTP/1.1" 200 15222 "http://macgui.com/mygui/join.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
140.239.56.38 - - [06/Apr/2009:18:32:45 -0400] "GET /mygui/user.php?mode=visicap&amp;id=b10e88832adaee473f927e2b19baff97 HTTP/1.1" 400 27 "-" "Mozilla/4.0 (compatible;)"
|
It is important to note that Stupid Bots blocks, not bans! It does not need a database connection, and takes effect quite early, before most include files have been loaded in common.php. Overall, it should work silently in the background, giving short error lines to the stupid bots, while the legitimate search engines and human users go about their daily business.
Download version 0.2.0
(14 Apr 09)
http://dserver.macgui.com/Stupid_bots_0_2_0.zip
_________________ Moof!
Lincoln's Tomb, Oak Ridge Cemetery, Springfield IL • Mac 512K Blog • Mac GUI |
|
|
|
Jim_UK Board Member
Joined: 19 Nov 2008
Posts: 656 Location: North West UK
|
Posted: Tue Apr 07, 2009 3:16 pm Post subject: Re: Stupid Bots 0.0.1 |
|
|
Dog Cow wrote: | Or maybe it's trying to tell me something... |
Oh no!!!! Dog Cow is really a bot.
I will be interested in your results, as like many I suspect that most of my bandwidth is sucked up by bots, and my mod_security log often shows a full page of attempted exploits during the course of a single day.
Now this is what I call a useful mod.
Jim |
|
|
|
Dog Cow Board Member
Joined: 18 Nov 2008
Posts: 378
|
Posted: Tue Apr 07, 2009 3:32 pm Post subject: Re: Stupid Bots 0.0.1 |
|
|
Jim_UK wrote: | Dog Cow wrote: | Or maybe it's trying to tell me something... |
Oh no!!!! Dog Cow is really a bot.
|
Well, I found out that the real Internet Explorer identifies itself too similarly to a user agent which I have on the naughty list. The problem comes when bot writers get really smart and start making their bots identify themselves with user agents identical to Firefox, IE or another common browser. For now we are fortunate: the majority of bots use a similar user agent but slip up somewhere, so we can spot and block them.
Quote: | like many I suspect that most of my bandwidth is sucked up by bots |
That's what really annoyed me and made me want to make this thing!
Quote: | Now this is what I call a useful mod. |
I'll try and hurry it up, then! |
|
|
|
Jim_UK Board Member
Joined: 19 Nov 2008
Posts: 656 Location: North West UK
|
Posted: Wed Apr 08, 2009 2:10 pm Post subject: Re: Stupid Bots 0.1.0 |
|
|
I will add it and watch for complaints from users and also monitor incidents of attempted site/server exploit.
Many of the attempts have multiple /////////// in the request.
Jim
Quote: | Fatal error: Call to undefined function: stripos() in /home/controll/public_html/phpBB2/includes/stupid_bots.php on line 94 |
Quote: | if ( stripos(trim(strtolower($_SERVER['HTTP_USER_AGENT'])), $bot) !== false ) |
PHP Version 4.4.4
Jim |
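The fatal error above is what PHP 4 does with stripos(), which only exists from PHP 5 onward. One way around it, sketched here as an assumption and not as the fix that went into the mod, is to define a stand-in when the function is missing. The helper sb_agent_contains() is a made-up name that mirrors the quoted line 94.

```php
<?php
// stripos() was added in PHP 5; define a stand-in for PHP 4 installs.
if (!function_exists('stripos')) {
    function stripos($haystack, $needle, $offset = 0)
    {
        // Lowercasing both sides keeps string offsets unchanged for ASCII text.
        return strpos(strtolower($haystack), strtolower($needle), $offset);
    }
}

// Hypothetical helper doing the same job as the quoted line.
function sb_agent_contains($user_agent, $bot)
{
    // stripos() is already case-insensitive, so no extra strtolower() is needed.
    return stripos(trim($user_agent), $bot) !== false;
}
```

Since the quoted line already lowercases the user agent, plain strpos() with a lowercased $bot would also do the job on PHP 4.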
|
|
|
Ptirhiik Board Member
Joined: 19 Nov 2008
Posts: 114
|
Posted: Wed Apr 08, 2009 6:10 pm Post subject: Re: Stupid Bots 0.1.0 |
|
|
preg_match() is very efficient too. |
|
|
|
Ptirhiik Board Member
Joined: 19 Nov 2008
Posts: 114
|
Posted: Thu Apr 09, 2009 2:47 am Post subject: Re: Stupid Bots 0.1.0 |
|
|
Yep. Don't forget the preg_quote() also: you need it. |
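To illustrate why preg_quote() is needed: user agent strings are full of characters such as the slash, dot, plus and parentheses, which mean something else inside a regular expression. A minimal sketch, assuming the bad agents are kept in an array. The function name sb_build_pattern() and the two entries are made up for the example.

```php
<?php
// Sketch only: sb_build_pattern() and the agent names are assumptions.
// Assumes a non-empty list; an empty one would give a pattern that matches everything.
function sb_build_pattern($bad_agents)
{
    $parts = array();
    foreach ($bad_agents as $bad) {
        // Escape regex metacharacters and the '/' delimiter.
        $parts[] = preg_quote($bad, '/');
    }
    // One case-insensitive alternation instead of a loop of strpos() calls.
    return '/' . implode('|', $parts) . '/i';
}

$pattern = sb_build_pattern(array('examplebot/1.0', 'bad+agent (test)'));
$is_bad  = preg_match($pattern, 'Mozilla/4.0 (compatible; ExampleBot/1.0)') === 1;
```

Without the quoting, the dot in "1.0" would match any character and the unescaped slash would end the pattern early.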
|
|
|
Ptirhiik Board Member
Joined: 19 Nov 2008
Posts: 114
|
Posted: Thu Apr 09, 2009 1:55 pm Post subject: Re: Stupid Bots 0.1.0 |
|
|
They are often - but not systematically - right. However, a double trim plus a double strtolower plus strpos versus one preg_match without wildcards... |
|
|
|
Dog Cow Board Member
Joined: 18 Nov 2008
Posts: 378
|
Posted: Thu Apr 09, 2009 5:01 pm Post subject: Re: Stupid Bots 0.1.0 |
|
|
Some notes for this week:
- It looks like Majestic12 (MJ12bot) is a good search engine, but there's actually a malicious bot which identifies itself as MJ12bot. This will be addressed in the next version of Stupid Bots.
- I will be replacing the strpos stuff with preg_match()
- Stupid Bots appears to have saved 10 MB of bandwidth over two days, compared to days with a similar number of hits. This figure would no doubt be exponentially larger if my site were exponentially more popular. |
|
|
|
Jim_UK Board Member
Joined: 19 Nov 2008
Posts: 656 Location: North West UK
|
Posted: Thu Apr 09, 2009 5:26 pm Post subject: Re: Stupid Bots 0.1.0 |
|
|
Could that array of bad bots not be separate and read by that PHP file so that there would be no editing - just uploading a new text file periodically?
Jim |
|
|
|
Dog Cow Board Member
Joined: 18 Nov 2008
Posts: 378
|
Posted: Thu Apr 09, 2009 5:31 pm Post subject: Re: Stupid Bots 0.1.0 |
|
|
Jim_UK wrote: | Could that array of bad bots not be separate and read by that PHP file so that there would be no editing - just uploading a new text file periodically?
Jim |
Yes, but a .php file would be even better. However, opening another file adds a bit of overhead to generating the page.
I'll probably add it to a future version here, but not to the one I use on my site. |
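For what a separate .php list file might look like in practice, here is a rough sketch. The function name sb_load_agents(), the file layout and the entries are all assumptions. A future version of the mod may do it differently. The list file contains nothing but a return statement, so updating the list means uploading one small file.

```php
<?php
// Sketch only: the file name and its contents are assumptions.
// A list file would contain nothing but:  <?php return array('examplebadbot', ...);
function sb_load_agents($path)
{
    if (!is_readable($path)) {
        return array();          // missing list: block nothing rather than fail
    }
    $list = include $path;       // the file's return value becomes $list
    return is_array($list) ? $list : array();
}

// Demonstration with a throwaway file standing in for the uploaded list.
$demo = tempnam(sys_get_temp_dir(), 'sb_');
file_put_contents($demo, "<?php return array('examplebadbot', 'another-example');");
$agents = sb_load_agents($demo);
unlink($demo);
```

A .php file has an edge over a plain text file here because the web server will not hand out its contents to anyone who guesses the URL, and no parsing is needed beyond the include itself.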
|
|
|
Jim_UK Board Member
Joined: 19 Nov 2008
Posts: 656 Location: North West UK
|
Posted: Fri Apr 10, 2009 1:55 pm Post subject: Re: Stupid Bots 0.1.0 |
|
|
The reason I say that is that it would not mean constantly updating the mod if the list were a separate file that could be downloaded.
Jim |
|
|
|
|