phpBB2Refugees.com
Not affiliated with or endorsed by the phpBB Group


[BETA] Stupid Bots 0.2.0

Dog Cow
Board Member



Joined: 18 Nov 2008

Posts: 378


Posted: Tue Apr 07, 2009 3:00 pm
Post subject: Stupid Bots 0.2.0

Stupid Bots is a term I've used for a while to cover non-human visitors to my web site, generally those which are NOT search engines I would use, such as Google, Yahoo, Cuil and MSN.

Stupid bots could be email harvesters, spam bots trying to sign up accounts, people trying the latest open source software vulnerability, and other general stupidity.

After examining my access logs by eye for the past few months, and most recently after writing a PHP script to filter my logs, I've drawn a few conclusions and put methods in place to block these so-called Stupid Bots.

The point here is to stop them from sucking down my site's bandwidth. I don't care about blocking spam, RFI or other vulnerability probes as such, because none of them succeed at the moment. However, that could change at any time, and I'm sure it's not the case for everyone! The whole purpose of Stupid Bots is to show a nice white page with a one-line error message and a 400 or 403 HTTP status, which takes only a few hundred bytes of bandwidth.

Here's what I'm blocking:
- All HTTP/1.0 connections which are not Yahoo or Twiceler
- Unexpected characters in the path info, such as asterisk, colon, and semi-colon, since these are used a lot in RFI and related annoyances.
- Proxy connections
- Any user agent which matches my compiled list of nasty user agents used by these bots.
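
The four checks listed above could be sketched roughly like this. It is not the code from the Stupid Bots download: the function name `sb_classify_request`, the two proxy headers, the status chosen for each check and the idea of testing the whole request URI are assumptions made for illustration.

```php
<?php
// Hypothetical sketch of the four checks; NOT the released Stupid Bots code.
// Returns 0 to let the request through, or the HTTP status to send (400 or 403).
function sb_classify_request($protocol, $uri, $userAgent, $headers, $badAgents)
{
    $ua = strtolower(trim($userAgent));

    // 1. HTTP/1.0 connections, except the two crawlers that are let in.
    if ($protocol === 'HTTP/1.0'
        && strpos($ua, 'yahoo') === false
        && strpos($ua, 'twiceler') === false) {
        return 403;
    }

    // 2. Asterisk, colon or semi-colon anywhere in the request URI (assumed scope).
    if (preg_match('/[*:;]/', $uri)) {
        return 400;
    }

    // 3. Proxy connections, guessed here from two headers proxies commonly add.
    if (isset($headers['HTTP_VIA']) || isset($headers['HTTP_X_FORWARDED_FOR'])) {
        return 403;
    }

    // 4. Known-bad user agents (case-insensitive substring match).
    foreach ($badAgents as $bot) {
        if (strpos($ua, strtolower($bot)) !== false) {
            return 403;
        }
    }

    return 0;
}
```

In a real board this would presumably be fed from `$_SERVER` (`SERVER_PROTOCOL`, `REQUEST_URI`, `HTTP_USER_AGENT`) near the top of common.php. Note that a semi-colon check like the one in step 2 would also catch a query string containing `&amp;`.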

In addition, though it will not be released here, I have a timer put in place. Since my robots.txt file forbids crawling faster than 1 page every 8 seconds, I'm blocking all bots which access more than a certain number of pages in a certain number of seconds. This has also been successful, as all repeated connections get a 403 error for 5 minutes.
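
Since the timer itself is not being released, here is only a guess at how such a thing might look. The function name `sb_rate_check`, the limits (5 pages in 10 seconds) and the shape of the state array are all invented for this sketch; only the 5-minute penalty comes from the description above. Where the per-visitor state is stored between requests is deliberately left out.

```php
<?php
// Hypothetical sketch of a hit timer; the default limits are placeholders.
// $state holds 'hits' (recent request timestamps) and 'blocked_until'.
// Returns array(allowed, new_state).
function sb_rate_check($state, $now, $maxHits = 5, $window = 10, $penalty = 300)
{
    if (!isset($state['hits'])) {
        $state = array('hits' => array(), 'blocked_until' => 0);
    }

    // Still serving a penalty: keep answering 403.
    if ($now < $state['blocked_until']) {
        return array(false, $state);
    }

    // Keep only the hits that fall inside the sliding window, then add this one.
    $recent = array();
    foreach ($state['hits'] as $t) {
        if ($now - $t < $window) {
            $recent[] = $t;
        }
    }
    $recent[] = $now;
    $state['hits'] = $recent;

    // Too many pages in the window: start the penalty period.
    if (count($recent) > $maxHits) {
        $state['blocked_until'] = $now + $penalty;
        return array(false, $state);
    }

    return array(true, $state);
}
```

The caller would keep one `$state` per IP address and send a 403 whenever the first element of the result is false.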

Here are some samples from my access log. Everything blocked by Stupid Bots will give either a 400 or 403 HTTP error, so that's what to look for.

Blocking HTTP/1.0 Connections
Note that Twiceler is using HTTP/1.0, but I allow it in since it is a legit search robot.
Quote:

194.8.75.145 - - [06/Apr/2009:20:12:40 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/1.22 (compatible; MSIE 2.0; Windows 95)"
194.8.75.145 - - [06/Apr/2009:20:12:40 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/1.22 (compatible; MSIE 2.0; Windows 95)"
194.8.75.145 - - [06/Apr/2009:20:12:40 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Opera/7.54 (Windows NT 5.1; U) [pl]"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Opera/7.54 (Windows NT 5.1; U) [pl]"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Crazy Browser 2.0.0 Beta 1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Crazy Browser 2.0.0 Beta 1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
194.8.75.145 - - [06/Apr/2009:20:12:41 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; KKman2.0)"
194.8.75.145 - - [06/Apr/2009:20:12:42 -0400] "GET / HTTP/1.0" 403 13 "http://macgui.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
208.36.144.7 - - [06/Apr/2009:20:12:45 -0400] "GET /downloads/?mode=download&file_id=185 HTTP/1.0" 302 0 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"


Blocking RFI attempts
It wouldn't have worked anyway, but at least we are not wasting bandwidth on this connection.
Quote:

98.238.87.86 - - [06/Apr/2009:20:22:26 -0400] "GET /gallery/showphoto.php?pic_id=http://mypregnancy.hostinginfive.com/index.html? HTTP/1.1" 400 27 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
98.238.87.86 - - [06/Apr/2009:20:22:29 -0400] "GET /gallery/showphoto.php?pic_id=http://mypregnancy.hostinginfive.com/index.html? HTTP/1.1" 400 27 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"


Blocking spammers
Note the improper use of the HTML entity for the ampersand. A real browser would send the ampersand, not the entity.
Quote:

140.239.56.38 - - [06/Apr/2009:18:32:42 -0400] "GET /mygui/join.php HTTP/1.1" 200 14812 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
140.239.56.38 - - [06/Apr/2009:18:32:42 -0400] "GET /mygui/user.php?mode=visicap&amp;id=e295662e7d5c4ae65bb4147eee07e42a HTTP/1.1" 400 27 "-" "Mozilla/4.0 (compatible;)"
140.239.56.38 - - [06/Apr/2009:18:32:45 -0400] "POST /mygui/join.php HTTP/1.1" 200 15222 "http://macgui.com/mygui/join.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
140.239.56.38 - - [06/Apr/2009:18:32:45 -0400] "GET /mygui/user.php?mode=visicap&amp;id=b10e88832adaee473f927e2b19baff97 HTTP/1.1" 400 27 "-" "Mozilla/4.0 (compatible;)"


It is important to note that Stupid Bots blocks, not bans! It does not need a database connection, and takes effect quite early, before most include files have been loaded in common.php. Overall, it should work silently in the background, giving short error lines to the stupid bots, while the legitimate search engines and human users go about their daily business.

Download version 0.2.0
(14 Apr 09)
http://dserver.macgui.com/Stupid_bots_0_2_0.zip

_________________
Moof!
Lincoln's Tomb, Oak Ridge Cemetery, Springfield ILMac 512K BlogMac GUI
Back to top
Jim_UK
Board Member



Joined: 19 Nov 2008

Posts: 656
Location: North West UK


Posted: Tue Apr 07, 2009 3:16 pm
Post subject: Re: Stupid Bots 0.0.1

Dog Cow wrote:
Or maybe it's trying to tell me something...


Oh no!!!! Dog Cow is really a bot. :lol:

I will be interested in your results, as like many I suspect that most of my bandwidth is sucked up by bots, and the Mod_security log often shows a full page of attempted exploits during the course of a single day.
Now this is what I call a useful mod. ;)

Jim
Back to top
Dog Cow
Board Member



Joined: 18 Nov 2008

Posts: 378


Posted: Tue Apr 07, 2009 3:32 pm
Post subject: Re: Stupid Bots 0.0.1

Jim_UK wrote:
Dog Cow wrote:
Or maybe it's trying to tell me something...

Oh no!!!! Dog Cow is really a bot. :lol:

Well, I found out that the real Internet Explorer identifies itself too similarly to a user agent which I have on the naughty list. The problem comes when bot writers get really smart and start making their bots identify themselves with user agents identical to Firefox, IE or another common browser. For now, we are fortunate, because the majority of bots use a user agent that looks like a browser's but slips up somewhere, so we can spot and block them.

Quote:
like many I suspect that most of my bandwidth is sucked up by bots

That's what really annoyed me and made me want to make this thing! :mad:
Quote:
Now this is what I call a useful mod. ;)

I'll try and hurry it up, then! :)

Back to top
Dog Cow
Board Member



Joined: 18 Nov 2008

Posts: 378


Posted: Wed Apr 08, 2009 1:23 pm
Post subject: Stupid Bots 0.1.0 Released

Here it is, version 0.1.0. Go see the first post for the download link.
Back to top
Jim_UK
Board Member



Joined: 19 Nov 2008

Posts: 656
Location: North West UK


Posted: Wed Apr 08, 2009 2:10 pm
Post subject: Re: Stupid Bots 0.1.0

I will add it and watch for complaints from users and also monitor incidents of attempted site/server exploit.

Many of the attempts have multiple /////////// in the request.

Jim

Quote:
Fatal error: Call to undefined function: stripos() in /home/controll/public_html/phpBB2/includes/stupid_bots.php on line 94


Quote:
if ( stripos(trim(strtolower($_SERVER['HTTP_USER_AGENT'])), $bot) !== false )


PHP Version 4.4.4


Jim
Back to top
Dog Cow
Board Member



Joined: 18 Nov 2008

Posts: 378


Posted: Wed Apr 08, 2009 2:57 pm
Post subject: Re: Stupid Bots 0.1.0

stripos() exists only in PHP 5 and later.

Try this instead:

Code:
if ( strpos(trim(strtolower($_SERVER['HTTP_USER_AGENT'])), strtolower($bot)) !== false )
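
For anyone who would rather leave the original stripos() call sites alone, one option is a small wrapper that falls back to the same strtolower()/strpos() trick when stripos() is missing. The name `sb_stripos` is made up for this sketch and is not part of the MOD; it only covers the plain two-argument case.

```php
<?php
// Hypothetical helper: case-insensitive strpos() that also works on PHP 4.
// Returns the position of the first match, or false if there is none.
function sb_stripos($haystack, $needle)
{
    // PHP 5 and later: use the native function.
    if (function_exists('stripos')) {
        return stripos($haystack, $needle);
    }
    // PHP 4: lowercase both sides and use the case-sensitive search.
    return strpos(strtolower($haystack), strtolower($needle));
}
```

As with strpos(), the result has to be compared with `!== false`, because a match at position 0 is a valid hit.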

Back to top
Ptirhiik
Board Member



Joined: 19 Nov 2008

Posts: 114


Posted: Wed Apr 08, 2009 6:10 pm
Post subject: Re: Stupid Bots 0.1.0

preg_match() is very efficient too. ;)
Back to top
Dog Cow
Board Member



Joined: 18 Nov 2008

Posts: 378


Posted: Wed Apr 08, 2009 6:41 pm
Post subject: Re: Stupid Bots 0.1.0

Ptirhiik wrote:
preg_match() is very efficient too. ;)

So you would certainly suggest it in place of stripos?

I'll test it out. :)

Back to top
Ptirhiik
Board Member



Joined: 19 Nov 2008

Posts: 114


Posted: Thu Apr 09, 2009 2:47 am
Post subject: Re: Stupid Bots 0.1.0

Yep. Don't forget the preg_quote() also: you need it.
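
A rough illustration of the preg_match() plus preg_quote() idea, assuming the bad user agents are kept as plain text fragments in an array. The helper names `sb_build_pattern` and `sb_is_bad_agent` are invented here, and this is not necessarily how the MOD will end up doing it.

```php
<?php
// Hypothetical helper: turn a list of user-agent fragments into one pattern.
function sb_build_pattern($badAgents)
{
    if (count($badAgents) === 0) {
        // An empty alternation would match everything, so return a pattern
        // that can never match instead.
        return '#(?!)#';
    }
    $parts = array();
    foreach ($badAgents as $bot) {
        // Escape regex metacharacters and the "#" delimiter used below.
        $parts[] = preg_quote($bot, '#');
    }
    // The "i" modifier makes the match case-insensitive,
    // so no strtolower() calls are needed.
    return '#' . implode('|', $parts) . '#i';
}

// Hypothetical helper: true when the user agent contains any listed fragment.
function sb_is_bad_agent($userAgent, $pattern)
{
    return preg_match($pattern, $userAgent) === 1;
}
```

Without preg_quote(), a fragment such as `Java/1.` would treat the dot as "any character", and a fragment containing parentheses could break the pattern altogether.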
Back to top
Dog Cow
Board Member



Joined: 18 Nov 2008

Posts: 378


Posted: Thu Apr 09, 2009 11:14 am
Post subject: Re: Stupid Bots 0.1.0

Ptirhiik, what is your response to the people who say that strpos is faster than preg_match?
Back to top
Ptirhiik
Board Member



Joined: 19 Nov 2008

Posts: 114


Posted: Thu Apr 09, 2009 1:55 pm
Post subject: Re: Stupid Bots 0.1.0

They are often - but not systematically - right. However, double trim plus double strtolower plus strpos versus one preg_match without wildcards... ;)
Back to top
Dog Cow
Board Member



Joined: 18 Nov 2008

Posts: 378


Posted: Thu Apr 09, 2009 5:01 pm
Post subject: Re: Stupid Bots 0.1.0

Some notes for this week:

- It looks like Majestic12 (MJ12bot) is a good search engine, but there's actually a malicious bot which identifies itself as MJ12bot. This will be addressed in the next version of Stupid Bots.

- I will be replacing the strpos stuff with preg_match()

- Stupid Bots appears to have saved 10 MB of bandwidth over two days, compared to days with a similar number of hits. This figure would no doubt be much larger if my site were more popular.

Back to top
Jim_UK
Board Member



Joined: 19 Nov 2008

Posts: 656
Location: North West UK


Posted: Thu Apr 09, 2009 5:26 pm
Post subject: Re: Stupid Bots 0.1.0

Could that array of bad bots not be separate and read by that PHP file so that there would be no editing - just uploading a new text file periodically?

Jim
Back to top
Dog Cow
Board Member



Joined: 18 Nov 2008

Posts: 378


Posted: Thu Apr 09, 2009 5:31 pm
Post subject: Re: Stupid Bots 0.1.0

Jim_UK wrote:
Could that array of bad bots not be separate and read by that PHP file so that there would be no editing - just uploading a new text file periodically?

Jim

Yes, but a .php file would be even better. However, opening another file adds a bit of overhead to generating the page.
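
To make the text-file idea concrete, here is one way it might be done. The file format (one user-agent fragment per line, `#` for comments) and the function names `sb_parse_bot_list` and `sb_load_bot_list` are assumptions for this sketch, not something the MOD currently does.

```php
<?php
// Hypothetical parser for a plain-text list: one user-agent fragment per line;
// blank lines and lines starting with "#" are ignored.
function sb_parse_bot_list($text)
{
    $bots = array();
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $bots[] = $line;
    }
    return $bots;
}

// Reads the list from disk; returns an empty array if the file is missing
// or unreadable, so a broken upload blocks nobody.
function sb_load_bot_list($path)
{
    if (!is_readable($path)) {
        return array();
    }
    return sb_parse_bot_list(file_get_contents($path));
}
```

The .php variant would be a file containing nothing but `<?php return array('...', '...');` pulled in with include, which skips the parsing step and cannot be read as plain text by visitors.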

I'll probably add it to a future version here, but not to the one I use on my site.

Back to top
Jim_UK
Board Member



Joined: 19 Nov 2008

Posts: 656
Location: North West UK


Posted: Fri Apr 10, 2009 1:55 pm
Post subject: Re: Stupid Bots 0.1.0

The reason I say that is that the mod itself would not need constant updating if the list were a separate file that could be downloaded.

Jim
Back to top
Page 1 of 3. All times are GMT - 4 Hours.