phpBB2Refugees.com
Not affiliated with or endorsed by the phpBB Group


Continue the legacy...

Welcome to all phpBB2 Refugees!

This site is intended to continue support for the legacy 2.x line of the phpBB2 bulletin board package. If you are a fan of phpBB2, please, by all means register, post, and help us out by offering your suggestions. We are primarily a community and support network. Our secondary goal is to provide a phpBB2 MOD Author and Styles area.

Permission Question

rexx
Board Member



Joined: 10 Jan 2009

Posts: 20



Posted: Sun Jan 11, 2009 11:54 am
Post subject: Re: Permission Question

1.

Quote:

You set one up that barred them and perhaps it takes another 24 hours before they will respond to the absence of the bar.


Perhaps that's it.

A few days ago (before I became aware of phpBB2's permissions feature), I tried to stop Google et al. from crawling a particular forum on my board, one that is somewhat of a nuisance.

I did this by adding what I thought was the forum's URL to robots.txt.

Yet, to play it safe, since that URL did not look like a proper URL, I also added what I thought was the URL (which also did not look entirely proper) for a subgroup of that forum.

Perhaps, because both URLs included /phpbb2/, Google (or my server, or whatever) obeyed what it could and simply barred the entire board.

I removed the URLs a couple of days ago, but they had already been there for 2 or 3 days.

So perhaps that's it; perhaps it will take 2, 3, or more days to purge the system.




2. I started to "panic" when I tried to create a new sitemap with GSiteCrawler.

The program's crawlers would not venture beyond "/phpbb2/index.php", so I figured I had messed something up.

But, as you suggest, time might be the only answer.

I tried GSiteCrawler again this morning, and again the crawlers would not go beyond "/phpbb2/index.php".

Darn!

- rexx
dogs and things
Board Member



Joined: 18 Nov 2008

Posts: 628
Location: Spain


Posted: Sun Jan 11, 2009 12:08 pm
Post subject: Re: Permission Question

If you want to add a sitemap to your site, just copy the code from this page, save it as sitemap.php, upload it to your phpBB2 folder, and tell Google it's there by entering your sitemap's URL in Webmaster Tools. The URL you give Google will be something like http://your-sitename/phpBB2/sitemap.php

This sitemap generator script produces a sitemap in proper XML format and includes only topics that guests can read. Forums set to higher permissions are excluded.
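The script referred to above isn't reproduced in this thread, so, as a rough illustration only, here is a Python sketch of the two properties described: list only topics that guests can read, and give each URL a `<lastmod>` date. The topic records, their field names, and the `AUTH_ALL` value are hypothetical stand-ins, not taken from that script or from phpBB2's database schema.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

# Hypothetical stand-in for "forum readable by guests (and therefore spiders)".
AUTH_ALL = 0
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(board_url, topics):
    """Return sitemap XML that lists only guest-readable topics.

    Each topic is a dict with hypothetical keys:
    topic_id, forum_auth_read, last_post_time (Unix timestamp).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for topic in topics:
        # Skip topics in forums that require more than guest access.
        if topic["forum_auth_read"] != AUTH_ALL:
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = (
            f"{board_url}/viewtopic.php?t={topic['topic_id']}"
        )
        # <lastmod> (W3C Datetime) tells spiders when the topic last changed.
        modified = datetime.fromtimestamp(topic["last_post_time"], tz=timezone.utc)
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")

topics = [
    {"topic_id": 1, "forum_auth_read": AUTH_ALL, "last_post_time": 1231675200},
    {"topic_id": 2, "forum_auth_read": 1, "last_post_time": 1231675200},
]
print(build_sitemap("http://your-sitename/phpBB2", topics))
```

Only topic 1 ends up in the output; topic 2 sits in a forum with a higher read level, so it is left out, which mirrors the "forums set to higher permissions will be excluded" behaviour described above.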

_________________
phpBB2 will never die, I hope!
rexx
Board Member



Joined: 10 Jan 2009

Posts: 20



Posted: Sun Jan 11, 2009 12:18 pm
Post subject: Re: Permission Question

Thanks!

I'll have a look at that.

I have only had luck with GSiteCrawler. Yet creating sitemaps has been a real drag, because it usually takes that program 5 or more days to crawl my site (30,000+ posts). Presently, the site has 11 sitemap files plus a Yahoo sitemap file.

Theoretically, the number of sitemap files should drop substantially once I successfully block spiders (via permissions) from crawling that problematic forum.
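For a rough sense of how the file count relates to the URL count: the sitemaps.org protocol caps each sitemap file at 50,000 URLs (and also at a maximum file size), so a generator has to split larger URL lists into several files. A minimal Python sketch, assuming a simple count-based split; GSiteCrawler's actual splitting rule isn't described in this thread, which is why `per_file` is left adjustable.

```python
import math

# sitemaps.org limit on URLs per sitemap file; a generator may also
# split earlier because of the file-size cap.
MAX_URLS_PER_FILE = 50_000

def split_into_sitemaps(urls, per_file=MAX_URLS_PER_FILE):
    """Chunk a URL list into groups that each fit in one sitemap file."""
    return [urls[i:i + per_file] for i in range(0, len(urls), per_file)]

def files_needed(url_count, per_file=MAX_URLS_PER_FILE):
    """Minimum number of sitemap files for url_count URLs."""
    return max(1, math.ceil(url_count / per_file))

print(files_needed(120_000))           # 3
print(files_needed(120_000 - 30_000))  # 2
```

The file count only drops when the URL total falls below a multiple of `per_file`, so halving the number of files roughly requires halving the number of crawled URLs.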

Again, though, thanks!

- rexx
rexx
Board Member



Joined: 10 Jan 2009

Posts: 20



Posted: Sun Jan 11, 2009 8:28 pm
Post subject: Re: Permission Question

UPDATE:

Jim_UK:

I think you did it, again!

Although GSiteCrawler's crawlers are still not venturing beyond ".../phpbb2/index.php", a few hours ago I followed your advice and checked a recurring IP. It belonged to Googlebot.

Then I set my 'problem' forum's "Read" permission to "REG".

There are still 2 Googlebots crawling that forum, but most are finally crawling other forums, which, frankly, I find refreshing.

You're a smart guy!

Thanks!

- rexx
Sylver Cheetah 53
Board Member



Joined: 17 Dec 2008

Posts: 426
Location: Milky Way


Posted: Tue Jan 13, 2009 9:54 am
Post subject: Re: Permission Question

dogs and things wrote:
If you want to add a sitemap to your site, just copy the code from this page, save it as sitemap.php, upload it to your phpBB2 folder, and tell Google it's there by entering your sitemap's URL in Webmaster Tools. The URL you give Google will be something like http://your-sitename/phpBB2/sitemap.php

This sitemap generator script produces a sitemap in proper XML format and includes only topics that guests can read. Forums set to higher permissions are excluded.

I've added this to my site, but I'm not sure it is good for my forum.
This is my new sitemap: http://friendsforever.co.cc/sitemap.php
And this is my GooglePuller page: http://friendsforever.co.cc/googlepuller.php

I still want to read data from GooglePuller. Are you sure this sitemap helps robots index better?
I am also using the Forum Meta Tags MOD.

_________________
My Forum || My Blog

phpBB2 forever!
dogs and things
Board Member



Joined: 18 Nov 2008

Posts: 628
Location: Spain


Posted: Tue Jan 13, 2009 3:00 pm
Post subject: Re: Permission Question

Yes, I am sure this sitemap helps spiders index my forum better, because it includes a last-modified date (lastmod) for each URL and thus helps spiders find the latest changes.

To help spiders even more, I use a robots.txt that tells them which files and URLs on my server they don't need to look at.

Finally, I don't think GooglePuller is a good idea, because it produces such a huge list of URLs that I don't see any use in it for spiders. If you want to keep using data from GooglePuller, block spiders' access to it via your robots.txt.
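To check what a robots.txt rule actually blocks, Python's standard `urllib.robotparser` module can evaluate it locally. A minimal sketch, assuming GooglePuller sits at the site root as in the link above; the rule set is a hypothetical example, not dogs and things' actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: hide GooglePuller from spiders, leave the rest crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /googlepuller.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/googlepuller.php", "/sitemap.php", "/index.php"):
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(path, verdict)
```

Because `Disallow` values are matched as prefixes, the same line also covers variants such as /googlepuller.php?page=2, while the sitemap stays reachable.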

_________________
phpBB2 will never die, I hope!
rexx
Board Member



Joined: 10 Jan 2009

Posts: 20



Posted: Tue Jan 20, 2009 5:01 pm
Post subject: Re: Permission Question

It didn't work; GSiteCrawler (GSC) is still crawling the forum I want ignored.

The GSC forum suggested that spiders will ignore permissions on boards that generate session IDs, which apparently includes phpBB2.

Yet the forum did imply that blocking the forum's contents was still possible.
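To illustrate the session-ID issue: when a visitor has no session cookie (as is typical for spiders), phpBB2 appends a `sid` parameter to its links, so a crawler can see the same page under many different URLs and its URL count keeps growing. A minimal Python sketch that strips that parameter to count distinct pages; the example URLs and sid values are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_session_id(url, param="sid"):
    """Return url without the session-id query parameter."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Three "different" URLs a crawler might record for the same topic.
crawled = [
    "http://example.com/phpbb2/viewtopic.php?t=5&sid=0123abcd",
    "http://example.com/phpbb2/viewtopic.php?t=5&sid=9876fedc",
    "http://example.com/phpbb2/viewtopic.php?t=5",
]
unique = {strip_session_id(u) for u in crawled}
print(len(crawled), "crawled URLs ->", len(unique), "distinct page(s)")
```

This only shows why raw URL counts can overstate what a crawler is really seeing; it doesn't change what the crawler is allowed to read.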

Presently, my robots.txt looks like this:



Code:

User-agent: *
Disallow: /bins/
Disallow: /checks/
Disallow: /downloadtest1/
Disallow: /downloadtest2/
Disallow: /notices/
Disallow: /twc1/
Disallow: /wcc1/
Disallow: /phpbb3/
Disallow: /privmsg.php
Disallow: /posting.php
Disallow: /login.php
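One thing worth checking in the file above: robots.txt `Disallow` values are matched as prefixes of the path from the site root. Assuming the board lives under /phpbb2/ (as the ".../phpbb2/index.php" URLs earlier in the thread suggest), a rule like `/privmsg.php` would not match the board's own `/phpbb2/privmsg.php`, and `/phpbb3/` doesn't cover `/phpbb2/` at all. A Python sketch with the standard `urllib.robotparser`; the `PREFIXED` variant is a hypothetical fix, not something proposed in the thread.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, abridged to the relevant lines.
CURRENT = """\
User-agent: *
Disallow: /phpbb3/
Disallow: /privmsg.php
Disallow: /posting.php
Disallow: /login.php
"""

# Hypothetical variant with the board's /phpbb2/ directory prefixed.
PREFIXED = """\
User-agent: *
Disallow: /phpbb2/privmsg.php
Disallow: /phpbb2/posting.php
Disallow: /phpbb2/login.php
"""

def blocked(robots_txt, path, agent="Googlebot"):
    """True if robots_txt tells agent not to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)

for path in ("/privmsg.php", "/phpbb2/privmsg.php", "/phpbb2/login.php"):
    print(path, "current:", blocked(CURRENT, path), "prefixed:", blocked(PREFIXED, path))
```

Either way, none of these lines touch viewtopic.php, so they can't keep spiders away from one particular forum's topics.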





For this forum, I set permissions so that anyone (ALL) may view the forum, but only registered members (REG) may read its contents.

Short of spending many days manually adding every URL from this forum to robots.txt, is there any other way to keep spiders from including the forum's contents?

- rexx
dogs and things
Board Member



Joined: 18 Nov 2008

Posts: 628
Location: Spain


Posted: Tue Jan 20, 2009 6:40 pm
Post subject: Re: Permission Question

How do you know? What makes you think it is still crawling that forum's content?
_________________
phpBB2 will never die, I hope!
rexx
Board Member



Joined: 10 Jan 2009

Posts: 20



Posted: Tue Jan 20, 2009 7:07 pm
Post subject: Re: Permission Question

Before I set that forum's permissions to View (ALL) and Read (REG), GSiteCrawler (GSC) used to crawl over 26,000 URLs.

So, with all my changes in place, I was expecting GSC to crawl far fewer URLs.

Yet today I noticed that it had over 27,000 URLs in its queue.

Sounds like business as usual to me.

Here's what someone on GSC's dedicated forum said:

Quote:

If your forum generates URLs with session IDs, you will have this problem.
Not just GSC, but any robot, including Googlebot, will have the same problem.

Assuming you find a way to handle what you want blocked properly, I doubt this will help.

Once you have any porn, the site is porn. You have to make it totally inaccessible, maybe by adding a password or something.




- rexx
~Cowboy~
Board Member



Joined: 08 Dec 2008

Posts: 297
Location: Chicago


Posted: Tue Jan 20, 2009 9:12 pm
Post subject: Re: Permission Question

The bots are most likely following the links. That doesn't mean they can access the destination they are trying to reach. If the forum's Read permission is set to REG, they will be stopped at the login screen. It will still show up as a hit on that link, though, even if they can't get into the page.
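A toy model of what's described above: the request is logged as a hit whether or not the visitor may read the topic, but a guest without Read permission gets the login page instead of the posts. The permission levels and their ordering below are a hypothetical simplification (phpBB2's real checks also involve group ACLs), sketched in Python.

```python
from enum import IntEnum

class Level(IntEnum):
    # Illustrative ordering only; not phpBB2's actual constant values.
    GUEST = 0
    REGISTERED = 1
    MODERATOR = 2
    ADMIN = 3

def follow_topic_link(visitor, forum_read_level, hit_log):
    """Simulate a visitor following a link to a topic."""
    hit_log.append(visitor.name)  # the request is logged either way
    if visitor >= forum_read_level:
        return "topic page"
    # A guest such as a spider is sent to the login page instead.
    return "redirect to login page"

log = []
print(follow_topic_link(Level.GUEST, Level.REGISTERED, log))
print(follow_topic_link(Level.REGISTERED, Level.REGISTERED, log))
print(log)
```

So a spider's hit count on that forum's links can stay high even though it never receives the topic text.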
_________________
We are not refugees we are trail blazers.
dogs and things
Board Member



Joined: 18 Nov 2008

Posts: 628
Location: Spain


Posted: Wed Jan 21, 2009 3:39 am
Post subject: Re: Permission Question

Also,

It will take some time before you see the number of indexed URLs diminish in Webmaster Tools. Give it a few weeks, maybe a month, and you will see that number start to drop.

It is impossible for any bot to view content it doesn't have permission to read.

_________________
phpBB2 will never die, I hope!
rexx
Board Member



Joined: 10 Jan 2009

Posts: 20



Posted: Wed Jan 21, 2009 10:53 am
Post subject: Re: Permission Question

1. Thanks for the input.

I guess all I can do, then, is wait to see what GSC comes up with.

Presently, there are 3,134 URLs in the queue.

If you are right, I would expect GSC to generate fewer sitemap files than it did last time. Presently, there are 10 sitemap-related files in addition to a Yahoo! sitemap file.

2. It would be nice and convenient if you were right.

I figured out that adding each of the forum's undesirable URLs to robots.txt would add over 20,000 lines to that file.

Following is what the GSC site had to say about that idea:


Quote:

20000 lines? You're doomed.

That's not a generic pattern.
A generic pattern based on those URIs would be:


Disallow: /phpbb2/viewtopic.php?t=689


assuming they are all in that range.


But still, that's not enough.


You also need to get the URLs removed from the index, and that will be one by one.

Why don't you just remove the crap so it responds with 410 and be done with it?
You want a porn forum? Put it on a different website, or at least a different subdomain, and block robots from it altogether.
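The quoted pattern works because robots.txt rules are prefix matches on the path plus query string. Since phpBB2 topic URLs (viewtopic.php?t=N) don't contain the forum ID, the only thing such a pattern can key on is the topic number. A Python sketch with the standard `urllib.robotparser` showing which topic IDs that single line would cover; the IDs and the `p=` post URL are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# The single rule suggested in the quote above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /phpbb2/viewtopic.php?t=689
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_blocked(path, agent="*"):
    """True if robots.txt tells agent not to fetch path."""
    return not parser.can_fetch(agent, path)

for topic_id in (68, 689, 6890, 68999, 690, 1689):
    path = f"/phpbb2/viewtopic.php?t={topic_id}"
    print(path, "blocked" if is_blocked(path) else "allowed")

# The same posts reached via a post link (hypothetical ID) are not covered.
print(is_blocked("/phpbb2/viewtopic.php?p=1234"))
```

So one line blocks topic 689 and every ID starting with those digits (6890, 68999, ...), but not 690 or 1689, and not the same posts reached through a `viewtopic.php?p=...` link, which is why the advice hinges on "assuming they are all in that range".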




So, I hope you are right!

I'll report back later on how things turn out.

Again, thanks!

- rexx
rexx
Board Member



Joined: 10 Jan 2009

Posts: 20



Posted: Thu Jan 22, 2009 8:20 pm
Post subject: Re: Permission Question

PREAMBLE: Despite the implication, I am not running a porn site.

I set up a separate forum for legal porn in an effort to compromise with pornographers who were plastering my entire site with porn. Unfortunately, the porn posts in that forum soon outnumbered the non-porn posts elsewhere on the board. So Google kept fixating on the porn at my site and essentially ignoring the non-porn content.

I have been trying to reverse that trend; hence my latest efforts.

WHY NOT JUST DELETE THAT PORN? A: Partly because I do not want to return to the site-wide plastering that was underway before I created the forum. Also, deleting legal content would violate my belief in the importance of upholding free speech.

UPDATE: Well, GSC generated just 8 sitemap files instead of the usual 10, but this still does not come close to meeting my expectations; I was hoping the changes I adopted would cut the number of files by at least half.

So I'm not sure what the improvement, if any, was.

Following are some of GSC's "General Statistics":

Quote:

Number of URLs listed total: 231999
Number of URLs listed to be included: 231999
Number of URLs listed to be crawled: 231884
Number of URLs still waiting in the crawler: 0 (may include some already listed)
Number of URLs aborted in the crawler: 720



- rexx
Page 2 of 2. All times are GMT - 4 Hours.
Powered by phpBB2 © phpBB Group
phpBB Customizations by the phpBBDoctor.com
Template Design by DeLFlo and MomentsOfLight.com