One Million Pages of WebmasterWorld Dropped by Google as Forum Bans Bots
Author: Mike Valentine
The top internet forum and best known discussion site for
website owners, WebmasterWorld has been dropped entirely from
Google! A site with over a million pages seeing over 2 million
page views a month just disappeared from search engines! How
often have you been searching for the answer to issues
affecting your web site when you found a thread in
WebmasterWorld forums in the top search results?
You won't see WebmasterWorld in search results again
until this bot ban is reversed.
The following URL actually takes you into the middle of the "FOO"
forum discussion that runs over 40 pages (at the time of this
writing), but the page opens with a nice recap of the issues
covering much of the previous 23 pages of discussion.
http://www.webmasterworld.com/forum9/9618-1-10.htm
Site owner Brett Tabke is being grilled, toasted and roasted
by forum members for requiring logins (and assigning cookies)
for all visitors and effectively locking out all search engine
spiders. One big issue is lack of effective site search now
that you can't use a "site:WebmasterWorld.com" query to
find WebmasterWorld info on specific issues with a Google
search. Tabke is being slammed for not having an effective
site search function in place before getting the site dropped.
WebmasterWorld has been entirely removed from Google
after Tabke decided to use robots.txt to block all spiders
with a single rule that applies to every crawler:
User-agent: *
Disallow: /
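The effect of that two-line file on well-behaved crawlers can be
checked with Python's standard urllib.robotparser module. This is
a minimal sketch: the helper name is_allowed is hypothetical, and
it only models crawlers that actually honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the article.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_allowed(BLOCK_ALL, "Googlebot", "http://www.webmasterworld.com/forum9/9618-1-10.htm")
returns False, and the same holds for any other user agent, since
the "*" record matches every crawler that has no record of its own.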
Tabke has stated that this is due to rogue bots clogging and
slowing site performance, scraping and re-using content, and
searching forum comments for the web reputation of individual
companies. I have a similar problem at my site on a much
smaller scale. Crawlers can request pages at excessive rates
that slow site performance for visitors. I've instituted a
"Crawl-delay" for Yahoo and MSN, but rogue bots don't follow
robots.txt instructions. (Google is more polite and requests
pages at a more leisurely rate.)
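For readers who haven't used it, a "Crawl-delay" record looks
roughly like the robots.txt text embedded in the sketch below. The
10-second value is an assumption for illustration, not the setting
used on any real site, and crawl_delay_for is a hypothetical helper
name. The sketch reads the delay back with Python's standard
urllib.robotparser (crawl_delay needs Python 3.6 or later).

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: the 10-second delay is an assumed value.
# Slurp is Yahoo's crawler and msnbot is MSN's.
ROBOTS_WITH_DELAY = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay in seconds for this user agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)
```

A crawler with no Crawl-delay record of its own falls through to
the "*" record, which sets no delay here, so the helper returns
None for it. As the article says, none of this binds a bot that
ignores robots.txt.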
I can't say I completely understand the WebmasterWorld action to
ban all bots, or whether it will achieve what Tabke is after, but
it sure is creating a buzz in search engine circles. Lots of
new links to WebmasterWorld will be generated by this extreme
action. Then, when robots.txt once again allows search engine
spiders in, the site is likely to be re-indexed in its
entirety by all the engines.
Re-indexing over a million pages will certainly mean a heavy
crawl schedule from the top search engines, further loading
the server and slowing the site for visitors. Perhaps Tabke
plans a phased re-crawl by allowing Googlebot to index the site
first, then Slurp (Yahoo), then MSN bot, then Teoma. It could
be that he's created more work for himself in managing that
re-crawl.
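A phased re-crawl of that kind could be expressed as a series of
robots.txt files, each admitting one more crawler than the last.
The sketch below is purely speculative: nothing in the source says
Tabke plans this, the function names are hypothetical, and the
user-agent tokens are assumed to be the ones each engine's crawler
matches.

```python
from urllib.robotparser import RobotFileParser

# Order suggested in the article; tokens are assumed crawler names.
PHASE_ORDER = ["Googlebot", "Slurp", "msnbot", "Teoma"]


def robots_txt_for_phase(phase: int) -> str:
    """Build a robots.txt admitting the first `phase` crawlers only."""
    records = []
    for agent in PHASE_ORDER[:phase]:
        # An empty Disallow means "nothing is off limits".
        records.append(f"User-agent: {agent}\nDisallow:\n")
    # Every other crawler stays locked out.
    records.append("User-agent: *\nDisallow: /\n")
    return "\n".join(records)


def may_crawl(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if a compliant crawler may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

Phase 0 reproduces the current total ban, phase 1 admits only
Googlebot, and phase 4 admits all four named crawlers while still
excluding everything else that obeys robots.txt.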
Once the site is re-indexed, there will be thousands of new
links from all the buzz and the many articles discussing the
bot ban, which will lead to WebmasterWorld becoming even more
popular. Many have suggested that the extreme move of banning
all crawlers was simply a plan to gain public relations value
and links, but somehow I doubt it. Tabke claims the bot ban
was done in a moment of frustration after his IP address ban
list grew to over 4000 and management of rogue bots became a
10-hour-a-week job.
Barry Schwartz of Search Engine Roundtable interviewed Tabke
after his dramatic decision to ban all bots. That interview
clears up much of the confusion, but still doesn't fully
justify the dramatic move that effectively drops over one
million pages from Google.
http://www.seroundtable.com/archives/002863.html
Web reputation crawlers are partially at play here as well.
Corporations looking for online commentary about their
company, both positive and negative, use web reputation
services, which crawl the web with reputation bots (crawling
mostly blogs and news stories) looking for comments about
their clients that may harm or help them. This may be of value
to those corporations, but it needlessly slows site
performance to no advantage for webmasters. If a site owner
has trashed a company on their blog, they certainly don't want
the "Web Reputation Police" crawling their content in order to
sue them for libel.
Rogue bots are a serious problem, but they simply can't be
controlled with robots.txt. Tabke said himself that even the
cookies and login are useless against serious scraper bots:
the bot owner need only log the bot in manually, which
assigns it a cookie, then let it loose within the forums to
scrape away automatically once past the gate. Rogue bots
don't follow robots.txt instructions.
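Because robots.txt is only advisory, any real control has to
happen on the server. One common approach (not something the
source says WebmasterWorld uses) is to count requests per client
over a sliding time window and flag anything faster than a human
would browse. The class name and thresholds below are assumptions
chosen for illustration.

```python
import time
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flags clients exceeding max_requests within window_seconds."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> request timestamps

    def record(self, client_id: str, now: float = None) -> bool:
        """Record one request; return True if the client is over the limit."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[client_id]
        hits.append(now)
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

The client id can be an IP address or, for a bot that has been
walked through the login, the session cookie it was assigned. A
flagged client could then be throttled or added to a ban list
automatically rather than by hand.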
I've often wondered why anyone would go to such lengths to
steal content and re-use it elsewhere, when it is unlikely to
help them in any substantial way. Everyone knows that content
is freely available at several article marketing archives,
but the rogue bot programmers seek out content that ranks
highly first, and fail to realize that there are multiple
reasons for those high rankings. Those reasons include
off-page factors like quality, relevant, inbound, one-way
links from highly ranked blogs and industry news sites. The
bad boys out there stealing content won't get those inbound
links OR the high rankings on the sites where they've posted
that scraped content.
Article archives experience scraper bots too. Bot programmers
would rather write a bot program that collects content for them
(and automatically dumps it into another site) than
carefully choose relevant work to post in sensible hierarchies
of useful content. Automated scrape and dump laziness. What
other reasons would you have for scraping free articles?
The other reason for scraping content would be to plaster it
up across Adsense and Yahoo Publisher Network (YPN) sites as
content to attract advertisements, in the hope of
clickthroughs from visitors searching on valuable keyword
phrases, which generate contextual ads worth more to those
webmasters. This convoluted thinking results in sites that
don't end up ranking very well and don't generate much income
for the lazy, bot-programming nerds who create those types of
sites.
There are several software and cloaking packages available to
lazy webmasters that claim to gather keyword-phrase-based
content from across the web via bots and scrapers, then
publish that content to "mini-webs" automatically, with no
work on your part required. Those pages are cloaked
automatically, against search engine best practices, and then
Adsense and YPN ads are plastered over those automatically
created pages, yes, you guessed it - automatically. Serious
search engine sp*m, cloaked, so search engines don't know.
One last reason for content scrapers is the latest craze:
filling fake blogs (also known as Spam Blogs or Splogs) with
scraped content, then pinging the blog search services to
notify them of new posts. Scraped content is constantly
added to the blogs, and the pinging suggests that the blog is
prolific and should be highly ranked. This is closely related
to, and promoted by, the above-mentioned article scrapers.
This is the latest type of spam being combatted by search
engines. It seems that search engine sp*m is just as serious
as emailed sp*m.
Good luck to WebmasterWorld's effort to ban those rogue bots
and scrapers!
Copyright © December, 2005 by Mike Banks Valentine
Mike Banks Valentine operates http://WebSite101.com Free Web
Small Business Ecommerce tutorial and Provides SEO content
aggregation, press release optimization and custom web content
Search Optimization http://seoptimism.com/SEO_Contact.htm
Free Content Article distribution site http://Publish101.com
Deep Articles portal.