So Google have finally started to react to high-volume, low-level spamming, firing the opening shots of the war on duplicate content scraping and mashing.
It started with a post over at the Google Blog about Google search and search engine spam. An interesting post that reveals Google have increased their index size (the number of documents in the index) and, more importantly in my mind, their 'freshness': the speed at which new and updated pages are indexed.
The follow-up post by Matt Cutts on his blog explains further that Google are looking to weed out duplicate-content-only providers, which could have a great effect on sites that don't really produce much new content themselves but merely rehash content that other people originate.
This is something I like, but my old friend Jill Whalen raises a good point in the following:
Does this only affect scraper-type, unauthorized use of content? Or will it also affect those articles where the third party has permission to republish the content?
For instance, I have many articles where I have allowed other sites to run them, but I would still presume that my original article would be the one that shows up in a relevant search. It seems this is not always the case (even with your new algorithm… assuming it's in place).
Do we give up that right when we allow others to republish our work?
No real reply to that one as yet, so the jury is still out. Google are facing pressure from many sides, not least from the other search and big internet players themselves. I think 2011/12 will be a landmark time for Google: the years that truly define just how big they will get.
The question is: why intentionally duplicate your articles on other sites and still expect your own site to rank for that article?
The article may get crawled on the other site first, and links back to the original article are increasingly nofollowed, blocked, redirected, or generated with JavaScript.
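If you do syndicate deliberately, it's worth auditing what you actually get back. Here's a minimal sketch in Python (using the requests and beautifulsoup4 libraries; the URLs are hypothetical placeholders, not real sites) that fetches each republished copy and reports whether the backlink to your original is missing or nofollowed:

```python
# Audit syndicated copies of an article: does each copy link back to
# the original, and is that link nofollowed? A rough sketch only;
# the URLs below are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

ORIGINAL = "https://example.com/my-article"              # hypothetical original
SYNDICATED = [
    "https://other-site.example/reprint-of-my-article",  # hypothetical copy
]

for url in SYNDICATED:
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")

    # Every anchor on the page that points back at the original article.
    backlinks = [a for a in soup.find_all("a", href=True)
                 if a["href"].rstrip("/") == ORIGINAL.rstrip("/")]

    if not backlinks:
        print(f"{url}: no link back to the original at all")
        continue

    for a in backlinks:
        # rel is a multi-valued HTML attribute, so BeautifulSoup
        # returns it as a list (or None if absent).
        rel = a.get("rel") or []
        status = "nofollowed" if "nofollow" in rel else "followed"
        print(f"{url}: backlink is {status}")
```

A plain HTML check like this won't catch links that are injected with JavaScript or routed through redirects, so treat it as a first pass rather than a full audit.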
Of course, these publishers syndicate purely to gain backlinks, then have the audacity to complain when the duplicates they created outrank the original.
Duplicate-content-only providers such as article directories and mashups should have been weeded out a long time ago. Why on earth has it taken Google so long to realise this? Finally they are doing something about it, hoorah 🙂