Today I thought I would mention the case of the curiously disappearing pages in Google. I am now getting regular enquiries from people who are concerned about their site losing page saturation in the mighty G. No indexed pages = no traffic, of course, which in turn = no money.
Having worked with a few sites that have suffered this, I have found that the main issues contributing to it were:
Duplicate content
By this I mean that the pages were pretty much duplicated, with only a small element changing on each page. This is especially a problem with e-commerce sites, as the header, navigation and footer all stay the same. The visible content might look a little different, but when you look at the code you see that out of hundreds of lines, maybe only 4 or 5 are unique.
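To put a rough number on that "4 or 5 lines in hundreds" feeling, here is a minimal Python sketch that compares the source of two pages line by line. The function name `unique_line_ratio` and the two sample pages are made up for illustration, and a line-by-line comparison is only my own crude stand-in: it is not how Google measures duplication.

```python
def unique_line_ratio(page_html: str, other_html: str) -> float:
    """Fraction of non-blank lines in page_html that do not appear in other_html."""
    lines = [ln.strip() for ln in page_html.splitlines() if ln.strip()]
    other = {ln.strip() for ln in other_html.splitlines() if ln.strip()}
    if not lines:
        return 0.0
    unique = [ln for ln in lines if ln not in other]
    return len(unique) / len(lines)


# Two hypothetical product pages built from the same cart template:
# 95 shared template lines, then 5 lines that differ per product.
shared = [f'<li><a href="/cat/{i}">Category {i}</a></li>' for i in range(95)]
page_red = "\n".join(shared + [
    "<h1>Red widget</h1>",
    "<p>A widget in red.</p>",
    "<p>Price: 5.00</p>",
    "<p>SKU: RW-1</p>",
    "<img src='/img/red.jpg'>",
])
page_blue = "\n".join(shared + [
    "<h1>Blue widget</h1>",
    "<p>A widget in blue.</p>",
    "<p>Price: 6.00</p>",
    "<p>SKU: BW-1</p>",
    "<img src='/img/blue.jpg'>",
])

ratio = unique_line_ratio(page_red, page_blue)
print(f"{ratio:.0%} of the lines on this page are unique")
```

Run against the saved source of two real product pages, a very low percentage is a hint that the template is drowning out the content.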
Poor navigation/site structure
In this case I noticed that the actual site hierarchy was poorly designed. There was no clear structure for the search engines to apply weight to.
Poor linking structure
Here I saw poor linking, in as much as most pages were linked to from most other pages. This only served to water down PageRank (link juice) to the point where the pages were all seen as unimportant (again, related to poor structure/architecture).
Too many links
This was a common theme, as many carts have pop-out or drop-down option selections which look innocent enough but, on further investigation, can be seen to be causing problems. It is possible to end up with hundreds of links per page, and this isn’t good.
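If you want to check this on your own pages, a short Python sketch using only the standard library can count the links in a page's source. The names `LinkCounter`, `count_links` and `LINK_BUDGET` are my own, the sample markup is invented, and the figure of 100 is just an assumed rule of thumb for the example, not a limit Google enforces.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collects the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_links(html: str):
    """Return (total links, distinct link targets) found in the HTML."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return len(parser.hrefs), len(set(parser.hrefs))


# Hypothetical cart page: a drop-down menu with 150 product links,
# a Home link in both header and footer, and an anchor with no href.
menu = "".join(
    f'<li><a href="/product/{i}">Product {i}</a></li>' for i in range(150)
)
sample_html = (
    '<a name="top"></a><a href="/">Home</a><ul>' + menu + '</ul><a href="/">Home</a>'
)

LINK_BUDGET = 100  # assumed rule of thumb, purely for illustration

total, distinct = count_links(sample_html)
print(f"{total} links, {distinct} distinct targets")
if total > LINK_BUDGET:
    print("This page links out far more than it probably needs to.")
```

Feed it the saved source of a category or product page and the innocent-looking drop-down menu soon shows its true size.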
While a flat file structure is OK for a small site, a clear linking structure and hierarchy MUST be evident in a large site. This allows Google to apply its weight and its trust to each page, pro rata.
Poor linking leads to poor PR spread, and that is not good in the eyes of Google. Despite what many say, actual PR matters to Google. It matters for many things, and gauging the value of a page is one of them.
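To illustrate the point about PR spread, here is a small Python sketch of the simplified PageRank calculation from the original published formula, run over two tiny made-up sites: one where every page links to every other page, and one with a home > category > product hierarchy. The page names, the damping value of 0.85 and the number of iterations are all assumptions for the example, and this is certainly not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated iteration.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # pages with no outgoing links pass nothing on here
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


names = ["home", "cat_a", "cat_b", "a1", "a2", "b1", "b2"]

# Site 1: every page links to every other page.
mesh_links = {p: [q for q in names if q != p] for p in names}

# Site 2: a clear hierarchy; products link back to home and their category.
hier_links = {
    "home": ["cat_a", "cat_b"],
    "cat_a": ["home", "a1", "a2"],
    "cat_b": ["home", "b1", "b2"],
    "a1": ["home", "cat_a"],
    "a2": ["home", "cat_a"],
    "b1": ["home", "cat_b"],
    "b2": ["home", "cat_b"],
}

mesh = pagerank(mesh_links)
hier = pagerank(hier_links)

for name in names:
    print(f"{name:6} mesh={mesh[name]:.3f} hierarchy={hier[name]:.3f}")
```

In the everything-links-to-everything site, every page ends up with the same score, so nothing stands out as important. In the hierarchy, the home page scores highest, the categories next and the products below them, which is the kind of signal a large site wants to send.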
All of the above serve only to confuse the search engines as to the importance of pages within your site. Each site has a page saturation level, which is worked out from the two main elements in the Google algorithm (yes, there are really only 2 when it is all boiled down):
1. Importance – this is a measure of value in the eyes of Google and is pretty much PageRank.
2. Relevance – this is a textual value.
The above are further split into the 250 or more sub-elements that make up the algorithm but, when all is said and done, it is those 2 that matter.
With most of a shopping cart’s pages being near-duplicate content, and the cross-linking between pages being higgledy-piggledy at best, how is Google supposed to know what is important or relevant? Put simply, it can’t.
The result of this is that while a site may be showing some 1500 or so pages indexed, once you get somewhere between 200 and 300 pages into the listing, the cached versions stop showing. So the reality is even worse. There is no cache shown in the main index listing, yet oftentimes there is a cache when you check the individual page.
While Google announced the scrapping of the infamous supplementary index, it appears that, just as in the case of Mark Twain (Samuel Langhorne Clemens), rumours of its death were greatly exaggerated.
Finally, I will say this. If you read the Google webmaster technical advice pages, they tell you not to make the errors above that contribute to pages being dropped.