In this week’s hangout, John Mueller discusses the crawling of thin and duplicate content extensively.
You may notice that there isn’t a single satisfactory solution to many of these problems, such as deciding how to manage multiple URLs per page and similar, thin or duplicate content.
Use Category Pages Instead of Thin Details Pages
For sites with very similar detail pages which may be considered thin content, John recommends grouping those pages under category pages instead.
Google Categorises New Pages as High or Low Quality
Google tries to categorise new pages into ‘probably high quality’ or ‘probably low quality’ (e.g. thin pages) until they can gather more data.
Redirect All URLs At Once For the Fastest Site Move
The best way to migrate to HTTPS is to redirect all URLs at once, as this helps Google complete the site move faster. When part of the site is moved and part is staying, Google has to re-evaluate the destination site ‘step by step’, which takes longer.
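As an illustrative sketch only (assuming an Apache server with mod_rewrite enabled; equivalent rules exist for other servers), a single site-wide rule can move every URL to HTTPS in one step:

```apache
# Send every request to the HTTPS version of the same URL in one go,
# rather than migrating the site section by section.
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]
```

Using a permanent (301) redirect signals that the move is intentional and final.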
Smaller Sitemaps Give Better Indexing Feedback
There is no crawling advantage in splitting a large Sitemap into smaller categorised ones, as Google say they can process both equally easily, but smaller Sitemaps do give you more granular indexing feedback, which is helpful.
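For example (a minimal sketch with hypothetical file names), category Sitemaps can be grouped under a single Sitemap index file, so indexed counts can be checked per section rather than only site-wide:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sitemap index referencing one Sitemap per section of the site. -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-categories.xml</loc>
  </sitemap>
</sitemapindex>
```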
Avoid Thin Content, or Noindex It
John says the Google search quality team don’t like you to have thin content, even if it’s noindexed. They would rather you made the pages good quality, or didn’t have them at all, but it’s OK to noindex or de-emphasise them.
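If you do noindex thin pages, the standard robots meta tag does this (shown here for a hypothetical thin page):

```html
<!-- Placed in the <head> of a thin page to keep it out of Google's index
     while still allowing the page itself to be crawled. -->
<meta name="robots" content="noindex">
```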
Don’t Use Disallow for Thin and Duplicate Content
John then says you shouldn’t use robots.txt to disallow thin or duplicate pages, because Google can’t crawl a disallowed page and so can’t see any noindex or canonical signals on it to help them decide what to do with the page.
Only Disallow URLs Which Cause Server Issues
Google only really recommend disallowing pages which cause you some kind of server load issues, such as search results pages.
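A minimal robots.txt sketch (assuming internal search results live under a hypothetical /search path):

```
# Block internal search results pages, which can generate near-infinite
# crawlable URL combinations and unnecessary server load.
User-agent: *
Disallow: /search
```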
Can You Migrate a Page Without Redirection?
If you want to migrate one page to another without redirecting, the canonical tag may be ignored if the pages are not the same. John recommends merging the content of the two pages onto a single URL.
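For reference, a cross-page canonical (hypothetical URLs) is declared like this, and it is only a hint — Google may ignore it when the two pages’ content differs:

```html
<!-- In the <head> of the old page, pointing Google at the preferred URL. -->
<link rel="canonical" href="https://www.example.com/new-page">
```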
Include Redirected URLs in Sitemaps For a Few Weeks
When you redirect URLs, you can keep the old URLs in your Sitemaps for a few weeks, so that Google recrawls them and picks up the redirects, removing them once they have been dropped by Google.
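As a sketch (hypothetical URL), a redirected URL can stay in the Sitemap temporarily:

```xml
<!-- Temporary entry for a URL that now 301s to its new location, kept
     for a few weeks so Google recrawls it and processes the redirect. -->
<url>
  <loc>https://www.example.com/old-page</loc>
</url>
```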
You Can Noindex and Canonicalise a Beta Domain to Preserve Link Authority
If you have a beta site on a subdomain which may attract links, but you don’t want it indexed, and its content has changed to the point where a canonical might not be accepted, you can both noindex it and canonicalise it to the main site.
Generally you shouldn’t noindex a canonicalised page, but if the content is different, as on a new beta site, it can help prevent the beta version being indexed.
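Combined on a beta page, the two tags would look like this (hypothetical domains):

```html
<!-- On https://beta.example.com/page: keep the beta version out of the
     index while pointing any accumulated link signals at the live page. -->
<meta name="robots" content="noindex">
<link rel="canonical" href="https://www.example.com/page">
```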
A Large Volume of Canonicalised URLs is OK
John says not to worry about having a large number of canonicalised URLs, and to use the canonical tag wherever it’s appropriate for the situation.
PageRank Calculation Includes Navigation and Footer Links
PageRank is calculated across all links on a page, including those in boilerplate areas like the navigation and footer.
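For context, the original published PageRank formula (Google’s production algorithm has evolved since, so treat this as illustrative) sums over every inbound link with no distinction by position on the page:

```latex
% PR(p): PageRank of page p; d: damping factor (commonly 0.85);
% N: total number of pages; B_p: the set of pages linking to p;
% L(q): number of outbound links on page q.
% A link from q to p contributes the same whether it sits in the body,
% the navigation or the footer of q.
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B_p} \frac{PR(q)}{L(q)}
```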