Thin Content
Thin content is defined as web pages that provide little or no value to site visitors, whether by not offering enough content or by offering content that doesn’t really satisfy a user’s search intent. To ensure users receive a good experience, search engine algorithms aim to distinguish between high-quality and low-quality content.
Want to explore thin content issues in more depth? Check out our additional resource: Thin Pages: Why Thin Content Hinders SEO.
Our SEO Office Hours notes below cover insights into how Google, in particular, views pages with thin content, along with further advice collected from Google’s regular Office Hours sessions for managing thin content.
For even more on website content best practices for SEO, read our Guide to Optimizing Website Content for Search — or explore our Website Intelligence Academy resources on SEO & Content.
If URLs Blocked by robots.txt Are Getting Indexed by Google, It May Point to Insufficient Content on the Site’s Accessible Pages
Why might an eCommerce site’s faceted or filtered URLs that are blocked by robots.txt (and have a canonical in place) still get indexed by Google? Would adding a noindex tag help? John replied that the noindex tag would not help in this situation, as the robots.txt block means it would not be seen by Google.
He pointed out that URLs might get indexed without content in this situation (as Google cannot crawl them with the block in robots.txt), but they would be unlikely to show up for users in the SERPs, so should not cause issues. He went on to mention that, if you do see these blocked URLs being returned for practical queries, then it can be a sign that the rest of your website is hard for Google to understand. It could mean that the visible content on your website is not sufficient for Google to understand that the normal (and accessible) pages are relevant for those queries. So he would first recommend looking into whether or not searchers are actually finding those URLs that are blocked by robots.txt. If not, then it should be fine. Otherwise, you may need to look at other parts of the website to understand why Google might be struggling to understand it.
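To check which URLs fall under a robots.txt block (and therefore cannot have a noindex tag read by a crawler), a minimal sketch using Python’s standard `urllib.robotparser` module could look like the following. The robots.txt rules, the example.com URLs, and the `is_blocked` helper are all hypothetical and only illustrate the idea; Google’s own robots.txt matching may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an eCommerce site with filtered URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /shop/filter/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow crawling of url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical URLs: a filtered listing and a normal category page.
for url in (
    "https://example.com/shop/filter/colour-red",
    "https://example.com/shop/shoes",
):
    print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "crawlable")
```

Any URL reported as blocked here cannot be crawled, so a noindex tag placed on it would not be seen.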
Empty or Thin Pages Can Be Served if Different Content Is Shown Depending on Location
Searchers may land on empty or thin pages from Google’s index if different content is served based on the visitor’s location. For example, if a page serves its full content to US visitors but not to visitors elsewhere, it might still be indexed with the full content because Googlebot crawls from the US, even though non-US visitors wouldn’t see that content.
Noindex Thin Pages That Provide Value to Users on Site But Not in Search
Some pages on your site may have thin content, which makes them less valuable to have indexed and shown in search. If they are still useful to users navigating your website, you can noindex them rather than removing them.
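When auditing which pages already carry a noindex directive, a small check along these lines may help. It is a sketch only: it looks for a robots (or googlebot) meta tag in the HTML and does not cover the `X-Robots-Tag` HTTP header. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in {"robots", "googlebot"}:
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag contains a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page that should stay usable on site but out of search.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```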
Focus on Creating Fewer Stronger Pages Rather Than Splitting Them Up
John recommends focusing on having fewer, stronger pages rather than splitting up longer pieces of content into separate pages to target different queries.
A Small Proportion of Thin Pages Is Not an Issue
Thin content is a normal occurrence on websites and shouldn’t be considered a critical issue if it only affects a small proportion of pages. For example, large news publishers may have some shorter articles that still provide unique content.
A Small Proportion of Thin Content Pages is Fine
Thin content pages can be a natural part of a site, such as category pages, and aren’t an issue for Google provided they make up only a small proportion of the site’s pages.
Microsites Can Be Seen as Doorway Pages
Microsites often look like a collection of doorway pages. If you are looking to build these microsites up in the long run then this might be an option, but if they don’t have value beyond driving traffic to another site, then microsites aren’t recommended for search and should be noindexed.
Google Tries to Figure Out Full Content When It Encounters a 206 Response Code
For pages returning the 206 response code (Partial Content, meaning the response doesn’t contain the full content), Google follows that response code and tries to figure out the full content of the page so that it can be indexed. Google doesn’t do anything special for the 206 response code; it simply tries to follow the HTTP standards.
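To illustrate what a 206 response signals under the HTTP standard, the sketch below reads a `Content-Range` header and reports whether the returned bytes cover the whole resource. It is not a description of how Google processes these responses; the `covers_full_content` helper is hypothetical and only handles the common single-range `bytes` form.

```python
import re
from typing import Optional

# Matches headers such as "bytes 0-499/1234" or "bytes 0-499/*".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def covers_full_content(status: int, content_range: Optional[str] = None) -> Optional[bool]:
    """Return True or False when it can be determined, otherwise None."""
    if status == 200:
        return True  # a 200 response carries the full representation
    if status != 206 or content_range is None:
        return None
    match = _CONTENT_RANGE.match(content_range.strip())
    if match is None:
        return None
    first, last, total = match.groups()
    if total == "*":
        return None  # the server did not state the total length
    # Byte positions are zero-based and inclusive.
    return int(first) == 0 and int(last) == int(total) - 1


print(covers_full_content(206, "bytes 0-499/1234"))
```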
Manual Action Penalties Can Be Applied to Thin, Spun, or Aggregated Content
Thin content penalties can be applied to sites manually by the web spam team where the entire site seems to be thin, ‘spun’, or aggregated from other sources without any unique additional value.
Google Considers Amount of Unique Content Per Page and Number of Pages with Unique Content
Google looks at how much text on each page is unique, and how many pages have unique content.
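For a rough internal audit of these two measures, one option is to estimate how much of each page’s text does not appear on any other page of the site. The sketch below uses overlapping three-word sequences (shingles) for this. It is an assumption chosen for illustration, not Google’s method, and the `unique_share` function and sample pages are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def shingles(text: str, size: int = 3) -> Set[str]:
    """Split text into overlapping word sequences of the given size."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def unique_share(pages: Dict[str, str], size: int = 3) -> Dict[str, float]:
    """For each page, return the fraction of its shingles found on no other page."""
    page_shingles = {url: shingles(text, size) for url, text in pages.items()}
    # Count on how many pages each shingle occurs.
    counts = Counter(s for sh in page_shingles.values() for s in sh)
    shares = {}
    for url, sh in page_shingles.items():
        if not sh:
            shares[url] = 0.0  # too little text to form a single shingle
            continue
        shares[url] = sum(1 for s in sh if counts[s] == 1) / len(sh)
    return shares


# Hypothetical site: two pages share the same text, one is distinct.
pages = {
    "/a": "red shoes for sale today",
    "/b": "red shoes for sale today",
    "/c": "a detailed guide to choosing running shoes",
}
shares = unique_share(pages)
pages_with_unique_content = sum(1 for share in shares.values() if share > 0)
print(shares, pages_with_unique_content)
```

The per-page share corresponds loosely to how much text on each page is unique, and the count of pages with a non-zero share to how many pages have unique content.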