What is duplicate content? Duplicate content occurs when the same (or very similar) content appears in multiple places on a website.
There are several SEO issues that can occur when a website has duplicate content, including crawl budget issues, search engine indexing issues, index bloat, keyword cannibalization, and canonical tag issues.
Our SEO Office Hours recaps below compile best practices Google has recommended for websites dealing with duplicate content issues.
(See Lumar’s full guide to duplicate content for even more actionable tips on how SEOs can address duplicate content issues.)
Unless locations have unique content offerings, separate pages are not recommended
When asked whether so-called ‘doorway pages’ should be canonicalized, John was keen to stress that no one solution fits every situation. The example given was a site with separate pages for ‘piano lessons birmingham’ and ‘piano lessons london’. If there’s something unique about the offering in each city, it’s generally fine to have separate URLs. If the information on both is the same, John recommended folding them into one ‘stronger’ page rather than diluting signals across multiple near-identical ones. A mix of the two approaches can also work if one of those locations has a stand-out, unique element.
Make sure important content is not found only on canonicalized pages
John answered a question about whether the content on a canonicalized page needs to match the content on the canonical page it points to. He replied that the two pages don’t need to have exactly the same content. With a canonical tag in place, Google will try to index only the canonical page that was specified, so any content that appears solely on the non-canonical pages won’t be indexed. Make sure that any critical content from canonicalized pages also appears on the canonical page.
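As an illustration (the URLs here are hypothetical), a canonicalized page declares its canonical with a rel="canonical" link element in the page head:

```html
<!-- On https://example.com/shoes?sort=price (the canonicalized page) -->
<!-- Google will aim to index only the canonical URL below, so any content
     unique to this parameterised URL is unlikely to be indexed -->
<link rel="canonical" href="https://example.com/shoes" />
```

In this sketch, anything important that only exists on /shoes?sort=price should be moved onto /shoes itself.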
The URL parameter tool does not prevent pages from being crawled
John explained that any URLs set to be ignored within the URL Parameter tool may still be crawled, albeit at a much slower rate. Parameter rules set in the tool can also help Google to make decisions on which canonical tags should be followed.
FAQ Content Should be Specific to Each Page
Content you provide in an FAQs section should be specific to each individual page and not copied across multiple pages.
Duplicated Same Language Content for Different Countries May Not be Indexed but Can Show in Search Results
If you have same-language content for different countries, Google will treat the pages as duplicates and fold them together for indexing, but will unfold them again in search results to show the appropriate version for each country.
Technical Issues Can Cause Content to be Indexed on Scraper Sites Before Original Site
If content on scraper sites is appearing in the index from those sites before the original site, this could be due to technical issues on the original site. For example, Googlebot might not be able to find main hub pages or category pages or may be getting stuck in crawl traps by following excess parameter URLs.
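As one hedged example (the parameter names are hypothetical), excess parameter URLs can be kept out of the crawl with robots.txt rules, so Googlebot spends its time on hub and category pages instead of a crawl trap:

```
# robots.txt — hypothetical faceted-navigation parameters
User-agent: *
# Block sorted/filtered parameter URLs that generate near-infinite combinations
Disallow: /*?sort=
Disallow: /*?filter=
```

Note that robots.txt controls crawling, not indexing, so this approach suits trap URLs that should never be fetched at all.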
Google’s Algorithms Should be Able to Detect & Prioritize Original Content From Near Duplicate Versions
Google’s algorithms should ideally be able to detect spun content that has been rewritten from another source and treat the original content as more valuable.
Having Multiple Pages for Different Product Variations Isn’t a Problem
John outlined two approaches for products with multiple variations: either ensure each individual variation page can be indexed, or have one main product page with each variation selectable on it. The best method depends on the size of the site and how unique each variation is.
GSC Data Across Duplicate Language Versions Will Only be Shown for Selected Canonical
Even if you have hreflang set up correctly, Google can fold together similar language version pages and choose one to index, meaning that data in Google Search Console will only be shown for the one selected canonical page.
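For illustration (the URLs are hypothetical), a typical same-language hreflang cluster looks like the snippet below; even with these annotations correctly in place on every page in the cluster, Google may fold the pages together, pick one URL as the canonical, and report Search Console data only against that URL:

```html
<!-- Repeated in the head of every page in the cluster -->
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/" />
<link rel="alternate" hreflang="en-us" href="https://example.com/us/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/" />
```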
Having Sections Of Duplicate Content on A Site Is Fine
Google will not demote your site for having sections of duplicate content across several different pages. Instead, it will recognise that the content appears on multiple pages, filter the duplicates out of search results, and show just one page.