For web pages to appear in search results, they must be in Google’s index. Search engine indexing is a complex topic that depends on many factors. Our SEO Office Hours Notes on indexing cover a range of best practices and compile the indexability advice Google has shared in its Office Hours sessions, to help ensure your website’s important pages are indexed by search engines.
Google can only index what Googlebot sees
In response to a question about whether showing Google different content from what a user sees on a more personalized page raises cloaking issues, John Mueller clarified that only what Googlebot sees is indexed. Googlebot usually crawls from the US and without cookies, so the content served to that request is what gets indexed for the website. On personalized pages, therefore, only change things for users that are not critical to how you want the page to be seen in search.
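One way to act on this advice is to compare the HTML served to a cookieless request, which is roughly what Googlebot receives, with a personalized version of the same page. The Python sketch below is a minimal illustration under our own assumptions: you capture both HTML strings yourself, and the elements treated as critical (title, first `h1`, robots meta tag, canonical link) are an illustrative choice, not a list published by Google.

```python
from html.parser import HTMLParser


class CriticalElementParser(HTMLParser):
    """Collects a few elements that shape how a page appears in search results."""

    def __init__(self):
        super().__init__()
        self.elements = {"title": "", "h1": "", "robots": "", "canonical": ""}
        self._capture = None  # "title" or "h1" while inside that element
        self._seen_h1 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._capture = "title"
        elif tag == "h1" and not self._seen_h1:
            self._seen_h1 = True  # only the first <h1> is recorded
            self._capture = "h1"
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.elements["robots"] = (attrs.get("content") or "").lower()
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.elements["canonical"] = attrs.get("href") or ""

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self.elements[self._capture] += data


def critical_elements(html):
    parser = CriticalElementParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so pure formatting differences are not reported.
    return {name: " ".join(value.split()) for name, value in parser.elements.items()}


def personalization_diffs(default_html, personalized_html):
    """Names of critical elements that differ between cookieless and personalized HTML."""
    default = critical_elements(default_html)
    personalized = critical_elements(personalized_html)
    return sorted(name for name in default if default[name] != personalized[name])


if __name__ == "__main__":
    cookieless = "<title>Shoes</title><h1>All shoes</h1>"
    personalized = "<title>Shoes</title><h1>Welcome back, Sam</h1>"
    print(personalization_diffs(cookieless, personalized))  # ['h1']
```

Here only the `h1` differs; whether that matters depends on how much you rely on the heading for search, which is the judgment call the advice above leaves to you.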
Speed up recrawling of previously noindexed pages by temporarily linking to them from important pages
If crawling of previously noindexed URLs has slowed down because of the earlier noindex tag, temporarily linking to them internally from important pages (such as the homepage) can speed up their recrawling. The example given was previously noindexed product pages, and John suggested linking to them for a couple of weeks from a special product section on the homepage. Google will see the internal linking changes and then crawl the linked-to URLs; the links also signal that these pages are important within the website. However, he also noted that significant changes to internal linking can cause other, barely indexed parts of your site to drop out of the index. This is why he suggests using these links only as a temporary measure to get the pages recrawled at the regular rate, and then changing the linking back.
If a page is noindexed for a long period of time, crawling will slow down
Having a page set to noindex for a long time will cause Google to crawl it less often. Once the page is indexable again, crawling will pick up, but it can take time for that initial recrawl to happen. John also mentioned that Search Console reports can make the situation look worse than it actually is, and that you can use sitemaps and internal linking to speed up recrawling of those pages.
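A sitemap for pages that have just become indexable again might be generated as in the Python sketch below. It follows the sitemaps.org protocol; the URLs are hypothetical, and using the date the noindex was removed as `lastmod` is our assumption about a sensible value, not guidance from the Office Hours session.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, lastmod):
    """Return sitemap XML listing `urls`, each with the same <lastmod> date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes "&" in URLs
        # W3C date format, e.g. 2024-01-15
        ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical product pages whose noindex tag was just removed.
    reindexed = ["https://example.com/products/1", "https://example.com/products/2"]
    print(build_sitemap(reindexed, date.today()))
```

Keeping these URLs in a small, separate sitemap makes it easier to check their indexing status in Search Console as recrawling catches up.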
To better control page indexing, use ‘noindex’ on pages rather than ‘nofollow’ on internal links
Adding rel="nofollow" to internal links is not recommended as a way to control indexing. Instead, John suggests adding a noindex robots meta tag to pages that you don’t want indexed, or removing internal links to them altogether.
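To find places where nofollow is being used this way, a minimal Python sketch like the one below could audit a page's HTML. It flags internal links carrying `rel="nofollow"` and reports whether the page itself has a noindex robots meta tag such as `<meta name="robots" content="noindex">`. Treating a link as internal only when it shares the page's host name is our simplifying assumption (subdomains count as external here).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class IndexingAuditParser(HTMLParser):
    """Flags internal links with rel="nofollow" and detects a robots meta noindex."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.noindex = False
        self.nofollowed_internal_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            directives = {d.strip() for d in (attrs.get("content") or "").lower().split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if directives & {"noindex", "none"}:
                self.noindex = True
        elif tag == "a" and attrs.get("href"):
            target = urljoin(self.page_url, attrs["href"])  # resolve relative links
            rel = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel and urlparse(target).netloc == self.host:
                self.nofollowed_internal_links.append(target)


def audit_page(page_url, html):
    parser = IndexingAuditParser(page_url)
    parser.feed(html)
    parser.close()
    return {"noindex": parser.noindex,
            "nofollowed_internal_links": parser.nofollowed_internal_links}
```

For each flagged link, the advice above suggests either removing the link or adding noindex to the target page, rather than keeping the nofollow.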
Merchant Center feeds can help e-commerce sites keep Google current when items go out of stock
Creating a Google Merchant Center feed is recommended for e-commerce sites with items that regularly go in and out of stock. The feed sends Google signals about product status directly, so Google doesn’t need to depend on recrawling your web pages to detect and reflect changes in product availability.
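As a rough sketch, a feed that only updates availability could be generated as below. It assumes a tab-separated supplemental feed keyed on `id`, using the `in_stock` and `out_of_stock` values from Google's product data specification; the product IDs and the stock-level source are hypothetical, and your account may require other attributes, so check the current specification before uploading anything.

```python
import csv
import io


def availability_feed(stock_levels):
    """Build a tab-separated feed mapping product IDs to availability.

    `stock_levels` maps a (hypothetical) product ID to units on hand.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "availability"])  # header row names the attributes
    for product_id, units in sorted(stock_levels.items()):
        status = "in_stock" if units > 0 else "out_of_stock"
        writer.writerow([product_id, status])
    return out.getvalue()


if __name__ == "__main__":
    print(availability_feed({"SKU-1": 3, "SKU-2": 0}))
```

Regenerating a file like this whenever stock changes is what lets Google pick up availability without waiting for a recrawl of the product pages.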
The URL Parameters tool does not prevent pages from being crawled
John explained that URLs set to be ignored in the URL Parameters tool may still be crawled, albeit at a much slower rate. Parameter rules set in the tool can also help Google decide which canonical tags to follow.
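The URL Parameters tool was configured in Search Console rather than in code, but a rule such as "this parameter does not change page content" effectively collapses many URLs into one. The Python sketch below illustrates that idea under our own assumptions: the list of ignored parameters is hypothetical, and the resulting representative URL is simply a candidate you might point `rel="canonical"` at, not a model of how Google applies such rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed NOT to change page content on this site.
IGNORED_PARAMS = frozenset({"sessionid", "utm_source", "utm_medium", "utm_campaign"})


def representative_url(url, ignored=IGNORED_PARAMS):
    """Drop ignored parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    )
    # The fragment is dropped because browsers never send it to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_by_representative(urls, ignored=IGNORED_PARAMS):
    """Map each representative URL to the crawled URLs that collapse into it."""
    groups = {}
    for url in urls:
        groups.setdefault(representative_url(url, ignored), []).append(url)
    return groups
```

Large groups point to parameters that generate many crawlable duplicates, which are the URLs a parameter rule or canonical tag would need to cover.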
Sites not indexed in search may be due to spam or technical issues
There is a big difference between a site that disappears completely from Google Search and one that is demoted. Removal from the index is usually due to a very significant web spam issue or a technical issue. A demotion, where the site no longer ranks as well as before, may be due to the quality of the content or the setup of the site. Spammy backlinks are unlikely to be the cause.
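When a site vanishes entirely, it is worth ruling out technical causes before assuming a spam problem. The Python sketch below checks three common ones from data you fetch yourself: a robots.txt rule blocking Googlebot, an HTTP error status, and a noindex in the `X-Robots-Tag` header. This selection is ours and not an exhaustive list from Google; the function name and inputs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def technical_blockers(url, robots_txt, status_code, headers):
    """List technical reasons Googlebot may not crawl or index `url`.

    All inputs come from your own fetch of the site; nothing here queries Google.
    """
    reasons = []

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    # urllib.robotparser only does simple prefix matching and does not implement
    # Google's "*" and "$" path wildcards, so treat this check as approximate.
    if not robots.can_fetch("Googlebot", url):
        reasons.append("crawling blocked by robots.txt")

    if status_code >= 400:
        reasons.append(f"HTTP status {status_code}")

    # Header names are case-insensitive; directives may carry a "googlebot:" prefix.
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    directives = {d.split(":")[-1].strip() for d in header.lower().split(",")}
    if directives & {"noindex", "none"}:
        reasons.append("noindex in X-Robots-Tag header")

    return reasons
```

An empty result does not prove the problem is spam; it only removes these particular technical causes from the list.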
Some machine-translated content can be high enough quality to be indexed
Machine translation is getting more sophisticated and producing better results, so pages translated to a high enough quality are fine to be indexed. However, humans should check the translations for accuracy and quality, which can be difficult to scale across a large number of translated pages.
If you have a manual action in place, GSC will still show the page as indexed
If you have a manual action or URL removal in place, the URL Inspection tool in Search Console will still show the page as indexed, but it won’t appear in search results. This is because manual actions and URL removals are filters applied on top of the search results, so a page can remain indexed without being shown.
Anything contained on non-canonical pages will not be used for indexing purposes
When Google picks a canonical for a page, it recognizes that there is a set of duplicate pages but focuses only on the content and links of the canonical page. Anything contained only on the non-canonical versions will not be used for indexing purposes. If those pages have content you would like indexed, John recommends making sure the pages are different enough not to be treated as duplicates.
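To find content that may be ignored because it only exists on non-canonical URLs, you could compare each page with the canonical it declares. The Python sketch below assumes Google honors the declared `rel="canonical"` (in practice it is a hint, and Google may pick a different canonical), compares only `<p>` text, and uses a hypothetical function name and example URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the declared rel=canonical URL and the text of each <p>."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attrs.get("href")
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def content_lost_to_canonicalization(pages):
    """pages: {url: html}. Returns {non_canonical_url: paragraphs missing from its canonical}."""
    parsed = {}
    for url, html in pages.items():
        parser = PageParser()
        parser.feed(html)
        parser.close()
        # Resolve relative canonicals; a page without one is treated as self-canonical.
        canonical = urljoin(url, parser.canonical) if parser.canonical else url
        text = [" ".join(t.split()) for t in parser.paragraphs if t.strip()]
        parsed[url] = (canonical, text)

    lost = {}
    for url, (canonical, text) in parsed.items():
        if canonical == url or canonical not in parsed:
            continue  # self-canonical, or the canonical page is not in this sample
        canonical_text = set(parsed[canonical][1])
        missing = [t for t in text if t not in canonical_text]
        if missing:
            lost[url] = missing
    return lost
```

Anything this reports is a candidate for moving onto the canonical page, or for making the variant page distinct enough to stand on its own, as suggested above.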