Before a page can be indexed (and therefore appear in search results), it must first be crawled by search engine crawlers such as Googlebot. There are many factors to consider to get pages crawled and to ensure they adhere to the relevant guidelines. These are covered in our SEO Office Hours notes, along with further research and recommendations.
For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources:
Speed up re-crawling of previously noindexed pages by temporarily linking to them on important pages
Temporarily adding internal links to previously noindexed URLs from important pages (such as the homepage) can speed up recrawling of those URLs if crawling has slowed due to the earlier presence of a noindex tag. The example given concerned previously noindexed product pages; John's suggestion was to link to them for a couple of weeks via a special product section on the homepage. Google will notice the internal linking changes and then crawl the linked-to URLs, and the links help signal that these are important pages within the website. However, he also cautioned that significant changes to internal linking can cause other, barely indexed parts of your site to drop out of the index, which is why he suggests using these links only as a temporary measure to get the pages recrawled at the regular rate before reverting the change.
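As a rough sketch, such a temporary homepage block could look like the following (all URLs, names, and class names here are illustrative, not taken from the Office Hours discussion):

```html
<!-- Temporary block on the homepage linking to previously noindexed
     product pages, to encourage Google to recrawl them sooner.
     All URLs and class names are illustrative placeholders. -->
<section class="featured-products">
  <h2>Featured products</h2>
  <a href="/products/blue-widget">Blue Widget</a>
  <a href="/products/red-widget">Red Widget</a>
</section>
<!-- Remove this block after a couple of weeks, once the pages
     have been recrawled and are being indexed again. -->
```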
If a page is noindexed for a long period of time, crawling will slow down
Having a page set to noindex for a long time will cause Google to slow down its crawling of that page. Once the page is indexable again, crawling will pick back up, but that initial recrawl can take time. He also mentioned that Search Console reports can make the situation look worse than it actually is, and that you can use sitemaps and internal linking to speed up recrawling of those pages.
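One way to nudge that recrawl along, as mentioned above, is via the sitemap. A minimal sketch of a sitemap entry with a fresh lastmod date for a page that has become indexable again (the URL and date are illustrative):

```xml
<!-- Illustrative sitemap entry: an updated <lastmod> for a page that is
     indexable again can help prompt Google to recrawl it sooner. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/products/blue-widget</loc>
    <lastmod>2023-01-15T09:00:00+00:00</lastmod>
  </url>
</urlset>
```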
Merchant Center feeds can help eCommerce sites keep Google current when items go out of stock
Creating a Google Merchant Center feed is recommended for eCommerce sites whose items regularly go in and out of stock. The feed helps Google stay up to date by sending product availability signals directly, meaning Google doesn't need to rely on recrawling your webpages to detect and reflect changes in product availability.
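A minimal sketch of one item in a Merchant Center RSS feed, showing the availability attribute that conveys stock status (product details are illustrative, and a real feed requires additional attributes such as a description and image link):

```xml
<?xml version="1.0"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>Example Store</title>
    <link>https://www.example.com</link>
    <description>Illustrative product feed</description>
    <item>
      <g:id>SKU123</g:id>
      <g:title>Blue Widget</g:title>
      <g:link>https://www.example.com/products/blue-widget</g:link>
      <g:price>19.99 USD</g:price>
      <!-- Updating this value tells Google about stock changes without
           waiting for a recrawl of the product page itself. -->
      <g:availability>out_of_stock</g:availability>
    </item>
  </channel>
</rss>
```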
How to encourage Google to recrawl ‘back in stock’ product pages with internal linking
For eCommerce websites, it's recommended to keep URLs live when a product temporarily goes out of stock. To encourage re-crawling by search engines once the item is back in stock, John suggests temporarily removing internal links to the product page while the product is out of stock, then re-linking to it once the item is back. Deliberate internal linking (such as including internal links from the homepage) gives Google the best chance of finding and recrawling the page quickly.
The URL parameter tool does not prevent pages from being crawled
John explained that any URLs set to be ignored within the URL Parameter tool may still be crawled, albeit at a much slower rate. Parameter rules set in the tool can also help Google decide which canonical tags to follow.
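Canonical tags and parameter handling work together here: a parameterized URL can declare the clean URL as its canonical. A minimal sketch (URLs and parameter names are illustrative):

```html
<!-- On an illustrative parameterized URL such as
     https://www.example.com/shoes?sort=price&sessionid=abc
     the clean URL is declared as the canonical version. -->
<link rel="canonical" href="https://www.example.com/shoes" />
```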
Removing Low Quality Pages Takes Months to Impact Crawling and Site Quality
Removing low-quality pages from your site may have a positive impact on crawling of the rest of the site, but it could take 3–9 months before you see changes in crawling, which can be measured using log files. Improvements in overall site quality may take even longer to have an impact. It's unusual to see any negative impact from removing cruft content.
Average Fetch Time May be Affected by Groups of Slower Pages
If Google is spending more time crawling a particular group of slow pages, it can make the average fetch time and crawl stats data look worse overall.
Rendered Page Resources Are Included in Google’s Crawl Rate
The resources that Google fetches when they render a page are included in Google’s crawling budget and reported in the Crawl Stats data in Search Console.
Algorithm Changes May Result in Changes to Crawl Rate
The number of pages Google wants to crawl may change during algorithm updates, either because some pages are considered less important to show in search results or because of crawl optimization improvements.
Specify Timezone Formats Consistently Across Site & Sitemaps
Google is able to understand different timezone formats, for example, UTC vs GMT. However, it’s important to use one timezone format consistently across a site and its sitemaps to avoid confusing Google.
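As a sketch of what consistency means in practice, pick one W3C Datetime style for sitemap lastmod values and use it everywhere (URLs and dates below are illustrative):

```xml
<!-- Illustrative: both entries use the same explicit +00:00 offset.
     Mixing styles (e.g. "Z" on some entries, "+00:00" on others)
     across a site and its sitemaps is best avoided, even though
     both denote the same UTC time. -->
<url>
  <loc>https://www.example.com/page-a</loc>
  <lastmod>2023-01-15T09:00:00+00:00</lastmod>
</url>
<url>
  <loc>https://www.example.com/page-b</loc>
  <lastmod>2023-02-20T14:30:00+00:00</lastmod>
</url>
```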