Crawling

Before a page can be indexed (and therefore appear within search results), it must first be crawled by search engine crawlers like Googlebot. There are many things to consider in order to get pages crawled and to ensure they adhere to search engine guidelines. These topics are covered within our SEO Office Hours notes below, along with further research and recommendations.

For more SEO knowledge on crawling and to optimize your site’s crawlability, check out Lumar’s additional resources.

Limit Links On a Page to 3000

Google’s webmaster guidelines recommend a maximum of 3,000 links on a page; anything over that limit is likely to be ignored.
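
As a rough way to audit this, you can count the anchor elements in a page’s HTML. A minimal sketch in Python, assuming the requests and beautifulsoup4 libraries are installed and using a placeholder URL:

# Count the links on a single page (the URL is a placeholder).
import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.example.com/", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Only anchors that actually carry an href can be followed.
links = soup.find_all("a", href=True)
print(f"Found {len(links)} links")
if len(links) > 3000:
    print("Over 3,000 links: the extras are likely to be ignored.")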

8 Jul 2016

Hreflang Requires Multiple Crawls

Google needs to crawl pages with hreflang multiple times to confirm that the markup is correct, so it can take up to a month for the annotations to be recognised, for example after you have migrated URLs.
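
For reference, hreflang annotations are reciprocal link elements that each locale variant must carry for all the others (and itself). A minimal Python sketch that prints them for a hypothetical set of URLs:

# Print reciprocal hreflang link elements for a hypothetical set of
# locale variants; every variant's page should include all of these.
variants = {
    "en": "https://www.example.com/en/",
    "de": "https://www.example.com/de/",
    "x-default": "https://www.example.com/",
}

for lang, href in variants.items():
    print(f'<link rel="alternate" hreflang="{lang}" href="{href}" />')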

8 Jul 2016

Last Modified In Sitemaps Aids Crawling

Google finds the Last Modified date in an XML Sitemap very useful for helping them recrawl URLs, and they also support RSS and Atom feeds.
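
A minimal Python sketch of a sitemap that carries Last Modified values (the URLs and dates are placeholders):

# Build a minimal XML sitemap with lastmod dates so crawlers can
# prioritise recently changed URLs (all values are placeholders).
from xml.sax.saxutils import escape

pages = [
    ("https://www.example.com/", "2016-07-01"),
    ("https://www.example.com/news/", "2016-07-08"),
]

lines = ['<?xml version="1.0" encoding="UTF-8"?>',
         '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
for loc, lastmod in pages:
    lines.append(f"  <url><loc>{escape(loc)}</loc>"
                 f"<lastmod>{lastmod}</lastmod></url>")
lines.append("</urlset>")
print("\n".join(lines))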

8 Jul 2016

Googlebot Doesn’t See Robots Meta Tags on Redirected URLs

If a page redirects, Google won’t see any robots meta tags on that page, although they might still see a noindex in the HTTP headers (the X-Robots-Tag).
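
One way to check what a crawler sees is to request the URL without following the redirect and inspect the response headers. A minimal sketch, assuming the requests library and a placeholder URL:

# On a 3xx response the HTML (and any meta robots tag in it) is never
# parsed, but an X-Robots-Tag header can still carry a noindex.
import requests

resp = requests.get("https://www.example.com/old-page",
                    allow_redirects=False, timeout=10)
print("Status:", resp.status_code)
print("Location:", resp.headers.get("Location"))
print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag"))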

1 Jul 2016

No Good Solution for Reactivating Pages

If you have pages which expire but are reactivated after a period of time, there isn’t really a good solution. You can use a Sitemap to tell Google about URLs which are active again, and use the unavailable_after meta tag while they are expired.
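
A minimal Python sketch of emitting the unavailable_after robots meta tag for a page with a known expiry date (the date, and the exact date format, are illustrative):

# Emit an unavailable_after robots meta tag telling crawlers when the
# page should drop out of the index (date values are placeholders).
from datetime import datetime, timezone

expires = datetime(2016, 9, 1, 12, 0, tzinfo=timezone.utc)
stamp = expires.strftime("%d-%b-%Y %H:%M:%S %Z")
print(f'<meta name="robots" content="unavailable_after: {stamp}">')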

1 Jul 2016

Don’t Deliberately Block Pop-ups for Googlebot

If you have a pop-up which doesn’t show in Fetch and Render, then it probably isn’t seen by Googlebot. But if you are deliberately using a technique to block it, that could be interpreted as cloaking and, in extreme situations, result in a manual penalty. If Googlebot is able to see the pop-up, the content in the pop-up might be given more weight than the content on the page itself.

1 Jul 2016

JavaScript Navigation May Not Be Crawled

Multiple-selection filters can be tricky: JavaScript-based navigation which redirects users to a new URL might not be detected by Google. John Mueller suggests setting up a small test to confirm the behaviour.
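
A simple version of such a test is to list the plain <a href> links present in the raw HTML, since navigation that only fires a JavaScript redirect (for example, a select element with an onchange handler) won’t appear among them. A sketch assuming the requests and beautifulsoup4 libraries and a placeholder URL:

# List the plain <a href> links in the raw HTML of a (placeholder) URL;
# JavaScript-only navigation will be missing from this list.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.example.com/category", timeout=10).text
soup = BeautifulSoup(html, "html.parser")
for a in soup.find_all("a", href=True):
    print(a["href"])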

28 Jun 2016

Redirect Chains Slow Crawling

Redirect chains add latency which can slow down crawling; if there are more than 5 steps, the remaining hops will be rescheduled to be crawled later.
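
To audit a chain, you can follow it hop by hop and count the steps. A minimal sketch, assuming the requests library and a placeholder starting URL:

# Follow a redirect chain one hop at a time and count the steps.
import requests
from urllib.parse import urljoin

url = "https://www.example.com/start"
hops = 0
while hops < 10:  # hard stop so a redirect loop cannot run forever
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if resp.status_code not in (301, 302, 303, 307, 308):
        break
    url = urljoin(url, resp.headers["Location"])  # Location may be relative
    hops += 1
    print(f"Hop {hops}: {url}")

if hops > 5:
    print("More than 5 hops: later steps may be rescheduled.")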

20 May 2016

Crawl Rate is Based on Pages Google Wants to Update

Crawl rate sits somewhere between the minimum set of pages Google wants to update and the maximum number of pages they think it’s safe to crawl without impacting the site’s performance. Any new pages discovered can be crawled provided there is some remaining budget, but they might get queued up for the next day.

17 May 2016

URLs in JavaScript May be Crawled

Google won’t see any content which is loaded via an onclick event, but they will find URLs inside the JavaScript code itself and try to crawl them. Content has to be loaded onto the page by default, without an onclick, for Google to see it.
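
To illustrate the discovery side, here is a crude Python scan for absolute URL strings embedded in JavaScript source (the snippet being scanned is illustrative):

# Pick URL-like strings out of JavaScript source, roughly the kind of
# discovery described above (the code being scanned is a placeholder).
import re

js = 'function go() { window.location = "https://www.example.com/sale"; }'
for url in re.findall(r'https?://[^\s"\'<>]+', js):
    print(url)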

17 May 2016
