Deepcrawl is now Lumar.

Indexing

For web pages to appear in search results, they must be in Google’s index. Search engine indexing is a complex topic that depends on a number of different factors. Our SEO Office Hours Notes on indexing cover a range of best practices and compile the indexability advice Google has shared in its Office Hours sessions, to help ensure your website’s important pages are indexed by search engines.

Signals Are Kept For 4xx or 5xx Error Pages Previously Dropped from the Index When They Are Re-added

If your pages returned a 4xx or 5xx error for a while and were dropped from the index, but then become available again (after a month or so, for example), Google will be able to return them to the search results in the same state they were in before. They won’t have to start ranking from nothing.

3 Sep 2019

Likely That Content Won’t Be Indexed if Not Showing Up in Google Testing Tools

If Google’s testing tools are able to fetch all of the different resources for a page, but there is content missing in the rendered output, it is likely that this content won’t be able to be indexed.

3 Sep 2019

More or Less Every New Website is Rendered When Google Crawls it For the First Time

Nearly every website goes through the two waves of indexing when Google sees it for the first time, meaning it isn’t indexed before it has been rendered.

23 Aug 2019

URL Removal Tool Hides Pages But Doesn’t Impact Crawling or Indexing

The URL Removal Tool only hides a page from the search results. Nothing is changed with regards to the crawling and indexing of that page.

23 Aug 2019

There Isn’t a Separate Index for Mobile and Desktop Indexing

Google has one main index, which contains either the mobile or the desktop version of a site; this is the version that will be shown in search results. However, if you have a separate mobile site, Google will always show this version to users on a mobile device.

26 Jul 2019

Disallowed Pages With Backlinks Can be Indexed by Google

Pages blocked by robots.txt cannot be crawled by Googlebot. However, if a disallowed page has links pointing to it, Google can determine that it is worth indexing despite not being able to crawl the page.
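
As a rough illustration of the crawl-blocking half of this note, the sketch below uses Python's standard `urllib.robotparser` to check whether a URL is disallowed for a given user agent. The robots.txt rules, the example.com URLs, and the helper name `is_crawl_blocked` are all hypothetical and exist only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow crawling `url`."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain.
blocked = is_crawl_blocked(ROBOTS_TXT, "https://example.com/private/report.html")
allowed = is_crawl_blocked(ROBOTS_TXT, "https://example.com/public/page.html")
print(blocked, allowed)
```

A check like this only answers whether a page can be crawled. As the note says, a disallowed URL may still end up indexed if other pages link to it, so a robots.txt rule on its own does not guarantee exclusion from the index.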

9 Jul 2019

Google May Index Redirected URLs if Served in Sitemap Files

Redirects and sitemaps are both signals that Google uses to select preferred URLs. If you redirect to a destination URL but the source URL is in a sitemap, this gives Google conflicting signals about which URL you want to be shown in search.
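
One way to spot this kind of conflict is to compare the URLs listed in a sitemap against a list of known redirects. The following is a minimal sketch under stated assumptions: the sitemap content, the redirect map, and the function name `redirected_urls_in_sitemap` are hypothetical, and a real audit would take the redirect data from a crawl or from server configuration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content on a placeholder domain.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old-page</loc></url>
  <url><loc>https://example.com/current-page</loc></url>
</urlset>"""

# Hypothetical redirect map: source URL -> destination URL.
REDIRECTS = {"https://example.com/old-page": "https://example.com/new-page"}


def redirected_urls_in_sitemap(sitemap_xml: str, redirects: dict) -> list:
    """Return sitemap URLs that are also the source of a known redirect."""
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if el.text
    ]
    return [url for url in locs if url in redirects]


conflicts = redirected_urls_in_sitemap(SITEMAP_XML, REDIRECTS)
print(conflicts)
```

Any URL this returns is sending the mixed signal described above; listing the redirect destination in the sitemap instead would keep the two signals consistent.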

28 Jun 2019

Internal Search Results Pages Should be Blocked Unless They Provide Unique Value

Internal search results pages should be blocked from crawling because crawling them could overload the site’s server and they tend to be low quality. However, there may be instances where it makes sense to have these pages indexed if they provide value.

31 May 2019

Ensure all Key Content is Available if You Are Streaming Content

If a site is streaming content progressively to a page, John would recommend ensuring all key content is available immediately due to the method used to render content. Any additional content which is useful for users but not critical to be indexed can then be streamed progressively.

28 May 2019

Googlebot No Longer Needs to Convert Hashbang URLs into Escaped Fragments

Googlebot no longer converts hashbang URLs into escaped fragments as it is able to render and index them directly rather than using the pre-rendered version specified with the escaped fragment. Therefore, John would recommend moving to something that’s URL-based rather than hashtag-based.
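
For context, the conversion that Googlebot no longer performs came from the now-deprecated AJAX crawling scheme, which mapped a `#!` fragment onto an `_escaped_fragment_` query parameter. The sketch below approximates that legacy mapping for illustration only. The function name `to_escaped_fragment` and the example URLs are hypothetical, and the character escaping is a simplification rather than an exact implementation of the original specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url: str) -> str:
    """Approximate the legacy hashbang -> _escaped_fragment_ URL mapping."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # Not a hashbang URL, so the legacy scheme did not apply.
        return url
    # Simplified escaping (an assumption): keep "=" and "/" readable and
    # percent-encode characters such as "&", "#", "%", "+" and spaces.
    token = quote(parts.fragment[1:], safe="=/")
    extra = "_escaped_fragment_=" + token
    query = parts.query + "&" + extra if parts.query else extra
    # Rebuild the URL without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment("https://example.com/#!/products/42"))
print(to_escaped_fragment("https://example.com/page?lang=en#!tab=reviews"))
```

Because this mapping is no longer applied, a page that exists only at a hashbang URL depends entirely on being rendered, which is one reason path-based URLs are the safer choice.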

28 May 2019
