AI Search is a rapidly evolving space. LLMs are improving all the time, with new versions deployed regularly, and AI Overviews are already live in more than 200 countries and more than 40 languages. AI Mode has officially launched in the US, promising to expand what AI Overviews can do with more advanced reasoning, thinking, and multimodal capabilities to help with the toughest questions. And the use of vector models is enabling semantic search to understand what is being asked, not just how it's phrased (read our research on Semantic Relevance).
With all of this in mind, our AI research and development team is examining how AI Search is evolving, and how we can help our users understand their content's performance and optimize it for the best results.
But AI still relies on many of the foundational parts of search, like crawling and indexing. As John Mueller said at Search Central Live NYC in March 2025:
“All of the work that you all have been putting in to make it easier for search engines to crawl and index your content, all of that will remain relevant.”
Lumar has several reports that will help you optimize your content, which we’ve outlined below. And as I said above, our team is looking at additional reporting and analysis that can help, so watch this space!
View our recent webinar on technical SEO in the age of AI search
Crawlability Reports for AI Search
These Lumar reports identify whether AI bots can access your pages. If URLs return 4xx or 5xx errors, are blocked by robots.txt, or are trapped in redirect loops, AI systems won't be able to crawl them, making discovery impossible. Use the following reports to identify issues and make sure all your content is accessible to AI bots. A sample robots.txt for AI crawlers follows the list.
- Broken Pages (4xx Errors). URLs that return a 400, 404, or 410 status code indicate a page could not be returned by the server because it doesn’t exist.
- 5xx Errors. URLs that return any HTTP status code in the range 500 to 599 indicate a page is temporarily unavailable and may be removed from the search engine’s index.
- Failed URLs. URLs that did not return a response within Lumar's timeout period. This may indicate a temporary issue, such as poor server performance, or a permanent problem.
- Disallowed Pages with Bot Hits. Pages that are disallowed in the robots.txt but appear to have been crawled by search engines.
- Redirect Loops. URLs with redirect chains that redirect back to themselves, creating an indefinite redirection loop.
- Redirect Chains. URLs redirecting to another URL that is also a redirect, resulting in a redirect chain.
- Excessive Redirects In. Pages that have more than 30 other URLs redirecting to them, which may impact relevancy.
- JavaScript Redirects. Pages that redirect to another URL using JavaScript.
- Internal Redirects Found in Web Crawl. URLs found in the web crawl that redirect to another URL whose hostname is considered internal, based on the domain scope configured in the project settings.
- ChatGPT Blocked. Pages with a 200 response that are blocked in robots.txt for the GPTBot or ChatGPT-User user-agent tokens.
- Google AI Blocked. Pages with a 200 response that are blocked in robots.txt for the Google-Extended user-agent token.
- Bing AI Blocked. Pages with a 200 response that are blocked in robots.txt for the Bingbot or MSNBot user-agent tokens.
- Common Crawl Blocked. Pages with a 200 response where the URL is blocked in robots.txt for Common Crawl’s CCBot user-agent token.
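If these reports surface pages you do want AI systems to access, the fix usually lives in robots.txt. As a minimal sketch (the user-agent tokens are those referenced in the reports above; the Disallow path is a hypothetical example), a robots.txt that opens your content to these AI crawlers while keeping a private section blocked might look like this:

```
# Allow OpenAI's crawlers (GPTBot for training, ChatGPT-User for user-initiated browsing)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Google-Extended controls whether content can be used in Google's AI models
User-agent: Google-Extended
Allow: /

# Common Crawl's bot, used as a data source by many LLMs
User-agent: CCBot
Allow: /

# Keep a private section out of all crawlers (hypothetical path)
User-agent: *
Disallow: /internal/
```

Note that blocking Bingbot or MSNBot to keep content out of Bing's AI features would also remove it from regular Bing search, so weigh that trade-off before disallowing those tokens.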
Renderability Reports for AI Search
AI systems may rely on rendered (JavaScript-executed) content. If important elements only appear post-render, but bots can't access them, you risk that content being invisible to AI. The following reports help you identify potential issues so they can be addressed. A code sketch of the rendered-vs-raw comparison follows the list.
- Rendered Link Count Mismatch. Pages with a difference between the number of links found in the rendered DOM and the raw HTML source.
- Rendered Canonical Link Mismatch. Pages where the canonical tag URL in the rendered HTML does not match the canonical tag URL found in the static HTML.
- Rendered Word Count Mismatch. Pages with a word count difference between the static HTML and the rendered HTML.
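To see the idea behind these checks, here is a minimal sketch that compares the link count in the raw HTML with the count in the rendered DOM. It assumes the `requests`, `beautifulsoup4`, and `playwright` packages and a hypothetical URL; it illustrates the concept, not Lumar's implementation.

```python
# Sketch: compare link counts in raw HTML vs. the rendered DOM.
# Setup: pip install requests beautifulsoup4 playwright && playwright install chromium

import requests
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def raw_link_count(url: str) -> int:
    # Links present in the static HTML, before any JavaScript runs
    html = requests.get(url, timeout=10).text
    return len(BeautifulSoup(html, "html.parser").find_all("a", href=True))

def rendered_link_count(url: str) -> int:
    # Links present after a headless browser executes the page's JavaScript
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        count = page.eval_on_selector_all("a[href]", "els => els.length")
        browser.close()
        return count

url = "https://www.example.com/"  # hypothetical URL
raw, rendered = raw_link_count(url), rendered_link_count(url)
if raw != rendered:
    print(f"Mismatch: {raw} links in raw HTML vs {rendered} after rendering")
```

A large gap between the two counts suggests navigation or body links that only exist post-render, which bots that don't execute JavaScript will never see.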
Indexability & Visibility Reports for AI Search
Even if a page is crawlable, it may not appear in AI results if it isn't indexed or lacks visibility in search. The following Lumar reports identify pages that are ignored by search engines or missing from SERPs. Use these to prioritize fixes that improve discoverability. A sketch of a basic indexability check follows the list.
- Non-200 Pages. All HTML pages that returned a non-200 HTTP response code in the headers. Search engines will ignore the body content of pages that do not return a 200 status, so these pages will not be indexed, and links will not be followed.
- Pages with No Bot Hits. All indexable, primary (not duplicated) pages that don’t receive Google bot hits. If Google is not finding and crawling these pages, it is likely that AI systems will also be unable to find them.
- Indexable Pages without Bot Hits. Indexable pages that did not have any requests in the Log Summary files included in the crawl.
- Primary Pages in SERP. Primary pages (unique indexable pages, or the primary indexable page of a duplicate set) that have impressions in Google Organic SERPs.
- Primary Pages Not in SERP. Primary, indexable pages that did not have any impressions in Google Organic SERPs.
- Primary Pages Not in SERP without Followed Links. Primary pages that did not generate any impressions in Google’s SERPs and have no internal followed links pointing to them.
- Primary Pages Not in SERP Not in Sitemaps. Primary pages that did not have any impressions in Google Organic SERPs and are not in Sitemaps.
- Primary Pages Not in SERP with Low Deeprank. Primary pages that did not generate any impressions in Google SERPs and have a low internal DeepRank, meaning they may not have been prioritized within your site architecture and are lacking links from prominent pages.
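As a companion to these reports, here is a minimal sketch of a basic indexability check: the status code plus the two common noindex signals. It assumes the `requests` and `beautifulsoup4` packages and a hypothetical URL, and illustrates the signals involved rather than replicating Lumar's checks.

```python
# Sketch: check status code, X-Robots-Tag header, and meta robots tag for a URL.

import requests
from bs4 import BeautifulSoup

def indexability_signals(url: str) -> dict:
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    meta = soup.find("meta", attrs={"name": "robots"})
    return {
        "status_code": resp.status_code,                       # non-200 pages are ignored
        "x_robots_tag": resp.headers.get("X-Robots-Tag", ""),  # header-level noindex
        "meta_robots": meta.get("content", "") if meta else "",  # tag-level noindex
    }

signals = indexability_signals("https://www.example.com/")  # hypothetical URL
noindex = "noindex" in (signals["x_robots_tag"] + " " + signals["meta_robots"]).lower()
print("indexable" if signals["status_code"] == 200 and not noindex else "not indexable", signals)
```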
Bot Behavior & Crawl Budget Reports for AI Search
Insights into bot behavior show how often AI bots hit your pages and which pages they ignore. Pages with no hits or a low hit frequency may be under-prioritized by AI crawlers, especially if they aren't in sitemaps or are disallowed. The following reports help you assess crawl efficiency and make improvements. A log-parsing sketch follows the list.
- Pages with Mobile Bot Hits. Pages that had mobile bot requests in the Log Summary files included in the crawl.
- Pages with Desktop Bot Hits. Pages that had desktop bot requests in the Log Summary files included in the crawl.
- Pages with Low Desktop Bot Hits. Desktop pages that had fewer requests than the project's "Low Bot Request" setting (default 10).
- Disallowed Pages with Bot Hits. Pages that are disallowed in robots.txt but appear to have been crawled by search engines. These URLs may not have been crawled by Lumar if the default project setting to respect robots.txt is in place.
- Pages without Bot Hits in Sitemaps. Pages that appeared in a sitemap, but did not have any requests in the Log Summary files included in the crawl.
- Redirects with Bot Hits. Pages that are redirecting, but are being crawled by search engines.
- Duplicate Pages with Bot Hits. Indexable pages that have been crawled by search engines, and share an identical title, description, and near-identical content with other pages found in the same crawl, excluding the primary page from each duplicate set.
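For a sense of what's involved, here's a minimal sketch that counts bot hits per URL from a server access log in the combined log format. The log path and the list of user-agent substrings are assumptions for illustration; Lumar's Log Summary processing handles this for you at scale.

```python
# Sketch: count bot hits per URL path from a combined-format access log.

import re
from collections import Counter

BOT_TOKENS = ("Googlebot", "GPTBot", "ChatGPT-User", "CCBot", "bingbot")
# Combined log format: capture the request path and the user-agent string,
# e.g. ... "GET /page HTTP/1.1" 200 1234 "referer" "user-agent"
LINE_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+" \d+ \S+ "[^"]*" "([^"]*)"')

hits = Counter()
with open("access.log") as f:  # hypothetical log file path
    for line in f:
        m = LINE_RE.search(line)
        if m and any(tok in m.group(2) for tok in BOT_TOKENS):
            hits[m.group(1)] += 1

# Show the 10 most-hit paths; paths that never appear got no bot attention at all
for path, count in hits.most_common(10):
    print(f"{count:6d}  {path}")
```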
Structured Data Reports for AI Search
As mentioned above, AI search is a constantly evolving space, and it's not yet clear how much impact structured data (or schema markup) has on LLMs. It is possible that AI systems use schema markup (e.g. FAQ, Product, How To, etc.) to understand context and relationships. The following Lumar reports expose missing or invalid markup and highlight high-value structured content already in place, so they may also assist in optimizing content for AI search. A minimal JSON-LD example follows the list.
- Pages with Schema Markup. All pages included in the crawl that have schema markup found in either JSON-LD or Microdata. Highlights any schema.org markup on a page.
- Pages without Structured Data. All pages included in the crawl that do not have schema markup.
- Product Structured Data Pages. All pages in the crawl that were found to have product structured data markup.
- Valid Product Structured Data Pages. All pages with valid product structured data based on Google Search Developer documentation.
- Invalid Product Structured Data Pages. All pages with invalid product structured data based on Google Search Developer documentation.
- Event Structured Data Pages. All pages in the crawl that were found to have event structured data markup.
- News Article Structured Data Pages. All pages in the crawl that were found to have news article structured data markup.
- Valid News Article Structured Data Pages. All pages with valid news article structured data based on Google Search Developer documentation.
- Invalid News Article Structured Data Pages. All pages with invalid news article structured data based on Google Search Developer documentation.
- Breadcrumb Structured Data Pages. All pages in the crawl that were found to have breadcrumb structured data markup.
- FAQ Structured Data Pages. All pages in the crawl that were found to have FAQ structured data markup.
- How To Structured Data Pages. All pages in the crawl that were found to have How To structured data markup.
- Recipe Structured Data Pages. All pages in the crawl that were found to have recipe structured data markup.
- Video Structured Data Pages. All pages in the crawl that were found to have video structured data markup.
- QA Structured Data Pages. All pages in the crawl that were found to have QA structured data markup.
- Review Structured Data Pages. All pages in the crawl that were found to have review structured data markup.
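Whatever its eventual weight in AI systems, it's worth knowing what well-formed markup looks like. Below is a minimal FAQ schema block in JSON-LD (the question and answer text are placeholders); the reports above flag pages where markup like this is present, missing, or invalid.

```html
<!-- Minimal FAQ structured data in JSON-LD; values are placeholders -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Can AI bots crawl my site?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Yes, unless they are blocked in robots.txt."
    }
  }]
}
</script>
```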
How Else Does Lumar Help?
Aside from collecting and analyzing data at scale for your site, Lumar also helps you action that data, so you can quickly identify, prioritize, and fix issues, and stop them happening again.
- Make the most important, impactful fixes first with logical grouping of issues, health scores, and visualizations to avoid data overload. You can even engage our industry experts to help.
- Save time, action tasks, and improve collaboration with AI-supported processes—like ticket content creation to ensure devs have all the information they need—so issues get properly fixed and technical debt is reduced.
- Stop issues recurring with automated QA tests and dev tools to prevent new code introducing issues.
- Mitigate risk with customizable alerts when issues do return or new issues appear, and customizable dashboards so you can easily monitor multiple domains, geographies, or important site sections in one place.
Find out how Lumar can help you optimize for AI Search
What’s Next in Lumar for AI Search?
As mentioned above, our team is working on additional reports, analysis, and improvements that relate specifically to AI search. We'll update this article as we release new reports and analysis, but you can also sign up for our newsletter below to stay updated on what's new in Lumar.