This “paper” is a rundown of not just the Google InspectionTool crawlers but how they behaved in the lead up to July 1, 2023, the date when Google Analytics 4 “forced migration” for any sites using Google Analytics and completely retired Universal Analytics. Portions of this paper were published on the Strategic Marketing House website – where I am leading the digital strategy team. The Ultimate Guide to Google’s InspectionTool Crawlers is a snapshot of what they are and what they do. Here I’ll tell more of the story about them and how they evolved to what we see in our server logs today.
I also want this article to be able to stand on its own and not assume that readers are familiar with my public and on-going indexation research project. Because if it weren’t for that project, I don’t think anyone else would have been standing around and watching server logs to even know what was happening. (You’re welcome!)
If you know all about the indexation research project and you want to cut to the new stuff, click on the link and jump down to where it starts – InspectionTool Crawler info.
Introduction Of the InspectionTool Crawlers
The easiest way to digest this information is to do a walk through of the project that led me to being in a place where I was observing these new crawlers. To understand the change, one needs to understand what was happening before. I’ll share my methods in testing, what was true before the crawlers, what happened after the change over in Google analytics and my thoughts on what the data has implied to me and how that might impact all of us.
Prior To InspectionTool Crawlers
Background
Since August 2021, I’ve been singularly focused on testing the process of indexation of new content from publish to being found. In 2021, there was a significant drop off in Google taking in new content into search results.
While there was no announcement from Google of the issue, there was massive amounts of anecdotal discussion and general industry frustration on getting new content into search results. That August I felt that someone needed to document and take measurements on what requirements new content being held for getting into (or not getting into) search results.
Testing Method
This ongoing research involves launching daily (or mostly daily) test pages of various types of content with two (2) test keywords that were unknown to Google. e.g. sldkfccjhssssdfghjkasdf
One such keyword is placed in html to test for simple html rendering and a second keyword wrapped in javascript and broken up into groups of letters where only after javascript was rendered in the second pass through the google render machine was this word “seen” by Google. e.g. sldk fccjh ssssd fghjk asdf
If the simple html keyword brought up the test page, the first pass had occurred. If a test page was found by the javascript keyword, then the 2nd, rendering pass had occurred.
That is/was the test. Not trying to rank for “personal injury attorney new york”. Simple to uncover the process of indexation and answer a series of questions like:
- “Is google crawling new content?”
- “Is simple rendering on for new content?” [first pass]
- “Is javascript rending on new content?” [second pass]
- “How long does it take for new content to go from publish to findable?”
Overlaying Search Console, Semrush and Server Logs Onto Test Data
Since beginning the research, additional readings were overlaid on the data. These included:
- when does Google update the Search Console Page Report?
- how often does Google update the Search Console Page Report?
- is there a pattern to the search console updates?
- is there a connection between indexation system and what we see in Semrush sensor reading of desktop volatility and both desktop and mobile volatility?
Reading server log records of test sites gave me the ability to identify which chrome build was Googlebot using and when they updated their mobile and desktop crawler agents. This kind of information can be helpful to seo tool builders to know better how Google “reads” a page.
Measurements of Data Revealed Patterns
Once the server log data and counting of the bots was included in the research, an update pattern emerged.
The process still currently is as follows:
- Introduction of a new chrome build on a mobile crawler. The desktop crawler remains at the previous chrome build.
- Testing of the new mobile bot with the previous chrome build bot – almost like a confirmation of sorts.
- During this testing time, reports of “roll backs” occur. Roll backs are the term used for when previously served content is no longer served and then after some time (days) the content is returned to a serving status.
- Introduction of the desktop bot with the new chrome build – almost works like a bookend to signify the completion of the testing.
For more detailed on specific chrome builds and updates – check out Google Index Detector on that page there is an embedded spreadsheet where each tab represents Sunday through Sunday data each week and you can find any specific date to see what was happening and use it as a diagnostic tool to explain any issue a particular site experienced. No guarantees but it has shed a light on many unannounced events.
This following section is part of the “Ultimate Guide to Google InspectionTool Crawlers”. After that I’ll get into what I think all this means for us all.
The InspectionTool Crawlers
On May 17 2023, Google added new crawlers called InspectionTool crawlers to their list. These crawlers come whenever we use the Rich Result Test and the URL inspection in Search Console. You can find this here, look for the section on Google-InspectionTool.
Two crawlers are listed –
A mobile version –
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)
And a desktop
Mozilla/5.0 (compatible; Google-InspectionTool/1.0)
Google Documentation Partial
Google’s documentation is partially accurate. Upon examination of multiple sites’ server logs – more InspectionTool Crawlers are in active service. Below I’m listing them with chrome builds as of June 28 2023.
DESKTOP crawler strings (2)
Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.179 Safari/537.36 (compatible; Google-InspectionTool/1.0)
MOBILE crawler strings (2)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.179 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.179 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)
* Note how only one character separates these two crawlers. Look at the the semi-colon before the ending parentheses on the first one.
What Files Do The Different Crawlers Target
Below are examples of the types of files each crawler appears to be responsible for crawling.
The simple html crawls relate to the first pass crawl referenced above, where the simple keyword is found. The crawling of the script files and css files is the rendering crawl. That rendering is when the javascript is activated and once that has completed is when Google first can assign the javascript keyword to the page.
File Types |
InspectionTool Crawler |
Mobile or Desktop |
Slug Files – i.e. /my-new-page/ non-rendering |
Mozilla/5.0 (compatible; Google-InspectionTool/1.0;) |
Desktop |
Slug Files – i.e. /my-new-page/ non-rendering |
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.179 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;) |
Mobile |
Scripts – i.e. /wp-content/themes/twentytwentyone/assets/css/print.css?ver=1.8 HTTP/1.1 rendering |
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.179 Safari/537.36 (compatible; Google-InspectionTool/1.0) |
Desktop |
Scripts – i.e. /wp-content/themes/twentytwentyone/assets/css/print.css?ver=1.8 HTTP/1.1 rendering |
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.179 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0) |
Mobile |
The Months Run up To G4 (July 1, 2023)
Now we’re getting to the interesting part! Was there an observable in server logs in advance of the switchover to Google Analytics 4 and sunsetting of Universal Analytics when new content was published frequently?
Yes. The InspectionTool bots evolved in the run up to the July 1st shift to Google Analytics 4. Again a picture is worth 1000 words –
Date |
Chrome Build |
# of chrome build that run |
GB/GMB |
Semrush Volatility |
May 20 – 26 |
113.0.5672.92 |
1 |
113.0.5672.92(35) |
May 18 – 21, 26 |
May 27 – June 10 |
113.0.5672.126 |
1 |
113.0.5672.126(36) |
June 1, 3, 6 – 9 |
June 11 – 13 |
112.0.5615.142 |
1 |
112.0.5615.142(33) |
June 13 |
June 14 – 25 |
112.0.5615.142 |
2 |
112.0.5615.142(33) |
June 14 – 25 |
114.0.5735.90 |
114.0.5735.90(37) |
|||
June 26 |
112.0.5615.142 |
2 |
112.0.5615.142(33) |
June 26 |
114.0.5735.133 |
114.0.5735.133(38) |
|||
June 27 |
112.0.5615.142 |
2 |
112.0.5615.142(33) |
June 27 |
114.0.5735.179 |
114.0.5735.179(39) |
|||
June 28 – date |
114.0.5735.179 |
1 |
114.0.5735.179(39) |
June 28 – July 2 |
* July 1st – Sunset of Universal Analytics and shift of sites to Google Analytics 4 (GA4)
Explanation of the above chart –
Date: The above dates relate to the various chrome builds of the InspectionTool during the lead up to the GA4 deadline.
Chrome Build: Initially the InspectionTool chrome version launched (113) and by June 11th was rolled back to a previous chrome build and then slowly and what one could say carefully, updated to the latest mobile chrome build of the standard crawler.
Number of Running Chrome Builds: This describes the various chrome builds that ran together when new content was published.
The GB/GMB (Googlebot and Google Mobile Bot): During this timeframe the specific standard crawler agents (+http:www.google.com/bot.html) that used the matching chrome build of the InspectionTool chrome build were brought back out of retirement and run in tandem.
This type of testing behavior is a characteristic of what happens when a new bot comes onboard. New bot arrives and the same content is crawled by a previous “tested” bot.
Once testing was complete, whatever Google engineers are comparatively testing for because they do not tell us what they are testing, the sign of that testing was complete is the introduction of the NEW chrome build.
The incremental testing starting in May, for a period of time running two versions always against the 112 chrome build until completing to the current chrome build of the latest mobile crawler agent – 114.0.5735.179 on June 28, 2023.
What Problem Existed That These Crawlers Solved
The above is a breakdown of WHAT happened during the launch and catch up to the current chrome rendering version.
Any discussion of WHY they introduced and developed these new crawlers is by nature going to be conjecture. Google does not give us detailed plans and tell us how or why they test. We know that they do test search results and that includes all their data streams – onpage content, the link graph, click validation, images etc) and they tell us that it could be thousands of tests during a year.
My first question is what problem does it solve by going to the added expense of creating and testing these InspectionTool crawler agents?
Prior to the introduction of these new crawlers, ultimately there was no way to distinguish at least through the server logs and I suspect the same to be true when it came to the data processing on the other side of search console, between content that was discovered and content that WE told google about either through search console or the various other tools that show us what google sees.
What Can Google Do Now That They Can Distinguish
Isn’t it now highly likely possible for Google to identify the content that THEY discovered and distinguish that from the content that we called attention to via these tools telling Google proactively about our new content?
In the documentation, Google confirms that these crawlers come when indexation requests are made through the search console and for the Rich Result Tests.
(Independently I have tested and confirmed that they also send these crawlers for tests via the mobile friendly test tool as well as others who tested and shared their findings. Another reason Google may want to update the documentation.)
Possible Implications
Perhaps it is in Google’s benefit to be able to sort out search results that THEY discovered and assigned keywords to vs search results that are a blend of their discoveries and our requested new content to be crawled, indexed and assigned a keyword to be findable in the results for that term/s.
Perhaps its a way to gate the onramp into a keyword by having these crawlers crawl and then running that data through filters such as the Helpful Content System or the syndication filter (sure there are others we just don’t have names for yet) to perhaps disqualify that content for any keyword assignment.
Assignment of Keywords To Content
Often in search console content is designated as “Indexed” but it is not served by any keyword or snippet of the page’s content to be findable in search results. I’ve even seen test data that made it all the way through to findable only to become unfindable at a later date. Lately in the past couple of weeks this is much more common. – why?
Keeping Up With The Relentless Amount of Content
We have speculated for years that Google might be struggling with the scale of new content entering into their system while also maintaining what is already in their system.
They have for some time limited the number of new pages we can request to be indexed. Whether it’s the daily limits in the search console request for indexation or the Google Indexing API, there’s only so much per day they want us to be able to proactively submit.
How is this distinguishable from gating access into the findable portion of the index?
I report my observations of the indexation testing and speculate about what I see on the Crawl or No Crawl Youtube channel concerning the Helpful Content System which I also suspect might also be a part of a gating system.
I have shared some data out of the indexation research that points to evidence that indexation now might require topical cohesion between my new content and the existing content already indexed on the site despite how well it is optimized on page.
What better way to be able to segregate new content, only run that new content through a gating filter and instead of every new page going through it, just the ones we request and thus reduce the number of pages that just aren’t going to make it into the publicly accessible index which they have curated to approx 200 results? (That is a run on sentence but this is exciting and scary at the same time.)
If the page does not topically match the existing or majority of the content already on the site, then it would stand to reason the system would not assign a keyword to those pages. It has nothing to do with the merits of the page optimizations but the merits of that new content page as a part of a collection of existing already indexed pages on the same site.
Perhaps these InspectionTool crawlers provide more that just let us request indexation. Maybe they “flag” our content and Google compares the crawled content to the classification system they tell us that is part of the Helpful Content System. If it topically relates – our content gets place in the queue to get assigned a keyword, then scoring it and ranking it. If our new content is mathematically NOT a topical match to our existing served content on the domain – then we see “INDEXED” in search console but not findable in search results.
This also presents a lot of challenge to us as SEOs meaning that if we publish content on a client’s site and it is for some reason NOT a topical match in the math – then it kind of is OUR responsibility to know this and either figure out where its off and that those keywords, entities or LSI (catch word for words that are not the main keyword or do not have a wikipedia page) off the page and try again. I’m seeing this myself in a field test for roofing that is showing me that roofing as a service is perhaps a different topic than an informational article about shingle types that might be more suited to roofing supply.
The topical tripwires are very subtle but its basically math.
There are signs that we may have a good deal more testing to do to decipher the Helpful Content System.
In Closing
Based on the data, the Google Developer crawler page needs updating as of July 3 2023 on the section describing the InspectionTool Crawlers and what tools are using these new crawlers.
For those that have questions concerning the InspectionTool crawlers – you can direct questions to me on here. As things change this information will be updated.
Check out the podcast, Confessions of an SEO – Season 3 – Episode 20 as well as the short videos on the Crawl or No Crawl Report on Youtube Helpful Content System on June 24, 2023.