For the past few days, culminating in this morning's rash of Search Console request-indexing issues, more and more people are reporting an error when they attempt to submit their existing or new content through Search Console. If this is you today (or tomorrow, or in the future), everything I share here will help you understand what happened, who has to fix it, and how to know when your site is out of "it".
This is likely an incident that will eventually make its way to the Search Status Dashboard. If you haven't seen that dashboard before, don't worry: Google pushed it out shortly after I started to speak more publicly about the indexation system, and it replaced their old list-of-updates page.
An executive summary would read like this: an issue with Google's crawling subsystem was brought to Google's attention, and they began repairing the system starting Saturday, October 21, 2023, at approximately 11 a.m. CST. Expect the repair to remain in progress, with some announcement from Google on Monday, if not sooner.
The Problem:
Content submitted via Search Console was returning failed/error messages implying that the issue was with the host server, or that the number of requests was causing a problem.
Is this a problem with shared hosting or some technical setting on the site?
If you're like me, a quick Google search for the "Failed Host Exceeded" error message will bring up recommendations, and from them we can glean what may have worked as solutions in the past. Because, let's face it, we always assume WE did something wrong before we even consider that it might not be us.
But we’re not looking for an academic definition, we want a solution!
The following is a truncated series of actions performed on the site in question.
Corrupted .htaccess files?
Replaced the current .htaccess file with basic .htaccess content, which resulted in NO change in results. Even after simplifying the .htaccess, saving, clearing the cache, and resubmitting the URL in Search Console: no change.
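For context, by "basic .htaccess content" on a WordPress site like this one I mean something along the lines of the stock WordPress rewrite block below. This is an illustration of the baseline, not the actual file from the site in question; your host's default may differ.

```apacheconf
# BEGIN WordPress -- the stock rewrite rules WordPress generates by default
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
# Pass requests for real files and directories straight through
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else goes to WordPress
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

If swapping in a minimal block like this changes nothing, the .htaccess file was almost certainly not the cause.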
Indexing Request Rejected.
This result had me thinking that what we're all seeing is more likely an error in an indexation subsystem, specifically the crawling system. Note that the Search Status Dashboard is where this will likely be noted as an incident, retroactively.
A YouTube Video That Claims to Have Solved the Issue This Morning
A colleague shared a new video he had found that declared the issue solved. It wasn't in English, but we both followed along and noted the results.
I won't share the URL because the claim, at least as of this morning, was bogus. The video wasn't designed to help right now but to rank for this term in the future, when this is no longer an issue, because who would know the truth then? We will, and that's why I'm spending my Saturday writing contemporaneously to document this event. If you do find the video, you'll see a comment thanking the creator and claiming a different result after trying what was suggested. That could not be further from the truth. Don't waste your time trying to translate it; I did, and in no example shown in the video did the presenter get a different result when submitting for indexation.
Re-Submitting Sitemaps in Search Console
Submitting sitemaps in Search Console resulted in successful re-submissions, but no confirmation that they were read.
That particular sitemap (sitemap-index.xml) is a nested sitemap: a page listing the other sitemaps. So I went ahead and resubmitted all the others as well. Each one now shows today's date, but there is no confirmation that any were read.
In my research I’ve found that when Google is slow to confirm it “read” the sitemap, it is a signal that they are working inside the Search Console area.
Then I tried requesting indexation for individual URLs, and three times in a row I was successful. But when I tried a fourth time, on another URL, I got the failure notice (I wasn't quick enough with the screenshot). My initial reaction: if Google is working in there, and one can do the same thing and get a different result, then the fix MUST come from Google's side.
Twitter Conversations Directed Towards Google With Requests To Fix This Error
By this time, late Saturday morning, there were already comments and requests on Twitter directed at John Mueller over the course of a couple of days, indicating this is an ongoing issue. More on this later.
The Server Logs, Because Logs Do Not Lie
For those who know me and my work, I'm deep into observing indexation processes and server logs. In my Forensic SEO Training I devote a section to the study of server logs, particularly with regard to crawler agents and bots, and Google's above all.
I was brought into this particular site because my colleague was struggling to get pages indexed. It wasn't long before I had hosting access and could read the logs. What they showed me was very little bot activity over the past few days. I found two robots.txt requests on the 17th and nothing after that. The logs from the 18th confirmed a crawl of one page I had submitted that evening, but nothing else.
Below I will share the actual server records from October 19 and 21, changing only the domain and slugs to protect the anonymity of the site.
This was the crawl of the page submitted via the Search Console indexation request button on Oct 19 (Thursday):
Oct 19 2023
66.249.66.208 - - [19/Oct/2023:18:04:16 -0500] "GET /1-slug-name-here/ HTTP/1.1" 200 64026 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)" domain.com
66.249.66.193 - - [19/Oct/2023:18:04:17 -0500] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" domain.com
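For readers following along in their own logs, each of those lines is in the common "combined" log format and can be split into fields with a few lines of code. This is a sketch; the regex assumes your host writes the same format shown above.

```python
import re

# Combined log format: IP, identity, user, [timestamp], "request",
# status, bytes, "referrer", "user-agent" (some hosts append the vhost).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('66.249.66.208 - - [19/Oct/2023:18:04:16 -0500] '
        '"GET /1-slug-name-here/ HTTP/1.1" 200 64026 "-" '
        '"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) '
        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 '
        'Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)" domain.com')

m = LOG_RE.match(line)
# Pull out the fields we care about for crawler analysis.
print(m.group('ip'), m.group('ts'), m.group('status'), m.group('agent'))
```

Once lines are parsed this way, filtering for the Google user agents is trivial.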
This morning's logs revealed the bot activity during the exact window when those three pages were submitted manually through Search Console.
Oct 21st
When these three pages were submitted this morning, each received a non-error confirmation that it was added to a priority queue.
66.249.66.192 - - [21/Oct/2023:11:02:55 -0500] "GET /2-slug-name-here/ HTTP/1.1" 200 77567 "-" "Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)"
66.249.66.208 - - [21/Oct/2023:11:02:55 -0500] "GET /2-slug-name-here/ HTTP/1.1" 200 77567 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)" domain.com
66.249.66.208 - - [21/Oct/2023:11:02:56 -0500] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.192 - - [21/Oct/2023:11:04:15 -0500] "GET /3-slug-name-here/ HTTP/1.1" 200 78129 "-" "Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)" domain.com
66.249.66.192 - - [21/Oct/2023:11:04:15 -0500] "GET /3-slug-name-here/ HTTP/1.1" 200 78129 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)" domain.com
66.249.66.192 - - [21/Oct/2023:11:04:16 -0500] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.192 - - [21/Oct/2023:11:05:09 -0500] "GET /4-slug-name-here/ HTTP/1.1" 200 70520 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)" domain.com
66.249.66.192 - - [21/Oct/2023:11:05:10 -0500] "GET /4-slug-name-here/ HTTP/1.1" 200 70520 "-" "Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)" domain.com
So Indexation Is Back to Normal, Right?
I'd like to say that, but at this point the logs did not show a robots.txt fetch after that last page crawl. Plus, normally a couple more crawlers come and do more things. So yes, bot behavior changed at this time, and the timing correlates with the actions I was taking in Search Console. But we're not out of the woods just yet.
What Else Has to Happen to Get Crawled Enough to Get Indexed, Scored and Ranked
Below are the bots that come to crawl and render new content when it's requested through Search Console. If you're using the Google Indexing API, test results indicate that the rendering Google bots are not being sent at this time. Check out Crawl or No Crawl Reports for the latest test results, where the Indexing API tested 20% on simple HTML pulls and 0% on JavaScript, i.e. the 2nd-pass rendering. All of this is happening close to the end of the October Core Update.
A Closer Look at the Google-InspectionTool Crawlers
Looking at the raw data above, you can see there are two crawlers: one without a Chrome build and one with the latest Chrome build (118.0.5993.70).
Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)
And this one:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)
Note the semicolon at the end of the string: this is what distinguishes these two initial-crawl bots from the other two bots I was expecting to see, because the missing bots are the ones that do the render crawling. Of the two render user agents below, the first is the mobile version and the second is the desktop version.
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Safari/537.36 (compatible; Google-InspectionTool/1.0)
That those last two are absent from the server logs is important: they must come in order to render the content. I have prepared a Guide to the InspectionTool crawlers that covers their lead-up to the GA4 migration deadline of July 1, 2023.
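Because the difference is a single semicolon, it's easy to misread by eye. A quick check in code makes it unambiguous; this sketch uses two of the user-agent strings shown above.

```python
# Initial-crawl variants end with ";)" -- a semicolon before the closing
# parenthesis. Render variants drop that semicolon.
initial = 'Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)'
render_mobile = ('Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) '
                 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 '
                 'Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)')

def is_initial_crawl(agent: str) -> bool:
    """True for the initial-crawl InspectionTool variants (trailing semicolon)."""
    return agent.endswith('Google-InspectionTool/1.0;)')

print(is_initial_crawl(initial))        # True
print(is_initial_crawl(render_mobile))  # False
```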
For those who haven't heard yet, I've been involved in indexation research since August 2021; it continues to this day and can be found here: Google Index Detector.
After becoming deeply familiar with bot behavior in server logs over the last two-plus years, I recognize this pattern. Whenever Googlebot finds a new page, it will crawl the robots.txt file to confirm whether that newly found page can indeed be put into the index.
Data is helpful in that kind of way. You don’t have to take my word for it, you can see it in your own logs.
- Crawl html of URL first
- Crawl robots.txt second
This pattern continued with this morning's crawl. But what we are missing is the render crawls.
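The two-step pattern above can be checked programmatically against your own parsed log entries. This is a sketch; the events list is illustrative sample data in (request, agent) form, ordered by timestamp.

```python
# Illustrative sample: a page fetch followed by the robots.txt fetch,
# matching the pattern described above.
events = [
    ("GET /2-slug-name-here/ HTTP/1.1", "Google-InspectionTool"),
    ("GET /robots.txt HTTP/1.1", "Googlebot"),
]

def follows_pattern(events):
    """True if no robots.txt fetch appears before the first page fetch."""
    seen_page = False
    for request, agent in events:
        if "/robots.txt" in request:
            if not seen_page:
                return False
        else:
            seen_page = True
    return True

print(follows_pattern(events))  # True for the sample above
```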
What Are Render Crawls, and Why Are They Essential to the Indexation Process?
Render crawls are the requests certain crawlers make to pull the pieces of a page, the scripts and CSS portions. They pull things that look like these examples:
- “GET /wp-content/themes/easywebsite/css/lightbox.min.css
- “GET /wp-content/themes/easywebsite/style.css?ver=6.1.1
- “GET /wp-content/themes/easywebsite/css/owl.carousel.min.css
- “GET /wp-content/themes/easywebsite/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70
By pulling those pieces in the "2nd pass," or rendering pass, Google has what it needs to "reconstruct" our content in a form it can easily manage, process, score, and flow through its servers faster and faster.
Until we can see Google's crawlers doing this 2nd pass, we will not see new and revised content updated in the search results.
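One way to watch for that 2nd pass is to check whether any Google crawler is fetching static page assets at all. This is a sketch over parsed log entries; the entries list below is illustrative sample data in (path, agent) form.

```python
# Illustrative sample: a page pull followed by two css pulls, the kind of
# asset activity that marks a render crawl.
entries = [
    ("/2-slug-name-here/", "Google-InspectionTool/1.0"),
    ("/wp-content/themes/easywebsite/style.css", "Google-InspectionTool/1.0"),
    ("/wp-content/themes/easywebsite/css/lightbox.min.css", "Google-InspectionTool/1.0"),
]

ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")

def render_pass_seen(entries):
    """True if any Google crawler pulled a page asset (the mark of the 2nd pass)."""
    return any(
        path.split("?")[0].endswith(ASSET_SUFFIXES)  # strip ?ver=... query strings
        for path, agent in entries
        if "Google" in agent
    )

print(render_pass_seen(entries))  # True: the css pulls above are render activity
```

If this stays False across a day of logs after submitting pages, the render crawlers have not come.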
The Solution: Google must fix this issue. Nothing within our control can repair their crawling subsystem.
How to know when things are back to “normal”?
This is something you can do yourself. You don’t have to wonder or wait for Google to tell you. You will know because you can see it with your own eyes.
Step 1
Go to Search Console and submit a page. (Try to use the same page each time, so it will be easy to spot in your server logs as you move along.)
Step 2
First, confirm whether the error message in Search Console pops up or not. Either way, you can move on to Step 3.
Step 3
Access your server logs. These live at the host level, and most hosting providers make them available. You're looking for the most recent logs covering the date and time when you submitted your page URL, and within them for Google crawls of your page; there should be several.
Step 4
Search for each of the following:
- +http://www.google.com/bot.html
- Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)
- Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)
- Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Safari/537.36 (compatible; Google-InspectionTool/1.0)
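If you'd rather not eyeball the raw file, the Step 4 search can be scripted. This is a sketch: the signature strings are shortened to their distinguishing tails, and the sample log fragment is illustrative (in practice you would read your actual log file into log_text).

```python
# The signatures from Step 4, shortened to their distinguishing tails.
NEEDLES = {
    "googlebot": "+http://www.google.com/bot.html",
    "inspection short": "Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)",
    "inspection mobile initial": "Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)",
    "inspection mobile render": "Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)",
    "inspection desktop render": "Safari/537.36 (compatible; Google-InspectionTool/1.0)",
}

def check_log(log_text: str) -> dict:
    """Report which of the expected crawler signatures appear in the log text."""
    found = {name: False for name in NEEDLES}
    for line in log_text.splitlines():
        for name, needle in NEEDLES.items():
            # The desktop render string is a substring of the mobile render
            # string, so only count it on lines without "Mobile".
            if name == "inspection desktop render" and "Mobile" in line:
                continue
            if needle in line:
                found[name] = True
    return found

# Illustrative sample: a fragment containing only a classic Googlebot hit.
sample = ('66.249.66.208 - - [21/Oct/2023:11:02:56 -0500] "GET /robots.txt HTTP/1.1" '
          '200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
print(check_log(sample))
```

When all five values come back True for a submitted page, you're seeing the full crawl-and-render sequence again.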
When things are back to normal you'll see Googlebot plus all four (4) variations of the Google-InspectionTool, including the ones that render the pieces of your page. Please note these Chrome builds are current as of October 21, 2023.
If you have any questions or want to share your experience, check out the YT channel Crawl or No Crawl, where there are frequent short video updates on the indexation system and Core Updates, as well as spam and Helpful Content System updates. You can post your question(s) on one of the recent videos. Or leave a comment here, if I haven't already turned comments off.
BTW: I am developing indexer software that should be in public beta sometime in November. Subscribe to the Crawl or No Crawl YouTube channel to be part of that beta: https://www.youtube.com/@crawlornocrawl