
Google Indexing Research

October 2023 – Crawling System Issues – InspectionTool Crawlers

After the Core Update completed on October 19th, 2023 and the Spam Update on October 20th, 2023, anecdotal reports began to appear of sites across the globe suddenly unable to successfully request indexation via Search Console. The error message was Indexing Request Rejected.

On October 23rd, AWM published an article to record the event contemporaneously, while it was happening, so that we would have something on record about the Hostload Exceeded error messages people were seeing in Search Console. Obviously this error message has appeared before, but not for the same reasons. It was then discovered that the error was coming not only from Search Console but also from the Mobile Friendly Test tool and the Rich Snippet tools.

During this time, Google blamed the incident on spammers seeking indexation.

As per John Mueller –

In researching this issue further and covering it on Crawl Or No Crawl Reports on YouTube, it was discovered that during this time the InspectionTool crawlers, which were introduced in May 2023, were absent from server logs across several test and live sites. See: InspectionTool Crawlers Are MIA, Crawl or No Crawl, 25 October 2023.

On October 26, while preparing data for the Indexation Research and after not seeing the normal InspectionTool crawlers, I wondered which bots would arrive if I used the Mobile Friendly Test tool. Within seconds it became apparent what was happening, and it is STILL happening.

The InspectionTool crawlers are in many cases sending dozens, even hundreds, of (compatible; Google-InspectionTool/1.0) crawls in response to a single Mobile Friendly Test request. Conversely, for Search Console indexation requests, the desktop Googlebot has reappeared for the purpose of render-crawling new content, a function the Chrome-build (compatible; Google-InspectionTool/1.0) crawlers had been performing.

The normal number of render pulls for a standard page is dictated by how many resources make up the page. In the past, when the InspectionTool Chrome crawlers came to render a page, they would each run 3 to 6 crawls as mobile and desktop, for a total of 6 to 12 crawls.

Today, some of these pages pulled 10 to 15 times the normal number of crawls.

Below is a breakdown of 10 sites: the type of page that was tested and the number of Google-InspectionTool crawlers that arrived in response to the Mobile Friendly Test request, as recorded in each site's server logs. I performed the same type of request across a total of 30 sites and have more in the works, waiting for their server logs to update. At no point was there an error message or any indication of anything other than a successful Mobile Friendly Test.

Site 1 (inner page): 192
Site 2 (inner page): 110
Site 3 (home): 102
Site 4 (inner page): 100
Site 5 (home): 74
Site 6 (inner page): 71
Site 7 (inner page): 64
Site 8 (inner page): 64
Site 9 (inner page): 52
Site 10 (home): 50
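Counts like those above can be pulled straight from raw access logs. Below is a minimal Python sketch, assuming combined-format log lines (the sample lines and paths are fabricated for illustration), that tallies Google-InspectionTool hits per requested URL:

```python
from collections import Counter
import re

def count_inspectiontool_hits(log_lines):
    """Count requests per path whose user agent mentions Google-InspectionTool."""
    counts = Counter()
    for line in log_lines:
        if "Google-InspectionTool" in line:
            # Combined log format: the request is the first quoted field,
            # e.g. "GET /inner-page/ HTTP/1.1"
            m = re.search(r'"[A-Z]+ (\S+) HTTP', line)
            if m:
                counts[m.group(1)] += 1
    return counts

# Fabricated sample lines standing in for a real access log.
sample = [
    '1.2.3.4 - - [26/Oct/2023:10:00:01 +0000] "GET /inner-page/ HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Google-InspectionTool/1.0)"',
    '1.2.3.4 - - [26/Oct/2023:10:00:02 +0000] "GET /inner-page/ HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Google-InspectionTool/1.0)"',
    '5.6.7.8 - - [26/Oct/2023:10:00:03 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]
print(count_inspectiontool_hits(sample))  # Counter({'/inner-page/': 2})
```

On a live server you would feed the function the lines of the actual access log instead of the sample list.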

This indicates an ongoing issue (today's date is Oct 26, 2023) in the crawling portion of the indexation system. Evidently, Google engineers have been able to compensate for taking the InspectionTool crawlers out of the crawling process by replacing them with the desktop Chrome-build Googlebots. This is evident when content goes through a Search Console indexation request.

If you're using the Google Indexing API to request indexation of new content, please note that according to the testing data, no rendering crawls have occurred for requests made through the API since isolated testing began on September 3, 2023.

This page will be updated once the situation is resolved.

September 2023 During The Helpful Content Update:

From time to time, Google has mechanical issues with making new content findable by its keywords, which is where traffic comes into your site. There are hundreds of supposed reasons for issues getting content into the findable index, and most analysis assumes a fault on the part of the content creator. This information, however, looks at where the issues lie within Google's indexation system and its sub-systems: crawling, indexing, ranking, and serving.

During September, Google announced the Helpful Content Update. Based on field evidence, in the form of indexation-resistant content that started to see impressions and rankings in Search Console literally within hours of the announcement being made, I still believe that the Helpful Content Update is about a topic and the confirmation of that topic.

The video below reveals the state of indexation during the week of Sept 18 to 25: there were a number of Chrome-build updates in the Google agent crawlers, including mobile, desktop, and the InspectionTool crawlers.

How do I get my page or site indexed?

As of today, the Google Indexing API is not running new content through the second, rendering pass of the system. Since that leaves the process incomplete, I no longer recommend using the Google Indexing API.

Instead, the research data shows that requests made through Google Search Console are not only a complete process, including both the simple and render passes, but also very quick: less than 24 hours in most of the recent testing.
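For context, an Indexing API submission is a POST of a small JSON notification to the urlNotifications:publish endpoint, authorized with a service-account OAuth token. The sketch below only builds the request body; the authentication and network steps are omitted, and the page URL is a placeholder:

```python
import json

# Endpoint for the Google Indexing API publish call.
INDEXING_API_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def build_publish_body(url: str, deleted: bool = False) -> str:
    """Build the JSON body for an Indexing API notification.

    The API accepts two notification types: URL_UPDATED for new or
    changed content, and URL_DELETED for removed content.
    """
    return json.dumps({
        "url": url,
        "type": "URL_DELETED" if deleted else "URL_UPDATED",
    })

# The body would be POSTed to INDEXING_API_ENDPOINT with an
# Authorization: Bearer <service-account token> header.
print(build_publish_body("https://example.com/new-page/"))
```

The service account doing the POST must be added as an owner of the property in Search Console, or the API returns a permission error.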

How to spot indexing issues?

The quickest way to spot them is to set up a test keyword that, when searched, will only pull up the page. Since most content creators don't know how to set this up, the next best way is to isolate the URL in the Performance report in Search Console and set the date range from a day before the publish date of the new content to within a week after it. When the indexation system is in working order, the page should start to receive impressions within a week.
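If you do want to set up a test keyword, it just needs to be a string Google has never indexed. A quick sketch (the function name is mine):

```python
import random
import string

def make_test_keyword(length: int = 12) -> str:
    """Generate a random lowercase string to use as a unique test keyword.

    A 12-character random string has about 26**12 possibilities, so a
    collision with an existing indexed term is effectively impossible;
    searching for it later should return only your test page.
    """
    return "".join(random.choices(string.ascii_lowercase, k=length))

kw = make_test_keyword()
print(kw, len(kw))
```

Place the generated keyword in the page content, then search for it after publishing: if the page appears, it is in the findable index.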

Also, at this time this holds true for either crawler designation in Search Console: whether the site's primary designation is desktop or mobile (smartphone), the same rates of crawling, indexing, ranking, and serving are reported for both, which wasn't the case between January and April 2023.

Updated Indexation Research for 2023:

The research continues to provide further and deeper insights into the indexation process. In January 2023, the Google Indexing API modified its response and no longer provides the same predictable serving rate. Additionally, sites that are designated "primary crawler: desktop" in Google Search Console show an issue between the indexation and serving systems. Whether this is a system error, a filtering error, or even deliberate hobbling is not readily visible in the data. The data clearly shows an issue, but not the nature of the internal issue.

 

Update: May 2023

The testing data shows signs that primary-desktop-crawler sites are behaving more like the smartphone-designated sites. Both showed the same behaviors: crawling system (working), indexing system (working). In the serving system, the mobile-designated sites had little to no issue serving, but the desktop-designated sites had shown a marked reduction in the serving of new content since early December 2022.

The most recent two weeks of data reveal that the serving of new content through the indexation systems is performing at its pre-December rate.

 

 

Thank you, everyone, for being so persistent in getting me to procure a replay video. Someone in the group was able to provide a truncated version. It isn't a great recording, but it does have the meat of the research. Below is a transcript.

The live presentation had a little more information and research data. If you're interested in seeing this again and having an opportunity to ask your questions, there is a form below to share your interest. Just leave your email and I'll let you know when I can carve out some time to present it again.

Be The First to Get Notified On Next Forensic SEO Live Training


Transcription below. Before you leave, please subscribe to the Crawl or No Crawl YouTube channel:

https://bit.ly/crawlornocrawl

Link to research data log – google index detector

Subscribe to Confessions of an SEO – available everywhere you can get podcasts

Course: Forensic SEO Live Training

 

Transcription:

I'm double-checking the waiting room, making sure. So I'll keep that open. Thank you all for coming. Hey, Marie is going to help me by monitoring the chat as you all come in. This just has you muted as you come in; I wasn't sure how many people would be chatting.
So if we can, as you have questions, post them in the chat, and there may be some that we want to address before moving on. There will also be a Q&A at the end, so you can definitely ask your question. My goal is that whatever question you have, we answer it, or someone will.
All right. So we're getting a lot of people in here. I'm going to share my screen because I think it'll make sense when you see it. Hold on. So can everybody see this? This is my history of the index detector. Nope. Okay, cool. Is that Terry? I recognize your voice.

Okay, so there will not be a quiz on this, so don't worry about it. But I just wanted to show you that everything I'm going to share today came from this activity. And I'll explain as we go.

01:60 – 03:02
Hey, Marie, do you mind if I make you a co-host, and then if people come in the waiting room, do you mind letting them in? I just want to respect everybody's time; as they say, time is money. So we can keep going now. Everybody is set, so you're muted. If you can't hear me or something weird happens, please unmute yourself and say, you know, we lost you. But if it's a question about what you see, go ahead and type it in the chat and we'll get to it.

03:07 — 04:26
There are so many little things on this screen. All right. So can you guys see that okay? I see a couple of heads nodding up and down. Okay. So the short message from today is: when it comes to indexation of new content, you better ask Google nicely.

And this is a little reference to my favorite movie, A Few Good Men. So now we're going to go into just a little bit of, if I can get it to, here we are.

Okay. So I'm not gonna spend a lot of time; I assume most of you know who I am. I've been an SEO for 13, 14 years; I've lost track. And I am an avid tester. I was in at the beginning of SEO testing in 2015. A lot of what you're going to see today, I'm going to do my best to present as if we were in a test review where other testers are present.

04:27 – 05:13
We've kind of done that before. So everything I'm sharing, I'm going to share like a test, but we've got some proof on there. I'm going to provide some context, show you what I did with the two test sites that led to this discovery, the general testing methodology, and what some of the updated findings have been since the discovery, and then take some questions and share with you what else is coming out of this research project. So if everybody's cool, we will keep going. I see a lot of people are coming in. So, Marie, are there any questions to start? No? OK.

05:17 – 06:33

All right, here we go. So I just really want you to know you're not wasting your time, and this has worked for other people, not just me. Actually, the first one I'm going to share is from Lee Witcher, and Lee's actually here. Lee is someone I greatly respect. I've learned a lot from him and his approach to SEO, and he has a specialized advanced training for SEOs, specifically those using Cora. So it works for Lee. And then I've got some screenshots from Eric St-Cry; he sent me some graphics to show you that basically everything, once he started doing this, grew tall, like tall green grass. They went from 82 indexed pages to 520 pages.

06:37 – 08:13
Now, what I'm sharing here has been a long time developing. I didn't just wake up this morning and decide this is how things work. This project has been running since August 28 of last year, and I have contributed a lot of hours to it. But it's been fun. And if you've ever dealt with server logs, you know what these are; it kind of looks like The Matrix, right? It wasn't that scary, I have to tell you. So there was a problem, and last year we noticed it in the spring. There were times when Google would turn off JavaScript indexing, and then they would turn off regular indexing.

And we didn't really have any way to document that. We were all complaining on Facebook, all complaining to each other. And Ted Kubaitis, in one of his SEO Fight Club shows, was like, please, will somebody start putting a body of work together so that we know what's going on.

So, right, smart or stupid, I went ahead and did it, partially because my SEO HTML testing was at a standstill. I could not get any of my test pages to index. And if you can't get them indexed, you can't find anything out.

08:15 – 09:13
All right. I'm presuming this looks very familiar to a lot of people. This was right out of the Search Console: Crawled, not indexed. I couldn't find a screenshot of Discovered, not crawled, because I don't have that problem anymore.
Now, what I'm listing here are the reasons I pulled out of all the comments that I think all of us have seen when all this started. I don't think this is breaking any new ground. The answer is usually, well, your domain isn't an authority domain, you don't have enough, and therefore that's why it's not getting indexed. Then my other favorite is: Google knows before crawling what your content is about and whether it's worth crawling. I'm trying hard not to editorialize here, but what a bunch of bull.
09:13 – 10:08
Okay. Now the other one, which is always fun: Google probably already has enough content on that topic, so they don't need you. And this one was always a favorite: I don't have any trouble, therefore you just did it wrong.
Part of the reason I wanted to hold this presentation is so that you would all realize that, chances are, you're not doing it wrong. If you're on WordPress, make sure you've checked that little box that says go ahead and let search engines crawl here; if you have a problem, that's typically it.
But if you've checked that and you're still having trouble, it's not you. So going back again, I'm sharing, you know, the methodology, right?

How are we supposed to do it? In simple terms, we're supposed to go to the site, put in our URL.

10:21 – 12:10
I published a test page every day, which means I had to create seven HTML pages a week, publish them in the mornings, and check on them periodically throughout the day. It got so bad that by the end of the year I had published 123 pages, of which only eight were indexed, and that included the homepage.

I think that sucks, and so I was really frustrated. Then I decided in mid-February that I was going to totally change this. Yes, Lee, an impressive failure! Thank you. So in mid-February I was really pulling my hair out, just trying to figure it out. There's got to be another protocol, there's got to be some way to figure this out; we can't go on like this.

And so I created a WordPress site, and after January 1st I modified the testing schedule to two or three times a week, so I'm banging my head against the wall less. And I'm happy to say that it began to work. Out of 33 test pages, 33 were indexed: 100% indexation, not to rub it in. This is kind of what it looked like when I first thought, okay, I think I'm onto something. And this is as of last week. I mean, it's still growing and growing. So it still works.

12:12 – 12:36
And what I found is that it matters how you ask Google to index. And I will prove it. All right. So remember, I had two sites. One ran from the end of August to mid-February, with test pages every day. And then site two was launched on February 14th.

12:41 – 13:11
Now, for everybody who says, well, the content quality really sucked: it was supposed to, because the plan in testing is to remove as many variables as you can, so you're only testing one factor. That's why there are no entities in here, that's why there are no pictures in here, nothing. Historically, test content has been the lowest-quality content you can come up with for Google.

So for instance, and I'll explain this, you want to come up with keywords that Google doesn't know about, because if you use something that…

13:27 – 13:50
To test within live content, you can't isolate your test to see what's going on. So by using these unique words, we're able to isolate the test pages and see what's going on. There were two ways that keywords were placed on a page. The first is the way we all do it:

the keyword is visible on the page and it's in the content. Now, the second one was set up in JavaScript, so that the only way Google could see it, and this is an example, the only way it would read ABCD, MTV,

is if the page had been render-processed by Google. So when it showed up in the index, that's how you knew the page had been rendered. The goal was testing for two things: is Google taking in new content, just very simple content,

and then is Google render-processing it on the other side? Those were the two questions. It was not, can I get this to rank number one? No, we're not even there yet; we were simply trying to answer those two.
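[Editor's note: the two-keyword setup described above can be sketched in a few lines. The page below carries one keyword in plain HTML and one injected only by JavaScript; if the JavaScript keyword later appears in the index, the page must have gone through the rendering pass. The keywords shown are placeholders.]

```python
def build_render_test_page(visible_kw: str, js_kw: str) -> str:
    """Build a minimal HTML test page: one keyword in static content,
    one written into the DOM only when JavaScript executes."""
    return f"""<!doctype html>
<html>
<head><title>index test</title></head>
<body>
<p>{visible_kw}</p>
<div id="render-check"></div>
<script>
// Only a rendering (second-pass) crawler will ever see this keyword.
document.getElementById("render-check").textContent = "{js_kw}";
</script>
</body>
</html>"""

page = build_render_test_page("zvqmplarkuth", "bwcfjnodrilg")
print("bwcfjnodrilg" in page)  # True
```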
14:46 – 15:58
And if you're curious how to find words that Google doesn't know, just pretend you're typing, and eventually you will search for a word that Google doesn't know. When you see the little monster fishing, that's when you know you've got the right one. You don't want one where Google suggests, oh, did you mean JC Penney? No, I did not. So you want to make sure it's something very, super simple. All right.
Now, this was the content of the second test site. I didn't know what the hell was going on. Was it the random alpha? Was it the lack of entities? Was it the lack of other media? I had no idea. So I thought, well, what if I just used low-quality articles and, again, two test keywords: one that you could see and one that Google could only see if it rendered the page.

So we're comparing these two sites.

16:01 – 17:03
So here's what I did. Everything on the left was the site that started in August: analytics, Search Console, sitemap, robots.txt, feed XML. I requested indexing via Search Console, submitted sitemaps, and sent low-level traffic. On the other one, I did everything the same, with the exception that I never requested indexation through Search Console; I set the site up directly with the Google Indexing API and IndexNow, and sent low-level traffic. Now, I know everybody's heard,

oh, this is how you solve it, but this is why it's worth solving: using the API, 6.5% indexation versus a hundred percent indexation. And does this mean it's going to be as easy as falling off a log to get your site set up to do it? Maybe, maybe not, but it's worth it.
17:06 – 18:18
Now, a lot of people, when you tell them this, they're like, that's not true, Google requires that you have schema. If you're a tester, it makes sense why you would test it, find out that it didn't work, go back later, try it again, and find that it does. So at this time, there is no requirement for there to be job schema or live-events schema on your pages.

You don't have to believe me; just try it and convince yourself. Now, the other thing a tester does: just because it happened one time doesn't mean it's real. So I created three more site setups. The third site was real content, a high level of optimization, low-level traffic, submitted through Search Console. The fourth site was real content, no real optimization, low-level traffic, hooked up with the API; every single page was indexed within 48 hours.

18:20 – 19:45
Okay, it's still not enough, right? Because now I'm wondering, is it the low-level traffic? Even though Search Console had it, I wanted to eliminate that and really isolate that it was the API. So: real content, no real optimization, no traffic, hooked it up to the API, and got the same result. I was giddy, to say the least. And here was the real question: can I go back and do some SEO testing with random alpha? It was a nervous test because I knew I wanted an answer, but I knew I couldn't go down that road again.

So we set it up on March 4th, and by March 5th it was indexed by the keyword and the JavaScript keyword. So it works. It has nothing to do with your content. It has nothing to do with the authority of your website. It has nothing to do with Google having some sort of prescient sense of what you're writing about. And certainly, obviously, these are low-quality pages, but I wasn't trying to rank them. I was trying to get them in.
19:47 – 20:47
So if you have new content, there are two ways you can do it. I'll go through the two different ways, but

basically both of them involve connecting your site to the Indexing API and IndexNow. For WordPress, Rank Math has an Instant Indexing plugin that lets you hook it up. And SEO Tools for Excel has an Indexing API connector; I found that out when I presented this information to a bunch of testers. So when you publish new content, you just let the automatic submission take care of it, and you do not request indexing from Search Console, unless you just like writing a ton of content and letting it die. And again, this is for now, right? So I can send this out in an email.

20:47 – 22:06
So you all have it, but if you want to write it down, because I'd say don't waste any time: if you're having any troubles and you're on WordPress, set it up. On the Rank Math blog there's a step-by-step on how you do it. It does involve going into Google Cloud and finding the API, but it's super simple. They have pictures, and you could literally print it out and just check off step one, step two, step three.
Now, there is a non-WordPress option: if you look for SEO Tools for Excel, you'll find it. They have an article on Google indexing.
Comment from the chat: SEO Pressor has a way to hook into the Google Indexing API.
Oh, thank you, Chris. That is a paid option, but if you're looking for indexation, I firmly believe indexation has to be the pathway to the dollar, so it's worth the investment if you're trying to get indexed. Now we're at the question phase.
So, Marie, are there any questions in there? The main question was, are you going to show this again or have this up for people to watch when it's not live?
No, because you can write down: Rank Math Instant Indexing plugin, SEO Tools for Excel. That's all you need to know.
Okay. So, no other real questions about how to get it indexed or anything like that. I'm reading.


Ideally, Both Need To Be Green

No opinion. Evidence-based Analysis.

It is estimated that Google makes thousands of changes every year to its algorithms. Some they tell us about. But most, they do not.

For the past six years, a small but growing number of SEOs have been testing specific hypotheses to determine what produces boosts within the current algorithms.

Is Google Indexing or not? Simple or Complex

For most of 2021, certain processes have been slowed down or in some cases are not working. Indexing is one of these.

This data will be updated as new data shows us the simple truth. We do not need Google to tell us; we can find out ourselves. The purpose of this page is to serve hard-working SEOs who find themselves held responsible for something they may not be responsible for at all.

* This page was created as a source that can be cited in SEO news reports and articles. More information will be added to this page.

Data Source

SEO tester and researcher – Carolyn Holzman. Requests to view data will be honored.

New test pages in 2022 are published 3 times a week. 

 

Got Google Indexing Problems?



Our SEO Sponsor

Confessions of an SEO podcast exists as a bridge between SEOs and the organizations and businesses that depend on them. When business owners do not know, or cannot tell, the difference, they must have access to a trusted third party. SEO information discussed on the podcast is agnostic; both SEOs and those they serve can find value.

Available on Amazon Alexa, Spotify, Google Podcasts, and wherever you get your podcasts. Click here for the latest episode: Confessions of an SEO.