Indexing is the process of adding pages to a search engine’s index. Depending on which robots meta tag a page uses (index vs. noindex), Google will crawl and index it accordingly. A page counts as indexed once it has been saved in Google’s index. It is worth mentioning that these meta tags can sometimes be ignored (for example, when Google cannot crawl the page to read them), although Google recognises them as a directive.
As SEOs, our aim is to ensure that all of our indexable pages have been indexed by Google, and in particular that our most important pages are indexed. Without indexing, your pages have no way of ranking in SERPs. However, having your pages indexed doesn’t guarantee that they will rank.
By working to improve website indexing, we’re aiming to increase the volume of relevant and useful pages that are indexed by Google and to increase the rate of pages being indexed. As Google themselves have stated: “Google doesn’t crawl all the pages on the web, and we don’t index all the pages we crawl.”
Where Do We Begin with Indexing?
Google Search Console’s Index Coverage report helps you to learn which pages have been indexed and how to fix the pages that could not be indexed.
With this report, the aim is to see the count of valid indexed pages rise gradually as your site grows. The best way to achieve this is to refine what Google can index through the use of meta tags.
The updated Search Console has some useful features that remove several pain points and reduce manual work. In fact, you don’t necessarily even need to have submitted an XML sitemap.
The updated Search Console also helpfully groups pages by their indexing status. It shows:
- Any errors blocking pages from being indexed,
- Pages that have been indexed but have some issues Google is unsure about,
- Pages that have been successfully indexed,
- Intentionally excluded pages.
You can also toggle impressions to see how any of these may have affected your site’s visibility.
So How Do We Fix These?
To help improve website indexing, we’ll look at some of the common errors for each of the above sections and what can be done to resolve them.
Submitted URL marked noindex – Google treats this as an error because the signals are inconsistent. The URL has been submitted in the sitemap (so you are asking for the page to be indexed) but carries a noindex meta tag (so you are asking Google not to index it). Either remove the noindex tag or remove the URL from the sitemap.
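To spot this conflict before Google does, a small script can scan the HTML of your sitemap URLs for a noindex directive. The sketch below is a minimal illustration using only Python’s standard library. The function names and the `pages` dictionary (sitemap URL mapped to already-downloaded HTML) are our own assumptions, and it only inspects meta tags, not the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page's robots meta tags ask not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})


def conflicting_urls(pages):
    """pages: dict of sitemap URL -> HTML (hypothetical input).

    Returns the sitemap URLs whose HTML carries a noindex directive.
    """
    return sorted(url for url, html in pages.items() if has_noindex(html))
```

Any URL this returns is sending the mixed signal described above, so decide which of the two signals you actually mean and remove the other.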
Submitted URL blocked by robots.txt – This is another inconsistent signal: you’re asking for the page to be indexed by including it in the sitemap, but in the robots.txt file you’ve asked Google not to crawl the page. Either remove the URL from the sitemap or remove the rule that blocks it.
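A quick way to check for this is to test each sitemap URL against your robots.txt rules. The sketch below uses Python’s built-in `urllib.robotparser`. The function name and inputs are our own illustration, and the standard-library parser does not replicate every detail of how Google matches rules (wildcards, for example), so treat the result as a first pass rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that the given robots.txt text disallows.

    robots_txt: the contents of robots.txt as a string (hypothetical input).
    sitemap_urls: list of URLs taken from the XML sitemap.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Example usage with made-up rules and URLs.
rules = "User-agent: *\nDisallow: /private/\n"
urls = [
    "https://example.com/",
    "https://example.com/private/page.html",
]
blocked = blocked_sitemap_urls(rules, urls)
```

Here `blocked` contains only the `/private/` URL, which is the one that would be reported as submitted but blocked.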
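Submitted URL seems to be a Soft 404 – Here the server returns a success (200) response, but the page looks like an error page or has little to no content, so Google treats it as a 404. The best approach is to improve the content or, if the page genuinely no longer exists, to return a real 404.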
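Assuming the usual meaning of a soft 404 (a 200 response whose content reads like an error or empty page), you can flag likely candidates yourself with a rough heuristic. This is only a sketch: the phrase list, the 50-word threshold and the function name are our own assumptions, and Google’s actual classification is not public.

```python
import re

# Hypothetical phrases that often appear on "not found" style pages.
NOT_FOUND_PHRASES = (
    "page not found",
    "no longer available",
    "nothing found",
)


def looks_like_soft_404(status_code, html, min_words=50):
    """Heuristic: a 200 response that reads like an error page or is nearly empty."""
    if status_code != 200:
        # Real error codes (404, 410, ...) are not soft 404s.
        return False
    # Crude tag stripping, good enough for a first-pass check.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return True
    return len(text.split()) < min_words
```

Pages flagged this way are worth reviewing by hand: either give them real content or have the server return a genuine 404.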
Submitted URL not found (404) – This page should be removed from the sitemap and old internal links to this page should be updated.
Redirect error (chains/loops) – Google doesn’t like more than 3 redirection hops and will not follow more than 5. The sitemap should list the primary (final) URL, not a URL that redirects to another. We’d also recommend that all internal links point to the primary URL.
Indexed, though blocked by robots.txt – If you intended to keep this page out of the index, use a noindex tag instead, and allow the page to be crawled so that Google can see the tag. If you did not intend to block this page, remove the rule that blocks it from robots.txt.
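If you have an export of your redirects, you can trace each chain offline to find loops and overly long chains. The sketch below assumes a simple dictionary mapping each redirecting URL to its target (for instance, built from a crawl export). The function name, the status labels and the default limit of 5 hops (taken from the figure above) are illustrative choices.

```python
def trace_redirects(start_url, redirects, max_hops=5):
    """Follow a redirect map from start_url.

    redirects: dict of URL -> redirect target (hypothetical input).
    Returns (chain, status), where status is "ok", "loop" or "too_long".
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            # We have come back to a URL already visited.
            return chain, "loop"
        seen.add(current)
        if len(chain) - 1 > max_hops:
            return chain, "too_long"
    return chain, "ok"


# Example usage with made-up paths.
chain, status = trace_redirects("/old", {"/old": "/newer", "/newer": "/newest"})
```

When the status is "ok", the last entry in `chain` is the primary URL that belongs in the sitemap and in your internal links.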
It’s worth remembering that robots.txt should not be used to hide web pages from Google Search results.
Indexed, not submitted in sitemap – If this is an important page then it could be worth adding this to the sitemap.
Indexed, consider marking as canonical – Canonicals are designed to define a primary URL, so review the page and add a canonical if relevant.
Submitted URL not selected as canonical – Use the URL Inspection tool to see which page (if any) has been selected as the canonical URL.
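Alongside the URL Inspection tool, it can help to check which canonical each submitted page declares itself. The sketch below reads the `rel="canonical"` link from already-downloaded HTML and lists pages whose declared canonical differs from the submitted URL. The names and the `pages` input are our own assumptions, and note that this shows the canonical you declare, not the one Google has selected.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Captures the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def declared_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def canonical_mismatches(pages):
    """pages: dict of submitted URL -> HTML (hypothetical input).

    Returns {submitted URL: declared canonical} where the two differ,
    ignoring a trailing slash.
    """
    mismatches = {}
    for url, html in pages.items():
        canonical = declared_canonical(html)
        if canonical is not None and canonical.rstrip("/") != url.rstrip("/"):
            mismatches[url] = canonical
    return mismatches
```

A submitted URL that points its canonical elsewhere is another mixed signal, so in most cases the sitemap should list the canonical URL instead.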
Align Your Indexing Signals
As we’ve covered, many of the issues highlighted by Google are the result of inconsistent signals. By aligning your signals, you will be much more likely to achieve the primary aims we covered earlier: to increase the volume of relevant and useful pages that are indexed by Google and to increase the rate at which pages are indexed.
Also, once you have made any necessary changes to your site, it’s important to validate your fixes in Search Console. This tells Google that it should re-check the error.
We all need to make it as easy as possible for Google to crawl and index our websites, and we can do this by providing aligned, clear and consistent signals.
Remember to make use of Google’s tools and resources to diagnose issues, and to exclude or refine less important content through the use of canonicals and noindex. Doing so keeps the page accessible to website visitors while helping to keep your crawl budget focused on your more important pages.