Does GoogleBot use Sitemap.xml to crawl a website?


Google primarily crawls through a website’s internal link structure and not the sitemap. This is why it’s important for your website to be properly linked internally. Sitemap dependency for crawling should be kept at a minimum. Let’s find out more.

If you’re an SEO strategist, a crawling strategy needs to be part of your primary goals in the first few weeks of a project. It is essential to know that Google has correct access to your website’s pages and its code, and that critical sections of the page are being rendered correctly.

Google renders all the code and content of a page, analyzes headings and then assesses what the page is about. It is imperative for SEO strategists and web developers to make sure that this is happening properly.

Primary crawling: through internal links

Contrary to the belief of many “SEO experts,” Google’s primary source of crawling is not your sitemap, but your website’s live structure. The bot goes through all the internal links and attempts to discover all pages it can find (it prioritizes the navigation).

So how can you figure out that the crawling is occurring correctly?

  1. If your SEO crawler offers the option, either disable sitemap crawling or do not add a sitemap link (in Sitebulb, this is under “crawl sources” in the audit settings). Then initiate a crawl and check whether the total number of crawled pages matches the number of pages you know to exist on the website. If the totals do not match, some pages are not being crawled, which means the website’s internal linking does not reach them.
  2. In Google Search Console, paste newly launched URLs into the “inspect URL” box at the top to see whether Googlebot has discovered them. Do this for a sample of pages. Because you are testing discovery through internal links, ignore any “discovered through sitemap” result.
  3. You can check whether Googlebot is visiting your website by running a manual or automated test.
  4. Check that the critical content on the page is also being rendered properly. You can do this by taking a random sentence from the front end and searching for it in the “view source” code of the page.
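The first check above can be approximated with a small script. The following is a minimal sketch, not how Sitebulb or Googlebot works internally: it follows only same-host `<a href>` links from the home page and reports which known pages were never reached. The `SITE` dictionary, the `example.com` URLs, and the `fetch` callable are all made up for illustration; in practice `fetch` would be replaced by a real HTTP request.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl_internal_links(start_url, fetch):
    """Breadth-first crawl that follows only same-host links.

    `fetch` is a hypothetical callable: URL -> HTML string, or None if missing.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen


# Made-up three-page site: /orphan/ exists but nothing links to it.
SITE = {
    "https://example.com/": '<a href="/pricing/">Pricing</a> <a href="https://other.example/">Ext</a>',
    "https://example.com/pricing/": '<a href="/">Home</a>',
    "https://example.com/orphan/": "<p>No inbound links</p>",
}

reachable = crawl_internal_links("https://example.com/", SITE.get)
unreachable = sorted(set(SITE) - reachable)
print(unreachable)
```

Any URL that lands in `unreachable` is a page you know exists but that internal links alone do not lead to.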

Secondary crawling: sitemap.xml

While the primary crawling occurs through the website’s internal links, Google also uses the sitemap to find new URLs that may need to be crawled and have been missed during internal linking.

According to Google, sitemap optimization matters less for websites that are properly linked internally and have a very small number of pages than it does for websites with a large number of pages (think of websites with 10,000+ pages, such as ecommerce or news sites).

Remember that if essential pages are not linked through the internal linking structure and appear only in the sitemap, the crawl rate for those URLs may be reduced. Google might conclude that you are not linking to these URLs internally because you do not consider them very valuable for users (remember that Google is heavily focusing on helpful content).
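To spot pages that exist only in the sitemap, you can compare the sitemap’s `<loc>` entries against the set of URLs found through internal links. This is a rough sketch using Python’s standard library; the sitemap content and the `internally_linked` set are invented sample data, and a real sitemap index file (one that lists other sitemaps) would need an extra step.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


# Invented sample sitemap for illustration.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing/</loc></url>
  <url><loc>https://example.com/orphan/</loc></url>
</urlset>"""

# Hypothetical result of a crawl that followed internal links only.
internally_linked = {"https://example.com/", "https://example.com/pricing/"}

sitemap_only = sorted(sitemap_urls(SITEMAP_XML) - internally_linked)
print(sitemap_only)
```

URLs in `sitemap_only` are candidates for new internal links, assuming they are pages you actually want crawled.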

How the robots.txt file fits into crawling

The primary purpose of the robots.txt file is to manage crawl requests to a website. It can also be used to keep media files out of search results and to block very specific JavaScript that may be hurting a website’s SEO, such as announcement bars.

Note that disallowing necessary JavaScript or CSS files in the robots.txt file can prevent a page from rendering properly for Googlebot, which may hurt the page in search results.
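One way to catch this is to test your render-critical files against the robots.txt rules. The sketch below uses Python’s built-in `urllib.robotparser`; the rules and file paths are made-up examples. Keep in mind that this parser is only an approximation of Google’s own: it does simple prefix matching and may disagree with Googlebot on wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt for illustration; it blocks a JavaScript folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/js/
Disallow: /media/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical files the page needs in order to render.
render_resources = [
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/site.css",
]

blocked = [url for url in render_resources if not parser.can_fetch("Googlebot", url)]
print(blocked)
```

Anything listed in `blocked` is worth reviewing before you assume Googlebot can render the page the way a visitor sees it.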

Need help with improving your site health and linking logic? Start with an SEO audit.

Frequently asked questions

Are your pages being crawled and discovered but not being indexed?

Google does not guarantee the indexation of any URL. Websites that publish a lot of thin content very quickly may find that their pages are not indexed. Only a percentage of a website will be indexed, and that share is determined by Google’s algorithms.

How can you increase the likelihood of a page being crawled?

You can increase the likelihood that pages are crawled by adding more internal links to them. Link to these pages from the index (home page), which is the most important page of the website. Internal and external signals pointing to a page increase the probability that it is indexed.

What is crawl rate?

Crawl rate is the frequency at which the Googlebot crawler visits the pages of your website. Google’s algorithms determine it for each website individually.
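If you have access to server logs, you can get a rough picture of crawl frequency by counting requests whose user agent mentions Googlebot. The sketch below assumes the common “combined” access log format and uses invented log lines; your server’s format may differ. A user agent string can be spoofed, so treat these counts as an estimate unless you also verify the requesting IP addresses.

```python
import re
from collections import Counter

# Invented sample lines in combined log format.
LOG_LINES = [
    '66.249.66.1 - - [10/Mar/2024:06:25:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Mar/2024:07:02:44 +0000] "GET /pricing/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [10/Mar/2024:09:13:20 +0000] "GET /pricing/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [11/Mar/2024:01:00:00 +0000] "GET /blog/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Captures the date part of a timestamp such as [10/Mar/2024:06:25:01 +0000].
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def googlebot_hits_per_day(lines):
    """Count log lines per day whose text mentions Googlebot."""
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return dict(hits)


hits = googlebot_hits_per_day(LOG_LINES)
print(hits)
```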

Can Google crawl Yoast sitemaps?

Yoast is a wonderful plugin for WordPress that generates and updates sitemaps. It also automatically splits up sitemaps. Google can fully crawl the sitemaps and the URLs listed inside them.

Can Productive Shop Inc. help me audit my site for correct internal linking?

Yes, we provide an internal linking service. We can fully analyze your website’s code and how Googlebot sees your website, helping you align better with standard Google guidelines. Reach out today.

Momin Malik


Momin Malik is a Senior SEO Consultant and Project Manager with experience in optimizing search engine rankings for B2B SaaS clients. He believes a deep understanding of search engine algorithms and data-driven strategies is important to drive measurable results. Here he posts his musings to help readers understand Search and manage SEO and web projects.
