Is Your Website’s Robots.txt File Hurting Your Site’s Performance?
Does your website have a robots.txt file? Is it configured correctly to enhance your site’s performance? Do you know what it does, how it works, and why it matters?
Millions of websites use a robots.txt file to control how search engines interact with their pages.
Google actively checks the file to see which pages it may crawl. However, not every site wants all of its pages on a search engine, yet it still wants to give customers the best possible visit.
This article offers a simple guide to robots.txt files and how they affect your B2B website’s performance.
Learn how a robots.txt file works and why it’s needed. Discover the drawbacks of not including the file or misconfiguring it.
Read on to see how to improve your traffic count by implementing the best SEO practices.
What Is a robots.txt File?
The robots.txt file is a plain text file that instructs search engines and other web robots how to crawl your website’s pages.
It acts as a sentry to your site and, if desired, can block Google and other search engines from crawling content. For example, Facebook’s robots.txt file ‘disallows’ all the major search engines from accessing its content.
Apart from preventing access to every page, you can instruct search engine bots to scan only certain areas. Subdirectories containing information that you would prefer not to appear in Google can be excluded in this way.
Why Was robots.txt Invented?
The robots.txt file standard is part of the Robots Exclusion Protocol (REP), which attempts to control web crawler access.
Martijn Koster developed the standard in 1994 during the infancy of the world wide web. His website was being inundated with search engine bots. That led to the creation of a spec to direct automated traffic in the way that webmasters wished.
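For most of its life the REP was a de facto convention rather than an official web standard, and it was only formalized as RFC 9309 in 2022. Even so, compliance is voluntary. This means that crawlers and web bots can interpret the instructions as they wish, or disregard them completely.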
The major search engines like Google and Bing do conform to the main specifications. Therefore, including the robots.txt file on your site will affect most of your organic inbound traffic.
How robots.txt Works
A robots.txt file is a simple text document with no formatting.
It must reside in the root or top-level folder of your website. For example, Google’s robots.txt file is at https://www.google.com/robots.txt
The basic premise is to allow or disallow bots’ access to areas of your site.
Bots, spiders, web crawlers, and user agents all amount to the same thing: automated traffic. Normal visitors won’t be affected by the robots.txt file as they will access your pages directly.
Googlebot can visit your site multiple times per day and checks your robots.txt file first. If you specify a user agent as Googlebot you can then disallow or block Google from crawling certain sections.
By default, web crawlers can access all the pages of your website that they find. The robots.txt file tells them which ones to skip.
robots.txt Examples
A typical robots.txt file looks like this:
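As a minimal illustration (the /private/ folder is a hypothetical path, not one taken from a real site), a file addressed to Googlebot could read:

```txt
# Rules in this group apply to Google's main crawler only
User-agent: Googlebot
# Hypothetical folder to keep out of the crawl
Disallow: /private/
```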
Notice the user-agent. In this case, it’s Google, i.e. Googlebot, but others exist, including:
- Bingbot – Microsoft’s Bing search engine
- Baiduspider – the Chinese search engine Baidu
- Googlebot-Image – Google Images’ own spider
- Yandex – Russia’s main search engine
The Disallow: line should include a relative path to the folder or files you wish to exclude from crawling.
Every path starts with a leading /. You can disallow an entire directory, e.g. /images/, or only a subdirectory, e.g. /images/dec-2020. Individual pages work in the same way, so Disallow: /contact.php will exclude the contact page, for example.
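Put together, the two paths mentioned above could sit in a single group. This is only a sketch; the User-agent: * line is an assumption added so that the rules apply to every crawler:

```txt
# Assumed: apply these rules to all crawlers
User-agent: *
# Exclude one subdirectory and everything beneath it
Disallow: /images/dec-2020/
# Exclude a single page
Disallow: /contact.php
```

Rules match by prefix, so the first Disallow line covers every URL that begins with /images/dec-2020/.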
Choosing Multiple Options
The * wildcard is a powerful way to allow or disallow user-agents and files. For example, to prevent all bots from crawling your ‘files’ folder use:
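A minimal version, assuming the folder sits at /files/ in the root of your site:

```txt
# The * user-agent addresses every crawler
User-agent: *
# Assumed location of the 'files' folder
Disallow: /files/
```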
However, this comes with a warning. By using wildcards you could effectively block all of your SEO traffic!
Be wary when you use these rules as they could impact your site’s ranking on Google Search. Googlebot ignores lines it cannot parse, but it will follow every valid directive, including ones you did not intend.
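To see how small the margin for error is, compare the two alternative files sketched here. They are separate files shown side by side for illustration, and they differ by a single character:

```txt
# File A: blocks every compliant crawler from the whole site
User-agent: *
Disallow: /

# File B: an empty Disallow value blocks nothing
User-agent: *
Disallow:
```

Publishing File A when you meant File B is the kind of slip that can remove a site from search results.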
Drawbacks of Not Including a robots.txt File
Does your website need a robots.txt file to operate?
No, it doesn’t. Nor will leaving out a robots.txt file prevent Google and other search engines from crawling your website. Why have one in place, then?
Add a robots.txt file to control which pages search engines crawl and, in turn, what is likely to appear on Google Search.
If you have a login page that you want certain visitors to access but not the general public, then exclude it in robots.txt. Perhaps you have a new website that resides under the /new-site folder. Disallow this testing directory in the robots.txt file.
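Both cases could be handled along these lines. Treat it as a sketch: /login/ is a hypothetical path for the login page, while /new-site/ is the testing folder mentioned above.

```txt
User-agent: *
# Hypothetical path of the login page
Disallow: /login/
# Testing directory for the new website
Disallow: /new-site/
```

Keep in mind that robots.txt is publicly readable and only a request to crawlers, so anything truly private still needs password protection.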
Enterprise websites with thousands of pages may face a crawl budget issue.
This occurs when Googlebot runs out of time while crawling your site. It allocates a certain amount of resources and may bypass pages to move on to other links.
By disallowing directories with less popular content, you can present the pages you want indexed first. You can also tell Google not to crawl media files like PDFs or images.
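A sketch of the media-file idea, assuming you only want to address Google’s crawlers. The * and $ pattern characters are honored by Google and other major engines, but smaller bots may not support them:

```txt
# Keep Googlebot away from any URL that ends in .pdf
User-agent: Googlebot
Disallow: /*.pdf$

# Keep Google's image crawler away from the whole site
User-agent: Googlebot-Image
Disallow: /
```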
SEO and WordPress robots.txt Files
WordPress creates a robots.txt file by default. The content looks like this:
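On a standard install with no SEO plugin, the generated file is typically along these lines (newer versions or plugins may add further lines):

```txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```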
You’ll notice that WordPress excludes the admin area except for the admin-ajax.php page. This file generates content that search engines find useful and that doesn’t need to remain private.
Some plugins also create a Sitemap: directive that shares the URL of the site map page with the crawler.
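For illustration, with www.example.com standing in for your own domain and the sitemap filename assumed, the line looks like this:

```txt
# Hypothetical sitemap location
Sitemap: https://www.example.com/sitemap.xml
```

The value should be a full URL, and the line is not tied to any user-agent group, so it can sit anywhere in the file.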
How to Create an SEO Google robots.txt File
First, check for the file using Google’s Robots.txt Tester.
Although creating a robots.txt file is fairly simple, adding it to the root folder of your site can be more involved, depending on your hosting setup.
Creating the necessary rules to exclude private content can also take time. Mistakes can happen and the worst outcome would be to disappear from Google altogether.
Build the best robots.txt file to increase sales and qualified leads by partnering with an SEO professional.
Productive SEO and Site Performance
A robots.txt file directs a search engine when it crawls your website.
It can block certain directories, files, and media from being spidered. Yet, one mistake can effectively block Google and harm SEO results.
If you have further questions, feel free to reach out to make your site’s performance more productive.
Imran is the founder of Productive Shop. He writes on B2B demand generation and SEO strategy topics to help startups understand how to win digital share of voice. Prior to Productive Shop, Imran led demand generation at an Oracle consultancy, ran an eCommerce site servicing LE teams, and before that helped build PMO offices at technology startup companies. When he's not at work, Imran can be spotted hiking in the Rockies, honing his clay shooting skills, and tumbling off of black diamond ski tracks due to overconfidence in his skiing abilities.