By Carrie-ann | Apr 18, 2024 | Useful Tips

How to Fix Robots.txt File Errors

I feel I need to apologise for such a technical and, frankly, boring post. It isn't my usual style, but it's needed and I hope you find it useful.

There are a number of potential robots.txt file errors that can arise and affect your online search visibility. Let’s explore some of these problems and run through the fixes you can easily implement to address them.

What is Robots.txt?

Robots.txt is a plain text file, placed in your site’s root directory, that sets out how you want search engine crawlers to crawl your site. Getting it right is an important part of a robust technical search engine optimisation (SEO) strategy.

What Role Does Robots.txt Perform?

It can be used to achieve a number of outcomes, including:

1. Blocking webpages from being crawled

Although robots.txt can’t be used to prevent a page from being indexed, it can ensure that a blocked page appears without a text description in search results. Additionally, non-HTML content on a blocked page won’t be crawled.
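As a quick illustration, a rule that blocks all crawlers from a directory might look something like this (the /private/ path is just a placeholder):

```
# Applies to every crawler; /private/ is a hypothetical directory
User-agent: *
Disallow: /private/
```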

2. Blocking media files from search results

Whilst publicly accessible files will still exist online, audio, video and images that are blocked by robots.txt won’t appear in search engine results pages (SERPs).
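For example, rules along these lines (the paths are made up) would keep an image folder and a single video file out of search results:

```
User-agent: *
# Hypothetical paths: block an image folder and one video file
Disallow: /images/
Disallow: /media/launch-video.mp4
```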

How to Resolve Robots.txt Mistakes

There are many common errors that are made in relation to robots.txt files and although these mistakes have consequences, the good news is that they are usually easy to fix and won’t cause any long-term damage to your online presence.

These mistakes include:

1. Misplaced robots.txt

Robots.txt must be positioned in your site’s root directory because it will be completely overlooked by search engines if it is located within a subdirectory.

Achieving this requires access to your site’s root directory, which is worth noting because many content management systems, by default, upload files to a separate media subdirectory. So, you may need to find a way around this to ensure your robots.txt file is correctly located.
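As a quick check, the file should load at the root of your domain rather than from a subfolder; example.com below is a placeholder:

```
# Correct: served from the root of the domain
https://www.example.com/robots.txt

# Ignored by search engines: sits inside a subdirectory
https://www.example.com/media/robots.txt
```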

2. Using defunct noindex tags

Google stopped supporting noindex rules in robots.txt files back in 2019, so if your website is only a couple of years old the chances are this won’t affect you. However, if you are still relying on robots.txt noindex rules and are wondering why your content is appearing in SERPs, I recommend implementing an alternative method.

There are a few options available, and I usually prefer adding the robots meta tag to the head section of any webpages you don’t want indexed.
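As a rough sketch, the tag sits inside the page’s head and looks like this:

```html
<head>
  <!-- Tells crawlers not to include this page in their index -->
  <meta name="robots" content="noindex">
</head>
```

If you can’t edit a page’s HTML (a PDF, for example), the equivalent X-Robots-Tag HTTP response header achieves the same thing.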

3. Poorly positioned wildcards

Robots.txt supports two wildcards: the asterisk (*), which matches any sequence of characters, and the dollar sign ($), which marks the end of a URL. However, I always advise using them sparingly to avoid running into issues. For example, you might be surprised at just how easy it is to inadvertently block your entire site with a misplaced asterisk!
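To illustrate, here’s a made-up pair of rules showing an intended pattern next to the kind of stray asterisk that blocks everything:

```
User-agent: *
# Intended: only block URLs ending in .pdf ($ marks the end of the URL)
Disallow: /*.pdf$

# Dangerous: this single line would block the entire site
# Disallow: /*
```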

It’s good practice to utilise a testing tool just to double check that everything is working as you expect it to.

4. Blocked stylesheets

Googlebot needs access to your JavaScript and CSS files in order to render your HTML (and PHP-generated) pages correctly. So, if you find that your webpages aren’t behaving as you would expect them to in SERPs, I recommend checking to make sure that you aren’t preventing crawlers from accessing these important external files.

There are a couple of fixes you can implement here, the simplest being to remove the line that is blocking access from your robots.txt file. If there are certain files you do want to block, add an exception so that important JavaScript and CSS files remain accessible.
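As a sketch of the second approach, assuming a hypothetical /assets/ directory you mostly want to keep blocked:

```
User-agent: *
# Block the assets directory as a whole...
Disallow: /assets/
# ...but keep the CSS and JavaScript inside it crawlable
Allow: /assets/css/
Allow: /assets/js/
```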

5. Unsupported elements

One example is crawl-delay, which Bing supports but Google ignores. Until recently there were crawl rate settings within Google Search Console, however these were removed in 2023 because Googlebot can now detect when a server is nearing capacity and slow down accordingly.
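If you do want to ask Bingbot to slow its crawl rate, the directive looks like the example below; Google will simply ignore it:

```
# Respected by Bing, ignored by Google
User-agent: Bingbot
Crawl-delay: 10
```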

6. Unnecessary access to pages under development

Just as restricting crawler access to live pages is poor practice, it’s important not to let crawlers loose on pages that are still under construction. Blocking access is a very simple process: add a carefully placed disallow rule to your robots.txt file. However, you must, of course, ensure that any disallow rules are removed when the new pages are ready to go live.
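For example, a hypothetical staging area could be kept out of the crawl like this (delete the rule at launch):

```
User-agent: *
# Hypothetical staging path: remove this rule once the pages go live
Disallow: /staging/
```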

If a recently launched site isn’t performing as expected in SERPs, your first step should always be to search your robots.txt file for a leftover disallow rule; likewise, if a development site appears to be seeing actual search traffic, check that a disallow rule is actually in place.

7. Missing XML Sitemap URL

Although not technically an error that will impact the appearance of your site in SERPs or affect its functionality, it is good practice to incorporate your XML sitemap URL into your robots.txt for SEO purposes.

Don’t forget, Googlebot will look at your robots.txt file first when crawling your site, so you should aim to do everything possible to help it to understand your site’s structure and your most important pages.
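Adding the reference is a single line, usually at the very top or bottom of the file; the URL below is a placeholder:

```
# Point crawlers at your XML sitemap (placeholder URL)
Sitemap: https://www.example.com/sitemap.xml
```

Unlike disallow rules, the Sitemap line does need to be a full, absolute URL.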

8. Incorrect use of absolute URLs

There are instances where using absolute URLs is best practice, but that isn’t the case within robots.txt rules. Instead, use relative paths to indicate which areas of your site you don’t want crawlers to access, because there is no guarantee that crawlers will interpret an absolute URL in the way you intended.
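In practice, that means writing rules like the first example below rather than the second (example.com is a placeholder):

```
User-agent: *
# Relative path: this is what crawlers expect
Disallow: /private-directory/

# Absolute URL: there's no guarantee it will be interpreted as you intend
# Disallow: https://www.example.com/private-directory/
```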

Tools to Support Robots.txt Error Recovery

If you have already identified that a robots.txt error is negatively impacting your search appearance, take steps to correct those mistakes and determine whether your fixes have remedied the situation. The great news is that you don’t need to wait for your site to be crawled again, as there are SEO crawling tools out there that you can utilise.
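If you’d rather script a quick check yourself, here’s a minimal sketch using Python’s built-in robotparser module; the domain and paths are placeholders, and note that this parser follows the original robots.txt rules, so it won’t understand Google’s wildcard extensions.

```python
# Minimal sketch: fetch a live robots.txt file and check whether URLs are blocked.
# example.com and the paths below are placeholders.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()  # downloads and parses the live file

for url in [
    "https://www.example.com/",
    "https://www.example.com/staging/new-page.html",
]:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url} -> {'allowed' if allowed else 'blocked'}")
```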

When you have determined that everything is performing properly, submit your sitemap to Google Search Console and/or Bing Webmaster Tools and request that the affected pages are re-crawled. Although there are no set timeframes in which re-crawls happen and errors get addressed, you’ll at least know that you’ve taken all the necessary steps to optimise your online presence.

If this guide on how to fix robots.txt file errors was useful then please stick around. Have a look at the digital marketing services we offer and let us grow your business together.