What is robots.txt and how can I use it?


The robots.txt file is a plain text file containing instructions for the web robots (crawlers) that search engines use to access specific sections of your website. It is located in the root folder of your domain, and inside it you will typically find directives that allow or disallow all user-agents, or specific ones, from visiting your site or an area of it that you do not wish to be crawled.

Why would web crawlers want to visit your site in the first place? It's quite simple - the content they go over is indexed and shown to visitors as search results in search engines such as Google, Bing, and Yandex. That sounds like a good thing, so why bother with a robots.txt file at all?

Here are a few reasons you may want to do so:

1 You are working on a specific page on your site, and you want to prevent the bots from indexing it until it is finished. Having an unfinished page indexed could harm your SEO and your ranking in search engines.
2 Bots may crawl files and URLs on your website that contain sensitive information or reveal your website's code structure. This can compromise the site's security, so disallowing bots from crawling those directories is a good idea.
3 Bots crawl your site so frequently that resource utilization rises, which hinders the performance of your website.

The file does not seem so bad now, does it? Let's go over how to create a robots.txt file if it is missing from your website's directory.

 

How to create a robots.txt file

There are three ways to create a robots.txt file:

  • Through FTP - When you have logged in through FTP, please navigate to your domain's root directory and create the robots.txt file there. If you are not certain which directory is the root folder of the domain, you can check the domain's document root in your hosting control panel.
  • Through the cPanel's File Manager - Once you are inside the File Manager, please go to the root directory of your domain. When you are inside, press the "+ File" button, located in the top left corner of the File Manager. This will open a popup window; in its first text field, type "robots.txt", then press the "Create New File" button at the bottom of the window.
  • Through SSH - When you connect through SSH, please navigate to the website's directory using the commands provided in our Linux Commands Basics. When you are in the correct folder, type the following command:
touch robots.txt 

Now that you have created the file, you can start editing it to allow or disallow web crawlers on your site.
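If you prefer to script the creation step instead of using FTP or the File Manager, a minimal Python sketch is shown below. The "public_html" path is an assumption here - substitute your domain's actual document root:

```python
from pathlib import Path

# Assumption: a cPanel-style document root named public_html;
# replace this with your domain's real root directory.
docroot = Path("public_html")
docroot.mkdir(exist_ok=True)

# Write a simple starter rule set to robots.txt.
rules = "User-Agent: *\nDisallow: /cache/\n"
(docroot / "robots.txt").write_text(rules)

# Read it back to confirm the file was created.
print((docroot / "robots.txt").read_text())
```

Because robots.txt is ordinary text, any method that writes a file in the document root works equally well.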

Below are a few examples which you can implement on your site:

Block every user-agent from visiting the error_logs and cache folders on your site

User-Agent: *
Disallow: /cache/
Disallow: /error_logs/

Allow only a specific web crawler to index your website and block all other user-agents

User-Agent: bingbot
Allow: /
User-Agent: *
Disallow: /

Prevent only a single bot from crawling your entire site

User-Agent: BadBotExample
Disallow: /

Block all user-agents from accessing the site:

User-Agent: *
Disallow: /

Allow every user-agent out there to visit and index your site:

User-Agent: *
Allow: /
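If you want to sanity-check directives like the ones above before uploading them, Python's standard-library urllib.robotparser can parse a rule set and report what a given user-agent may fetch. Here is a minimal sketch (the user-agent names and paths are only illustrative):

```python
from urllib.robotparser import RobotFileParser

# Rule set from the first example: block every user-agent
# from the /cache/ and /error_logs/ folders.
rules = """\
User-Agent: *
Disallow: /cache/
Disallow: /error_logs/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/cache/page.html"))  # False: /cache/ is disallowed
print(rp.can_fetch("Googlebot", "/index.html"))       # True: everything else is allowed

# Rule set from the second example: allow only bingbot, block the rest.
bing_only = """\
User-Agent: bingbot
Allow: /
User-Agent: *
Disallow: /
"""

rp2 = RobotFileParser()
rp2.parse(bing_only.splitlines())

print(rp2.can_fetch("bingbot", "/"))    # True: bingbot has its own Allow rule
print(rp2.can_fetch("Googlebot", "/"))  # False: falls back to the * entry
```

Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but it is not an access-control mechanism, so sensitive directories should also be protected server-side.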
 

 

