What is robots.txt & its usage in blocking files

Ramya Panneerselvam
2 min read · May 9, 2018


Most often, site owners want to keep web crawlers away from certain files and folders, such as images, style sheets, and JavaScript, because crawling them consumes extra bandwidth. So, if you want to save bandwidth by excluding some links from indexing, you must also tell search engines to stay away from those files and folders. Web pages need to be indexed by a search engine before they can rank well, and during the indexing process we can use robots.txt to keep crawlers away from pages we do not want visited.

We could use the robots meta tag instead of robots.txt, but that is not always reliable, since search engines may not read every meta tag.
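For reference, a robots meta tag goes in the head of an individual page. The snippet below is a minimal illustrative example:

```
<!-- Ask crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

Unlike robots.txt, which applies site-wide rules from one file, this tag must be added to each page you want excluded.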

What is robots.txt?

Robots.txt is a file we put on our site to inform search robots which pages we would like them not to visit. Well-behaved search engines read robots.txt to learn what they are not supposed to crawl, but the file is only advisory: it is not a way to prevent search engines from reaching pages, just a gentle request not to visit them. So it is always better to keep genuinely sensitive information out of the search index altogether, so it is never displayed on a search results page.

Location of robots.txt:

Robots.txt should always be located in the root directory of the site (for example, https://www.example.com/robots.txt), so that a search engine finds it first, notes which paths should not be indexed, and then moves on to the rest of the site. When robots.txt is not in the right location, it is no surprise that the search engine indexes the entire site.

Syntax of robots.txt:

User-agent:

Disallow:

The User-agent line names the crawler a rule applies to, and the Disallow line lists the files and folders to be excluded from crawling.

Eg:

User-agent: *

Disallow: /temp/

This means all user agents are disallowed from visiting the /temp/ directory.
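A slightly fuller robots.txt can combine several rules. The paths and crawler name below are hypothetical:

```
# Block all crawlers from the temp and private folders
User-agent: *
Disallow: /temp/
Disallow: /private/

# An empty Disallow means this crawler may visit everything
User-agent: Examplebot
Disallow:
```

Rules are grouped by user agent, and a blank Disallow value means nothing is blocked for that crawler.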

A robotted page can still be indexed if it is linked to from other sites. We often use anchor text to create internal and external links, and in that case the search engine may index the URL even though it is blocked by robots.txt. So to prevent the URL from appearing on the results page, protect the file with a password or use a noindex meta tag.
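Whether a given URL is blocked by a set of rules can also be checked programmatically. Here is a minimal sketch using Python's standard urllib.robotparser module; the example.com URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly (no network request needed)
rules = """
User-agent: *
Disallow: /temp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Ask whether the wildcard user agent may fetch each URL
print(rp.can_fetch("*", "https://example.com/temp/file.html"))  # False: under /temp/
print(rp.can_fetch("*", "https://example.com/index.html"))      # True: not blocked
```

Note that can_fetch only reports what the rules say; as explained above, a blocked URL can still end up indexed if other sites link to it.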

Hence, robots.txt is used to disallow crawling of parts of a web site, which reduces crawler traffic on frequently accessed pages and avoids repeated fetching of the same URLs. This article has given a brief overview of robots.txt and its uses.
