Robots.txt Exclusion Protocol Or REP
How To Hide Sensitive Information From The Search Engine
In some cases there are areas of a web site that we do not want displayed in search engine results. Customer information or sensitive company data should never be allowed to get indexed by search engines. But given that search engine spiders are meant to crawl as much as they can, how do you keep this data away from them? Here’s how!
The Robots Exclusion Protocol Or REP
This is a protocol commonly used to keep search engine robots away from private or confidential areas of a web site. It takes the form of a plain text file named robots.txt, placed in the root directory of the site, that instructs robots not to access the contents of a specific file or directory. Let’s say you don’t want the contents of the directory “custom_info” to be indexed. You can arrange this simply by editing the robots.txt file. You will write:
User-Agent: *
Disallow: /custom_info/
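This tells search engine robots to stay out of the entire directory. Well-behaved robots request robots.txt before anything else on a site, so the rule takes effect before they crawl any page. Note that compliance is voluntary and the file itself is publicly readable, so truly confidential data should also be protected by other means, such as a login.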
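To see how a robot might interpret the two lines above, here is a minimal sketch using Python's standard urllib.robotparser module. The robot name "ExampleBot" and the www.example.com URLs are made up for illustration, and real search engines may apply their own matching rules.

```python
from urllib.robotparser import RobotFileParser

# The same rules shown above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /custom_info/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules let the given robot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robot name and URLs, used only for this demonstration.
    for url in (
        "https://www.example.com/custom_info/clients.html",
        "https://www.example.com/index.html",
    ):
        print(url, "->", is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

The first URL sits inside the disallowed directory and is reported as not allowed, while the second is outside it and remains open to the robot.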
Why The Robots Exclusion Protocol Is Useful
Besides helping you keep confidential information out of search results, the Robots Exclusion Protocol stops search engine robots from wasting time on pages that do not need indexing. If your site has large sections that are irrelevant or redundant, listing them in robots.txt lets robots skip them and spend their time on the content that matters.
The Culprit
Webmasters who have written robots.txt incorrectly may find that their pages do not get indexed. (A missing robots.txt is not the problem here: without the file, robots simply treat the whole site as open to crawling.) They try to figure out the reason, but it often doesn’t occur to them that the problem is wrong syntax in this simple file, for example a stray “Disallow: /” that blocks the entire site. A site can exist for several months or more and still not be indexed by search engine robots.
If you have been looking only at the HTML code hoping to find the reason the site is not getting indexed, chances are you won’t figure it out!
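A quick way to rule out the robots.txt file as the culprit is to scan it for the most common mistakes. The sketch below is a hypothetical helper, not an official validator: the list of known fields and the checks it performs are assumptions chosen for illustration, and search engines may accept or reject more than this.

```python
# Fields commonly seen in robots.txt files (an assumed, non-exhaustive list).
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings about likely robots.txt mistakes."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            seen_user_agent = True
        elif field in {"disallow", "allow"}:
            if not seen_user_agent:
                problems.append(f"line {number}: rule before any User-agent line")
            if field == "disallow" and value == "/":
                problems.append(f"line {number}: 'Disallow: /' blocks the whole site")
    return problems


if __name__ == "__main__":
    broken = "User agent *\nUser-agent: *\nDisalow: /tmp/\nDisallow: /\n"
    for warning in lint_robots_txt(broken):
        print(warning)
```

Run against the deliberately broken sample, the helper flags the missing colon, the misspelled directive, and the rule that shuts robots out of the whole site.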
Specific Terms In The File
The robots.txt file can be a boon or a curse, depending on how well you use it. If the directives in the file are written correctly, robots will stay out of the parts of the web site you choose. Be aware, though, that the ‘disallow’ line really means ‘disallow reading’; it does not by itself prevent indexing. A search engine can still list a disallowed URL, for instance when other sites link to it, without ever reading the content of the page. Thus robots may continue to add these pages to their index without ever really visiting them.
How To Prevent Indexing
A ‘disallow’ line in your robots.txt file stops search engine robots from reading the page, but the URL itself can still get indexed. If you want to keep a page out of the index entirely, the page needs to carry a noindex instruction (a robots meta tag in the HTML, or an X-Robots-Tag HTTP header), and robots must be able to read the page in order to see that instruction, so you need to ensure the page is not covered by a disallow line. A page that has already been indexed will be dropped once robots read it again and find the noindex instruction. Note also that robots.txt does nothing to stop other sites from linking to pages on your site. Thus if you use the robots.txt file properly, together with noindex where needed, you can keep a lot of confidential and proprietary information on your web site from being displayed in the search engines.
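Because ‘disallow’ only stops reading, a common way to keep a page out of the index is a robots meta tag with the value noindex, which a robot can only see if it is allowed to read the page. As a rough sketch, the following Python snippet checks whether a page carries that tag. The class and function names are made up for this example, and it only inspects the HTML, not the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```

A page that passes this check and is not blocked in robots.txt gives robots both the permission to read it and the instruction not to list it.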
Fabio Uncinotti is from Italy, where he graduated as a webmaster. He is an Internet Marketer and SEO Consultant who owns several websites that are on the first pages of Google, and that is what he is most passionate about. He regularly posts about free SEO advice, Internet Marketing and Social Media Optimization.