What Is a Web Crawler (Web Spider or Web Robot)?

Web Crawler: A web crawler, also known as a web spider or web robot, is a program or automated script that search engines use to collect details about websites and their pages on the World Wide Web (WWW). The process of collecting those details is known as web crawling. Search engines crawl automatically and repeat the process at regular intervals. Each search engine has its own spider and follows its own method to crawl and index websites.

Web Crawling: Web crawling is often loosely called web indexing, although strictly speaking crawling fetches the pages and indexing stores their content so that it can be searched. In this process the search engine sends a spider or robot to retrieve the details of the visible web pages of a website and stores that data on its servers, so that when someone searches for something, the most relevant pages can be shown on the Search Engine Result Page (SERP). The spider repeats the crawl regularly to stay up to date with each website and keep the results current.

How A Web Spider Works (Web Crawling Process)

Web Indexing Process / Working of a Web Robot: When a web robot visits a web page, it collects all the words on the page and stores them in the search engine's database. It then reads the page's metadata and stores that in the database as well. Next, it follows the links on the page. A link may point to another page within the same website or to a different website. After following a link, the spider collects the same information (metadata and words) from the linked page and follows the links found there, and the process continues in this way from page to page.
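
The visit, collect and follow loop described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: FAKE_WEB, LinkParser and crawl are hypothetical names, and the "web" is a small in-memory dictionary so the sketch runs without network access. A real crawler would fetch pages over HTTP and store what it collects.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML) so the sketch needs no network.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": "<p>No links here</p>",
}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; returns URLs in the order visited."""
    frontier = deque([seed])   # pages waiting to be visited
    seen = {seed}              # every URL ever queued, to avoid loops
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:       # page could not be fetched; skip it
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


order = crawl("http://a.example/", FAKE_WEB.get)
print(order)
```

The seen set is what stops the spider from going round in circles when two pages link to each other, and max_pages is an arbitrary safety limit chosen for this sketch.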

That means a web robot only sees the following things on a web page:

Words present on the web page
Metadata (title, subtitles and meta tags) of the web page
Links present on the web page
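
To make the three items above concrete, here is a rough sketch of pulling words, metadata and links out of one page with Python's standard html.parser module. PageParser and SAMPLE are made-up names for this example, and treating h1 to h6 headings as the "subtitles" is an assumption. Real spiders use far more robust parsing.

```python
import re
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the words, metadata and links a spider would see on a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP = {"script", "style"}   # not visible text, so not counted as words

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.meta = {}
        self.links = []
        self.words = []
        self._open = []          # stack of currently open tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            if attrs.get("name") and attrs.get("content") is not None:
                self.meta[attrs["name"].lower()] = attrs["content"]
            return               # <meta> has no closing tag, so do not push it
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Pop up to and including the matching open tag.
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        current = self._open[-1] if self._open else ""
        text = data.strip()
        if not text or current in self.SKIP:
            return
        if current == "title":
            self.title = text    # kept separately from the body words
            return
        if current in self.HEADINGS:
            self.headings.append(text)
        self.words.extend(re.findall(r"\w+", text.lower()))


# Hypothetical sample page used only for this illustration.
SAMPLE = """<html><head>
<title>Tea Guide</title>
<meta name="description" content="How to brew tea">
<style>p { color: red; }</style>
</head><body>
<h1>Brewing Tea</h1>
<p>Boil water, then steep. See <a href="/green">green tea</a>.</p>
</body></html>"""

page = PageParser()
page.feed(SAMPLE)
print(page.title, page.meta, page.headings, page.links, page.words)
```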

Note: Sometimes a website owner does not want some pages of their website to be indexed, which means they want to keep those pages away from the web spider. In that case they create a robots.txt file and list in it which web pages should not be crawled. Well-behaved spiders read this file and skip the listed pages.
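
As a small illustration, the sketch below shows what such a robots.txt file might contain and how a spider could check it using Python's standard urllib.robotparser module. The paths, the www.example.com address and the ExampleBot name are invented for the example. The rules are parsed from a string here so nothing is downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every robot is asked to stay out of
# /private/ and away from one specific draft page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/secret.html
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())   # parse from text instead of fetching


def allowed(url, agent="ExampleBot"):
    """Return True if the rules let this (made-up) robot crawl the URL."""
    return rules.can_fetch(agent, url)


print(allowed("https://www.example.com/index.html"))
print(allowed("https://www.example.com/private/accounts.html"))
print(allowed("https://www.example.com/drafts/secret.html"))
```

A spider that respects robots.txt would run a check like allowed() before fetching each page and skip any URL for which it returns False.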