A search engine is “the algorithm/program that searches databases and internet sites for the documents containing keywords specified by a user”. To find content that matches the keywords, it relies on classic searching and sorting algorithms, such as linear and binary search for locating entries and merge sort for putting data in order.
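
As a small illustration of the three algorithms named above, the sketch below sorts a hypothetical list of keywords with merge sort and then looks one up with binary search, with linear search shown for comparison. It is a teaching example under stated assumptions: the keyword list is made up, and a real search engine looks terms up in precomputed index structures rather than in a plain Python list.

```python
from typing import List


def merge_sort(items: List[str]) -> List[str]:
    """Return a new list with the items in ascending order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge the two sorted halves by repeatedly taking the smaller front item.
    merged: List[str] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def linear_search(items: List[str], target: str) -> int:
    """Check every item in turn (O(n)); return its index or -1."""
    for position, item in enumerate(items):
        if item == target:
            return position
    return -1


def binary_search(sorted_items: List[str], target: str) -> int:
    """Halve the search range each step (O(log n)); input must be sorted."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


# Hypothetical keywords collected from some pages (unsorted).
keywords = ["search", "engine", "crawl", "index", "rank", "query"]
sorted_keywords = merge_sort(keywords)
print(sorted_keywords)  # ['crawl', 'engine', 'index', 'query', 'rank', 'search']
print(linear_search(keywords, "index"))  # 3
print(binary_search(sorted_keywords, "index"))  # 2
```

Binary search needs far fewer comparisons than linear search on large lists, but only because the data was sorted first. Paying that cost once, ahead of any lookup, mirrors the idea behind preparing an index before queries arrive.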

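A search engine works in three basic stages: crawling, where content is discovered; indexing, where it is analyzed and stored in huge databases; and retrieval, where a user query fetches a list of relevant pages. In the steps below, retrieval is broken down further into processing the query, calculating relevancy, and retrieving the results.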
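
The three stages can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions: a hypothetical in-memory dictionary (`WEB`) stands in for the internet, whereas a real crawler fetches pages over HTTP; the index is a simple inverted index (word to set of pages); and retrieval just returns the pages containing every query word, with no ranking.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical "web": URL -> (page text, outgoing links).
WEB: Dict[str, Tuple[str, List[str]]] = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("an index maps words to pages", ["c.example"]),
    "c.example": ("ranking orders the pages for a query", ["a.example"]),
    "d.example": ("this page has no inbound links", []),
}


def crawl(seed: str) -> Dict[str, str]:
    """Stage 1: discover pages by following links breadth-first from a seed URL."""
    seen: Set[str] = {seed}
    queue = deque([seed])
    pages: Dict[str, str] = {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Stage 2: map each lowercase word to the set of pages that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def retrieve(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Stage 3: return pages containing every query word (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


pages = crawl("a.example")
index = build_index(pages)
print(sorted(pages))                    # ['a.example', 'b.example', 'c.example']
print(retrieve(index, "pages"))         # ['b.example', 'c.example']
print(retrieve(index, "index pages"))   # ['b.example']
```

Note that `d.example` is never crawled because no page links to it: a crawler can only index what it can discover.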

Let’s take a brief look at each step:

  1. Crawling: Crawling is the acquisition of data about a website. This involves scanning the site and getting a complete list of everything on it, at a bare minimum the page title, images, keywords it contains, and any other pages it links to. Many crawlers also cache a copy of the whole page and look for additional information such as the page layout, where the advertising units are, and where the links are on the page (featured prominently in the article text, or hidden in the footer?).
  2. Indexing: The process of building an index for all the fetched web pages and storing it in a giant database from which it can later be retrieved. Essentially, indexing identifies the words and expressions that best describe a page and assigns the page to those keywords.
  3. Processing: When a search request arrives, the search engine processes it, i.e., it compares the search string in the request with the indexed pages in the database.
  4. Calculating Relevancy: More than one page is likely to contain the search string, so the search engine calculates the relevancy of each matching page in its index to the search string.
  5. Retrieving Results: The last step is retrieving the best-matched results. Displaying them in the browser is the simple part; before that, the ranking algorithm has to decide which page to rank first, second, third, and so on.
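
The crawling step's "complete list of everything on there" can be sketched with Python's standard `html.parser` module. This is a minimal sketch, not how any particular search engine works: the `PageScanner` class and the sample page are hypothetical, the page is passed in as a string rather than fetched over HTTP, and links are tagged only as `"body"` or `"footer"` to echo the question of where links sit on the page.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class PageScanner(HTMLParser):
    """Hypothetical scanner: collect the title, images, links, and words of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.images: List[str] = []
        self.links: List[Tuple[str, str]] = []  # (href, "body" or "footer")
        self.words: List[str] = []
        self._in_title = False
        self._footer_depth = 0
        self._skip_depth = 0  # inside <script>/<style>, whose text is not page content

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "footer":
            self._footer_depth += 1
        elif tag == "a" and attributes.get("href"):
            where = "footer" if self._footer_depth else "body"
            self.links.append((attributes["href"], where))
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "footer" and self._footer_depth:
            self._footer_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title = data.strip()
        self.words.extend(data.lower().split())


html_page = """
<html><head><title>How Search Works</title><style>p {color: red}</style></head>
<body>
  <p>Crawlers follow <a href="/indexing">links</a> to find pages.</p>
  <img src="/diagram.png" alt="diagram">
  <footer><a href="/privacy">Privacy</a></footer>
</body></html>
"""

scanner = PageScanner()
scanner.feed(html_page)
print(scanner.title)   # How Search Works
print(scanner.images)  # ['/diagram.png']
print(scanner.links)   # [('/indexing', 'body'), ('/privacy', 'footer')]
```

A production crawler would add HTTP fetching, URL normalization, politeness rules such as robots.txt, and far more robust parsing; the sketch only shows the extraction idea.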
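
Steps 3 to 5 (processing, calculating relevancy, and retrieving results) can be sketched with TF-IDF, a classic textbook relevance measure. This is an assumption for illustration: the text does not say which relevance formula search engines use, and real engines combine many more signals. The pages are hypothetical, processing is reduced to lowercasing and splitting on whitespace, and the IDF uses the smoothed variant log(1 + N / df) so that a term found on every page still contributes a positive weight.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical indexed pages (URL -> text).
PAGES: Dict[str, str] = {
    "a.example": "search engines crawl and index the web",
    "b.example": "an index maps each word to the pages that contain it",
    "c.example": "search ranking orders pages by relevance to the search query",
}


def tokenize(text: str) -> List[str]:
    """Processing: normalize text into lowercase word tokens."""
    return text.lower().split()


def tf_idf_scores(pages: Dict[str, str], query: str) -> Dict[str, float]:
    """Calculating relevancy: sum the TF-IDF weights of the query terms per page."""
    term_counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
    query_terms = tokenize(query)
    n_pages = len(pages)
    # Document frequency: how many pages contain each query term.
    doc_freq = {t: sum(1 for c in term_counts.values() if t in c) for t in query_terms}
    scores: Dict[str, float] = {}
    for url, counts in term_counts.items():
        total = sum(counts.values())
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:  # Counter returns 0 for missing terms
                continue
            tf = counts[term] / total
            idf = math.log(1 + n_pages / doc_freq[term])
            score += tf * idf
        if score > 0:  # keep only pages that match at least one term
            scores[url] = score
    return scores


def rank(pages: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Retrieving results: order matching pages from most to least relevant."""
    scores = tf_idf_scores(pages, query)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print([url for url, _ in rank(PAGES, "search index")])  # ['a.example', 'c.example', 'b.example']
print([url for url, _ in rank(PAGES, "search")])        # ['c.example', 'a.example']
```

`c.example` wins the single-word query because "search" makes up a larger share of its words, while `a.example` wins the two-word query because it contains both terms in a short page.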