We all want exactly what we ask for, and search engines should return exactly what we are looking for. In this post, I will briefly explain how search engines work. The first step is crawling.
Crawlers, or spiders, are automated scripts/bots that visit a page and gather information about it: what the page is about, its titles, images, videos, and text. In short, a crawler determines what a web page is about and who might be interested in it.
Crawlers also find information in sitemaps. A sitemap is a file containing information about a site. It tells crawlers which pages of the site are important, when the site was last updated, how often a page changes, which other languages the site is available in, and so on.
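For illustration, a minimal sitemap in the standard sitemaps.org XML format might look like this (the URL is hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2020-01-15</lastmod>      <!-- when this page was last updated -->
    <changefreq>weekly</changefreq>    <!-- how often it changes -->
    <priority>0.8</priority>           <!-- relative importance within the site -->
  </url>
</urlset>
```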
After gathering the required information, the crawler follows the links on the current page to other pages. In this way, crawlers visit web pages in a chain, discovering new pages as they go.
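As a minimal sketch of what a crawler does with a fetched page, the toy Python parser below extracts the title and the outgoing links from some HTML. The page content here is a made-up stand-in for a document a real crawler would download:

```python
from html.parser import HTMLParser

class CrawlerParser(HTMLParser):
    """Collects the page title and outgoing links, as a crawler would."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # Record every hyperlink so the crawler can follow it next.
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# A tiny hypothetical page standing in for a fetched document.
page = """<html><head><title>Pizza Place</title></head>
<body><a href="/menu">Menu</a> <a href="/contact">Contact</a></body></html>"""

parser = CrawlerParser()
parser.feed(page)
print(parser.title)   # Pizza Place
print(parser.links)   # ['/menu', '/contact']
```

A real crawler would now fetch `/menu` and `/contact` and repeat the process, which is exactly the chain described above.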
In Google Search Console (formerly Google Webmaster Tools), we can configure what information a crawler may take from our site, request a recrawl, and more.
The information gathered by crawlers is stored in a search index. This repository of web pages is called the index, and multiple indices are used to store all the data. When a user enters a search query, pages are sorted and displayed using various search algorithms. These algorithms are proprietary and confidential. The result of a search query is a SERP (Search Engine Results Page).
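A simplified sketch of indexing: the snippet below builds a tiny inverted index (term → pages containing it) over a made-up three-page corpus. Real search indices are far more elaborate, but the core lookup idea is the same:

```python
from collections import defaultdict

# Toy corpus: page URL -> page text (hypothetical sites).
pages = {
    "a.com": "best pizza recipes",
    "b.com": "pizza restaurant reviews",
    "c.com": "weather forecast",
}

# Inverted index: term -> set of pages that contain it.
index = defaultdict(set)
for url, text in pages.items():
    for term in text.lower().split():
        index[term].add(url)

# Answering a query is now a fast lookup instead of scanning every page.
print(sorted(index["pizza"]))  # ['a.com', 'b.com']
```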
The algorithms are highly confidential, and that is part of what makes each search engine unique. Still, I will discuss some of the algorithms used by Google here.
Google initially used the PageRank algorithm. PageRank sorts pages from the index and gives each one a score; the highest-scoring pages are displayed at the top. The ranking depends on many factors, such as how frequently the keyword occurs on the site, how long the site has existed, and how many other sites link to it.
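The "how many other sites link to it" idea can be made concrete. Below is a minimal, illustrative implementation of the PageRank iteration (not Google's actual code) run on a hypothetical three-page link graph, where two pages link to page B:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank scores for a link graph,
    given as a dict: page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Each page keeps a small base score...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # ...and passes the rest of its score along its outgoing links.
        for page, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web: both A and C link to B.
graph = {"A": ["B"], "B": ["C"], "C": ["B"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # B
```

Page B ends up with the highest score because it receives links from two pages, which is precisely the intuition behind "more inbound links, higher rank".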
Other algorithms used by Google include:
Google Panda assigns a ‘quality score’ to each web page, which is used as a ranking factor.
The Google Penguin algorithm down-ranks (gives a lower score to) sites that link to spam or use over-manipulative link-building.
The Google Hummingbird algorithm helps Google understand the intent behind a query, so even if the query doesn’t contain the exact search terms, Google still understands it.
For example, when I typed ‘I want pizza’, there were no keywords like ‘restaurant’ or ‘eat’, but Google still showed me the nearest pizza restaurants.
For location-based search, Google uses the Pigeon algorithm, which ties the local ranking algorithm closely to the core algorithm.
For example, if my query is ‘weather’, it should display the current weather for my location, not for somewhere else.
The Google Fred algorithm filters out sites that violate Google’s webmaster guidelines: low-quality, low-content sites that are not actually useful.
Google uses many such algorithms, working in combination, to give us the best search results.
Have you ever wondered where Google’s income comes from? One source is advertising. Google provides two advertising services — Google Ads and Google AdSense.
Google Ads (formerly known as Google AdWords) places ads on the Google Search Engine Results Page (SERP) and is used by advertisers. For example, if I sell water bottles, I want my brand to appear on the SERP, so I bid for ad placement with Google.
Google Ads works on the concept of PPC (Pay Per Click): when a user clicks an ad, the advertiser pays Google.
Site owners use AdSense. They can allow ads to be placed on their sites, and the position, size, and font of the ad can be adjusted by the owner. These are exactly the ads found on blogs; on space.com, for instance, a Soch ad appears.
When a user clicks the ad, part of the revenue goes to the site owner and the rest goes to Google.
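As a rough illustration of that split, the sketch below assumes a hypothetical 68% publisher share; the actual rates vary by product and are set by Google:

```python
def adsense_split(cost_per_click, clicks, publisher_share=0.68):
    """Split PPC ad revenue between the site owner and Google.
    The 68% publisher share is an assumption for illustration only."""
    total = cost_per_click * clicks
    publisher = total * publisher_share
    return publisher, total - publisher

# Hypothetical campaign: 200 clicks at $0.50 per click.
owner, google = adsense_split(0.50, 200)
print(owner, google)
```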