The Science of the Google Crawl

When Google finds, rates, and ranks websites, three distinct parts of Google work together to produce the end result: Googlebot, the Google indexer, and the Google query processor. Larry, Moe and Curly?

Googlebot is the crawling robot (or spider) that finds your webpage, except it doesn't actually crawl the web like a big robotic spider. Googlebot is more like your web browser; it's a program running on a computer. Much like a browser, Googlebot sends a request to your webserver and then downloads your page. The difference is that Googlebot is far faster than your browser and can request thousands of webpages at a time.
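To make that concrete, here's a minimal sketch in Python of a crawler fetching pages the way a browser does, just concurrently. The bot name and worker count are made up for illustration; real crawlers also obey robots.txt, throttle per host, and do much more.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

HEADERS = {"User-Agent": "example-bot/1.0"}  # hypothetical bot name

def fetch(url, timeout=10):
    """Download one page, exactly as a browser would: request, then read the body."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def fetch_many(urls, workers=32):
    """A crawler's speed comes from fetching many pages at once, not from magic."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

The only real difference from your browser is the thread pool: the same request/download cycle, multiplied.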


When Googlebot downloads a page, it scans the page for links and adds each one to a queue of additional pages to visit. Once Googlebot has culled all the links from your page, it hands the page off to the Google indexer.
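That scan-and-queue step can be sketched with Python's standard HTML parser. This is a simplified illustration: it grabs every `<a href>` and queues each URL it hasn't seen before, skipping things a real crawler handles (relative URL resolution, nofollow, deduplication by canonical URL, and so on).

```python
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links

# Newly found links join a queue ("frontier") of pages still to visit.
frontier = deque()
seen = set()

def enqueue(links):
    for url in links:
        if url not in seen:
            seen.add(url)
            frontier.append(url)
```

Each page the crawler downloads feeds more pages into the queue, which is how following links from page to page adds up to "crawling" the web.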

The Google indexer stores the text from your page in Google's index database. The indexer strips all the "stop" words from your text, like "is," "and," "how," "the," and so on. The index is sorted alphabetically by search term, with each entry containing a list of the documents that contain that term.
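In other words, an inverted index. Here's a toy version in Python, assuming a handful of stop words; a real index also stores word positions, frequencies, and far more metadata than this sketch shows.

```python
# A tiny, assumed stop-word list; real lists are longer and tuned per language.
STOP_WORDS = {"is", "and", "how", "the", "a", "of"}

def build_index(docs):
    """Map each non-stop-word term to a sorted list of the doc ids containing it.

    `docs` is {doc_id: text}. This is the "inverted" part: instead of
    doc -> words, the index stores word -> docs, which is what a search
    engine needs to answer queries quickly.
    """
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            if term in STOP_WORDS:
                continue
            index.setdefault(term, []).append(doc_id)
    for postings in index.values():
        postings.sort()
    return index
```

Looking up a term is then a single dictionary access rather than a scan of every page.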

So, for example, the index entry for "weight loss" contains a list of millions of pages whose text includes the phrase "weight loss," whereas the entry for "weight loss tea" contains far fewer pages, because not all weight-loss pages are about tea. See?

The Google query processor (QP) has three distinct parts of its own. First, there's the search box on Google's main page, where you type in a keyword and submit your search to the query processor. Then there's an engine that matches the query against the data in Google's index, and finally an engine that formats the results in the right order.

So, when Jane Doe goes to Google and searches for weight loss, the query is sent off to the query engine to pull up all the pages containing those words. Then the results are examined to find the best matches. Google's engine weighs over 100 factors to decide how to arrange all those results.
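The matching step can be sketched as intersecting the index entries for each query term. The tiny index below is assumed for illustration, but the behavior it demonstrates is real: every extra term can only shrink the result set, which is exactly why "weight loss tea" matches fewer pages than "weight loss."

```python
STOP_WORDS = {"is", "and", "how", "the"}

# A toy slice of an inverted index: term -> set of doc ids (assumed data).
INDEX = {
    "weight": {1, 2, 3, 4},
    "loss":   {1, 2, 3},
    "tea":    {3},
}

def lookup(query, index=INDEX):
    """Return the doc ids containing every query term (stop words dropped)."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

These matching pages are the candidates; ordering them is the ranking engine's job, described next.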

It looks at things like PageRank, the number of incoming links, and whether the phrase appears only in the body copy or also in the page title and URL. It looks at stored data on user interaction, too. Do most people who click this link hit the back button and come straight back? Is this a page that is current and updated often? Based on all these factors, Google determines which pages it feels are the best match for the searcher's query.
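Conceptually, ranking boils down to combining many per-page signals into one score and sorting by it. The signals and weights below are entirely hypothetical — Google's real factors and their weighting are not public — but the shape of the computation is the point.

```python
# Hypothetical weights for a few of the factors mentioned above.
# Google's actual signals and weighting are not public.
WEIGHTS = {
    "pagerank": 2.0,
    "incoming_links": 1.0,
    "phrase_in_title": 1.5,
    "phrase_in_url": 0.5,
    "bounce_rate": -1.5,  # clicking through and bouncing back counts against a page
    "freshness": 0.8,
}

def score(page):
    """Combine a page's signals (each assumed normalised to 0..1) into one score."""
    return sum(WEIGHTS[k] * page.get(k, 0.0) for k in WEIGHTS)

def rank(pages):
    """Order the candidate result pages best-first."""
    return sorted(pages, key=score, reverse=True)
```

A real ranking pipeline is vastly more complex, but the idea — many factors in, one ordering out — is the same.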
