If you use Google (like most of us), you might wonder how it works.
Recently Rachid Guerraoui, a professor at the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland, wrote on a blog for the French newspaper Le Monde:
(base translation from the original French by Google Translate)
“What happens if we search for Michael Jackson on Google? We see links to pages about the singer: his life, his photos, videos, family, fan clubs, death, etc. All this seems very logical at first glance. But if you dig a little, it should intrigue you. After all, there are millions of pages with the string ‘Michael Jackson’ on the Web. Why does Google offer us nothing about Michael Jackson the carpenter in Dallas? Or Michael Jackson the singing teacher in San Francisco? If you were the carpenter, you might even be shocked to see no link to a page that talks about you, even though you have published dozens of them, including your name each time. Why such injustice?”
Professor Guerraoui goes on to discuss the “… PageRank algorithm, invented by Sergey Brin and Larry Page, the two founders of Google, inspired by the work of Jon Kleinberg of IBM.” It is this algorithm that creates what Guerraoui calls “injustice”, yet it does such a good job of finding the right information about the right Michael Jackson.
He adds that “… PageRank was originally the ranking of the results of the search engine Google. Today, more than two hundred other criteria are used to classify these results. The recipe is secret, which opens the door to all sorts of speculation on this ranking.”
You should know that, while the formula is secret, Google’s ultimate goal is very clear. In an interview with Slate magazine, Tamar Yehoshua, director of product management on Google’s search, put it this way: “Our vision is the Star Trek computer. You can talk to it: it understands you, and it can have a conversation with you.” Amit Singhal, head of Google’s search rankings team, confirmed this: “The destiny of [Google’s search engine] is to become that Star Trek computer, and that’s what we are building.”
Back to the present, a few things are known about Google’s original secret formula and its scale:
- it indexes pages by keywords
- it calculates a numeric score for each page relative to its content
- the score is calculated in part from the scores of the pages that link to it
- as of March 2013, Google had indexed 30 trillion web pages
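To make "indexes pages by keywords" concrete, here is a minimal sketch of a keyword index in Python. It is only an illustration of the general idea, not Google's actual method: the page names, page texts and the `build_index` and `search` helpers are all invented for this example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, invented for illustration only.
pages = {
    "singer": "michael jackson singer thriller",
    "carpenter": "michael jackson carpenter dallas",
    "teacher": "michael jackson singing teacher san francisco",
}
index = build_index(pages)
matches = search(index, "Michael Jackson")
```

In this toy example, a search for "Michael Jackson" matches all three pages equally. Keywords alone cannot decide which Michael Jackson comes first, which is why a score for each page is needed.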
The name “Google” is a play on “googol”, the number written as the digit 1 followed by 100 zeroes. The 30 trillion pages indexed so far (a 3 followed by 13 zeroes) are a tiny fraction of that, so they should fit comfortably inside the “Star Trek computer” that Google is building.
An article at venturebeat.com lists some other likely ingredients in the secret formula:
- the freshness of the results
- quality of the website
- age of the domain
- safety and appropriateness of the content
- user context like
  - location
  - prior searches
  - Google+ history and connections
Professor Guerraoui continues:
“Intuitively, it is as if each page has a number of votes represented by its score, and could share its votes among all the pages that it references.”
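The voting picture can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Google's code: the four pages and their links are invented, every page starts with a score of 1, and a page with no outgoing links simply gives no votes.

```python
def share_votes(links, scores):
    """One round of voting: each page splits its score evenly
    among the pages it links to."""
    new_scores = {page: 0.0 for page in scores}
    for page, targets in links.items():
        if not targets:
            continue  # simplification: a page with no links gives no votes
        share = scores[page] / len(targets)
        for target in targets:
            new_scores[target] += share
    return new_scores

# Hypothetical link structure: page -> pages it links to.
links = {
    "fan_club": ["singer"],
    "news": ["singer", "carpenter"],
    "singer": ["news"],
    "carpenter": [],
}
scores = {page: 1.0 for page in links}
after = share_votes(links, scores)
```

After one round, the hypothetical "singer" page holds 1.5 votes (a full vote from the fan club plus half of the news page's vote), while the "carpenter" page holds only 0.5. Pages that attract more links, from better-scored pages, rise to the top.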
He concludes:
“Lots of details are hidden from you. It’s scary to think that (in order to reproduce Google’s results) you (would need to) calculate the ‘fixed point of a matrix equation with a matrix with billions of rows and columns.’ … Is this a big calculation? No! It’s really a very, very … very enormous calculation. And it takes a lot of computers to achieve it. And this is only one of the functions of a search engine …”
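On a tiny scale, the "fixed point" he mentions can be found by repeating the score calculation until the scores stop changing. The Python sketch below does this for four invented pages instead of billions. It follows the commonly described form of PageRank rather than Google's secret recipe, and the `damping` value of 0.85, the tolerance and the link structure are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Repeat the score update until the scores stop changing (a fixed point)."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumption: pages with no outgoing links spread their score evenly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += damping * rank[p] / len(links[p])
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break  # scores no longer move: we are at the fixed point
    return rank

# Hypothetical link structure: page -> pages it links to.
links = {
    "fan_club": ["singer"],
    "news": ["singer", "carpenter"],
    "singer": ["news"],
    "carpenter": [],
}
ranks = pagerank(links)
```

With four pages this finishes instantly. The same loop over billions of pages is the "very, very … very enormous calculation" that takes a lot of computers.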
It’s difficult to visualize the magnitude of just one component of the more than two hundred parts of Google’s algorithm. You don’t need to. Just call, email or search for us on Google. Knowing how to work with Google and other search engines is part of what we do.