Google Algorithm

Ricard Santiago Raigada García
3 min readOct 20, 2023

Extract

I invite you to Google the word maths. How many results do you get? About 260 million results from a single word. So how can Google suggest a display order, or is it just random?

Article

I invite you to Google the word maths. How many results do you get? About 260 million results from a single word. So how can Google suggest a display order, or is it just random? I guess you will not think that it is a random result and that is why when you do a search in the first 10 entries you find the answer. Actually, PageRank is the algorithm that Google created to order and display those websites that are highly relevant to the search that the client makes in the browser. It is easy to understand, a numerical value is given to each website and from this the search is ordered.

For data science, it is important to know how to model mathematically because it allows us to understand and predict phenomena. It is evident that having data in real time or deferred and applying suitable modelling can become a great source of value.

So, what does Google’s PageRank algorithm have to do with maths and data science in particular? I will start by making a brief intuitive introduction.

The first thing you should know are dynamic systems. These systems are known as stationary systems, since they are based on states. It is then that from the initial states the future evolution of the system can be calculated.

Delimiting dynamical systems a bit, for linear algebra there are discrete-time linear dynamical systems with a finite set of state variables, therefore, linear matrix models. That is, the system is described in these models by a vector that evolves over discrete time (each of the states of the system: t=0,1, 2…n) multiplied by a fixed matrix.

This type of system that I have intuitively described is known as a homogeneous Markov chain. It is very common to use Markov chains to represent the transition probabilities between states. This is known as a stochastic matrix. Precisely, this is the theory in which Google makes use of it in a simplified way.

The stochastic process that arises from Markov chains only depends on the present and not on past states, which is why this system is said to have no memory. Since, exclusively, it depends on the current moment and not on the concatenation of past events or states. For example, when a soccer player is about to shoot a penalty. There are different future states, but the past states (where he scored, scored or failed…) do not condition the future states that the system can take.

When this happens, the transition probability is said to be independent of time. In other contexts, it is known as an autonomous dynamic system. Therefore, with discrete time t ≥ 0 where the matrix P of the system is described by column or by row, a linear matrix model is obtained. The matrix P can be graphed using a diagram of states, nodes, and transition probabilities between states, the union of nodes. The evolution of t is obtained by iterating by multiplying the system matrix by the vector (column) of the states.

The matrices used are known as positive matrices. They have special characteristics such as the positive dominant eigenvalue and the ability to predict future states.

--

--

Ricard Santiago Raigada García

Data Scientist & AWS Architect. Skilled in data mining, ML, and cloud solutions. Loves teamwork and innovative challenges. Open to global opportunities.