Markov Chains: Modelling Systems That Have No Memory of the Past

Andrei Markov's Chain

Andrei Markov was motivated by a very specific mathematical question: could the law of large numbers (which requires independent random variables) be extended to sequences of dependent variables? He proved that it could — under the condition that the dependence in the sequence was "Markovian": each outcome depended on the immediately preceding one, not on the full history. This memoryless property, now called the Markov property, turned out to be exactly the right balance of dependence to make mathematical analysis tractable while still modelling a wide class of real phenomena.

Defining a Markov Chain

A discrete-time Markov chain is a sequence of random variables X₀, X₁, X₂, ... taking values in a finite or countable state space, satisfying the Markov property: the conditional distribution of X_{n+1} given all previous states depends only on X_n. This is the "memoryless" property — the future depends on the present, not the past.

Transition Matrices and Long-Run Behaviour

The long-run behaviour of an ergodic (irreducible and aperiodic) Markov chain is governed by its stationary distribution π — a probability distribution over states that satisfies π = πP. The stationary distribution tells you the long-run fraction of time the chain spends in each state, regardless of the starting state.

PageRank: The Billion-Dollar Markov Chain

Google's original PageRank algorithm — the innovation that made Google's search results dramatically better than competitors in 1998 — is a Markov chain. Imagine a "random surfer" who starts at a random web page and repeatedly follows a random link on the current page (or with probability 0.15, teleports to a completely random page). The stationary distribution of this Markov chain gives each page a score equal to the long-run fraction of time the random surfer would spend there. Pages that many important pages link to have high stationary probability — high PageRank. Sergey Brin and Larry Page computed this stationary distribution for the entire web.

Queuing Theory and Service Systems

The M/M/1 queue — the simplest queuing model — is a continuous-time Markov chain. Customers arrive according to a Poisson process (rate λ), service times are exponential (rate μ), and there is one server. The state is the number of customers in the system. The stationary distribution gives the probability of n customers in the queue:

MCMC: Turning Markov Chains Into a Computational Tool

Markov Chain Monte Carlo methods construct a Markov chain whose stationary distribution is a target distribution of interest — typically a Bayesian posterior that cannot be computed analytically. By running the chain for long enough, we collect samples from the target distribution, which can be used to compute expectations, credible intervals, and predictions. This is why Bayesian inference became computationally feasible in the 1990s: MCMC made it possible to work with complex posterior distributions that had no analytical form.