The Gambler’s Ruin Problem
A unique application of Markov chains
Introduction
Conditional probability theory often leads to unique and interesting solutions to certain mathematical problems. There is a mix of excitement and disbelief when a complex probabilistic problem can be solved in a relatively simple manner. This is certainly the case with the Gambler’s Ruin Problem, a famous statistical scenario centered on conditional probabilities and experimental outcomes. What is possibly even more fascinating about this problem is that its structure extends beyond conditional probability into random variables and distributions, particularly as an application of Markov chains with interesting properties.
Elementary probability problems are based on mathematical descriptions of uncertain outcomes. Elementary statistics gives us a set of tools for determining the probabilities of uncertain outcomes and for demystifying complex scenarios. Solving the Gambler’s Ruin Problem is not a task that we must do on a daily basis, but understanding its structure helps us develop a critical and mathematical mindset when approaching situations rooted in uncertainty.
Note: This post assumes that the reader understands elementary probability and statistical theory, which itself requires exposure to both calculus and linear algebra. It is also assumed that the reader has some basic knowledge of Markov chains.
Problem Statement
The Gambler’s Ruin Problem in its most basic form consists of two gamblers, A and B, who play a probabilistic game multiple times against each other. Every time the game is played, there is a probability p (0 < p < 1) that gambler A will win against gambler B. By the basic probability axioms, the probability that gambler B wins is therefore 1 − p. Each gambler also has an initial wealth that limits how much they can bet. The total combined wealth is denoted by k, and gambler A has an initial wealth denoted by i, which implies that gambler B has an initial wealth of k − i. Wealth is required to be positive. The last condition we apply to this problem is that the gamblers keep playing until one of them has lost all of their initial wealth and thus cannot play anymore.
Imagine that gambler A’s initial wealth i is an integer dollar amount and that each game is played for one dollar. That means that gambler A will have to play at least i games for their wealth to drop to zero. The probability that they win one dollar in each game is p, which will be equal to 1/2 if the game is fair for both gamblers. If p > 1/2, then gambler A has a systematic advantage and if p < 1/2 then gambler A has a systematic disadvantage. The series of games can only end in two outcomes: gambler A has a wealth of k dollars (gambler B lost all their money), or gambler A has a wealth of 0 dollars (gambler B has all the wealth). The main focus of the analysis is to determine the probability that gambler A will end up with a wealth of k dollars instead of 0 dollars. Regardless of the outcome, one of the gamblers will end up in financial ruin, hence the name Gambler’s Ruin.
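Before deriving the answer analytically, we can estimate it by simulation. Here is a minimal Python sketch (the function name and the example values i = 3, k = 10 are our own choices for illustration, not part of the original analysis):

```python
import random

def play_until_ruin(i, k, p):
    """Simulate one-dollar games until gambler A's wealth reaches 0 or k.

    Returns True if gambler A ends up with all k dollars.
    """
    wealth = i
    while 0 < wealth < k:
        # Gambler A wins one dollar with probability p, loses one otherwise.
        wealth += 1 if random.random() < p else -1
    return wealth == k

random.seed(42)
trials = 100_000
wins = sum(play_until_ruin(i=3, k=10, p=0.5) for _ in range(trials))
print(wins / trials)  # roughly 0.30; the solution below derives this value exactly
```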
Problem Solution
Continuing with the structure outlined above, we now want to determine the probability aᵢ that gambler A will end up with k dollars given that they started out with i dollars. For the sake of simplicity, we will add an additional assumption here: all games are identical and independent. Every time the gamblers play a new game, it can be interpreted as a fresh instance of the Gambler’s Ruin problem in which the initial wealth of each gambler depends on the outcome of the most recent game. Mathematically, we can condition on gambler A’s current wealth j, where j = 0, …, k: the probability that gambler A eventually wins given a current wealth of j dollars is aⱼ. Any sequence of games that ends with gambler A having k dollars means that they won, so aₖ = 1. Likewise, any sequence that ends with gambler A having 0 dollars means that they are in ruin, so a₀ = 0. We are interested in determining the probabilities aᵢ for all values of i = 1, …, k − 1.
Using event notation, we can denote A₁ as the event that gambler A wins game 1. Similarly, B₁ is the event that gambler B wins game 1. The event W occurs when gambler A ends up with k dollars before they end up with 0 dollars. The probability that this event occurs can be derived using properties of conditional probability, by conditioning on the outcome of the first game:
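P(W) = P(W | A₁)·P(A₁) + P(W | B₁)·P(B₁)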
Given that gambler A starts with i dollars, the probability that they win is P(W) = aᵢ. If gambler A wins one dollar in the first game, their wealth becomes i + 1, so the probability that they go on to win the entire sequence is P(W | A₁) = aᵢ₊₁. If they lose one dollar in the first game, their wealth becomes i − 1 and P(W | B₁) = aᵢ₋₁. In other words, the probability of winning the entire sequence depends on whether they won the first game. Applying this logic to the previous equation, we now have an expression for the probability of winning the entire sequence of games in terms of the probability p of winning one dollar in each game and the conditional probabilities of winning the sequence given gambler A’s wealth:
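aᵢ = p·aᵢ₊₁ + (1 − p)·aᵢ₋₁,  for i = 1, …, k − 1, with boundary values a₀ = 0 and aₖ = 1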
The wealth of gambler A at any given point in time varies between zero and the total initial wealth of both gamblers. In other words, we can plug every value i = 1, …, k − 1 into the equation above to obtain k − 1 equations that relate the probability of winning to its values at adjacent wealth levels. We can use elementary algebra to aggregate these equations into a single formula. This formula specifies a fundamental relation between the probability of winning each game p, the total initial wealth of both gamblers k, and the probability of winning given a wealth of one dollar a₁. Once we have determined a₁, we can iterate through the k − 1 equations to derive the probability aᵢ for every possible value of i.
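Concretely, each equation can be rearranged as aᵢ₊₁ − aᵢ = ((1 − p)/p)·(aᵢ − aᵢ₋₁). Since a₁ − a₀ = a₁, repeated application gives aᵢ₊₁ − aᵢ = ((1 − p)/p)ⁱ·a₁, and summing these differences over i = 1, …, k − 1 yields

1 − a₁ = a₁·[((1 − p)/p) + ((1 − p)/p)² + … + ((1 − p)/p)ᵏ⁻¹]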
We will now consider two possibilities, a fair game and an unfair game, depending on the value of p that we plug into the equation above. In a fair game, where p = 1/2, the base of the exponents on the right side of the equation simplifies to (1 − p)/p = 1. The entire equation then reduces to 1 − a₁ = (k − 1)·a₁, which can be rearranged to a₁ = 1/k. If we iterate over the previous equations that determine the probabilities of winning for the different values of i, we arrive at a general solution for fair games:
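aᵢ = i/k,  for i = 0, …, k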
The equation above is a remarkable result: given that the game is fair, the probability that gambler A will end up with k dollars before they end up with zero dollars is equal to their initial wealth i divided by the total wealth of both gamblers k.
If p is not equal to 1/2, the game is unfair, as one of the two gamblers has a systematic advantage. We can similarly derive a general solution, which depends on the value of p and the wealth parameters:
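aᵢ = [((1 − p)/p)ⁱ − 1] / [((1 − p)/p)ᵏ − 1],  for i = 0, …, k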
Note: If the reader is interested in the mathematical proofs that lead to these general solutions, consult the references at the end.
Markov Chains
A sequence of random variables X₁, X₂, … observed at discrete points in time is referred to as a stochastic process. The first random variable in the process is known as the initial state, and the random variable Xₙ is the state of the process at time n. Markov chains are a specific kind of stochastic process in which the conditional distributions of future states depend only on the present state. That is, the conditional distribution of the process at time n + j for j > 0 depends only on the state at time n, not on any of the earlier states at times 1, …, n − 1. We can express this mathematically using the following notation:
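P(Xₙ₊₁ = xₙ₊₁ | X₁ = x₁, …, Xₙ = xₙ) = P(Xₙ₊₁ = xₙ₊₁ | Xₙ = xₙ)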
A Markov chain is finite if the number of states that can occur at any point in time is finite. Consider a chain with k states. The probability that the state at time n + 1 equals j, given that the state at time n equals i, is known as a transition distribution of the Markov chain. These distributions are stationary if they are the same for every time n, in which case we can use the notation pᵢⱼ to denote them:
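pᵢⱼ = P(Xₙ₊₁ = j | Xₙ = i),  for every time n and all states i, j = 1, …, k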
The probabilities pᵢⱼ for all pairs of states can be arranged in a k by k matrix P known as the transition matrix of the Markov chain.
Transition matrices have important properties that make them unique. Since every element of the matrix represents a probability, all elements are nonnegative. Since each row represents the entire conditional distribution of the next state given the value of the present state, the values in each row sum to 1. We can extend the probability calculations to multiple steps in a straightforward way by using the transition matrix: the probability that the Markov chain will move from state i to state j in m steps is the (i, j) element of the transition matrix P raised to the power m. That is, the m-step transition matrix Pᵐ can be used to calculate the probabilities of moving between specific states across any number of steps in the chain.
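As a quick illustration, here is a minimal numpy sketch (the three-state matrix below is made up purely for demonstration):

```python
import numpy as np

# An illustrative 3-state transition matrix: each row is the conditional
# distribution of the next state, so every row sums to 1.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.3, 0.7],
])
assert np.allclose(P.sum(axis=1), 1.0)

# The m-step transition matrix is the m-th matrix power of P.
m = 5
P_m = np.linalg.matrix_power(P, m)
print(P_m[0, 2])  # probability of moving from the first state to the third in exactly 5 steps
```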
If any diagonal element pᵢᵢ of the transition matrix is equal to 1, then state i is called an absorbing state. Once a Markov chain enters an absorbing state, it cannot move to any other state thereafter. Another interesting property is that if a Markov chain has one or more absorbing states, and every other state can reach one of them, the chain will eventually be absorbed into one of those states.
At the beginning of the Markov chain at time 1, we can specify a vector whose elements represent the probabilities that the chain will be in each state. This is known as the initial probability vector v. We can determine the marginal distribution of the chain at any time n by multiplying the initial probability vector v by the transition matrix P to the power of n – 1. For example, we can find the marginal distribution of the chain at time 2 by the expression vP.
A special case occurs when a probability vector multiplied by the transition matrix is equal to itself: vP=v. When this occurs, we call the probability vector the stationary distribution for the Markov chain.
Gambler’s Ruin Markov Chains
Using the theoretical framework for Markov chains, we can now apply these concepts to the Gambler’s Ruin problem. The fact that we are able to do this shows how intertwined probability and statistical theory can be. We initially framed the problem using exclusively conditional probability theorems derived from the basic probability axioms. Now we can formalize the problem even further and give it more structure using Markov chains. The process of taking an abstract concept and using various tools to analyze it and expand it underlines the magic of statistics.
The Gambler’s Ruin problem is essentially a Markov chain where the sequence of wealth amounts that gambler A holds over time determines the underlying structure. That is, at any point in time n, gambler A can have a wealth of i dollars, where i also represents the state of the chain at time n. When the gambler reaches either 0 wealth or k wealth, the chain has moved into an absorbing state and no more games are played.
If the chain moves into any of the remaining states 1, …, k − 1, another game is played, and the probability p that gambler A wins determines the distribution of the next state. The transition matrix has two absorbing states. The first row corresponds to the scenario where gambler A has 0 wealth, and the elements of the row are (1, 0, …, 0). Likewise, the last row corresponds to the scenario where gambler A has reached k wealth, and its elements are (0, …, 0, 1). For every other row i, the elements are all zero except for the entries in columns i − 1 and i + 1, which have values 1 − p and p respectively.
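A minimal numpy sketch of this construction (the function name and the example values k = 4, p = 0.5 are our own):

```python
import numpy as np

def ruin_transition_matrix(k, p):
    """Build the (k + 1) x (k + 1) transition matrix over wealth states 0, ..., k."""
    P = np.zeros((k + 1, k + 1))
    P[0, 0] = 1.0  # wealth 0: absorbing state, gambler A is ruined
    P[k, k] = 1.0  # wealth k: absorbing state, gambler A holds all the wealth
    for i in range(1, k):
        P[i, i + 1] = p      # gambler A wins one dollar with probability p
        P[i, i - 1] = 1 - p  # gambler A loses one dollar with probability 1 - p
    return P

print(ruin_transition_matrix(k=4, p=0.5))
```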
We can also determine the probabilities of states multiple steps ahead using the m-step transition matrix Pᵐ. As m goes to infinity, the m-step matrix converges, but the stationary distribution is not unique. The limit of Pᵐ has all zeroes except in its first and last columns. The last column contains the probabilities aᵢ that gambler A will end up with k dollars given that they started out with i dollars, while the first column contains the respective complements 1 − aᵢ. Because the stationary distributions are not unique, all of the limiting probability mass sits in the absorbing states.
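We can check this numerically against the closed-form solution from the first section (a short numpy sketch; the values k = 10 and p = 0.4 are our own, chosen for illustration):

```python
import numpy as np

k, p = 10, 0.4
P = np.zeros((k + 1, k + 1))
P[0, 0] = P[k, k] = 1.0  # the two absorbing states: ruin (0) and total wealth (k)
for i in range(1, k):
    P[i, i + 1], P[i, i - 1] = p, 1 - p

# For large m, P^m converges: all probability mass ends up in the two
# columns that correspond to the absorbing states 0 and k.
P_limit = np.linalg.matrix_power(P, 10_000)

r = (1 - p) / p
a = (r ** np.arange(k + 1) - 1) / (r ** k - 1)  # closed-form a_i for an unfair game
print(np.allclose(P_limit[:, k], a))  # the last column matches a_i -> True
```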
This limiting behavior is especially important because it confirms our initial logical sequence of steps when deriving the solution formulas for the Gambler’s Ruin. In other words, we are able to derive a general formula for the problem as a whole because the stochastic process (the sequence of games) always converges to one of two absorbing states: gambler A walks away with k dollars, or gambler A walks away with 0 dollars. Even though infinitely many games could in principle be played, we can still determine the probability of either of these two events because the transition matrix of the Markov chain converges to a limit supported entirely on the two absorbing states.
Conclusion
The Gambler’s Ruin Problem is a great example of how you can take a complex situation and derive a simple general structure from it using statistical tools. It might be difficult to believe that, given a fair game, the probability that someone will win enough games to claim the total wealth of both players is determined simply by their initial wealth and the total wealth. This holds not only at the beginning of the sequence, but also at each step. Using Markov chains, we can determine the same probabilities across any sequence of games using the transition matrix and the initial probability vector. Consider this: the conclusion that we reached in the first section of this post was enhanced by the use of an additional concept. Applying different perspectives to the same problem can open the door to insightful analysis. This is the power of theoretical statistical thinking.
References
[1] M. H. DeGroot and M. J. Schervish, Probability and Statistics (2018), Pearson