Generalizing the Coin Toss Markov Model

This is a continuation of a series of posts on weighted coin toss games. In previous posts, we explored variations of the weighted coin toss game using two, three, and four flips per round. In each variation, the game was described using a Markov model with a fixed number of coin toss events. This post presents a generalized form of the Markov model that can be used to model a game with an arbitrary number of coin toss events. I also show a few examples using a model of the coin toss game with ten flips per round.

Markov Model with 2 Coin Tosses

Let’s start with a very simple model of the coin toss game that uses only two flips of the coin per round. This is the model used in the previous post titled Visualizing Saddle Points and Minimums. Here is what the Markov model looks like:

$Figure 1$

In this model, there are five possible states that the system can be in and six possible state transitions. Each arrow represents a state transition. The state diagram above can be represented using a state transition matrix:

$Figure 2$

This transition matrix determines the probability of moving from one state to the next. This is a square matrix with a row and a column for each state. The rows represent the starting states, and the columns represent the subsequent states. We can also use a vector with five elements—one for each state—to represent the probability of being in each one of the states at a particular point in time. Since we always start in the zero state, the initial vector looks like this:

$Figure 3$

We can compute the probability of being in each one of the five states after the first coin toss by taking the product of the state vector and the state transition matrix:

$Figure 4$

After the first coin toss, there are two possible states that the system can be in. The product above works out to the following:

$Figure 5$

Since there are two flips of the coin per round, we can compute the final outcome distribution by multiplying the vector above by the transition matrix one more time:

$Figure 6$

After the second coin toss, there are three possible states the system can be in. The product above works out to the following:

$Figure 7$

As you can see, for the final outcome, the system can only be in one of three out of the five possible states after the second coin toss. The other two states only serve as intermediate states that the system transitions through. Note that there is a possibility that the system returns to the initial state after the second coin toss.

Markov Model with 3 Coin Tosses

A slightly more complicated model is that of a coin toss game with three flips of the coin per round. This is the model used previously in the post titled Visualizing the Climb up the Hill. Here is what the Markov model looks like:

$Figure 8$

In this model, there are seven possible states that the system can be in and a total of ten possible state transitions. The state diagram illustrated above can be represented with the following state transition matrix:

$Figure 9$

This is a square matrix with seven rows and seven columns. We can also use a seven-element vector to represent the likelihood of the system being in a particular state at a given point in time. The initial vector looks like this:

$Figure 10$

We can multiply this vector by the transition matrix three times to determine where we might find the state of the system after three flips of the coin:

$Figure 11$

After the third coin toss, there are four possible states the system can be in. The product above works out to the following:

$Figure 12$

The system can be in one of four possible states for the final outcome. The other three states are intermediate states that the system transitions through. Notice that the initial state is not one of the final states since there is an odd number of coin tosses in this case.

Generalized Markov Model

In addition to the two Markov models outlined above, you can also find a detailed discussion of a model of the coin toss game with four flips per round in one of my earlier posts titled Estimating the Weights of Biased Coins. All of these models can be described by a generalized model. Suppose we have a coin toss game with an arbitrary number of flips per round. We can represent the generalized form of the Markov model with a state diagram that looks like this:

$Figure 13$

The total number of states the system can be in depends on the number of coin toss events. We can determine the number of states using the following equation:

$Figure 14$

The number of states the system can be in determines the size of the state transition matrix. Since this can be an arbitrarily large matrix, it is more practical to describe the contents of the matrix using an algorithm:

$Figure 15$

This is a square matrix with a nonzero value for every possible state transition in the Markov model. We also need to define an initial state vector. Since there is only one initial state, this vector can be described with a very simple algorithm:

$Figure 16$

Once we have a state transition matrix and the initial state vector, we can compute the final outcome using the following formula:

$Figure 17$

The final outcome tells us how likely it is for each state to be the final state of the system after a single round of the coin toss game. We can also represent the final outcome like this:

$Figure 18$

Each element contains the probability that the system terminates in the corresponding state after the final coin toss. Since our model is symmetric about the initial state, there is a symmetry to the resulting values in the final outcome.

Equality Constraints

Given the model of the coin toss game described above, suppose we know the values of the final outcome but not the values of the weights of the biased coins. Starting with the final outcome—sometimes referred to as the target distribution—we can find a valid set of weights using the method of Lagrange multipliers described in Equality Constraints and Lagrange Multipliers. To use this method, we need to come up with a set of equality constraints based on the model. Let’s start with some equality conditions that must hold true:

$Figure 19$

The left-hand side of these equations represents the value of the known target distribution for the corresponding state. The right-hand side represents the computed result based on the values of the weights of the biased coins. These equality conditions are true if we have a valid set of weights. Notice also the symmetry for states above and below the initial state. We can leave out the duplicate conditions because they are redundant. We can also eliminate states that we know are never terminal states. Consider the following:

$Figure 20$

These two sets contain even and odd numbers, respectively. We can select one or the other based on whether the number of coin toss events per round is even or odd:

$Figure 21$

Using the selected set, we can determine which states are never terminal states. Non-terminal states have a probability of zero in the final outcome. We know the following holds true, no matter what weights are used for the biased coins:

$Figure 22$

We can eliminate non-terminal states from consideration because they have no bearing on the equality constraints needed for the method of Lagrange multipliers. The total number of equality constraints then is given by the following:

$Figure 23$

The number of equality constraints is a function of the total number of coin toss events. We can establish a set of equality constraints like this:

$Figure 24$

Each constraint function must be equal to zero. The functions we want to use here are functions that take the difference between the target values and the computed values based on a given set of weights:

$Figure 25$

Using these equality constraints, we can construct a Lagrangian function that can be used to find a valid set of weights:

$Figure 26$

We can use this Lagrangian function to find a valid set of weights by applying the optimization and root-finding methods outlined in previous posts.

Example with Exponential Distribution

Now let’s take a look at an example with ten coin toss events per round. Suppose we start with the following target distribution:

We want to find a valid set of weights that yields this distribution in the final outcome. We’ll start with the following initial guess and iteratively work towards a solution:

We can use the multivariate form of Newton’s method to find the weights for which the gradient of the Lagrangian function is equal to zero. This method is described in detail in the post titled Finding the Roots with Newton’s Method. Here we apply the following iterative formula:

$Figure 29$

Note the use of the alternative notation scheme above for the gradient of the Lagrangian function. The Lagrangian function we use in this example must have a concrete definition of the scoring function defined. In this example, we use two different scoring functions, defined below.

Here is the definition of scoring function A:

$Figure 30$

Here is the definition of scoring function B:

$Figure 31$

Here is the solution found using scoring function A:

Here is the solution found using scoring function B:

Here are the number of iterations required for each scoring function:

$Figure 34$

In both cases, the initial guess is not too far from the optimal solution. As with previous examples using Newton’s method, the system converges in very few iterations. Both solutions are very similar, taking on a sort of zigzag shape.

Example with Triangular Distribution

Let’s take a look at another example with ten coin toss events per round. Suppose we start with the following target distribution:

We want to find a valid set of weights that yields this distribution in the final outcome. We’ll start with the following initial guess and iteratively work towards a solution:

To find a valid set of weights, we’ll use the multivariate form of Newton’s method like we did in the last example. But this time, we’ll include a damping factor to slow down the convergence. Here is the iterative formula:

$Figure 37$

We’ll use a damping factor of ten percent:

$Figure 38$

The damping factor slows down the convergence and prevents the method from overshooting. In this particular example, the method fails the converge without slowing down the iterative process. The use of the damping factor would not be necessary if we started with an initial guess closer to the final solution.

Here is the solution found using scoring function A:

Here is the solution found using scoring function B:

Here are the number of iterations required for each scoring function:

$Figure 41$

The damped version of Newton’s method requires many more iterations to converge than it does with the undamped version used in the previous example. As with the previous example, the two solutions are very similar to one another, this time taking on a slanted shape.

Shortcomings

The methods used here allow us to find a valid set of weights for a given target distribution using a model of the coin toss game that allows for an arbitrary number of coin toss events. We used the method of Lagrange multipliers to set up an equation, and we used Newton’s method to solve the equation. But these methods are not without their shortcomings. As we saw in the previous section, we had to adapt the iterative formula with a damping factor to get the iterative process to converge to a solution.

Besides the overshoot problem, there are cases where these methods might converge to a solution outside the acceptable range of values. The weights of the biased coins must always be a probability between zero and one. There is nothing in the method of Lagrange multipliers that limits values to a particular range. I may have to explore an approach using Karush–Kuhn–Tucker conditions to include inequality constraints.

Another problem is runtime performance. With the implementation used in this post, finding the solution for a model with ten coin toss events per round takes a couple of seconds to execute on my current hardware. A model with one additional coin toss event per round takes about twice the amount of time to execute. This implementation seems to have an exponential time complexity. There is no parallelism, and I have made no attempt at performance tuning. But I do think there is room for improvement. There might also be other approaches that offer a good enough solution. Ideally, I would like to be able to solve for models with twenty or even fifty flips per round.

Accompanying source code is available on GitHub.

Markov Mod­el with 2 Coin Toss­es

Markov Mod­el with 3 Coin Toss­es

Gen­er­al­ized Markov Mod­el

Equal­i­ty Con­straints

Ex­am­ple with Ex­po­nen­tial Dis­tri­b­u­tion

Ex­am­ple with Tri­an­gu­lar Dis­tri­b­u­tion

Short­com­ings