Jim Killingsworth

Generalizing the Coin Toss Markov Model

This is a continuation of a series of posts on weighted coin toss games. In previous posts, we explored variations of the weighted coin toss game using two, three, and four flips per round. In each variation, the game was described using a Markov model with a fixed number of coin toss events. This post presents a generalized form of the Markov model that can be used to model a game with an arbitrary number of coin toss events. I also show a few examples using a model of the coin toss game with ten flips per round.

Markov Model with 2 Coin Tosses

Let’s start with a very simple model of the coin toss game that uses only two flips of the coin per round. This is the model used in the previous post titled Visualizing Saddle Points and Minimums. Here is what the Markov model looks like:

Figure 1

In this model, there are five possible states that the system can be in and six possible state transitions. Each arrow represents a state transition. The state diagram above can be represented using a state transition matrix:

Figure 2

This transition matrix determines the probability of moving from one state to the next. This is a square matrix with a row and a column for each state. The rows represent the starting states, and the columns represent the subsequent states. We can also use a vector with five elements, one for each state, to represent the probability of being in each one of the states at a particular point in time. Since we always start in the zero state, the initial vector looks like this:

Figure 3
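
To make this concrete, here is a minimal sketch in Python of what the transition matrix and initial vector might look like for the two-toss model. The state ordering (-2, -1, 0, +1, +2), the function name, and the convention that the coin in the zero state is fair while the coins in the ±1 states move outward with weight p1 are my assumptions based on the earlier posts, not definitions taken from the figures:

```python
import numpy as np

def transition_matrix_2(p1):
    """Sketch of the 5x5 transition matrix for the two-toss model.

    States are ordered [-2, -1, 0, +1, +2]. The six nonzero entries
    correspond to the six state transitions in the diagram.
    """
    m = np.zeros((5, 5))
    m[2, 1] = 0.5       # 0 -> -1 (fair coin in the zero state)
    m[2, 3] = 0.5       # 0 -> +1
    m[1, 0] = p1        # -1 -> -2 (move outward with weight p1)
    m[1, 2] = 1 - p1    # -1 ->  0 (move back toward zero)
    m[3, 4] = p1        # +1 -> +2
    m[3, 2] = 1 - p1    # +1 ->  0
    return m

# Initial state vector: the system always starts in the zero state.
v0 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
```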

We can compute the probability of being in each one of the five states after the first coin toss by taking the product of the state vector and the state transition matrix:

Figure 4

After the first coin toss, there are two possible states that the system can be in. The product above works out to the following:

Figure 5

Since there are two flips of the coin per round, we can compute the final outcome distribution by multiplying the vector above by the transition matrix one more time:

Figure 6

After the second coin toss, there are three possible states the system can be in. The product above works out to the following:

Figure 7

As you can see, only three of the five possible states can be the final outcome after the second coin toss. The other two states only serve as intermediate states that the system transitions through. Note that there is a possibility that the system returns to the initial state after the second coin toss.
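
Continuing the sketch above, we can propagate the initial vector through both tosses; the row vector times matrix convention matches the description of rows as starting states:

```python
p1 = 0.4                    # hypothetical weight for the +/-1 coins
m = transition_matrix_2(p1)

v1 = v0 @ m                 # distribution after the first toss
v2 = v1 @ m                 # final outcome after the second toss

print(v1)  # mass on the -1 and +1 states only
print(v2)  # mass on the -2, 0, and +2 states only
```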

Markov Model with 3 Coin Tosses

A slightly more complicated model is that of a coin toss game with three flips of the coin per round. This is the model used previously in the post titled Visualizing the Climb up the Hill. Here is what the Markov model looks like:

Figure 8

In this model, there are seven possible states that the system can be in and a total of ten possible state transitions. The state diagram illustrated above can be represented with the following state transition matrix:

Figure 9

This is a square matrix with seven rows and seven columns. We can also use a seven-element vector to represent the likelihood of the system being in a particular state at a given point in time. The initial vector looks like this:

Figure 10

We can multiply this vector by the transition matrix three times to determine where we might find the state of the system after three flips of the coin:

Figure 11

After the third coin toss, there are four possible states the system can be in. The product above works out to the following:

Figure 12

The system can be in one of four possible states for the final outcome. The other three states are intermediate states that the system transitions through. Notice that the initial state is not one of the final states since there is an odd number of coin tosses in this case.
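
The mechanics are the same as before; here is a one-line sketch using NumPy, assuming m and v0 are the seven-state analogues of the two-toss sketch above:

```python
from numpy.linalg import matrix_power

# Multiply the initial vector by the transition matrix three times.
v3 = v0 @ matrix_power(m, 3)
```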

Generalized Markov Model

In addition to the two Markov models outlined above, you can also find a detailed discussion of a model of the coin toss game with four flips per round in one of my earlier posts titled Estimating the Weights of Biased Coins. All of these models can be described by a generalized model. Suppose we have a coin toss game with an arbitrary number of flips per round. We can represent the generalized form of the Markov model with a state diagram that looks like this:

Figure 13

The total number of states the system can be in depends on the number of coin toss events. We can determine the number of states using the following equation:

Figure 14
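
Consistent with the counts above (five states for two tosses, seven for three), the state after n coin tosses can presumably range from -n to +n, giving:

```latex
S(n) = 2n + 1, \qquad S(2) = 5, \quad S(3) = 7
```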

The number of states the system can be in determines the size of the state transition matrix. Since this can be an arbitrarily large matrix, it is more practical to describe the contents of the matrix using an algorithm:

Figure 15
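
As a sketch of such an algorithm, consistent with the two- and three-toss matrices above, the following Python function builds the matrix for an arbitrary number of tosses. The indexing scheme, the function name, and the assumption that a game with n tosses uses n - 1 weights (one per interior distance from the center, with the center coin fair) are mine:

```python
import numpy as np

def build_transition_matrix(weights):
    """Sketch of the generalized state transition matrix.

    For a game with n coin tosses there are 2n + 1 states, indexed
    0..2n with the initial state at the center index n. The coin in
    the center state is assumed fair; the coin at distance i from the
    center moves the state outward with weight weights[i - 1] and back
    toward the center otherwise. Rows are starting states and columns
    are subsequent states, matching the two models above.
    """
    n = len(weights) + 1            # coin tosses per round
    size = 2 * n + 1                # total number of states
    m = np.zeros((size, size))
    c = n                           # index of the center state
    m[c, c - 1] = 0.5               # fair coin in the center state
    m[c, c + 1] = 0.5
    for i, p in enumerate(weights, start=1):
        m[c + i, c + i + 1] = p         # +i -> +(i+1), outward
        m[c + i, c + i - 1] = 1 - p     # +i -> +(i-1), inward
        m[c - i, c - i - 1] = p         # -i -> -(i+1), mirror image
        m[c - i, c - i + 1] = 1 - p     # -i -> -(i-1), inward
    return m
```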

This is a square matrix with a nonzero value for every possible state transition in the Markov model. We also need to define an initial state vector. Since there is only one initial state, this vector can be described with a very simple algorithm:

Figure 16
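
A corresponding sketch, building on the one above, simply places all of the probability mass on the center state:

```python
def build_initial_vector(num_states):
    """All of the probability mass starts in the center state."""
    v = np.zeros(num_states)
    v[num_states // 2] = 1.0
    return v
```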

Once we have a state transition matrix and the initial state vector, we can compute the final outcome using the following formula:

Figure 17
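
In code, this amounts to multiplying the initial vector by the n-th power of the transition matrix; a sketch building on the two functions above:

```python
from numpy.linalg import matrix_power

def final_outcome(weights):
    """Distribution over states after one round of n coin tosses."""
    n = len(weights) + 1
    m = build_transition_matrix(weights)
    v = build_initial_vector(2 * n + 1)
    return v @ matrix_power(m, n)
```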

The final outcome tells us how likely it is for each state to be the final state of the system after a single round of the coin toss game. We can also represent the final outcome like this:

Figure 18

Each element contains the probability that the system terminates in the corresponding state after the final coin toss. Since our model is symmetric about the initial state, there is a symmetry to the resulting values in the final outcome.

Equality Constraints

Given the model of the coin toss game described above, suppose we know the values of the final outcome but not the values of the weights of the biased coins. Starting with the final outcome, sometimes referred to as the target distribution, we can find a valid set of weights using the method of Lagrange multipliers described in Equality Constraints and Lagrange Multipliers. To use this method, we need to come up with a set of equality constraints based on the model. Let’s start with some equality conditions that must hold true:

Figure 19

The left-hand side of these equations represents the value of the known target distribution for the corresponding state. The right-hand side represents the computed result based on the values of the weights of the biased coins. These equality conditions are true if we have a valid set of weights. Notice also the symmetry for states above and below the initial state. We can leave out the duplicate conditions because they are redundant. We can also eliminate states that we know are never terminal states. Consider the following:

Figure 20

These two sets contain even and odd numbers, respectively. We can select one or the other based on whether the number of coin toss events per round is even or odd:

Figure 21

Using the selected set, we can determine which states are never terminal states. Non-terminal states have a probability of zero in the final outcome. We know the following holds true, no matter what weights are used for the biased coins:

Figure 22
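
In other words, the states that can be terminal are exactly those sharing the parity of the number of coin tosses. Here is a small sketch of this selection, with the function name my own:

```python
def terminal_states(n):
    """States reachable after n tosses share the parity of n."""
    return [k for k in range(-n, n + 1) if (k - n) % 2 == 0]
```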

We can eliminate non-terminal states from consideration because they have no bearing on the equality constraints needed for the method of Lagrange multipliers. The total number of equality constraints is then given by the following:

Figure 23
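
Given the symmetry and parity arguments above, the retained constraints correspond to the non-negative terminal states, so for n coin tosses the count is presumably:

```latex
m(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1,
\qquad m(2) = 2, \quad m(3) = 2, \quad m(10) = 6
```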

The number of equality constraints is a function of the total number of coin toss events. We can establish a set of equality constraints like this:

Figure 24

Each constraint function must be equal to zero. The functions we use here take the difference between the target values and the values computed from a given set of weights:

Figure 25
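
A sketch of these constraint functions in Python, building on the final_outcome function above; representing the target distribution as a dictionary is my own convention:

```python
def constraint_functions(weights, targets):
    """One constraint per retained terminal state.

    `targets` is assumed to map each non-negative terminal state k to
    its known target probability. Every returned difference must be
    zero for a valid set of weights.
    """
    n = len(weights) + 1              # coin tosses per round
    outcome = final_outcome(weights)  # sketch defined earlier
    return [t - outcome[n + k] for k, t in targets.items()]
```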

Using these equality constraints, we can construct a Lagrangian function that can be used to find a valid set of weights:

Figure 26
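
Here is a sketch of the Lagrangian in the same shape as the one from the earlier post on equality constraints, with one multiplier per constraint and the scoring function left abstract:

```python
def lagrangian(weights, multipliers, targets, scoring_function):
    """Sketch: L(w, lambda) = f(w) + sum of lambda_i * g_i(w)."""
    g = constraint_functions(weights, targets)
    penalty = sum(l * gi for l, gi in zip(multipliers, g))
    return scoring_function(weights) + penalty
```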

We can use this Lagrangian function to find a valid set of weights by applying the optimization and root-finding methods outlined in previous posts.

Example with Exponential Distribution

Now let’s take a look at an example with ten coin toss events per round. Suppose we start with the following target distribution:

Figure 27

We want to find a valid set of weights that yields this distribution in the final outcome. We’ll start with the following initial guess and iteratively work towards a solution:

Figure 28

We can use the multivariate form of Newton’s method to find the weights for which the gradient of the Lagrangian function is equal to zero. This method is described in detail in the post titled Finding the Roots with Newton’s Method. Here we apply the following iterative formula:

Figure 29

Note the use of the alternative notation scheme above for the gradient of the Lagrangian function. The Lagrangian function used in this example requires a concrete definition of the scoring function. In this example, we use two different scoring functions, defined below.
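As a rough illustration of the iterative formula, here is a generic multivariate Newton root finder applied to the gradient of the Lagrangian, with the Jacobian approximated by finite differences; the tolerance, step size, and function names are assumptions, not values from the original implementation:

```python
import numpy as np

def newton_root(grad, x0, tol=1e-9, max_iter=100, h=1e-6):
    """Find x where grad(x) = 0 using multivariate Newton's method.

    `grad` maps a vector of unknowns (weights and multipliers) to the
    gradient of the Lagrangian; its Jacobian is approximated here by
    forward finite differences.
    """
    x = np.asarray(x0, dtype=float)
    for iteration in range(max_iter):
        g = np.asarray(grad(x))
        if np.linalg.norm(g) < tol:
            return x, iteration
        jacobian = np.empty((len(g), len(x)))
        for j in range(len(x)):
            step = np.zeros_like(x)
            step[j] = h
            jacobian[:, j] = (np.asarray(grad(x + step)) - g) / h
        x = x - np.linalg.solve(jacobian, g)
    return x, max_iter
```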

Here is the definition of scoring function A:

Figure 30

Here is the definition of scoring function B:

Figure 31

Here is the solution found using scoring function A:

Figure 32

Here is the solution found using scoring function B:

Figure 33

Here is the number of iterations required for each scoring function:

Figure 34

In both cases, the initial guess is not too far from the optimal solution. As with previous examples using Newton’s method, the system converges in very few iterations. Both solutions are very similar, taking on a sort of zigzag shape.

Example with Triangular Distribution

Let’s take a look at another example with ten coin toss events per round. Suppose we start with the following target distribution:

Figure 35

We want to find a valid set of weights that yields this distribution in the final outcome. We’ll start with the following initial guess and iteratively work towards a solution:

Figure 36

To find a valid set of weights, we’ll use the multivariate form of Newton’s method like we did in the last example. But this time, we’ll include a damping factor to slow down the convergence. Here is the iterative formula:

Figure 37

We’ll use a damping factor of ten percent:

Figure 38

The damping factor slows down the convergence and prevents the method from overshooting. In this particular example, the method fails to converge without slowing down the iterative process. The use of the damping factor would not be necessary if we started with an initial guess closer to the final solution.
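
In code, the damping only changes the update line of the Newton sketch from the previous example, assuming the damping factor is the fraction of the full Newton step taken on each iteration:

```python
import numpy as np

def damped_newton_step(x, g, jacobian, damping=0.10):
    """One damped step: take only a fraction of the full Newton step."""
    return x - damping * np.linalg.solve(jacobian, g)
```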

Here is the solution found using scoring function A:

Figure 39

Here is the solution found using scoring function B:

Figure 40

Here is the number of iterations required for each scoring function:

Figure 41

The damped version of Newton’s method requires many more iterations to converge than the undamped version used in the previous example. As with the previous example, the two solutions are very similar to one another, this time taking on a slanted shape.

Shortcomings

The methods used here allow us to find a valid set of weights for a given target distribution using a model of the coin toss game that allows for an arbitrary number of coin toss events. We used the method of Lagrange multipliers to set up an equation, and we used Newton’s method to solve the equation. But these methods are not without their shortcomings. As we saw in the previous section, we had to adapt the iterative formula with a damping factor to get the iterative process to converge to a solution.

Besides the overshoot problem, there are cases where these methods might converge to a solution outside the acceptable range of values. The weights of the biased coins must always be a probability between zero and one. There is nothing in the method of Lagrange multipliers that limits values to a particular range. I may have to explore an approach using Karush–Kuhn–Tucker conditions to include inequality constraints.

Another problem is runtime performance. With the implementation used in this post, finding the solution for a model with ten coin toss events per round takes a couple of seconds to execute on my current hardware. A model with one additional coin toss event per round takes about twice the amount of time to execute. This implementation seems to have an exponential time complexity. There is no parallelism, and I have made no attempt at performance tuning. But I do think there is room for improvement. There might also be other approaches that offer a good enough solution. Ideally, I would like to be able to solve for models with twenty or even fifty flips per round.

Accompanying source code is available on GitHub.
