Jim Killingsworth

Visualizing the Climb up the Hill

The hill climbing algorithm described in my previous post finds the weights of biased coins for a coin toss game in which the distribution of possible outcomes is known. In the example presented, there are many possible solutions. A cost function is used to find a valid solution, and a scoring function is used to narrow down the set of valid solutions to a single result. In this post, I want to look at some visualizations to get a better feel for how the algorithm works.

The Model

We’ll use the same model of the coin toss game described in an earlier post titled Estimating the Weights of Biased Coins. But to be able to plot the value of the cost function for every combination of weights on a two-dimensional illustration, we need to limit the number of coin tosses in each round so that there are only two different weights that can vary. Since the first coin toss always uses a fairly weighted coin, we can model the coin toss game using three flips per round. The Markov model looks like this:

Figure 1

In this model, every round starts in the zero state. The coin in the zero state is always a fair coin. The model can be represented in tabular form like this:

Figure 2

Since there are three coin tosses per round, with two possible outcomes for each flip, there are a total of eight unique coin toss sequences possible. The following table lists all possible outcomes along with the terminal state and probability formula for each combination:

Figure 3
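To make the table concrete, here is a small sketch that enumerates the eight sequences and accumulates the probability of each terminal state. It assumes the conventions of the earlier post: heads moves the state up by one, tails moves it down by one, the coin in the zero state is fair, and the coins in the −1 and −2 states mirror the weights p1 and p2 of the coins in the +1 and +2 states.

from itertools import product

def toss_up_probability(state, p1, p2):
    # Probability of stepping in the positive direction from a state.
    # The zero-state coin is fair; by symmetry, the coin at -n is biased
    # toward the negative side exactly as the coin at +n is biased
    # toward the positive side.
    bias = {0: 0.5, 1: p1, 2: p2}[abs(state)]
    return bias if state >= 0 else 1.0 - bias

def outcome_distribution(p1, p2):
    # Enumerate all eight three-toss sequences, tracking the running
    # state and probability, and accumulate totals by terminal state.
    totals = {-3: 0.0, -1: 0.0, +1: 0.0, +3: 0.0}
    for steps in product((+1, -1), repeat=3):
        state, probability = 0, 1.0
        for step in steps:
            up = toss_up_probability(state, p1, p2)
            probability *= up if step == +1 else 1.0 - up
            state += step
        totals[state] += probability
    return totals

With two fair coins, outcome_distribution(0.5, 0.5) comes out to {-3: 0.125, -1: 0.375, +1: 0.375, +3: 0.125}, which matches what you would expect from three fair coin tosses.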

While the model has a total of seven states, there are only four possible terminal states. And since the model is symmetrical, the probability of ending on one of the positive terminal states is equal to the probability of ending on the corresponding negative terminal state:

Figure 4

Plugging in the probability formulas, the equations above can be rewritten as follows:

Figure 5

Simplifying, these equations can be reduced to the following:

Figure 6

Based on the set of equations above, we can state a relationship between the weight of the coin in the +1 state and the weight of the coin in the +2 state:

Figure 7
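To spell out the algebra in my own notation (with p1 and p2 denoting the weights of the coins in the +1 and +2 states, and r1 and r3 denoting the target probabilities of the ±1 and ±3 terminal states; the original figures may use different symbols):

\begin{aligned}
r_3 &= \tfrac{1}{2}\,p_1\,p_2 \\
r_1 &= \tfrac{1}{2}\,p_1\,(1 - p_2) + \tfrac{1}{2}\,(1 - p_1) = \tfrac{1}{2} - \tfrac{1}{2}\,p_1\,p_2
\end{aligned}
\qquad\Longrightarrow\qquad
p_1 = \frac{2\,r_3}{p_2}

Under these assumptions, r1 = 1/2 − r3 falls out automatically, so any pair of weights whose product is 2·r3 reproduces the target distribution.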

The bias of each coin must be between a minimum value of zero and a maximum value of one. If we know the expected outcome of the coin toss game, the minimum and maximum values might be further constrained. Suppose the coin in the +2 state always lands on heads:

Figure 8

The case above shows the minimum value for the weight of the coin in the +1 state when the coin in the +2 state is at the maximum. The opposite extreme occurs when the coin in the +1 state always lands on heads:

Figure 9

As demonstrated in my two previous posts, there is a range of valid values for a given target distribution. There is no unique solution without an additional constraint such as a scoring function.

Cost Function

A cost function assigns a value to a set of weights based on a target distribution. The value of the cost function determines how close an estimated set of weights is to a valid set of weights for a given probability mass function. Suppose the target distribution looks like this:

Figure 10

The probability mass function illustrated above is symmetrical. The values can also be represented like this:

Figure 11

Our cost function must measure the difference between the target distribution and the expected outcome of a given set of weights. In this example, as in the previous post, I want to use the sum of squared errors as the cost function:

Figure 12
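Expressed in code, reusing the outcome_distribution sketch from earlier, the cost function might look like this. The target values below are placeholders I made up for illustration, not the values from the figures; any symmetric distribution whose halves each sum to one half works the same way.

def cost(weights, target):
    # Sum of squared errors between the expected outcome distribution
    # for the given weights and the target distribution.
    p1, p2 = weights
    expected = outcome_distribution(p1, p2)
    return sum((expected[state] - target[state]) ** 2 for state in target)

# Hypothetical target distribution used in the sketches that follow.
target = {-3: 0.1, -1: 0.4, +1: 0.4, +3: 0.1}

cost((0.4, 0.5), target)   # ~0.0, since 0.5 * 0.4 * 0.5 = 0.1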

A value of zero is ideal. A zero value means the given weights will have an expected outcome that matches the target distribution. A non-zero value indicates an error relative to the ideal. Here is what the cost function looks like on a three-dimensional surface plot:

Figure 13
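For anyone who wants to reproduce a plot along these lines, here is a rough matplotlib sketch using the cost function and hypothetical target above; negating the z values yields the inverted, hill-like view shown next.

import numpy as np
import matplotlib.pyplot as plt

grid = np.linspace(0.0, 1.0, 101)
xs, ys = np.meshgrid(grid, grid)
zs = np.array([[cost((p1, p2), target) for p1 in grid] for p2 in grid])

figure = plt.figure()
axes = figure.add_subplot(projection='3d')
axes.plot_surface(xs, ys, zs, cmap='viridis')
axes.set_xlabel('p1')
axes.set_ylabel('p2')
axes.set_zlabel('cost')
plt.show()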

This gives a visualization of where the high and low spots are. Since the cost function is used in the context of a hill climbing algorithm, it might be more intuitive to display the values on an inverted surface plot, which makes it look more like a hill:

Figure 14

A surface plot does a nice job of illustrating the curvature of the cost function. For overlaying additional information on top of the cost function, a heatmap might be a more useful alternative. Here is a visualization of the cost function as a heatmap:

Figure 15

In the chart above, the plateau of valid values is superimposed on top of the heatmap as a green line. These are inputs to the cost function that will yield a zero value. As you can see, there is a range of solutions that will yield an expected outcome that matches the target distribution.

Steepest Ascent Hill Climb

Given a target distribution, the steepest ascent hill climbing algorithm can be used to find a valid set of weights starting with an arbitrary initial guess. The steepest ascent method always chooses the best available move on each iteration until the cost function is minimized. My previous post describes this algorithm in depth. Let’s consider an initial guess that looks like this:

Figure 16

Starting with the initial guess above, the algorithm converges on the following set of values:

Figure 17

The algorithm terminates after 6,047 iterations. The following trace shows the path it takes from start to finish:

Figure 18
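My previous post describes the implementation in full; the sketch below compresses the idea into a few lines. The step size, the clamping of weights to [0, 1], and the termination rule are my own choices for illustration and may differ from the original.

from itertools import product

def steepest_ascent(cost_fn, weights, step=0.001, max_iters=1_000_000):
    # On each iteration, evaluate every combination of nudging each
    # weight down, up, or not at all, then take the single move that
    # lowers the cost the most; stop when no move is an improvement.
    weights = list(weights)
    for iteration in range(max_iters):
        best, best_cost = None, cost_fn(weights)
        for deltas in product((-step, 0.0, +step), repeat=len(weights)):
            candidate = [min(1.0, max(0.0, w + d)) for w, d in zip(weights, deltas)]
            candidate_cost = cost_fn(candidate)
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
        if best is None:
            return weights, iteration
        weights = best
    return weights, max_iters

Called as steepest_ascent(lambda w: cost(w, target), [0.9, 0.2]), with an arbitrary starting point, it returns the converged weights along with the number of iterations taken.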

Using the same technique, we can run through several more examples, each with different starting points:

Figure 19
Figure 20
Figure 21

In each example above, the path follows a vertical, horizontal, or diagonal route towards an optimum set of values. Sometimes the route bends in a dog-leg fashion part of the way through. The final set of weights discovered by the hill climbing algorithm differs based on the starting position of the initial guess.

Stochastic Hill Climb

Unlike the steepest ascent hill climbing algorithm used in the previous section, a stochastic hill climbing algorithm does not always choose the best move on each iteration. Instead, it chooses randomly from among the moves that improve the cost function. In this example, the random decision is weighted based on how much each potential move improves the cost function. Consider the following initial guess:

Figure 22

Starting with the initial guess above, the algorithm converges on the following set of values:

Figure 23

The algorithm terminates after 6,070 iterations. The following trace shows the path it takes from start to finish:

Figure 24
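Here is the same sketch adapted to the stochastic variant, with the move drawn at random in proportion to how much it improves the cost. Again, this is an illustrative reconstruction rather than the original code.

import random
from itertools import product

def stochastic_climb(cost_fn, weights, step=0.001, max_iters=1_000_000):
    # Collect every move that improves the cost, then pick one at
    # random, weighting each move by how much it lowers the cost.
    weights = list(weights)
    for iteration in range(max_iters):
        current = cost_fn(weights)
        moves, gains = [], []
        for deltas in product((-step, 0.0, +step), repeat=len(weights)):
            candidate = [min(1.0, max(0.0, w + d)) for w, d in zip(weights, deltas)]
            gain = current - cost_fn(candidate)
            if gain > 0.0:
                moves.append(candidate)
                gains.append(gain)
        if not moves:
            return weights, iteration
        weights = random.choices(moves, weights=gains, k=1)[0]
    return weights, max_iters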

Using the same technique, we can run through several more examples, each with different starting points:

Figure 25
Figure 26
Figure 27

Again, the final set of weights discovered by the hill climbing algorithm varies based on the starting position. But the final results are not the same as those found by the steepest ascent algorithm. In these examples, the trace follows a fairly direct route to the final set of values. Although slightly curved, the routes do not make any sharp changes in direction.

Scoring Function A

A scoring function can be used in addition to a cost function to constrain the valid set of weights to a single optimum value. One scoring function analyzed in my previous post computes the sum of squared differences between the weights and the centermost value. Here is the function:

Figure 28
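In code, taking 0.5 as the centermost value, a sketch of this scoring function might read:

def score_a(weights):
    # Sum of squared differences between each weight and the
    # centermost value of 0.5.
    return sum((w - 0.5) ** 2 for w in weights)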

Keep in mind we only want to consider inputs for which the cost function evaluates to zero. We can use a numerical method to find the optimum parameter values that minimize the scoring function when the cost function is zero:

Figure 29
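The post doesn't specify which numerical method is used for this step. One way to reproduce it is with an equality-constrained minimizer such as SciPy's SLSQP, expressing the zero-cost requirement as a constraint; the starting point here is arbitrary.

from scipy.optimize import minimize

result = minimize(
    score_a,
    x0=[0.9, 0.2],                   # arbitrary starting point
    method='SLSQP',
    bounds=[(0.0, 1.0), (0.0, 1.0)],
    constraints=[{'type': 'eq', 'fun': lambda w: cost(w, target)}],
)
p1, p2 = result.x                    # optimum weights for this score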

Here is a visual representation of the minimization of the scoring function:

Figure 30

And here is the optimum set of weights for this scoring function:

Figure 31

Since the scoring function computes a score for every combination of weights, it can be represented on a surface plot or heatmap in much the same way as the cost function. Here is a visualization of the scoring function as a heatmap:

Figure 32

Notice the optimum value does not fall on the point with the best absolute score. This is because it is constrained to values where the cost function is zero. Here is the same depiction overlaid on a heatmap of the cost function:

Figure 33

In the two illustrations above, the intensity of the plateau line varies based on the score. A better score is depicted with a thicker and more brightly colored line.

Scoring Function B

The scoring function in the previous section is not the only one possible. Another scoring function analyzed in my previous post computes the sum of squared differences between each weight and the weight of its preceding neighbor. Here is the function:

Figure 34
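A corresponding sketch, assuming the fair coin's weight of 0.5 counts as the neighbor preceding the first weight; the same constrained minimization shown earlier applies here, swapping score_b in for score_a.

def score_b(weights):
    # Sum of squared differences between each weight and its preceding
    # neighbor, with the fair coin's weight of 0.5 treated as the
    # neighbor that precedes the first weight.
    padded = [0.5, *weights]
    return sum((b - a) ** 2 for a, b in zip(padded, padded[1:]))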

Keep in mind we only want to consider inputs for which the cost function evaluates to zero. We can use a numerical method to find the optimum parameter values that minimize the scoring function when the cost function is zero:

Figure 35

Here is a visual representation of the minimization of the scoring function:

Figure 36

And here is the optimum set of weights for this scoring function:

Figure 37

Since the scoring function computes a score for every combination of weights, it can be represented on a surface plot or heatmap in much the same way as the cost function. Here is a visualization of the scoring function as a heatmap:

Figure 38

Notice the optimum value does not fall on the point with the best absolute score. This is because it is constrained to values where the cost function is zero. Here is the same depiction overlaid on a heatmap of the cost function:

Figure 39

In the two illustrations above, the intensity of the plateau line varies based on the score. A better score is depicted with a thicker and more brightly colored line.

Next Steps

Simplifying the model to three coin tosses per round makes it possible to visualize the cost function, the scoring function, and the path taken by a hill climbing algorithm. It also makes the model easier to reason about. But I still haven’t come up with a general solution for finding the optimum set of weights for a given cost function and scoring function in a model with an arbitrarily large number of coin tosses per round. I still have some ideas I want to explore. I am hoping these visualizations can aid further analysis.

Accompanying source code is available on GitHub.
