Jim Killingsworth

Approximations with Polynomials

The previous post demonstrates the use of biases derived from a simple line formula to find an approximate solution to the weighted coin toss problem. In this post, I want to expand on some of these ideas using various polynomial formulas to describe the weights of the biased coins. As this experiment demonstrates, higher-order polynomials do seem to yield better results.

Linear Polynomial

For the first example, let’s start with a linear polynomial. This is similar to the two-parameter formula used in the previous post, but with a larger range of possible slopes. The polynomial formula looks like this:

Figure 1

Our objective is to find the coefficients that give us the lowest approximation error, as described in the previous post. But instead of optimizing the coefficients directly, we can use the weights of the biased coins in the +1 and +2 states as a proxy. Here is the relationship:

Figure 2

We can rearrange these two equations to get the values of the coefficients in terms of the weights of these two biased coins:

Figure 3
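In code, this rearrangement might look something like the following sketch. It assumes the linear polynomial has the form p(x) = a₀ + a₁x and that the biases of the coins in the +1 and +2 states are p(1) and p(2), respectively; the function name is hypothetical:

```python
def linear_coefficients(p1, p2):
    """Recover the coefficients of p(x) = a0 + a1 * x from the biases
    p1 = p(1) and p2 = p(2) of the coins in the +1 and +2 states."""
    a1 = p2 - p1        # slope determined by the two known points
    a0 = 2 * p1 - p2    # intercept, since a0 = p1 - a1
    return a0, a1
```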

Using these two coefficients in the polynomial formula, we can then compute the weights of each one of the biased coins in the model:

Figure 4

But there is one small problem. The computed value might be outside the range of possible values for the coin’s bias. Remember, a coin cannot land on heads more than 100% of the time or less than 0% of the time. We can get around this by capping the value at the lower and upper bounds:

Figure 5

This capping function can be composed with the polynomial function to give us a valid computed value for the weights of the biased coins:

Figure 6
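A minimal sketch of the capping function and its composition with the polynomial, assuming biases must lie in the range [0, 1]; the function names here are illustrative, not the ones from the accompanying source code:

```python
def cap(value, lower=0.0, upper=1.0):
    """Clamp a computed value to the valid range for a probability."""
    return max(lower, min(upper, value))

def coin_bias(x, coefficients):
    """Evaluate the polynomial at state x and cap the result so it is
    always a valid bias between 0 and 1."""
    value = sum(a * x ** i for i, a in enumerate(coefficients))
    return cap(value)
```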

So now, suppose we start with an initial guess for the weights of the coins in the +1 and +2 states. From this initial guess, we can compute the weights for all coins based on the polynomial formula. We can then compute the expected outcome of the coin toss game given the set of weights. Once we know the expected outcome, we can compare it to the target distribution we are trying to approximate. In this example, we’ll use the same exponential distribution that was used in the last two posts as our target distribution:

Figure 7
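The exact target distribution and error measure are defined in the earlier posts; the sketch below shows one plausible form. The rate parameter and the sum-of-squared-errors measure are assumptions, not necessarily the values used in this study:

```python
import math

def exponential_target(outcomes, rate=0.5):
    """A hypothetical discretized exponential target distribution over
    the possible outcomes, normalized to sum to one. The rate value
    here is an arbitrary placeholder."""
    weights = [math.exp(-rate * abs(k)) for k in outcomes]
    total = sum(weights)
    return [w / total for w in weights]

def approximation_error(computed, target):
    """Sum of squared differences between the computed outcome
    probabilities and the target probabilities."""
    return sum((c - t) ** 2 for c, t in zip(computed, target))
```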

With this technique, we can now compute the approximation error for all possible combinations of inputs. Since there are two variables in the case of a linear polynomial, we can create a surface plot or a heatmap of the error function. Here is a surface plot of the approximation error across the range of possible values:

Figure 8

To find the point with the smallest approximation error, we can start with an arbitrary initial guess and use a hill climbing algorithm to find the optimal approximation, just as we did in the previous post. But this time around, let’s use a hill climbing algorithm that can move diagonally in addition to vertically and horizontally. Here are the paths taken by the hill climbing algorithm from several different starting points:
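A hill climbing routine that considers diagonal moves might be sketched like this. The step size and stopping rule are assumptions, and the cost function would be the approximation error described earlier:

```python
def hill_climb(cost, start, step=0.01, max_iterations=100000):
    """Greedy minimization over a 2D grid of candidate points. Each
    iteration moves to the best of the eight neighboring points
    (horizontal, vertical, and diagonal) and stops when no neighbor
    improves the cost. Returns the path of visited points."""
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0)]
    x, y = start
    path = [(x, y)]
    for _ in range(max_iterations):
        candidates = [(cost(x + dx * step, y + dy * step),
                       x + dx * step, y + dy * step)
                      for dx, dy in moves]
        best_cost, best_x, best_y = min(candidates)
        if best_cost >= cost(x, y):
            break  # no neighbor improves on the current point
        x, y = best_x, best_y
        path.append((x, y))
    return path
```

Note that a greedy rule like this can stall on a local minimum, plateau, or ridge, which is exactly the behavior seen in one of the plots below.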

Figure 9
Figure 10
Figure 11
Figure 12

Notice that the last one doesn’t converge to the same point as the other three. Instead of converging to the global minimum, it arrives at what seems to be a local minimum, a flat plateau, or perhaps a narrow ridge. To avoid this kind of hazard, we need to be careful about choosing the starting point when using this optimization technique. The best bet is probably to choose a starting point as close to the optimum as possible. Here is the optimal result for the linear polynomial example:

Figure 13
Figure 14

Here is the value of the cost function at the optimal point:

Figure 15

Compare this result to the approximation for the same target distribution found in the previous post. In this case, the results are identical. But keep in mind these two results might not have been the same had we used a different target distribution. Let’s see what happens if we use this approximation technique with higher-order polynomials.

Quadratic Polynomial

If we use a quadratic polynomial instead of a linear polynomial, can we find a better approximation? Let’s find out. Here is the formula for a quadratic polynomial:

Figure 16

Using the same technique as before, we’ll use the biases of the coins in the +1, +2, and +3 states as a proxy for the coefficients. Here is the relationship:

Figure 17

To rearrange these three equations and express the coefficients in terms of the weights of the three biased coins, the best approach might be to use Gauss–Jordan elimination. We can represent the equations above in augmented matrix form, like this:

Figure 18

From here, we can perform a series of elementary row operations to convert the matrix above into reduced row echelon form:

Figure 19

Once we do that, we can easily express the three coefficients of the quadratic polynomial in terms of the weights of the three biased coins:

Figure 20
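The elimination result can also be checked numerically. Assuming the quadratic has the form p(x) = a₀ + a₁x + a₂x² and the biases of the coins in the +1, +2, and +3 states are p(1), p(2), and p(3), the coefficient matrix of the system is the same one represented in the augmented matrix above; the function name is hypothetical:

```python
import numpy as np

def quadratic_coefficients(p1, p2, p3):
    """Solve the system p(k) = pk for k = 1, 2, 3, where
    p(x) = a0 + a1 * x + a2 * x**2. Solving numerically gives the
    same coefficients as the Gauss-Jordan elimination."""
    A = np.array([[1.0, 1.0, 1.0],
                  [1.0, 2.0, 4.0],
                  [1.0, 3.0, 9.0]])
    return np.linalg.solve(A, np.array([p1, p2, p3]))
```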

And now that we know how to compute our coefficients, we can use the same technique that was used in the previous section to find the optimal solution. Here it is:

Figure 21
Figure 22

Here is the value of the cost function at the optimal point:

Figure 23

This result is very similar to the one found in the previous section. The fitted polynomial has only a slight curve to it. As measured by the cost function, it is only a marginal improvement. While it’s a good approximation, there doesn’t seem to be much benefit to using a quadratic polynomial over a linear polynomial, at least in this instance.

Cubic Polynomial

If we use a cubic polynomial instead of a quadratic polynomial, will the results be significantly better? Again, let’s find out. Here is the formula for a cubic polynomial:

Figure 24

Using the same technique again, we can use the biases of the coins in the +1, +2, +3, and +4 states as a proxy for the four coefficients. Here is the relationship:

Figure 25

We can use Gauss–Jordan elimination as we did before to express the four coefficients in terms of the weights of the four biased coins. And like we did before, we can use the hill climbing optimization method to find the optimal result:
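This step generalizes to any degree. The following hypothetical helper recovers the coefficients of a degree-n polynomial from the biases of the first n + 1 coins, assuming the same p(k) = pk correspondence used in the earlier sections:

```python
import numpy as np

def polynomial_coefficients(biases):
    """Given the biases of the coins in states +1 through +(n + 1),
    solve the Vandermonde system for the coefficients of the unique
    degree-n polynomial passing through those points."""
    n = len(biases)
    xs = np.arange(1, n + 1, dtype=float)
    A = np.vander(xs, n, increasing=True)  # columns are x**0, x**1, ...
    return np.linalg.solve(A, np.asarray(biases, dtype=float))
```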

Figure 26
Figure 27

Here is the value of the cost function at the optimal point:

Figure 28

The fitted polynomial here exhibits a noticeable curve. This result is quite a bit better than the one found using a quadratic polynomial. The error is much smaller, indicating a much better approximation. This approximation is also very similar to the exact results found by a different method in a previous study.

Successful Experiment

This has turned out to be a successful experiment. I want to explore this technique further. Specifically, I want to use this technique to find approximate solutions for larger models with many more than ten coin toss events per round. The code I’ve written for this study runs too slowly for larger models, but I think there is room for improvement in terms of runtime performance.

Also, I’m curious to know what degree of polynomial would be required to use this technique to find an exact solution instead of just an approximation. At most, I think a polynomial of degree equal to the number of coin toss events per round would be required. But I suspect a lower-degree polynomial might be sufficient. It might also be interesting to experiment with this approximation technique using sinusoidal functions instead of polynomials.

Accompanying source code is available on GitHub.
