Approximations with Polynomials
The previous post demonstrates the use of biases derived from a simple line formula to find an approximate solution to the weighted coin toss problem. In this post, I want to expand on some of these ideas using various polynomial formulas to describe the weights of the biased coins. As this experiment demonstrates, higher order polynomials do seem to yield better results.
Linear Polynomial
For the first example, let’s start with a linear polynomial. This is similar to the two-parameter formula used in the previous post, but with a larger range of possible slopes. The polynomial formula looks like this:
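$$f(x) = a_1 x + a_0$$

Here $x$ is the coin's state, and $a_1$ and $a_0$ are the slope and intercept coefficients.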
Our objective is to find the coefficients that give us the lowest approximation error, as described in the previous post. But instead of optimizing the coefficients directly, we can use the weights of the biased coins in the +1 and +2 states as a proxy. Here is the relationship:
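Writing $p_1$ and $p_2$ for the weights of the coins in the +1 and +2 states:

$$\begin{aligned}
f(1) &= a_1 + a_0 = p_1 \\
f(2) &= 2 a_1 + a_0 = p_2
\end{aligned}$$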
We can rearrange these two equations to get the values of the coefficients in terms of the weights of these two biased coins:
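$$\begin{aligned}
a_1 &= p_2 - p_1 \\
a_0 &= 2 p_1 - p_2
\end{aligned}$$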
Using these two coefficients in the polynomial formula, we can then compute the weights of each one of the biased coins in the model:
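$$p_n = f(n)$$

where $n$ ranges over the coin states in the model.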
But there is one small problem. The computed value might be outside the range of possible values for the coin’s bias. Remember, a coin cannot land on heads more than 100% of the time or less than 0% of the time. We can get around this by capping the value at the lower and upper bounds:
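Using $\operatorname{cap}$ for the capping function and $v$ for a computed value:

$$\operatorname{cap}(v) = \min(\max(v, 0), 1)$$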
This capping function can be composed with the polynomial function to give us a valid computed value for the weights of the biased coins:
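$$p_n = \operatorname{cap}(f(n)) = \min(\max(f(n), 0), 1)$$

In code, a minimal sketch of this computation might look like the following, assuming the coin states of interest run from +1 up to some count (the function and parameter names here are illustrative):

```python
def coin_weights(a0, a1, num_states):
    """Compute the weight of the biased coin in each state from the
    linear polynomial, capping each value to the valid range [0, 1]."""
    return [min(max(a1 * n + a0, 0.0), 1.0)
            for n in range(1, num_states + 1)]
```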
So now, suppose we start with an initial guess for the weights of the coins in the +1 and +2 states. From this initial guess, we can compute the weights for all coins based on the polynomial formula. We can then compute the expected outcome of the coin toss game given the set of weights. Once we know the expected outcome, we can compare that to the target distribution we are trying to approximate. In this example, we’ll use the same exponential distribution that was used in the last two posts as our target distribution:
With this technique, we can now compute the approximation error for all possible combinations of inputs. Since there are two variables in the case of a linear polynomial, we can create a surface plot or a heatmap of the error function. Here is a surface plot of the approximation error across the range of possible values:
To find the point with the smallest approximation error, we can start with an arbitrary initial guess and use a hill climbing algorithm to find the optimal approximation, just as we did in the previous post. But this time around, let’s use a hill climbing algorithm that can move diagonally in addition to vertically and horizontally. Here are the paths taken by the hill climbing algorithm from several different starting points:
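A minimal sketch of this kind of climber, in Python, might look like the following. Here `cost(p1, p2)` stands in for the approximation error function described above; the names, step size, and iteration limit are illustrative choices, not the original implementation.

```python
def hill_climb(cost, start, step=0.001, max_iterations=100000):
    """Greedy hill climbing toward a minimum of a two-variable cost
    function. Each iteration moves to the best of the eight neighboring
    points, allowing diagonal moves as well as vertical and horizontal
    ones, and stops when no neighbor improves on the current cost."""
    moves = [(dx, dy)
             for dx in (-step, 0.0, step)
             for dy in (-step, 0.0, step)
             if (dx, dy) != (0.0, 0.0)]
    point = start
    path = [point]
    for _ in range(max_iterations):
        candidates = [(point[0] + dx, point[1] + dy) for dx, dy in moves]
        best = min(candidates, key=lambda c: cost(*c))
        if cost(*best) >= cost(*point):
            break  # no neighboring point is better; stop climbing
        point = best
        path.append(point)
    return point, path
```

For the linear model, the cost function would take the weights of the +1 and +2 coins, derive the polynomial coefficients, compute the capped weights of all the coins, evaluate the expected outcome of the coin toss game, and measure the error against the target distribution.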
Notice that the last one doesn’t converge to the same point as the other three. Instead of converging to the global minimum, it arrives at a point that appears to be either a local minimum, a flat plateau, or perhaps a narrow ridge. To avoid this kind of hazard, we need to be careful about choosing the starting point when using this optimization technique. The best bet is probably to choose a starting point as close to the optimum as possible. Here is the optimal result for the linear polynomial example:
Here is the value of the cost function at the optimal point:
Compare this result to the approximation for the same target distribution found in the previous post. In this case, the results are identical. But keep in mind that these two results might not necessarily have been the same had we used a different target distribution. Let’s see what happens if we use this approximation technique with higher order polynomials.
Quadratic Polynomial
If we use a quadratic polynomial instead of a linear polynomial, can we find a better approximation? Let’s find out. Here is the formula for a quadratic polynomial:
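$$f(x) = a_2 x^2 + a_1 x + a_0$$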
Using the same technique as before, we’ll use the biases of the coins in the +1, +2, and +3 states as a proxy for the coefficients. Here is the relationship:
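With $p_1$, $p_2$, and $p_3$ as the weights of the coins in the +1, +2, and +3 states:

$$\begin{aligned}
f(1) &= a_2 + a_1 + a_0 = p_1 \\
f(2) &= 4 a_2 + 2 a_1 + a_0 = p_2 \\
f(3) &= 9 a_2 + 3 a_1 + a_0 = p_3
\end{aligned}$$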
To rearrange these three equations and express the coefficients in terms of the weights of the three biased coins, the best approach might be to use Gauss–Jordan elimination. We can represent the equations above in augmented matrix form, like this:
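With one column for each of $a_2$, $a_1$, and $a_0$:

$$\left[\begin{array}{ccc|c}
1 & 1 & 1 & p_1 \\
4 & 2 & 1 & p_2 \\
9 & 3 & 1 & p_3
\end{array}\right]$$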
From here, we can perform a series of elementary row operations to convert the matrix above into the reduced row echelon form:
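$$\left[\begin{array}{ccc|c}
1 & 0 & 0 & \tfrac{1}{2} p_1 - p_2 + \tfrac{1}{2} p_3 \\
0 & 1 & 0 & -\tfrac{5}{2} p_1 + 4 p_2 - \tfrac{3}{2} p_3 \\
0 & 0 & 1 & 3 p_1 - 3 p_2 + p_3
\end{array}\right]$$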
Once we do that, we can easily express the three coefficients of the quadratic polynomial in terms of the weights of the three biased coins:
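$$\begin{aligned}
a_2 &= \tfrac{1}{2} p_1 - p_2 + \tfrac{1}{2} p_3 \\
a_1 &= -\tfrac{5}{2} p_1 + 4 p_2 - \tfrac{3}{2} p_3 \\
a_0 &= 3 p_1 - 3 p_2 + p_3
\end{aligned}$$

As an aside, the same coefficients can be computed numerically with a linear solver instead of carrying out the row reduction by hand. Here is a sketch using NumPy; the function name is my own, and the same approach covers the cubic case in the next section:

```python
import numpy as np

def polynomial_coefficients(weights):
    """Solve for the polynomial coefficients given the weights of the
    coins in states +1 through +k, where k is the number of weights.
    Returns the coefficients ordered from the constant term upward."""
    k = len(weights)
    states = np.arange(1, k + 1)
    matrix = np.vander(states, k, increasing=True)  # rows are 1, n, n^2, ...
    return np.linalg.solve(matrix, np.asarray(weights, dtype=float))
```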
And now that we know how to compute our coefficients, we can use the same technique that was used in the previous section to find the optimal solution. Here it is:
Here is the value of the cost function at the optimal point:
This result is very similar to the one found in the previous section. The fitted polynomial has only a slight curve to it, and as measured by the cost function, it is only a marginal improvement. While it’s a good approximation, there doesn’t seem to be much benefit to using a quadratic polynomial over a linear polynomial, at least in this instance.
Cubic Polynomial
If we use a cubic polynomial instead of a quadratic polynomial, will the results be significantly better? Again, let’s find out. Here is the formula for a cubic polynomial:
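$$f(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$$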
Using the same technique again, we can use the biases of the coins in the +1, +2, +3, and +4 states as a proxy for the four coefficients. Here is the relationship:
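With $p_1$ through $p_4$ denoting the weights of the coins in the +1 through +4 states:

$$\begin{aligned}
f(1) &= a_3 + a_2 + a_1 + a_0 = p_1 \\
f(2) &= 8 a_3 + 4 a_2 + 2 a_1 + a_0 = p_2 \\
f(3) &= 27 a_3 + 9 a_2 + 3 a_1 + a_0 = p_3 \\
f(4) &= 64 a_3 + 16 a_2 + 4 a_1 + a_0 = p_4
\end{aligned}$$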
We can use Gauss–Jordan elimination, as we did before, to express the four coefficients in terms of the weights of the four biased coins. Then we can use the hill climbing optimization method to find the optimal result:
Here is the value of the cost function at the optimal point:
The fitted polynomial here exhibits a noticeable curve. This result is quite a bit better than the one found using a quadratic polynomial. The error is much smaller, indicating a much better approximation. Comparing this approximation to the exact results found by a different method in a previous study, the two are very similar.
Successful Experiment
This has turned out to be a successful experiment. I want to explore this technique further. Specifically, I want to use this technique to find approximate solutions for larger models with a lot more than ten coin toss events per round. The code I’ve written for this study runs too slowly for larger models, but I think there is room for improvement in terms of runtime performance.
Also, I’m curious to know what degree polynomial would be required to use this technique to find an exact solution instead of just an approximation. At most, I think a polynomial of degree equal to the number of coin toss events per round would be required. But I suspect a lower-degree polynomial might be sufficient. It might also be interesting to experiment with this approximation technique using sinusoidal functions instead of polynomials.