Higher Order Polynomial Approximation
This is an extension of one of my earlier posts on polynomial approximations. Previously, I showed how to find approximate solutions to the weighted coin toss problem using first, second, and third order polynomials to describe the weights of the biased coins. In this post, I demonstrate a generalized method for applying this technique using higher order polynomials.
Generalized Method
Assume we’re using a model of the coin toss game with an arbitrary number of coin toss events per round. The purpose of this technique is to use a polynomial of an arbitrary degree to describe the weights of the biased coins. Given a polynomial of degree $m$, let’s define the following:

$$k = m + 1$$
This number tells us how many coefficients are in our polynomial. It also defines the size of the $k \times k$ matrix we will use to compute the values of the coefficients. Let’s assume this number is smaller than the number of coin toss events, which we’ll denote by $n$:

$$k < n$$
Here is what our polynomial function looks like in expanded form:

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$$
Here is what it looks like using a more compact notation:

$$f(x) = \sum_{j=0}^{m} a_j x^j$$
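As a quick illustration, here is one way this polynomial could be evaluated in Python. This is just a sketch; the function name and the convention of storing the coefficients from $a_0$ up to $a_m$ are my own choices:

```python
def poly_eval(a, x):
    """Evaluate a_0 + a_1*x + ... + a_m*x**m using Horner's scheme.

    The list a holds the coefficients ordered from the constant
    term a_0 up to the leading term a_m.
    """
    result = 0.0
    for coefficient in reversed(a):
        result = result * x + coefficient
    return result
```

For example, `poly_eval([1.0, 0.5, 0.25], 2)` computes $1 + 0.5 \cdot 2 + 0.25 \cdot 4 = 3$.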
Recall from the last few posts the iterative procedure we used to find approximate solutions to the coin toss problem using polynomials to represent the weights of the biased coins. What we need to do now is find the values of the coefficients based on a subset of the weights of the biased coins. Taking the first $k$ weights as this subset, let’s use the following notation to represent it as a vector:

$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix}$$
This subset of the weights represents a guess for a particular iteration of the optimization process. It is a set of arguments we can use to find the corresponding coefficients. And knowing the coefficients, we can then compute the remaining weights based on the polynomial. Let’s use the following notation to represent the coefficients as a vector:

$$\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}$$
Once we know the values of the coefficients, we can compute the values for the remaining weights and then calculate the value of the cost function. We repeat this process at each iteration to find incrementally better solutions. But how do we compute the values of the coefficients? Consider the following matrix, whose row for each point $i = 1, \dots, k$ holds the powers $i^0, i^1, \dots, i^m$:

$$M = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 4 & \cdots & 2^m \\ 1 & 3 & 9 & \cdots & 3^m \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & k & k^2 & \cdots & k^m \end{bmatrix}$$
With this matrix, we can set up a system of equations. The matrix multiplied by the coefficient vector is equal to the argument vector representing a subset of the weights of the biased coins. Here is this relationship expressed as a matrix equation:

$$M \mathbf{a} = \mathbf{p}$$
Thus, we have a system of equations. Using the inverse of the matrix, we can rearrange the matrix equation above to express the coefficients in terms of the inputs representing a subset of the weights of the biased coins:

$$\mathbf{a} = M^{-1} \mathbf{p}$$
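Here is a short numpy sketch of this computation. It assumes, as above, that the subset consists of the first $k$ weights, so the matrix is the Vandermonde matrix for the points $1$ through $k$:

```python
import numpy as np

def coefficient_matrix(k):
    """Build the k-by-k matrix whose row for point i holds the
    powers i**0, i**1, ..., i**(k - 1), for i = 1, ..., k."""
    points = np.arange(1, k + 1)
    return np.vander(points, N=k, increasing=True).astype(float)

# The matrix and its inverse only need to be computed once.
k = 3
inverse = np.linalg.inv(coefficient_matrix(k))

# Recover the coefficients from a guess for the first k weights.
weight_subset = np.array([0.2, 0.3, 0.5])
coefficients = inverse @ weight_subset
```

In practice, `np.linalg.solve` would be the more numerically stable choice, but the explicit inverse mirrors the matrix equation above.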
Once we have computed the coefficients, we can use the polynomial to compute all of the weights of the biased coins in the same manner as described in the last few posts. Note that we only need to compute the matrix and its inverse once; they do not need to be recomputed for each iteration of the optimization process. In the examples that follow, we will use the Nelder–Mead method instead of the hill climbing method as the iterative procedure for finding the optimal solution.
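Here is a rough sketch of how the whole procedure can be wired into the Nelder–Mead implementation in scipy. The `cost` function and the number of coin toss events `n` are stand-ins for the definitions given in the earlier posts:

```python
import numpy as np
from scipy.optimize import minimize

def find_approximation(k, n, cost, initial_guess):
    """Minimize the cost over guesses for the first k weights.

    cost is a stand-in for the cost function from the earlier posts;
    it takes the full vector of n weights and returns a score.
    """
    # Computed once, outside of the iteration loop.
    points = np.arange(1, k + 1)
    inverse = np.linalg.inv(np.vander(points, N=k, increasing=True).astype(float))
    # Powers of every point 1..n, for filling in all of the weights.
    all_points = np.vander(np.arange(1, n + 1), N=k, increasing=True).astype(float)

    def objective(weight_subset):
        coefficients = inverse @ weight_subset  # a = M^{-1} p
        weights = all_points @ coefficients     # evaluate the polynomial at 1..n
        return cost(weights)

    result = minimize(objective, initial_guess, method='Nelder-Mead')
    return result.x, result.fun
```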
Example with Second Order Polynomial
To illustrate how this technique works, let’s do an example using a second order polynomial. This example is similar to another example illustrated in a previous post. We’ll start with the following definition:

$$k = 2 + 1 = 3$$
Here is what our second order polynomial function looks like:

$$f(x) = a_0 + a_1 x + a_2 x^2$$
Here is the matrix for a second order polynomial:

$$M = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{bmatrix}$$
Here is the inverse of the matrix:

$$M^{-1} = \begin{bmatrix} 3 & -3 & 1 \\ -\tfrac{5}{2} & 4 & -\tfrac{3}{2} \\ \tfrac{1}{2} & -1 & \tfrac{1}{2} \end{bmatrix}$$
And here is how we express the coefficients in terms of the weights:

$$\begin{aligned} a_0 &= 3 p_1 - 3 p_2 + p_3 \\ a_1 &= -\tfrac{5}{2} p_1 + 4 p_2 - \tfrac{3}{2} p_3 \\ a_2 &= \tfrac{1}{2} p_1 - p_2 + \tfrac{1}{2} p_3 \end{aligned}$$
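We can double-check this inverse numerically with a few lines of numpy:

```python
import numpy as np

matrix = np.array([[1.0, 1.0, 1.0],
                   [1.0, 2.0, 4.0],
                   [1.0, 3.0, 9.0]])
inverse = np.linalg.inv(matrix)
print(inverse)                                   # matches the inverse shown above
assert np.allclose(matrix @ inverse, np.eye(3))  # round-trip check
```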
Consistent with examples from some of the previous posts, let’s use the following probability mass function as the target distribution we want to find an approximation for:
Here is the solution found after 161 iterations:
Here is the value of the cost function at the optimal point:
As you can see from the value of the cost function at the optimal point, this is not an exact solution, but it is still a pretty good approximation. And it matches the solution found in the other example.
Example with Sixth Order Polynomial
To demonstrate the capabilities of this approximation technique, let’s work through another example, this time using a sixth order polynomial. Using the same approach as before, we’ll start with the following definition:

$$k = 6 + 1 = 7$$
Here is what our sixth order polynomial function looks like:

$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6$$
To be consistent with the example in the previous section, we’ll use the same target distribution. Here is the solution found after 1,308 iterations:
Here is the value of the cost function at the optimal point:
Based on the value of the cost function at the optimal point, this appears to be an exact solution. Applying this technique with fourth and fifth order polynomials also yields seemingly exact solutions, and it does so with fewer and less computationally expensive iterations. It is not clear to me what minimum polynomial degree is required to find an exact solution in the general case, where there is an arbitrary number of coin toss events.
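One way to probe this question empirically is to sweep over polynomial degrees and compare the final cost values, reusing the `find_approximation` sketch from earlier. The cost function below is a toy stand-in; the real experiment would plug in the cost function from the earlier posts:

```python
import numpy as np

n = 10
target = np.linspace(0.2, 0.8, n)

def toy_cost(weights):
    # Toy stand-in: distance from an arbitrary target vector.
    return float(np.sum((weights - target) ** 2))

for degree in range(2, 7):
    k = degree + 1
    guess = np.full(k, 0.5)  # arbitrary starting point
    _, score = find_approximation(k, n, toy_cost, guess)
    print(f"degree {degree}: cost {score:.8f}")
```

Running a sweep like this with the real cost function might shed some light on where exact solutions start to appear.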