Convolutions and First Differences
This is a study of the convolution operation and its applications to time series data and probability distributions. In this post, I first demonstrate the use of the convolution operation to find the first differences of some randomly generated time series data. I then show how to find the distribution of the first differences based on the distribution of the values in the original series. I also show how to work backwards using deconvolution, which is the inverse of the convolution operation.
Time Series Data
Suppose we generate a sequence of random numbers at equally spaced time intervals. The random numbers are integers ranging from negative two to positive two. This is our time series data. Let’s use the following notation to represent them:
Using a random number generator, we can create a sample data set to work with. If we plot these numbers on a chart, this is what it looks like:
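For readers who want to follow along, here is a minimal sketch of how such a series might be generated with numpy. The uniform choice over the five integers and the seed are assumptions made purely for illustration; the actual series behind the chart was drawn from the probability mass function discussed later in this post.

```python
import numpy as np

# Sample a series of integers in [-2, 2] at equally spaced time
# steps. A uniform choice is assumed here for illustration; the
# actual series uses the mass function given later in the post.
rng = np.random.default_rng(seed=0)
x = rng.integers(-2, 3, size=20)  # upper bound is exclusive
print(x)
```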
Now we want to find the first difference for this time series. That is to say, we want to compute the difference from one data point to the next, creating a new time series in the process. Here is a formula we can use to do this:
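Continuing with the x array from the sketch above, the formula translates directly into code:

```python
# First differences computed directly from the formula: each output
# point is x[t] - x[t-1], so the result has one fewer point than x.
y = x[1:] - x[:-1]
assert np.array_equal(y, np.diff(x))  # numpy's built-in equivalent
```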
The first differencing operation requires a sequential pair of input data points to produce a single output data point. Because of this, the output data set will always contain one less data point than the input data set. Since our input data starts at time zero, our output data starts at time one. It’s simple enough to just compute the first differences using this formula. But there is an alternative method we can explore. Consider the following:
You can think of this as a filtering operation or an impulse response function. If we treat our time series data as a signal, we can use convolution as a general-purpose mechanism to apply various filters to the data—including the one above that computes first differences. Here is how to apply the filtering operation using convolution:
In theory, the convolution operation is a summation across an infinite range of values. Any function value that is not explicitly defined is just treated as a zero. In practice, however, it is only necessary to compute the sum across the range of values that actually exist:
Evaluating this function for each point in time within the valid range, we can obtain a time series consisting of the first differences of the original input data. Here is a visual representation of the first differences:
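In code, the whole filtering operation collapses to a single library call. Here is a sketch using numpy, again reusing the x array from above; the 'valid' mode restricts the output to the range where the filter fully overlaps the data:

```python
# Apply the differencing filter by discrete convolution. With
# h[0] = 1 and h[1] = -1, the convolution sum works out to
# (x * h)[t] = x[t] * h[0] + x[t-1] * h[1] = x[t] - x[t-1].
h = np.array([1, -1])
y_conv = np.convolve(x, h, mode='valid')
assert np.array_equal(y_conv, np.diff(x))
```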
I think it’s worth mentioning here that the convolution operation is commutative. That means the order in which you place the operands doesn’t matter:
One of the interesting consequences of this is that you can rearrange the convolution formula so that it requires fewer operations to compute. Consider the following rearrangement:
This yields the same results as before. But notice that the limits of the summation go from zero to one instead of needlessly summing across the entire span of the input data. Arranging it this way can have a significant performance benefit when dealing with large data sets.
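Here is a sketch of what this rearrangement looks like in code. The inner sum runs over the two filter taps instead of the whole input series:

```python
# Kernel-centric convolution: for each output point, sum across the
# filter taps rather than across the entire input series.
def filtered(x, h):
    n, m = len(x), len(h)
    return [sum(h[k] * x[t - k] for k in range(m))
            for t in range(m - 1, n)]

assert filtered(list(x), [1, -1]) == list(np.diff(x))
```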
Discrete Probability Distribution
In the example above, the time series data was produced by generating a sequence of random integers between negative two and positive two. These numbers were generated by taking samples of a random variable with the following probability mass function:
Here is a visual representation of this function:
If this mass function represents the distribution of the input time series data, what is the distribution of the first differences? That is the question we want to answer. And to answer it, we can use the convolution operation once again. Only this time around, we take the convolution of the probability mass function with itself. Strictly speaking, the difference of two independent values calls for convolving the mass function with its mirror image; because the mass function here is symmetric, that amounts to the same thing:
Like before, the limits of the summation need not span an infinite range of values for practical implementations. The convolution operation defined above gives us a new probability mass function that represents the distribution of the first differences. If you work it out, here is the result:
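If you would rather let the computer do the bookkeeping, the self-convolution is a one-liner with numpy. The uniform probabilities below are an assumption made for illustration; substitute the actual mass function and the mechanics are identical:

```python
# Self-convolution of a mass function on {-2, ..., 2}, assumed
# uniform here. The result is a mass function on {-4, ..., 4},
# with length 2 * 5 - 1 = 9.
pmf = np.full(5, 1 / 5)
diff_pmf = np.convolve(pmf, pmf)
for v, p in zip(range(-4, 5), diff_pmf):
    print(f"{v:+d}: {p:.4f}")
```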
Here is a visual representation of this function:
As you can see, the distribution of first differences is more dispersed and has a wider range of possible outcomes. If the result of this convolution operation is not intuitive to you, then I would recommend that you stop here and work this example out manually on a piece of paper. That will help you understand it better.
Continuous Normal Distribution
What if the sequence of random numbers used to generate the time series is distributed according to a continuous probability distribution? How can we determine the distribution of the first differences in this case? For this example, suppose the random numbers are normally distributed. This can be represented by the following probability density function:
Here is a visual representation of this function in standard form using a mean value of zero and a standard deviation value of one:
To find the distribution of the first differences, we can use the convolution operation in the same manner as we did in the previous example. But instead of a sum, we use an integral:
This might be a difficult integral to solve. If there is no closed-form solution, you might have to resort to numerical methods to find an approximate one. Using numerical integration, we can easily create a plot to see what this function looks like:
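Here is a sketch of the numerical evaluation behind such a plot, using scipy’s quad routine, which handles the infinite limits of the convolution integral directly. The standard normal parameters are the ones used for the chart above:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Self-convolution of the standard normal density, evaluated
# numerically: g(y) = integral of f(tau) * f(y - tau) over all tau.
f = norm(0.0, 1.0).pdf

def g(y):
    val, _ = quad(lambda tau: f(tau) * f(y - tau), -np.inf, np.inf)
    return val

print([round(g(y), 5) for y in (-4, -2, 0, 2, 4)])
```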
At first glance, this looks like a more dispersed version of the standard normal distribution that we started with. And indeed, that’s what it is. I went ahead and worked out the integral. Here is the solution I came up with:
This takes the form of a normal distribution. Applying the convolution operation effectively doubles the mean and doubles the variance while maintaining the shape of the distribution. The shape is preserved in this example only because the input is a normal distribution; this outcome may not hold for other types of probability distributions.
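As a quick sanity check on that closed form, we can compare the numerical convolution from the sketch above against a normal density with doubled mean and doubled variance:

```python
# The self-convolution of N(0, 1) should match N(0, 2): a normal
# density with mean 2 * 0 = 0 and standard deviation sqrt(2).
doubled = norm(0.0, np.sqrt(2.0)).pdf
for y in (-3.0, -1.0, 0.0, 1.0, 3.0):
    assert abs(g(y) - doubled(y)) < 1e-8
```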
Continuous Laplace Distribution
I am curious to see what happens if the distribution of the values in the initial time series data is a Laplace distribution instead of a normal distribution. What will the shape of the distribution of the first differences look like? Here is the density function of the Laplace distribution:
Here is a visual representation of this function using a location parameter value of zero and a scale parameter value of one:
We can apply the convolution operation the same way we did in the previous example. And then, using numerical integration, we can create a plot to see what the density function looks like for the first differences:
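Here is the same numerical sketch with the Laplace density swapped in, using the location and scale values from the chart above:

```python
from scipy.stats import laplace

# Self-convolution of the Laplace density with location 0 and
# scale 1, evaluated numerically just like the normal case above.
f_lap = laplace(0.0, 1.0).pdf

def g_lap(y):
    val, _ = quad(lambda tau: f_lap(tau) * f_lap(y - tau),
                  -np.inf, np.inf)
    return val

print([round(g_lap(y), 5) for y in (-4, -2, 0, 2, 4)])
```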
The shape of this density function looks like a cross between that of a Laplace distribution and that of a normal distribution. I wasn’t able to work out the integral analytically. But my suspicion is that it takes the form of a generalized normal distribution with a shape parameter somewhere between one and two. I’ll leave this as an unsolved problem for now.
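One way to poke at this suspicion numerically is to fit a generalized normal distribution to a large sample of simulated first differences. The sample size and seed below are arbitrary choices, and the fitted value is only suggestive, not a proof:

```python
from scipy.stats import gennorm

# Draw Laplace samples, form independent pairwise differences, and
# fit a generalized normal. In scipy's parameterization, the shape
# beta = 1 gives a Laplace distribution and beta = 2 gives a normal.
samples = laplace(0.0, 1.0).rvs(size=200_000, random_state=1)
d = samples[1::2] - samples[0::2]
beta, loc, scale = gennorm.fit(d)
print(f"fitted shape parameter: {beta:.3f}")
```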
Deconvolution
Now let’s say we know the distribution of the first differences. How do we work backwards to compute the distribution of the values in the original time series based on our knowledge of the distribution of the first differences? For this example, let’s assume the first differences are distributed according to a Laplace distribution. Here is the density function:
Here is a visual representation of this function using a location parameter value of zero and a scale parameter value of two:
Given this function, can we find the inverse of the self-convolution operation demonstrated in the two previous examples? One approach we can use to answer this question is based on the convolution theorem. The convolution theorem states that the Fourier transform of the convolution of two functions is the product of the Fourier transforms of the two functions taken individually. To phrase this another way, we can say that convolution in the original domain is equivalent to multiplication in the frequency domain:
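This relationship is easy to check numerically. In the sketch below, both sequences are zero-padded to the length of their full linear convolution before taking the discrete transforms:

```python
# Convolution theorem check: the DFT of a convolution equals the
# product of the DFTs, once both inputs are padded out to the
# output length n = len(a) + len(b) - 1.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.5, -1.0, 0.25])
n = len(a) + len(b) - 1

lhs = np.fft.fft(np.convolve(a, b))
rhs = np.fft.fft(a, n) * np.fft.fft(b, n)
assert np.allclose(lhs, rhs)
```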
Taking this relationship into consideration, we can see that the Fourier transform of the distribution of the time series data—the function we are trying to solve for—is equal to the square root of the Fourier transform of the distribution of the first differences:
We know the distribution of the first differences is a Laplace distribution. We can find the Fourier transform of the density function using the following formula:
The Fourier transform of the Laplace density function works out to this:
Taking the square root of the above, we get the following:
And now we can take the inverse Fourier transform to get the result:
As with the previous example, I have yet to figure out how to extract a solution for this integral analytically. Fortunately, however, we can use numerical integration to plot the values and see what the result looks like on a chart:
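Here is a sketch of that numerical computation. It assumes the non-unitary transform convention, under which the Laplace density with scale b has the transform 1/(1 + (bω)²); after taking the square root, the inverse transform reduces to a cosine integral because the function is real and even. Scipy’s weighted quad routine is built for oscillatory integrals of exactly this kind:

```python
import numpy as np
from scipy.integrate import quad

# Inverse Fourier transform of sqrt(F), where F(w) = 1/(1 + (b*w)^2)
# is the transform of the Laplace density (assumed convention:
# F(w) = integral of f(x) * exp(-i*w*x) dx). Since sqrt(F) is real
# and even, the inverse reduces to f(x) = (1/pi) times the integral
# over [0, inf) of cos(w*x) / sqrt(1 + (b*w)^2) dw. The integral
# diverges logarithmically at x = 0, so we evaluate away from zero.
b = 2.0  # scale parameter of the Laplace distribution

def density(x):
    val, _ = quad(lambda w: 1.0 / np.sqrt(1.0 + (b * w) ** 2),
                  0.0, np.inf, weight='cos', wvar=x)
    return val / np.pi

print([round(density(x), 4) for x in (0.25, 0.5, 1.0, 2.0, 4.0)])
```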
This seems to have a similar shape to that of the Laplace distribution. But it appears taller and skinnier in the middle, with flatter and broader tails. Like the previous example, I suspect this takes the form of a generalized normal distribution. But in this case, I think the shape parameter is less than one.