Jim Killingsworth

Convolutions and First Differences

This is a study of the convolution operation and its applications to time series data and probability distributions. In this post, I first demonstrate the use of the convolution operation to find the first differences of some randomly generated time series data. I then show how to find the distribution of the first differences based on the distribution of the values in the original series. I also show how to work backwards using deconvolution, which is the inverse of the convolution operation.

Time Series Data

Suppose we generate a sequence of random numbers at equally spaced time intervals. The random numbers are integers ranging from negative two to positive two. This is our time series data. Let's use the following notation to represent them:

Figure 1

Using a random number generator, we can create a sample data set to work with. If we plot these numbers on a chart, this is what it looks like:

Figure 2

Now we want to find the first difference for this time series. That is to say, we want to compute the difference from one data point to the next, creating a new time series in the process. Here is a formula we can use to do this:

Figure 3
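As a concrete sketch of this formula, consider a small hypothetical sample of random integers standing in for the time series data:

```python
import random

# Hypothetical sample: random integers between -2 and 2 generated at
# equally spaced time intervals
random.seed(0)
x = [random.randint(-2, 2) for _ in range(10)]

# First differences: y[t] = x[t] - x[t-1]
y = [x[t] - x[t - 1] for t in range(1, len(x))]

# The output always contains one less data point than the input
assert len(y) == len(x) - 1
```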

The first differencing operation requires a sequential pair of input data points to produce a single output data point. Because of this, the output data set will always contain one less data point than the input data set. Since our input data starts at time zero, our output data starts at time one. It's simple enough to just compute the first differences using this formula. But there is an alternative method we can explore. Consider the following:

Figure 4

You can think of this as a filtering operation or an impulse response function. If we treat our time series data as a signal, we can use convolution as a general-purpose mechanism to apply various filters to the data—including the one above that computes first differences. Here is how to apply the filtering operation using convolution:

Figure 5

In theory, the convolution operation is a summation across an infinite range of values. Any function value that is not explicitly defined is just treated as a zero. In practice, however, it is only necessary to compute the sum across the range of values that actually exist:

Figure 6
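The finite summation can be sketched in code. This is a minimal illustration, not the author's implementation; the two-point kernel is the first-difference filter described above:

```python
def convolve(x, h):
    # Discrete convolution over the values that actually exist:
    # (x * h)[t] = sum over k of h[k] * x[t - k]
    n = len(x) + len(h) - 1
    out = []
    for t in range(n):
        total = 0
        for k in range(len(h)):
            if 0 <= t - k < len(x):
                total += h[k] * x[t - k]
        out.append(total)
    return out

x = [1, -2, 0, 2, 1]      # hypothetical input series
h = [1, -1]               # first-difference filter

y = convolve(x, h)

# Within the valid range, the convolution reproduces the first differences
assert y[1:-1] == [x[t] - x[t - 1] for t in range(1, len(x))]
```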

Evaluating this function for each point in time within the valid range, we can obtain a time series consisting of the first differences of the original input data. Here is a visual representation of the first differences:

Figure 7

I think it's worth mentioning here that the convolution operation is commutative. That means the order in which you place the operands doesn't matter:

Figure 8

One of the interesting consequences of this is that you can rearrange the convolution formula so that it requires fewer operations to compute. Consider the following rearrangement:

Figure 9

This yields the same results as before. But notice that the limits of the summation go from zero to one instead of needlessly summing across the entire span of the input data. Arranging it this way can have a significant performance benefit when dealing with large data sets.
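One way to see the performance benefit is to compare a hypothetical loop that sums across the whole input against one whose summation limits cover only the short filter:

```python
def convolve_over_input(x, h):
    # Sums across the entire span of the input for every output point
    n = len(x) + len(h) - 1
    return [sum(x[j] * h[t - j] for j in range(len(x)) if 0 <= t - j < len(h))
            for t in range(n)]

def convolve_over_kernel(x, h):
    # Rearranged: the summation limits cover only the short filter, so
    # the cost is O(len(x) * len(h)) instead of roughly O(len(x) ** 2)
    n = len(x) + len(h) - 1
    return [sum(h[k] * x[t - k] for k in range(len(h)) if 0 <= t - k < len(x))
            for t in range(n)]

x = [1, -2, 0, 2, 1]
h = [1, -1]

# Commutativity guarantees both arrangements give the same results
assert convolve_over_input(x, h) == convolve_over_kernel(x, h)
```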

Discrete Probability Distribution

In the example above, the time series data was produced by generating a sequence of random integers between negative two and positive two. These numbers were generated by taking samples of a random variable with the following probability mass function:

Figure 10

Here is a visual representation of this function:

Figure 11

If this mass function represents the distribution of the input time series data, what is the distribution of the first differences? That is the question we want to answer. And to answer this question, we can use the convolution operation again. Only this time around, we take the convolution of the probability mass function with itself. This works here because the mass function is symmetric; for a symmetric distribution, the difference of two independent values has the same distribution as their sum:

Figure 12

Like before, the limits of the summation need not span an infinite range of values for practical implementations. The convolution operation defined above gives us a new probability mass function that represents the distribution of the first differences. If you work it out, here is the result:

Figure 13

Here is a visual representation of this function:

Figure 14

As you can see, the distribution of first differences is more dispersed and has a wider range of possible outcomes. If the result of this convolution operation is not intuitive to you, then I would recommend that you stop here and work this example out manually on a piece of paper. That will help you understand it better.
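Working the example out in code may help as well. This sketch assumes the uniform mass function from above and convolves it with itself using exact fractions:

```python
from fractions import Fraction

# Uniform probability mass function on the integers -2..2
pmf = {v: Fraction(1, 5) for v in range(-2, 3)}

# Self-convolution of the mass function with itself
diff_pmf = {}
for a, pa in pmf.items():
    for b, pb in pmf.items():
        diff_pmf[a + b] = diff_pmf.get(a + b, 0) + pa * pb

# The result is a triangular distribution on the integers -4..4
assert diff_pmf[0] == Fraction(5, 25)
assert diff_pmf[4] == Fraction(1, 25)
assert sum(diff_pmf.values()) == 1
```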

Continuous Normal Distribution

What if the sequence of random numbers used to generate the time series is distributed according to a continuous probability distribution? How can we determine the distribution of the first differences in this case? For this example, suppose the random numbers are normally distributed. This can be represented by the following probability density function:

Figure 15

Here is a visual representation of this function in standard form using a mean value of zero and a standard deviation value of one:

Figure 16

To find the distribution of the first differences, we can use the convolution operation in the same manner as we did in the previous example. But instead of a sum, we use an integral:

Figure 17

This might be a difficult integral to solve. It might not have a closed-form solution. And you might have to resort to numerical methods to find an approximate solution. Using numerical integration, we can easily create a plot to see what this function looks like:

Figure 18

At first glance, this looks like a more dispersed version of the standard normal distribution that we started with. And indeed, that's what it is. I went ahead and worked out the integral. Here is the solution I came up with:

Figure 19

This takes the form of a normal distribution. Applying the convolution operation effectively doubles the mean and doubles the variance while maintaining the shape of the distribution. Doubling the mean and variance holds for the self-convolution of any distribution with finite moments, but the shape is maintained in this example only because it's a normal distribution. That part of the outcome may not hold for other types of probability distributions.
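As a numerical sanity check (using a hypothetical nonzero mean so the doubling is visible), the self-convolution can be compared against a normal density with doubled mean and variance:

```python
import math

def normal_pdf(x, mean, std):
    # Normal probability density function
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def self_convolution(t, mean, std, lo=-10.0, hi=10.0, n=2000):
    # Riemann-sum approximation of the convolution integral
    dt = (hi - lo) / n
    return sum(normal_pdf(lo + i * dt, mean, std)
               * normal_pdf(t - lo - i * dt, mean, std)
               for i in range(n)) * dt

mean, std = 0.5, 1.0
for t in (-1.0, 0.0, 1.0, 2.5):
    # Expected: a normal density with mean 2*mean and variance 2*std**2
    expected = normal_pdf(t, 2 * mean, math.sqrt(2) * std)
    assert abs(self_convolution(t, mean, std) - expected) < 1e-6
```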

Continuous Laplace Distribution

I am curious to see what happens if the distribution of the values in the initial time series data is a Laplace distribution instead of a normal distribution. What will the shape of the distribution of the first differences look like? Here is the density function of the Laplace distribution:

Figure 20

Here is a visual representation of this function using a location parameter value of zero and a scale parameter value of one:

Figure 21

We can apply the convolution operation the same way we did in the previous example. And then, using numerical integration, we can create a plot to see what the density function looks like for the first differences:

Figure 22

The shape of this density function looks like a cross between that of a Laplace distribution and that of a normal distribution. I wasn't able to work out the integral analytically. But my suspicion is that it takes the form of a generalized normal distribution with a shape parameter somewhere between one and two. I'll leave this as an unsolved problem for now.
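For reference, here is a numerical-integration sketch of the self-convolution (assuming location zero and scale one), which is one way a plot like the one above can be produced:

```python
import math

def laplace_pdf(x, scale=1.0):
    # Laplace density with location zero
    return math.exp(-abs(x) / scale) / (2 * scale)

def self_convolution(t, lo=-20.0, hi=20.0, n=4000):
    # Riemann-sum approximation of the convolution integral
    dt = (hi - lo) / n
    return sum(laplace_pdf(lo + i * dt) * laplace_pdf(t - lo - i * dt)
               for i in range(n)) * dt

# The resulting density is symmetric, peaked at zero, and sums to one
assert self_convolution(0.0) > self_convolution(1.0) > self_convolution(2.0)
assert abs(self_convolution(1.5) - self_convolution(-1.5)) < 1e-6
total = sum(self_convolution(-10 + k * 0.1) for k in range(200)) * 0.1
assert abs(total - 1.0) < 1e-2
```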

Deconvolution

Now let's say we know the distribution of the first differences. How do we work backwards to compute the distribution of the values in the original time series based on our knowledge of the distribution of the first differences? For this example, let's assume the first differences are distributed according to a Laplace distribution. Here is the density function:

Figure 23

Here is a visual representation of this function using a location parameter value of zero and a scale parameter value of two:

Figure 24

Given this function, can we find the inverse of the self-convolution operation demonstrated in the two previous examples? One approach we can use to answer this question is based on the convolution theorem. The convolution theorem states that the Fourier transform of the convolution of two functions is the product of the Fourier transforms of the two functions taken individually. To phrase this another way, we can say that convolution in the primary domain is equivalent to multiplication in the frequency domain:

Figure 25
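A discrete analogue of the theorem can be checked directly. Note that the discrete Fourier transform pairs with circular convolution, so this is an illustration of the relationship rather than the continuous transform used in this section:

```python
import cmath

def dft(x):
    # Discrete Fourier transform (naive O(n^2) form)
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def circular_convolve(x, h):
    # Circular convolution, the flavor that pairs with the DFT
    n = len(x)
    return [sum(x[j] * h[(t - j) % n] for j in range(n)) for t in range(n)]

x = [1.0, 2.0, 0.0, -1.0]     # hypothetical sequences
h = [0.5, 0.25, 0.0, 0.0]

# Transform of the convolution equals the product of the transforms
lhs = dft(circular_convolve(x, h))
rhs = [a * b for a, b in zip(dft(x), dft(h))]
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```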

Taking this relationship into consideration, we can see that the Fourier transform of the distribution of the time series data—the function we are trying to solve for—is equal to the square root of the Fourier transform of the distribution of the first differences:

Figure 26

We know the distribution of the first differences is a Laplace distribution. We can find the Fourier transform of the density function using the following formula:

Figure 27

The Fourier transform of the Laplace density function works out to this:

Figure 28
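This transform can be verified numerically. The sketch below assumes the angular-frequency convention, under which the transform of a zero-location Laplace density with scale b works out to 1/(1 + b²ω²); other conventions rescale the frequency axis:

```python
import math

def laplace_pdf(x, scale=2.0):
    # Laplace density with location zero and scale two
    return math.exp(-abs(x) / scale) / (2 * scale)

def fourier_transform(omega, lo=-60.0, hi=60.0, n=24000):
    # Numerical Fourier transform; the density is even, so the imaginary
    # part vanishes and a cosine transform suffices
    dx = (hi - lo) / n
    return sum(laplace_pdf(lo + i * dx) * math.cos(omega * (lo + i * dx))
               for i in range(n)) * dx

# Closed form under this convention: 1 / (1 + scale**2 * omega**2)
for omega in (0.0, 0.5, 1.0, 2.0):
    assert abs(fourier_transform(omega) - 1 / (1 + 4 * omega ** 2)) < 1e-3
```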

Taking the square root of this transform, we get the following:

Figure 29

And now we can take the inverse Fourier transform to get the result:

Figure 30

As with the previous example, I have yet to figure out how to extract a solution for this integral analytically. Fortunately, however, we can use numerical integration to plot the values and see what the result looks like on a chart:

Figure 31

This seems to have a similar shape to that of the Laplace distribution. But it seems to be taller and skinnier in the middle, and it seems to have flatter and broader tails. Like the previous example, I suspect this takes the form of a generalized normal distribution. But in this case, I think the shape parameter is less than one.
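A numerical sketch of the inverse transform (same assumed angular-frequency convention as above, with the square root applied to 1/(1 + b²ω²) for scale b = 2) reproduces this tall, narrow peak:

```python
import math

def sqrt_transform(omega, scale=2.0):
    # Square root of the Fourier transform of the Laplace density
    return 1 / math.sqrt(1 + scale ** 2 * omega ** 2)

def inverse_transform(x, cutoff=200.0, n=100000):
    # Numerical inverse Fourier transform; the transform is real and
    # even, so only the cosine part of the kernel contributes
    dw = cutoff / n
    return sum(sqrt_transform(i * dw) * math.cos(i * dw * x)
               for i in range(n)) * dw / math.pi

# Taller and skinnier in the middle, falling away from the peak
assert inverse_transform(0.5) > inverse_transform(1.0) > inverse_transform(2.0) > 0
```

The integrand decays slowly, like 1/ω, so the cutoff has to be fairly large for the truncated integral to settle down; this is only a rough plot-quality approximation.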

Accompanying source code is available on GitHub.
