Jim Killingsworth

Generalized Normal Distributions

The generalized normal distribution is a family of probability distributions that vary according to a shape parameter. The symmetrical variant of this distribution may go by other names, such as the generalized error distribution or the generalized Gaussian distribution. In this post, we will explore this probability distribution and its relationship with the normal distribution and the Laplace distribution. I'll also show some examples illustrating the use of the maximum likelihood method to estimate the parameters of the distribution using real-life data.

Probability Density Function

The generalized normal distribution is a continuous probability distribution with three parameters: a location parameter, a scale parameter, and a shape parameter. The probability density function takes the following form:

$$ f(x \mid \mu, \alpha, \beta) = \frac{\beta}{2 \alpha \, \Gamma(1/\beta)} \exp\!\left( -\left( \frac{|x - \mu|}{\alpha} \right)^{\beta} \right) $$
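As a concrete reference, the density function can be sketched in Python. The function name `gnormal_pdf` is chosen here for illustration; `math.gamma` supplies the gamma function:

```python
import math

def gnormal_pdf(x, mu, alpha, beta):
    """Density of the generalized normal distribution.

    mu    -- location parameter (any real number)
    alpha -- scale parameter (positive)
    beta  -- shape parameter (positive)
    """
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))
```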

The location parameter can be any real number. The scale parameter and the shape parameter are always positive real numbers. Note the use of the gamma function above. The gamma function looks like this:

$$ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} \, dt $$

For practical purposes, we can just use a numerical method to approximate the gamma function. I am using a third-party implementation of the Lanczos approximation for the illustrations in this post. If we hold the location and scale parameters constant and then vary the shape parameter, we can see what the shape of the density function looks like for different values of the shape parameter. Here are some illustrations:

Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
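As mentioned above, the gamma function can be approximated numerically with the Lanczos method. The author uses a third-party implementation; for the curious, a minimal self-contained sketch using the widely published g = 7 coefficient set might look like this:

```python
import math

# widely published Lanczos coefficients for g = 7, n = 9
LANCZOS_G = 7.0
LANCZOS_COEFFS = [
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
]

def lanczos_gamma(z):
    """Approximate the gamma function for real arguments."""
    if z < 0.5:
        # reflection formula handles arguments below one half
        return math.pi / (math.sin(math.pi * z) * lanczos_gamma(1.0 - z))
    z -= 1.0
    total = LANCZOS_COEFFS[0]
    for i in range(1, len(LANCZOS_COEFFS)):
        total += LANCZOS_COEFFS[i] / (z + i)
    t = z + LANCZOS_G + 0.5
    return math.sqrt(2.0 * math.pi) * t ** (z + 0.5) * math.exp(-t) * total
```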

If you think the density function looks like that of a Laplace distribution when the shape parameter is equal to one, you would be correct. And if you think the density function looks like that of a normal distribution when the shape parameter is equal to two, you would be correct again. Indeed, both the normal distribution and the Laplace distribution are special cases of the generalized normal distribution. In the limit as the shape parameter approaches infinity, the generalized normal distribution converges to a uniform distribution.

Normal Distribution

The normal distribution is a special case of the generalized normal distribution when the shape parameter is equal to two. Consider the following:

$$ \beta = 2 $$

Plug these values into the density function and replace the scale parameter with the following:

$$ \alpha = \sqrt{2} \, \sigma $$

We now have a familiar representation of the normal distribution:

$$ f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) $$

As you can see, by holding the shape parameter at a fixed value of two, the generalized normal distribution can be treated like a regular normal distribution.
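This equivalence is easy to verify numerically. The following sketch (the function names are mine) checks that the generalized normal density with a shape parameter of two and a scale parameter of sigma times the square root of two matches the normal density at a handful of points:

```python
import math

def gnormal_pdf(x, mu, alpha, beta):
    # generalized normal density
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))

def normal_pdf(x, mu, sigma):
    # standard parameterization of the normal density
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# with beta = 2 and alpha = sqrt(2) * sigma, the two densities coincide
mu, sigma = 0.5, 1.3
max_delta = max(abs(gnormal_pdf(x, mu, math.sqrt(2.0) * sigma, 2.0) - normal_pdf(x, mu, sigma))
                for x in (-2.0, 0.0, 0.5, 3.1))
```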

Laplace Distribution

The Laplace distribution is a special case of the generalized normal distribution when the shape parameter is equal to one. Consider the following:

$$ \beta = 1 $$

Plug these values into the density function and replace the scale parameter with the following:

$$ \alpha = b $$

We now have a familiar representation of the Laplace distribution:

$$ f(x \mid \mu, b) = \frac{1}{2b} \exp\!\left( -\frac{|x - \mu|}{b} \right) $$

As you can see, by holding the shape parameter at a fixed value of one, the generalized normal distribution can be treated like a Laplace distribution.
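The same numerical check works here. With a shape parameter of one and the scale parameter set equal to the Laplace scale, the two densities agree (again, the function names below are mine):

```python
import math

def gnormal_pdf(x, mu, alpha, beta):
    # generalized normal density
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))

def laplace_pdf(x, mu, b):
    # standard parameterization of the Laplace density
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

# with beta = 1 and alpha = b, the two densities coincide
mu, b = -0.2, 0.8
max_delta = max(abs(gnormal_pdf(x, mu, b, 1.0) - laplace_pdf(x, mu, b))
                for x in (-3.0, -0.2, 0.0, 1.7))
```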

Numerical Parameter Estimation

If you have a set of observed data that is distributed according to a known probability distribution, you can use the maximum likelihood method to estimate the parameters of the distribution. If the distribution is a normal distribution or a Laplace distribution, the parameter values can be solved for analytically by taking the partial derivative of the likelihood function with respect to each of the parameters. You can reference my earlier post titled Normal and Laplace Distributions for a deeper explanation. But what if taking the derivative is difficult or impossible to do? Consider the likelihood function for the generalized normal distribution:

$$ L(\mu, \alpha, \beta \mid x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{\beta}{2 \alpha \, \Gamma(1/\beta)} \exp\!\left( -\left( \frac{|x_i - \mu|}{\alpha} \right)^{\beta} \right) $$

To fit the generalized normal distribution to an observed set of data, we need to find the parameter values that maximize this function. Instead of coming up with an analytical solution, we can use a numerical optimization method. Taking this approach, we need to come up with a cost function that our optimization method can evaluate iteratively. Here is the cost function that we will use in the examples in the following sections:

$$ C(\mu, \alpha, \beta) = -\ln L = -\sum_{i=1}^{n} \left[ \ln \beta - \ln(2\alpha) - \ln \Gamma(1/\beta) - \left( \frac{|x_i - \mu|}{\alpha} \right)^{\beta} \right] $$

This is just the negation of the logarithm of the likelihood function. We take the negative because, in the examples in the following sections, we're going to use an implementation of the Nelder–Mead optimization method, which finds minima instead of maxima. And by using the logarithm of the likelihood function, we avoid dealing with numbers that are too large for a double-precision floating-point number. Since our chosen optimization method requires an initial guess of the parameter values, we can start by giving the shape parameter a value of two:

$$ \beta_0 = 2 $$

This would imply a normal distribution, so we might also set the initial guess for the location and scale parameters accordingly:

$$ \mu_0 = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \alpha_0 = \sqrt{2} \, \hat{\sigma}, \quad \text{where} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_0)^2 $$

This puts our initial parameter estimates in the right ballpark. Our numerical optimization algorithm can then iteratively refine the estimates until some terminating criterion is met. If we know that the distribution of our data more closely resembles a Laplace distribution, as with the data studied in my post titled The Distribution of Price Fluctuations, we might choose an initial guess based on the parameters fitted to a Laplace distribution instead. The numerical approximation should come out roughly the same in either case.
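The whole fitting procedure can be sketched as follows. The post doesn't specify a library, so SciPy's Nelder–Mead implementation and the synthetic data set are stand-ins for illustration; `math.lgamma` keeps the coefficient in log space for stability:

```python
import math
import numpy as np
from scipy.optimize import minimize

def cost(params, xs):
    """Negative log-likelihood of the generalized normal distribution."""
    mu, alpha, beta = params
    if alpha <= 0.0 or beta <= 0.0:
        return math.inf  # steer the optimizer away from invalid parameters
    log_coeff = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    return -(len(xs) * log_coeff - float(np.sum((np.abs(xs - mu) / alpha) ** beta)))

# synthetic stand-in for observed data: normally distributed samples,
# so the fitted shape parameter should come out near two
rng = np.random.default_rng(42)
xs = rng.normal(loc=0.1, scale=0.5, size=4000)

# initial guess: a shape of two implies a normal distribution, so seed
# the location and scale from the sample mean and standard deviation
guess = [xs.mean(), math.sqrt(2.0) * xs.std(), 2.0]

result = minimize(cost, guess, args=(xs,), method="Nelder-Mead")
mu_hat, alpha_hat, beta_hat = result.x
```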

Microsoft Stock Prices

Now let's take a look at some examples of fitting the generalized normal distribution to data in the wild. For this example, we'll use the historical stock prices of Microsoft Corporation going back to 1986. We'll take the logarithm of the daily closing prices, compute the first differences, and then put the data in a histogram. The following charts show the histogram overlaid with the fitted normal, Laplace, and generalized normal density functions, respectively:

Figure 20
Figure 21
Figure 22

The fitted generalized normal distribution has the following shape parameter:

Figure 23

This value is very close to one, meaning that the shape of the density function is very close to that of the Laplace distribution. Eyeballing the charts above, you can't really tell the difference between the fitted Laplace density function and the fitted generalized normal density function.
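For reference, the preprocessing step used in these examples (the logarithm of the closing prices, then the first differences) can be sketched as follows; the price values here are made up for illustration:

```python
import math

# hypothetical daily closing prices (the post uses historical quotes)
closes = [25.00, 25.50, 24.80, 26.10, 26.05]

# take the logarithm of the daily closing prices, then the first
# differences, yielding the log returns that go into the histogram
log_prices = [math.log(p) for p in closes]
log_returns = [b - a for a, b in zip(log_prices, log_prices[1:])]
```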

Bitcoin Prices

Let's do another example. This one uses historical bitcoin prices going back to 2011. Like before, we'll take the logarithm of the daily prices, compute the first differences, and then put the data in a histogram. The following charts show the histogram overlaid with the fitted normal, Laplace, and generalized normal density functions, respectively:

Figure 24
Figure 25
Figure 26

The fitted generalized normal distribution has the following shape parameter:

Figure 27

This is a bit smaller than the shape parameter that would conform to a Laplace distribution. As you can see in the charts above, the fitted density function for the generalized normal distribution is taller and thinner than the density function for the Laplace distribution.

Natural Gas Prices

For the third example, let's use the historical prices of a natural gas ETF. As with the previous two examples, we'll take the logarithm of the daily price quotes, compute the first differences, and then put the data in a histogram. The following charts show the histogram overlaid with the fitted normal, Laplace, and generalized normal density functions, respectively:

Figure 28
Figure 29
Figure 30

The fitted generalized normal distribution has the following shape parameter:

Figure 31

This value is a bit larger than the shape parameter that would conform to a Laplace distribution. In contrast to the previous example, the fitted density function for the generalized normal distribution in this example is shorter and wider than the density function for the Laplace distribution.

Other Distributions

For the data sets used in the examples above (and for similar data sets representing price fluctuations in financial markets), there is no doubt that fitting the data to a generalized normal distribution gives better results than fitting the data to a Laplace distribution. But I am not convinced that this is the best kind of probability distribution to use for modeling this type of data. In each of the examples above, the peak of the distribution implied by the histogram seems to be much more rounded than the density function of the fitted generalized normal distribution. Perhaps a Cauchy distribution might be a better alternative. And perhaps the numerical techniques used here could open the door to exploring the use of other types of probability distributions as well.

Accompanying source code is available on GitHub.
