Generalized Normal Distributions
The generalized normal distribution is a family of probability distributions that vary according to a shape parameter. The symmetrical variant of this distribution also goes by other names, such as the generalized error distribution and the generalized Gaussian distribution. In this post, we will explore this probability distribution and its relationship with the normal distribution and the Laplace distribution. I’ll also show some examples illustrating the use of the maximum likelihood method to estimate the parameters of the distribution using real-life data.
Probability Density Function
The generalized normal distribution is a continuous probability distribution with three parameters: a location parameter μ, a scale parameter α, and a shape parameter β. The probability density function takes the following form:

f(x) = \frac{\beta}{2\alpha\Gamma(1/\beta)} \exp\left(-\left(\frac{|x - \mu|}{\alpha}\right)^{\beta}\right)
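As a quick reference, here is a minimal sketch of this density function in Python (not the implementation used for the illustrations in this post), with scipy.special.gamma standing in for the gamma function:

```python
import numpy as np
from scipy.special import gamma

def gennorm_pdf(x, mu, alpha, beta):
    """Generalized normal density with location mu, scale alpha, shape beta."""
    coefficient = beta / (2 * alpha * gamma(1 / beta))
    return coefficient * np.exp(-(np.abs(x - mu) / alpha) ** beta)
```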
The location parameter can be any real number. The scale parameter and the shape parameter are always positive real numbers. Note the use of the gamma function above. The gamma function looks like this:

\Gamma(z) = \int_0^\infty t^{z-1} e^{-t} \, dt
For practical purposes, we can just use a numerical method to approximate the gamma function. I am using a third-party implementation of the Lanczos approximation for the illustrations in this post. If we hold the location and scale parameters constant and then vary the shape parameter, we can see what the shape of the density function looks like for different values of the shape parameter. Here are some illustrations:
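A short sketch along these lines will reproduce charts like the ones shown here; I’m assuming scipy’s gennorm implementation, which takes the shape parameter as its first argument:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gennorm

# Hold location and scale fixed at 0 and 1; vary only the shape parameter
x = np.linspace(-4, 4, 500)
for beta in [0.5, 1.0, 2.0, 8.0]:
    plt.plot(x, gennorm.pdf(x, beta), label=f"shape = {beta}")
plt.legend()
plt.title("Generalized normal density for varying shape parameters")
plt.show()
```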
If you think the density function looks like that of a Laplace distribution when the shape parameter is equal to one, then you would be correct. And if you think the density function looks like that of a normal distribution when the shape parameter is equal to two, then you would be correct again. Indeed, both the normal distribution and the Laplace distribution are special cases of the generalized normal distribution. The generalized normal distribution also converges to a uniform distribution (on the interval from μ − α to μ + α) as the shape parameter approaches infinity.
Normal Distribution
The normal distribution is a special case of the generalized normal distribution when the shape parameter is equal to two. Consider the following:

\beta = 2

Plug this value into the density function and replace the scale parameter with the following:

\alpha = \sigma\sqrt{2}

We now have a familiar representation of the normal distribution:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
As you can see, by holding the shape parameter to a fixed value of two, the generalized normal distribution can be treated like a regular normal distribution.
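As a quick sanity check, here is a sketch (assuming scipy’s gennorm and norm implementations) confirming that the two parameterizations produce the same density values:

```python
import numpy as np
from scipy.stats import gennorm, norm

mu, sigma = 1.5, 0.8
x = np.linspace(-3, 6, 200)
# With shape beta = 2 and scale alpha = sigma * sqrt(2), the generalized
# normal density matches the ordinary normal density
assert np.allclose(gennorm.pdf(x, 2, loc=mu, scale=sigma * np.sqrt(2)),
                   norm.pdf(x, loc=mu, scale=sigma))
```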
Laplace Distribution
The Laplace distribution is a special case of the generalized normal distribution when the shape parameter is equal to one. Consider the following:

\beta = 1

Plug this value into the density function and replace the scale parameter with the following:

\alpha = b

We now have a familiar representation of the Laplace distribution:

f(x) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)
As you can see, by holding the shape parameter to a fixed value of one, the generalized normal distribution can be treated like a Laplace distribution.
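And the analogous check for the Laplace case, again assuming scipy’s implementations:

```python
import numpy as np
from scipy.stats import gennorm, laplace

mu, b = 0.2, 1.3
x = np.linspace(-6, 6, 200)
# With shape beta = 1, the generalized normal density reduces to the Laplace density
assert np.allclose(gennorm.pdf(x, 1, loc=mu, scale=b),
                   laplace.pdf(x, loc=mu, scale=b))
```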
Numerical Parameter Estimation
If you have a set of observed data that is distributed according to a known probability distribution, you can use the maximum likelihood method to estimate the parameters of the distribution. If the distribution is a normal distribution or a Laplace distribution, the parameter values can be solved for analytically by taking the partial derivative of the likelihood function with respect to each one of the parameters. You can reference my earlier post titled Normal and Laplace Distributions for a deeper explanation. But what if taking the derivative is difficult or impossible to do? Consider the likelihood function for the generalized normal distribution:

L(\mu, \alpha, \beta) = \prod_{i=1}^{n} \frac{\beta}{2\alpha\Gamma(1/\beta)} \exp\left(-\left(\frac{|x_i - \mu|}{\alpha}\right)^{\beta}\right)
To fit the generalized normal distribution to an observed set of data, we need to find the parameter values that maximize this function. Instead of coming up with an analytical solution, we can use a numerical optimization method. Taking this approach, we need to come up with a cost function that our optimization method can evaluate iteratively. Here is the cost function that we will use in the examples in the following sections:

C(\mu, \alpha, \beta) = -\ln L(\mu, \alpha, \beta) = n \ln\left(\frac{2\alpha\Gamma(1/\beta)}{\beta}\right) + \sum_{i=1}^{n} \left(\frac{|x_i - \mu|}{\alpha}\right)^{\beta}
This is just the negation of the logarithm of the likelihood function. We want to take the negative in this case because, in the examples in the following sections, we’re going to use an implementation of the Nelder–Mead optimization method that finds minima instead of maxima. And by using the logarithm of the likelihood function, we can avoid dealing with numbers that are too large or too small to represent with a double-precision floating-point number. Since our chosen optimization method requires an initial guess of the parameter values, we can start by giving the shape parameter a value of two:

\beta_0 = 2
This would imply a normal distribution, so we might also set the initial guess for the location and scale parameters accordingly:

\mu_0 = \bar{x}, \qquad \alpha_0 = s\sqrt{2}

Here \bar{x} is the sample mean and s is the sample standard deviation, mirroring the substitution used for the normal distribution above.
This puts our initial parameter estimates in the right ballpark. Our numerical optimization algorithm can then iteratively find better and better estimates until some terminating criterion is met. If we know that the distribution of our data more closely resembles that of a Laplace distribution (as with the data studied in my post titled The Distribution of Price Fluctuations), we might choose an initial guess based on the parameters fitted to a Laplace distribution instead. The numerical approximation should come out roughly the same in either case.
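Putting the pieces together, here is a minimal sketch of the fitting procedure in Python; the function names are my own, and scipy’s Nelder–Mead implementation stands in for whichever optimizer you prefer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def cost(params, data):
    """Negative log-likelihood of the generalized normal distribution."""
    mu, alpha, beta = params
    if alpha <= 0 or beta <= 0:
        return np.inf  # keep the search inside the valid parameter region
    n = len(data)
    return (n * (np.log(2 * alpha) + gammaln(1 / beta) - np.log(beta))
            + np.sum((np.abs(data - mu) / alpha) ** beta))

def fit_gennorm(data):
    """Fit by minimizing the cost function with the Nelder-Mead method."""
    data = np.asarray(data)
    # Initial guess: a normal distribution (beta = 2, alpha = s * sqrt(2))
    x0 = [np.mean(data), np.std(data) * np.sqrt(2), 2.0]
    result = minimize(cost, x0, args=(data,), method="Nelder-Mead")
    return result.x  # estimated [mu, alpha, beta]
```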
Microsoft Stock Prices
Now let’s take a look at some examples of fitting the generalized normal distribution to data in the wild. For this example, we’ll use the historical stock prices of Microsoft Corporation going back to 1986. We’ll take the logarithm of the daily closing prices, compute the first differences, and then put the data in a histogram. The following charts show the histogram overlaid with the fitted normal, Laplace, and generalized normal density functions, respectively:
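For concreteness, the preprocessing for this example might look like the following sketch; the file name is hypothetical, and fit_gennorm is the routine sketched in the previous section:

```python
import numpy as np

# Hypothetical file of daily closing prices, oldest first
closes = np.loadtxt("msft_daily_closes.csv")

# Log of the daily closing prices, then first differences
log_returns = np.diff(np.log(closes))

# Fit the generalized normal distribution to the differences
mu, alpha, beta = fit_gennorm(log_returns)
print(f"shape parameter: {beta:.4f}")
```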
The fitted generalized normal distribution has a shape parameter very close to one, meaning that the shape of the density function is very close to that of the Laplace distribution. Eyeballing the charts above, you can’t really tell the difference between the fitted Laplace density function and the fitted generalized normal density function.
Bitcoin Prices
Let’s do another example. This one uses historical bitcoin prices going back to 2011. Like before, we’ll take the logarithm of the daily prices, compute the first differences, and then put the data in a histogram. The following charts show the histogram overlaid with the fitted normal, Laplace, and generalized normal density functions, respectively:
The fitted generalized normal distribution has a shape parameter a bit smaller than one, the value that would conform to a Laplace distribution. As you can see in the charts above, the fitted density function for the generalized normal distribution is taller and thinner than the density function for the Laplace distribution.
Natural Gas Prices
For the third example, let’s use the historical prices of a natural gas ETF. As with the previous two examples, we’ll take the logarithm of the daily price quotes, compute the first differences, and then put the data in a histogram. The following charts show the histogram overlaid with the fitted normal, Laplace, and generalized normal density functions, respectively:
The fitted generalized normal distribution has a shape parameter a bit larger than one, the value that would conform to a Laplace distribution. In contrast to the previous example, the fitted density function for the generalized normal distribution in this example is shorter and wider than the density function for the Laplace distribution.
Other Distributions
For the data sets used in the examples above (and for similar data sets representing price fluctuations in financial markets), there is no doubt that fitting the data to a generalized normal distribution gives a better fit than a Laplace distribution. But I am not convinced that this is the best kind of probability distribution to use for modeling this type of data. In each of the examples above, the peak of the distribution implied by the histogram seems to be much more rounded than the density function of the fitted generalized normal distribution. A Cauchy distribution might be a better alternative. And the numerical techniques used here could open the door to exploring the use of other types of probability distributions as well.
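To give a sense of how the same numerical machinery carries over to other distributions, here is a sketch of fitting a Cauchy distribution with the same approach; the function names are mine, and scipy’s cauchy.logpdf supplies the log-density:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import cauchy

def cauchy_cost(params, data):
    """Negative log-likelihood of the Cauchy distribution."""
    loc, scale = params
    if scale <= 0:
        return np.inf
    return -np.sum(cauchy.logpdf(data, loc=loc, scale=scale))

def fit_cauchy(data):
    data = np.asarray(data)
    # Median and half the interquartile range are robust initial guesses
    q75, q25 = np.percentile(data, [75, 25])
    x0 = [np.median(data), (q75 - q25) / 2]
    return minimize(cauchy_cost, x0, args=(data,), method="Nelder-Mead").x
```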