Jim Killingsworth

Least Squares and Normal Distributions

The method of least squares estimates the coefficients of a model function by minimizing the sum of the squared errors between the model and the observed values. In this post, I show the derivation of the parameter estimates for a linear model. In addition, I show that maximum likelihood estimation yields the same parameter estimates as least squares estimation when we assume the errors are normally distributed.

Least Squares Estimation

Suppose we have a set of two-dimensional data points that we observed by measuring some kind of phenomenon:

Figure 1

Also, suppose that the measuring device is inaccurate. The data we observed contain errors in the values on the vertical axis. Despite the errors, we know the correct readings fall somewhere on a line given by the following linear equation:

Figure 2
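
Assuming a simple linear model with an intercept and a slope, and writing the coefficients as a_0 and a_1 (the notation in the original figure may differ), the equation would look like this:

\[ y = a_0 + a_1 x \]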

For each data point, we can compute the error as the difference between the observed value and the correct value according to the model function:

Figure 3
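
With that assumed notation, the error for the i-th observed point would be:

\[ \varepsilon_i = y_i - (a_0 + a_1 x_i) \]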

The errors can be positive or negative. Taking the square of each error always yields a positive number. We can define the sum of the squared errors like this:

Figure 4
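
For n data points, the sum of the squared errors would be:

\[ S = \sum_{i=1}^{n} \varepsilon_i^2 \]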

Since the coefficients are unknown variables, we can treat the sum of the squared errors as a function of the coefficients:

Figure 5
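
Written out as a function of the two coefficients, this would be:

\[ S(a_0, a_1) = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \]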

To estimate the coefficients of the model function using the least squares method, we need to figure out what values for the coefficients give us the smallest value for the sum of the squared errors. We can find the minimum by first taking the partial derivative of the sum of squares function with respect to each of the coefficients, setting the derivative to zero, and then solving for the coefficient. Here are the derivatives with respect to each coefficient:

Figure 6
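
If the figure follows the standard derivation, the partial derivatives would work out to:

\[ \frac{\partial S}{\partial a_0} = -2 \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) \]

\[ \frac{\partial S}{\partial a_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - a_0 - a_1 x_i \right) \]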

Setting the derivative with respect to the first coefficient to zero, we get the following result:

Figure 7
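
Dividing out the constant factor, the resulting condition would be:

\[ \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) = 0 \]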

Rearranging the equation and solving for the coefficient:

Figure 8
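
In the assumed notation, the first coefficient would come out as:

\[ a_0 = \frac{1}{n} \sum_{i=1}^{n} y_i - a_1 \frac{1}{n} \sum_{i=1}^{n} x_i \]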

Setting the derivative with respect to the second coefficient to zero, we get the following result:

Figure 9
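
Again dropping the constant factor, the condition would be:

\[ \sum_{i=1}^{n} x_i \left( y_i - a_0 - a_1 x_i \right) = 0 \]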

Rearranging the equation and solving for the coefficient:

Figure 10
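
Solving that condition for the second coefficient would give:

\[ a_1 = \frac{\sum_{i=1}^{n} x_i y_i - a_0 \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} \]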

Each coefficient is given in terms of the other. Since there are two equations and two unknowns, we can substitute one into the other to derive the final result. Another way to do this is to treat the results as a system of linear equations arranged as follows:

Figure 11
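
In the assumed notation, that system would be:

\[ a_0 \, n + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \]

\[ a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \]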

The coefficients can then be found by solving the following matrix equation:

Figure 12
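
In matrix form, the solution would be:

\[ \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix} \]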

This is perhaps a cleaner approach. It also works well for model functions with many coefficients, such as higher-order polynomials or multivariable functions.
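
To make this concrete, here is a minimal sketch in Python that solves the matrix equation numerically. The data values and the use of NumPy are my own illustration rather than part of the original post:

    import numpy as np

    # Observed data points (made-up values for illustration)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
    n = len(x)

    # Normal equations for the linear model y = a0 + a1 * x
    A = np.array([[n,       x.sum()],
                  [x.sum(), (x * x).sum()]])
    b = np.array([y.sum(), (x * y).sum()])

    # Solve for the coefficients
    a0, a1 = np.linalg.solve(A, b)
    print(a0, a1)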

Maximum Likelihood Estimation

Now let’s assume the errors are normally distributed. That is to say, the observed values are normally distributed around the model. The probability density function for the normal distribution looks like this:

Figure 13
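
This would be the familiar density function with mean mu and standard deviation sigma:

\[ f(y) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right) \]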

Here we treat our model as the mean. We also consider the standard deviation, denoted by sigma, which measures the spread of the data around the mean. Given our observed data points, we want to figure out the most likely values for the mean and standard deviation. For a single data point alone, the likelihood function for a given mean and standard deviation is:

Figure 14
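
In the assumed notation, this would be:

\[ L(\mu, \sigma \mid y_i) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(y_i - \mu)^2}{2\sigma^2} \right) \]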

The likelihood is equal to the probability density. For all data points combined, the likelihood function for a given mean and standard deviation is equal to the product of the density at each individual data point:

Figure 15
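
Assuming the observations are independent, the joint likelihood would be the product:

\[ L(\mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(y_i - \mu)^2}{2\sigma^2} \right) \]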

At this point, we just need to find the mean and standard deviation values that maximize the likelihood function. Similar to what we did in the previous section, we can find the maximum by taking the partial derivative of the likelihood function with respect to each of the coefficients, setting the derivative to zero, and then solving for the coefficients. This might be easier to do if we first take the natural logarithm of the likelihood function:

Figure 16
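
Taking the logarithm turns the product into a sum, so the expression would simplify to:

\[ \ln L(\mu, \sigma) = -n \ln\left( \sigma \sqrt{2\pi} \right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - \mu \right)^2 \]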

Since we’re interested in finding the coefficients of the model function, we can replace the mean parameter with the body of the model function and treat the likelihood function as a function of the coefficients:

Figure 17
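
Substituting the assumed linear model for the mean would give:

\[ \ln L(a_0, a_1, \sigma) = -n \ln\left( \sigma \sqrt{2\pi} \right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \]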

Let’s call this the log-likelihood function. Since the natural logarithm is a monotonically increasing function, maximizing the log-likelihood function gives the same result as maximizing the original likelihood function. Here are the partial derivatives of the log-likelihood function with respect to each of the coefficients:

Figure 18
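
In the assumed notation, these would be:

\[ \frac{\partial \ln L}{\partial a_0} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) \]

\[ \frac{\partial \ln L}{\partial a_1} = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i \left( y_i - a_0 - a_1 x_i \right) \]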

Setting the derivative with respect to the first coefficient to zero, we get the following result:

Figure 19

Rearranging the equation and solving for the coefficient:

Figure 20

Setting the derivative with respect to the second coefficient to zero, we get the following result:

Figure 21

Rearranging the equation and solving for the coefficient:

Figure 22
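
Since the factor of one over sigma squared drops out when each derivative is set to zero, the coefficients would come out the same as before:

\[ a_0 = \frac{1}{n} \sum_{i=1}^{n} y_i - a_1 \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad a_1 = \frac{\sum_{i=1}^{n} x_i y_i - a_0 \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} \]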

As you can see, we arrive at the same results we got from using the method of least squares to estimate the coefficients. For completeness, the same procedure can be used to find the standard deviation:

Figure 23
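
Under the same assumptions, the derivative with respect to sigma would be:

\[ \frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \]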

Setting the derivative to zero, we get the following result for the standard deviation:

Figure 24
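
In the assumed notation, this condition can be written as:

\[ n \sigma^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \]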

Rearranging the equation and solving for sigma:

Figure 25
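
The resulting estimate would be:

\[ \sigma = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 } \]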

Note that this result may yield a biased estimate of the standard deviation when computing the value based on a limited number of samples. It might be more appropriate to use an unbiased estimator that takes the number of degrees of freedom into consideration. But that’s out of scope for this post. Perhaps it’s a topic I’ll explore at another time.
