Weighted Linear Regression
When doing a regression analysis, you might want to weight some data points more heavily than others. For example, when fitting a model to historical stock price data, you might want to assign more weight to the most recently observed prices. In this post, I demonstrate how to estimate the coefficients of a linear model using weighted least squares regression. As in the previous post, I also show an alternative derivation using the maximum likelihood method.
Least Squares Estimation
Suppose we have a set of data points that we expect to fall on a line given by the following linear equation:
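Writing the first coefficient (the intercept) as $a_0$ and the second coefficient (the slope) as $a_1$, which is just the notation adopted here, the model can be sketched as:

$$ y = a_0 + a_1 x $$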
The observed data, however, contain errors in the values on the vertical axis. For each data point, we define the error as the difference between the observed value and the value fitted by the linear model:
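Denoting the observed data points as $(x_i, y_i)$, the error for the $i$th observation can be written as:

$$ e_i = y_i - (a_0 + a_1 x_i) $$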
If we were performing an ordinary least squares regression, we would want to find the coefficients for the linear model that minimize the sum of the squared errors. But in this case, we want to consider the weighted sum of squares:
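Using $w_i$ for the weight attached to the $i$th of the $n$ observations, this takes the form:

$$ S = \sum_{i=1}^{n} w_i e_i^2 $$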
Each observation's error has its own weight, so some observations count more heavily than others, depending on the scheme used to determine the weights. Let's treat the weighted sum of squares as a function of the coefficients:
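Expanding the errors in terms of the model, one way to write this is:

$$ S(a_0, a_1) = \sum_{i=1}^{n} w_i \left( y_i - a_0 - a_1 x_i \right)^2 $$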
Following the same approach we used in the previous post, we can estimate the coefficients of the model function by finding the values that minimize the weighted sum of squares. We take the partial derivative of the weighted sum of squares function with respect to each of the coefficients, set the derivative to zero, and then solve for the coefficient. Here are the derivatives with respect to each coefficient:
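In the notation above, the two partial derivatives work out to:

$$ \frac{\partial S}{\partial a_0} = -2 \sum_{i=1}^{n} w_i \left( y_i - a_0 - a_1 x_i \right) $$

$$ \frac{\partial S}{\partial a_1} = -2 \sum_{i=1}^{n} w_i x_i \left( y_i - a_0 - a_1 x_i \right) $$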
Setting the derivative with respect to the first coefficient to zero, we get the following result:
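Dropping the constant factor of $-2$:

$$ \sum_{i=1}^{n} w_i \left( y_i - a_0 - a_1 x_i \right) = 0 $$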
Rearranging the equation and solving for the coefficient:
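Isolating $a_0$ in terms of the second coefficient gives:

$$ a_0 = \frac{\sum_{i=1}^{n} w_i y_i - a_1 \sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} $$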
Setting the derivative with respect to the second coefficient to zero, we get the following result:
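Again dropping the constant factor:

$$ \sum_{i=1}^{n} w_i x_i \left( y_i - a_0 - a_1 x_i \right) = 0 $$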
Rearranging the equation and solving for the coefficient:
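Substituting the expression for $a_0$ found above, one closed form for the second coefficient is:

$$ a_1 = \frac{\sum_{i=1}^{n} w_i \sum_{i=1}^{n} w_i x_i y_i - \sum_{i=1}^{n} w_i x_i \sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i \sum_{i=1}^{n} w_i x_i^2 - \left( \sum_{i=1}^{n} w_i x_i \right)^2} $$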
If you plug in the weights and the observed values, finding the coefficients is fairly straightforward. Notice that if all the weights are equal, the result is the same as the ordinary least squares method presented in the previous post.
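To make the formulas concrete, here is a small Python sketch of the computation. The function name, the test data, and the linearly increasing weighting scheme are all just illustrative choices; the fit uses the closed-form sums derived above, and with equal weights it matches an ordinary least squares fit from numpy.polyfit.

```python
import numpy as np

def weighted_linear_fit(x, y, w):
    """Estimate the intercept a0 and slope a1 by weighted least squares."""
    x, y, w = map(np.asarray, (x, y, w))
    sw = w.sum()
    swx = (w * x).sum()
    swy = (w * y).sum()
    swxx = (w * x * x).sum()
    swxy = (w * x * y).sum()
    # Closed-form solution from the weighted normal equations derived above.
    a1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    a0 = (swy - a1 * swx) / sw
    return a0, a1

# Noisy points around y = 1 + 2x, with more weight on the later observations.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
w = np.linspace(0.1, 1.0, x.size)   # hypothetical weighting scheme

print("weighted fit:", weighted_linear_fit(x, y, w))

# With equal weights the result reduces to ordinary least squares.
a1_ols, a0_ols = np.polyfit(x, y, 1)   # polyfit returns highest degree first
print("equal weights:", weighted_linear_fit(x, y, np.ones_like(x)))
print("ordinary least squares:", (a0_ols, a1_ols))
```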
Maximum Likelihood Estimation
Let’s assume the errors are normally distributed around the model. Recall the probability density function for the normal distribution:
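For an observed value $y$ with mean $\mu$ and standard deviation $\sigma$, the density is:

$$ f(y) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left[ -\frac{(y - \mu)^2}{2 \sigma^2} \right] $$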
For a given set of observations, the likelihood of a particular mean and standard deviation is the product of the probability densities of the individual observations, given that mean and standard deviation. But how do we weight one observation differently than another? For each observation, we can raise the probability density to the power of the weight associated with that observation:
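In symbols, the weighted likelihood can be sketched as:

$$ L(\mu, \sigma) = \prod_{i=1}^{n} \left[ f(y_i) \right]^{w_i} = \prod_{i=1}^{n} \left[ \frac{1}{\sigma \sqrt{2 \pi}} \exp\left( -\frac{(y_i - \mu)^2}{2 \sigma^2} \right) \right]^{w_i} $$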
If the weight of one observation is twice that of all the others, for example, then it is treated as if the measurement had appeared twice in the observed data set. The estimated mean and standard deviation values can be found by maximizing the likelihood function. To make things easier, we can work with the log-likelihood function instead:
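Taking the logarithm turns the weighted product into a weighted sum:

$$ \ln L(\mu, \sigma) = \sum_{i=1}^{n} w_i \left[ -\ln\left( \sigma \sqrt{2 \pi} \right) - \frac{(y_i - \mu)^2}{2 \sigma^2} \right] $$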
Let’s replace the mean with the body of the model function and treat the log-likelihood function as a function of the coefficients we want to solve for:
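Substituting $a_0 + a_1 x_i$ for the mean at each observation:

$$ \ln L(a_0, a_1, \sigma) = \sum_{i=1}^{n} w_i \left[ -\ln\left( \sigma \sqrt{2 \pi} \right) - \frac{\left( y_i - a_0 - a_1 x_i \right)^2}{2 \sigma^2} \right] $$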
Now we can find the maximum of the log-likelihood function and solve for the coefficients using the same approach as before. Here are the partial derivatives of the log-likelihood function with respect to each of the coefficients:
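In the notation above, they work out to:

$$ \frac{\partial \ln L}{\partial a_0} = \frac{1}{\sigma^2} \sum_{i=1}^{n} w_i \left( y_i - a_0 - a_1 x_i \right) $$

$$ \frac{\partial \ln L}{\partial a_1} = \frac{1}{\sigma^2} \sum_{i=1}^{n} w_i x_i \left( y_i - a_0 - a_1 x_i \right) $$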
Setting the derivative with respect to the first coefficient to zero, we get the following result:
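Multiplying through by $\sigma^2$ leaves the same condition seen in the least squares derivation:

$$ \sum_{i=1}^{n} w_i \left( y_i - a_0 - a_1 x_i \right) = 0 $$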
Rearranging the equation and solving for the coefficient:
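Since the condition is the same as before, the first coefficient comes out the same:

$$ a_0 = \frac{\sum_{i=1}^{n} w_i y_i - a_1 \sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} $$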
Setting the derivative with respect to the second coefficient to zero, we get the following result:
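Likewise, after scaling away the $\sigma^2$ factor:

$$ \sum_{i=1}^{n} w_i x_i \left( y_i - a_0 - a_1 x_i \right) = 0 $$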
Rearranging the equation and solving for the coefficient:
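Substituting the expression for $a_0$ again gives the same closed form as the least squares result:

$$ a_1 = \frac{\sum_{i=1}^{n} w_i \sum_{i=1}^{n} w_i x_i y_i - \sum_{i=1}^{n} w_i x_i \sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i \sum_{i=1}^{n} w_i x_i^2 - \left( \sum_{i=1}^{n} w_i x_i \right)^2} $$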
As expected, the weighted maximum likelihood estimation gives the same result as the weighted least squares estimation when we assume the errors are normally distributed. While I could go a step further and solve for the standard deviation, I'm going to stop here. I'd like to do a more in-depth study of variance at another time.