Davidson and MacKinnon Chapter 7—Generalized Least Squares and Related Topics
Introduction
We will be concerned with the model
\[\begin{eqnarray} \label{eq7.1} y&=&X\beta+u\ \ ,\ \ E(uu^{T})=\Omega \end{eqnarray}\]where \(\Omega\) is the covariance matrix of the error terms, is a positive definite \(n\times n\) matrix. We consider following cases for \(\Omega\):
- \(\Omega=\sigma^{2}I\): (\ref{eq7.1}) is the classic linear regression model
- \(\Omega\) is diagonal with nonconstant diagonal elements: heteroskedastic
- \(\Omega\) is not diagonal
The GLS Estimator
We transform the model (\ref{eq7.1}) so that the transformed model satisfied the conditions of the Gauss-Markov theorem. Let \(\Psi\) be a \(n\times n\) triangular matrix that satisfies
\[\begin{eqnarray} \label{eq7.2} \Omega^{-1}&=&\Psi\Psi^{T} \end{eqnarray}\]Premultiplying (\ref{eq7.1}) by \(\Psi^{T}\) gives
\[\begin{eqnarray} \label{eq7.3} \Psi^{T}y&=&\Psi^{T}X\beta+\Psi^{T}u \end{eqnarray}\]The OLS estimator of \(\beta\) from regression (\ref{eq7.3}) is
\[\begin{eqnarray} \label{eq7.4} \hat{\beta}_{GLS}&=&\left(X^{T}\Psi\Psi^{T}X\right)^{-1}X^{T}\Psi\Psi^{T}y=\left(X^{T}\Omega^{-1}X\right)^{-1}X^{T}\Omega^{-1}y \end{eqnarray}\]This estimator is called the generalized least squares, or GLS, estimator of \(\beta\)
It is easy to show that the covariance matrix of transformed model (\ref{eq7.3}) is the identity matrix
\[\begin{eqnarray*} E(\Psi^{T}uu^{T}\Psi)&=&\Psi^{T}E(uu^{T})\Psi\\ &=&\Psi^{T}\Omega\Psi\\ &=&\Psi^{T}(\Psi\Psi^{T})^{-1}\Psi\\ &=&\Psi^{T}(\Psi^{T})^{-1}\Psi^{-1}\Psi=I\\ \end{eqnarray*}\]And we find
\[\begin{eqnarray} \label{eq7.5} Var(\hat{\beta}_{GLS})&=&(X^{T}\Psi\Psi^{T}X)^{-1}=(X^{T}\Omega^{-1}X)^{-1} \end{eqnarray}\]The generalized least squares estimator \(\hat{\beta}_{GLS}\) can also be obtained by minimizing the GLS criterion function
\[\begin{eqnarray} \nonumber (\Psi^{T}u)^{T}\Psi^{T}u&=&(\Psi^{T}y-\Psi^{T}X\beta)^{T}(\Psi^{T}y-\Psi^{T}X\beta)\\ \nonumber &=&(y-X\beta)^{T}\Psi\Psi^{T}(y-X\beta)\\ \label{eq7.6} &=&(y-X\beta)^{T}\Omega^{-1}(y-X\beta) \end{eqnarray}\]Efficiency of the GLS Estimator
The GLS estimator \(\hat{\beta}_{GLS}\) defined in (\ref{eq7.4}) is also the solution of the set of moment conditions
\[\begin{eqnarray} \nonumber (\Psi^{T}X)^{T}(\Psi^{T}y-\Psi^{T}X\beta)&=&0\\ \nonumber X^{T}\Psi\Psi^{T}(y-X\beta)&=&0\\ \label{eq7.7} X^{T}\Omega^{-1}(y-X\beta_{GLS})&=&0 \end{eqnarray}\]GLS estimator is a method of moments estimator. A general MM estimator for the linear regression model (\ref{eq7.1}) is defined in terms of an \(n\times k\) matrix of exogenous variables \(W\)
\[\begin{eqnarray} \label{eq7.8} W^{T}(y-X\beta)&=&0 \end{eqnarray}\]We obtain the MM estimator
\[\begin{eqnarray} \label{eq7.9} \hat{\beta}_{W}&\equiv&(W^{T}X)^{-1}W^{T}y \end{eqnarray}\]The GLS estimator (\ref{eq7.4}) is a special case of the MM estimator, with \(W=\Omega^{-1}X\).
We can rewrite \(\hat{\beta}_{W}=\beta_{0}+(W^{T}X)^{-1}W^{T}u\). Thus, the covariance matrix of \(\hat{\beta}_{W}\) is
\[\begin{eqnarray} \nonumber Var(\hat{\beta}_{W})&=&E\left((\hat{\beta}_{W}-\beta_{0})(\hat{\beta}_{W}-\beta_{0})^{T}\right)\\ \nonumber &=&E\left((W^{T}X)^{-1}W^{T}uu^{T}W(X^{T}W)^{-1}\right)\\ \label{eq7.10} &=&(W^{T}X)^{-1}W^{T}\Omega W(X^{T}W)^{-1} \end{eqnarray}\]The efficiency of the GLS estimator can be verified by showing that the difference between (\ref{eq7.10}), the covariance matrix of the MM estimator \(\hat{\beta}_{W}\), and (\ref{eq7.5}), the covariance matrix of GLS estimator, \(\hat{\beta}_{GLS}\), is a positive semidefinite matrix. That is, the matrix
\[\begin{eqnarray} \label{eq7.11} (W^{T}X)^{-1}W^{T}\Omega W(X^{T}W)^{-1} &-&(X^{T}\Omega^{-1}X)^{-1} \end{eqnarray}\]is a positive semidefinite matrix.
By the theory of linear algebra, (\ref{eq7.11}) is positive semidefinite if and only if
\[\begin{eqnarray*} \left((X^{T}\Omega^{-1}X)^{-1}\right)^{-1}&-&\left((W^{T}X)^{-1}W^{T}\Omega W(X^{T}W)^{-1}\right)^{-1}\\ X^{T}\Omega^{-1}X&-&X^{T}W(W^{T}\Omega W)^{-1}W^{T}X\\ \end{eqnarray*}\]is positive semidefinite.
The GLS estimator \(\hat{\beta}_{GLS}\) is typically more efficient than the more general MM estimator \(\hat{\beta}_{W}\). Because the OLS estimator \(\hat{\beta}_{OLS}\) is just special case of \(\hat{\beta}_{W}\) when \(W=X\), we could conclude that \(\hat{\beta}_{GLS}\) will in most cases be more efficient, and will never be less efficient, than the OLS estimator \(\hat{\beta}_{OLS}\).
Computing GLS Estimates
Suppose \(\Omega=\sigma^{2}\Delta\), and \(\Delta\) is known and \(\sigma^{2}\) is unknown. If we replace \(\Omega\) by \(\Delta\) in (\ref{eq7.2}) of \(\Psi\),
\[\begin{eqnarray*} \Delta^{-1}&=&\Psi\Psi^{T} \end{eqnarray*}\]we still run regression (\ref{eq7.3})
\[\begin{eqnarray*} \Psi^{T}y&=&\Psi^{T}X\beta+\Psi^{T}u \end{eqnarray*}\]The GLS estimates will be the same whether we use \(\Omega\) or \(\Delta\).
However, if \(\sigma^{2}\) is known, we can use the true covariance matrix (\ref{eq7.5}). Otherwise, we must estimate covariance matrix
\[\begin{eqnarray} \label{eq7.12} \widehat{Var}(\hat{\beta}_{GLS})&=&s^{2}(X^{T}\Delta^{-1}X)^{-1} \end{eqnarray}\]Weighted Least Squares
GLS estimation will be easy to do if the matrix \(\Psi\) is known and allow us to calculate \(\Psi^{T}x\).
When error terms are heteroskedastic but uncorrelated, i.e., \(\Omega\) is diagonal. Let \(\omega_{t}^{2}\) denote the \(t^{th}\) diagonal element of \(\Omega\), and \(\omega_{t}^{-2}\) is the \(t^{th}\) diagonal element of \(\Omega^{-1}\). \(\Psi\) can be chosen as the diagonal matrix with \(t^{th}\) diagonal element \(\omega_{t}^{-1}\)
\[\begin{eqnarray*} \Omega=\left[\begin{array}{cccc} \omega_{1}^{2}&0&\cdots&0\\ 0&\omega_{2}^{2}&\cdots&0\\ \vdots&\vdots&\ddots &\vdots\\ 0&0&\cdots&\omega_{k}^{2} \end{array} \right]&,&\Omega^{-1}=\left[\begin{array}{cccc} \omega_{1}^{-2}&0&\cdots&0\\ 0&\omega_{2}^{-2}&\cdots&0\\ \vdots&\vdots&\ddots &\vdots\\ 0&0&\cdots&\omega_{k}^{-2} \end{array} \right] \end{eqnarray*}\]and
\[\begin{eqnarray*} \Psi&=&\left[\begin{array}{cccc} \omega_{1}^{-1}&0&\cdots&0\\ 0&\omega_{2}^{-1}&\cdots&0\\ \vdots&\vdots&\ddots &\vdots\\ 0&0&\cdots&\omega_{k}^{-1} \end{array} \right] \end{eqnarray*}\]Thus we see that, for a typical observation, regression (\ref{eq7.3}) can be written as
\[\begin{eqnarray} \label{eq7.13} \omega_{t}^{-1}y_{t}&=&\omega_{t}^{-1}X_{t}\beta+\omega_{t}^{-1}u_{t} \end{eqnarray}\]the variance of the error term is clearly 1.
This special case of GLS estimation is often called weighted least squares, or GLS.
Feasible Generalized Least Squares
In practice, the covariance matrix \(\Omega\) is often not known even up to a scalar factor. This makes it impossible to compute GLS estimates.
However, it is reasonable to suppose that \(\Omega\), or \(\Delta\), depends in a known way on a vector of unknown parameters \(\gamma\). If so, it may be possible to estimate \(\gamma\) consistently, so as to obtain \(\Omega(\hat{\gamma})\). Then \(\Psi(\hat{\gamma})\) can be defined in (\ref{eq7.2}), and GLS estimates computed conditional on \(\Psi(\hat{\gamma})\). This type of procedure is called feasible generalized least squares, or feasible GLS.
For example, suppose the feasible GLS estimates of the linear regression model is
\[\begin{eqnarray} \label{eq7.14} y_{t}=X_{t}\beta+u_{t}&,&E(u_{t}^{2})=\exp(Z_{t}\gamma) \end{eqnarray}\]where \(\beta\) and \(\gamma\) are a \(k\)-vector and an \(l\)-vector of unknown parameters, respectively. \(X_{t}\) and \(Z_{t}\) are comformably dimensioned row vectors of observation set on which we are conditioning. Some of the elements of \(Z_{t}\) may well belong to \(X_{t}\). The function \(\exp(Z_{t}\gamma)\) is an example of a skedastic function.
Steps to do FGLS
- 
    To obtain consistent estimates of \(\gamma\), we must first obtain consistent estimates of the error terms in (\ref{eq7.14}). 
- 
    Computing OLS estimates \(\hat{\beta}\) allows us to calculate a vector of OLS residuals with typical element \(\hat{u}_{t}\). 
- 
    Run auxiliary linear regression 
to find OLS estimates \(\hat{\gamma}\).
- Compute
- Feasible GLS of \(\beta\) are obtained by using ordinary least squares to estimate regression (\ref{eq7.13}), with the estimates \(\hat{\omega}_{t}\) replacing the unknown \(\omega_{t}\).
Why FGLS Works
Under suitable regularity conditions, it can be shown that this type of procedure leads a feasible GLS estimator \(\hat{\beta}_{F}\) that is consistent and asymptotically equivalent to the GLS estimator \(\hat{\beta}_{GLS}\).
Heteroskedasticity
If we have no information on the form of the skedastic function, it may be prudent to employ an HCCME, especially if the sample size is large.
If we have information on the form of the skedastic function, we might well wish to use weighted least squares.
It makes no difference asymptotically whether the \(\omega_{t}\) are known or merely estimated consistently, although it can certainly make a substantial difference in finite samples. Asymptotically, at least, the usual OLS covariance matrix is just as valid with feasible WLS as with WLS.
Testing for Heteroskedasticity
Before doing so, it is advisable to perform a specification test of the \underline{null hypothesis that the error terms are homoskedastic against whatever heteroskedastic} alternatives may seem reasonable.
White showed that, in a linear regression model, if \(E(u_{t}^{2})\) is constant conditional on the squares and cross-products of all the regressors, then there is no need to use an HCCME.
Autoregressive Processes
The error terms for nearby observations may be correlated, or may appear to be correlated. This phenomenon is most commonly encountered in models estimated with time-series data, where it is known as serial correlation or autocorrelation.
If there is reason to believe that serial correlation may be present
- Step 1: Test the null hypothesis that the errors are serially uncorrelated against a plausible alternative that involves serial correlation.
- Step 2: If evidence of serial correlation is found, estimate a model that accounts for it based on NLS and GLS.
- Step 3: Verify that the model which accounts for serial correlation is compatible with the data
The AR(1) Process
One of the simplest and most commonly used stochastic processes is the first-order autoregressive process, or AR(1) process, which can be written as
\[\begin{eqnarray} \label{eq7.16} u_{t}=\rho u_{t-1}+\varepsilon_{t}&,&\varepsilon_{t}\sim \mbox{IID}(0,\sigma_{\varepsilon})^{2},\ \ |\rho|<1 \end{eqnarray}\]It is assumed that \(\varepsilon_{t}\) is independent of \(\varepsilon_{s}\) for all \(s\neq t\), \(\varepsilon_{t}\) is an innovation.
| $$ | \rho | <1$$ is called a stationarity condition, because it is necessary for the AR(1) process to be stationary. | 
The covariance stationarity/wide sense stationarity is defined as \(E(u_{t})\) and \(Var(u_{t}\) exist and are independent of \(t\) and if the covariance \(Cov(u_{t},u_{t-j})\) is independent of \(t\).
We can then compute the variance of \(u_{t}\) by substituting successively for \(u_{t-1}\), \(u_{t-2}\), \(u_{t-3}\), and so on in (\ref{eq7.16})
\[\begin{eqnarray} \label{eq7.17} u_{t}&=&\varepsilon_{t}+\rho\varepsilon_{t-1}+\rho^{2}\varepsilon_{t-2}+\rho^{3}\varepsilon_{t-3}+\cdots \end{eqnarray}\]The variance of \(u_{t}\) is seen to be
\[\begin{eqnarray} \label{eq7.18} \sigma_{u}^{2}&=&\sigma_{\varepsilon}^{2}+\rho^{2}\sigma_{\varepsilon}^{2}+\rho^{4}\sigma_{\varepsilon}^{2}+\rho^{6}\sigma_{\varepsilon}^{2}+\cdots=\frac{\sigma_{\varepsilon}^{2}}{1-\rho^{2}} \end{eqnarray}\]We can get that \(Var(u_{t})=\sigma_{\varepsilon}^{2}/(1-\rho^{2})\) for all \(t\).
The covariance for \(u_{t}\) and \(u_{t-1}\) of the AR(1) is
\[\begin{eqnarray*} Cov(u_{t},u_{t-1})&=&E(u_{t}u_{t-1})\\ &=&E((\rho u_{t-1}+\varepsilon_{t})u_{t-1})\\ &=&\rho \sigma_{u}^{2} \end{eqnarray*}\]If the AR(1) process (\ref{eq7.16}) is stationary, the covariance matrix of the vector \(u\) can be written as
\[\begin{eqnarray} \label{eq7.19} \Omega(\rho)&=&\frac{\sigma_{\varepsilon}^{2}}{1-\rho^{2}}\left[ \begin{array}{ccccc} 1&\rho&\rho^{2}&\cdots&\rho^{n-1}\\ \rho&1&\rho^{2}&\cdots&\rho^{n-2}\\ \vdots&\vdots&\vdots&&\vdots\\ \rho^{n-1}&\rho^{n-2}&\rho^{n-3}&\cdots&1 \end{array} \right] \end{eqnarray}\]Testing for Serial Correlation
- Run OLS and compute OLS residuals \(\hat{u}=y-X \hat{\beta}_{OLS}\)
- Auxiliary regression of OLS residuals \(\hat{u}\) on lagged values
- \(t\) test for \(\rho=0\)
FGLS for Autoregression Process
If the \(u_{t}\) follow a stationary AR(1) process, that is, if \(|\rho|<1\) and \(Var(u_{t})=\sigma_{u}^{2}=\sigma_{\varepsilon}^{2}/(1-\rho^{2})\), then the covariance matrix of the entire vector \(\hat{u}\) is \(\Omega(\rho)\) defined in (\ref{eq7.19}).
Moving Average Processes
The MA(1) Process
\[\begin{eqnarray} \label{eq7.20} u_{t}=\varepsilon_{t}+\alpha_{1}\varepsilon_{t-1}&,&\varepsilon_{t}\sim \mbox{IID}(0,\sigma_{\varepsilon}^{2}) \end{eqnarray}\]\(u_{t}\) is a weighted average of two successive innovations, \(\varepsilon_{t}\) and \(\varepsilon_{t-1}\).
The covariance matrix for an MA(1) process:
\[\begin{eqnarray*} Var(u_{t})&=&E\left((\varepsilon_{t}+\alpha_{1}\varepsilon_{t-1})^{2}\right)=\sigma_{\varepsilon}^{2}+\alpha_{1}^{2}\sigma_{\varepsilon}^{2}=(1+\alpha_{1}^{2})\sigma_{\varepsilon}^{2}\\ Cov(u_{t},u_{t-1})&=&E\left((\varepsilon_{t}+\alpha_{1}\varepsilon_{t-1})(\varepsilon_{t-1}+\alpha_{1}\varepsilon_{t-2})\right)=\alpha_{1}\sigma_{\varepsilon}^{2}\\ Cov(u_{t},u_{t-j})&=&E\left((\varepsilon_{t}+\alpha_{1}\varepsilon_{t-1})(\varepsilon_{t-j}+\alpha_{1}\varepsilon_{t-j-1})\right)=0 \end{eqnarray*}\]Therefore, the covariance matrix of the entire vector \(\hat{u}\) is
\[\begin{eqnarray} \label{eq7.21} \sigma_{\varepsilon}\Delta(\alpha_{1})&=&\sigma_{\varepsilon}^{2}\left[\begin{array}{cccccc} 1+\alpha^{2} &\alpha &0 &\cdots &0 &0 \\ \alpha &1+\alpha^{2} &\alpha &\cdots &0 &0\\ \vdots &\vdots &\vdots & &\vdots &\vdots\\ 0 &0 &0 &\cdots&\alpha &1+\alpha^{2} \end{array}\right] \end{eqnarray}\]Models for Panel Data
We restrict our attention to the linear regression model
\[\begin{eqnarray} \label{eq7.10.1} y_{it}&=&X_{it}\beta+u_{it} \end{eqnarray}\]where \(i=1,\dots, m\), \(t=1,\dots, T\), and \(X_{it}\) is a \(1\times k\) vector.
If certain shocks affect the same cross-sectional unit at all points in time, the error terms \(u_{it}\) and \(u_{is}\) will be correlated for all \(t \neq s\). Similarly, if certain shocks affect all cross-sectional units at the same point in time, the error terms \(u_{it}\) and \(u_{j}\)t will be correlated for all \(i \neq j\).
Error-Components Models
The idea is to specify the error term \(u_{it}\) in (\ref{eq7.10.1}) as consisting of two or three separate shocks, each of which is assumed to be independent of the others.
\[\begin{eqnarray} \label{eq7.10.2} u_{it}&=&e_{t}+v_{i}+\varepsilon_{it} \end{eqnarray}\]It is generally assumed that the \(e_t\) are independent across \(t\), the \(v_i\) are independent across \(i\), and the \(\varepsilon_{it}\), it are independent across all \(i\) and \(t\).
If the \(e_t\) and \(v_i\) are thought of as fixed effects, then they are treated as parameters to be estimated.
If they are thought of as random effects, then we must figure out the covariance matrix of the \(u_{it}\) as functions of the variances of the \(e_{t}\), \(v_{i}\), and \(\varepsilon_{it}\), and use feasible GLS.
Random-Effects Estimation
Random-effects estimation requires that the \(v_{i}\) should be independent of \(X\). Then we have
\[\begin{eqnarray} \label{eq7.10.3} E(u_{it}|X)&=&E(v_{i}+\varepsilon_{it}|X)=0 \end{eqnarray}\]However \(u_{it}\) are not IID. Assuming that \(v_{i}\sim IID(0, \sigma^{2}_{v})\), \(e_{t}\), and shocks are independent, we find that
\[\begin{eqnarray*} Var(u_{it})&=&\sigma_{v}^{2}+\sigma_{\varepsilon}^{2}\\ Cov(u_{it}, u_{is})&=&\sigma_{v}^{2}\\ Cov(u_{it}, u_{js})&=&0 \end{eqnarray*}\]These define the elements of the covariance matrix \(\Omega\)
\[\begin{eqnarray*} \Omega&=&\left[\begin{array}{cccc} \Sigma&0&\cdots&0\\ 0&\Sigma&\cdots&0\\ \vdots&\vdots&&\vdots\\ 0&0&\cdots&\Sigma\\ \end{array}\right] \end{eqnarray*}\]where
\[\begin{eqnarray} \label{eq7.10.4} \Sigma&=&\sigma_{\varepsilon}^{2}I_{T}+\sigma_{v}^{2}\iota\iota^{T} \end{eqnarray}\]Properties of OLS estimator in Random Effects Model
Under the assumption that \(E(u|X)=0\), the OLS estimator in random effects model is unbiased and consistent. However, since the error term is heteroskedastic, OLS estimator is not efficient.