Davidson and MacKinnon, Chapter 8: Instrumental Variables Estimation
Introduction
It is not always reasonable to assume that the error terms are innovations. There are commonly encountered situations in which the error terms are necessarily correlated with some of the regressors for the same observation.
Even in these circumstances, however, it is usually possible, although not always easy, to define an information set \(\Omega_{t}\) for each observation such that
\[\begin{eqnarray} \label{eqiv1} E(u_{t}|\Omega_{t})&=&0 \end{eqnarray}\]Any regressor whose value in period \(t\) is correlated with \(u_{t}\) cannot belong to \(\Omega_{t}\).
Correlation Between Error Terms and Regressors
There are two common situations in which the error terms are correlated with the regressors: errors in variables, which occurs whenever the independent variables in a regression model are measured with error, and simultaneity, which occurs whenever two or more endogenous variables are jointly determined by a system of simultaneous equations.
Errors in Variables
Measurement errors in the dependent variable of a regression model are generally of no great consequence, unless they are very large. However, measurement errors in the independent variables cause the error terms to be correlated with the regressors that are measured with error, and this causes OLS to be inconsistent.
Consider the model
\[\begin{eqnarray} \label{eqiv2} y_{t}^{\circ}=\beta_{1}+\beta_{2}x_{t}^{\circ}+u_{t}^{\circ}&,&u_{t}^{\circ}\sim \mbox{IID}(0,\sigma^{2}) \end{eqnarray}\]where the variables \(x_{t}^{\circ}\) and \(y_{t}^{\circ}\) are not actually observed. Instead, we observe
\[\begin{eqnarray} \label{eqiv3} x_{t}=x_{t}^{\circ}+v_{1t}&,&v_{1t}\sim \mbox{IID}(0,\omega_{1}^{2})\\ \label{eqiv4} y_{t}=y_{t}^{\circ}+v_{2t}&,&v_{2t}\sim \mbox{IID}(0,\omega_{2}^{2}) \end{eqnarray}\]and \(v_{1t}\) and \(v_{2t}\) are independent of \(x_{t}^{\circ}\), \(y_{t}^{\circ}\), and \(u_{t}^{\circ}\).
We substitute (\ref{eqiv3}) and (\ref{eqiv4}) into (\ref{eqiv2}) and find that
\[\begin{eqnarray} \nonumber y_{t}&=&\beta_{1}+\beta_{2}(x_{t}-v_{1t})+u_{t}^{\circ}+v_{2t}\\\nonumber &=&\beta_{1}+\beta_{2}x_{t}+u_{t}^{\circ}+v_{2t}-\beta_{2}v_{1t}\\ \label{eqiv5} &=&\beta_{1}+\beta_{2}x_{t}+u_{t} \end{eqnarray}\]where \(u_{t}=u_{t}^{\circ}+v_{2t}-\beta_{2}v_{1t}\).
For model (\ref{eqiv5}), the variance of the error term is
\[\begin{eqnarray*} Var(u_{t})&=&\sigma^{2}+\textcolor{blue}{\omega_{2}^{2}}+\textcolor{red}{\beta_{2}^{2}\omega_{1}^{2}} \end{eqnarray*}\]The effect of the measurement error in the dependent variable is simply to increase the variance of the error terms.
The measurement error in the independent variable also increases the variance, but it has a more severe consequence: it makes the error term correlated with the regressor. Indeed,
\[\begin{eqnarray} \label{eqiv6} E(u_{t}|x_{t})&=&E(u_{t}^{\circ}+v_{2t}-\beta_{2}v_{1t}|x_{t}^{\circ}+v_{1t})=\textcolor{red}{-\beta_{2}v_{1t}} \end{eqnarray}\]and
\[\begin{eqnarray*} Cov(x_{t},u_{t})&=&E(x_{t}u_{t})\\ &=&E(x_{t}E(u_{t}|x_{t}))\\ &=&-E((x_{t}^{\circ}+v_{1t})\beta_{2}v_{1t})\\ &=&-\beta_{2}\omega_{1}^{2} \end{eqnarray*}\]Since this covariance is nonzero, OLS applied to (\ref{eqiv5}) is inconsistent.
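To make this concrete, here is a minimal simulation sketch in NumPy (the sample size, parameter values, and error variances are arbitrary assumptions, not taken from the text). Measurement error in \(y\) alone leaves the OLS slope consistent, while measurement error in \(x\) biases it toward zero, consistent with the negative covariance just derived; the probability limit is the familiar attenuation result \(\beta_{2}\,Var(x_{t}^{\circ})/(Var(x_{t}^{\circ})+\omega_{1}^{2})\).

```python
import numpy as np

# Illustrative sketch: all numbers below are made up, not from the text.
rng = np.random.default_rng(42)
n, beta1, beta2 = 100_000, 1.0, 2.0
sigma, omega1, omega2 = 1.0, 0.8, 0.5        # std. devs. of u°, v1, v2

x_star = rng.normal(0.0, 1.0, n)                         # latent x°
y_star = beta1 + beta2 * x_star + rng.normal(0.0, sigma, n)
x = x_star + rng.normal(0.0, omega1, n)                  # observed x = x° + v1
y = y_star + rng.normal(0.0, omega2, n)                  # observed y = y° + v2

def ols_slope(regressor, depvar):
    """OLS slope from a regression on a constant and one regressor."""
    X = np.column_stack([np.ones_like(regressor), regressor])
    return np.linalg.lstsq(X, depvar, rcond=None)[0][1]

print(ols_slope(x_star, y))  # ~2.0: error in y alone leaves OLS consistent
print(ols_slope(x, y))       # ~2.0/(1 + 0.8**2) ~ 1.22: attenuated toward zero
```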
Simultaneous Equations
Consider a demand-supply model from microeconomics, where \(q_{t}\) is quantity and \(p_{t}\) is price, both of which would often be in logarithms (as they are here). We have
\[\begin{eqnarray} \label{eqiv7} q_{t} &=& \gamma_{d}p_{t} + X_{t}^{d}\beta_{d} + u_{t}^{d} \\ \label{eqiv8} q_{t} &=& \gamma_{s}p_{t} + X_{t}^{s}\beta_{s}+ u_{t}^{s} \end{eqnarray}\]where \(X_{t}^{d}\) and \(X_{t}^{s}\) are row vectors of observations on exogenous or predetermined variables that appear, respectively, in the demand and supply functions. Equations (\ref{eqiv7}) and (\ref{eqiv8}) are called a linear simultaneous equations model. Solving these equations gives
\[\begin{eqnarray*} \left[\begin{array}{c} q_{t}\\ p_{t} \end{array}\right]&=& \left[\begin{array}{cc} 1&-\gamma_{d}\\ 1&-\gamma_{s} \end{array}\right]^{-1} \left(\left[\begin{array}{c} X_{t}^{d}\beta_{d}\\ X_{t}^{s}\beta_{s} \end{array}\right]+\left[\begin{array}{c} u_{t}^{d}\\ u_{t}^{s} \end{array}\right]\right) \end{eqnarray*}\]It can be seen from this solution that \(p_{t}\) and \(q_{t}\) depend on both \(u_{t}^{d}\) and \(u_{t}^{s}\), and on every exogenous and predetermined variable that appears in the demand function, the supply function, or both. In particular, the regressor \(p_{t}\) is correlated with the error term of each equation, so OLS estimation of either equation is inconsistent.
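A short simulation (a sketch with made-up coefficients \(\gamma_{d}=-1\), \(\gamma_{s}=0.5\), \(\beta_{d}=\beta_{s}=1\); none of these values come from the text) makes the consequence explicit: because \(p_{t}\) depends on \(u_{t}^{d}\), regressing \(q_{t}\) on \(p_{t}\) by OLS does not recover the demand slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gamma_d, gamma_s = -1.0, 0.5                        # assumed slopes
x_d, x_s = rng.normal(size=n), rng.normal(size=n)   # exogenous shifters
u_d, u_s = rng.normal(size=n), rng.normal(size=n)

# Equate demand and supply and solve for the price, as in the matrix
# solution above: p depends on BOTH structural error terms.
p = (x_d + u_d - x_s - u_s) / (gamma_s - gamma_d)
q = gamma_d * p + x_d + u_d                         # demand equation

X = np.column_stack([p, x_d])
print(np.linalg.lstsq(X, q, rcond=None)[0])  # slope on p ~ -0.5, not -1.0
```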
Instrumental Variables Estimation
We focus on the linear regression model
\[\begin{eqnarray} \label{eqiv10} y&=&X\beta+u,\ \ \ E(uu^{T})=\sigma^{2}I \end{eqnarray}\]where at least one of the explanatory variables in the \(n\times k\) matrix \(X\) is assumed not to be predetermined with respect to the error terms.
We can form an \(n\times k\) matrix \(W\) with typical row \(W_{t}\) such that all its elements belong to \(\Omega_{t}\). The \(k\) variables given by the \(k\) columns of \(W\) are called instrumental variables, or simply instruments. In general, the number of instruments may exceed the number of regressors; for the simple IV estimator below, the two numbers are equal.
Instrumental variables may be either exogenous or predetermined, and they should always include any columns of \(X\) that are exogenous or predetermined.
The Simple IV Estimator
For the linear model (\ref{eqiv10}), the moment conditions are
\[\begin{eqnarray} \label{eqiv11} W^{T}(y-X\beta)&=&0 \end{eqnarray}\]Solving equations (\ref{eqiv11}) yields the simple IV estimator
\[\begin{eqnarray} \label{eqiv12} \hat{\beta}_{IV}&\equiv&(W^{T}X)^{-1}W^{T}y \end{eqnarray}\]Whenever \(W_{t}\in\Omega_{t}\), the condition
\[\begin{eqnarray} \label{eqiv13} E(u_{t}|W_{t})&=&0 \end{eqnarray}\]holds, and \(\hat{\beta}_{IV}\) is consistent and asymptotically normal under an identification condition. For asymptotic identification, this condition is that
\[\begin{eqnarray} \label{eqiv14} S_{W^{T}X}&\equiv&\mathop{plim} \frac{1}{n}W^{T}X \end{eqnarray}\]is deterministic and nonsingular.
It is easy to see that the simple IV estimator is consistent:
\[\begin{eqnarray} \nonumber \hat{\beta}_{IV}&=&(W^{T}X)^{-1}W^{T}y\\ \nonumber &=&(W^{T}X)^{-1}W^{T}(X\beta_{0}+u)\\ \label{eqiv15} &=&\beta_{0}+(n^{-1}W^{T}X)^{-1}n^{-1}W^{T}u \end{eqnarray}\]Given the assumption (\ref{eqiv14}) of asymptotic identification, \(\hat{\beta}_{IV}\) is consistent if and only if
\[\begin{eqnarray} \label{eqiv151} \mathop{plim} \frac{1}{n}W^{T}u&=&0 \end{eqnarray}\]that is, if and only if the error terms are asymptotically uncorrelated with the instruments.
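Continuing the demand-supply sketch (same invented numbers as before), the excluded supply shifter \(x_{s}\) can serve as an instrument for the endogenous price, and the simple IV formula (\ref{eqiv12}) recovers the demand parameters:

```python
import numpy as np

# Same illustrative data-generating process as in the simulation above.
rng = np.random.default_rng(0)
n, gamma_d, gamma_s = 100_000, -1.0, 0.5
x_d, x_s = rng.normal(size=n), rng.normal(size=n)
u_d, u_s = rng.normal(size=n), rng.normal(size=n)
p = (x_d + u_d - x_s - u_s) / (gamma_s - gamma_d)
q = gamma_d * p + x_d + u_d

X = np.column_stack([p, x_d])    # regressors of the demand equation
W = np.column_stack([x_s, x_d])  # instruments: l = k = 2, W_t in Omega_t

beta_ols = np.linalg.lstsq(X, q, rcond=None)[0]
beta_iv = np.linalg.solve(W.T @ X, W.T @ q)   # (W'X)^{-1} W'y
print(beta_ols)   # slope on p biased away from -1.0
print(beta_iv)    # ~[-1.0, 1.0]: consistent
```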
Efficiency Considerations
The asymptotic covariance matrix of \(n^{1/2}(\hat{\beta}_{IV}-\beta_{0})\) is
\[\begin{eqnarray} \nonumber Var\left(\mathop{plim}_{n\to\infty}n^{1/2}(\hat{\beta}_{IV}-\beta_{0})\right)&=&\sigma^{2}(S_{W^{T}X})^{-1}(S_{W^{T}W})(S_{W^{T}X}^{T})^{-1}\\\nonumber &=&\sigma^{2}\mathop{plim}_{n\to\infty}(n^{-1}W^{T}X)^{-1}n^{-1}W^{T}W(n^{-1}X^{T}W)^{-1}\\\nonumber &=&\sigma^{2}\mathop{plim}_{n\to\infty}(n^{-1}X^{T}W(W^{T}W)^{-1}W^{T}X)^{-1}\\\label{eqiv17} &=&\sigma^{2}\mathop{plim}_{n\to\infty}(n^{-1}X^{T}P_{W}X)^{-1} \end{eqnarray}\]
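As an informal check (a sketch; the design and all numbers are invented, reusing the demand-supply setup), one can compare the Monte Carlo covariance of \(\hat{\beta}_{IV}\) across replications with the feasible finite-sample analogue \(\hat{\sigma}^{2}(X^{T}P_{W}X)^{-1}\) of the formula above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 2_000, 2_000

def simulate():
    """One draw from the illustrative demand-supply design."""
    x_d, x_s = rng.normal(size=n), rng.normal(size=n)
    u_d, u_s = rng.normal(size=n), rng.normal(size=n)
    p = (x_d + u_d - x_s - u_s) / 1.5           # gamma_s - gamma_d = 1.5
    q = -1.0 * p + x_d + u_d
    return np.column_stack([p, x_d]), np.column_stack([x_s, x_d]), q

draws = []
for _ in range(reps):
    X, W, q = simulate()
    draws.append(np.linalg.solve(W.T @ X, W.T @ q))
draws = np.asarray(draws)

X, W, q = simulate()                             # one sample for the formula
beta_iv = np.linalg.solve(W.T @ X, W.T @ q)
u_hat = q - X @ beta_iv
sigma2 = u_hat @ u_hat / n                       # consistent estimate of sigma^2
PwX = W @ np.linalg.lstsq(W, X, rcond=None)[0]   # P_W X via first-stage fits
print(sigma2 * np.linalg.inv(X.T @ PwX))         # formula-based Var(beta_iv)
print(np.cov(draws, rowvar=False))               # Monte Carlo covariance
```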
The Generalized IV Estimator
Let \(W\) denote an \(n\times l\) matrix of instruments.
- \(l>k\): overidentified
- \(l=k\): just/exactly identified
- \(l<k\): underidentified
Method: seek an \(l\times k\) matrix \(J\) such that the \(n\times k\) matrix \(WJ\) is a valid instrument matrix, where \(J\) minimizes the asymptotic covariance matrix within the class of IV estimators obtained from instrument matrices \(WJ^{\ast}\) for arbitrary \(J^{\ast}\).
Three requirements for the matrix \(J\):
- \(J\) has full column rank \(k\)
- \(J\) is at least asymptotically deterministic
- \(J\) is chosen to minimize the asymptotic covariance matrix of the resulting IV estimator
We choose \(J\) as
\[\begin{eqnarray*} WJ&=&P_{W}X=W(W^{T}W)^{-1}W^{T}X\\ J&=&(W^{T}W)^{-1}W^{T}X \end{eqnarray*}\]The moment conditions then become
\[\begin{eqnarray*} (WJ)^{T}(y-X\beta)&=&0\\ X^{T}P_{W}(y-X\beta)&=&0 \end{eqnarray*}\]and the generalized IV (GIV) estimator is
\[\begin{eqnarray} \label{eqiv16} \hat{\beta}_{GIV}&=&(X^{T}P_{W}X)^{-1}X^{T}P_{W}y \end{eqnarray}\]
Asymptotic Distribution of IV Estimator
Replacing \(W\) in (\ref{eqiv17}) by \(P_{W}X\), we find that
\[\begin{eqnarray*} Var\left(\mathop{plim}_{n\to\infty}n^{1/2}(\hat{\beta}_{GIV}-\beta_{0})\right)&=&\sigma^{2}\mathop{plim}_{n\to\infty}(n^{-1}X^{T}P_{P_{W}X}X)^{-1}\\ &=&\sigma^{2}\mathop{plim}_{n\to\infty}(n^{-1}X^{T}P_{W}X((P_{W}X)^{T}P_{W}X)^{-1}(P_{W}X)^{T}X)^{-1}\\ &=&\sigma^{2}\mathop{plim}_{n\to\infty}(n^{-1}X^{T}P_{W}X(X^{T}P_{W}X)^{-1}X^{T}P_{W}X)^{-1}\\ &=&\sigma^{2}\mathop{plim}_{n\to\infty}(n^{-1}X^{T}P_{W}X)^{-1} \end{eqnarray*}\]The asymptotic covariance matrix is unchanged if \(W\) is replaced by \(P_{W}X\).
Two-Stage Least Squares
\[\begin{eqnarray*} \hat{\beta}_{GIV}&=&(X^{T}P_{W}X)^{-1}X^{T}P_{W}y \end{eqnarray*}\]is commonly known as the two-stage least squares, or 2SLS, estimator.
- Stage 1: Each column of \(X\) is regressed on \(W\)
- Stage 2: The fitted values from the first-stage regressions form the matrix \(P_{W}X\); then \(y\) is regressed on \(P_{W}X\) in the second-stage regression
The OLS estimate of \(\beta\) from this second-stage regression is
\[\begin{eqnarray*} \hat{\beta}_{2sls}&=&((P_{W}X)^{T}P_{W}X)^{-1}(P_{W}X)^{T}y\\ &=&(X^{T}P_{W}X)^{-1}X^{T}P_{W}y \end{eqnarray*}\]We should be careful about the standard error of the regression and the covariance matrix of the parameter estimates: the second-stage regression produces residuals \(y-P_{W}X\hat{\beta}_{2sls}\) rather than the correct residuals \(y-X\hat{\beta}_{2sls}\), so its OLS standard errors are invalid and must be recomputed.
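The sketch below (an invented data-generating process with one endogenous regressor and \(l=3\) instruments, purely for illustration) shows the two stages reproducing the closed-form GIV estimate and computes the covariance matrix with the correct residuals \(y-X\hat{\beta}\):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
# One exogenous regressor x1, one endogenous x2, two extra instruments.
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x1, e = rng.normal(size=n), rng.normal(size=n)
x2 = z1 + 0.5 * z2 + e                     # endogenous: shares e with u
u = e + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + u                # true beta = (1, 2) (assumed)

X = np.column_stack([x1, x2])
W = np.column_stack([x1, z1, z2])          # instruments include exogenous x1

# Stage 1: fitted values P_W X; Stage 2: OLS of y on P_W X.
PwX = W @ np.linalg.lstsq(W, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(PwX, y, rcond=None)[0]

# The same estimate from the closed form (X'P_W X)^{-1} X'P_W y.
beta_giv = np.linalg.solve(X.T @ PwX, PwX.T @ y)

# Correct covariance: use residuals y - X beta, NOT the stage-2 residuals.
u_hat = y - X @ beta_2sls
sigma2 = u_hat @ u_hat / n
cov = sigma2 * np.linalg.inv(X.T @ PwX)
print(beta_2sls, beta_giv, np.sqrt(np.diag(cov)))
```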
Testing Overidentifying Restrictions
The degree of overidentification of an overidentified linear regression model is defined to be \(l-k\). Such a model incorporates \(l-k\) overidentifying restrictions.
We can write \(S(W)=S(P_{W}X,W^{\ast})\), where \(W^{\ast}\) is an \(n\times (l-k)\) matrix of extra instruments. Instrument exogeneity requires that \(u_{t}\) have mean zero conditional on \((P_{W}X)_{t}\). \textcolor{blue}{The overidentifying restrictions require that the extra instruments \(W^{\ast}\) also satisfy this condition.} These restrictions can always be tested.
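One standard implementation (the Sargan test, not necessarily the exact procedure of the text) regresses the IV residuals on the full instrument matrix \(W\) and compares \(n\) times the uncentered \(R^{2}\) with the \(\chi^{2}(l-k)\) distribution. A sketch reusing the invented overidentified design from the 2SLS example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 5_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x1, e = rng.normal(size=n), rng.normal(size=n)
x2 = z1 + 0.5 * z2 + e
u = e + rng.normal(size=n)
y = x1 + 2.0 * x2 + u

X = np.column_stack([x1, x2])
W = np.column_stack([x1, z1, z2])   # l = 3, k = 2: one overidentifying restriction

PwX = W @ np.linalg.lstsq(W, X, rcond=None)[0]
beta = np.linalg.solve(X.T @ PwX, PwX.T @ y)
u_hat = y - X @ beta

# Sargan statistic: n * (uncentered R^2) from regressing u_hat on W,
# i.e. n * u'P_W u / u'u, asymptotically chi^2(l - k) under the null.
fitted = W @ np.linalg.lstsq(W, u_hat, rcond=None)[0]
stat = n * (fitted @ fitted) / (u_hat @ u_hat)
print(stat, stats.chi2.sf(stat, df=1))  # valid instruments: large p-value
```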
Durbin-Wu-Hausman Tests
The null and alternative hypotheses for the DWH test are
\[\begin{eqnarray} \label{eqiv19} H_{0}&:&y=X\beta+u, \ u\sim \mbox{IID}(0,\sigma^{2}I),\ \ E(X^{T}u)=0\\ H_{1}&:&y=X\beta+u, \ u\sim \mbox{IID}(0,\sigma^{2}I),\ \ E(W^{T}u)=0 \end{eqnarray}\]Under \(H_{1}\), the IV estimator \(\hat{\beta}_{IV}\) is consistent, but the OLS estimator \(\hat{\beta}_{OLS}\) is not. Under \(H_{0}\), both are consistent. Thus, \(\mathop{plim}(\hat{\beta}_{IV}-\hat{\beta}_{OLS})\) is zero under the null hypothesis and nonzero under the alternative. The DWH test checks whether the difference is significantly different from zero in the available sample.
\[\begin{eqnarray*} \hat{\beta}_{IV}-\hat{\beta}_{OLS}&=&(X^{T}P_{W}X)^{-1}X^{T}P_{W}y-(X^{T}X)^{-1}X^{T}y\\ &=&(X^{T}P_{W}X)^{-1}(X^{T}P_{W}y-X^{T}P_{W}X(X^{T}X)^{-1}X^{T}y)\\ &=&(X^{T}P_{W}X)^{-1}(X^{T}P_{W}-X^{T}P_{W}X(X^{T}X)^{-1}X^{T})y\\ &=&(X^{T}P_{W}X)^{-1}X^{T}P_{W}(I-X(X^{T}X)^{-1}X^{T})y\\ &=&(X^{T}P_{W}X)^{-1}X^{T}P_{W}M_{X}y \end{eqnarray*}\]The first factor is a positive definite matrix. We test whether \(X^{T}P_{W}M_{X}y\) is significantly different from zero.
We partition the matrix of regressors \(X\) as
\[\begin{eqnarray*} X&=&\left[\begin{array}{cc} Z&Y \end{array}\right] \end{eqnarray*}\]where the \(k_{1}\) columns of \(Z\) are included in the matrix of instruments \(W\), and the \(k_{2}=k-k_{1}\) columns of \(Y\) are treated as potentially endogenous. OLS residuals are orthogonal to all the columns of \(X\), in particular to those of \(Z\), and since \(P_{W}Z=Z\),
\[\begin{eqnarray*} Z^{T}P_{W}M_{X}y=(P_{W}Z)^{T}M_{X}y=Z^{T}M_{X}y=0 \end{eqnarray*}\]We therefore only need to test \(Y^{T}P_{W}M_{X}y\), which will not in general be identically zero under \(H_{1}\), but should not differ significantly from zero under \(H_{0}\).
The easiest way to test whether \(Y^{T}P_{W}M_{X}y\) differs significantly from zero is to use an \(F\) test of the \(k_{2}\) restrictions \(\delta=0\) in the OLS regression
\[\begin{eqnarray} \label{eqiv20} y&=&X\beta+P_{W}Y\delta+u \end{eqnarray}\] \[\begin{eqnarray*} \hat{\delta}&=&(Y^{T}P_{W}M_{X}P_{W}Y)^{-1}Y^{T}P_{W}M_{X}y \end{eqnarray*}\]The \(F\) test has \(k_{2}\) and \(n-k-k_{2}\) degrees of freedom.
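A sketch of the DWH test as the \(F\) test of \(\delta=0\) in (\ref{eqiv20}), on invented data with one potentially endogenous regressor (\(k_{2}=1\)); the endogeneity is built into the design, so the test should reject:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 5_000
z, x1, e = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
x2 = z + e                                  # endogenous regressor Y (k2 = 1)
u = e + rng.normal(size=n)
y = x1 + 2.0 * x2 + u

X = np.column_stack([x1, x2])               # X = [Z, Y] with Z = x1
W = np.column_stack([x1, z])
Y = x2.reshape(-1, 1)
PwY = W @ np.linalg.lstsq(W, Y, rcond=None)[0]

def ssr(A, b):
    """Sum of squared residuals from OLS of b on A."""
    resid = b - A @ np.linalg.lstsq(A, b, rcond=None)[0]
    return resid @ resid

k, k2 = X.shape[1], Y.shape[1]
ssr_r = ssr(X, y)                            # restricted: delta = 0
ssr_u = ssr(np.column_stack([X, PwY]), y)    # augmented regression (eqiv20)
F = ((ssr_r - ssr_u) / k2) / (ssr_u / (n - k - k2))
print(F, stats.f.sf(F, k2, n - k - k2))      # small p-value: reject exogeneity
```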