Davidson and MacKinnon Chapter 6—Confidence Intervals
Introduction
Once we are confident that a model is correctly specified and incorporates whatever restrictions are appropriate, we often want to make inferences about the values of some of the parameters that appear in the model. It is usually more convenient to construct confidence intervals for the individual parameters of specific interest.
By definition, a confidence interval is an interval of the real line that contains all values \(\theta_{0}\) for which the hypothesis that \(\theta = \theta_{0}\) is not rejected by the appropriate test in the family. For level \(\alpha\), a confidence interval so obtained is said to be a \(1-\alpha\) confidence interval, or to be at confidence level \(1-\alpha\).
Unlike the parameters we are trying to make inferences about, confidence intervals are random. Every different sample that we draw from the same DGP will yield a different confidence interval.
The probability that the random interval will include, or cover, the true value of the parameter is called the coverage probability, or just the coverage, of the interval.
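As a quick illustration of coverage, here is a minimal Python sketch (not from the text) that checks by simulation how often a standard 95% \(t\)-interval for a normal mean covers the true value; the DGP and all numerical settings are assumptions chosen for the example.

```python
# Sketch: Monte Carlo check of the coverage of a textbook 95% t-interval
# for the mean of an i.i.d. normal sample (illustrative values assumed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu_true, sigma, n, reps = 2.0, 3.0, 50, 10_000
t_crit = stats.t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(reps):
    y = rng.normal(mu_true, sigma, size=n)
    half_width = t_crit * y.std(ddof=1) / np.sqrt(n)
    covered += (y.mean() - half_width <= mu_true <= y.mean() + half_width)

print(f"empirical coverage: {covered / reps:.3f}")   # should be close to 0.95
```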
Exact and Asymptotic Confidence Intervals
A confidence interval for some scalar parameter \(\theta\) consists of all values \(\theta_{0}\) for which the hypothesis \(\theta = \theta_{0}\) cannot be rejected at some specified level \(\alpha\). Thus, as we will see in a moment, we can construct a confidence interval by inverting a test statistic.
Let us denote the test statistic for the hypothesis that \(\theta = \theta_{0}\) by the random variable \(\tau(y,\theta_{0})\). Here \(y\) denotes the sample used to compute the particular realization of the statistic.
If we write the critical value as \(c_{\alpha}\), then for any \(\theta_{0}\), we have by the definition of \(c_{\alpha}\) that
\[\begin{eqnarray} \label{eq5.1} \Pr_{\theta_{0}}(\tau(y,\theta_{0})\leq c_{\alpha})&=&1-\alpha \end{eqnarray}\]For \(\theta_{0}\) to belong to the confidence interval obtained by inverting the family of test statistics \(\tau(y,\theta_{0})\), it is necessary and sufficient that
\[\begin{eqnarray} \label{eq5.2} \tau(y,\theta_{0})&\leq& c_{\alpha} \end{eqnarray}\]Thus the limits of the confidence interval can be found by solving the following equation for \(\theta_{0}\):
\[\begin{eqnarray} \label{eq5.3} \tau(y,\theta_{0})&=& c_{\alpha} \end{eqnarray}\]This equation normally has two solutions, an upper limit \(\theta_{u}\) and a lower limit \(\theta_{l}\).
If \(c_{\alpha}\) is an exact critical value for the test statistic \(\tau(y,\theta_{0})\) at level \(\alpha\), then the confidence interval \([\theta_{l}, \theta_{u}]\) constructed in this way will have coverage \(1-\alpha\), as desired.
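The root-finding view of equation (\ref{eq5.3}) can be made concrete with a small sketch. The following Python snippet, which is illustrative only and uses an assumed data set and a squared \(t\)-type statistic, inverts the test numerically by solving \(\tau(y,\theta_{0})=c_{\alpha}\) on each side of \(\hat{\theta}\).

```python
# Sketch: construct a confidence interval by inverting a test, i.e. solving
# tau(y, theta0) = c_alpha numerically on each side of the point estimate.
# The data-generating values below are assumptions for illustration.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=40)
theta_hat, s_theta = y.mean(), y.std(ddof=1) / np.sqrt(len(y))
c_alpha = stats.chi2.ppf(0.95, df=1)          # critical value for the squared t statistic

def tau(theta0):
    """Squared t statistic for H0: theta = theta0."""
    return ((theta_hat - theta0) / s_theta) ** 2

# tau equals 0 at theta_hat and increases in both directions, so each root
# of tau(theta0) - c_alpha is bracketed by theta_hat and a point far away.
lower = optimize.brentq(lambda t0: tau(t0) - c_alpha, theta_hat - 10 * s_theta, theta_hat)
upper = optimize.brentq(lambda t0: tau(t0) - c_alpha, theta_hat, theta_hat + 10 * s_theta)
print(lower, upper)
```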
Quantiles
The \(\alpha\) quantile of a distribution with CDF \(F\), denoted \(q_{\alpha}\), satisfies the equation
\[\begin{eqnarray*} F(q_{\alpha})&=&\alpha \end{eqnarray*}\]The 0.5 quantile of a distribution is often called the median.
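For computation, quantiles are simply values of the inverse CDF. A minimal sketch using scipy (an assumed tool choice, not part of the text):

```python
# Sketch: quantiles q_alpha satisfying F(q_alpha) = alpha, via the inverse CDF (ppf).
from scipy import stats

print(stats.norm.ppf(0.5))        # median of N(0,1): 0.0
print(stats.chi2.ppf(0.95, df=1)) # 0.95 quantile of chi-square(1): about 3.841
print(stats.t.ppf(0.975, df=20))  # 0.975 quantile of t(20): about 2.086
```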
Asymptotic Confidence Intervals
Let us suppose that
\[\begin{eqnarray} \label{eq5.4} \tau(y,\theta_{0})&=&\left(\frac{\hat{\theta}-\theta_{0}}{s_{\theta}}\right)^{2} \end{eqnarray}\]Thus \(\tau(y,\theta_{0})\) is the square of the \(t\) statistic for the null hypothesis that \(\theta=\theta_{0}\). If \(\hat{\theta}\) were an OLS estimate of a regression coefficient, then under some conditions, the test statistic defined in (\ref{eq5.4}) would be asymptotically distributed as \(\chi^{2}(1)\) under the null hypothesis.
For the test statistic (\ref{eq5.4}), equation (\ref{eq5.3}) becomes
\[\begin{eqnarray} \nonumber \left(\frac{\hat{\theta}-\theta_{0}}{s_{\theta}}\right)^{2}&=&c_{\alpha}\\ \label{eq5.5} |\hat{\theta}-\theta_{0}|&=&s_{\theta}c_{\alpha}^{1/2}\\ \nonumber \theta_{0}&=&\hat{\theta}\pm s_{\theta}c_{\alpha}^{1/2} \end{eqnarray}\]So the asymptotic \(1-\alpha\) confidence interval for \(\theta\) is
\[\begin{eqnarray} \label{eq5.6} \Big[\hat{\theta}- s_{\theta}c_{\alpha}^{1/2}&,&\hat{\theta}+ s_{\theta}c_{\alpha}^{1/2}\Big] \end{eqnarray}\]
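A minimal numerical sketch of (\ref{eq5.6}); the estimate \(\hat{\theta}\) and standard error \(s_{\theta}\) below are assumed illustrative values.

```python
# Sketch of the interval in (eq5.6): theta_hat +/- s_theta * sqrt(c_alpha),
# where c_alpha is the level-alpha critical value of chi-square(1).
# The estimate and standard error below are assumed illustrative numbers.
from scipy import stats

theta_hat, s_theta, alpha = 1.37, 0.25, 0.05
c_alpha = stats.chi2.ppf(1 - alpha, df=1)
half_width = s_theta * c_alpha ** 0.5          # equals 1.96 * s_theta for alpha = 0.05
print(theta_hat - half_width, theta_hat + half_width)
```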
Asymmetric Confidence Intervals
The confidence interval (\ref{eq5.6}) is a symmetric one, because \(\theta_{l}\) is as far below \(\hat{\theta}\) as \(\theta_{u}\) is above it. Not all confidence intervals are symmetric.
It is possible to construct confidence intervals based on one-tailed tests. Such an interval will be open all the way out to infinity in one direction.
\(P\) Values and Asymmetric Distributions
The \(P\) value is defined, as usual, as the smallest level for which the test rejects.
The \(P\) value associated with a statistic \(\tau\) should be
\[\begin{eqnarray*} \text{$P$-value}&=& \begin{cases} 2F(\tau) & \text{if } \tau \text{ is in the lower tail} \\ 2(1-F(\tau)) & \text{if } \tau \text{ is in the upper tail} \\ \end{cases} \end{eqnarray*}\]where \(F\) is the CDF of the null distribution of \(\tau\). In complete generality, the \(P\) value is
\[\begin{eqnarray} \label{eq5.7} p(\tau)&=&2\min (F(\tau), 1-F(\tau)) \end{eqnarray}\]
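A small sketch of (\ref{eq5.7}); the chi-square(3) null distribution and the two statistic values are assumptions chosen for illustration.

```python
# Sketch of the equal-tail P value in (eq5.7): p(tau) = 2 * min(F(tau), 1 - F(tau)),
# illustrated here with a chi-square(3) null distribution (an assumed example).
from scipy import stats

def p_value(tau, cdf):
    return 2.0 * min(cdf(tau), 1.0 - cdf(tau))

F = lambda x: stats.chi2.cdf(x, df=3)
print(p_value(0.10, F))   # statistic deep in the lower tail
print(p_value(9.35, F))   # statistic in the upper tail
```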
Exact Confidence Intervals for Regression Coefficients
For the classical normal linear regression model, the \(t\) statistic for the hypothesis that \(\beta_{2} = \beta_{20}\), for any particular value \(\beta_{20}\), can be written as
\[\begin{eqnarray} \label{eq5.8} \frac{\hat{\beta}_{2}-\beta_{20}}{s_{2}} \end{eqnarray}\]where \(s_{2}\) is the standard error of \(\hat{\beta}_{2}\). Under these assumptions, the \(t\) statistic (\ref{eq5.8}) has the \(t(n-k)\) distribution, and so
\[\begin{eqnarray} \label{eq5.9} \Pr\left(t_{\alpha/2}\leq \frac{\hat{\beta}_{2}-\beta_{20}}{s_{2}}\leq t_{1-(\alpha/2)}\right)&=&1-\alpha \end{eqnarray}\]where \(t_{\alpha/2}\) and \(t_{1-(\alpha/2)}\) denote the \(\alpha/2\) and \(1-(\alpha/2)\) quantiles of the \(t(n-k)\) distribution.
Therefore, the confidence interval is
\[\begin{eqnarray} \label{eq5.10} \Big[\hat{\beta}_{2}-s_{2}t_{1-(\alpha/2)}&,&\hat{\beta}_{2}-s_{2}t_{\alpha/2}\Big] \end{eqnarray}\]or equivalently, since \(t_{\alpha/2}=-t_{1-(\alpha/2)}\) by the symmetry of the \(t\) distribution,
\[\begin{eqnarray} \label{eq5.11} \Big[\hat{\beta}_{2}-s_{2}t_{1-(\alpha/2)}&,&\hat{\beta}_{2}+s_{2}t_{1-(\alpha/2)}\Big] \end{eqnarray}\]
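A sketch of the interval (\ref{eq5.11}) computed from a simulated regression; the design, parameter values, and sample size are assumptions, not from the text.

```python
# Sketch of the exact interval (eq5.11) for an OLS coefficient under classical
# assumptions; the simulated design below is an assumption for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k, alpha = 100, 2, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2_err = resid @ resid / (n - k)                       # unbiased error variance estimate
s_2 = np.sqrt(s2_err * np.linalg.inv(X.T @ X)[1, 1])   # std. error of beta_2
t_q = stats.t.ppf(1 - alpha / 2, df=n - k)
print(beta_hat[1] - s_2 * t_q, beta_hat[1] + s_2 * t_q)
```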
Confidence Regions
Suppose that we have a \(k\)-vector of parameter estimates \(\hat{\theta}\), whose covariance matrix \(Var(\hat{\theta})\) can be estimated by \(\widehat{Var}(\hat{\theta})\). Then the statistic
\[\begin{eqnarray} \label{eq5.12} (\hat{\theta}-\theta_{0})^{T}\left(\widehat{Var}(\hat{\theta})\right)^{-1}(\hat{\theta}-\theta_{0}) \end{eqnarray}\]can be used to test the joint null hypothesis that \(\theta=\theta_{0}\). (\ref{eq5.12}) is asymptotically distributed as \(\chi^{2}(k)\) under the null hypothesis.
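A sketch of how (\ref{eq5.12}) defines a confidence region: a candidate \(\theta_{0}\) belongs to the asymptotic \(1-\alpha\) region when the statistic does not exceed the \(\chi^{2}(k)\) critical value. The estimates below are assumed illustrative numbers.

```python
# Sketch: check whether a candidate theta_0 lies inside the asymptotic
# 1 - alpha confidence ellipsoid defined by the Wald statistic (eq5.12).
# theta_hat and its estimated covariance matrix are assumed illustrative values.
import numpy as np
from scipy import stats

theta_hat = np.array([1.2, -0.4])
var_hat = np.array([[0.04, 0.01],
                    [0.01, 0.09]])
theta_0 = np.array([1.0, -0.5])

diff = theta_hat - theta_0
wald = diff @ np.linalg.solve(var_hat, diff)           # quadratic form in (eq5.12)
c_alpha = stats.chi2.ppf(0.95, df=len(theta_hat))
print(wald, wald <= c_alpha)                           # inside the region if True
```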
Asymptotic Normality and Root-\(n\) Consistency
We now introduce asymptotic normality in the context of the linear regression model.
The DGP is
\[\begin{eqnarray} \label{eq5.13} y&=&X\beta_{0}+u,\ \ u\sim IID(0, \sigma_{0}^{2}I) \end{eqnarray}\]By a central limit theorem, the random vector \(v=n^{-1/2}X^{T}u\) is asymptotically normally distributed with mean \(0\) and covariance matrix \(\sigma_{0}^{2}S_{X^{T}X}\), where \(S_{X^{T}X}\) is the plim of \(n^{-1}X^{T}X\) as \(n\to\infty\).
For the DGP (\ref{eq5.13}), we have
\[\begin{eqnarray} \label{eq5.14} \hat{\beta}-\beta_{0}&=&(X^{T}X)^{-1}X^{T}u \end{eqnarray}\]Under fairly weak conditions, \(\hat{\beta}\) is consistent; that is, the right-hand side of (\ref{eq5.14}) tends to a limit of \(0\) as \(n\to\infty\). Thus it would appear that asymptotic theory has nothing to say about the limiting variance of a consistent estimator.
However, we can multiply by \(n^{1/2}\) to obtain a nondegenerate limiting distribution:
\[\begin{eqnarray*} n^{1/2}(\hat{\beta}-\beta_{0})&=&\left(\frac{1}{n}X^{T}X\right)^{-1}n^{-1/2}X^{T}u \end{eqnarray*}\]The first factor on the right-hand side tends to \(S^{-1}_{X^{T}X}\) as \(n\to\infty\), and the second factor (which is just \(v\)) tends to a random vector distributed as \(N(0,\sigma_{0}^{2}S_{X^{T}X})\). Because \(S_{X^{T}X}\) is deterministic, we find that, asymptotically,
\[\begin{eqnarray*} Var(n^{1/2}(\hat{\beta}-\beta_{0}))&=&\sigma^{2}_{0}S^{-1}_{X^{T}X}S_{X^{T}X}S^{-1}_{X^{T}X}=\sigma_{0}^{2}S^{-1}_{X^{T}X} \end{eqnarray*}\]and
\[\begin{eqnarray} \label{eq5.15} n^{1/2}(\hat{\beta}-\beta_{0})&\sim_{a}&N(0,\sigma_{0}^{2}S^{-1}_{X^{T}X}) \end{eqnarray}\]
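A Monte Carlo sketch of (\ref{eq5.15}); the DGP, parameter values, and sample size are assumptions chosen for illustration.

```python
# Sketch: Monte Carlo illustration of (eq5.15), i.e. n^{1/2}(beta_hat - beta_0)
# being approximately normal; the DGP below is assumed for illustration.
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta_0 = 200, 5000, np.array([1.0, 2.0])
draws = np.empty((reps, 2))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta_0 + rng.standard_normal(n)            # IID(0, 1) errors
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    draws[r] = np.sqrt(n) * (beta_hat - beta_0)

print(draws.mean(axis=0))   # close to zero
print(np.cov(draws.T))      # close to sigma_0^2 * S_XtX^{-1}
```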
Heteroskedasticity-Consistent Covariance Matrices
We focus on the linear regression model with exogenous regressors,
\[\begin{eqnarray} \label{eq5.5.1} y&=&X\beta+u,\ \ E(u)=0,\ \ E(uu^{T})=\Omega \end{eqnarray}\]where \(\Omega\), the error covariance matrix, is an \(n\times n\) matrix with \(t^{th}\) diagonal element equal to \(\omega_{t}^{2}\) and all the off-diagonal elements equal to 0. These error terms are said to be heteroskedastic, or to exhibit heteroskedasticity.
The covariance matrix of OLS estimator \(\hat{\beta}\) is equal to
\[\begin{eqnarray} \nonumber E\left((\hat{\beta}-\beta)(\hat{\beta}-\beta)^{T}\right)&=&(X^{T}X)^{-1}X^{T}E(uu^{T})X(X^{T}X)^{-1}\\ \label{eq5.5.2} &=&(X^{T}X)^{-1}X^{T}\Omega X(X^{T}X)^{-1} \end{eqnarray}\]This form of covariance matrix is often called a sandwich covariance matrix.
Usually we do not know \(\omega_{t}^{2}\). Since there are \(n\) of them, we cannot hope to estimate the \(\omega_{t}^{2}\) consistently without making additional assumptions.
We find that
\[\begin{eqnarray} \label{eq5.10.3} Avar(n^{1/2}(\hat{\beta}-\beta))&=&\lim_{n\to\infty}\left(\frac{1}{n}X^{T}X\right)^{-1}\lim_{n\to\infty}\left(\frac{1}{n}X^{T}\Omega X\right)\lim_{n\to\infty}\left(\frac{1}{n}X^{T}X\right)^{-1} \end{eqnarray}\]Assuming that \(\mathop{plim}\, \frac{1}{n}X^{T}X=S_{X^{T}X}\), the first factor \(\lim\left(\frac{1}{n}X^{T}X\right)^{-1}\) (and likewise the third) tends to the finite, deterministic, positive definite matrix \((S_{X^{T}X})^{-1}\).
White (1980) showed that under certain conditions, the second factor \(\lim\left(\frac{1}{n}X^{T}\Omega X\right)\) can be estimated consistently by
\[\begin{eqnarray} \label{eq5.10.4} \frac{1}{n}X^{T}\hat{\Omega}X \end{eqnarray}\]where \(\hat{\Omega}\) is an estimator of \(\Omega\) that is not itself consistent. One version of \(\hat{\Omega}\) is a diagonal matrix with \(t^{th}\) diagonal element equal to \(\hat{u}_{t}^{2}\), the \(t^{th}\) squared OLS residual.
The \(k\times k\) matrix \(\lim\left(\frac{1}{n}X^{T}\Omega X\right)\) is symmetric. Therefore, it has \(\frac{1}{2}(k^{2}+k)\) distinct elements. Its \(ij^{th}\) element is
\[\begin{eqnarray} \label{eq5.5.5} \lim_{n\to\infty}\left(\frac{1}{n}\sum_{t=1}^{n}\omega_{t}^{2}X_{ti}X_{tj}\right) \end{eqnarray}\]For the simplest version of \(\hat{\Omega}\), the corresponding estimator of this element is
\[\begin{eqnarray} \label{eq5.5.6} \frac{1}{n}\sum_{t=1}^{n}\hat{u}_{t}^{2}X_{ti}X_{tj} \end{eqnarray}\]Writing \(X_{t}\) for the \(t^{th}\) row of \(X\), a law of large numbers and a central limit theorem let us combine these results as
\[\begin{eqnarray*} n^{1/2}(\hat{\beta}-\beta)&=&\left(\frac{1}{n}\sum_{t=1}^{n}X_{t}^{T}X_{t}\right)^{-1}\sqrt{n}\left(\frac{1}{n}\sum_{t=1}^{n}X_{t}^{T}u_{t}-0\right)\\ &\sim_{a}&N\left(0,E(X_{t}^{T}X_{t})^{-1}E(u_{t}^{2}X_{t}^{T}X_{t})E(X_{t}^{T}X_{t})^{-1}\right) \end{eqnarray*}\]Using the OLS residuals \(\hat{u}_{t}=y_{t}-X_{t}\hat{\beta}\) as consistent estimates of the error terms \(u_{t}=y_{t}-X_{t}\beta\), we obtain the following consistent estimator of the asymptotic covariance matrix
\[\begin{eqnarray*} \widehat{Var}(n^{1/2}(\hat{\beta}-\beta))&=&\left(\frac{1}{n}\sum_{t=1}^{n}X_{t}^{T}X_{t}\right)^{-1}\left(\frac{1}{n}\sum_{t=1}^{n}\hat{u}_{t}^{2}X_{t}^{T}X_{t}\right)\left(\frac{1}{n}\sum_{t=1}^{n}X_{t}^{T}X_{t}\right)^{-1} \end{eqnarray*}\]Dividing by \(n\) then gives an estimate \(\widehat{Var}(\hat{\beta})\) of the covariance matrix of \(\hat{\beta}\):
\[\begin{eqnarray*} \widehat{Var}(\hat{\beta})&=&\left(\sum_{t=1}^{n}X_{t}^{T}X_{t}\right)^{-1}\left(\sum_{t=1}^{n}\hat{u}_{t}^{2}X_{t}^{T}X_{t}\right)\left(\sum_{t=1}^{n}X_{t}^{T}X_{t}\right)^{-1}\\ &=&\left(X^{T}X\right)^{-1}X^{T}\hat{\Omega}X\left(X^{T}X\right)^{-1} \end{eqnarray*}\]where \(\hat{\Omega}\) is a diagonal matrix with \(\hat{u}_{1}^{2},\dots,\hat{u}_{n}^{2}\) on the diagonal.
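A minimal sketch of this sandwich estimator (often called HC0); the heteroskedastic DGP below is an assumption chosen for illustration.

```python
# Sketch of the heteroskedasticity-consistent (HC0) covariance matrix
# (X'X)^{-1} X' Omega_hat X (X'X)^{-1} with Omega_hat = diag(residuals^2).
# The simulated heteroskedastic design is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(4)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n) * np.exp(0.5 * X[:, 1])          # error variance depends on the regressor
y = X @ np.array([1.0, 0.5]) + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (resid[:, None] ** 2 * X)                  # X' Omega_hat X without forming Omega_hat
hc0 = XtX_inv @ meat @ XtX_inv
print(np.sqrt(np.diag(hc0)))                            # heteroskedasticity-robust standard errors
```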
Delta Method
How can we estimate the covariance matrix of a nonlinear function of a vector of parameter estimates?
For example, if we know \(\hat{\theta}\) is an estimator for \(\theta\) and \(\gamma=g(\theta)\), the obvious way to estimate \(\gamma\) is to use \(\hat{\gamma}=g(\hat{\theta})\). The problem is to estimate the variance of \(\hat{\gamma}\). The idea of the delta method is to find a linear approximation to \(g(\theta)\). Since we know how to calculate the variance of a linear function, we can apply that result to the approximation.
Suppose \(\hat{\theta}\) is a \(k\times 1\) vector of root-\(n\) consistent estimates of \(\theta\), so that
\[\begin{eqnarray} \label{eq51} n^{1/2}(\hat{\theta}-\theta)&\sim_{a}&N(0, V^{\infty}(\hat{\theta})) \end{eqnarray}\]and let \(g:\mathbb{R}^{k}\to\mathbb{R}^{\ell}\), with \(\ell\leq k\), so that \(\gamma=g(\theta)\) is an \(\ell\)-vector of monotonic functions that are continuously differentiable.
By Taylor expansion,
\[\begin{eqnarray} \label{eq52} g(\hat{\theta})&\cong& g(\theta)+g'(\theta)(\hat{\theta}-\theta) \end{eqnarray}\]where
\[\begin{eqnarray*} g'(\theta)&=&\left[\begin{array}{ccc} \frac{\partial g_{1}}{\partial \theta_{1}}&\cdots&\frac{\partial g_{1}}{\partial \theta_{k}}\\ \vdots&\ddots&\vdots\\ \frac{\partial g_{\ell}}{\partial \theta_{1}}&\cdots&\frac{\partial g_{\ell}}{\partial \theta_{k}} \end{array} \right] \end{eqnarray*}\]Rearranging (\ref{eq52}) and multiplying by \(n^{1/2}\) gives
\[\begin{eqnarray} \label{eq53} n^{1/2}(g(\hat{\theta})-g(\theta))&\cong& g'(\theta)n^{1/2}(\hat{\theta}-\theta) \end{eqnarray}\]It can be shown that
\[\begin{eqnarray} \label{eq54} n^{1/2}(g(\hat{\theta})-g(\theta))&\sim_{a}& g'(\theta)\,n^{1/2}(\hat{\theta}-\theta)\\ \nonumber &\sim_{a}&N\left(0,g'(\theta)V^{\infty}(\hat{\theta})g'(\theta)^{T}\right) \end{eqnarray}\]In practice, the covariance matrix of \(\hat{\gamma}\) may be estimated by the matrix
\[\begin{eqnarray} \label{eq55} \widehat{Var}(\hat{\gamma})&=&g'(\hat{\theta})\widehat{Var}(\hat{\theta})g'(\hat{\theta})^{T} \end{eqnarray}\]
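A minimal sketch of the delta method in code; the transformation \(g\) and the numerical estimates below are assumptions chosen purely for illustration.

```python
# Sketch of the delta-method variance in (eq55):
# Var_hat(gamma_hat) = g'(theta_hat) Var_hat(theta_hat) g'(theta_hat)^T,
# where g' is the Jacobian of g evaluated at theta_hat. The function g and
# the estimates below are assumptions chosen purely for illustration.
import numpy as np

theta_hat = np.array([0.8, 1.5])
var_theta = np.array([[0.010, 0.002],
                      [0.002, 0.020]])

def g(theta):
    # example transformation: gamma = theta_1 / theta_2
    return np.array([theta[0] / theta[1]])

def jacobian(theta):
    return np.array([[1.0 / theta[1], -theta[0] / theta[1] ** 2]])

G = jacobian(theta_hat)
var_gamma = G @ var_theta @ G.T
print(g(theta_hat), var_gamma)
```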