Davidson and MacKinnon Chapter 1—Regression Models
-
A data-generating process, or DGP, is simply the mechanism that actually generated the data. A key feature of a DGP is that it constitutes a complete specification: enough information is provided for the DGP to be simulated on a computer. A model is defined as a set of data-generating processes.
A model is misspecified if the DGP does not belong to the model under study.
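To make "complete specification" concrete, here is a minimal sketch of simulating one DGP on a computer. Everything specific in it is an assumption chosen for illustration and not taken from the text: the linear form of the DGP, the uniform distribution of the explanatory variable, the normal errors, the parameter values, and the function name `simulate_dgp`.

```python
import random


def simulate_dgp(n, beta1, beta2, sigma, seed=0):
    """Draw one sample of size n from a fully specified (hypothetical) DGP.

    Assumed DGP: X_t ~ Uniform(0, 10), u_t ~ N(0, sigma^2) independently,
    and y_t = beta1 + beta2 * X_t + u_t.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    u = [rng.gauss(0.0, sigma) for _ in range(n)]
    y = [beta1 + beta2 * xt + ut for xt, ut in zip(x, u)]
    return x, y, u


# One sample of five observations from the DGP with beta1=1, beta2=2, sigma=1.
x, y, u = simulate_dgp(n=5, beta1=1.0, beta2=2.0, sigma=1.0)
for xt, yt in zip(x, y):
    print(f"X_t = {xt:6.3f}   y_t = {yt:7.3f}")
```

Each choice of `beta1`, `beta2`, and `sigma` (together with the distributions) picks out a single DGP; the set of all such choices would be a model in the sense defined above.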
- Simple linear regression model:
\begin{equation}
\label{eq:dmch1-regmodel}
y_{t}=\beta_{1}+\beta_{2}X_{t}+u_{t}
\end{equation}
-
\(t\): index of the observations in a sample, \(t=1,\ldots,n\)
-
\(n\): total number of observations, i.e., sample size
-
\(y_{t}\): dependent variable for observation \(t\)
-
\(X_{t}\): observation \(t\) on a single explanatory variable, or independent variable
-
\(\beta_{1},\beta_{2}\): two unknown parameters
-
\(\beta_{1}\): constant or intercept
-
\(\beta_{2}\): slope coefficient
-
\(u_{t}\): the unknown error term
-
\(\beta_{1}+\beta_{2}X_{t}\): regression function
-
\(\Omega_{t}\): the information set, which contains the potential explanatory variables. Specifying a regression model amounts to deciding which of the variables that belong to \(\Omega_{t}\) should be included in the model.
-
The aim is to explain the observed values of the dependent variable in terms of those of the explanatory variable.
-
\(u_{t}\) is assumed to be a random variable. To identify the unknown parameters, its expectation is assumed to be zero whatever the value of \(X_{t}\), that is, \(E(u_{t}\mid X_{t})=0\). In effect, we are assuming that on average the effects of the neglected determinants tend to cancel out.
-
We never say that the effect of \(u_t\) is necessarily small. Even if the proportion of the variation in \(y_t\) accounted for by \(u_t\) is large, the model is still useful if it allows us to see how \(y_t\) is related to the observed variable \(X_t\).
-
How do we estimate the parameters and test hypotheses about them?
-
A strong assumption about \(u_t\) is that the error terms are IID (independently and identically distributed). When this assumption fails, the \(u_{t}\) may exhibit serial correlation or heteroskedasticity.
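The difference between IID errors and errors that are heteroskedastic or serially correlated can be sketched in a few lines. This is an illustration under assumptions that are not in the text: normal innovations, a standard deviation proportional to \(|X_t|\) for the heteroskedastic case, an AR(1) process for the serially correlated case, and hypothetical function names throughout.

```python
import random


def iid_errors(n, sigma, rng):
    """Independent draws with the same variance: the IID case."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def heteroskedastic_errors(x, rng):
    """Mean-zero errors whose standard deviation |X_t| changes with t."""
    return [rng.gauss(0.0, abs(xt)) for xt in x]


def ar1_errors(n, rho, sigma, rng):
    """Serially correlated errors: u_t = rho * u_{t-1} + e_t, starting from 0."""
    u, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, sigma)
        u.append(prev)
    return u


def lag1_autocorr(u):
    """Sample first-order autocorrelation of a sequence."""
    n = len(u)
    mean = sum(u) / n
    num = sum((u[t] - mean) * (u[t - 1] - mean) for t in range(1, n))
    den = sum((ut - mean) ** 2 for ut in u)
    return num / den


rng = random.Random(0)
x = [rng.uniform(1.0, 10.0) for _ in range(2000)]
u_iid = iid_errors(2000, 1.0, rng)
u_het = heteroskedastic_errors(x, rng)
u_ar1 = ar1_errors(2000, 0.9, 1.0, rng)
print("lag-1 autocorrelation, IID errors:  ", round(lag1_autocorr(u_iid), 3))
print("lag-1 autocorrelation, AR(1) errors:", round(lag1_autocorr(u_ar1), 3))
```

All three error sequences have expectation zero, so the identifying assumption on the mean of \(u_t\) can hold even when the IID assumption does not.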
-
- Matrix form of a simple linear regression model
Let \(y\) denote an \(n\)-vector, \(u\) an \(n\)-vector, \(X\) an \(n\times 2\) matrix, and \(\beta\) a \(2\)-vector.
\[ y=\left[\begin{array}{c} y_{1}\\ y_{2}\\ \vdots\\ y_{n} \end{array}\right],\quad u=\left[\begin{array}{c} u_{1}\\ u_{2}\\ \vdots\\ u_{n} \end{array}\right],\quad X=\left[\begin{array}{cc} 1&X_{1}\\ 1&X_{2}\\ \vdots&\vdots\\ 1&X_{n} \end{array}\right],\quad\mbox{and}\quad \beta=\left[\begin{array}{c} \beta_{1}\\ \beta_{2}\\ \end{array}\right] \]
Equation (\ref{eq:dmch1-regmodel}) can be rewritten as
\begin{equation}
\label{eqm2}
y=X\beta+u
\end{equation}
-
regressors: the separate columns of the matrix \(X\)
-
regressand: the column vector \(y\)
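The equivalence between the observation-by-observation form and the matrix form can be checked numerically. The sketch below uses made-up values for \(X_t\), \(u_t\), and \(\beta\), and the helper names `design_matrix` and `matvec` are hypothetical; it only illustrates that stacking a column of ones next to the regressor reproduces the intercept.

```python
def design_matrix(x):
    """Build the n x 2 matrix X whose rows are [1, X_t]."""
    return [[1.0, xt] for xt in x]


def matvec(X, beta):
    """Multiply a matrix (list of rows) by a vector, row by row."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]


# Made-up illustrative values (not from the text).
x = [0.5, 1.5, 2.5]
u = [0.1, -0.2, 0.05]
beta = [1.0, 2.0]  # [beta_1, beta_2]

X = design_matrix(x)

# Matrix form: y = X beta + u
y_matrix = [fit + ut for fit, ut in zip(matvec(X, beta), u)]

# Scalar form: y_t = beta_1 + beta_2 X_t + u_t
y_scalar = [beta[0] + beta[1] * xt + ut for xt, ut in zip(x, u)]

# The two forms agree up to floating-point rounding.
print(all(abs(a - b) < 1e-12 for a, b in zip(y_matrix, y_scalar)))
```

Because `matvec` works on rows of any length, the same helper would apply unchanged to a matrix with more than two columns.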
-
Multivariable linear regression model
We have \(k\) regressors, so \(X\) is now an \(n\times k\) matrix and \(\beta\) is a \(k\)-vector.
\begin{equation}
y=\left[\begin{array}{c} y_{1}\\ y_{2}\\ \vdots\\ y_{n} \end{array}\right],\quad u=\left[\begin{array}{c} u_{1}\\ u_{2}\\ \vdots\\ u_{n} \end{array}\right],\quad X=\left[\begin{array}{cccc} X_{11}&X_{12}&\cdots&X_{1k}\\ X_{21}&X_{22}&\cdots&X_{2k}\\ \vdots&\vdots&&\vdots\\ X_{n1}&X_{n2}&\cdots&X_{nk} \end{array}\right],\quad\mbox{and}\quad \beta=\left[\begin{array}{c} \beta_{1}\\ \beta_{2}\\ \vdots\\ \beta_{k} \end{array}\right]
\end{equation}
The model is
\begin{equation}
y=X\beta+u
\end{equation}
A typical row of this equation is
\begin{equation}
\label{eq:dmch1-regmodelrow}
y_{t}=X_{t}\beta+u_{t}=\sum_{i=1}^{k}\beta_{i}X_{ti}+u_{t}
\end{equation}
where \(X_{t}\) is the \(t\)-th row of \(X\), a \(1\times k\) row vector.
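As a worked illustration (a hypothetical case, not taken from the text), take \(k=3\) and suppose the first column of \(X\) is a constant, so that \(X_{t}=[\,1\;\; X_{t2}\;\; X_{t3}\,]\). Writing out the sum term by term, the typical row becomes
\[ y_{t}=X_{t}\beta+u_{t}=\beta_{1}\cdot 1+\beta_{2}X_{t2}+\beta_{3}X_{t3}+u_{t}. \]
Dropping the third regressor, so that \(k=2\) and \(X_{t2}\) is the single explanatory variable, recovers the simple linear regression model (\ref{eq:dmch1-regmodel}).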