1. A data-generating process, or DGP, is simply the mechanism that actually generated the data. A key feature of a DGP is that it constitutes a complete specification: enough information is provided for the DGP to be simulated on a computer. A model is defined as a set of data-generating processes.

    A model is misspecified if the DGP does not belong to the model under study.
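Because a DGP is a complete specification, it can be written as a short simulation. The sketch below is only an illustration: the linear form, the parameter values, the uniform distribution for the regressor, and the normal distribution for the error are all assumptions chosen for the example, and `simulate_dgp` is a hypothetical helper name.

```python
import random

def simulate_dgp(n, beta1, beta2, sigma, seed=0):
    """Draw n observations from y_t = beta1 + beta2 * x_t + u_t.

    Assumed for illustration: x_t ~ Uniform(0, 10) and u_t ~ N(0, sigma^2).
    Fixing every one of these choices is what makes this a single DGP.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    us = [rng.gauss(0.0, sigma) for _ in range(n)]
    ys = [beta1 + beta2 * x + u for x, u in zip(xs, us)]
    return xs, ys, us

# One particular DGP: every parameter and distribution is pinned down.
xs, ys, us = simulate_dgp(n=5, beta1=1.0, beta2=2.0, sigma=1.0)
for x, y in zip(xs, ys):
    print(f"x = {x:.3f}, y = {y:.3f}")
```

In this picture, a model corresponds to the whole family of such functions obtained by letting `beta1`, `beta2`, and `sigma` vary. A DGP that added, say, a quadratic term in `x` would lie outside that family, so the linear model would be misspecified for it.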

  2. Simple linear regression model: \begin{equation} \label{eq:dmch1-regmodel} y_{t}=\beta_{1}+\beta_{2}X_{t}+u_{t} \end{equation}
    • \(t\): indexes the observations of a sample

    • \(n\): total number of observations, i.e., sample size

    • \(y_{t}\): dependent variable for observation \(t\)

    • \(X_{t}\): an observation on a single explanatory variable, or independent variable

    • \(\beta_{1},\beta_{2}\): two unknown parameters

    • \(\beta_{1}\): constant or intercept

    • \(\beta_{2}\): slope coefficient

    • \(u_{t}\): the unknown error term

    • \(\beta_{1}+\beta_{2}X_{t}\): regression function

    • \(\Omega_{t}\): the information set, which contains the potential explanatory variables. Specifying a regression model means deciding which of the variables that belong to \(\Omega_{t}\) should be included in the model.

    1. The aim is to explain the observed values of the dependent variable in terms of those of the explanatory variable.

    2. \(u_{t}\) is assumed to be a random variable. To identify the unknown parameters, its expectation is assumed to be zero whatever the value of \(X_{t}\), that is, \(E(u_{t}\mid X_{t})=0\). In effect, we are assuming that the effects of the neglected determinants tend, on average, to cancel out.

    3. We never say that the effect of \(u_t\) is necessarily small. Even if the proportion of the variation in \(y_t\) accounted for by \(u_t\) is large, the model is still useful if it allows us to see how \(y_t\) is related to the observed variable \(X_t\).

    4. How do we estimate the parameters and test hypotheses about them?

    5. A strong assumption about \(u_t\) is that the error terms are IID (independently and identically distributed). When this assumption fails, the \(u_{t}\) may exhibit serial correlation or heteroskedasticity.
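The difference between IID errors and the two departures named above can be seen in a small simulation. This is a sketch under stated assumptions: the normal distributions, the variance rule `0.5 * x_t`, the AR(1) coefficient `rho = 0.8`, and the helper `lag1_autocorr` are all hypothetical choices made for illustration, not part of the model in the notes.

```python
import random

def lag1_autocorr(u):
    """Sample first-order autocorrelation of a sequence."""
    n = len(u)
    mean = sum(u) / n
    num = sum((u[t] - mean) * (u[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in u)
    return num / den

rng = random.Random(42)
n = 2000
x = [rng.uniform(1.0, 10.0) for _ in range(n)]

# IID errors: the same N(0, 1) distribution for every t, drawn independently.
u_iid = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Heteroskedastic errors: mean zero, but the standard deviation grows with x_t.
u_het = [rng.gauss(0.0, 0.5 * x_t) for x_t in x]

# Serially correlated errors: AR(1), u_t = rho * u_{t-1} + e_t.
rho = 0.8
u_ar = [rng.gauss(0.0, 1.0)]
for _ in range(1, n):
    u_ar.append(rho * u_ar[-1] + rng.gauss(0.0, 1.0))

print("lag-1 autocorrelation, IID errors:  ", round(lag1_autocorr(u_iid), 3))
print("lag-1 autocorrelation, AR(1) errors:", round(lag1_autocorr(u_ar), 3))
```

All three sequences have expectation zero, so the zero-mean assumption holds in each case. Only the first satisfies the stronger IID assumption.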

  3. Matrix form of a simple linear regression model
\[\begin{eqnarray} \nonumber y_{1}&=&\beta_{1}+\beta_{2}X_{1}+u_{1}\\\nonumber y_{2}&=&\beta_{1}+\beta_{2}X_{2}+u_{2}\\\label{eqm1} &\vdots&\\\nonumber y_{n}&=&\beta_{1}+\beta_{2}X_{n}+u_{n} \end{eqnarray}\]

Let \(y\) denote an \(n\)-vector, \(u\) an \(n\)-vector, \(X\) an \(n\times 2\) matrix, and \(\beta\) a \(2\)-vector.

\[\begin{eqnarray*} y=\left[\begin{array}{c} y_{1}\\ y_{2}\\ \vdots\\ y_{n} \end{array}\right], u=\left[\begin{array}{c} u_{1}\\ u_{2}\\ \vdots\\ u_{n} \end{array}\right], X=\left[\begin{array}{cc} 1&X_{1}\\ 1&X_{2}\\ \vdots&\vdots\\ 1&X_{n} \end{array}\right]&\mbox{,and}& \beta=\left[\begin{array}{c} \beta_{1}\\ \beta_{2}\\ \end{array}\right] \end{eqnarray*}\]

Equations (\ref{eqm1}) can be rewritten as

\[\begin{eqnarray} \label{eqm2} y&=&X\beta+u \end{eqnarray}\]

    • regressors: the separate columns of the matrix \(X\)

    • regressand: the column vector \(y\)
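To check that the stacked form \(y=X\beta+u\) says the same thing as the \(n\) scalar equations, the following sketch builds \(X\) with a column of ones and compares the two computations. The data values, the parameter values, and the helper `matvec` are hypothetical and chosen only for illustration.

```python
def matvec(X, beta):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(x_ti * b_i for x_ti, b_i in zip(row, beta)) for row in X]

# Hypothetical sample with n = 4 observations.
x_obs = [0.5, 1.0, 2.0, 4.0]   # observations on the explanatory variable
u = [0.1, -0.2, 0.0, 0.3]      # error terms (known here only because we made them up)
beta = [1.0, 2.0]              # [beta_1, beta_2]

# n x 2 regressor matrix: a column of ones and a column of x_t values.
X = [[1.0, x_t] for x_t in x_obs]

# Matrix form: y = X beta + u.
y_matrix = [xb + u_t for xb, u_t in zip(matvec(X, beta), u)]

# Scalar form: y_t = beta_1 + beta_2 * x_t + u_t, one observation at a time.
y_scalar = [beta[0] + beta[1] * x_t + u_t for x_t, u_t in zip(x_obs, u)]

print(y_matrix)
print(y_scalar)
```

The column of ones is what carries the intercept \(\beta_{1}\) into the matrix product.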

  4. Multivariable linear regression model: we have \(k\) regressors, and \(\beta\) is now a \(k\)-vector.

    \[\begin{eqnarray} y=\left[\begin{array}{c} y_{1}\\ y_{2}\\ \vdots\\ y_{n} \end{array}\right], u=\left[\begin{array}{c} u_{1}\\ u_{2}\\ \vdots\\ u_{n} \end{array}\right], X=\left[\begin{array}{cccc} X_{11}&X_{12}&\cdots&X_{1k}\\ X_{21}&X_{22}&\cdots&X_{2k}\\ \vdots&\vdots&&\vdots\\ X_{n1}&X_{n2}&\cdots&X_{nk} \end{array}\right]&\mbox{,and}& \beta=\left[\begin{array}{c} \beta_{1}\\ \beta_{2}\\ \vdots\\ \beta_{k} \end{array}\right] \end{eqnarray}\]

    The model is \begin{equation} y=X\beta+u \end{equation}

    A typical row of this equation is \begin{equation} \label{eq:dmch1-regmodelrow} y_{t}=X_{t}\beta+u_{t}=\sum_{i=1}^{k}\beta_{i}X_{ti}+u_{t} \end{equation} where \(X_{t}\) is the \(t^{th}\) row of \(X\).
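
    As a worked illustration, assume \(k=3\) and that the first column of \(X\) consists of ones, so that \(X_{t1}=1\) for every \(t\) (an assumption made here so that the model contains a constant term). A typical row then expands to

    \[ y_{t}=\beta_{1}+\beta_{2}X_{t2}+\beta_{3}X_{t3}+u_{t} \]

    With \(k=2\) under the same assumption, the row reduces to the simple linear regression model, with \(X_{t2}\) playing the role of the single explanatory variable.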