Lecture 3. Multiple Regression: Model Assumptions and Interpretations
I. Model Assumptions
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$
A. The model has two components:
Deterministic portion: $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
Random error: ε
B. Assumptions about the random error:
For any observed values of the independent variables, the random error is normally distributed with a mean of zero and a constant variance $\sigma^2$:
$\varepsilon \sim N(0, \sigma^2)$
The random errors are independent
II. Assumptions about Linear Statistical Models
Linear in the parameters
First-order models: linear in the independent variables
Higher-order models: at least one of the independent variables is raised to a power greater than one
III. Interpretation in First Order Models
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
$\beta_0$ = intercept, the mean value of y when all x are equal to zero
$\beta_1$ = mean change in y for every one-unit increase in $x_1$, holding $x_2$ constant
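A quick numerical check of this interpretation, as a minimal sketch. The coefficient values are hypothetical and chosen only for illustration. Raising x1 by one unit while holding x2 fixed changes the deterministic part of y by exactly the x1 coefficient, whatever the fixed value of x2 is.

```python
# Hypothetical coefficients for y = b0 + b1*x1 + b2*x2 (illustration only).
B0, B1, B2 = 10.0, 2.5, -1.0

def mean_y(x1, x2):
    """Deterministic portion of the first-order model (error term omitted)."""
    return B0 + B1 * x1 + B2 * x2

# Increase x1 by one unit at two different fixed values of x2.
change_at_x2_low = mean_y(4.0, 1.0) - mean_y(3.0, 1.0)
change_at_x2_high = mean_y(4.0, 8.0) - mean_y(3.0, 8.0)

print(change_at_x2_low, change_at_x2_high)  # both equal B1
print(mean_y(0.0, 0.0))                     # equals B0, the intercept
```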
Interpretation continued
The effect of an independent variable $x_i$ on y is independent of all the other independent variables in the model.
The independent effect of $x_i$ is given by $\beta_i$
The interval $\pm 2s$ around the predicted value approximates the accuracy with which the model will predict future values of y for given values of x
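Estimation of the variance of the error term: $s^2 = \dfrac{SSE}{n - (k + 1)}$, where SSE is the sum of squared errors, n is the number of observations, and k is the number of independent variables
The standard deviation of the error term: $s = \sqrt{s^2}$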
We use s to: (1) check the utility of the model, (2) provide a measure of the reliability of predictions and estimates
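As a rough illustration of how s is obtained and used, the sketch below assumes the usual estimator s^2 = SSE / (n - (k + 1)). The function name and all numbers are hypothetical.

```python
import math

def error_std(sse, n, k):
    """Estimate s for a model with k predictors fitted to n observations.

    Assumes the usual estimator s^2 = SSE / (n - (k + 1)).
    """
    df = n - (k + 1)
    if df <= 0:
        raise ValueError("need more observations than parameters")
    return math.sqrt(sse / df)

# Hypothetical values: SSE = 180 from a model with k = 4 predictors, n = 50.
s = error_std(sse=180.0, n=50, k=4)   # 180 / 45 = 4, so s = 2
y_hat = 30.0                          # hypothetical predicted value
approx_interval = (y_hat - 2 * s, y_hat + 2 * s)
print(s, approx_interval)
```

A smaller s gives a narrower approximate prediction interval, which is why s serves as a measure of reliability.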
IV. Inference about the parameters
A. Hypothesis test of an individual parameter coefficient in the multiple regression model:
One-tailed test
Ho: $\beta_i = 0$
Ha: $\beta_i > 0$
Test statistic: $t^* = b_i / s_{b_i}$
Reject Ho if $t^* > t_{\alpha}$
t is based on $n - k - 1$ degrees of freedom
Two-tailed test
Ho: $\beta_i = 0$
Ha: $\beta_i \neq 0$
Test statistic: $t^* = b_i / s_{b_i}$ (check STATA output)
Reject Ho if $|t^*| > t_{\alpha/2}$
t is based on $n - k - 1$ degrees of freedom
Confidence Interval for a β Parameter
$b_i \pm t_{\alpha/2}\, s_{b_i}$
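The two-tailed test and the confidence interval can be sketched as follows. The estimate, standard error, and sample size are hypothetical, and the critical value is read from a t table rather than computed, so the example needs only the standard library.

```python
def t_statistic(b_i, se_b_i):
    """t* = b_i / s_b for testing Ho: beta_i = 0."""
    return b_i / se_b_i

def confidence_interval(b_i, se_b_i, t_crit):
    """b_i +/- t_crit * s_b, with t_crit = t_{alpha/2} on n - k - 1 df."""
    half_width = t_crit * se_b_i
    return (b_i - half_width, b_i + half_width)

# Hypothetical regression output: n = 23 observations, k = 2 predictors.
n, k = 23, 2
df = n - k - 1                 # 20 degrees of freedom
b1, se_b1 = 2.5, 1.0           # estimate and its standard error
T_CRIT_TWO_TAILED = 2.086      # t_{0.025} with 20 df, from a t table

t_star = t_statistic(b1, se_b1)
reject_h0 = abs(t_star) > T_CRIT_TWO_TAILED
ci_low, ci_high = confidence_interval(b1, se_b1, T_CRIT_TWO_TAILED)
print(df, t_star, reject_h0, (ci_low, ci_high))
```

In this made-up case the interval excludes zero, which agrees with rejecting Ho in the two-tailed test at the same alpha.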
Use caution when interpreting the t-test for an individual parameter in a first-order linear model
Possible interpretations when we fail to reject Ho:
There is no relationship between y and x
There is a linear relationship but a Type II error occurred (we failed to reject Ho when Ha is true)
There is a potential non-linear relationship between x and y that needs to be explored
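B. Hypothesis test of the overall model in multiple regression: F = MSM/MSE
Ho: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ (all slope parameters are equal to zero)
Ha: at least one $\beta_i$ is different from zero
Rejection region: $F > F_{\alpha}$, with $k$ numerator and $n - (k + 1)$ denominator degrees of freedom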
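A minimal sketch of the overall F statistic, assuming MSM is the model sum of squares divided by k and MSE is the error sum of squares divided by n - (k + 1). The sums of squares are hypothetical and the critical value is an approximate table value.

```python
def f_statistic(ssm, sse, n, k):
    """Overall F = MSM / MSE for a model with k predictors and n observations."""
    msm = ssm / k               # mean square for the model, k df
    mse = sse / (n - (k + 1))   # mean square error, n - (k + 1) df
    return msm / mse

# Hypothetical sums of squares: k = 4 predictors, n = 50 observations.
F = f_statistic(ssm=320.0, sse=180.0, n=50, k=4)   # (320/4) / (180/45) = 20
F_CRIT = 2.58   # approximate F_{0.05} with 4 and 45 df, from an F table
print(F, F > F_CRIT)
```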
Caution
The fact that we reject Ho for a given model does not mean that we have found the best model.
The F-test is regarded as a test that the model must pass to merit further consideration
First, conduct the F-test for the model
Second, conduct individual t-tests for each parameter in the model
V. Interaction Models with Qualitative and Quantitative Predictors
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 z_3 + \beta_4 z_3 x_2 + \varepsilon$
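To see what the interaction term does, it helps to write the model separately for each level of the qualitative predictor. The sketch below assumes that z3 is a 0/1 indicator (dummy) variable, which the notes do not state explicitly.

```latex
\begin{aligned}
z_3 = 0:&\quad E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
z_3 = 1:&\quad E(y) = (\beta_0 + \beta_3) + \beta_1 x_1 + (\beta_2 + \beta_4)\, x_2
\end{aligned}
```

Under that assumption, the z3 coefficient shifts the intercept between the two groups, and the coefficient on the product term is the difference in the slope of x2 between them.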
VI. Second Order Model with Quantitative Predictor
We consider models that allow for curvilinear relationships. These are second-order models because they include quadratic terms in the independent variables
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$
Interpreting the Quadratic Term:
When the parameter of the quadratic term is positive, the curve is concave upward. In other words, as the value of x increases, the slope of y increases (where y is rising, it rises at an increasing rate)
When the parameter of the quadratic term is negative, the curve is concave downward. In other words, as the value of x increases, the slope of y decreases (where y is rising, it rises at a decreasing rate)
More on interpreting the quadratic term
In our regression, $\beta_2$ is the rate of curvature
You cannot interpret the $\beta_2$ parameter in the same way as the parameters for first-order variables. The effect of an increase in x on y is not constant across the values of x, so $\beta_2$ must be interpreted over specific intervals of x
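The point about intervals can be illustrated with a short sketch. The coefficients are hypothetical (with a negative quadratic coefficient), and the slope function uses the derivative of the second-order model, b1 + 2*b2*x.

```python
# Hypothetical coefficients for y = b0 + b1*x + b2*x**2 (illustration only).
B0, B1, B2 = 5.0, 4.0, -0.5

def mean_y(x):
    """Deterministic portion of the second-order model."""
    return B0 + B1 * x + B2 * x ** 2

def slope(x):
    """Instantaneous rate of change dy/dx = b1 + 2*b2*x."""
    return B1 + 2 * B2 * x

# Change in mean y for a one-unit increase in x, over three intervals.
changes = [mean_y(x + 1) - mean_y(x) for x in (0, 2, 5)]
slopes = [slope(x) for x in (0, 2, 5)]
print(changes)   # the change differs from one interval to the next
print(slopes)
```

With these made-up values the one-unit effect of x shrinks and eventually turns negative, so no single number summarizes the effect of x on y.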
VII. Example of Using the Natural Log: Modeling Income
We use the natural logarithm of the dependent variable when modeling the effects of many independent variables on salaries, income or wages.
First, because the transformation of the variable provides a nearly normal distribution
Second, because it is easier to interpret
Interpretation of a Log Model: $\ln(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
A one-unit increase in $x_1$ represents, on average, a $(e^{\beta_1} - 1) \times 100\%$ change in y, holding $x_2$ constant.
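The percentage-change formula for the log model can be checked with a small sketch. The coefficient value and the schooling example are hypothetical.

```python
import math

def percent_change(beta):
    """Percentage change in y for a one-unit increase in x: (e^beta - 1) * 100."""
    return (math.exp(beta) - 1.0) * 100.0

# Hypothetical coefficient: ln(income) rises by 0.05 per extra year of schooling.
beta_1 = 0.05
print(round(percent_change(beta_1), 2))   # about 5.13
```

For small coefficients the result is close to 100 times the coefficient itself, but the gap widens as the coefficient grows, so the exact formula is the safer choice.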
VIII. Nested Models
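You can only compare two models when the dependent variable is the same and they are nested, that is, when one model contains all the terms of the other plus at least one additional term:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$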
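The notes do not say how the comparison is carried out. One common approach, shown here only as a sketch under that assumption, is the partial (nested) F-test, which asks whether the extra terms in the larger model reduce SSE by more than chance would explain. The function name and all numbers are hypothetical.

```python
def nested_f(sse_reduced, sse_complete, n, k_complete, k_reduced):
    """Partial F statistic for Ho: the extra betas in the complete model are all zero."""
    extra = k_complete - k_reduced            # number of betas being tested
    df_error = n - (k_complete + 1)           # error df of the complete model
    numerator = (sse_reduced - sse_complete) / extra
    denominator = sse_complete / df_error
    return numerator / denominator

# Hypothetical SSEs for the two models above (k = 4 vs. k = 2), n = 50.
F = nested_f(sse_reduced=240.0, sse_complete=180.0, n=50, k_complete=4, k_reduced=2)
print(F)   # (60/2) / (180/45) = 7.5
```

The statistic would be compared with an F critical value on 2 and 45 degrees of freedom in this example.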