Lecture 3.  Multiple Regression: Model Assumptions and Interpretations

I. Model Assumptions

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$

• A. The model has two components:

– Deterministic portion: $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

– Random error: ε

 • B. Assumptions about the random error:

•  For any given values of the independent variables, the random error is normally distributed with a mean of zero and a constant variance $\sigma^2$

$\varepsilon \sim N(0, \sigma^2)$

•  The random errors are independent
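As a rough illustration of the two components and the error assumptions, the sketch below simulates data from the four-predictor model and fits it by least squares. The coefficient values, sample size, random seed, and the use of NumPy's `lstsq` are assumptions made for this example and do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed so the run is repeatable
n = 200                                        # hypothetical sample size
beta = np.array([1.0, 2.0, -0.5, 0.3, 1.5])    # hypothetical beta_0 ... beta_4
sigma = 1.0                                    # hypothetical error standard deviation

X = rng.normal(size=(n, 4))                    # four independent variables
eps = rng.normal(0.0, sigma, size=n)           # independent N(0, sigma^2) errors
design = np.column_stack([np.ones(n), X])      # column of ones carries the intercept

y = design @ beta + eps                        # deterministic portion + random error

# Least-squares estimates b_0 ... b_4 of the beta parameters
b, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ b
```

With an intercept in the model, the residuals average to zero up to rounding, and the estimates in `b` should land near the values in `beta`.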

II.  Assumptions about Linear Statistical Models

•       Linear in the parameters

•       First-order models: linear in the independent variables

•       Higher-order models: one or more of the independent variables is raised to a given power

 

III. Interpretation in First Order Models

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

•$\beta_0$ = intercept, the mean value of y when all x are equal to zero

•$\beta_1$ = mean change in y for every one-unit increase in $x_1$, holding $x_2$ constant.

Interpretation continued

•       In a first-order model, the effect of an independent variable $x_i$ on y does not depend on the values of the other independent variables in the model.

•       The independent effect of $x_i$ is given by $\beta_i$

•       The interval $\hat{y} \pm 2s$, where s is the estimated standard deviation of the error term, approximates the accuracy with which the model will predict future values of y for given values of x
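The points above can be sketched numerically. The coefficient values and the value of s below are hypothetical, chosen only to show that a one-unit increase in $x_1$ changes the predicted y by the same amount at any level of $x_2$, and how the rough $\pm 2s$ interval is formed.

```python
def predict(x1, x2, b0=10.0, b1=2.0, b2=-3.0):
    """Predicted mean of y for a first-order model with two predictors.

    The default coefficients are hypothetical illustration values.
    """
    return b0 + b1 * x1 + b2 * x2

# Effect of raising x1 from 5 to 6 at two different levels of x2
effect_when_x2_is_0 = predict(6, 0) - predict(5, 0)
effect_when_x2_is_4 = predict(6, 4) - predict(5, 4)

# Approximate prediction interval: y-hat plus or minus 2s
s = 1.5                                   # hypothetical estimate of the error s.d.
y_hat = predict(5, 4)
interval = (y_hat - 2 * s, y_hat + 2 * s)
```

Both effects equal the coefficient on $x_1$, which is what "independent effect" means in a model without interaction terms.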

 

Estimation of the variance of the error term

The standard deviation of the error term

•  We use s to: (1) check the utility of the model, and (2) provide a measure of the reliability of predictions and estimates
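A minimal sketch of how s might be computed, assuming the usual estimator $s^2 = SSE / (n-k-1)$, which matches the n-k-1 degrees of freedom used for the t-tests in these notes. The data values are made up for illustration, and NumPy's `lstsq` stands in for whatever software produces the fit.

```python
import numpy as np

# Hypothetical data: n = 8 observations, k = 2 independent variables
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.1, 4.9, 8.2, 8.8, 12.1, 12.9, 16.2, 16.8])

design = np.column_stack([np.ones_like(x1), x1, x2])
b, sse_from_lstsq, rank, _ = np.linalg.lstsq(design, y, rcond=None)

n = len(y)
k = design.shape[1] - 1              # number of independent variables

residuals = y - design @ b
sse = float(residuals @ residuals)   # sum of squared errors
s2 = sse / (n - k - 1)               # estimated variance of the error term
s = s2 ** 0.5                        # estimated standard deviation of the error term
```

A smaller s relative to the scale of y suggests tighter predictions, which is why s serves as a quick check of model utility.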


IV. Inference about the parameters

•A. Hypothesis test of an individual parameter coefficient in the multiple regression model:

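   •One tailed test

–$H_0$: $\beta_i = 0$

–$H_a$: $\beta_i > 0$

– Test statistic: $t^* = b_i / s_{b_i}$, where $b_i$ is the estimate of $\beta_i$ and $s_{b_i}$ is its standard error

– Reject $H_0$ if $t^* > t_{\alpha}$

– t is based on n-k-1 degrees of freedom, where n is the number of observations and k is the number of independent variables

   •Two tailed test

–$H_0$: $\beta_i = 0$

–$H_a$: $\beta_i \neq 0$

– Test statistic: $t^* = b_i / s_{b_i}$ (check the Stata output)

– Reject $H_0$ if $|t^*| > t_{\alpha/2}$

– t is based on n-k-1 degrees of freedom

   • Confidence Interval for a $\beta$ Parameter

$b_i \pm t_{\alpha/2} \, s_{b_i}$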
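The two-tailed test and the confidence interval can be sketched with a few lines of arithmetic. The estimate, standard error, and sample size below are hypothetical; the critical value 2.086 is the tabled $t_{0.025}$ for 20 degrees of freedom and would be replaced by the value matching your own n, k, and $\alpha$.

```python
b_i = 1.80          # hypothetical coefficient estimate
s_b = 0.60          # hypothetical standard error of the estimate
n, k = 23, 2        # hypothetical sample size and number of predictors

df = n - k - 1      # degrees of freedom for the t distribution
t_crit = 2.086      # tabled t_{alpha/2} for alpha = 0.05 and df = 20

t_star = b_i / s_b                     # test statistic
reject_h0 = abs(t_star) > t_crit       # two-tailed decision rule

# 95% confidence interval for the parameter
ci = (b_i - t_crit * s_b, b_i + t_crit * s_b)
```

Here the interval excludes zero, which agrees with rejecting $H_0$ in the two-tailed test at the same $\alpha$.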

•Use caution when interpreting the t-test for an individual parameter in a first-order linear model

   • If we fail to reject $H_0$, possible interpretations are:

– There is no relationship between y and x

– There is a linear relationship, but a Type II error occurred (we failed to reject $H_0$ when $H_a$ is true)

– There is a potentially non-linear relationship between x and y that needs to be explored

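•B. Hypothesis test of the overall multiple regression model: F = MSM/MSE

   •       $H_0$: all $\beta$'s other than the intercept are equal to zero

   •       $H_a$: at least one $\beta$ is different from zero

   •       Rejection region: $F > F_{\alpha}$, with k numerator and n-(k+1) denominator degrees of freedom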
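A small worked sketch of the overall F-test, assuming MSM is the explained sum of squares divided by k and MSE is SSE divided by n-(k+1). The sums of squares and sample size are hypothetical; 3.49 is the tabled $F_{0.05}$ for 2 and 20 degrees of freedom.

```python
# Hypothetical sums of squares for a model with k = 2 predictors, n = 23 observations
ss_total = 500.0     # total variation of y around its mean
sse = 200.0          # variation left unexplained by the model
n, k = 23, 2

ss_model = ss_total - sse        # variation explained by the model
msm = ss_model / k               # mean square for the model
mse = sse / (n - (k + 1))        # mean square for error
f_stat = msm / mse               # F = MSM / MSE

f_crit = 3.49                    # tabled F_{0.05} with 2 and 20 degrees of freedom
reject_h0 = f_stat > f_crit      # True means at least one beta appears nonzero
```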


 

Caution

   •The fact that we reject $H_0$ for a given model does not mean that we have found the best model.

   •The F-test is regarded as a test that the model must pass to merit further consideration

   •First, conduct the F-test for the model

   •Second, conduct individual t-tests for each parameter in the model

 

V. Interaction Models with qualitative and quantitative predictors

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 z_3 + \beta_4 z_3 x_2 + \varepsilon$
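To see what the interaction term does, the sketch below assumes $z_3$ is a 0/1 indicator for the qualitative predictor (an assumption; the notes do not state the coding) and uses hypothetical coefficients. Under that coding the slope of $x_2$ is $\beta_2$ when $z_3 = 0$ and $\beta_2 + \beta_4$ when $z_3 = 1$.

```python
def mean_y(x1, x2, z3, b=(5.0, 1.0, 2.0, 3.0, 0.5)):
    """Deterministic part of the interaction model.

    b holds hypothetical values for beta_0 ... beta_4; z3 is assumed to be 0 or 1.
    """
    b0, b1, b2, b3, b4 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * z3 + b4 * z3 * x2

# Change in mean y for a one-unit increase in x2, holding x1 fixed
slope_x2_when_z3_is_0 = mean_y(1, 11, 0) - mean_y(1, 10, 0)   # equals beta_2
slope_x2_when_z3_is_1 = mean_y(1, 11, 1) - mean_y(1, 10, 1)   # equals beta_2 + beta_4
```

Unlike the first-order model, the effect of $x_2$ here depends on the group identified by $z_3$.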

VI. Second Order Model with Quantitative Predictor

•  We consider models that allow for curvilinear relationships. These are second-order models because they include quadratic terms in the independent variables

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$

Interpreting the quadratic term:

   •When the parameter of the quadratic term is positive, the curve is concave upward. In other words, the slope increases as x increases; where y is rising, it rises at an increasing rate

   •When the parameter of the quadratic term is negative, the curve is concave downward. In other words, the slope decreases as x increases; where y is rising, it rises at a decreasing rate

 


More on interpreting the quadratic term

   •In our regression, $\beta_2$ is the rate of curvature

   •You cannot interpret the $\beta_2$ parameter in the same way as the parameters of first-order terms. The effect of an increase in x on y is not constant across the values of x, so $\beta_2$ must be interpreted over intervals of x
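One way to read a quadratic model over intervals is to compute the change in the predicted y for a one-unit step at different starting values of x. For the model with terms in x and x squared, that change is $\beta_1 + \beta_2(2x + 1)$. The coefficients below are hypothetical, with a negative quadratic term.

```python
def mean_y(x, b0=2.0, b1=3.0, b2=-0.25):
    """Deterministic part of a second-order model (hypothetical coefficients)."""
    return b0 + b1 * x + b2 * x ** 2

# Change in mean y for a one-unit increase in x, starting from two different values
change_from_1_to_2 = mean_y(2) - mean_y(1)   # b1 + b2 * (2*1 + 1)
change_from_5_to_6 = mean_y(6) - mean_y(5)   # b1 + b2 * (2*5 + 1)
```

The same one-unit step adds much less to y at higher x, which is the concave downward pattern described for a negative quadratic parameter.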

 

VII. Example of Using Natural log: Modeling Income

•  We use the natural logarithm of the dependent variable when modeling the effects of many independent variables on salaries, income or wages.

– First, because the transformation of the variable provides a nearly normal distribution

– Second, because the coefficients are easier to interpret: they can be read as percentage changes in y

 

Interpretation of a Log Model:  $\ln(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

   •A one-unit increase in $x_1$ represents, on average, a $(e^{\beta_1} - 1) \times 100\%$ change in y, holding $x_2$ constant.
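The percentage-change formula is easy to apply directly. In this sketch the coefficient 0.08 is a hypothetical estimate from a log-income model, and `percent_change` is an illustrative helper name rather than a function from any library.

```python
import math

def percent_change(b):
    """Percentage change in y for a one-unit increase in x when ln(y) is modeled."""
    return (math.exp(b) - 1.0) * 100.0

b1 = 0.08                    # hypothetical coefficient on x1
pct = percent_change(b1)     # roughly an 8.3% increase in y per unit of x1
```

For small coefficients the result is close to 100 times the coefficient, but the gap widens as the coefficient grows, so the exponent form is the safer one to report.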


VIII.  Nested Models

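•  You can only compare two models when the dependent variable is the same and they are nested, that is, when the larger model contains every term of the smaller model plus additional terms:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$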
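The notes do not say how the comparison is carried out. One standard approach is the partial (nested) F-test, sketched here under that assumption with hypothetical sums of squared errors. It tests whether the extra parameters in the larger model ($\beta_3$ and $\beta_4$ above) are all zero.

```python
# Hypothetical results from fitting both models to the same n = 30 observations
sse_reduced = 260.0    # SSE of the smaller model (2 predictors)
sse_complete = 200.0   # SSE of the larger model (4 predictors)
n = 30
k = 4                  # predictors in the complete model
g = 2                  # predictors in the reduced model

df_num = k - g                 # number of extra parameters being tested
df_den = n - (k + 1)           # error degrees of freedom of the complete model

# Drop in SSE per added parameter, relative to the complete model's MSE
f_stat = ((sse_reduced - sse_complete) / df_num) / (sse_complete / df_den)
```

The statistic is compared with the tabled $F_{\alpha}$ for `df_num` and `df_den` degrees of freedom; a value above it favors keeping the additional terms.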