Research Methods II: Lecture 5

I. Variable Screening Procedures

• These techniques are used to determine objectively which independent variables are the most important predictors of the dependent variable.

       Methods:

– Stepwise regression

– All-possible-regressions selection

Note: theory should be used for variable selection, not a mindless computer!

 

II. Stepwise Regression

• 1. Identify Y and the relevant candidate Xs.

• 2. Run the STATA command sw:

     – Step 1: The program fits all possible one-variable regressions (Y on each X) and selects the X with the largest, most significant t value.

     – Step 2: The program fits all possible two-variable regressions that contain the first selected X plus one other X, and selects the second X with the largest t value.

     – The program then re-evaluates the first selected X in the presence of the second. If its t value is no longer significant, it is removed, and the program searches for the X that yields the most significant t value in the presence of the second selected X.

     – Step 3: The program fits all possible three-variable regressions that contain the best two-variable model plus a third X, and selects the third X in the same way.

     – The process continues until no further independent variables can be found that yield significant t values in the presence of the variables already in the model.

• The process results in a model containing only those terms whose t values are significant at the chosen α level.

• Stepwise regression is a non-theoretical variable screening procedure.
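The steps above can be sketched in Python with only the standard library. This is a minimal illustration, not STATA's sw implementation: ordinary least squares is written out by hand, and "significant" is approximated by an assumed fixed cutoff of |t| ≥ 2.0 (roughly α = .05 for moderate sample sizes) instead of an exact α-level p-value. The names `invert`, `t_values`, and `stepwise` and the small data set are hypothetical.

```python
# Stepwise regression sketch (pure Python, standard library only).
# ASSUMPTION: "significant" means |t| >= a fixed cutoff, not an exact alpha-level p-value.


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def t_values(y, cols):
    """OLS of y on an intercept plus cols; returns the t statistic of each slope."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    sse = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    mse = sse / (n - k)
    # t = coefficient / standard error, with Var(beta) = MSE * (X'X)^-1
    return [beta[a] / (mse * inv[a][a]) ** 0.5 for a in range(1, k)]


def stepwise(y, candidates, t_enter=2.0, t_remove=2.0, max_steps=50):
    """candidates maps a variable name to its data column; returns the names kept."""
    selected = []
    for _ in range(max_steps):  # guard against a variable cycling in and out
        # Entry: try each unused variable alongside those already selected and
        # keep the one whose t value is largest in absolute terms.
        best, best_t = None, 0.0
        for name, col in candidates.items():
            if name not in selected:
                t = abs(t_values(y, [candidates[s] for s in selected] + [col])[-1])
                if t > best_t:
                    best, best_t = name, t
        if best is None or best_t < t_enter:
            break  # no remaining variable is significant: stop
        selected.append(best)
        # Removal: re-check the earlier variables now that `best` is in the model
        # and drop the weakest one while its |t| is below the cutoff.
        while len(selected) > 1:
            ts = t_values(y, [candidates[s] for s in selected])
            weakest = min(range(len(selected) - 1), key=lambda i: abs(ts[i]))
            if abs(ts[weakest]) >= t_remove:
                break
            selected.pop(weakest)
    return selected


# Hypothetical data: y is built from x1 and x2; x3 plays no role in y.
n = 30
x1 = [float(i) for i in range(n)]
x2 = [float((7 * i) % 11) for i in range(n)]
x3 = [float((5 * i) % 13) for i in range(n)]
noise = [((37 * i) % 17 - 8) / 8.0 for i in range(n)]
y = [3 * a + 2 * b + e for a, b, e in zip(x1, x2, noise)]
print(stepwise(y, {"x1": x1, "x2": x2, "x3": x3}))
```

Both x1 and x2 end up in the model because their t values are very large. Whether the unrelated x3 also enters depends only on its chance correlation with the noise, which is exactly the kind of selection error the cautions in Section III warn about.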

 

III. Caution in using stepwise regression (regression fishing)

• Because a very large number of t-tests are conducted, there is a high probability that one or more errors (Type I or Type II) have been made in selecting the variables, which can produce nonsensical results!

        – The computer cannot distinguish spurious correlations from real ones or make judgments regarding multicollinearity.

• Higher-order terms are often omitted. The candidate list should include not only the main variables but also their transformed forms (e.g., squared terms) and interactions.


• Stepwise regression should almost never be used, except in a completely non-theoretical approach to prediction.

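• Other stepwise regression techniques:

     – Forward selection: starts with no variables and adds the most significant one at each step; once a variable has entered, it is never removed.

     – Backward elimination: starts with all candidate variables and removes the least significant one at each step until every remaining term is significant.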
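Backward elimination can be sketched the same way. The OLS helpers are repeated so the block runs on its own; as before, the |t| ≥ 2.0 cutoff, the function names, and the data are illustrative assumptions, not STATA's implementation. Forward selection would correspond to the entry step of the stepwise sketch alone, with no removal step.

```python
# Backward elimination sketch (pure Python, standard library only).
# ASSUMPTION: "significant" means |t| >= a fixed cutoff, not an exact alpha-level p-value.


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def t_values(y, cols):
    """OLS of y on an intercept plus cols; returns the t statistic of each slope."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    sse = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    mse = sse / (n - k)
    return [beta[a] / (mse * inv[a][a]) ** 0.5 for a in range(1, k)]


def backward_elimination(y, candidates, t_remove=2.0):
    """Start from the full model and repeatedly drop the variable with the
    smallest |t| until every remaining |t| is at least t_remove."""
    kept = list(candidates)
    while kept:
        ts = t_values(y, [candidates[k] for k in kept])
        weakest = min(range(len(kept)), key=lambda i: abs(ts[i]))
        if abs(ts[weakest]) >= t_remove:
            break  # every remaining term is significant
        kept.pop(weakest)  # once removed, a variable never re-enters
    return kept


# Hypothetical data: y is built from x1 and x2; x3 plays no role in y.
n = 30
x1 = [float(i) for i in range(n)]
x2 = [float((7 * i) % 11) for i in range(n)]
x3 = [float((5 * i) % 13) for i in range(n)]
noise = [((37 * i) % 17 - 8) / 8.0 for i in range(n)]
y = [3 * a + 2 * b + e for a, b, e in zip(x1, x2, noise)]
print(backward_elimination(y, {"x1": x1, "x2": x2, "x3": x3}))
```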

 

 

 

IV. STATA commands

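• From the UCLA Stata website:

• The sw command is used for stepwise regression.

• The pr() option sets the probability (significance level) for removing a variable.

• The pe() option sets the probability (significance level) for entering a variable.

• pr() alone gives backward elimination, pe() alone gives forward selection, and both together give backward stepwise selection (pr() must be greater than pe()).

          sw regress y x1 x2 x3 x4, pr(.05)

          sw regress y x1 x2 x3 x4, pe(.05)

          sw regress y x1 x2 x3 x4, pe(.05) pr(.1)

• In Stata 9 and later, the same models are requested with the prefix syntax, e.g., stepwise, pr(.05): regress y x1 x2 x3 x4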

 

V. All-possible-regressions selection

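• A procedure that considers all possible regression models given the set of potentially important predictors.

• R-squared criterion: find a subset model such that adding more variables yields only small increases in R-squared. (R-squared never decreases when a variable is added, so simply maximizing it would always select the full model.)


• Adjusted R-squared or MSE criterion: searches for the model with the maximum adjusted R-squared, which is the same model as the one with the minimum MSE.

• Cp criterion: selects as the best model the subset model with a small total mean squared error and a value of Cp near p + 1 (the number of parameters, including the intercept, in a model with p independent variables). A value of Cp near p + 1 indicates no bias in the subset regression model.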



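• PRESS (prediction sum of squares) criterion: the candidate model is fit to the sample data n times, each time omitting one of the data points and obtaining the predicted value for that data point. PRESS = Σ (y_i − ŷ_(i))², where ŷ_(i) is the prediction for point i from the model fit without it; models with small PRESS values are preferred.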
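The criteria above can be compared side by side with a brute-force sketch in Python (standard library only). For each subset of p predictors out of k candidates it reports R², adjusted R² = 1 − (n − 1)/(n − p − 1) · SSE/SST, MSE = SSE/(n − p − 1), Mallows' Cp = SSE_p/MSE_k + 2(p + 1) − n (where MSE_k comes from the model with all k candidates), and PRESS computed literally by refitting the model n times. The function names and data are hypothetical, and this is an illustration rather than any particular package's implementation.

```python
# All-possible-regressions sketch (pure Python, standard library only).
from itertools import combinations


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit(y, cols):
    """OLS of y on an intercept plus cols; returns (coefficients, SSE)."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    sse = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, sse


def press(y, cols):
    """Fit the model n times, each time omitting one point, and sum the
    squared errors made when predicting the omitted point."""
    total = 0.0
    for i in range(len(y)):
        beta, _ = fit(y[:i] + y[i + 1:], [c[:i] + c[i + 1:] for c in cols])
        pred = beta[0] + sum(b * c[i] for b, c in zip(beta[1:], cols))
        total += (y[i] - pred) ** 2
    return total


def all_subsets(y, candidates):
    """Return one row of selection criteria for every non-empty subset of candidates."""
    n, names = len(y), list(candidates)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    _, sse_full = fit(y, [candidates[m] for m in names])
    mse_full = sse_full / (n - len(names) - 1)  # MSE of the model with all k candidates
    rows = []
    for p in range(1, len(names) + 1):
        for subset in combinations(names, p):
            cols = [candidates[m] for m in subset]
            _, sse = fit(y, cols)
            rows.append({
                "vars": subset,
                "SSE": sse,
                "R2": 1 - sse / sst,
                "adjR2": 1 - (n - 1) / (n - p - 1) * sse / sst,
                "MSE": sse / (n - p - 1),
                "Cp": sse / mse_full + 2 * (p + 1) - n,
                "PRESS": press(y, cols),
            })
    return rows


# Hypothetical data: y is built from x1 and x2; x3 plays no role in y.
n = 30
x1 = [float(i) for i in range(n)]
x2 = [float((7 * i) % 11) for i in range(n)]
x3 = [float((5 * i) % 13) for i in range(n)]
noise = [((37 * i) % 17 - 8) / 8.0 for i in range(n)]
y = [3 * a + 2 * b + e for a, b, e in zip(x1, x2, noise)]
for r in all_subsets(y, {"x1": x1, "x2": x2, "x3": x3}):
    print(r["vars"], " ".join(f"{k}={r[k]:.3f}" for k in ("R2", "adjR2", "MSE", "Cp", "PRESS")))
```

The full model always has Cp = k + 1 by construction, so the Cp criterion is informative only for the smaller subsets. The number of models grows as 2^k − 1, which is why all-possible-regressions selection is practical only for a modest number of candidate predictors.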