The GAMCOVA: A New Improved ANCOVA

 

This is the web page associated with methods paper in the Psychologist (March, 2008). Here, more details for running the GAMCOVA are described. A slightly longer version is here. "GAMCOVA" is an amalgamation of GAM and ANCOVA, where GAM stands for Generalized Additive Model and ANCOVA stands for ANalysis of COVAriance. This is a more flexible type of ANCOVA where it is not assumed that the response variable and the covariate are linearly related. It increases the likelihood that you find a significant group effect.

 

The software used in the paper is R, which is freely available.

 

1.  How to download and install R.    

        ppt slides. There are notes with each page too.

        CRAN (where to get R)

        Free introductory manual here

Then click the icon.

 

 

2.  How to read from SPSS here

 

 

3.  Making the data used in the paper. Either within R type:

 

source("http://www.fiu.edu/~dwright/gamcova/data.R")

or open the data here, and copy and paste it into R.

 

    R commands in red

    R output in blue

        Comments in purple

 

 

4.  Traditional ANCOVA

 

anc1 <- lm(TIME2 ~ TIME1 + GROUP)

anova(anc1)

Analysis of Variance Table

Response: TIME2
           Df Sum Sq Mean Sq   F value  Pr(>F) 
TIME1       1  6040    6040    27.0169  5.024e-07 ***  
what to write
GROUP       1   717     717     3.2068  0.07487   . 
Residuals 197 44042     224 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

summary(anc1)

Call:
lm(formula = TIME2 ~ TIME1 + GROUP)

Residuals:
Min 1Q Median 3Q Max 
-34.123 -11.433 1.154 10.341 49.126 

Coefficients:
            Estimate Std. Error t value Pr(>|t|) 
(Intercept) 33.44090 2.69408     12.413 < 2e-16  ***
TIME1        0.27348 0.05468      5.002 1.25e-06 ***
GROUP        3.81646 2.13120      1.791 0.0749   . 

 

These says the "New Method" group has, on average, scores that are 3.82 units higher than the control group. This difference is non-significant, t(197) = 1.79, p = .07. This is the same as the F test above (since for F(1,x), t2 = F. So, 1.791*1.791 = 3.21).


---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 14.95 on 197 degrees of freedom
Multiple R-Squared: 0.133, Adjusted R-squared: 0.1242 
F-statistic: 15.11 on 2 and 197 DF, p-value: 7.839e-07 

 

5.  It is worth testing the interaction effect to see if the lines are parallel (this not discussed in the paper). 

See Chapter 4 of Wright and London's Modern Regression book.

 

anc2 <- lm(TIME2 ~ TIME1 * GROUP)    # here for equivalent methods

anova(anc1,anc2)

 

Analysis of Variance Table

Model 1: TIME2 ~ TIME1 + GROUP
Model 2: TIME2 ~ TIME1 * GROUP
Res.Df RSS Df Sum of Sq   F     Pr(>F) 
1 197 44042 
2 196 42747 1    1295  5.9389 0.01570 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

 

So this allows us to reject the hypothesis that the lines are parallel: F(1,196) = 5.94, p = .02, ηp2 = 1295/(42747+1295) = .03.

 

 

6.  Running a GAMCOVA

 

You need to access the library splines which is part of the main R core (R Development Core Team, 2007). It includes the bs function.

 

library(splines)

gc1 <- lm(TIME2 ~ bs(TIME1, df=4) + GROUP)

anova(gc1)

Analysis of Variance Table

Response: TIME2
                 Df Sum Sq Mean Sq F value   Pr(>F) 
bs(TIME1, df = 4) 4 25656.7 6414.2 50.7866 < 2e-16 ***
GROUP             1   641.1  641.1  5.0764  0.02537 *     
what to write
Residuals       194 24501.5   126.3 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

 

Notice that 641.1 < 717 found earlier. This model accounts for less of the deviation than the traditional ANCOVA. It is more significant because the better fitting spline covariate accounts for about half of the original residual sums of squares.

 

 

 

7.  Worth checking if these curve are "parallel".

 

gc2 <- lm(TIME2 ~ bs(TIME1, df=4) * GROUP)

anova(gc1,gc2)

Analysis of Variance Table

Model 1: TIME2 ~ bs(TIME1, df = 4) + GROUP
Model 2: TIME2 ~ bs(TIME1, df = 4) * GROUP
Res.Df RSS    Df Sum of Sq    F   Pr(>F) 
1 194 24501.5 
2 190 23379.3 4   1122.2   2.2801 0.06224 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

 

This is near that magical .05 (see Baguley, 2008, March issue of the Psychologist).   

 

 

8.  The graph in the paper is a fairly complex graph. Two good sources for making graphs in R are:

 

    Crawley, M.J. (2007). The R book. Chichester, UK: John Wiley & Sons.
    Murrell, P. (2006). R Graphics. Boca Raton, FL: Chapman & Hall/CRC.

 

We also cover graphs in a book aimed at psychologists in:

    Wright, D.B. & London, K. (2009). Modern regression techniques. London: Sage. See here.

To get the graph in the paper you have to size the graph window so that it is about the right shape, and either type:

 

source("http://www.fiu.edu/~dwright/gamcova/graph.R")

 

or go to here and copy and paste it into R. You need to have run the previous analyses for this code to work. The graph is: here.

 

 

9. Summary

    There is a lot more that can be done with Generalized Additive Models. Wright (2008) discusses three main uses (the first being the topic of this paper) that psychologists have for this procedure:

1. When you do not want to make any assumptions about the relationship between a covariate and the response variable. With the typical ANCOVA, it is assumed that the covariate is a linear predictor. It may be appropriate to relax this assumption using a spline of the covariate.

2. To test if a relationship it not linear.

3. As an exploratory graphical tool to plot curves within a scatterplot to search for patterns, but be careful, particularly at the ends of the scales, not to over-interpret deviations from linearity.

 

 

Questions:

 

10. What if I have several groups?

If you define your grouping variable as a factor R automatically takes this into account. Your group variable should have 1 less degrees of freedom than it has categories. To turn your group variable in a categorical variable, type

 

group <- as.factor(group)

 

It it is an ordered variable, type

 

group <- as.ordered(group)

 

This will use as default polynomial contrasts.