The GAMCOVA:
This is the web page associated with methods paper in the Psychologist (March, 2008). Here, more details for running the GAMCOVA are described. A slightly longer version is here. "GAMCOVA" is an amalgamation of GAM and ANCOVA, where GAM stands for Generalized Additive Model and ANCOVA stands for ANalysis of COVAriance. This is a more flexible type of ANCOVA where it is not assumed that the response variable and the covariate are linearly related. It increases the likelihood that you find a significant group effect.
The software used in the paper is R, which is freely available.
1. How to download and install R.
ppt slides. There are notes with each page too.
CRAN (where to get R)
Free introductory manual here
Then
click the
icon.
2. How to read from SPSS here
3. Making the data used in the paper. Either within R type:
source("http://www.fiu.edu/~dwright/gamcova/data.R")
or open the data here, and copy and paste it into R.
R commands in red
R output in blue
Comments in purple
4. Traditional ANCOVA
anc1 <- lm(TIME2 ~ TIME1 + GROUP)
anova(anc1)
Analysis of Variance Table
Response: TIME2
Df Sum Sq Mean Sq
F value Pr(>F)
TIME1 1 6040 6040
27.0169 5.024e-07 ***
what
to write
GROUP 1 717
717 3.2068 0.07487 .
Residuals 197 44042 224
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(anc1)
Call:
lm(formula = TIME2 ~ TIME1 + GROUP)
Residuals:
Min 1Q Median 3Q Max
-34.123 -11.433 1.154 10.341 49.126
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.44090 2.69408 12.413 < 2e-16
***
TIME1 0.27348 0.05468
5.002 1.25e-06 ***
GROUP 3.81646 2.13120
1.791 0.0749 .
These says the "New Method" group has, on average, scores that are 3.82 units higher than the control group. This difference is non-significant, t(197) = 1.79, p = .07. This is the same as the F test above (since for F(1,x), t2 = F. So, 1.791*1.791 = 3.21).
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.95 on 197 degrees of freedom
Multiple R-Squared: 0.133, Adjusted R-squared: 0.1242
F-statistic: 15.11 on 2 and 197 DF, p-value: 7.839e-07
5. It is worth testing the interaction effect to see if the lines are parallel (this not discussed in the paper).
See Chapter 4 of Wright and London's Modern Regression book.
anc2 <- lm(TIME2 ~ TIME1 * GROUP) # here for equivalent methods
anova(anc1,anc2)
Analysis of Variance Table
Model 1: TIME2 ~ TIME1 + GROUP
Model 2: TIME2 ~ TIME1 * GROUP
Res.Df RSS Df Sum of Sq F Pr(>F)
1 197 44042
2 196 42747 1 1295 5.9389 0.01570 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So this allows us to reject the hypothesis that the lines are parallel: F(1,196) = 5.94, p = .02, ηp2 = 1295/(42747+1295) = .03.
6. Running a GAMCOVA
You need to access the library splines which is part of the main R core (R Development Core Team, 2007). It includes the bs function.
library(splines)
gc1 <- lm(TIME2 ~ bs(TIME1, df=4) + GROUP)
anova(gc1)
Analysis of Variance Table
Response: TIME2
Df Sum Sq Mean Sq F value Pr(>F)
bs(TIME1, df = 4) 4 25656.7 6414.2 50.7866 < 2e-16 ***
GROUP 1
641.1 641.1 5.0764 0.02537 *
what
to write
Residuals 194 24501.5 126.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Notice that 641.1 < 717 found earlier. This model accounts for less of the deviation than the traditional ANCOVA. It is more significant because the better fitting spline covariate accounts for about half of the original residual sums of squares.
7. Worth checking if these curve are "parallel".
gc2 <- lm(TIME2 ~ bs(TIME1, df=4) * GROUP)
anova(gc1,gc2)
Analysis of Variance Table
Model 1: TIME2 ~ bs(TIME1, df = 4) + GROUP
Model 2: TIME2 ~ bs(TIME1, df = 4) * GROUP
Res.Df RSS Df Sum of Sq F Pr(>F)
1 194 24501.5
2 190 23379.3 4 1122.2 2.2801 0.06224 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This is near that magical .05 (see Baguley, 2008, March issue of the Psychologist).
8. The graph in the paper is a fairly complex graph. Two good sources for making graphs in R are:
Crawley, M.J. (2007). The R book. Chichester, UK: John Wiley & Sons.
Murrell, P. (2006). R Graphics. Boca Raton, FL: Chapman &
Hall/CRC.
We also cover graphs in a book aimed at psychologists in:
Wright, D.B. & London, K. (2009).
Modern regression techniques. London: Sage.
See here.
To get the graph in the paper you have to size the graph window so that it is
about the right shape, and either type:
source("http://www.fiu.edu/~dwright/gamcova/graph.R")
or go to here and copy and paste it into R. You need to have run the previous analyses for this code to work. The graph is: here.
9. Summary
There is a lot more that can be done with Generalized Additive Models. Wright (2008) discusses three main uses (the first being the topic of this paper) that psychologists have for this procedure:
1.
When you do not want to make any assumptions about the relationship between a
covariate and the response variable. With the typical ANCOVA, it is assumed that
the covariate is a linear predictor. It may be appropriate to relax this
assumption using a spline of the covariate.
2.
To test if a relationship it not linear.
3.
As an exploratory graphical tool to plot curves within a scatterplot to search
for patterns, but be careful, particularly at the ends of the scales, not to
over-interpret deviations from linearity.
Questions:
10. What if I have several groups?
If you define your grouping variable as a factor R automatically takes this into account. Your group variable should have 1 less degrees of freedom than it has categories. To turn your group variable in a categorical variable, type
group <- as.factor(group)
It it is an ordered variable, type
group <- as.ordered(group)
This will use as default polynomial contrasts.