I am trying to tweak contrast coding on a linear model where I want to know if each level of a factor is significantly different from the grand mean.
Let’s say the factor has levels "A", "B" and "C". The default treatment contrasts set "A" as the reference level and compare "B" and "C" to it. This is not what I want, because level "A" does not show up in the model summary.
Deviation coding also doesn’t seem to give me what I want, since it sets the row of the contrast matrix for level "C" to [-1, -1], and now this level does not show up in the model summary.
    set.seed(1)
    y <- rnorm(6, 0, 1)
    x <- factor(rep(LETTERS[1:3], each = 2))
    fit <- lm(y ~ x, contrasts = list(x = contr.sum))
    summary(fit)
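To see where the last level goes, the coding matrix can be printed directly. This is a minimal sketch, assuming base R's `contr.sum` is called with the level names (rather than a count) so that the rows are labelled:

```r
# Sum-to-zero (deviation) coding for a three-level factor.
# Passing the level names instead of a count labels the rows.
cm <- contr.sum(c("A", "B", "C"))
print(cm)

# Each column sums to zero, so the last level gets no column of its own.
print(colSums(cm))
```

The row for "C" holds -1 in both columns, so its deviation from the grand mean is implied by the other two coefficients rather than reported as a separate coefficient.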
In addition, the reported level names have changed from "A", "B" to "1" and "2".
    Call:
    lm(formula = y ~ x, contrasts = list(x = contr.sum))

    Residuals:
         1      2      3      4      5      6 
    -0.405  0.405 -1.215  1.215  0.575 -0.575 

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.02902    0.46809  -0.062    0.954
    x1          -0.19239    0.66198  -0.291    0.790
    x2           0.40885    0.66198   0.618    0.581

    Residual standard error: 1.147 on 3 degrees of freedom
    Multiple R-squared:  0.1129,    Adjusted R-squared:  -0.4785 
    F-statistic: 0.1909 on 2 and 3 DF,  p-value: 0.8355
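The numeric coefficient names seem to come from the contrast matrix having no column names. A minimal sketch of a workaround, assuming the columns are labelled by hand with the first two levels (the names `cm` and `dev_C` are only illustrative):

```r
set.seed(1)
y <- rnorm(6, 0, 1)
x <- factor(rep(LETTERS[1:3], each = 2))

# Build the sum-to-zero matrix and label its columns with the levels they represent.
cm <- contr.sum(levels(x))
colnames(cm) <- levels(x)[-nlevels(x)]   # "A", "B"

fit <- lm(y ~ x, contrasts = list(x = cm))
print(coef(fit))   # coefficients are now named (Intercept), xA, xB

# Point estimate of the deviation of "C" from the grand mean,
# implied by the sum-to-zero constraint on the coding.
dev_C <- -sum(coef(fit)[c("xA", "xB")])
print(dev_C)
```

This only recovers the point estimate for "C"; it does not by itself give the standard error or p-value for that level, which is the part I am still missing.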
Am I missing something? Should I add a dummy variable that is equal to the grand mean, so that I can use this as the reference level?
I found a similar (though perhaps more demanding) question asked last year, which has no solution yet: "models with 'differences from mean' for all coefficients on categorical variables; get 'contrast coding' to do it?"
The accepted answer here works, but the author has not provided an explanation. I have asked about it on the stats SE: https://stats.stackexchange.com/questions/600798/understanding-the-process-of-tweaking-contrasts-in-linear-model-fitting-to-show