I have data where I want to compare the mean of one level of a factor with the mean of another level of the same factor. In addition, I want to make this contrast within every level of a second factor.
So I want a coefficient that gives the difference in means between the key level and another level of the first factor, nested within each level of the second factor.
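To pin down the target, assume it is the quantity that `avg_comparisons(..., variables = "cyl", by = "am")` reports in the example further down. By default that is an average of row-level counterfactual differences. With $\hat{y}_i(\text{cyl} = k)$ denoting the model's prediction for row $i$ with `cyl` set to $k$, and $n_a$ the number of rows with `am` $= a$, the quantity for each level $a$ of `am` is:

$$
\theta_a = \frac{1}{n_a} \sum_{i:\ \text{am}_i = a} \left[ \hat{y}_i(\text{cyl} = 8) - \hat{y}_i(\text{cyl} = 4) \right]
$$

When the model has no interaction between `cyl` and any covariate other than `am`, the bracketed term is the same for every row in group $a$. In that case $\theta_a$ reduces to a single model coefficient.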
Ideally, I could do this all in R with a single call to `lm()`. I know how to do this with marginal effects packages or with separate regressions, but is it possible within one `lm()` call?
I would probably need to create a specific covariate to get that effect, right?
I tried the `x + x:y` model syntax (equivalently `x / y`), which works for one interaction. But as soon as a second interaction with the same factor is included in the model, the estimates diverge from the marginal comparisons. I know why this happens, but I don't know whether some clever recoding of the factors or contrast coding can prevent it.
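As a minimal sketch of what the nested syntax produces, the following uses the same `mtcars` recoding as the example below and lists the design-matrix columns for `am / cyl + gear`. Because `cyl` has no main effect, R codes `am` in full inside the `am:cyl` term. Each level of `am` therefore gets its own `cyl` 6 and `cyl` 8 column. The object name `X` is only illustrative.

```r
# Recode as in the example below
data(mtcars)
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
mtcars$am   <- factor(mtcars$am)

# am / cyl expands to am + am:cyl. With no cyl main effect in the model,
# the am:cyl block has one cyl6 and one cyl8 column per level of am.
X <- model.matrix(mpg ~ am / cyl + gear, data = mtcars)
print(colnames(X))
```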
Here is some example code. I am interested in estimating the mean difference between `cyl` 8 and 4 within the levels of `am`:
```r
library(marginaleffects) # For comparisons

data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
mtcars$am <- factor(mtcars$am)

# This doesn't give me marginal effects, obviously.
mod <- lm(mpg ~ cyl * am + gear, data = mtcars)
summary(mod)

# This is what I want to estimate in the model directly:
avg_comparisons(mod, variables = "cyl", by = "am")

# This is getting me to my desired estimates:
mod2 <- lm(mpg ~ am / cyl + gear, data = mtcars)
summary(mod2)

# For comparison
avg_comparisons(mod2, variables = "cyl", by = "am")

# However, if I include a second set of interactions with cyl,
# the estimates diverge from the marginal comparisons:
mod3 <- lm(mpg ~ am / cyl + gear / cyl, data = mtcars)
summary(mod3)
avg_comparisons(mod3, variables = "cyl", by = "am")

# Unfortunately, I don't see a way of fixing the model
```
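For a cross-check that does not depend on `marginaleffects`, here is a base-R sketch of the 8 vs 4 comparison by `am`. It assumes the default behaviour of `avg_comparisons()`: set `cyl` to 8 and then to 4 for every row, take the difference in predictions, and average within each level of `am`. The helper `avg_cyl_diff()` is hypothetical and not part of any package. Because `mod2` has no `cyl`-by-`gear` term, the row-level difference is constant within each `am` group. The averages should therefore match the `am0:cyl8` and `am1:cyl8` coefficients.

```r
# Base-R sketch: average counterfactual difference in predicted mpg
# between cyl = 8 and cyl = 4, within each level of am.
data(mtcars)
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
mtcars$am   <- factor(mtcars$am)

mod2 <- lm(mpg ~ am / cyl + gear, data = mtcars)

# Hypothetical helper, not from any package
avg_cyl_diff <- function(model, data, hi = "8", lo = "4", by = "am") {
  n <- nrow(data)
  d_hi <- data
  d_hi$cyl <- factor(rep(hi, n), levels = levels(data$cyl))  # every row set to cyl = hi
  d_lo <- data
  d_lo$cyl <- factor(rep(lo, n), levels = levels(data$cyl))  # every row set to cyl = lo
  unit_diff <- predict(model, newdata = d_hi) - predict(model, newdata = d_lo)
  tapply(unit_diff, data[[by]], mean)  # average the row-level differences per group
}

est <- avg_cyl_diff(mod2, mtcars)
print(est)
print(coef(mod2)[c("am0:cyl8", "am1:cyl8")])
```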