
How to get coefficient values for all treatments in a zero-intercept lm model


(Edited to include two categorical variables instead of one.)

I have a hypothetical dataset with two categorical variables (4 mutually exclusive treatments and 5 mutually exclusive groups), a continuous variable, and a response variable. Each treatment and each group is coded as a dummy variable that is either 0 or 1.

I am trying to fit a linear regression model with the intercept fixed at zero (y ~ 0 + treat1 + treat2 + treat3 + treat4 + group1 + group2 + group3 + group4 + group5 + contvar) using the lm() function in R. I want coefficient estimates for all four treatments, all five groups, and the continuous variable.

Here's a reproducible example of my issue:

set.seed(123) # added
# Sample size
n <- 100
# Generate predictors, treatments 1 - 4 that are mutually exclusive, and a continuous variable
data <- data.frame(
  treat1 = c(rep(1, 25), rep(0, 75)),
  treat2 = c(rep(0, 25), rep(1, 25), rep(0, 50)),
  treat3 = c(rep(0, 50), rep(1, 25), rep(0, 25)),
  treat4 = c(rep(0, 75), rep(1, 25)), # edit
  group1 = rep(c(rep(1, 5), rep(0, 20)), 4),
  group2 = rep(c(rep(0, 5), rep(1, 5), rep(0, 15)), 4),
  group3 = rep(c(rep(0, 10), rep(1, 5), rep(0, 10)), 4),
  group4 = rep(c(rep(0, 15), rep(1, 5), rep(0, 5)), 4),
  group5 = rep(c(rep(0, 20), rep(1, 5)), 4),
  contvar = sample(0:100, n)/100)
# Define means for each treatment
mean1 <- rnorm(100, -0.5, 0.1) ; mean2 <- rnorm(100, -0.2, 0.1) ; mean3 <- rnorm(100, 0.2, 0.1) ; mean4 <- rnorm(100, 0.5, 0.1)
# Define means for each group
meangr1 <- rnorm(100, -1, 0.2) ; meangr2 <- rnorm(100, -0.1, 0.1) ; meangr3 <- rnorm(100, 0, 0.1) ; meangr4 <- rnorm(100, 0.1, 0.1) ; meangr5 <- rnorm(100, 1, 0.2)
# Generate response variable y based on the treatment means, group means and the value of the continuous variable
data$y <- mean1 * data$treat1 + mean2 * data$treat2 + mean3 * data$treat3 + mean4 * data$treat4
data$y <- data$y * data$contvar
data$y <- data$y + meangr1 * data$group1 + meangr2 * data$group2 + meangr3 * data$group3 + meangr4 * data$group4 + meangr5 * data$group5
# Fit a no-intercept model
model0 <- lm(y ~ 0 + treat1 + treat2 + treat3 + treat4 +
               group1 + group2 + group3 + group4 + group5 +
               contvar, data = data)
# Summarize the no-intercept model
summary(model0)

The output reads:

Call:
lm(formula = y ~ 0 + treat1 + treat2 + treat3 + treat4 + group1 +
    group2 + group3 + group4 + group5 + contvar, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.54069 -0.11705 -0.00558  0.11709  0.56169

Coefficients: (1 not defined because of singularities)
        Estimate Std. Error t value Pr(>|t|)
treat1   0.67980    0.07089   9.590 1.84e-15 ***
treat2   0.96646    0.06949  13.908  < 2e-16 ***
treat3   1.14318    0.07113  16.071  < 2e-16 ***
treat4   1.22735    0.06751  18.180  < 2e-16 ***
group1  -2.01289    0.06154 -32.710  < 2e-16 ***
group2  -1.11811    0.06144 -18.200  < 2e-16 ***
group3  -1.07177    0.06311 -16.983  < 2e-16 ***
group4  -0.89593    0.06313 -14.193  < 2e-16 ***
group5        NA         NA      NA       NA
contvar -0.01132    0.06987  -0.162    0.872
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1938 on 91 degrees of freedom
Multiple R-squared:  0.9301,    Adjusted R-squared:  0.9232
F-statistic: 134.6 on 9 and 91 DF,  p-value: < 2.2e-16

There is a coefficient for every treatment but not for every group: group5 is NA. Why is this the case? I understand that getting coefficient estimates for all treatments and all groups would not work in a model with an intercept, where one category of each variable is used as the reference level, but with a zero-intercept model I expected it to work.
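For reference, here is a small sketch of how the design matrix of the zero-intercept model can be inspected, using only the data frame built above and base R functions:

# Design matrix that lm() builds for the zero-intercept formula
X <- model.matrix(~ 0 + treat1 + treat2 + treat3 + treat4 +
                    group1 + group2 + group3 + group4 + group5 +
                    contvar, data = data)
ncol(X)      # 10 columns requested
qr(X)$rank   # effective rank; each missing rank appears as one NA coefficient in summary()

# Each row has exactly one treatment dummy and exactly one group dummy set to 1
all(rowSums(data[, paste0("treat", 1:4)]) == 1)
all(rowSums(data[, paste0("group", 1:5)]) == 1)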

Just to illustrate, this is what happens if I fit the model with an intercept (the default):

Call:
lm(formula = y ~ treat1 + treat2 + treat3 + treat4 + group1 +
    group2 + group3 + group4 + group5 + contvar, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.54069 -0.11705 -0.00558  0.11709  0.56169

Coefficients: (2 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.22735    0.06751  18.180  < 2e-16 ***
treat1      -0.54755    0.05509  -9.940 3.41e-16 ***
treat2      -0.26089    0.05491  -4.751 7.51e-06 ***
treat3      -0.08417    0.05513  -1.527    0.130
treat4            NA         NA      NA       NA
group1      -2.01289    0.06154 -32.710  < 2e-16 ***
group2      -1.11811    0.06144 -18.200  < 2e-16 ***
group3      -1.07177    0.06311 -16.983  < 2e-16 ***
group4      -0.89593    0.06313 -14.193  < 2e-16 ***
group5            NA         NA      NA       NA
contvar     -0.01132    0.06987  -0.162    0.872
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1938 on 91 degrees of freedom
Multiple R-squared:  0.9301,    Adjusted R-squared:  0.9239
F-statistic: 151.3 on 8 and 91 DF,  p-value: < 2.2e-16
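As an aside, the residual summaries printed for the two fits above are identical, which can be checked directly; this sketch refits the with-intercept model into a new object (model1 is a name I did not use above):

# Hypothetical refit of the with-intercept model shown above
model1 <- lm(y ~ treat1 + treat2 + treat3 + treat4 +
               group1 + group2 + group3 + group4 + group5 +
               contvar, data = data)
# Compare the fitted values of the two parameterisations
all.equal(fitted(model0), fitted(model1))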

If, however, I fit the linear model manually, by writing a no-intercept prediction function and minimising the residual sum of squares with optim(), I do get coefficient estimates for all treatments and for the continuous variable:

lm_manual <- function(b){
  # Unpack the candidate coefficients
  b1 <- b[1]; b2 <- b[2]; b3 <- b[3]; b4 <- b[4]
  bgr1 <- b[5]; bgr2 <- b[6]; bgr3 <- b[7]; bgr4 <- b[8]; bgr5 <- b[9]
  bcont <- b[10]
  # Predicted y from the treatment coefficients and the continuous variable
  y <- 0 + b1*data$treat1 + b2*data$treat2 + b3*data$treat3 + b4*data$treat4 + bcont*data$contvar
  # Residual sum of squares, the value optim() minimises
  SSR <- sum((y - data$y)^2)
  SSR
}
out <- optim(par = c(-0.5, -0.2, 0.2, 0.5, -1, -0.1, 0, 0.1, 1, 0.01), fn = lm_manual)
(out$par) # coefficients for treatment 1 - 4, group 1 - 5 and the continuous variable

This returns:

-0.42869647 -0.13663451  0.03372066  0.13213677 -1.64310116  0.61429850  0.51688837 -0.10915319  1.14837076  0.15682632

These estimates are what I expected to get when fitting the zero-intercept model with lm. In that fit, both the values of the coefficients and the NA for group5 were unexpected.
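For comparison, the residual sums of squares reached by the two approaches can be put side by side (model0 and out are the objects created above):

# Residual sum of squares of the lm() zero-intercept fit
sum(residuals(model0)^2)
# Residual sum of squares reached by optim() for the manual fit
out$value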

How can I get such coefficient estimates using lm? Or, if that is not possible, why does this only work when fitting the model "manually" and not when fitting with lm?

