Quantcast
Channel: Active questions tagged lm - Stack Overflow
Viewing all articles
Browse latest Browse all 106

Why does my fe with demeaned data ( lm() ) not reproduce coefficients from plm(), lfe() and lsdv?

$
0
0

I am trying to reproduce fixed effects coefficients for panel data using different packages and techniques: (1) plm(), (2) lfe(), (3) dummy-lsdv with lm(), and (4) demeaned-fe with lm().

My data set consists of 1581 observations and 13 variables. It is survey data from 527 respondents (var = respondent) in 3 waves (var = wave). I have one DV (y) and 10 IVs (x1 to x10).

The data set looks like this:

  respondent  wave      y    x1    x2    x3    x4    x5    x6    x7    x8    x9   x10<dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>1          1     1  1         1     2    NA    NA     2   1     1.5  NA      NA     22          1     2 NA         2    NA     0     0     0   1     4     4       1     33          1     3 NA         4     5    NA    NA    NA  NA     8    NA      NA     14          2     1  0.931     3     3     2     2     2   4     7.5   7.5    NA     35          2     2  0.986     4    NA    NA     2     2   4.5   6.5   5       3     46          2     3  0.986     4     3     2     2     2   3     3     3       2     3

My question: When I perform fixed effects regressions with (1) plm(), (2) lfe(), and (3) dummy-lsdv with lm(), the models always return the same coefficients. However, when I perform a fixed effects regression using (4) demeaned data and the lm() package, I get different coefficients. This puzzles me and I wonder: why?

Here is my code:

1. plm():

Input:

library(plm)model_plm <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,             data = dataset,             index=c("respondent","wave"),             model = "within",             effect = 'individual')summary(model_plm)

Output:

Unbalanced Panel: n = 228, T = 1-2, N = 316Residuals:      Min.    1st Qu.     Median    3rd Qu.       Max. -0.3240866 -0.0048416  0.0000000  0.0048416  0.3240866 Coefficients:      Estimate Std. Error t-value Pr(>|t|)  x1  -0.0216484  0.0167614 -1.2916  0.20032  x2   0.0178114  0.0141219  1.2613  0.21097  x3  -0.0145262  0.0103954 -1.3974  0.16627  x4  -0.0061660  0.0133069 -0.4634  0.64439  x5   0.0174401  0.0144256  1.2090  0.23032  x6  -0.0053556  0.0067210 -0.7968  0.42796  x7   0.0065517  0.0097627  0.6711  0.50415  x8  -0.0151375  0.0081992 -1.8462  0.06865 .x9   0.0235351  0.0092612  2.5412  0.01303 *x10  0.0235181  0.0228927  1.0273  0.30745  ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1

2. lfe():

Input:

library(lfe)model_lfe <- felm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 | respondent, data = dataset)summary(model_lfe)

Output:

Coefficients:     Estimate Std. Error t value Pr(>|t|)  x1  -0.021648   0.016761  -1.292   0.2003  x2   0.017811   0.014122   1.261   0.2110  x3  -0.014526   0.010395  -1.397   0.1663  x4  -0.006166   0.013307  -0.463   0.6444  x5   0.017440   0.014426   1.209   0.2303  x6  -0.005356   0.006721  -0.797   0.4280  x7   0.006552   0.009763   0.671   0.5041  x8  -0.015138   0.008199  -1.846   0.0687 .x9   0.023535   0.009261   2.541   0.0130 *x10  0.023518   0.022893   1.027   0.3074  ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1

3. LSDV with lm():

Input:

model_lsdv <- lm(y ~ as_factor(respondent) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset)options(max.print=2000)summary(model_lsdv)

Output:

Coefficients:                        Estimate Std. Error t value Pr(>|t|)    (Intercept)            0.9499746  0.1505806   6.309 1.57e-08 ***[...]    x1                    -0.0216484  0.0167614  -1.292  0.20032    x2                     0.0178114  0.0141219   1.261  0.21097    x3                    -0.0145262  0.0103954  -1.397  0.16627    x4                    -0.0061660  0.0133069  -0.463  0.64439    x5                     0.0174401  0.0144256   1.209  0.23032    x6                    -0.0053556  0.0067210  -0.797  0.42796    x7                     0.0065517  0.0097627   0.671  0.50415    x8                    -0.0151375  0.0081992  -1.846  0.06865 .  x9                     0.0235351  0.0092612   2.541  0.01303 *  x10                    0.0235181  0.0228927   1.027  0.30745    ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1

4. Demeaned FE with lm():

Input:

dataset_demeaned <- with(dataset, data.frame(respondent = respondent,                                             wave = wave,                                             y = y - ave(y, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x1 = x1 - ave(x1, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x2 = x2 - ave(x2, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x3 = x3 - ave(x3, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x4 = x4 - ave(x4, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x5 = x5 - ave(x5, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x6 = x6 - ave(x6, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x7 = x7 - ave(x7, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x8 = x8 - ave(x8, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x9 = x9 - ave(x9, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x10 = x10 - ave(x10, respondent, FUN=function(x) mean(x, na.rm=T))                                            )                        )model_dmd <- lm(y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned)summary(model_dmd)

Output:

Coefficients:     Estimate Std. Error t value Pr(>|t|)   x1  -0.006223   0.008220  -0.757  0.44957   x2   0.013181   0.007880   1.673  0.09543 . x3  -0.012807   0.005484  -2.335  0.02018 * x4  -0.006431   0.006311  -1.019  0.30900   x5   0.015455   0.005941   2.602  0.00973 **x6  -0.001429   0.003402  -0.420  0.67483   x7   0.004362   0.004698   0.929  0.35387   x8  -0.009336   0.004366  -2.139  0.03326 * x9   0.015731   0.005267   2.987  0.00305 **x10  0.007631   0.010922   0.699  0.48529   ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1

More information:
I have already performed these checks:

  • I have used other ways to demean the data, e.g. the demean() function. --> Same results as in 4.
  • I have calculated some of the demeaned data by hand, which produces the same results as the ave() and the demean() functions.
  • I have played around with the na.action option because I hoped that the problem might result from a different treatment of missing values. But it did not change the results.
  • I've once included the respondent variable as_factor in the (4) demeaned fe model. Like: model_dmd <- lm(y ~ 0 + as_factor(respondent) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned). This approach reproduced the correct coefficients. However, demeaning should already take care of unobserved heterogeneity and thus the inclusion of dummies seems redundant.

So my best guess is that the problem does not come from the process of demeaning, but from the lm() function. Maybe the fact that the panel is unbalanced plays a role here?

I am very thankful for any suggestions and explanations!


SOLUTION:

Thanks to @G.Grothendieck, I can post the solution here. The correct code of (4) Demeaned FE with lm() should read as:

Input:

# Delete all rows with NAsdataset <- na.omit(dataset)# Demean the rows that are left behinddataset_demeaned <- with(dataset, data.frame(respondent = respondent,                                             wave = wave,                                             y = y - ave(y, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x1 = x1 - ave(x1, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x2 = x2 - ave(x2, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x3 = x3 - ave(x3, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x4 = x4 - ave(x4, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x5 = x5 - ave(x5, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x6 = x6 - ave(x6, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x7 = x7 - ave(x7, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x8 = x8 - ave(x8, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x9 = x9 - ave(x9, respondent, FUN=function(x) mean(x, na.rm=T)),                                             x10 = x10 - ave(x10, respondent, FUN=function(x) mean(x, na.rm=T))                                             )                         )model_dmd <- lm(y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned)summary(model_dmd)

Output:

Coefficients:     Estimate Std. Error t value Pr(>|t|)    x1  -0.021648   0.008462  -2.558 0.011004 *  x2   0.017811   0.007130   2.498 0.013009 *  x3  -0.014526   0.005248  -2.768 0.005989 ** x4  -0.006166   0.006718  -0.918 0.359452    x5   0.017440   0.007283   2.395 0.017240 *  x6  -0.005356   0.003393  -1.578 0.115530    x7   0.006552   0.004929   1.329 0.184768    x8  -0.015138   0.004140  -3.657 0.000301 ***x9   0.023535   0.004676   5.033 8.24e-07 ***x10  0.023518   0.011558   2.035 0.042734 *  ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1

Viewing all articles
Browse latest Browse all 106

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>