I am trying to reproduce fixed effects coefficients for panel data using different packages and techniques: (1) plm()
, (2) lfe()
, (3) dummy-lsdv with lm()
, and (4) demeaned-fe with lm()
.
My data set consists of 1581 observations and 13 variables. It is survey data from 527 respondents (var = respondent) in 3 waves (var = wave). I have one DV (y) and 10 IVs (x1 to x10).
The data set looks like this:
respondent wave y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>1 1 1 1 1 2 NA NA 2 1 1.5 NA NA 22 1 2 NA 2 NA 0 0 0 1 4 4 1 33 1 3 NA 4 5 NA NA NA NA 8 NA NA 14 2 1 0.931 3 3 2 2 2 4 7.5 7.5 NA 35 2 2 0.986 4 NA NA 2 2 4.5 6.5 5 3 46 2 3 0.986 4 3 2 2 2 3 3 3 2 3
My question: When I perform fixed effects regressions with (1) plm()
, (2) lfe()
, and (3) dummy-lsdv with lm()
, the models always return the same coefficients. However, when I perform a fixed effects regression using (4) demeaned data and the lm()
package, I get different coefficients. This puzzles me and I wonder: why?
Here is my code:
1. plm()
:
Input:
library(plm)model_plm <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset, index=c("respondent","wave"), model = "within", effect = 'individual')summary(model_plm)
Output:
Unbalanced Panel: n = 228, T = 1-2, N = 316Residuals: Min. 1st Qu. Median 3rd Qu. Max. -0.3240866 -0.0048416 0.0000000 0.0048416 0.3240866 Coefficients: Estimate Std. Error t-value Pr(>|t|) x1 -0.0216484 0.0167614 -1.2916 0.20032 x2 0.0178114 0.0141219 1.2613 0.21097 x3 -0.0145262 0.0103954 -1.3974 0.16627 x4 -0.0061660 0.0133069 -0.4634 0.64439 x5 0.0174401 0.0144256 1.2090 0.23032 x6 -0.0053556 0.0067210 -0.7968 0.42796 x7 0.0065517 0.0097627 0.6711 0.50415 x8 -0.0151375 0.0081992 -1.8462 0.06865 .x9 0.0235351 0.0092612 2.5412 0.01303 *x10 0.0235181 0.0228927 1.0273 0.30745 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1
2. lfe()
:
Input:
library(lfe)model_lfe <- felm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 | respondent, data = dataset)summary(model_lfe)
Output:
Coefficients: Estimate Std. Error t value Pr(>|t|) x1 -0.021648 0.016761 -1.292 0.2003 x2 0.017811 0.014122 1.261 0.2110 x3 -0.014526 0.010395 -1.397 0.1663 x4 -0.006166 0.013307 -0.463 0.6444 x5 0.017440 0.014426 1.209 0.2303 x6 -0.005356 0.006721 -0.797 0.4280 x7 0.006552 0.009763 0.671 0.5041 x8 -0.015138 0.008199 -1.846 0.0687 .x9 0.023535 0.009261 2.541 0.0130 *x10 0.023518 0.022893 1.027 0.3074 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1
3. LSDV with lm()
:
Input:
model_lsdv <- lm(y ~ as_factor(respondent) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset)options(max.print=2000)summary(model_lsdv)
Output:
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.9499746 0.1505806 6.309 1.57e-08 ***[...] x1 -0.0216484 0.0167614 -1.292 0.20032 x2 0.0178114 0.0141219 1.261 0.21097 x3 -0.0145262 0.0103954 -1.397 0.16627 x4 -0.0061660 0.0133069 -0.463 0.64439 x5 0.0174401 0.0144256 1.209 0.23032 x6 -0.0053556 0.0067210 -0.797 0.42796 x7 0.0065517 0.0097627 0.671 0.50415 x8 -0.0151375 0.0081992 -1.846 0.06865 . x9 0.0235351 0.0092612 2.541 0.01303 * x10 0.0235181 0.0228927 1.027 0.30745 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1
4. Demeaned FE with lm()
:
Input:
dataset_demeaned <- with(dataset, data.frame(respondent = respondent, wave = wave, y = y - ave(y, respondent, FUN=function(x) mean(x, na.rm=T)), x1 = x1 - ave(x1, respondent, FUN=function(x) mean(x, na.rm=T)), x2 = x2 - ave(x2, respondent, FUN=function(x) mean(x, na.rm=T)), x3 = x3 - ave(x3, respondent, FUN=function(x) mean(x, na.rm=T)), x4 = x4 - ave(x4, respondent, FUN=function(x) mean(x, na.rm=T)), x5 = x5 - ave(x5, respondent, FUN=function(x) mean(x, na.rm=T)), x6 = x6 - ave(x6, respondent, FUN=function(x) mean(x, na.rm=T)), x7 = x7 - ave(x7, respondent, FUN=function(x) mean(x, na.rm=T)), x8 = x8 - ave(x8, respondent, FUN=function(x) mean(x, na.rm=T)), x9 = x9 - ave(x9, respondent, FUN=function(x) mean(x, na.rm=T)), x10 = x10 - ave(x10, respondent, FUN=function(x) mean(x, na.rm=T)) ) )model_dmd <- lm(y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned)summary(model_dmd)
Output:
Coefficients: Estimate Std. Error t value Pr(>|t|) x1 -0.006223 0.008220 -0.757 0.44957 x2 0.013181 0.007880 1.673 0.09543 . x3 -0.012807 0.005484 -2.335 0.02018 * x4 -0.006431 0.006311 -1.019 0.30900 x5 0.015455 0.005941 2.602 0.00973 **x6 -0.001429 0.003402 -0.420 0.67483 x7 0.004362 0.004698 0.929 0.35387 x8 -0.009336 0.004366 -2.139 0.03326 * x9 0.015731 0.005267 2.987 0.00305 **x10 0.007631 0.010922 0.699 0.48529 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1
More information:
I have already performed these checks:
- I have used other ways to demean the data, e.g. the
demean()
function. --> Same results as in 4. - I have calculated some of the demeaned data by hand, which produces the same results as the
ave()
and thedemean()
functions. - I have played around with the
na.action
option because I hoped that the problem might result from a different treatment of missing values. But it did not change the results. - I've once included the respondent variable
as_factor
in the (4) demeaned fe model. Like:model_dmd <- lm(y ~ 0 + as_factor(respondent) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned)
. This approach reproduced the correct coefficients. However, demeaning should already take care of unobserved heterogeneity and thus the inclusion of dummies seems redundant.
So my best guess is that the problem does not come from the process of demeaning, but from the lm()
function. Maybe the fact that the panel is unbalanced
plays a role here?
I am very thankful for any suggestions and explanations!
SOLUTION:
Thanks to @G.Grothendieck, I can post the solution here. The correct code of (4) Demeaned FE with lm()
should read as:
Input:
# Delete all rows with NAsdataset <- na.omit(dataset)# Demean the rows that are left behinddataset_demeaned <- with(dataset, data.frame(respondent = respondent, wave = wave, y = y - ave(y, respondent, FUN=function(x) mean(x, na.rm=T)), x1 = x1 - ave(x1, respondent, FUN=function(x) mean(x, na.rm=T)), x2 = x2 - ave(x2, respondent, FUN=function(x) mean(x, na.rm=T)), x3 = x3 - ave(x3, respondent, FUN=function(x) mean(x, na.rm=T)), x4 = x4 - ave(x4, respondent, FUN=function(x) mean(x, na.rm=T)), x5 = x5 - ave(x5, respondent, FUN=function(x) mean(x, na.rm=T)), x6 = x6 - ave(x6, respondent, FUN=function(x) mean(x, na.rm=T)), x7 = x7 - ave(x7, respondent, FUN=function(x) mean(x, na.rm=T)), x8 = x8 - ave(x8, respondent, FUN=function(x) mean(x, na.rm=T)), x9 = x9 - ave(x9, respondent, FUN=function(x) mean(x, na.rm=T)), x10 = x10 - ave(x10, respondent, FUN=function(x) mean(x, na.rm=T)) ) )model_dmd <- lm(y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned)summary(model_dmd)
Output:
Coefficients: Estimate Std. Error t value Pr(>|t|) x1 -0.021648 0.008462 -2.558 0.011004 * x2 0.017811 0.007130 2.498 0.013009 * x3 -0.014526 0.005248 -2.768 0.005989 ** x4 -0.006166 0.006718 -0.918 0.359452 x5 0.017440 0.007283 2.395 0.017240 * x6 -0.005356 0.003393 -1.578 0.115530 x7 0.006552 0.004929 1.329 0.184768 x8 -0.015138 0.004140 -3.657 0.000301 ***x9 0.023535 0.004676 5.033 8.24e-07 ***x10 0.023518 0.011558 2.035 0.042734 * ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1