I want to run an IV regression in R using ivreg from the AER package. The output gives me a negative R^2, which should be impossible as far as I know. When I run the same regression manually as 2SLS, the R^2 is positive, although very small.
This is caused by the fact that the AER package calculates the residuals from the true X and not from the predicted/fitted values of the first stage. The negative value shows up when the fit is pretty bad, but the R^2 always differs between ivreg and manual 2SLS. My question is whether the calculation of R^2 in the AER package is wrong, or whether R^2 may legitimately be negative under these circumstances.

Here is some code to reproduce the negative R^2:
    library(AER)
    set.seed(40)
    n <- 100

    # Data generation
    Z <- rnorm(n, 10, 2)
    X <- 2 * Z + rnorm(n, 0, 10000)
    Y <- 3 * X + rnorm(n, 0, 1000000)
    df <- data.frame(Z, X, Y)

    # IV regression
    ivreg1 <- ivreg(Y ~ X | Z, data = df)
    summary(ivreg1)

    # 2SLS approach
    lm1 <- lm(X ~ Z, data = df)
    df$predict <- predict(lm1)
    lm2 <- lm(Y ~ predict, data = df)
    summary(lm2)
The output of the ivreg function:
    Call:
    ivreg(formula = Y ~ X | Z, data = df)

    Residuals:
         Min       1Q   Median       3Q      Max 
    -3513062  -843258   -33611   845922  4533273 

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  43430.1   165298.8   0.263    0.793
    X             -114.1      131.9  -0.865    0.389

    Residual standard error: 1553000 on 98 degrees of freedom
    Multiple R-Squared: -1.665,	Adjusted R-squared: -1.692 
    Wald test: 0.7479 on 1 and 98 DF,  p-value: 0.3893
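
To make the difference between the two R^2 values concrete, both can be computed by hand in base R. This is only a sketch, and it rests on an assumption: that ivreg reports 1 - RSS/TSS with the residuals taken from the observed X, as described above. The names X_hat, res_stage2, res_iv, r2_stage2 and r2_iv are made up for this illustration and do not come from AER.

```r
set.seed(40)
n <- 100

# Same data generation as in the question
Z <- rnorm(n, 10, 2)
X <- 2 * Z + rnorm(n, 0, 10000)
Y <- 3 * X + rnorm(n, 0, 1000000)

# First stage: fitted values of X given the instrument Z
X_hat <- unname(fitted(lm(X ~ Z)))

# Second stage: these coefficients are the 2SLS point estimates
b <- coef(lm(Y ~ X_hat))

# Total sum of squares of Y around its mean
tss <- sum((Y - mean(Y))^2)

# Residuals built from the fitted X_hat (what the manual second-stage lm uses)
res_stage2 <- Y - (b[[1]] + b[[2]] * X_hat)

# Residuals built from the observed X (assumed to be what ivreg uses)
res_iv <- Y - (b[[1]] + b[[2]] * X)

r2_stage2 <- 1 - sum(res_stage2^2) / tss
r2_iv     <- 1 - sum(res_iv^2) / tss

print(c(second_stage = r2_stage2, observed_X = r2_iv))
```

The first value is an ordinary OLS R^2 with an intercept, so it has to lie between 0 and 1. The second has no such bound: the coefficients were not chosen to minimise the sum of squared residuals computed with the observed X, so that sum can exceed the total sum of squares, and 1 - RSS/TSS then drops below zero.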