Channel: Active questions tagged lm - Stack Overflow

Does R always return NA as a coefficient as a result of linear regression with unnecessary variables?


My question is about unnecessary predictors, namely variables that provide no new linear information because they are linear combinations of the other predictors. As you can see below, the swiss dataset has six variables.

# swiss ships with base R (package datasets), so it is loaded with data(), not library()
data(swiss)
names(swiss)
# "Fertility"        "Agriculture"      "Examination"      "Education"
# "Catholic"         "Infant.Mortality"

Now I introduce a new variable ec. It is a linear combination (here, the sum) of Examination and Catholic.

ec <- swiss$Examination + swiss$Catholic
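As a quick sanity check that ec really adds no new linear information, the rank of the design matrix can be compared with its number of columns. This is only a minimal sketch; the name X is introduced here for illustration and is not used elsewhere in the question.

```r
data(swiss)
ec <- swiss$Examination + swiss$Catholic

# Design matrix: an intercept column, the five original predictors, and ec.
# swiss[, -1] drops the response, Fertility.
X <- cbind(1, as.matrix(swiss[, -1]), ec)

n_cols <- ncol(X)      # number of columns in the design matrix
x_rank <- qr(X)$rank   # numerical rank from the QR decomposition

n_cols
x_rank
```

If the rank is smaller than the number of columns, at least one column is redundant, which is the situation in which lm reports an NA coefficient.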

When we run a linear regression with unnecessary variables, R drops terms that are linear combinations of other terms and returns NA as their coefficients. The command below illustrates this: the coefficient of ec is NA.

lm(Fertility ~ . + ec, swiss)

Coefficients:
     (Intercept)       Agriculture       Examination         Education
         66.9152           -0.1721           -0.2580           -0.8709
        Catholic  Infant.Mortality                ec
          0.1041            1.0770                NA

However, when we regress first on ec and then all of the regressors as shown below,

lm(Fertility ~ ec + ., swiss)

Coefficients:
     (Intercept)                ec       Agriculture       Examination
         66.9152            0.1041           -0.1721           -0.3621
       Education          Catholic  Infant.Mortality
         -0.8709                NA            1.0770

I would expect the coefficients of both Catholic and Examination to be NA. The variable ec is a linear combination of both of them, yet in the end the coefficient of Examination is not NA whereas that of Catholic is.
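To make the comparison reproducible, here is a minimal sketch that fits both models and lists which coefficients come back as NA. The names fit1, fit2, na1 and na2 are introduced only for this illustration. The calls to model.matrix() and alias() (both from base R's stats package) show the column order of the design matrix and the linear dependency that lm detected.

```r
data(swiss)
ec <- swiss$Examination + swiss$Catholic

fit1 <- lm(Fertility ~ . + ec, swiss)   # ec listed last
fit2 <- lm(Fertility ~ ec + ., swiss)   # ec listed first

# Names of the coefficients reported as NA in each fit
na1 <- names(which(is.na(coef(fit1))))  # "ec"
na2 <- names(which(is.na(coef(fit2))))  # "Catholic"

# Column order of the design matrix in the second fit
colnames(model.matrix(fit2))

# The dependency lm found among the columns
alias(fit2)

# Both fits span the same column space, so the fitted values agree
same_fit <- isTRUE(all.equal(fitted(fit1), fitted(fit2)))
```

In both fits, the NA lands on the last column (in design-matrix order) of the dependent set {Examination, Catholic, ec}, which suggests that column order plays a role, but I would like to understand the actual rule.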

Could anyone explain the reason for this?

