I am using the `caret` package to fit different models to the same data. I use cross-validation for all of them; however, when I use different numbers of folds with the `lm` method, I get exactly the same coefficients. I was expecting at least small differences. What is the reason? Is this expected?
Thanks for your time!
Here is a reprex
``` r
library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

{
  set.seed(123)
  Xs <- matrix(rnorm(300*20), nrow = 300)
  Y <- rnorm(300)
  data <- cbind(Xs, Y) |> as.data.frame()
}

ctrlspecs_2 <- trainControl(method = "cv", number = 2)
ctrlspecs_10 <- trainControl(method = "cv", number = 10)

set.seed(123)
model_2 <- train(Y ~ ., data = data, method = "lm", trControl = ctrlspecs_2)

set.seed(123)
model_10 <- train(Y ~ ., data = data, method = "lm", trControl = ctrlspecs_10)

summary(model_2)
#> 
#> Call:
#> lm(formula = .outcome ~ ., data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.5934 -0.6277 -0.0082  0.7448  2.2594 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)   
#> (Intercept) -0.044073   0.060499  -0.728  0.46692   
#> V1          -0.129567   0.065772  -1.970  0.04984 * 
#> V2          -0.002505   0.061859  -0.040  0.96773   
#> V3          -0.046897   0.059486  -0.788  0.43115   
#> V4           0.044195   0.061427   0.719  0.47245   
#> V5           0.086981   0.064085   1.357  0.17579   
#> V6           0.014166   0.061001   0.232  0.81653   
#> V7          -0.077959   0.060911  -1.280  0.20165   
#> V8           0.017661   0.065486   0.270  0.78759   
#> V9          -0.096562   0.060567  -1.594  0.11200   
#> V10          0.164024   0.060858   2.695  0.00746 **
#> V11         -0.028008   0.060869  -0.460  0.64577   
#> V12          0.034027   0.062118   0.548  0.58428   
#> V13         -0.066028   0.066681  -0.990  0.32294   
#> V14          0.142444   0.061319   2.323  0.02090 * 
#> V15         -0.129046   0.060109  -2.147  0.03267 * 
#> V16         -0.020873   0.061512  -0.339  0.73462   
#> V17          0.046835   0.063381   0.739  0.46056   
#> V18          0.035570   0.066567   0.534  0.59353   
#> V19         -0.016253   0.060039  -0.271  0.78682   
#> V20         -0.082083   0.060843  -1.349  0.17840   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.033 on 279 degrees of freedom
#> Multiple R-squared:  0.1041, Adjusted R-squared:  0.03986 
#> F-statistic: 1.621 on 20 and 279 DF,  p-value: 0.04731

summary(model_10)
#> 
#> Call:
#> lm(formula = .outcome ~ ., data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.5934 -0.6277 -0.0082  0.7448  2.2594 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)   
#> (Intercept) -0.044073   0.060499  -0.728  0.46692   
#> V1          -0.129567   0.065772  -1.970  0.04984 * 
#> V2          -0.002505   0.061859  -0.040  0.96773   
#> V3          -0.046897   0.059486  -0.788  0.43115   
#> V4           0.044195   0.061427   0.719  0.47245   
#> V5           0.086981   0.064085   1.357  0.17579   
#> V6           0.014166   0.061001   0.232  0.81653   
#> V7          -0.077959   0.060911  -1.280  0.20165   
#> V8           0.017661   0.065486   0.270  0.78759   
#> V9          -0.096562   0.060567  -1.594  0.11200   
#> V10          0.164024   0.060858   2.695  0.00746 **
#> V11         -0.028008   0.060869  -0.460  0.64577   
#> V12          0.034027   0.062118   0.548  0.58428   
#> V13         -0.066028   0.066681  -0.990  0.32294   
#> V14          0.142444   0.061319   2.323  0.02090 * 
#> V15         -0.129046   0.060109  -2.147  0.03267 * 
#> V16         -0.020873   0.061512  -0.339  0.73462   
#> V17          0.046835   0.063381   0.739  0.46056   
#> V18          0.035570   0.066567   0.534  0.59353   
#> V19         -0.016253   0.060039  -0.271  0.78682   
#> V20         -0.082083   0.060843  -1.349  0.17840   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.033 on 279 degrees of freedom
#> Multiple R-squared:  0.1041, Adjusted R-squared:  0.03986 
#> F-statistic: 1.621 on 20 and 279 DF,  p-value: 0.04731

identical(model_2$finalModel$coefficients, model_10$finalModel$coefficients)
#> [1] TRUE
```
Created on 2024-03-20 with reprex v2.1.0