I'm using the R code below in Power Query, and it works well: it gives me the studentized residuals that I use to flag outliers in a big dataset.
As output, I get the 'df' object, which contains the "group" column plus the augment() output, including the studentized residuals column.
My objective is to get the original dataset plus the studentized residuals column, so that I don't end up with 2 big tables (my original table has 20M rows, and keeping 2 tables of that size is not easy). Thanks a lot for your help!
library(tidyverse)
library(broom)

dataset <- as.data.frame(dataset)
dataset$perf <- as.numeric(dataset$perf)
dataset$factor1 <- as.factor(dataset$factor1)
dataset$factor2 <- as.factor(dataset$factor2)

# Keep only groups where both factors vary and perf is not constant,
# then fit one linear model per group and attach the augment() output.
df <- dataset %>%
  group_by(group) %>%
  mutate(unique_factor1 = n_distinct(factor1),
         unique_factor2 = n_distinct(factor2),
         var = var(perf)) %>%
  filter(unique_factor1 != 1 & unique_factor2 != 1 & var != 0) %>%
  do(cbind(group = .$group, lm(perf ~ factor1 + factor2, data = .) %>% augment))

Sample data:

group  perf  factor1  factor2
1      32    1        1
1      44    1        2
1      58    1        3
1      76    2        1
1      73    2        2
1      37    2        3
1      52    3        1
1      78    3        2
1      60    3        3
2      93    1        1
2      78    1        2
2      25    1        3
2      97    2        1
2      85    2        2
2      60    2        3
2      70    3        1
2      62    3        2
2      95    3        3
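
To make the objective concrete (original rows plus one residual column, in a single table), here is a minimal sketch of one possible approach, not a tested solution for the 20M-row case. It assumes a grouped mutate() is acceptable in place of do() and augment(), and that the externally studentized residuals from rstudent() are the ones wanted. As far as I can tell, augment()'s .std.resid corresponds to rstandard() instead, so swap the function if that is the column in use. The column name stud_resid is made up for illustration, and the sample data frame is rebuilt here only so the snippet runs on its own; in Power Query, `dataset` is already supplied.

```r
library(dplyr)

# Sample data from the question, rebuilt so the sketch is self-contained.
dataset <- data.frame(
  group   = rep(1:2, each = 9),
  perf    = c(32, 44, 58, 76, 73, 37, 52, 78, 60,
              93, 78, 25, 97, 85, 60, 70, 62, 95),
  factor1 = factor(rep(rep(1:3, each = 3), times = 2)),
  factor2 = factor(rep(1:3, times = 6))
)

result <- dataset %>%
  group_by(group) %>%
  # Same conditions as the original filter: both factors must vary
  # within the group and perf must not be constant.
  filter(n_distinct(factor1) != 1,
         n_distinct(factor2) != 1,
         var(perf) != 0) %>%
  # Fit one model per group and keep only the residual vector,
  # so the original columns stay in place (stud_resid is a hypothetical name).
  mutate(stud_resid = unname(rstudent(lm(perf ~ factor1 + factor2)))) %>%
  ungroup()
```

Like the original filter, this drops the rows of groups that fail the checks. It also assumes perf, factor1 and factor2 contain no missing values, since lm() drops incomplete rows by default and the residual vector would then be shorter than the group.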