I work with longitudinal (repeated measures) models of clinical trial data. Patients are randomized to different treatment groups and measured over multiple pre-specified points in time. An example dataset is the FEV dataset from the mmrm
package, where ARMCD
denotes treatment groups and AVISIT
denotes discrete time points:
library(tidyverse)data("fev_data", package = "mmrm")data <- fev_data %>% as_tibble() %>% mutate(ARMCD = as.character(ARMCD)) %>% select(-VISITN, -VISITN2)data#> # A tibble: 800 × 8#> USUBJID AVISIT ARMCD RACE SEX FEV1_BL FEV1 WEIGHT#> <fct> <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl>#> 1 PT1 VIS1 TRT Black or African American Female 25.3 NA 0.677#> 2 PT1 VIS2 TRT Black or African American Female 25.3 40.0 0.801#> 3 PT1 VIS3 TRT Black or African American Female 25.3 NA 0.709#> 4 PT1 VIS4 TRT Black or African American Female 25.3 20.5 0.809#> 5 PT2 VIS1 PBO Asian Male 45.0 NA 0.465#> 6 PT2 VIS2 PBO Asian Male 45.0 31.5 0.233#> 7 PT2 VIS3 PBO Asian Male 45.0 36.9 0.360#> 8 PT2 VIS4 PBO Asian Male 45.0 48.8 0.507#> 9 PT3 VIS1 PBO Black or African American Female 43.5 NA 0.682#> 10 PT3 VIS2 PBO Black or African American Female 43.5 36.0 0.892#> # ℹ 790 more rows#> # ℹ Use `print(n = ...)` to see more rows
It is common to use a so-called "constrained longitudinal data analysis" (cLDA) approach which pools all treatment groups at baseline. If time were continuous with baseline at AVISIT == 0
, it would be easy to enforce cLDA in the model formula:
formula <- FEV1 ~ AVISIT + AVISIT:ARMCD
However, this does not work with discrete time because the extra AVISITVIS1:ARMCDTRT
term distinguishes between baselines for PBO
and TRT
.
colnames(model.matrix(formula, data = data))#> [1] "(Intercept)" "AVISITVIS2" "AVISITVIS3" "AVISITVIS4" #> [5] "AVISITVIS1:ARMCDTRT" "AVISITVIS2:ARMCDTRT" "AVISITVIS3:ARMCDTRT" "AVISITVIS4:ARMCDTRT"
A workaround I have seen others use is to manually enforce cLDA in the data.
count(data, ARMCD, AVISIT)#> # A tibble: 8 × 3#> ARMCD AVISIT n#> <chr> <fct> <int>#> 1 PBO VIS1 105#> 2 PBO VIS2 105#> 3 PBO VIS3 105#> 4 PBO VIS4 105#> 5 TRT VIS1 95#> 6 TRT VIS2 95#> 7 TRT VIS3 95#> 8 TRT VIS4 95clda <- mutate(data, ARMCD = ifelse(AVISIT == "VIS1", "PBO", ARMCD))count(clda, ARMCD, AVISIT)#> # A tibble: 7 × 3#> ARMCD AVISIT n#> <chr> <fct> <int>#> 1 PBO VIS1 200#> 2 PBO VIS2 105#> 3 PBO VIS3 105#> 4 PBO VIS4 105#> 5 TRT VIS2 95#> 6 TRT VIS3 95#> 7 TRT VIS4 95
But the terms in the model are all the same.
colnames(model.matrix(formula, data = clda))#> [1] "(Intercept)" "AVISITVIS2" "AVISITVIS3" "AVISITVIS4" #> [5] "AVISITVIS1:ARMCDTRT" "AVISITVIS2:ARMCDTRT" "AVISITVIS3:ARMCDTRT" "AVISITVIS4:ARMCDTRT"
Even worse, the model matrix is no longer full rank.
as.integer(Matrix::rankMatrix(model.matrix(formula, data = clda)))#> [1] 7
Is there a way to code up the formula and/or contrasts to use cLDA without having to recode the data or model matrix?
It seems oddly low-level to have to manually generate a custom model matrix and then call lm()
or brms::brm()
on the hacked model matrix instead of the actual data. And it would prevent me from straightforwardly working with many of the nice post-processing tools that brms
provides.