An ad-hoc forward model selection strategy is used to evaluate the effect of the terms of the linear predictor in mixed effects models. A strongly reduced starting model is prompted by earlier analysis using linear models conditioned on genes. The model space is explored using that prior knowledge combined with information criteria (BIC and AIC) and likelihood ratio test. The biologically interesting finding is that Ancestry.1, Ancestry.3 and Age have significant gene-specific effects but no gene-independent effects. Dx (schizophrenia) has neither gene-specific nor gene-independent effect.

## Loading required package: Matrix
## Loading required package: methods

Selected genes (inferred to be imprinted)

gene.ids <- unlist(read.csv("../../data/genes.regression.new", as.is = TRUE))
names(gene.ids) <- gene.ids

Prepare data frame including all 30 selected genes

dat <- merge.data(gene.ids = gene.ids)

Model selection

The strategy is to start from a model that includes the strongest predictor terms based on previous analysis using separately modeling genes, then confirm the significant effect of these terms. After this, single—mainly technical—terms are added separately (not yet accumulatively) to to asses the effect of these. The terms that improve the fit according to some criterion will be jointly added to resulting in . A similar procedure involving two sets of biological terms takes model selection further resulting in , , etc.

Starting with model

fo <- Q ~ scale(RIN) + (1 | RNA_batch) + (1 | Institution) + (1 | Institution:Individual) + (scale(Age) | Gene)
(M1 <- lmer(fo, data = dat))
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## Q ~ scale(RIN) + (1 | RNA_batch) + (1 | Institution) + (1 | Institution:Individual) +  
##     (scale(Age) | Gene)
##    Data: dat
## REML criterion at convergence: 20464.87
## Random effects:
##  Groups                 Name        Std.Dev. Corr 
##  Institution:Individual (Intercept) 0.3175        
##  Gene                   (Intercept) 1.2355        
##                         scale(Age)  0.1388   -0.51
##  RNA_batch              (Intercept) 0.1581        
##  Institution            (Intercept) 0.2547        
##  Residual                           0.7938        
## Number of obs: 8213, groups:  
## Institution:Individual, 579; Gene, 30; RNA_batch, 9; Institution, 3
## Fixed Effects:
## (Intercept)   scale(RIN)  
##     3.11696      0.06006
av <- list()
av$RIN <- anova(update(M1, . ~ . - scale(RIN)), M1)
av$RNA_batch <- anova(update(M1, . ~ . - (1 | RNA_batch)), M1)
av$Institution <- anova(update(M1, . ~ . - (1 | Institution)), M1)
av$Individual <- anova(update(M1, . ~ . - (1 | Institution:Individual)), M1)
av$Age.Gene <- anova(update(M1, . ~ . - (scale(Age) | Gene) + (1 | Gene)), M1)
av$Gene <- anova(update(M1, . ~ . - (scale(Age) | Gene)), update(M1, . ~ . - (scale(Age) | Gene) + (1 | Gene)))
summarize.anova <- function(av)
    do.call(rbind, lapply(av, function(x)
                          data.frame(Delta.AIC = diff(x$AIC),
                                     Delta.BIC = diff(x$BIC),
                                     Chisq = x$Chisq[2],
                                     df = x[["Chi Df"]][2],
                                     p.Chi = x[["Pr(>Chisq)"]][2])))
print.av(summarize.anova(av))
##             Delta.AIC Delta.BIC    Chisq    df      p.Chi
## RIN              -8.0      -1.0     10.0     1    1.5e-03
## RNA_batch       -25.7     -18.7     27.7     1    1.4e-07
## Institution     -61.2     -54.2     63.2     1    1.9e-15
## Individual     -453.8    -446.8    455.8     1   4.0e-101
## Age.Gene       -192.2    -178.2    196.2     2    2.5e-43
## Gene          -8700.3   -8693.3   8702.3     1        0.0

Notable results

Extending to with technical terms

av.1 <- list()
# interactions between random-effect factors
av.1$Gene.Instit <- anova(M1, update(M1, . ~ . + (1 | Gene:Institution)))
av.1$Gene.RNA_batch <- anova(M1, update(M1, . ~ . + (1 | Gene:RNA_batch)))
av.1$RNA_batch.Instit <- anova(M1, update(M1, . ~ . + (1 | RNA_batch:Institution)))
# interactions between random-effect covariates and factors
# RIN
av.1$RIN.Instit <- anova(M1, update(M1, . ~ . - (1 | Institution) + (scale(RIN) | Institution)))
av.1$RIN.RNA_batch <- anova(M1, update(M1, . ~ . - (1 | RNA_batch) + (scale(RIN) | RNA_batch)))
av.1$RIN.Gene <- anova(M1, update(M1, . ~ . - (1 | Gene) + (scale(RIN) | Gene)))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?
# PMI
av.1$PMI.Instit <- anova(M1, update(M1, . ~ . - (1 | Institution) + (scale(PMI) | Institution)))
av.1$PMI.RNA_batch <- anova(M1, update(M1, . ~ . - (1 | RNA_batch) + (scale(PMI) | RNA_batch)))
av.1$PMI.Gene <- anova(M1, update(M1, . ~ . - (1 | Gene) + (scale(PMI) | Gene)))
av.1$PMI <- anova(M1, update(M1, . ~ . + scale(PMI)))
# Age revisited
av.1$Age.Instit <- anova(M1, update(M1, . ~ . - (1 | Institution) + (scale(Age) | Institution)))
av.1$Age.RNA_batch <- anova(M1, update(M1, . ~ . - (1 | RNA_batch) + (scale(Age) | RNA_batch)))
av.1$Age <- anova(M1, update(M1, . ~ . + scale(Age)))
print.av(summarize.anova(av.1))
##                  Delta.AIC Delta.BIC    Chisq    df      p.Chi
## Gene.Instit         -136.5    -129.5    138.5     1    5.8e-32
## Gene.RNA_batch         0.3       7.3      1.7     1    1.9e-01
## RNA_batch.Instit       2.0       9.0      0.0     1        1.0
## RIN.Instit             2.1      16.1      1.9     2    3.8e-01
## RIN.RNA_batch          3.4      17.4      0.6     2    7.3e-01
## RIN.Gene            -326.0    -305.0    332.0     3    1.2e-71
## PMI.Instit             3.7      17.7      0.3     2    8.5e-01
## PMI.RNA_batch          2.0      16.0      2.0     2    3.6e-01
## PMI.Gene             -11.4       9.7     17.4     3    6.0e-04
## PMI                    1.9       8.9      0.1     1    7.6e-01
## Age.Instit             1.2      15.2      2.8     2    2.4e-01
## Age.RNA_batch          2.3      16.4      1.7     2    4.4e-01
## Age                    1.8       8.8      0.2     1    6.5e-01

Notable results

Extending with biological terms

M1b <- update(M1, . ~ . + (1 | Gene:Institution) - (scale(Age) | Gene))
(M2 <- update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(PMI) | Gene)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## Q ~ scale(RIN) + (1 | RNA_batch) + (1 | Institution) + (1 | Institution:Individual) +  
##     (1 | Gene:Institution) + (scale(Age) + scale(RIN) + scale(PMI) |  
##     Gene)
##    Data: dat
## REML criterion at convergence: 20019.65
## Random effects:
##  Groups                 Name        Std.Dev. Corr             
##  Institution:Individual (Intercept) 0.32736                   
##  Gene:Institution       (Intercept) 0.19668                   
##  Gene                   (Intercept) 1.22177                   
##                         scale(Age)  0.07065  -0.35            
##                         scale(RIN)  0.17682   0.21 -0.71      
##                         scale(PMI)  0.01870   0.24  0.82 -0.66
##  RNA_batch              (Intercept) 0.16600                   
##  Institution            (Intercept) 0.21699                   
##  Residual                           0.76397                   
## Number of obs: 8213, groups:  
## Institution:Individual, 579; Gene:Institution, 90; Gene, 30; RNA_batch, 9; Institution, 3
## Fixed Effects:
## (Intercept)   scale(RIN)  
##     3.10931      0.05196
av.2 <- list()
# Ancestry
av.2$Ances1 <- anova(M2, update(M2, . ~ . + scale(Ancestry.1)))
av.2$Ances2 <- anova(M2, update(M2, . ~ . + scale(Ancestry.2)))
av.2$Ances3 <- anova(M2, update(M2, . ~ . + scale(Ancestry.3)))
av.2$Ances4 <- anova(M2, update(M2, . ~ . + scale(Ancestry.4)))
av.2$Ances5 <- anova(M2, update(M2, . ~ . + scale(Ancestry.5)))
av.2$Ances1.Gene <- anova(M2, update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(PMI) + scale(Ancestry.1) | Gene)))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
av.2$Ances2.Gene <- anova(M2, update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(PMI) + scale(Ancestry.2) | Gene)))
av.2$Ances3.Gene <- anova(M2, update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(PMI) + scale(Ancestry.3) | Gene)))
av.2$Ances4.Gene <- anova(M2, update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(PMI) + scale(Ancestry.4) | Gene)))
av.2$Ances5.Gene <- anova(M2, update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(PMI) + scale(Ancestry.5) | Gene)))
print.av(summarize.anova(av.2))
##             Delta.AIC Delta.BIC    Chisq    df      p.Chi
## Ances1           -0.2       6.8      2.2     1    1.3e-01
## Ances2            1.7       8.7      0.3     1    5.6e-01
## Ances3            1.9       8.9      0.1     1    7.7e-01
## Ances4            0.9       7.9      1.1     1    2.9e-01
## Ances5            2.0       9.0      0.0     1    8.4e-01
## Ances1.Gene     -56.8     -21.7     66.8     5    4.8e-13
## Ances2.Gene       5.1      40.2      4.9     5    4.3e-01
## Ances3.Gene      -3.7      31.3     13.7     5    1.7e-02
## Ances4.Gene       7.9      42.9      2.1     5    8.3e-01
## Ances5.Gene       9.1      44.1      0.9     5    9.7e-01

Notable results

Model and

Based on the above findings the linear predictor for model and is defined as below, and and are fitted:

(M3 <- update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(Ancestry.1) | Gene)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## Q ~ scale(RIN) + (1 | RNA_batch) + (1 | Institution) + (1 | Institution:Individual) +  
##     (1 | Gene:Institution) + (scale(Age) + scale(RIN) + scale(Ancestry.1) |  
##     Gene)
##    Data: dat
## REML criterion at convergence: 19956.89
## Random effects:
##  Groups                 Name              Std.Dev. Corr             
##  Institution:Individual (Intercept)       0.32862                   
##  Gene:Institution       (Intercept)       0.19890                   
##  Gene                   (Intercept)       1.22187                   
##                         scale(Age)        0.06926  -0.44            
##                         scale(RIN)        0.17915   0.20 -0.75      
##                         scale(Ancestry.1) 0.08387   0.24  0.04  0.10
##  RNA_batch              (Intercept)       0.16599                   
##  Institution            (Intercept)       0.21712                   
##  Residual                                 0.75881                   
## Number of obs: 8213, groups:  
## Institution:Individual, 579; Gene:Institution, 90; Gene, 30; RNA_batch, 9; Institution, 3
## Fixed Effects:
## (Intercept)   scale(RIN)  
##     3.14180      0.05632
M4 <- update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(PMI) +
                           scale(Ancestry.1) + scale(Ancestry.3) | Gene) +
             scale(Ancestry.1))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues

While the fitting of the more simple model converges that of the more complex does not. Therefore will be used for subsequent analysis.

Adding more biological terms

Now add terms including gender Gender and Dx:

av.3 <- list()
# Gender
av.3$Gender.fix <- anova(M3, update(M3, . ~ . + Gender))
av.3$Gender.ran <- anova(M3, m <- update(M3, . ~ . + (1 | Gender)))
av.3$Gender.Gene <- anova(m, update(m, . ~ . + (1 | Gender:Gene)))
# Dx
av.3$Dx.fix <- anova(M3, update(M3, . ~ . + Dx))
av.3$Dx.ran <- anova(M3, m <- update(M3, . ~ . + (1 | Dx)))
av.3$Dx.Gene <- anova(m, update(m, . ~ . + (1 | Dx:Gene)))

Note that adding Gender and Dx in the following way (i.e.(Gender+…|Gene)) is not the proper way because Gender and Dx are factors, whose effect cannot be modeled as a slope. Despite this the lme4 allows fitting models with such formulas. But as the warnings below show the fit does not converge.

# factor playing the role of slope doesn't make much sense
invisible(update(M1b, . ~ . + (Gender + scale(Age) + scale(RIN) + scale(Ancestry.1) | Gene)))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues

Results

print.av(summarize.anova(av.3))
##             Delta.AIC Delta.BIC    Chisq    df      p.Chi
## Gender.fix        1.0       8.0      1.0     1    3.2e-01
## Gender.ran        2.0       9.0      0.0     1        1.0
## Gender.Gene      -5.8       1.2      7.8     1    5.2e-03
## Dx.fix            3.9      17.9      0.1     2    9.5e-01
## Dx.ran            2.0       9.0      0.0     1        1.0
## Dx.Gene           0.5       7.5      1.5     1    2.1e-01

Only (1|Gender:Gene) improves fit appreciably, while Dx has no effect (neither random, nor fixed, nor in interaction with Gene).

The impact of removing (Age|Gene) from M3

This is to investigate how much improvement (Age|Gene) can lead to in the more complex M3 (cf. improvement by (Age|Gene) in M1). The result below shows the improvement is less in M3 but still highly significant.

print.av(summarize.anova(list(anova(update(M1b, . ~ . + (scale(RIN) + scale(Ancestry.1) | Gene)),
              update(M1b, . ~ . + (scale(Age) + scale(RIN) + scale(Ancestry.1) | Gene))))))
## refitting model(s) with ML (instead of REML)
##   Delta.AIC Delta.BIC    Chisq    df      p.Chi
## 1     -23.4       4.7     31.4     4    2.5e-06

M5 and M6: more biological terms

Based on the findings above M5 and M5.Dx.Gene are defined (the latter is renamed to M6 later). These are thus better fitting models than any of the previous ones.

(M5 <- update(M1b, . ~ . + (1 | Gender:Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## Q ~ scale(RIN) + (1 | RNA_batch) + (1 | Institution) + (1 | Institution:Individual) +  
##     (1 | Gene:Institution) + (1 | Gender:Gene) + (scale(Age) +  
##     scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)
##    Data: dat
## REML criterion at convergence: 19921.19
## Random effects:
##  Groups                 Name              Std.Dev. Corr                   
##  Institution:Individual (Intercept)       0.32947                         
##  Gene:Institution       (Intercept)       0.19838                         
##  Gender:Gene            (Intercept)       0.05939                         
##  Gene                   (Intercept)       1.22077                         
##                         scale(Age)        0.06600  -0.45                  
##                         scale(RIN)        0.17777   0.19 -0.71            
##                         scale(Ancestry.1) 0.08317   0.24  0.04  0.04      
##                         scale(Ancestry.3) 0.04528  -0.31  0.37 -0.20 -0.88
##  RNA_batch              (Intercept)       0.16581                         
##  Institution            (Intercept)       0.21710                         
##  Residual                                 0.75584                         
## Number of obs: 8213, groups:  
## Institution:Individual, 579; Gene:Institution, 90; Gender:Gene, 60; Gene, 30; RNA_batch, 9; Institution, 3
## Fixed Effects:
## (Intercept)   scale(RIN)  
##     3.13115      0.05611
(M5.Dx.Gene <- update(M5, . ~ . + (1 | Dx:Gene)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## Q ~ scale(RIN) + (1 | RNA_batch) + (1 | Institution) + (1 | Institution:Individual) +  
##     (1 | Gene:Institution) + (1 | Gender:Gene) + (scale(Age) +  
##     scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) +  
##     (1 | Dx:Gene)
##    Data: dat
## REML criterion at convergence: 19919.61
## Random effects:
##  Groups                 Name              Std.Dev. Corr                   
##  Institution:Individual (Intercept)       0.32953                         
##  Dx:Gene                (Intercept)       0.03872                         
##  Gene:Institution       (Intercept)       0.19903                         
##  Gender:Gene            (Intercept)       0.06114                         
##  Gene                   (Intercept)       1.22093                         
##                         scale(Age)        0.06523  -0.45                  
##                         scale(RIN)        0.17692   0.20 -0.70            
##                         scale(Ancestry.1) 0.08326   0.24  0.05  0.04      
##                         scale(Ancestry.3) 0.04468  -0.30  0.37 -0.20 -0.88
##  RNA_batch              (Intercept)       0.16608                         
##  Institution            (Intercept)       0.21711                         
##  Residual                                 0.75525                         
## Number of obs: 8213, groups:  
## Institution:Individual, 579; Dx:Gene, 90; Gene:Institution, 90; Gender:Gene, 60; Gene, 30; RNA_batch, 9; Institution, 3
## Fixed Effects:
## (Intercept)   scale(RIN)  
##     3.13076      0.05625

The following sequence of ANOVAs tests the effects of biological terms in these better fitting but also more complex models:

av.5 <- list()
av.5$Age.Gene <-
    anova(update(M5, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)),
          M5)
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.1.Gene <-
    anova(update(M5, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.3) | Gene)),
          M5)
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.2.Gene <-
    anova(M5,
          update(M5, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.2) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.3.Gene <-
    anova(update(M5, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) | Gene)),
          M5)
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.4.Gene <-
    anova(M5,
          update(M5, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.4) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.5.Gene <-
    anova(M5,
          update(M5, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.5) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5$Gender.Gene <-
    anova(update(M5, . ~ . - (1 | Gender:Gene)), M5)
## refitting model(s) with ML (instead of REML)
av.5$Gender <- anova(M5, update(M5, . ~ . + (1 | Gender)))
## refitting model(s) with ML (instead of REML)
# note that the next one is an addition to M5
av.5$Dx.Gene <-
    anova(M5, update(M5, . ~ . + (1 | Dx:Gene)))
## refitting model(s) with ML (instead of REML)
av.5$Dx <- anova(M5, update(M5, . ~ . + (1 | Dx)))
## refitting model(s) with ML (instead of REML)

Now fit M6 and perform ANOVAs similarly to M5

M6 <- M5.Dx.Gene
av.6 <- list()
av.6$Age.Gene <-
    anova(update(M6, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)),
          M6)
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.1.Gene <-
    anova(update(M6, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.3) | Gene)),
          M6)
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.2.Gene <-
    anova(M6,
          update(M6, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.2) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.3.Gene <-
    anova(update(M6, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) | Gene)),
          M6)
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.4.Gene <-
    anova(M6,
          update(M6, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.4) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.5.Gene <-
    anova(M6,
          update(M6, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.5) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.6$Gender.Gene <-
    anova(update(M6, . ~ . - (1 | Gender:Gene)), M6)
## refitting model(s) with ML (instead of REML)
av.6$Gender <- anova(M6, update(M6, . ~ . + (1 | Gender)))
## refitting model(s) with ML (instead of REML)
av.6$Dx.Gene <-
    anova(update(M6, . ~ . - (1 | Dx:Gene)), M6)
## refitting model(s) with ML (instead of REML)
av.6$Dx <- anova(M6, update(M6, . ~ . + (1 | Dx)))
## refitting model(s) with ML (instead of REML)
av.5$Gene <-
    anova(update(M5, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (-1 + scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)),
          M5)
## refitting model(s) with ML (instead of REML)
av.6$Gene <-
    anova(update(M6, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (-1 + scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)),
          M6)
## refitting model(s) with ML (instead of REML)

Effect of Age, Ancestry.n independently from other variables.

av.5$Age <- anova(M5, update(M5, . ~ . + scale(Age)))
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.1 <- anova(M5, update(M5, . ~ . + scale(Ancestry.1)))
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.2 <- anova(M5, update(M5, . ~ . + scale(Ancestry.2)))
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.3 <- anova(M5, update(M5, . ~ . + scale(Ancestry.3)))
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.4 <- anova(M5, update(M5, . ~ . + scale(Ancestry.4)))
## refitting model(s) with ML (instead of REML)
av.5$Ancestry.5 <- anova(M5, update(M5, . ~ . + scale(Ancestry.5)))
## refitting model(s) with ML (instead of REML)
av.6$Age <- anova(M6, update(M6, . ~ . + scale(Age)))
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.1 <- anova(M6, update(M6, . ~ . + scale(Ancestry.1)))
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.2 <- anova(M6, update(M6, . ~ . + scale(Ancestry.2)))
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.3 <- anova(M6, update(M6, . ~ . + scale(Ancestry.3)))
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.4 <- anova(M6, update(M6, . ~ . + scale(Ancestry.4)))
## refitting model(s) with ML (instead of REML)
av.6$Ancestry.5 <- anova(M6, update(M6, . ~ . + scale(Ancestry.5)))
## refitting model(s) with ML (instead of REML)

Results of ANOVA

write.csv(summarize.anova(av.5), "../../results/anova-mixed-M5.csv")
print.av(summarize.anova(av.5))
##                 Delta.AIC Delta.BIC    Chisq    df      p.Chi
## Age.Gene            -18.9      16.2     28.9     5    2.5e-05
## Ancestry.1.Gene     -71.2     -36.2     81.2     5    4.6e-16
## Ancestry.2.Gene       6.3      48.3      5.7     6    4.5e-01
## Ancestry.3.Gene     -17.9      17.2     27.9     5    3.8e-05
## Ancestry.4.Gene       6.9      48.9      5.1     6    5.3e-01
## Ancestry.5.Gene      10.8      52.9      1.2     6    9.8e-01
## Gender.Gene          -5.7       1.3      7.7     1    5.5e-03
## Gender                2.0       9.0      0.0     1        1.0
## Dx.Gene               0.4       7.4      1.6     1    2.1e-01
## Dx                    2.0       9.0      0.0     1        1.0
## Gene               -126.8     -91.7    136.8     5    8.5e-28
## Age                   1.3       8.3      0.7     1    3.9e-01
## Ancestry.1            0.6       7.6      1.4     1    2.4e-01
## Ancestry.2            1.7       8.7      0.3     1    5.6e-01
## Ancestry.3            1.6       8.6      0.4     1    5.4e-01
## Ancestry.4            0.9       8.0      1.1     1    3.0e-01
## Ancestry.5            1.9       9.0      0.1     1    8.1e-01
write.csv(summarize.anova(av.6), "../../results/anova-mixed-M6.csv")
print.av(summarize.anova(av.6))
##                 Delta.AIC Delta.BIC    Chisq    df      p.Chi
## Age.Gene            -17.5      17.5     27.5     5    4.5e-05
## Ancestry.1.Gene     -71.4     -36.4     81.4     5    4.2e-16
## Ancestry.2.Gene       6.4      48.4      5.6     6    4.7e-01
## Ancestry.3.Gene     -17.1      18.0     27.1     5    5.5e-05
## Ancestry.4.Gene       6.9      49.0      5.1     6    5.4e-01
## Ancestry.5.Gene      10.9      53.0      1.1     6    9.8e-01
## Gender.Gene          -6.4       0.6      8.4     1    3.8e-03
## Gender                2.0       9.0      0.0     1        1.0
## Dx.Gene               0.4       7.4      1.6     1    2.1e-01
## Dx                    2.0       9.0      0.0     1        1.0
## Gene               -126.1     -91.0    136.1     5    1.2e-27
## Age                   1.2       8.3      0.8     1    3.9e-01
## Ancestry.1            0.6       7.6      1.4     1    2.4e-01
## Ancestry.2            1.7       8.7      0.3     1    5.6e-01
## Ancestry.3            1.6       8.6      0.4     1    5.4e-01
## Ancestry.4            1.0       8.0      1.0     1    3.1e-01
## Ancestry.5            1.9       9.0      0.1     1    8.1e-01

Revision for Nature Communications

M5.odd <- lmer(formula(M5), data = dat, subset = Gene %in% gene.ids[seq(from = 1, to = length(gene.ids), by = 2)])
M5.even <- lmer(formula(M5), data = dat, subset = Gene %in% gene.ids[seq(from = 2, to = length(gene.ids), by = 2)])
av.5.odd <- list()
av.5.odd$Age.Gene <-
    anova(update(M5.odd, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)),
          M5.odd)
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.1.Gene <-
    anova(update(M5.odd, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.3) | Gene)),
          M5.odd)
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.2.Gene <-
    anova(M5.odd,
          update(M5.odd, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.2) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.3.Gene <-
    anova(update(M5.odd, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) | Gene)),
          M5.odd)
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.4.Gene <-
    anova(M5.odd,
          update(M5.odd, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.4) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.5.Gene <-
    anova(M5.odd,
          update(M5.odd, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.5) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Gender.Gene <-
    anova(update(M5.odd, . ~ . - (1 | Gender:Gene)), M5.odd)
## refitting model(s) with ML (instead of REML)
av.5.odd$Gender <- anova(M5.odd, update(M5.odd, . ~ . + (1 | Gender)))
## refitting model(s) with ML (instead of REML)
# note that the next one is an addition to M5.odd
av.5.odd$Dx.Gene <-
    anova(M5.odd, update(M5.odd, . ~ . + (1 | Dx:Gene)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Dx <- anova(M5.odd, update(M5.odd, . ~ . + (1 | Dx)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Gene <-
    anova(update(M5.odd, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (-1 + scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)),
          M5.odd)
## refitting model(s) with ML (instead of REML)
av.5.odd$Age <- anova(M5.odd, update(M5.odd, . ~ . + scale(Age)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.1 <- anova(M5.odd, update(M5.odd, . ~ . + scale(Ancestry.1)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.2 <- anova(M5.odd, update(M5.odd, . ~ . + scale(Ancestry.2)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.3 <- anova(M5.odd, update(M5.odd, . ~ . + scale(Ancestry.3)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.4 <- anova(M5.odd, update(M5.odd, . ~ . + scale(Ancestry.4)))
## refitting model(s) with ML (instead of REML)
av.5.odd$Ancestry.5 <- anova(M5.odd, update(M5.odd, . ~ . + scale(Ancestry.5)))
## refitting model(s) with ML (instead of REML)
av.5.even <- list()
av.5.even$Age.Gene <-
    anova(update(M5.even, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)),
          M5.even)
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.1.Gene <-
    anova(update(M5.even, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.3) | Gene)),
          M5.even)
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.2.Gene <-
    anova(M5.even,
          update(M5.even, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.2) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.3.Gene <-
    anova(update(M5.even, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) | Gene)),
          M5.even)
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.4.Gene <-
    anova(M5.even,
          update(M5.even, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.4) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.5.Gene <-
    anova(M5.even,
          update(M5.even, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.5) + scale(Ancestry.3) | Gene)))
## refitting model(s) with ML (instead of REML)
av.5.even$Gender.Gene <-
    anova(update(M5.even, . ~ . - (1 | Gender:Gene)), M5.even)
## refitting model(s) with ML (instead of REML)
av.5.even$Gender <- anova(M5.even, update(M5.even, . ~ . + (1 | Gender)))
## refitting model(s) with ML (instead of REML)
# note that the next one is an addition to M5.even
av.5.even$Dx.Gene <-
    anova(M5.even, update(M5.even, . ~ . + (1 | Dx:Gene)))
## refitting model(s) with ML (instead of REML)
av.5.even$Dx <- anova(M5.even, update(M5.even, . ~ . + (1 | Dx)))
## refitting model(s) with ML (instead of REML)
av.5.even$Gene <-
    anova(update(M5.even, . ~ . - (scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene) + (-1 + scale(Age) + scale(RIN) + scale(Ancestry.1) + scale(Ancestry.3) | Gene)),
          M5.even)
## refitting model(s) with ML (instead of REML)
av.5.even$Age <- anova(M5.even, update(M5.even, . ~ . + scale(Age)))
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.1 <- anova(M5.even, update(M5.even, . ~ . + scale(Ancestry.1)))
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.2 <- anova(M5.even, update(M5.even, . ~ . + scale(Ancestry.2)))
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.3 <- anova(M5.even, update(M5.even, . ~ . + scale(Ancestry.3)))
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.4 <- anova(M5.even, update(M5.even, . ~ . + scale(Ancestry.4)))
## refitting model(s) with ML (instead of REML)
av.5.even$Ancestry.5 <- anova(M5.even, update(M5.even, . ~ . + scale(Ancestry.5)))
## refitting model(s) with ML (instead of REML)

The less significant p-values for the even gene subset compared to the odd gene subset are partly explained by fewer observations (more missing values) in the even subset, and partly by the fact that the even subset is less variable in terms of e.g. age dependence.

## [1] "number of non-NA observations"
##  odd even 
## 4347 3866
## [1] "number of total observations in subsets"
##  odd even 
## 8685 8685
## [1] "fraction of non-NA observations"
##       odd      even 
## 0.5005181 0.4451353
write.csv(summarize.anova(av.5.odd), "../../results/anova-mixed-M5.odd.csv")
print.av(summarize.anova(av.5.odd))
##                 Delta.AIC Delta.BIC    Chisq    df      p.Chi
## Age.Gene            -11.8      20.1     21.8     5    5.8e-04
## Ancestry.1.Gene     -40.1      -8.2     50.1     5    1.3e-09
## Ancestry.2.Gene       5.6      43.8      6.4     6    3.8e-01
## Ancestry.3.Gene     -13.3      18.5     23.3     5    2.9e-04
## Ancestry.4.Gene      11.6      49.9      0.4     6        1.0
## Ancestry.5.Gene       9.0      47.3      3.0     6    8.1e-01
## Gender.Gene          -2.2       4.2      4.2     1    4.0e-02
## Gender                2.0       8.4      0.0     1        1.0
## Dx.Gene               1.9       8.2      0.1     1    7.1e-01
## Dx                    2.0       8.4      0.0     1        1.0
## Gene                -61.2     -29.3     71.2     5    5.7e-14
## Age                  -0.0       6.4      2.0     1    1.6e-01
## Ancestry.1           -0.4       6.0      2.4     1    1.2e-01
## Ancestry.2            1.4       7.8      0.6     1    4.4e-01
## Ancestry.3            1.7       8.1      0.3     1    5.9e-01
## Ancestry.4           -1.0       5.4      3.0     1    8.2e-02
## Ancestry.5            1.3       7.6      0.7     1    3.9e-01
write.csv(summarize.anova(av.5.even), "../../results/anova-mixed-M5.even.csv")
print.av(summarize.anova(av.5.even))
##                 Delta.AIC Delta.BIC    Chisq    df      p.Chi
## Age.Gene              5.1      36.4      4.9     5    4.3e-01
## Ancestry.1.Gene     -18.5      12.8     28.5     5    2.9e-05
## Ancestry.2.Gene       8.6      46.2      3.4     6    7.6e-01
## Ancestry.3.Gene       6.0      37.3      4.0     5    5.5e-01
## Ancestry.4.Gene       5.5      43.1      6.5     6    3.7e-01
## Ancestry.5.Gene       8.1      45.7      3.9     6    6.9e-01
## Gender.Gene           0.1       6.4      1.9     1    1.7e-01
## Gender                0.7       6.9      1.3     1    2.5e-01
## Dx.Gene              -0.0       6.2      2.0     1    1.6e-01
## Dx                    2.0       8.3      0.0     1        1.0
## Gene                -59.2     -27.9     69.2     5    1.5e-13
## Age                   2.0       8.2      0.0     1    8.6e-01
## Ancestry.1            1.8       8.1      0.2     1    6.6e-01
## Ancestry.2            1.9       8.1      0.1     1    7.1e-01
## Ancestry.3            1.6       7.9      0.4     1    5.4e-01
## Ancestry.4            2.0       8.3      0.0     1    9.5e-01
## Ancestry.5            1.7       8.0      0.3     1    6.0e-01

Summary

Saving model formulas

Save all model formulas in the results directory

write.csv(data.frame(lapply(list(M1 = M1, M1b = M1b, M2 = M2, M3 = M3, M4 = M4, M5 = M5, M6 = M6),
                            function(x) as.character(formula(x)))),
          file = "../../results/M-formulas.csv", row.names = FALSE)