The R packages limma and voom have been developed for analysis of RNA-seq read count data, whereby RNA-seq artifacts are modeled successfully, as it was demonstrated. Here I consider employing limma and voom for our study.

Background

Ifat’s initial analysis (stage 1) used , the rank transformed statistic averaged over 8 known imprinted genes as the response variable of a normal linear model. My subsequent analysis extended this framework in several ways, fitting 88 various multiple regression models (and 88 additional simple regression models). These extensions include

My analysis (referred to here as stage 2) confirmed Ifat’s result (stage 1) suggesting loss of the 8 genes’ imprinting with age when they are averaged and added some new results:

Some of the remaining questions:

The limma package with its recent voom extension appears a powerful tool to answer the above questions.

limma and voom

The figure, taken from the 2015 limma paper, showcases the features of limma: limma F1

Its voom extension takes care of two challenging features of RNA-seq data: strong heteroscedasticity and overdispersion (relative to the Poisson/binomial distributions). voom “unlocks” the versatile and powerful toolbox of limma by estimating the precision – the inverse error variance – for each observation (i.e. for each pair of tissue sample and gene ) and using that precision as weight in the normal linear model. Here is the illustration of how precision for two observations is obtained by voom.

Adoption to the present study

To make the data accessible to limma and voom, allele-specific read counts should be used instead of the statistic derived from those. This means that for each individual and gene we would have at least one pair of read counts: one for parental allele and another for allele . limma could then be used to detect differential expression between the two alleles.

Let be all heterozygous SNP variant sites that occur in individual and gene and let indicate the event that the reference variant is maternal. Because that event is random (unless we have prior knowledge that is imprinted maternally or paternally), there are allele configurations for . A rule – used in the definition of – that could reduce the number of configurations to 1 would be summing the higher read counts (and the lower read counts) over all .

This table compares limma (potential stage 3 analysis) with two modeling approaches nlm.R and logi.S from stage 1 and 2, respectively.

  nlm.R logi.S limma + voom
response R: rank transformed S S
precision weights / overdispersion no / no yes / no yes / yes
gene-gene variability w/o borrowing of strength w/o borrowing of strength yes
borrowing of strength (genes) w/o variability w/o variability yes
age dependence H_0
effect size / interpretation no yes / difficult yes / simple
overall parental bias H_0 no and
genome-wise applicability yes ? yes
proportion of H_0 no ? yes
gene set testing no no yes