This analysis compares our findings on the biological regulation of parental expression bias to those published by Perez et al (eLife 2015;4:e07860), who presented and applied the BRAIM model to RNA-seq data from the cerebellum of P8 and P60 hybrid mice.

Preparation

Load functions

source("~/projects/monoallelic-brain/src/import-data.R")
source("~/projects/monoallelic-brain/src/fit-glms.R")
source("~/projects/monoallelic-brain/src/graphics.R")

Orthology: matching human and mouse genes

Import “associated names” (symbols) for the selected human genes:

h.g.names <- unlist(read.csv("../../data/genes.regression.new", as.is = TRUE))
names(h.g.names) <- h.g.names

Import results by Perez et al provided in the first sheet of Supplementary file 1.

perez <- read.csv("../../data/elife-07860/elife-07860-supp1-v2.csv", as.is = TRUE)

Using AnnotationDbi

library(AnnotationDbi)
library("org.Hs.eg.db")
## Error in library("org.Hs.eg.db"): there is no package called 'org.Hs.eg.db'
library("org.Mm.eg.db")
## Error in library("org.Mm.eg.db"): there is no package called 'org.Mm.eg.db'
library("hom.Hs.inp.db")

An initial attempt with the AnnotationDbi::inpIDMapper function retrieved the id of only 3 mouse orthologs for the 30 selected genes. As a result of this frustrating result the following conversion hack was done. Mouse gene names were “humanified” by all-capitalization, i.e. using the rule: Mest (mouse) -> MEST (human). The detailed operations follow:

Convert for the 30 selected human genes the gene names (stored in h.g.names) into Ensemble gene ids. Then print the name of those genes for which the conversion failed.

human.eg.ids <- unlist(as.list(org.Hs.egSYMBOL2EG[mappedkeys(org.Hs.egSYMBOL2EG)])[h.g.names])
## Error in as.list(org.Hs.egSYMBOL2EG[mappedkeys(org.Hs.egSYMBOL2EG)]): object 'org.Hs.egSYMBOL2EG' not found
# the conversion failed for these genes:
names(h.g.names[! h.g.names %in% names(human.eg.ids)])
## Error in h.g.names %in% names(human.eg.ids): object 'human.eg.ids' not found

Import all mouse gene ids and symbols, do the conversion hack and check which human genes could not be converted:

# get all mouse genes
mouse.all <- as.list(org.Mm.egSYMBOL2EG[mappedkeys(org.Mm.egSYMBOL2EG)])
## Error in as.list(org.Mm.egSYMBOL2EG[mappedkeys(org.Mm.egSYMBOL2EG)]): object 'org.Mm.egSYMBOL2EG' not found
# "humanify" mouse gene names by all-capitalization, i.e. using the rule Mest (mouse) -> MEST (human)
names(mouse.all) <- toupper(names(mouse.all))
## Error in toupper(names(mouse.all)): object 'mouse.all' not found
# convert mouse gene names (i.e. symbols) to upper case and check for non-matching human gene symbols
names(h.g.names[! h.g.names %in% toupper(as.character(perez$gene_name))])
##  [1] "TMEM261P1"     "SNHG14"        "AL132709.5"    "RP11-909M7.3" 
##  [5] "ZIM2"          "PWAR6"         "FAM50B"        "SNURF"        
##  [9] "KCNQ1OT1"      "SNORD116-20"   "RP13-487P22.1" "ZNF331"       
## [13] "hsa-mir-335"   "DIRAS3"        "PWRN1"         "NLRP2"

This result shows that 16 human genes could not be matched to any mouse ortholog. Below is a more principled and therefore more trusted approach using BioMart.

Using BioMart

This has been done with the web-based BioMart tool of Ensemble, which produced the following file

h2m <- read.csv("../../data/human-mouse-orthology.csv", as.is = TRUE)
str(h2m)
## 'data.frame':	37 obs. of  5 variables:
##  $ Ensembl.Gene.ID                           : chr  "ENSG00000162595" "ENSG00000145945" "ENSG00000106070" "ENSG00000167244" ...
##  $ Mouse.Ensembl.Gene.ID                     : chr  "" "ENSMUSG00000038246" "ENSMUSG00000020176" "ENSMUSG00000048583" ...
##  $ Mouse.orthology.confidence..0.low..1.high.: int  NA 1 1 1 1 1 NA 1 NA 1 ...
##  $ Mouse.associated.gene.name                : chr  "" "Fam50b" "Grb10" "Igf2" ...
##  $ Associated.Gene.Name                      : chr  "DIRAS3" "FAM50B" "GRB10" "IGF2" ...
# remove rows without mouse ortholog; also remove the column "Mouse.orthology.confidence..0.low..1.high."
h2m <- h2m[h2m$Mouse.Ensembl.Gene.ID != "", -3]
nrow(h2m) == length(unique(h2m$Mouse.associated.gene.name))
## [1] TRUE
rownames(h2m) <- h2m$Mouse.associated.gene.name
h2m[-1 * 1:2]
##         Mouse.associated.gene.name Associated.Gene.Name
## Fam50b                      Fam50b               FAM50B
## Grb10                        Grb10                GRB10
## Igf2                          Igf2                 IGF2
## Inpp5f                      Inpp5f               INPP5F
## Kcnk9                        Kcnk9                KCNK9
## Magel2                      Magel2               MAGEL2
## Mest                          Mest                 MEST
## Nap1l5                      Nap1l5               NAP1L5
## Ndn                            Ndn                  NDN
## Nlrp2                        Nlrp2                NLRP2
## Peg10                        Peg10                PEG10
## Peg3                          Peg3                 PEG3
## Snrpn                        Snrpn                SNRPN
## Gm38393                    Gm38393                SNURF
## Snurf                        Snurf                SNURF
## Ube3a                        Ube3a                UBE3A
## Zdbf2                        Zdbf2                ZDBF2

Results

The results of Perez et al

Bringing the data table into a long format for plotting:

vname.pp <- paste0("posterior_mean.pp_", fc <- c("parent", "sex", "cross", "age"), "_effect")
vname.beta <- paste0("posterior_mean.", fc,  "_effect")
perez.l <- # long format
    reshape(perez[c("gene_name", vname.pp, vname.beta)], # extract variables of interest
            v.names = c("P", "beta"),
            direction = "long", varying = list(vname.pp, vname.beta), timevar = "predictor", times = fc, idvar = "id1")

Plotted are posterior means of two parameters of the BRAIM model:

  1. , the regression coefficient mediating the effect of predictor for the parental bias of gene ; this is called posterior_mean.age_effect, posterior_mean.sex_effect,…, in the first sheet of Supplementary file 1 provided by Perez et al.
  2. , the probability that explanatory variable has large effect on parental bias of (i.e. induces a larger variance of ); this is called posterior_mean.pp_age_effect, posterior_mean.pp_sex_effect,…,
xyplot(P ~ beta | predictor, data = perez.l, scales = list(x = list(relation = "free")), pch = ".",
       xlab = expression(paste("E[ ", beta, " | data]")), ylab = "E[ P | data]",
       sub = "Perez et al, using BRAIM model",
       main = "Estimates from mouse cerebellum")

plot of chunk perez-p-vs-beta

This plot shows that the posterior mean of a regression coefficient need not differ greatly from zero even if provides a strong evidence that predictor significantly influences gene . This follows from the property of the BRAIM model that controls only the variance of but not its mean.

Comparison between the two studies

The following expressions focus only two predictors of parental bias: age and gender because only these two are clearly shared by our human study and the mouse study by Perez et al.

Data manuiplations

perez.s <- perez[perez$gene_name %in% h2m$Mouse.associated.gene.name, # rows: orthologs of selected human genes
                 c("gene_name", # informative columns
                   "posterior_mean.age_effect", "posterior_mean.pp_age_effect",
                   "posterior_mean.sex_effect", "posterior_mean.pp_sex_effect")]
perez.s <-
    do.call(rbind,
            lapply(unique(perez.s$gene_name),
                   function(n) {
                       df <- perez.s[ perez.s$gene_name == n, ]
                       cbind(data.frame(mouse.gene_name = n),
                             data.frame(t(as.matrix(apply(df[ , -1], 2, mean, na.rm = TRUE)))))
                   }))
perez.s$mouse.gene_name <- as.character(perez.s$mouse.gene_name)
rownames(perez.s) <- h2m[perez.s$mouse.gene_name, "Associated.Gene.Name"]

Combine evidence for the effect of age and gender from both our human study and the mouse study by Perez et al. The two types of evidence are:

  1. p-value for rejecting under the wnlm.Q or logi.S model, calculated either from normal distribution theory or a random permutation test
  2. the posterior mean
# Age
combined.a <- cbind(extractor("wnlm.Q", coef = "Age"),
                  extractor("logi.S")[-1 * c(1:2)])
combined.a <- cbind(combined.a,
                  expander("posterior_mean.age_effect", "beta.post.mean", df.long = combined.a),
                  expander("posterior_mean.pp_age_effect", "pp.post.mean", df.long = combined.a))
# Gender
combined.g <- cbind(extractor("wnlm.Q", coef = "GenderMale"),
                  extractor("logi.S")[-1 * c(1:2)])
combined.g <- cbind(combined.g,
                  expander("posterior_mean.sex_effect", "beta.post.mean", df.long = combined.g),
                  expander("posterior_mean.pp_sex_effect", "pp.post.mean", df.long = combined.g))
combined <- rbind(combined.a, combined.g)

The main result

Plotted below is from Perez et al against , the p-value, from our study.

xyplot(pp.post.mean ~ - log10(p.val.t.dist.wnlm.Q) | Coefficient, data = combined, groups = Gene,
       panel = function(x, y, ..., groups) {
           panel.text(x, y, labels = groups, cex = 0.7, ...)
       },
       strip = strip.custom(factor.levels = c("Age", "Gender")),
       xlim = c(-0.2, 2.5),
       #aspect = 1,
       main = "Evidence for significant effect",
       xlab = expression(paste(log[10], p, ",  present work")),
       ylab = "E[ P | data],  Perez et al"
       )

plot of chunk posterior-pp-vs-pval-wnlm-Q

Clearly, the evidence from the mouse and human study for any gene do not match each other. While there is evidence for some genes from both studies that age has an effect, only our human study detected significant effect of gender. These results suggests large differences between the studies, which are methodological and/or biological in nature. In particular: