This is a discussion on Ifat’s draft in order to complete the research project.
Overall
- What are the most interesting findings?
Andy: (1) There seem to be far fewer imprinted genes than some previous studies suggested (Gregg, …, Dulac). (2) Age dependence of imprinting. (3) First(?) study using human samples.
- What needs further analysis?
- regression on age of death: model selection?
- Error rates for calling monoallelic expression necessary?
note: We started discussing whether error control is necessary to support the first conclusion above. I will write a more formal account of error control soon.
Regression
The regression model has the age of death as its explanatory variable, and the null hypothesis is that the coefficient of age is zero, i.e. age has no impact on imprinting. The response seems to be a variable that in some way aggregates over 8 or 13 genes. But what is the definition of the response? What kind of aggregation is it (summation, pooling, …)?
note: We looked at Ifat’s R script for regression analysis and the definition of LOI_R. When I get access to her files on the other server I’ll look at them.
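To make the questions about the regression concrete, here is a minimal sketch of one possible reading: the response is the per-individual mean of per-gene scores, and the null hypothesis of no age effect is tested by permuting the response across individuals. The averaging rule, the toy numbers, and the names `ols_slope` and `permutation_pvalue` are all assumptions for illustration; none of this is taken from Ifat's R script.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_pvalue(age, response, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis 'slope = 0'."""
    rng = random.Random(seed)
    observed = abs(ols_slope(age, response))
    shuffled = list(response)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between age and response
        if abs(ols_slope(age, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data, invented for illustration: one row of per-gene scores per individual.
age_at_death = [22, 31, 45, 52, 60, 68, 77, 85]
per_gene_scores = [
    [0.93, 0.91, 0.92],
    [0.91, 0.89, 0.90],
    [0.89, 0.87, 0.88],
    [0.86, 0.84, 0.85],
    [0.84, 0.82, 0.83],
    [0.81, 0.79, 0.80],
    [0.79, 0.77, 0.78],
    [0.75, 0.73, 0.74],
]

# Hypothetical aggregation: average the per-gene scores within each individual.
response = [mean(row) for row in per_gene_scores]

slope = ols_slope(age_at_death, response)
p_value = permutation_pvalue(age_at_death, response)
print(f"slope = {slope:.4f}, permutation p = {p_value:.4f}")
```

Whether genes are averaged, summed, or pooled at the read level changes the response and possibly the result, which is why the definition matters for model selection.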
Error rates
The manuscript provides no error rates for the classification of genes as mono- or biallelically expressed.
- frequentist approach
- p-values based on the null distribution of the test statistic
- FDR control based on estimate of fraction of monoallelically expressed genes
- Bayesian approach
- probabilities of mono/biallelic expression:
- prior prob. based on a. estimate of fraction of monoallelically expressed genes b. distance from known imprinted genes (further extension: HMM)
- posterior given a. expression data (RNA-seq) b. genotype data (SNP-array) c. likelihood for based on d. prior
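The frequentist branch of the list above can be sketched as follows, assuming a simple binomial null: under biallelic expression, reads at a heterozygous SNP fall on either allele with probability 0.5. The statistic (the larger of the two allele counts), the function names, and the toy counts are assumptions for illustration. The FDR step is the Benjamini-Hochberg adjustment, optionally scaled by `pi0`, an assumed estimate of the fraction of biallelically expressed genes (one minus the estimated monoallelic fraction).

```python
from math import comb


def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pvalue_biallelic_null(ref_reads, alt_reads):
    """Two-sided p-value for allelic imbalance under the binomial(n, 0.5) null."""
    n = ref_reads + alt_reads
    k = max(ref_reads, alt_reads)
    return min(1.0, 2.0 * binom_tail(k, n))


def adjusted_pvalues(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, scaled by pi0 (pi0=1 gives plain BH)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy (ref, alt) read counts, invented for illustration.
counts = [(30, 0), (18, 12), (25, 3), (15, 15)]
pvals = [pvalue_biallelic_null(r, a) for r, a in counts]
for (r, a), p, q in zip(counts, pvals, adjusted_pvalues(pvals)):
    print(f"ref={r:2d} alt={a:2d}  p={p:.3g}  adjusted={q:.3g}")
```

If the binomial assumption is doubtful, `pvalue_biallelic_null` is the only piece that would need to change; the adjustment step takes any list of p-values.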
Notes
In the frequentist approach we only need the likelihood function for biallelic expression, whereas in the Bayesian one we also need that for monoallelic expression (and the prior, of course).
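As a minimal sketch of why the Bayesian approach needs both likelihoods, the posterior probability of monoallelic expression at one heterozygous SNP can be computed as below. The binomial likelihoods, the error parameter `eps` (the chance that a read appears to come from the silenced allele), and the default prior are assumptions for illustration; a prior that depends on distance from known imprinted genes is not modelled here.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_monoallelic(ref_reads, alt_reads, prior_mono=0.01, eps=0.02):
    """Posterior probability of monoallelic expression given allele read counts."""
    n = ref_reads + alt_reads
    # Biallelic expression: each read comes from either allele with probability 0.5.
    lik_bi = binom_pmf(ref_reads, n, 0.5)
    # Monoallelic expression: either allele may be the silenced one (weight 0.5 each).
    lik_mono = 0.5 * binom_pmf(ref_reads, n, 1 - eps) + 0.5 * binom_pmf(ref_reads, n, eps)
    numerator = prior_mono * lik_mono
    return numerator / (numerator + (1 - prior_mono) * lik_bi)


# Toy read counts, invented for illustration.
for ref, alt in [(30, 0), (25, 5), (15, 15)]:
    print(f"ref={ref:2d} alt={alt:2d}  P(mono | data) = {posterior_monoallelic(ref, alt):.4f}")
```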
Andy: a permutation-derived null distribution of the test statistic seems preferable to the binomial assumption
The form of likelihood depends on the dependency structure of the following variables:
Error of genotype calling
A different kind of error rate is provided: the error for calling genotypes (Figures error rate 1 and error rate 2).
Andy: a discordant call is when RNA-seq suggests monoallelic expression and the chip array suggests heterozygosity
Figure error rate 1.
- Does `AB` mean heterozygosity for a given SNP?
- `all calls`: within an individual? For all individuals?
- How was `probability for AB` calculated? What is `chip`?
- What are `err_AB_100` and `err_AB_7`?
Andy: the acceptable minimum number of fragments covering a given SNP.
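One possible reading of these quantities, based only on Andy's two remarks, is a discordance rate among chip-heterozygous SNPs that pass a minimum read coverage. The sketch below is a guess at that definition, not the manuscript's method; the record layout, the name `discordance_rate`, and the toy data are hypothetical.

```python
def discordance_rate(snps, min_reads):
    """Fraction of chip-heterozygous SNPs where RNA-seq shows reads from one allele only.

    Only SNPs covered by at least `min_reads` RNA-seq fragments are counted.
    Returns None when no SNP passes the filter.
    """
    eligible = [
        s for s in snps
        if s["chip"] == "AB" and s["ref"] + s["alt"] >= min_reads
    ]
    if not eligible:
        return None
    discordant = sum(1 for s in eligible if s["ref"] == 0 or s["alt"] == 0)
    return discordant / len(eligible)


# Toy SNP records, invented for illustration.
snps = [
    {"chip": "AB", "ref": 7, "alt": 0},     # discordant, low coverage
    {"chip": "AB", "ref": 60, "alt": 55},   # concordant, high coverage
    {"chip": "AB", "ref": 4, "alt": 5},     # concordant, low coverage
    {"chip": "AB", "ref": 120, "alt": 0},   # discordant, high coverage
    {"chip": "AA", "ref": 200, "alt": 0},   # homozygous on chip: ignored
    {"chip": "AB", "ref": 3, "alt": 0},     # below both thresholds
]

print("min 7 reads:  ", discordance_rate(snps, 7))
print("min 100 reads:", discordance_rate(snps, 100))
```

Under this reading a discordant call does not distinguish a genotyping error from genuine monoallelic expression, which is one reason the questions above need an answer from the author.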
Figure error rate 2.
- What are `data points`?
- What are the numbers (1 to 100) next to the plotted symbols?
Association of HLA genes to schizophrenia
A nonsignificant tendency for HLA-DQB1 was found. Is it worth following up?
Andy: not really worth it.
Imputation of HLA types uses two sources of information:
- SNPs (HIBAG)
- RNA-seq (PHLAT)