What is the precise meaning of the sentence below, taken from Ifat’s manuscript?

The black portion of each bar represents the fraction of individuals with evidence for biallelic expression (where the observed [read count ratio $S_{ig}$] is close to 0.5 and the 95% confidence interval given the observed read counts does not include 0.7)

My reading

The intention must have been to test whether gene $g$ is biallelically expressed in individual $i$. But the sentence in question can be understood in two ways (Interpretations A and B). Before discussing them, I state an assumption that was made (personal communication with Andy).

Binomial assumption

For individual $i$ and gene $g$, the statistic $S_{ig}$ (the observed read count ratio) is assumed to be distributed as $\mathrm{Binom}(n_{ig}, p_{ig})$. The two parameters of this binomial distribution are the observed total read count $n_{ig}$ and $p_{ig}$, the expected proportion of higher read counts in $n_{ig}$. Note that $p_{ig}$ is a parameter, an unobservable (theoretical) quantity, while its counterpart $S_{ig}$ is a statistic, an observable quantity, and that parameters have confidence intervals whereas statistics have prediction intervals.

Interpretation A: Classical hypothesis testing

In this case we have the null hypothesis $H_0$ that the expected proportion of higher read counts is $p_{ig} = 0.5$. $H_0$ models an ideally behaved (perfectly balanced) biallelically expressing gene. Then what was really meant, instead of a confidence interval for some parameter, was the prediction interval for, presumably, the statistic $S_{ig}$ (the observed read count ratio). The 95 % prediction interval provides a rule for testing $H_0$: if the observed $S_{ig}$ falls in that interval we accept $H_0$; otherwise we reject it at significance level $\alpha = 0.05$.

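The 95 % prediction interval can be calculated as follows. Let $F$ be the cumulative distribution function (CDF) of the $\mathrm{Binom}(n_{ig}, 0.5)$ distribution, where $n_{ig}$ is the total read count, and let $F^{-1}$ be the quantile function (inverse CDF). Then the 95 % prediction interval may be given as $[F^{-1}(0.025), F^{-1}(0.975)]$. However, independently of the hypothesis, we are certain that the observed read count ratio satisfies $S_{ig} \ge 0.5$, due to the definition of $S_{ig}$ as the proportion of the higher read count. Thus the prediction interval that expresses this certainty is one-sided, with 0.5 as its lower bound and an upper quantile of $F$ as its upper bound.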
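A minimal sketch of this calculation, using only the Python standard library. Several things here are my assumptions, not taken from the manuscript: the binomial model is applied to the higher read count, the interval is rescaled to the read count ratio by dividing by the total read count, and the one-sided variant simply replaces the lower bound of the two-sided interval by 0.5. The function names are hypothetical.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p) with 0 < p < 1, computed on the log scale."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def binom_quantile(q, n, p):
    """Quantile function: the smallest k with P(X <= k) >= q."""
    cum = 0.0
    for k in range(n + 1):
        cum += binom_pmf(k, n, p)
        if cum >= q:
            return k
    return n

def prediction_interval(n, level=0.95, p0=0.5):
    """Two-sided prediction interval under H0, on the read count ratio scale."""
    tail = (1 - level) / 2
    lower = binom_quantile(tail, n, p0) / n
    upper = binom_quantile(1 - tail, n, p0) / n
    return lower, upper

def accept_h0(higher_count, n, level=0.95):
    """Assumed rule: accept H0 if the observed ratio lies in [0.5, upper]."""
    _, upper = prediction_interval(n, level)
    s = higher_count / n  # observed read count ratio, at least 0.5 by definition
    return 0.5 <= s <= upper
```

For example, `prediction_interval(100)` gives the two-sided interval for 100 reads, and `accept_h0(55, 100)` applies the truncated rule to an individual with 55 reads from the higher-count allele.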

Interpretation B: Hypothesis testing based on confidence interval

If a confidence interval was indeed meant in the sentence, it would obviously be an interval for $p_{ig}$, the expected proportion of higher read counts, which is treated as unknown, unlike in case A where we hypothesized that $p_{ig} = 0.5$. The value 0.7 can be considered the threshold that divides the null hypothesis (biallelic expression) from the alternative hypothesis (monoallelic expression). Then a sensible rule may be to

  1. accept the null hypothesis when the confidence interval is contained in $[0.5, 0.7)$ and mark individual $i$ black
  2. accept the alternative hypothesis when it is contained in $(0.7, 1]$ and mark $i$ with a second color
  3. otherwise, accept neither hypothesis, i.e. remain “ignorant”, and mark $i$ with a third color, say gray
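The three-way rule above can be sketched as a small function. This is only an illustration: the function name and the returned labels are hypothetical, and I assume that the null region lies entirely below the 0.7 threshold and the alternative region entirely above it.

```python
def classify(ci_lower, ci_upper, threshold=0.7):
    """Classify one individual from the confidence interval for the expected proportion."""
    if ci_upper < threshold:
        return "biallelic"      # rule 1: whole interval below the threshold, mark black
    if ci_lower > threshold:
        return "monoallelic"    # rule 2: whole interval above the threshold, second color
    return "undetermined"       # rule 3: interval contains the threshold, mark gray
```

For instance, `classify(0.52, 0.66)` falls under rule 1, `classify(0.75, 0.93)` under rule 2, and `classify(0.62, 0.78)` under rule 3.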

Note that 0.7 is contained in the confidence interval only in the 3rd case, so under both the null and the alternative hypothesis “the confidence interval does not include 0.7”. Read literally, the condition the sentence prescribes therefore fails to separate biallelic from monoallelic expression, which makes little sense. Perhaps what was meant in the original sentence is that the confidence interval lies entirely below 0.7, I wonder.

In general, confidence intervals may be calculated in various ways. For the binomial proportion, arguably the best way is to use the fact that twice the log likelihood ratio is asymptotically chi-square distributed (with one degree of freedom).
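A minimal sketch of such a likelihood-based interval for a binomial proportion, using only the Python standard library. It inverts the likelihood ratio test by bisection: the interval consists of all proportions whose deviance (twice the log likelihood ratio) stays below the 0.95 quantile of the chi-square distribution with one degree of freedom, about 3.841. The function names are hypothetical, and this is not necessarily how the manuscript's intervals were computed.

```python
from math import log

CHI2_95_1DF = 3.841458820694124  # 0.95 quantile of chi-square with 1 degree of freedom

def loglik(p, k, n):
    """Binomial log likelihood up to an additive constant; safe for k = 0 or k = n."""
    ll = 0.0
    if k > 0:
        ll += k * log(p)
    if k < n:
        ll += (n - k) * log(1 - p)
    return ll

def lr_confidence_interval(k, n, crit=CHI2_95_1DF, tol=1e-10):
    """Approximate 95 % interval for p given k successes out of n trials."""
    p_hat = k / n                      # maximum likelihood estimate
    ll_max = loglik(p_hat, k, n)

    def deviance(p):
        return 2 * (ll_max - loglik(p, k, n))

    def solve(lo, hi, increasing):
        # Bisection for deviance(p) == crit on an interval where deviance is monotone.
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if (deviance(mid) > crit) == increasing:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(0.0, p_hat, increasing=False)
    upper = 1.0 if k == n else solve(p_hat, 1.0, increasing=True)
    return lower, upper
```

Calling `lr_confidence_interval(60, 100)` returns the interval for an individual with 60 of 100 reads from the higher-count allele; its upper bound can then be compared with the 0.7 threshold.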

Conclusion

Interpretations A and B both conform to the sentence in question. They differ considerably in meaning, though, and involve very different calculations. Which meaning the sentence really has should become clear from the Excel formula that was used to calculate the “black bars”.