What is the precise meaning of the sentence below, taken from Ifat’s manuscript?

The black portion of each bar represents the fraction of individuals with evidence for biallelic expression (where the observed Sg is close to 0.5 and the 95% confidence interval given the observed read counts does not include 0.7)

My reading

The intention must have been to test if gene g is biallelically expressed in individual i. But the sentence in question may be understood in two ways (Interpretation A and B). Before discussing those cases I start with an assumption that was made (personal communication with Andy).

Binomial assumption

For individual i and gene g, the statistic Sig is assumed to be distributed as Binom(nig,θig). The two parameters in this binomial distribution are the observed total read count nig and θig, which is the expected proportion of higher read counts in nig. Note that θig is a parameter, an unobservable ( theoretical) quantity, while its counterpart Sig/nig is a statistic, an observable quantity, and that parameters have confidence intervals while statistics prediction intervals.

Interpretation A: Classical hypothesis testing

In this case we have the null hypothesis H0 that θig=1/2. H0 models an ideally behaved (perfectly balanced) biallelically expressing gene. Then, what was really meant instead of confidence interval for some parameter was indeed the prediction interval for presumably the Sig/nig statistic. The 95 % prediction interval provides us a rule for testing H0: if the observed sig/nig falls in that interval we accept H0 otherwise we reject it at significance level 0.05.

The 95 % prediction interval can be calculated as follows. Let Fig be the cumulative distribution function (CDF) of the Binom(nig,1/2) distribution and let F1ig be the quantile function (inverse CDF). Then the 95 % prediction interval may be given as [F1ig(0.025),F1ig(0.975)]. However, independently of H0 we are certain that Sig/nig1/2 due to the definition of Sig. Thus the prediction interval that expresses this certainty is [1/2,F1ig(0.95)].

Interpretation B: Hypothesis testing based on confidence interval

If indeed confidence interval was meant in the sentence that would obviously be an interval for θig, which is treated as unknown unlike in case A where we hypothesized that θig=1/2. The value 0.7 can be considered as the threshold that divides the null hypothesis 0.5θig0.7 (biallelic expression) from the alternative hypothesis 0.7<θig1 (monoallelic expression). Then a sensible rule may be to

  1. accept the null hypothesis when the confidence interval is contained in [0.5,0.7) and mark individual i black
  2. accept the alternative hypothesis when it is (0.7,1] and mark i with a second color
  3. otherwise, accept neither hypotheses i.e remain “ignorant” and mark i with a third color, say gray

Note that only in the 3rd case is 0.7 contained in the confidence interval so that under both the null and alternative hypothesis “the confidence interval does not include 0.7”, as the sentence prescribes, which however makes little sense. Perhaps what was meant in the original sentence is that the confidence interval [0.5,0.7), I wonder.

Calculation of confidence intervals may be done various ways in general. For the θig the best way is to use the fact that the log likelihood is asymptotically chi-square distributed.

Conclusion

Interpretation A and B both conform the sentence in question. They have quite different interpretations, though, and involve very different calculations. What semantics the sentence really has should turn out from the Excel formula that was used to calculate the “black bars”.