From Ravi’s email to Andy (January 27, 2017 at 12:35:30 AM EST)

Broadly, I think expanding the figure captions would help; I sometimes had trouble understanding the message of a graph/plot. Also, the fits to various parameters could probably be moved to the supplementary material (at least in cases where the parameters, like age, seemed to have no predictive value). And maybe a conclusion section would be useful (I think some journals also require it).

I also had questions about various numbers, etc., and some suggestions, which are embedded in the PDF.

Turning to his embedded comments in the edited PDF… p1:

One broad comment on figure captions: I feel that each figure should have a take-home message, and reading the captions, I did not always get an idea of what the message/point of the figure was.

Specific

More embedded comments:

p6

“Then, we removed PCR replicates using the Samtools rmdup utility; around one third of the reads mapped (which is expected, given the parameters we used and the known high repeat content of the human genome).”

2/3 did not map; that seems a bit high. How many reads were ribosomal RNA? And excluding those, what percentage did not map? Most mRNA should be repeat-free.
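The question above amounts to recomputing the unmapped fraction after setting the rRNA reads aside. A minimal sketch, assuming rRNA reads can be counted separately and are removed from the denominator; the function name and the counts in the usage note are hypothetical placeholders, not numbers from the manuscript:

```python
def unmapped_fraction_excluding_rrna(total_reads: int,
                                     rrna_reads: int,
                                     mapped_non_rrna_reads: int) -> float:
    """Fraction of non-rRNA reads that did not map.

    Assumption: rRNA reads are identified in a separate step and are
    excluded from both the numerator and the denominator.
    """
    non_rrna = total_reads - rrna_reads
    if non_rrna <= 0:
        raise ValueError("no non-rRNA reads left to evaluate")
    if not 0 <= mapped_non_rrna_reads <= non_rrna:
        raise ValueError("mapped count must lie between 0 and the non-rRNA total")
    return 1.0 - mapped_non_rrna_reads / non_rrna
```

With placeholder counts of 300 total reads, 100 rRNA reads and 100 mapped non-rRNA reads, the raw unmapped fraction is 2/3, but the fraction among non-rRNA reads is 0.5.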

p6

“Using the BCFtools utility of Samtools, we called SNPs (SNVs only, no indels). Then, we invoked a quality filter requiring a Phred score > 20 (corresponding to a probability for an incorrect SNP call < 0.01).”

How many got filtered out with this criterion? Most bases should have Phred > 30?
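For reference, the standard Phred definition is Q = -10 log10(P), so the thresholds discussed here map to error probabilities as sketched below. The function names are illustrative and not taken from Samtools or BCFtools:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred score Q = -10*log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Phred {q}: error probability {phred_to_error_prob(q):.4g}")
```

A cutoff of Phred > 20 therefore corresponds to an error probability below 0.01, as the manuscript states, while Phred > 30 would correspond to below 0.001.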

p7

“Read depth was measured using the Samtools Pileup utility. After these filters were applied, 364,509 SNPs remained in 22,254 genes.”

Probably fewer than 10K genes had sufficient expression to allow SNP calls?

p7

“We calibrated the imputation parameters to find a reasonable balance between the number of genes assessable for allelic bias and the number false positive calls since the latter can arise if a SNP is incorrectly called heterozygous.”

What are the results if imputed SNPs are not used?

p13, Fig 3 legend

Maybe genes that would not be on this list without the imputed SNPs should be highlighted in a different color?

p15, Fig 4 legend

Is there a message to this set of graphs? I was not sure what message I should take home from this.

p22: Discussion

It seems like the fits did not really find any variable that was predictive. In that case, why show all the figures? (Maybe they can go into supplementary data.)

p24

“In conclusion,…”

Would it help to have a Conclusion section? I think that would help wrap up the message for the reader.