Tuesday, 22 March 2011

FAQ: SNPs missing when called with more samples

Using mpileup called with 2 different samples, I'm getting this
particular SNP, which has good coverage (DP=55) in only one particular
sample. This is good.

However, when mpileup was called with 10 samples, the SNP got lost. I'm
just trying to figure out whether the SNP got 'drowned out' by the other 9
samples, which don't have the SNP, and hence wasn't called. Is this how
mpileup works?

Excellent answer by Heng Li:

With more samples, you gain power on SNPs shared between samples, but lose power on singleton SNPs. Here is a way of thinking of this. Suppose we have 1% false positive rate (FPR) for one sample. If we call SNPs from 10 samples separately and then combine the calls, the FPR would be around 5% (not 10% because more SNPs are found given 10 samples). To retain a low FPR on singletons we have to be more stringent. Nonetheless, with more samples, we can usually get overall better calls than calling SNPs in each sample separately because information between samples is used more effectively.
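To make the arithmetic in that answer concrete, here is a rough back-of-the-envelope sketch in Python. It is my own illustration, not anything from samtools or from Heng Li. It assumes that false calls are independent and never overlap between samples, and that the set of true SNPs grows by some factor when samples are pooled. The function name and the `true_union_growth` parameter are hypothetical, and the value 1.92 is picked only because it reproduces the "around 5%" figure.

```python
def combined_false_fraction(calls_per_sample, per_sample_fpr, n_samples,
                            true_union_growth):
    """Fraction of false calls after merging per-sample call sets.

    Assumptions (illustrative only):
    - false calls are independent, so none are shared between samples;
    - the union of true SNPs is `true_union_growth` times the number of
      true SNPs in a single sample (1 = all shared, n_samples = none shared).
    """
    false_per_sample = calls_per_sample * per_sample_fpr
    true_per_sample = calls_per_sample - false_per_sample
    false_total = false_per_sample * n_samples        # errors simply add up
    true_total = true_per_sample * true_union_growth  # true SNPs overlap
    return false_total / (false_total + true_total)


if __name__ == "__main__":
    # Every true SNP shared by all 10 samples: close to the naive 10%.
    print(combined_false_fraction(1000, 0.01, 10, true_union_growth=1.0))
    # Roughly twice as many true SNPs found with 10 samples: about 5%.
    print(combined_false_fraction(1000, 0.01, 10, true_union_growth=1.92))
    # No true SNP shared at all: back to the single-sample 1%.
    print(combined_false_fraction(1000, 0.01, 10, true_union_growth=10.0))
```

The point of the sketch is only that the merged false fraction sits somewhere between 1% and 10%, depending on how many extra true SNPs the additional samples contribute. A caller that wants to keep that fraction low has to be stricter about variants seen in a single sample.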

