Monday, 10 December 2012

Gabe Rudy "GATK is a Research Tool. Clinics Beware." | Our 2 SNPs…(R)

Gabe describes in great detail a bug he found in GATK's variant caller, which has been widely regarded as a reliable SNP caller.

I think that, in general, the 'unreliable' nature of next-gen sequencing data often has researchers seeking multiple sources of confirmation for variants before moving to publication.

Still, I am frankly surprised that GATK turned up an error, though as Gabe points out, it may be common to find Heisenbugs in software.

It's also a poignant reminder that DTC genetic testing needs more work to avoid mistakes like these, which could be detrimental to personalised medicine.

"But my scary homozygous insertion (row 2) shows 153 reference bases and no reads supporting the insertion. Yet it was still called a homozygous variant!
I promptly sent an email off to 23andMe's exome team letting them know about what is clearly a bug in the GATK variant caller. They confirmed it was a bug that went away after updating to a newer release. I talked to 23andMe's bioinformatician behind the report face-to-face a bit at this year's ASHG conference, and it sounds like it was most likely a bug in the tool's multi-sample variant calling mode as this phantom insertion was a real insertion in one of the other samples.
Since there were 8,242 other InDels that match this pattern, I am most likely not looking at random noise but real "leaked" variants from other members of the 23andMe Exome Pilot Program. (Edit: After some analysis with a fixed version of GATK, Eoghan from 23andMe found that these genotypes were not leaked from other samples but completely synthetic.)"
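
The case Gabe describes (153 reference reads, no reads supporting the insertion, yet a homozygous call) is the kind of thing a simple sanity check on read support could flag. Here is a minimal sketch in Python, assuming the genotype and per-allele read depths have already been pulled out of a VCF record's GT and AD fields. The function name and the 20% threshold are my own illustrative choices, not anything from GATK or 23andMe.

```python
from typing import Sequence


def is_suspect_hom_alt(genotype: str,
                       allele_depths: Sequence[int],
                       min_alt_fraction: float = 0.2) -> bool:
    """Return True when a homozygous non-reference call has little read support.

    genotype         -- VCF GT string such as "1/1" or "0|1"
    allele_depths    -- VCF AD values: reads for ref, then for each alt allele
    min_alt_fraction -- hypothetical cutoff, chosen here only for illustration
    """
    alleles = genotype.replace("|", "/").split("/")

    # Only homozygous non-reference calls (e.g. 1/1) are of interest here.
    if len(set(alleles)) != 1 or alleles[0] in ("0", "."):
        return False

    called = int(alleles[0])
    if called >= len(allele_depths):
        return False  # no depth recorded for the called allele; cannot judge

    total = sum(allele_depths)
    if total == 0:
        return True  # a call with no reads behind it at all

    # Flag the call when too few reads support the allele that was called.
    return allele_depths[called] / total < min_alt_fraction


# The situation from the quote: 153 reference reads, 0 supporting the insertion.
print(is_suspect_hom_alt("1/1", [153, 0]))
```

A check like this would not explain where a phantom call came from, but it would at least put it in front of a human before it reached a report.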

