Kevin's GATTACA World: April 2016

Friday, 29 April 2016

FDA launching the second precisionFDA challenge.

The challenge begins with two precisionFDA-provided input datasets, corresponding to whole-genome sequencing of the HG001 (NA12878) and HG002 (NA24385) human samples. Both samples were sequenced under similar sequencing conditions and instruments, at the same sequencing site. Your mission is to process these two FASTQ datasets through your mapping and variation calling pipeline and create VCF files. You can generate those results in your own environment and upload them to precisionFDA, or you can reconstruct your pipeline on precisionFDA and run it there. Regardless of how you generate your VCF files, you will subsequently submit them as your entry to the challenge.

For HG002, the truth data will not be known during the challenge. After submissions close on May 26, GiaB will publish their reference VCF file for HG002. The precisionFDA team will then run and publish comparisons between each contestant’s HG002 VCF file and the GiaB HG002 reference VCF. This will publicly reveal how similar each result is to the GiaB HG002 reference.

For HG001, the reference VCF is already available. You are therefore asked to conduct a comparison between your VCF and the GiaB HG001 (NA12878) reference VCF, and include it in your submission entry, for the following reasons:

to ensure that your VCF files are compatible with the comparison process (remember that we won’t be able to check on your HG002 VCF until after the end of submissions, so you are using your HG001 VCF as a check that your files can be compared without issues)
for the community to be able to contrast your performance on a previously known sample (HG001) versus a previously unknown one (HG002), and to evaluate any overfitting on HG001

Your entry to the challenge comprises your submitted HG001 and HG002 VCFs, your submitted HG001 comparison, and the HG002 comparison conducted by precisionFDA. Each comparison outputs several metrics (such as precision, recall, f-measure, or number of common variants). Selected participants and winners will be recognized on the precisionFDA website. Therefore, we hope you are willing to share your experience with others to further enhance the community's effort to ensure accuracy and consistency of tests.
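
As a rough illustration of the metrics named above, here is a minimal Python sketch of how precision, recall and f-measure fall out of the overlap between a submitted call set and a reference call set. It assumes each variant is reduced to a (chromosome, position, ref, alt) tuple and matched exactly; the function name compare_calls and the toy variants are made up for illustration. The real precisionFDA comparison is more involved (it has to reconcile different representations of the same variant, for example), so treat this only as a sketch of the arithmetic.

```python
from typing import Iterable, NamedTuple, Tuple

# Assumption: a variant is identified by (chromosome, position, ref, alt)
# and two calls match only if all four fields are identical.
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    common: int        # variants present in both call sets
    precision: float   # fraction of submitted calls found in the reference
    recall: float      # fraction of reference calls found in the submission
    f_measure: float   # harmonic mean of precision and recall


def compare_calls(query: Iterable[Variant], truth: Iterable[Variant]) -> Metrics:
    """Naive exact-match comparison of a submitted call set against a reference."""
    query_set, truth_set = set(query), set(truth)
    tp = len(query_set & truth_set)   # true positives (common variants)
    fp = len(query_set - truth_set)   # called, but not in the reference
    fn = len(truth_set - query_set)   # in the reference, but not called

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Metrics(tp, precision, recall, f_measure)


if __name__ == "__main__":
    # Toy data, invented for this example only.
    truth = [
        ("chr1", 100, "A", "G"),
        ("chr1", 250, "C", "T"),
        ("chr2", 75, "G", "A"),
        ("chr2", 900, "T", "C"),
    ]
    query = [
        ("chr1", 100, "A", "G"),
        ("chr1", 250, "C", "T"),
        ("chr2", 75, "G", "A"),
        ("chr3", 10, "A", "T"),  # extra call that is not in the reference
    ]
    print(compare_calls(query, truth))
```

With the toy data, three of the four submitted calls are in the reference and one reference call is missed, so the common count is 3 and precision, recall and f-measure all come out at 0.75.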

The challenge runs until May 26, 2016.

Source: https://precision.fda.gov/challenges/truth

Wednesday, 27 April 2016

Free Imputation servers

Free imputation servers will allow anyone to use the full haplotype reference panel to impute missing genotypes in their data. Users will be able to upload (pre-phased or unphased) genotype data to the server. Imputation will be carried out remotely on the server, and the imputed data will then be made available to the user.

Prototype imputation servers are already available at

https://imputation.sanger.ac.uk/

https://imputationserver.sph.umich.edu/

A prototype phasing server for phasing high coverage sequenced samples is available at

https://phasingserver.stats.ox.ac.uk/

Source: http://www.haplotype-reference-consortium.org/data-access

Wednesday, 13 April 2016

Best Disclaimer on sharing personal genetic information thus far

My vote goes to https://opensnp.org/signup

This sentence sums it all up: "There is zero privacy anyway, get over it"

Copied from the above URL:

By signing up for openSNP you declare that you have understood the possible risks and side-effects that can occur by making your genetic and medical information available on this platform. In short:

Data uploaded to the internet cannot be fully deleted, there may always be a backup somewhere
By publishing data you expose information about you and your next of kin worldwide
Genetic and medical information can be used by employers, insurance companies and the government to know more about you than you would like
New findings about your genotypes can be negative

What has been seen can not be unseen

You agree that all data you upload to openSNP will be freely available online (well, except your mail-address and password) under a Creative Commons Zero license. The data can be viewed and downloaded through this webpage, RSS-feeds, in future maybe via an API and via FTP. Although you can delete your data from openSNP, this does not guarantee that no one else has already created a backup of the data (and may re-publish it somewhere else).

There is zero privacy anyway, get over it

Although you can upload your data using a pseudonym, there is no way to anonymously submit data. Statistically speaking it is really unlikely that your medical and genetic information matches that of someone else. By uploading you do not only disclose information about yourself, but also about your next of kin (parents and siblings), who share half of their genome with you. Before uploading any genetic data you should make sure that those people approve of you doing so. This is especially important if you have a monozygotic twin, who shares all of your genome!

Jobs, insurance, the government

Medical and genetic data can be used to discriminate against people. Due to medical or genetic information an employer may not give you a job, an insurance company may request higher payments and who knows what any evil™ government will do with your data? Although some countries have laws against genetic discrimination, these laws certainly will not cover all possible discrimination scenarios and could change in the future. Again: These are side effects and risks which also can apply to your kin, if you choose to upload this information.

Knowledge about genes and SNPs is not static

Nearly every week there are new scientific publications that find new associations between certain traits (like diseases) and existing genetic information. Because of this you should not publish your data just because it currently looks harmless and unsuspicious. It may be true that your genotyping data is of no greater interest for your employer, your insurance company or the government right now, but this can easily change (Remember: One of the reasons to upload your data here in the first place is to enable everyone to find such new associations).

Think of the hypothetical SNP rs666. One day after you upload your genotyping-data to this website, a new publication finds that your genotype at rs666 will give you, your siblings and your parents a fatal disease that will most certainly strike all of you. Due to this disease you (and your kin) may lose your jobs and your insurance. Chances for an association of this kind may be small, but by uploading the data you are nonetheless taking this risk!

Spam-accounts

Accounts which only serve advertising will be deleted.
