Friday, 29 April 2016

FDA launching the second precisionFDA challenge.

PrecisionFDA Truth Challenge

The challenge begins with two precisionFDA-provided input datasets, corresponding to whole-genome sequencing of the HG001 (NA12878) and HG002 (NA24385) human samples. Both samples were sequenced under similar sequencing conditions and instruments, at the same sequencing site. Your mission is to process these two FASTQ datasets through your mapping and variation calling pipeline and create VCF files. You can generate those results on your own environment, and upload them to precisionFDA, or you can reconstruct your pipeline on precisionFDA and run it there. Regardless of how you generate your VCF files, you will subsequently submit them as your entry to the challenge.
For HG002, the truth data will not be known during the challenge. After submissions close on May 26, GiaB will publish their reference VCF file for HG002. The precisionFDA team will then run and publish comparisons between each contestant’s HG002 VCF file and the GiaB HG002 reference VCF. This will publicly reveal how similar each result is to the GiaB HG002 reference.
For HG001, the reference VCF is already available. You are therefore asked to conduct a comparison between your VCF and the GiaB HG001 (NA12878) reference VCF, and include it in your submission entry, for the following reasons:
  1. to ensure that your VCF files are compatible with the comparison process (remember that we won’t be able to check on your HG002 VCF until after the end of submissions, so you are using your HG001 VCF as a check that your files can be compared without issues)
  2. for the community to be able to contrast your performance on a previously known sample (HG001) versus a previously unknown (HG002), and to evaluate any overfitting on HG001
Your entry to the challenge comprises your submitted HG001 and HG002 VCFs, your submitted HG001 comparison, and the HG002 comparison conducted by precisionFDA. Each comparison outputs several metrics (such as precision, recall, f-measure, or number of common variants). Selected participants and winners will be recognized on the precisionFDA website. Therefore, we hope you are willing to share your experience with others to further enhance the community's effort to ensure accuracy and consistency of tests.
The challenge runs until May 26, 2016.
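
As a rough illustration of the comparison metrics mentioned above, here is a minimal Python sketch that computes precision, recall and f-measure from counts of true-positive, false-positive and false-negative variant calls. The function name and the counts are made up for illustration; the actual precisionFDA comparison may define or stratify these metrics differently.

```python
def comparison_metrics(tp, fp, fn):
    """Return (precision, recall, f_measure) from variant-call counts.

    tp: calls that match the truth set
    fp: calls that are absent from the truth set
    fn: truth-set variants that were not called
    """
    # Guard against division by zero when there are no calls or no truth variants.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, not taken from any real submission.
precision, recall, f_measure = comparison_metrics(tp=990, fp=10, fn=20)
print(f"precision={precision:.4f} recall={recall:.4f} f-measure={f_measure:.4f}")
```

A pipeline that calls aggressively tends to gain recall at the cost of precision, which is why the f-measure is reported alongside the two.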

Wednesday, 27 April 2016

Free Imputation servers

Free imputation servers will allow anyone to use the full haplotype reference panel to impute missing genotypes in their data.  Users will be able to upload (pre-phased or unphased) genotype data to the server. Imputation will be carried out remotely on the server, and the imputed data will then be made available to the user. 

Prototype imputation servers are already available at 

A prototype phasing server for phasing high coverage sequenced samples is available at


Wednesday, 13 April 2016

Best Disclaimer on sharing personal genetic information thus far

My vote goes to

This sentence sums it all up: "There is zero privacy anyway, get over it."

copied from the above URL

By signing up for openSNP you declare that you have understood the possible risks and side-effects that can occur by making your genetical and medical information available on this platform. In short:
  • Data uploaded to the internet can not be fully deleted, there may always be a backup somewhere
  • By publishing data you expose information about you and your next of kin worldwide
  • Genetic and medical information can be used by employers, insurance companies and the government to know more about you than you would like
  • new findings about your genotypes can be negative
What has been seen can not be unseen
You agree that all data you upload to openSNP will be freely available online (well, except your mail-address and password) under a Creative Commons Zero license. The data can be viewed and downloaded through this webpage, RSS-feeds, in future maybe via an API and via FTP. Although you can delete your data from openSNP this does not guarantee that no one else did already create a backup of the data (who may re-publish the data somewhere else).
There is zero privacy anyway, get over it
Although you can upload your data using a pseudonym, there is no way to anonymously submit data. Statistically speaking it is really unlikely that your medical and genetic information matches that of someone else. By uploading you do not only disclose information about yourself, but also about your next kinship (parents and siblings), that shares half of a genome with you. Before uploading any genetical data you should make sure that those people approve of you doing so. This is especially important if you have monozygotic twin, who shares all of your genome!
Jobs, insurance, the government
Medical and genetic data can be used to discriminate people. Due to medical or genetic information an employer may not give you a job, an insurance company may request higher payments and who knows what any evil™ government will do with your data? Although some countries have laws against genetic discrimination, these laws certainly will not cover possible discrimination scenarios and could change in the future. Again: These are side effects and risks which also can apply to your kinship, if you chose to upload this information.
Knowledge about genes and SNPs is not static
Nearly every week there are new scientific publications that find new associations between certain traits (like diseases) with existing genetic information. Because of this you should not publish your data just because it currently looks harmless and unsuspicious. It may be true that your genotyping data is of no greater interest for your employer, your insurance company or the government right now, but this can easily change (Remember: One of the reasons to upload your data here in the first place is, to enable everyone to find such new associations).
Think of the hypothetical SNP rs666. One day after you upload your genotyping-data to this website, a new publication finds that your genotype at rs666 will give you, your siblings and your parents a fatal disease that will most certainly strike all of you. Due to this disease you (and you kin) may lose your jobs and your insurance. Chances for a association of this kind may be small, but by uploading the data you are nonetheless taking this risk!
Accounts which only serve advertising will be deleted.

Wednesday, 30 March 2016

GWAX: Case-control association mapping without cases (preprint)

This should be of interest to many!


The case-control association study is a powerful method for identifying genetic variants that influence disease risk. However, the collection of cases can be time-consuming and expensive; in some situations it is more practical to identify family members of cases. We show that replacing cases with their first-degree relatives enables genome-wide association studies by proxy (GWAX). In randomly-ascertained cohorts, this approach enables previously infeasible studies of diseases that are rare in the cohort, and can increase power to detect association by up to 30% for diseases that are more common in the cohort. As an illustration, we performed GWAX of 12 common diseases in 116,196 individuals from the UK Biobank. By combining these results with published GWAS summary statistics in a meta-analysis, we replicated established risk loci and identified 17 newly associated risk loci: four in Alzheimer's disease, eight in coronary artery disease, and five in type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping using family history of disease as a phenotype to be mapped. We anticipate that this approach will prove useful in future genetic studies of complex traits in large population cohorts.

Monday, 14 March 2016

Elephants are resistant to cancer

Whole-genome sequencing of 644 elephant tissue samples using the HiSeq 2500 System identified multiple copies of TP53. Compared to human cells, elephant cells demonstrated increased apoptotic response following DNA damage, which could account for the low incidence of cancer (4.81%) in elephant populations.

Related Links
What elephants can teach scientists about fighting cancer in humans

How elephants avoid cancer

Potential Mechanisms for Cancer Resistance in Elephants and Comparative Cellular Response to DNA Damage in Humans. Journal of the American Medical Association, DOI: 10.1001/jama.2015.13134

Friday, 11 March 2016

Ambry to share aggregated anonymous data from 10,000+ human exomes.

Exciting times.
Firms are starting to be more open with the data they have collected, for the benefit of mankind. But do watch out for the fine print: there is a disclaimer that although the data is 'free to download and use, the company retains copyright.'

By Andrew Pollack
Posted March 08, 2016

Original post can be found here.
In an unusual move, a leading genetic testing company is making genetic information from the people it has tested publicly available, a move the company says could make a large trove of data available to researchers looking for genes linked to various diseases.
The company, Ambry Genetics, is expected to announce on Tuesday that it will put information from 10,000 of its customers into a database called AmbryShare.
“We’re going to discover a lot of new diagnostic targets and a lot of new drug targets,” Aaron Elliott, interim chief scientific officer at Ambry, which is based in Southern California, said in an interview. “With our volume, we can pull out a significant number of genes just by the sheer number we are looking at.”
The 10,000 people all have or have had breast or ovarian cancer and were tested by Ambry to see if they have genetic variants that increase the risk of those diseases. Ambry returned to the samples from those customers and, at its own expense, sequenced their exomes — the roughly 1.5 percent of a person’s genome that contains the recipes for the proteins produced by the body.
Since proteins perform most of the functions in the body, sequencing just that part of the genome provides considerable information, and is less expensive than sequencing the entire genome.
AmbryShare will not contain the actual exome of each person, because that would pose a risk to patient privacy. Rather it will contain aggregated data on the genetic variants.
For example, a researcher could look up how frequently a particular mutation occurs among the 10,000 people. Ones that occur frequently in these 10,000 patients, but not among healthy people, could raise the risk of developing those cancers.
Specialists welcomed Ambry’s move, but some said it was unclear how useful the information will be. The Exome Aggregation Consortium, an academic collaboration based at the Broad Institute of M.I.T. and Harvard, already has a similar publicly available database containing information from more than 60,000 exomes.
“It is not clear to me that 10,000 exomes changes the game much,” said David B. Goldstein, professor of genetics at Columbia University.
Ambry said its data would be from people with the diseases it tests for, like epilepsy and intellectual development problems, while the Broad database covers a more general population. Ambry said it hoped to add data from as many as 200,000 customers a year to the database.
The Ambry customers whose data is being used were not told specifically about this project. But in ordering tests they consent to having their samples used for research.
Various labs, including Ambry, have been pooling information on which mutations in certain known breast cancer risk genes are harmful or not. AmbryShare is different — aimed more at novel discoveries of genes linked to diseases.
Data can be valuable to drug companies. The consumer genetics company 23andMe sells access to data from its testing to drug makers and uses that data to develop drugs itself.
Charles Dunlop, founder and chief executive of Ambry, said he was approached by drug companies, but decided to make the company’s data freely available to expedite research.
“I’ve got Stage 4 cancer myself,” he said, referring to advanced prostate cancer that is in remission. “I don’t want to wait an extra day.”
He said Ambry had spent $20 million on the project. Ambry is privately held and majority-owned by Mr. Dunlop and his family, insulating it somewhat from shareholder pressure.

Correction: March 10, 2016 
An earlier version of this article described incorrectly the step that Ambry is taking with its customers’ data. It is making the data publicly available; it is not putting the data in the public domain. While the data is free to download and use, the company retains copyright.

Wednesday, 2 March 2016

Baylor releases Exome and WGS data of 7 cancer patients with Open Access

An open access pilot freely sharing cancer genomic data from participants in Texas
  • Scientific Data 3, Article number: 160010 (2016)
  • doi:10.1038/sdata.2016.10
  • In a pilot Open Access (OA) project from the CPRIT-funded Texas Cancer Research Biobank, many Texas cancer patients were willing to openly share genomic data from tumor and normal matched pair specimens. For the first time, genetic data from 7 human cancer cases with matched normal are freely available without requirement for data use agreements nor any major restriction except that end users cannot attempt to re-identify the participants (

There is whole-exome sequencing data, plus whole-genome sequencing data for 2 samples where the sample quality was good enough for WGS.
A copy of the open-access TCRB data, conditions of use, and the HGSC’s Mercury informatics pipeline is available now for DNAnexus Platform users.

The full paper is here
A copy of the data is also available for DNAnexus Platform users here
