Showing posts with label resequencing. Show all posts

Sunday, 6 May 2012

SEQCHIP: A Powerful Method to Integrate Sequence and Genotype Data for the Detection of Rare Variant Associations.

Bioinformatics. 2012 May 3. [Epub ahead of print]

SEQCHIP: A Powerful Method to Integrate Sequence and Genotype Data for the Detection of Rare Variant Associations.

Liu DJ, Leal SM.

Source

1. Department of Biostatistics, Center of Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109.

Abstract

MOTIVATION:

Next-generation sequencing greatly increases the capacity to detect rare-variant complex-trait associations. However, it is still expensive to sequence a large number of samples and therefore often small datasets are used. Given cost constraints, a potentially more powerful two-step strategy is to sequence a subset of the sample to discover variants, and genotype the identified variants in the remaining sample. If only cases are sequenced, directly combining sequence and genotype data will lead to inflated type-I errors in rare-variant association analysis. Although several methods have been developed to correct for the bias, they are either underpowered or theoretically invalid. We proposed a new method SEQCHIP to integrate genotype and sequence data, which can be used with most existing rare-variant tests.

RESULTS:

It is demonstrated using both simulated and real datasets that the SEQCHIP method has controlled type-I errors, and is substantially more powerful than all other currently available methods.

AVAILABILITY:

SEQCHIP is implemented in an R-Package and is available at http://linkage.rockefeller.edu/suzanne/seqchip/Seqchip.htmContacts: dajiang@umich.edu (D.J.L.), sleal@bcm.edu (S.M.L.)

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID:: 22556370; [PubMed - as supplied by publisher]

Tuesday, 26 October 2010

Throwing the baby out with the bathwater:Non-Synonymous and Synonymous Coding SNPs Show Similar Likelihood and Effect Size of Human Disease Association

I was literally having a 'oh shoot' moment when i saw this news in GenomeWeb

Synonymous SNPs Shouldn't Be Discounted in Disease, Study Finds

NEW YORK (GenomeWeb News) – Synonymous SNPs that don't change the amino acid sequence encoded by a gene appear just as likely to influence human disease as non-synonymous SNPs that do, according to a paper appearing online recently in PLoS ONE by researchers from Stanford University and the Lucile Packard Children's Hospital.

from the abstract of the paper

The enrichment of disease-associated SNPs around the 80^th base in the first introns might provide an effective way to prioritize intronic SNPs for functional studies. We further found that the likelihood of disease association was positively associated with the effect size across different types of SNPs, and SNPs in the 3′untranslated regions, such as the microRNA binding sites, might be under-investigated. Our results suggest that sSNPs are just as likely to be involved in disease mechanisms, so we recommend that sSNPs discovered from GWAS should also be examined with functional studies.

Hmmmm how is this going to affect your carefully crafted pipeline now?

Tuesday, 12 October 2010

Human Whole genome sequencing at 11x coverage

http://genomebiology.com/2010/11/9/R91

Just saw this paper Sequencing and analysis of an Irish human genome. AFAIK WGS is usually done at 30x coverage. In this paper, the authors “describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information.” I thought it was pretty good considering that they had 99.3% of the reference genome covered for 10.6x coverage. That leaves only like 21 Mbases missing ..

For those interested in the tech details

Four single-end and five paired-end DNA libraries were generated and sequenced using a GAII Illumina Genome Analyzer. The read lengths of the single-end libraries were 36, 42, 45 and 100 bp and those of the paired end were 36, 40, 76, and 80 bp, with the span sizes of the paired-end libraries ranging from 300 to 550 bp (± 35 bp). In total, 32.9 gigabases of sequence were generated (Table 1). Ninety-one percent of the reads mapped to a unique position in the reference genome (build 36.1) and in total 99.3% of the bases in the reference genome were covered by at least one read, resulting in an average 10.6-fold coverage of the genome.
...
At 11-fold genome coverage, approximately 99.3% of the reference genome was covered and more than 3 million SNPs were detected, of which 13% were novel and may include specific markers of Irish ancestry.

Kevin's GATTACA World

Sunday, 6 May 2012