Thursday, 31 January 2013

A hybrid likelihood model for sequence-based disease association studies.

PLoS Genet. 2013 Jan;9(1):e1003224. doi: 10.1371/journal.pgen.1003224. Epub 2013 Jan 24.


Chen YC, Carter H, Parla J, Kramer M, Goes FS, Pirooznia M, Zandi PP, McCombie WR, Potash JB, Karchin R.

Source

Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America.

Abstract

In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitation of classical single-marker association analysis for rare variants has been a challenge in such studies. A new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is the burden-style collapsing methods to combine rare variant data within individuals across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets with nominal p-values < 0.05, of which one MAPK signaling pathway (KEGG) reaches trend-level significance after correcting for multiple testing.
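To make the "burden-style collapsing" idea from the abstract concrete, here is a minimal sketch of a plain collapsing test. It is not the paper's hybrid likelihood model. It assumes genotypes are coded as minor-allele counts (0, 1 or 2) in one row per individual, that the caller has already chosen which sites count as rare, and that a two-sided Fisher exact test on carrier status is an acceptable stand-in for the burden component. All function names are made up for illustration.

```python
from math import comb


def collapse_carriers(genotypes, rare_sites):
    """Collapse each individual to 1 if they carry any rare allele at the given sites, else 0."""
    return [int(any(row[j] > 0 for j in rare_sites)) for row in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def burden_test(case_genotypes, control_genotypes, rare_sites):
    """Compare rare-variant carrier counts in cases versus controls (hypothetical helper)."""
    case_flags = collapse_carriers(case_genotypes, rare_sites)
    control_flags = collapse_carriers(control_genotypes, rare_sites)
    a = sum(case_flags)
    b = len(case_flags) - a
    c = sum(control_flags)
    d = len(control_flags) - c
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)
```

Calling `burden_test(cases, controls, rare_sites)` returns the 2x2 carrier table and its p-value. Collapsing to carrier status ignores where in the gene the variants fall and whether they are protective, which is the gap the abstract says the position-distribution component is meant to cover.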

PMID: 23358228; [PubMed - in process]

http://www.ncbi.nlm.nih.gov/pubmed/23358228

Fast and accurate read mapping with approximate seeds and multiple backtracking.

Nucleic Acids Res. 2013 Jan 28. [Epub ahead of print]


Siragusa E, Weese D, Reinert K.

Source

Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195 Berlin, Germany and Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany.

Abstract

We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined together, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.
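The filtration idea is easy to play with in a few lines. The sketch below is not Masai's implementation: it assumes mismatches only (Hamming distance, no indels), and it scans the reference by brute force where Masai searches an index with backtracking. It relies on the pigeonhole argument that a read matching with at most `max_errors` mismatches, once cut into `num_seeds` pieces, must contain at least one piece with at most `max_errors // num_seeds` mismatches. All names are hypothetical.

```python
def split_into_seeds(read, num_seeds):
    """Cut the read into num_seeds contiguous pieces; returns (offset, seed) pairs."""
    base, extra = divmod(len(read), num_seeds)
    seeds, start = [], 0
    for i in range(num_seeds):
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_positions(reference, read, max_errors, num_seeds):
    """Filtration step: read start positions supported by at least one seed hit."""
    seed_errors = max_errors // num_seeds  # 0 gives exact seeds, > 0 approximate seeds
    candidates = set()
    for offset, seed in split_into_seeds(read, num_seeds):
        # Brute-force scan; a real mapper would query an index here.
        for pos in range(len(reference) - len(seed) + 1):
            if hamming(reference[pos:pos + len(seed)], seed) <= seed_errors:
                start = pos - offset
                if 0 <= start <= len(reference) - len(read):
                    candidates.add(start)
    return candidates


def map_read(reference, read, max_errors, num_seeds):
    """Verification step: keep candidates whose full-length distance is within max_errors."""
    hits = []
    for start in sorted(candidate_positions(reference, read, max_errors, num_seeds)):
        distance = hamming(reference[start:start + len(read)], read)
        if distance <= max_errors:
            hits.append((start, distance))
    return hits
```

With `max_errors=2`, choosing `num_seeds=3` gives the classic exact-seed filter (three short seeds, zero mismatches each), while `num_seeds=2` gives two longer seeds that may each carry one mismatch. Both settings find every position within the error budget; they differ in how many candidates reach verification.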

PMID: 23358824; [PubMed - as supplied by publisher]

http://www.ncbi.nlm.nih.gov/pubmed/23358824

Friday, 25 January 2013

Fwd: [Biopython] Debian Med Sprint in Kiel, Germany 23rd/24th of February

Plug for Debian Med Sprint
---------- Forwarded message ----------
From: Steffen Möller <steffen_moeller gmx.de>
Date: Jan 25, 2013 11:25 PM
Subject: [Biopython] Debian Med Sprint in Kiel, Germany 23rd/24th of February
To: "Biopython Mailing List" <biopython lists.open-bio.org>
Cc:

> Dear all,
>
> We have our annual Debian/Ubuntu/Bio-Linux sprint on Bioinformatics again next month. Every year there are a few individuals more peripheral to the distribution attending, which usually helps us to develop our community further in some way. Anybody from BioPython interested to join in, please read through
> http://wiki.debian.org/DebianMed/Meeting/Kiel2013
> and just email me or add him/herself. There is not anything particular that I expect from the BioPython community, except for more and better ideas on how to develop research on and with tools in computational biology further.
> Registration is free. Accommodation and travel are not.
>
> Cheers,
>
> Steffen

Saturday, 19 January 2013

Watch out for cars and lung cancer

http://mendeliandisorder.blogspot.sg/2012/11/why-i-dont-want-to-know-my-genome.html

http://blogs.plos.org/dnascience/2012/11/01/why-i-dont-want-to-know-my-genome-sequence/

Interesting reads on a rainy Saturday.

I think that, at this point in time, believing whole-genome sequencing or even exome sequencing is the way forward in medical health is akin to buying an extended warranty.
You don't need it now, but you are banking on cost savings when you actually do (sequencing one whole genome versus many small individual regions).

No doubt, once drug prescriptions depend on your genetic makeup, your DNA sequence will be invaluable or even compulsory. (Before I read this article, I didn't even know that being slow to metabolize antipsychotics and beta blockers can be deadly.) Right now, genomics offers a glimpse into likely causal associations that can be hard for the man on the street to act on, beyond the advice of "don't smoke, exercise, eat a healthy diet, and don't worry about DNA sequences".

I would also add "watch out for cars" since 1.3 million people die yearly from auto accidents versus 1.4 million deaths attributed to lung cancer.

See
http://www.who.int/mediacentre/factsheets/fs358/en/index.html
http://www.cancerresearchuk.org/cancer-info/cancerstats/world/the-global-picture/

Thursday, 17 January 2013

Article: DSK: k-mer counting with very low memory usage

DSK: k-mer counting with very low memory usage
http://bioinformatics.oxfordjournals.org/content/early/2013/01/16/bioinformatics.btt020.short?buffer_share=64cbf&rss=1

We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which only requires a fixed, user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all k-mers present in the reads is partitioned and partitions are saved to disk. Then, each partition is separately loaded in memory in a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered.

DSK is the first approach that is able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 hours. DSK can replace a popular k-mer counting software (Jellyfish) on small-memory servers.

Availability: http://minia.genouest.org/dsk
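The partition-then-count scheme in the abstract can be sketched roughly as follows. This is a simplified toy, not DSK itself: it makes a single pass over the reads, does not treat a k-mer and its reverse complement as the same, and picks partitions with a CRC32 hash, which is an assumption made here for illustration. The function name and parameters are hypothetical.

```python
import os
import tempfile
import zlib
from collections import Counter


def iter_kmer_counts(reads, k, num_partitions, min_count=1):
    """Yield (kmer, count) pairs, holding only one partition in memory at a time."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"part_{i}.txt") for i in range(num_partitions)]
        handles = [open(path, "w") for path in paths]
        try:
            # Pass 1: stream every k-mer to the partition file chosen by its hash,
            # so all copies of the same k-mer land in the same file.
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    part = zlib.crc32(kmer.encode("ascii")) % num_partitions
                    handles[part].write(kmer + "\n")
        finally:
            for handle in handles:
                handle.close()

        # Pass 2: load one partition at a time into a temporary hash table.
        for path in paths:
            with open(path) as handle:
                table = Counter(line.rstrip("\n") for line in handle)
            for kmer, count in table.items():
                if count >= min_count:  # optional low-abundance filter
                    yield kmer, count
```

For example, `dict(iter_kmer_counts(["ACGTACGT", "CGTAC"], k=3, num_partitions=4))` collects all counts into a dictionary, which is fine for a toy input but defeats the memory saving on real data. More partitions means smaller hash tables and more files on disk, which is the memory/disk trade-off the abstract describes.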


Article: Fecal Microbiota Transplantation — An Old Therapy Comes of Age — NEJM

Fascinating!
Fecal Microbiota Transplantation — An Old Therapy Comes of Age — NEJM
http://www.nejm.org/doi/full/10.1056/NEJMe1214816?query=TOC&#article


Wednesday, 9 January 2013

Article: Genomic basis for coral resilience to climate change

Genomic basis for coral resilience to climate change
http://www.pnas.org/content/early/2013/01/02/1210224110.short?buffer_share=f163c&rss=1

Different corals differ substantially in physiological resilience to environmental stress, but the molecular mechanisms behind enhanced coral resilience remain unclear. Here, we compare transcriptome-wide gene expression (via RNA-Seq using Illumina sequencing) among conspecific thermally sensitive and thermally resilient corals to identify the molecular pathways contributing to coral resilience.

Article: RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network

RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network
http://genomebiology.com/2012/13/12/R125/abstract


Article: Structure-based whole genome realignment reveals many novel non-coding RNAs

Structure-based whole genome realignment reveals many novel non-coding RNAs
http://genome.cshlp.org/content/early/2013/01/07/gr.137091.111.abstract


Saturday, 5 January 2013

Bowtie 2 2.0.5 released

Subject: [Bowtie-bio-announce] Bowtie 2 2.0.5 released

Bowtie 2 version 2.0.5 - January 4, 2013
  * Fixed an issue that would cause excessive memory allocation when aligning
    to very repetitive genomes.
  * Fixed an issue that would cause a pseudo-randomness-related assert to be
    thrown in debug mode under rare circumstances.
  * When bowtie2-build fails, it will now delete index files created so far so
    that invalid index files don't linger.
  * Tokenizer no longer has limit of 10,000 tokens, which was a problem for
    users trying to index a very large number of FASTA files.
  * Updated manual's discussion of the -I and -X options to mention that
    setting them farther apart makes Bowtie 2 slower.
  * Renamed COPYING to LICENSE and created a README to be GitHub-friendly.

Best,
Ben

--

Ben Langmead
Department of Computer Science
Johns Hopkins University
3400 North Charles St
Baltimore, MD 21218-2682

Friday, 4 January 2013

new version of IMPUTE2 (v2.3.0)

---------- Forwarded message ----------
From: "Jonathan Marchini" <marchini@ 2013 4:51 PM
Subject: [OXSTATGEN] new version of IMPUTE2 (v2.3.0)
To: <OXSTATGEN>
Cc:

> Hello,
>
> There is a new version of IMPUTE2 (v2.3.0) on the website:
>
> https://mathgen.stats.ox.ac.uk/impute/impute_v2.html
>
> There are several new features in this version:
>
> - IMPUTE2 now has a streamlined way to combine haplotypes from two reference panels and impute from the merged panel. For example, we have seen good results when merging 1000 Genomes haplotypes with sequenced haplotypes from other cohorts to form a combined reference panel. This feature provides accurate imputation of variants that are specific to each panel while maintaining accuracy at variants that are shared across panels. You can read the details of our approach at https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#merging_panels.
>
> - To make the panel-merging as flexible as possible, we now allow the -k_hap parameter to take separate values for each of two reference panels. In essence, you can specify the number of "useful" haplotypes in each reference panel, then IMPUTE2 will take this information into account when merging the reference panels and imputing genotypes in your study. This feature is described here: https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#-k_hap.
>
> - We have added important documentation, such as a detailed description of how IMPUTE2 creates the concordance tables that are printed at the end of most runs: https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#concordance_tables.
>
> - We fixed some bugs that were present in v2.2.2:
> -- The program used to throw an error when the -use_prephased_g and -chrX flags were combined; now these options are compatible.
> -- You can get phased imputation output by combining the -use_prephased_g and -phase flags, but previously the phasing of hets in the input file (-known_haps_g) was scrambled in the output; this is now fixed.
> -- Another problem with combining the -use_prephased_g and -phase flags is that Type 3 SNPs (those present in the -known_haps_g file but not the reference panel) were omitted from the output haplotypes; now these SNPs are included in the output by default.
> -- Annotations in the reference legend file (columns 5+) used to be restricted to numeric values, but now the program can handle string values as well. This extends the flexibility of the -filt_rules_l mechanism for run-time filtering of reference variants.
>
> - IMPUTE2 can now be made even more accurate by pre-phasing your study genotypes with SHAPEIT2 [ http://www.shapeit.fr/ ], which combines ideas from SHAPEIT and IMPUTE2 to improve the accuracy and efficiency of haplotype estimation. You can read the SHAPEIT2 article here: http://www.nature.com/nmeth/journal/v10/n1/full/nmeth.2307.html.
>
> - We have redesigned the website to make it easier to navigate.
>
> We are still actively improving some of the new features, and we anticipate making another software release in the next few months. In the meantime, we would be happy to hear your feedback about the new software and website.
>
> Happy imputing and Happy New Year!
>
> Bryan and Jonathan
>
> --
> o__ Jonathan Marchini
> c/ /'_ Department of Statistics, University of Oxford
> (+) \(+) 1 South Parks Road, Oxford, OX1 3TG
> http://www.stats.ox.ac.uk/~marchini/

Tuesday, 1 January 2013

PLOS Computational Biology: Ten Simple Rules for the Open Development of Scientific Software

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002802

If you have the choice, embracing an open approach to development has tremendous benefits. It allows you to build on the work of other scientists, and enables others to build on your own efforts. To make the development of open scientific software more rewarding and the experience of using software more positive, the following ten rules are intended to serve as a guide for any computational scientist.

Kevin's GATTACA World
