Saturday, 30 April 2011

Evaluation of next-generation sequencing software in mapping and assembly.

J Hum Genet. 2011 Apr 28;

Authors: Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ

Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages when compared with traditional sequencing methods in terms of higher sequencing speed, lower per run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, thereby restricting the applications of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies meet and particularly illustrated various bioinformatics attempts on mapping and assembly for problem solving. We then compared the performance of several programs in these two fields, and further provided advice on selecting suitable tools for specific biological applications.
Journal of Human Genetics advance online publication, 28 April 2011; doi:10.1038/jhg.2011.43.

PMID: 21525877 [PubMed - as supplied by publisher]


HowTo: Take Untethered Screenshots in Froyo Android

Finally found out how I was activating this accidentally!

Monday, 25 April 2011

Webinar Introduction to Ion Torrent Informatics

Webinar by Life Technologies
Introduction to Ion Torrent Informatics
4 May 2011
Register via this web link

Event Title: Introduction to Ion Torrent Informatics
Event Description: The Ion Torrent semiconductor sequencing platform includes a preconfigured Torrent Server that processes data from the Ion PGM™ Sequencer. With each semiconductor sequencing run, data analysis occurs on the Torrent Server. Scientists can interact with Torrent Server through the remotely accessible Torrent Browser web interface. Both processing status and run performance are easily viewable through these web pages. From Torrent Browser, detailed analysis reports can be viewed or sequencing data can be downloaded to your local computer for downstream analysis. The Torrent Suite Software formats base call and alignment data using industry standard data formats giving users the flexibility to use a wide variety of analysis tools. For scientists who are looking for analysis solutions, several downstream software packages are available and will be briefly demonstrated to show how DNA variations can be identified.

Ion Torrent, episode #1, scalability, simplicity, and speed

OMG, Ion Torrent (or should I say Life Technologies?) does a hilarious* parody of Apple advertising.

Watch it quick! I suspect it might be pulled down soon, judging by the negative reviews in the blogosphere.

*Hilarious for the wrong reasons

Lists of URLs are so 1990s - 4 days ago on What You're Doing Is Rather Desperate

This is so true. I have given up on bookmarking anything for archival purposes and on keeping a browser bookmark db on my thumbdrive.

Next generation sequencing reveals genome downsizing in allotetraploid Nicotiana tabacum, predominantly through the elimination of paternally derived repetitive DNAs.

Next generation sequencing reveals genome downsizing in allotetraploid Nicotiana tabacum, predominantly through the elimination of paternally derived repetitive DNAs.
Renny-Byfield S, Chester M, Kovarík A, Le Comber SC, Grandbastien MA, Deloger M, Nichols R, Macas J, Novák P, Chase MW, Leitch AR.
Mol Biol Evol. 2011 Apr 21. [Epub ahead of print]
PMID: 21512105 [PubMed - as supplied by publisher]
Thursday, 21 April 2011

Ion Torrent Inks Software Distribution Agreement with SoftGenetics

Ion Torrent will offer SoftGenetics' NextGene software as part of a cadre of next-generation sequence data analysis products ... read more


Amazon EC2 Goes Down, Taking With It Reddit, Foursquare And Quora

Cloud computing is all very well until someone trips over a wire and the whole thing goes dark. Reddit, Foursquare and Quora ... read more


Obama Wants "to Start Making Science Cool" [Blockquote]

:-)  I am already sold...

Not only did Barack Obama tell people at his Facebook town hall lecture that he wants science to be cool, he also said he wants ... read more


Gut-Bacteria Mapping Finds Three Global Varieties | Wired Science |

When pop sci does a review of your paper, you know you have captured the interest of the public.

Tuesday, 19 April 2011

New Study Reveals 1 Million Human Genome Sequence Errors Across Two NGS Platforms

Bio-IT World | "What does it mean to have a 'healthy' genome?" That was the question that University of Utah geneticist Mark ... read more


Thursday, 14 April 2011

Tumour evolution inferred by single-cell sequencing

Tumour evolution inferred by single-cell sequencing
Nature 472, 7341 (2011). doi:10.1038/nature09807
Authors: Nicholas Navin, Jude Kendall, Jennifer Troge, Peter Andrews, Linda Rodgers, Jeanne McIndoo, Kerry Cook, Asya Stepansky, Dan Levy, Diane Esposito, Lakshmi Muthuswamy, Alex Krasnitz, W. Richard McCombie, James Hicks & Michael Wigler
Genomic analysis provides insights into the role of copy number variation in disease, but most methods are not designed to resolve mixed populations of cells. In tumours, where genetic heterogeneity is common, very important information may be lost that would be useful for reconstructing evolutionary history. Here we show that with flow-sorted nuclei, whole genome amplification and next generation sequencing we can accurately quantify genomic copy number within an individual nucleus. We apply single-nucleus sequencing to investigate tumour population structure and evolution in two human breast cancer cases. Analysis of 100 single cells from a polygenomic tumour revealed three distinct clonal subpopulations that probably represent sequential clonal expansions. Additional analysis of 100 single cells from a monogenomic primary tumour and its liver metastasis indicated that a single clonal expansion formed the primary tumour and seeded the metastasis. In both primary tumours, we also identified an unexpectedly abundant subpopulation of genetically diverse ‘pseudodiploid’ cells that do not travel to the metastatic site. In contrast to gradual models of tumour progression, our data indicate that tumours grow by punctuated clonal expansions with few persistent intermediates.

What do I do with a PGM?

I am curious whether 1 Gbp is enough for broad views of metagenomics, cursory glances at WGS for eukaryotes, and targeted sequence capture. Here are views by others on how they would use the PGM.

IonTorrent: Benchtop Sequencing, Streamlined

from MassGenomics

PGM Applications

It’s a good thing to see the PGM constantly evolving - first the throughput was doubled, and now sample prep time cut substantially. At the price points we’re talking about, this might easily become standard equipment for small labs, academic departments, even single investigators. The current throughput of 1 Gbp isn’t enough for whole-genome or whole-exome sequencing, but it opens the door to a number of targeted applications. In Genome Technology’s Cancer Issue this month, for example, I read about a group that’s using the PGM for a clinical test comprising 100 common mutations in human cancers.
Essentially, the niche for PGM, MiSeq, and GS Junior is everything that’s not quite enough for a full-on sequencing run. A few examples come to mind:

  1. Microbial sequencing. For bite-sized genomes, 1 Gbp should be more than enough. Imagine walking into a clinic to have your strain of Streptococcus or some other infection sequenced the same day.
  2. Family linkage studies. With a few family members and a reasonably-sized linkage peak, you could sequence all gene-coding exons across a region of interest, either by PCR or custom capture.
  3. Orthogonal validation. Whole-genome and whole-exome studies might identify hundreds of putative mutations. Emphasis on putative. No matter how good your algorithms and filters are, there will be some false positives. Here’s an opportunity for a small, fast validation instrument. Preferably, you choose a different sequencing technology for validation (e.g. PGM for Illumina, MiSeq for SOLiD).
For more, see Keith Robison’s post at Omics! Omics! or Matthew Herper’s blog on
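
As a rough back-of-envelope check on the 1 Gbp figure discussed above, average depth is just total bases sequenced divided by the size of the target. The sketch below is my own illustration, not from MassGenomics; the target sizes are approximate round numbers chosen as assumptions, and it ignores on-target rate, duplicates and uneven coverage.

```python
def mean_coverage(throughput_bp: float, target_bp: float) -> float:
    """Expected average depth: total sequenced bases / target size."""
    if target_bp <= 0:
        raise ValueError("target size must be positive")
    return throughput_bp / target_bp


RUN_BP = 1e9  # 1 Gbp per run, the throughput figure quoted in the post

# Approximate target sizes, assumed here for illustration only
TARGETS_BP = {
    "bacterial genome (~2 Mbp)": 2e6,
    "human exome (~50 Mbp)": 5e7,
    "human genome (~3 Gbp)": 3e9,
}

for name, size_bp in TARGETS_BP.items():
    print(f"{name}: ~{mean_coverage(RUN_BP, size_bp):.1f}x average depth")
```

Under these assumptions a small microbial genome gets hundreds-fold depth from a single run, while a whole human genome gets well under 1x, which matches the argument that the niche is targeted work.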

Want to use samtools for polyploid organisms?

Well, the short answer is you can't.
Here's what Heng Li has to say:
Samtools is designed for diploid genomes. I would recommend to treat a haploid genome as diploid and filtering heterozygotes afterwards. For higher ploidy (>2), samtools does not work well. For pooled resequencing, specialized SNP callers (such as syzygy and a few others) are better.
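
For the haploid case, the "call as diploid, then filter heterozygotes" advice could look something like the sketch below. This is my own minimal illustration, not a samtools feature: it assumes the calls have already been written out as VCF text with a GT field and a single sample column, and the function names are made up for this example.

```python
def is_heterozygous(gt: str) -> bool:
    """True if a VCF GT string such as '0/1' or '1|2' has differing alleles."""
    alleles = gt.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]  # ignore missing alleles
    return len(set(called)) > 1


def drop_heterozygotes(vcf_lines):
    """Yield header lines and records whose first sample is not heterozygous."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            yield line  # no genotype columns, pass through unchanged
            continue
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GT" not in format_keys:
            yield line
            continue
        gt = sample_values[format_keys.index("GT")]
        if not is_heterozygous(gt):
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.1\n",
        "chr1\t10\t.\tA\tG\t50\t.\t.\tGT\t1/1\n",  # homozygous, kept
        "chr1\t20\t.\tC\tT\t50\t.\t.\tGT\t0/1\n",  # heterozygous, dropped
    ]
    for kept in drop_heterozygotes(example):
        print(kept, end="")
```

In a haploid genome a heterozygous call is more likely a mapping or sequencing artefact than a real variant, which is why dropping those records is a reasonable first filter.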

Other recommendations are

Wednesday, 13 April 2011

ZORRO is a hybrid sequencing technology assembler: tested with Solexa and 454

Typos in the header aside... you have to love the name!

Waiting for the name to become a verb... "I zorroed the NGS reads the other day and I had a fantastic assembly!" lol

Here goes:


ZORRO is a hybrid sequencing technology assembler. It takes 2 sets of pre-assembled contigs and merges them into a more contiguous and consistent assembly. We have already tested Zorro with Illumina Solexa and 454 data from several organisms with genomes ranging from 3Mb to 100Mb. The main characteristic of Zorro is the treatment before and after assembly to avoid errors.
The ZORRO project is maintained by Gustavo Lacerda, Ramon Vidal and Marcelo Carazzole and was first used in this Yeast assembly: Genome structure of a Saccharomyces cerevisiae strain widely used in bioethanol production
ZORRO needs to be better documented and has not undergone enough testing. If you want to discuss the pipeline you can join the mailing list: zorro-google group
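
If you do try merging two contig sets, one quick way to check whether the result really is "more contiguous" is to compare N50 before and after. The sketch below is my own and has nothing to do with ZORRO's code; the contig lengths are made-up numbers for illustration.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Hypothetical contig lengths (bp) before and after a merge step
before = [100, 80, 50, 30, 20]
after = [180, 50, 30, 20]  # the two largest contigs joined into one

print("N50 before:", n50(before))
print("N50 after: ", n50(after))
```

N50 only measures contiguity, not correctness, so a higher number after merging still needs to be checked for misjoins.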

Zorro: The Complete Series

p.s. the typo is here
"ZORROthe masked assember "

Monday, 11 April 2011

Android Apps for Scientists!

Just got myself an Android phone and I'm loving it. It hits the sweet spot for price and functionality, though I have crashed it a few times already.
Check out this site for stuff like:

AgileSciTools is a calculator set for biologists. Includes functions to determine molarity dilutions, cell dilutions, MOI calculations, and primer resuspension volumes. You can also count cells using the Laboratory Cell Counter.

AgileMedSearch: Searches PubMed databases. Pretty much bare-bones. Can search for articles, read abstracts, and email details. PubMedMobile seems to have some more functionality, with a link to the article, if available, and more search parameters.
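
The dilution and MOI calculations that a calculator app like this covers boil down to simple arithmetic. Here is a minimal sketch of the two standard formulas (C1 x V1 = C2 x V2 for dilutions, and MOI = infectious particles / cells). This is not how AgileSciTools implements them; the function names and example numbers are my own assumptions.

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2 (same units throughout)."""
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("cannot dilute to a higher concentration")
    return final_conc * final_volume / stock_conc


def virus_volume_ml(moi: float, n_cells: float, titer_per_ml: float) -> float:
    """Volume of virus stock (ml) needed to reach a given MOI."""
    if titer_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi * n_cells / titer_per_ml


# Example: make 50 ml of a 1 M solution from a 10 M stock
print(stock_volume(stock_conc=10.0, final_conc=1.0, final_volume=50.0), "ml of stock")

# Example: MOI of 0.5 on 1e6 cells with a stock titer of 1e8 particles/ml
print(virus_volume_ml(moi=0.5, n_cells=1e6, titer_per_ml=1e8), "ml of virus stock")
```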

1 billion core-hours for 10 qualified researchers!

Official Google Research Blog: 1 billion core-hours of computational capacity for researchers: "10 qualified researchers with at least 100 million core-hours each, for a total of 1 billion core-hours."

We’re pleased to announce a new academic research grant program: Google Exacycle for Visiting Faculty. Through this program, we’ll award up to 10 qualified researchers with at least 100 million core-hours each, for a total of 1 billion core-hours. The program is focused on large-scale, CPU-bound batch computations in research areas such as biomedicine, energy, finance, entertainment, and agriculture, amongst others. For example, projects developing large-scale genomic search and alignment, massively scaled Monte Carlo simulations, and sky survey image analysis could be an ideal fit.

Exacycle for Visiting Faculty expands upon our current efforts through University Relations to stimulate advances in science and engineering research, and awardees will participate through the Visiting Faculty Program. We invite full-time faculty members from universities worldwide to apply. All grantees, including those outside of the U.S., will work on-site at specific Google offices in the U.S. or abroad. The exact Google office location will be determined at the time of project selection.

We are excited to accept proposals starting today. The application deadline is 11:59 p.m. PST May 31, 2011. Applicants are encouraged to send in their proposals early as awards will be granted starting in June.

More information and details on how to apply for a Google Exacycle for Visiting Faculty grant can be found on the Google Exacycle for Visiting Faculty website.

Well, not a whole bunch of help for NGS, which is predominantly disk-bound, but still fun to know. Or perhaps someone knows how to make better use of the CPU cycles for NGS work?

Datanami, Woe be me