Kevin's GATTACA World: whole genome


Tuesday, 8 May 2012

I love IGV!

Well, sorry for the proclamation.

But I think it's really nice to have a cross-platform application with such a relatively small footprint.

Anyway, I needed to view a few regions of a merged BAM file from the 1000 Genomes Project that's aligned to hg18.
You would think I'd have copies of hg18 lying around on every server so that I could just use 'samtools tview in.bam ref.fasta'.

But unfortunately I don't, and I wasn't really looking forward to downloading hg18 just for a verification exercise.

Luckily I remembered that IGV has prebuilt human genome references that are available on the fly.
With a few clicks, I was happily viewing the regions of interest. (I downloaded the compiled binary instead of using the 'launch from browser' option, but that would have been cool too.)

Among the list of available human reference genomes there was actually one specifically for 1KG. How cool is that?

Go on, take the application for a run if you haven't!

http://www.broadinstitute.org/software/igv/download

Wednesday, 10 August 2011

Pauline Ng expects a genome analysis to cost $500.

Pauline Ng is planning open source, open access analytics for the genomes to come.

By Allison Proffitt

August 2, 2011 | SINGAPORE—Pauline Ng’s office is the Genome building of the Biopolis science park in Singapore, a fitting home for one of the authors of the first published personal genome, that of J. Craig Venter, published in 2007 while Ng was a senior scientist at the J. Craig Venter Institute.

Now Ng leads an expanding group of three bioinformaticists (she’s hiring!) at the Genome Institute of Singapore (GIS). Before her stint at the Venter Institute, Ng worked for Illumina as well as the Fred Hutchinson Cancer Center in Seattle, where she wrote the powerful SIFT algorithm (http://sift-dna.org), a widely used tool to predict the effect of a given amino acid substitution on protein function.

But sequencing and analysis—today at least—cost the same. “The problem is that right now, companies like Knome are actually charging the same amount for bioinformatics as they are for sequencing. If you sequence more individuals, I’d expect the bioinformatics to go down, but it’s the same price. That means the price is double! If we can make these tools online, accessible for free or at least at cost, I think I can get it to a tenth of the cost.”

Ng plans to do the computation on the Amazon Cloud and, at today’s rates, expects a genome analysis to cost $500. She hopes that these price points will enable doctors and individuals to use genomics. “If we could say, OK, outsource [the sequencing] to these companies. You’re going to get a hard disk. Mail it to Amazon and get your results in a week.”
Ng is not promising a magic cure, and doesn’t even think that this model should be the only one. She just hopes to drive prices down and open the market. “There’s never a guarantee of an answer,” she says. “Even with the software we write, there may not be a guarantee of an answer, but at least…” she pauses and begins again, emphatically. “We can definitely give you the basic annotation and provide the tools that everyone uses. And if it doesn’t work, then you go to an expensive company that really uses the same tools as the academics but with a couple of more bells and whistles. If you try our stuff first, at least you’ve invested only $500 instead of $5,000.”

Full article here

Tuesday, 12 July 2011

A 3rd party evaluation of Ion Torrent's 316 chip data

Dan Koboldt (from massgenomics) has posted what is, as far as I know, the first independent look at the data from Ion Torrent's 316 chip.
Granted, the data was handed to him in a 'shiny report with color images', but he has bravely ignored that to give an honest look at the raw data itself.

The 316 chip gives a throughput that nicely covers WGS reseq experiments for bacterial-sized genomes. "The E. coli reference genome totals about 4.69 Mbp. With 175 Mbp of data, the theoretical coverage is around 37.5-fold across the E. coli genome."
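For anyone who wants to redo that back-of-envelope number, here is a minimal sketch of the usual "total bases divided by genome length" estimate. The function name fold_coverage is just mine for illustration, and the inputs are the rounded figures from the quote, so the result lands a little above 37-fold, in the same ballpark as the quoted figure rather than matching it exactly.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Theoretical average fold coverage: sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Rounded figures taken from the quoted post.
ecoli_genome_bp = 4.69e6   # E. coli reference genome, about 4.69 Mbp
run_output_bp = 175e6      # about 175 Mbp of data from the 316 chip

ecoli_coverage = fold_coverage(run_output_bp, ecoli_genome_bp)
print(f"theoretical coverage: {ecoli_coverage:.1f}x")
```

This is a theoretical average only: it assumes every base is usable and maps evenly, which real data never quite does.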

For those wary of dry reviews, fear not: easily comprehensible graphs are posted within!

Read the full post here

Tuesday, 8 February 2011

Complete Genomics on the $10,000 Human genome

Read the complete article here

By Kevin Davies
February 7, 2011 | MARCO ISLAND, FL – It is a testament to the remarkable progress in next-generation sequencing and analysis that when neurobiologist Tim Yu described the complete sequencing of 40 human genomes in a successful search for gene mutations that cause autism, it barely registered a ripple from the large audience.

“We’re still the only company that’s published a 10^-5 error-rate [human] genome,” Reid says (average 1 error/100,000 bases). He asserts that Illumina’s current system consumes $5,000 in reagents, and that cost swells to $20-25,000 when the full cost of informatics and labor is included.
After claiming last year that CGI had cracked the $1,000 genome threshold for reagent costs, Reid now says that CGI’s all-in cost for a complete human genome is under $10,000. “With all of it added in, we’re below $10,000 now. We’ve got a 2-3X cost advantage [over Illumina], and a 10X quality advantage.”
CGI currently charges $9,500 per genome for a minimum order of eight genomes. “You can’t pay $20,000 [per genome] any more, even if you try. We just send the money back!”
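To put the quoted 10^-5 error rate in perspective, here is a rough sketch of how many miscalled bases that implies for a whole genome. The genome size of about 3.1 billion bases is my assumption (it is not stated in the article), and the calculation treats errors as evenly spread, which is a simplification.

```python
# Error rate quoted in the article: on average 1 error per 100,000 bases.
error_rate = 1e-5

# ASSUMPTION: haploid human genome of roughly 3.1 billion bases
# (this figure is not from the article).
genome_size = 3.1e9

expected_errors = error_rate * genome_size
print(f"expected erroneous bases per genome: {expected_errors:,.0f}")
```

So even at that error rate, you would still expect errors on the order of tens of thousands of bases per genome under these assumptions.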

Tuesday, 12 October 2010

Human Whole genome sequencing at 11x coverage

http://genomebiology.com/2010/11/9/R91

Just saw this paper, "Sequencing and analysis of an Irish human genome". AFAIK WGS is usually done at 30x coverage. In this paper, the authors “describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information.” I thought it was pretty good considering that they had 99.3% of the reference genome covered at 10.6x coverage. That leaves only something like 21 Mbases missing.

For those interested in the tech details:

Four single-end and five paired-end DNA libraries were generated and sequenced using a GAII Illumina Genome Analyzer. The read lengths of the single-end libraries were 36, 42, 45 and 100 bp and those of the paired end were 36, 40, 76, and 80 bp, with the span sizes of the paired-end libraries ranging from 300 to 550 bp (± 35 bp). In total, 32.9 gigabases of sequence were generated (Table 1). Ninety-one percent of the reads mapped to a unique position in the reference genome (build 36.1) and in total 99.3% of the bases in the reference genome were covered by at least one read, resulting in an average 10.6-fold coverage of the genome.
...
At 11-fold genome coverage, approximately 99.3% of the reference genome was covered and more than 3 million SNPs were detected, of which 13% were novel and may include specific markers of Irish ancestry.
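The "21 Mbases missing" figure can be sanity-checked from the numbers in the excerpt. This is only a rough sketch: it assumes the 10.6-fold average is simply total sequence divided by genome length (the paper may have computed it from mapped bases only), and it backs out the genome size from that instead of using an official build 36.1 length.

```python
# Figures quoted from the paper.
total_bases = 32.9e9       # 32.9 gigabases of sequence generated
mean_coverage = 10.6       # average fold coverage
fraction_covered = 0.993   # reference bases covered by at least one read

# ASSUMPTION: mean coverage = total bases / genome size, so the genome
# size can be backed out from the two quoted numbers.
implied_genome_size = total_bases / mean_coverage

# Bases of the reference not covered by any read.
uncovered_bases = (1 - fraction_covered) * implied_genome_size

print(f"implied genome size: {implied_genome_size / 1e9:.2f} Gb")
print(f"uncovered bases:     {uncovered_bases / 1e6:.1f} Mb")
```

Under those assumptions the uncovered portion comes out in the low twenties of megabases, consistent with the rough estimate in the post.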
