Kevin's GATTACA World: ABI


Tuesday, 6 December 2011

Complete Khoisan and Bantu genomes from southern Africa : Article : Nature

http://www.nature.com/nature/journal/v463/n7283/full/nature08795.html

Just attended a very good lecture by Stephan Schuster entitled "African Genomes: Charting Human Diversity".

He offered unbiased views and charts on the differences in sequencing coverage between 454, GAIIx, HiSeq and SOLiD (which I don't think I should repeat here). They point to the need to sequence on 2 different platforms to get a more accurate SNP list.

He also gave compelling reasons for sequencing a human genome to 20x coverage on 454 to help finish the human genome (there are still 457 gaps in hg19).
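Assembly gaps in a reference FASTA such as hg19 show up as runs of N. Here's a minimal Python sketch that lists those runs per sequence. The function names are my own, and a raw count of N-runs taken straight from the FASTA may not match an official gap count exactly (UCSC's gap table is the authoritative source).

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(seq, min_len=1):
    """Return half-open (start, end) intervals of runs of N/n of at least min_len bases."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]
```

Because sequence lines are joined before scanning, a gap that spans a line break is counted once rather than twice. Holding a whole chromosome as one string is memory-hungry but fine for a quick look.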

Yes, I often forget that the media and the lay person think the human genome is 'complete'. It's an often ignored 'secret' that it actually isn't. Maybe the next marketing ploy (I mean strategy) for emerging sequencing platforms will be to be THE ONE that actually finishes the human genome.

Tuesday, 9 November 2010

SOLiD™ BioScope™ Software v1.3 releasing soon

v1.3 is due for release soon! How do I know? Apart from the fact that you can already register for v1.3 video tutorials (e.g. SOLiD™ Targeted ReSeq Data Analysis featuring BioScope 1.3, 1 hour), the clue comes from new documentation being uploaded to solidsoftwaretools.com.

BioScope™ Software v1.3 adds or enhances support for the following:

Targeted Resequencing analysis (enrichment statistics and target filtering)
BFAST integration
Annotation, reporting and statistics generation
Methylation analysis
75 bp read length support
Mapping and Pairing speed improvements

It also fixes a long list of bugs, which I won't repeat here, but the important ones are:

Bug – Pairing: In BAM file, readPaired and firstOfPair/secondOfPair flags set incorrectly for reads with missing mates.
Bug – diBayes: Defunct java processes continue when bioscope exits
Bug – Mapping: When the last batch of the processing has the number of reads less than the value of the key mapping.np.per.node, the ma file contains duplicated entries.
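If you want to check your own BAM files for the pairing bug, here's a minimal Python sketch that decodes the first eight SAM FLAG bits (e.g. the second column of samtools view output), using the readPaired/firstOfPair style names from the bug report. The rules in check_orphan are my assumption of what a read from a paired library with a missing mate should look like. They are not BioScope's documented behaviour.

```python
# SAM FLAG bits, named after the Picard-style names used in the bug report.
FLAG_BITS = {
    0x1: "readPaired",
    0x2: "properPair",
    0x4: "readUnmapped",
    0x8: "mateUnmapped",
    0x10: "readReverse",
    0x20: "mateReverse",
    0x40: "firstOfPair",
    0x80: "secondOfPair",
}


def decode_flag(flag):
    """Return the set of flag names whose bits are set in an integer SAM FLAG."""
    return {name for bit, name in FLAG_BITS.items() if flag & bit}


def check_orphan(flag):
    """List problems for a read from a paired library whose mate is missing.

    Assumed convention: readPaired set, exactly one of firstOfPair/secondOfPair,
    mateUnmapped set, and properPair not set.
    """
    names = decode_flag(flag)
    problems = []
    if "readPaired" not in names:
        problems.append("readPaired not set")
    if ("firstOfPair" in names) == ("secondOfPair" in names):
        problems.append("need exactly one of firstOfPair/secondOfPair")
    if "mateUnmapped" not in names:
        problems.append("mateUnmapped not set")
    if "properPair" in names:
        problems.append("properPair set without a mate")
    return problems
```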

Have fun playing with the new version when it's up!
Here are some important notes:

It is advised that users run BioScope under their own user account. Then, if Control-C is used to interrupt bioscope.sh (which spawns many other processes), the user can run the following OS commands to find the PIDs of the left-over processes and clean them up:
ps -efl | grep bioscope.sh | grep username
ps -efl | grep java_app.sh | grep username
ps -efl | grep map | grep username
ps -efl | grep java | grep username
ps -efl | grep mapreads | grep username
ps -efl | grep pairing | grep username
kill -9 PID

Be careful with the "grep java" command, though: it matches every process with "java" in its command line, so killing every PID it lists takes down all of your Java processes, not just BioScope's.

My suggestion to the team is to keep the PIDs of launched processes in a database table instead of depending on non-unique process names. Ensembl's pipeline uses Perl to track jobs with less overhead, and it is much easier to clean up.
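Here's a minimal sketch of that suggestion in Python, using SQLite from the standard library. The table layout and the open_registry/launch/cleanup names are hypothetical, not anything BioScope or Ensembl actually ships. The point is that cleanup signals exactly the recorded PIDs instead of everything a grep happens to match.

```python
import os
import signal
import sqlite3
import subprocess
import sys


def open_registry(path=":memory:"):
    """Open (or create) a table recording every process the pipeline launches."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS launched (pid INTEGER PRIMARY KEY, job TEXT, user TEXT)"
    )
    return conn


def launch(conn, job, argv):
    """Start a child process and record its PID under a job name."""
    proc = subprocess.Popen(argv)
    conn.execute(
        "INSERT INTO launched (pid, job, user) VALUES (?, ?, ?)",
        (proc.pid, job, os.environ.get("USER", "unknown")),
    )
    conn.commit()
    return proc


def cleanup(conn, job):
    """Send SIGTERM to exactly the PIDs recorded for this job, then forget them."""
    killed = []
    rows = conn.execute("SELECT pid FROM launched WHERE job = ?", (job,)).fetchall()
    for (pid,) in rows:
        try:
            os.kill(pid, signal.SIGTERM)
            killed.append(pid)
        except ProcessLookupError:
            pass  # the process already exited on its own
    conn.execute("DELETE FROM launched WHERE job = ?", (job,))
    conn.commit()
    return killed


if __name__ == "__main__":
    registry = open_registry()
    launch(registry, "mapping", [sys.executable, "-c", "import time; time.sleep(30)"])
    print("terminated:", cleanup(registry, "mapping"))
```

One caveat: the OS can reuse a PID after the original process exits, so a real tracker would also want to store something like the start time or process group and check it before killing.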

Monday, 8 November 2010

At ASHG, Ion Torrent Drums Up Interest; Provides Preliminary Specs for PGM

WASHINGTON, DC – Ion Torrent revealed some preliminary specs for its Personal Genome Sequencer, due to be launched later this year, as the Life Technologies business unit presented the instrument to potential customers at its booth at the American Society for Human Genetics meeting this week. The speed of the instrument — a run takes approximately two hours, and several runs can be performed in a day — is what appears to be most attractive to potential customers, Maneesh Jain, Ion Torrent's vice president of marketing and business development, told In Sequence.
The first version of the PGM will sell for $49,500, plus a $16,500 server to analyze the data.
Initially, the machine will produce about 10 megabases of data per run, or about 100,000 reads of 100 base pairs each, using the so-called 314 chip, which has about 1.5 million wells and will cost $250. Reagent kits for template preparation, library preparation, and sequencing will cost another $250, bringing the total consumables cost per run to approximately $500.
In the first half of 2011, Ion Torrent plans to launch the 316 chip, with about 6 million wells, which will increase the output per run to 100 megabases and which will cost about twice as much as the 314. Additional chip upgrades will follow, with details to be revealed next year.
Sample prep, which Jain said takes about a day and can be done in batches of six to eight samples, requires an emulsion PCR protocol, which will be simplified over time. "We focused on the sequencing initially," he said, adding that the next step will be to optimize the sample prep. Life Technologies said previously that sample prep for the PGM would eventually be able to use the EZ Bead system, which was originally developed for the SOLiD system.
Read full article here
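The per-run numbers in the article are easy to sanity-check. Here's a small Python sketch that turns them into a consumables cost per megabase, leaving out the instrument and server. For the 316 chip I'm assuming "about twice as much" means roughly $500 and that the reagent kits stay at $250. Both are my guesses, not figures from the article.

```python
def run_output_mb(reads, read_len_bp):
    """Output of one run in megabases."""
    return reads * read_len_bp / 1e6


def cost_per_mb(chip_usd, reagents_usd, output_mb):
    """Consumables cost (chip + reagent kits) per megabase of output."""
    return (chip_usd + reagents_usd) / output_mb


out_314 = run_output_mb(100_000, 100)      # 10.0 Mb, matching the article
cost_314 = cost_per_mb(250, 250, out_314)  # 50.0 USD per Mb
# 316 chip: assumed ~$500 chip, same $250 reagents, 100 Mb per run.
cost_316 = cost_per_mb(500, 250, 100)      # 7.5 USD per Mb
```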

Wednesday, 3 November 2010

Life Technologies Launches New SOLiD Sequencer to Drive Advances in Cancer Biology and Genetic Disease Research

It's official! The web is crawling with news reports. Read their press release here.
My previous coverage of the preview launch is here.
There's a discussion in the seqanswers forum on the new machine.

The Life Tech cmsXXX.pdfs with the useful specs are out too; you can google them or search the website.
The specs
solid.appliedbiosystems.com/solid5500

Monday, 25 October 2010

AB on Ion Torrent

There was a brief mention of Ion Torrent at the 5500 presentation as well, but nothing of great significance. I do wish the marketing folks would push Ion Torrent out faster, but I think they are trying to streamline production by testing whether Invitrogen kits can replace the ones Ion Torrent uses. I do hope they do not sacrifice performance for the sake of compatibility.

For BioScope, they are going to include base-space support (hurray?), presumably so that they can use the same analysis pipeline for their SMS and Ion Torrent technologies.

Stay Tuned!

AB releases 4 HQ and PI as 5500xl and 5500 SOLiD

I was lucky to be part of the first group to view the specs and info on the new SOLiD 4 HQ.
For reasons unknown, they have renamed your familiar 4 HQ and PI as the 5500xl and 5500 SOLiD systems.
Or, if you prefer formulas:
5500xl = 4 hq
5500 = PI

One can only wonder at their obsession with these four digits, judging by similarly named instruments: the AB Sciex Triple Quad 5500 and the AB Sciex QTrap 5500.

Honestly, the number 5500 has no numerical significance AFAIK.

Appearance-wise, both look like the PI system. I DO NOT see the computer cluster anymore, which is something I am curious about.
Finally, we are at 75 bp, though.
Of note, there is a new Exact Call Chemistry (ECC) module that promises 99.99% accuracy; it is optional because it increases the run time.
The new SOLiD system was co-developed with Hitachi High-Technologies.
Instead of the familiar slides, they now use 'flowchips' with 6 individual lanes, allowing more flexible mixing of samples and read types.
For the 5500xl, throughput is 20-30 Gb per day, and a run gives you 180 Gb or 2.8 B tags (paired ends or mate pairs).
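Two quick conversions help put those specs in context. 99.99% accuracy corresponds to a Phred quality of Q40. If I assume the per-day figure is simply run output divided by run time (my assumption, not a stated spec), a 180 Gb run at 20-30 Gb/day works out to roughly 6-9 days. A minimal Python sketch:

```python
import math


def phred(accuracy):
    """Phred quality Q = -10 * log10(error probability), where error = 1 - accuracy."""
    return -10 * math.log10(1 - accuracy)


def days_per_run(run_output_gb, gb_per_day):
    """Run length implied by total output and daily throughput (assumes a constant rate)."""
    return run_output_gb / gb_per_day


q_ecc = round(phred(0.9999))   # 40
fast = days_per_run(180, 30)   # 6.0 days
slow = days_per_run(180, 20)   # 9.0 days
```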

Contrary to most rumours, the 5500xl is upgradeable from SOLiD 4, although I suspect it is a trade-in program. There was no mention of the 5500 (which I guess is basically a downgrade).

The specs should be up soon
solid.appliedbiosystems.com/solid5500

Update on seqanswers from truthseqr:
http://seqanswers.com/forums/showthread.php?t=6761&goto=newpost

Here is the message that has just been posted:
***************
AB is premiering two new instruments at ASHG next week.

Mobile ASHG calendar: http://m.appliedbiosystems.com/ashg/ (http://solid.community.appliedbiosystems.com/)

Twitter account: @SOLiDSequencing (http://twitter.com/SOLiDSequencing)

SOLiD Community: http://solid.community.appliedbiosystems.com/

More info soon at: solid.appliedbiosystems.com/solid5500/ (http://solid.appliedbiosystems.com/solid5500)

Tuesday, 12 October 2010

SRMA: tool for Improved variant discovery through local re-alignment of short-read next-generation sequencing data

Have a look at this tool: http://genomebiology.com/2010/11/10/R99/abstract
It is a realigner for NGS reads that doesn't use a lot of RAM. I'm not too sure how it compares to GATK's local realignment around indels, as that isn't mentioned, but the authors used reads aligned with the popular BWA or BFAST as input (Bowtie was left out, though).

Excerpted

SRMA was able to improve the ultimate variant calling using a variety of measures on the simulated data from two different popular aligners (BWA and BFAST). These aligners were selected based on their sensitivity to insertions and deletions (BFAST and BWA), since a property of SRMA is that it produces a better consensus around indel positions. The initial alignments from BFAST allow local SRMA re-alignment using the original color sequence and qualities to be assessed as BFAST retains this color space information. This further reduces the bias towards calling the reference allele at SNP positions in ABI SOLiD data, and reduces the false discovery rate of new variants. Thus, local re-alignment is a powerful approach to improving genomic sequencing with next generation sequencing technologies. The alignments to the reference genome were implicitly split into 1Mb regions and processed in parallel on a large computer cluster; the re-alignments from each region were then merged in a hierarchical fashion. This allows for the utilization of multi-core computers, with one re-alignment per core, as well as parallelization across a computer cluster or a cloud. The average peak memory utilization per process was 876Mb (on a single-core), with a maximum peak memory utilization of 1.25GB. On average, each 1Mb region required approximately 2.58 minutes to complete, requiring approximately 86.17 hours total running time for the whole U87MG genome. SRMA also supports re-alignment within user-specified regions for efficiency, so that only regions of interest need to be re-aligned. This is particularly useful for exome-sequencing or targeted re-sequencing data.
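The split-into-1Mb-regions scheme is simple to sketch. The Python below is my own illustration, not SRMA's code. It cuts chromosomes into half-open windows and gives a rough wall-clock estimate, assuming every region takes the quoted 2.58-minute average so the work runs in ceil(regions / cores) rounds. Working backwards, the quoted 86.17 hours at 2.58 minutes per region is about 2,004 regions run one after another (86.17 × 60 / 2.58).

```python
import math


def split_regions(chrom_lengths, region_size=1_000_000):
    """Yield half-open (chrom, start, end) windows of at most region_size bp."""
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, region_size):
            yield chrom, start, min(start + region_size, length)


def wall_clock_hours(n_regions, cores, minutes_per_region=2.58):
    """Rough estimate: equal-length regions processed in ceil(n_regions / cores) rounds."""
    rounds = math.ceil(n_regions / cores)
    return rounds * minutes_per_region / 60


# Example: a hypothetical 2.5 Mb chromosome gives three windows, the last one short.
windows = list(split_regions({"chr1": 2_500_000}))
```

The hierarchical merge of per-region results isn't shown. In practice it would be a merge of sorted per-region BAMs.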

Sunday, 30 May 2010

Cofactor genomics on the different NGS platforms

Original post here

They are a commercial company that offers NGS on the ABI and Illumina platforms, and since this is on their company page, I guess it's their official stand on what rocks on each platform.

Excerpted.

Applied Biosystems SOLiD 3

The Applied Biosystems SOLiD 3 has the shortest but also the highest quantity of reads. The SOLiD produces up to 240 million 50bp reads per slide per end. As with the Illumina, Mate-Pairs produce double the output by duplicating the read length on each end, and the SOLiD supports a variety of insert lengths like the 454. The SOLiD can also run 2 slides at once to again double the output. SOLiD has the lowest *raw* base qualities but the highest processed base qualities when using a reference due to its 2-base encoding. Because of the number of reads and more advanced library types, we recommend the SOLiD for all RNA and bisulfite sequencing projects.
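The "2-base encoding" is why SOLiD raw and processed qualities differ so much. Here's a minimal Python sketch of color space, assuming the usual convention where A=0, C=1, G=2, T=3 and each color is the XOR of two adjacent base codes. It ignores the adapter primer base that real SOLiD reads start with. A true SNP changes two adjacent colors, while a single color (measurement) error, decoded naively, corrupts every base after it. That is why errors can be separated from SNPs when you align against a reference.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq):
    """Return (first_base, colors): each color is the XOR of two adjacent base codes."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Rebuild bases from a known first base by applying each color in turn."""
    out = [first_base]
    cur = CODE[first_base]
    for c in colors:
        cur ^= c
        out.append(BASES[cur])
    return "".join(out)


def diff_positions(a, b):
    """Indices where two equal-length sequences (or color lists) differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


_, ref_colors = encode("ATGGCA")              # [3, 1, 0, 3, 1]
_, snp_colors = encode("ATGACA")              # one base changed at index 3
snp_diff = diff_positions(ref_colors, snp_colors)  # [2, 3]: two adjacent colors
bad_read = decode("A", [3, 1, 1, 3, 1])       # one color error at index 2
err_diff = diff_positions("ATGGCA", bad_read)      # [3, 4, 5]: every base downstream
```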

Solexa/Illumina

The Solexa/Illumina generates shorter reads at 36-75bp but produces up to 160 million reads per run. All reads are of similar length. The Illumina has the highest *raw* quality scores and its errors are mostly base substitutions. Paired-end reads with ~200 bp inserts are possible with high efficiency and double the output of the machine by duplicating the read length on each end. Paired-end Illumina reads are suitable for de novo assemblies, especially in combination with 454. The large number of reads makes the Illumina appropriate for de novo transcriptome studies with simultaneous discovery and quantification of RNAs at qRT-PCR accuracy.

Roche/454 FLX

The Roche/454 FLX with Titanium chemistry generates the longest reads (350-500bp) and the most contiguous assemblies, can phase SNPs or other features into blocks, and has the shortest run times. However, 454 also produces the fewest total reads (~1 million) at the highest cost per base. Read lengths are variable. Errors occur mostly at the ends of long same-nucleotide stretches. Libraries can be constructed with many insert sizes (8kb - 20kb) but at half of the read length for each end and with low efficiency.

Wednesday, 26 May 2010

A scientific spectator's guide to next-generation sequencing

ROFL
I love the title!

A scientific spectator's guide to next-generation sequencing

Dr Keith looks not only at next-gen sequencing but also at the emerging single-molecule sequencing technologies. Interesting read!

My fave parts of the review
"Finally, there is the cost per base, generally expressed in a cost per human genome sequenced at approximately 40X coverage. To show one example of how these trade off, the new PacBio machine has a great cost per sample (~U$100) and per run (you can run just one sample) but a poor cost per human genome – you’d need around 12,000 of those runs to sequence a human genome (~U$120K). In contrast, one can buy a human genome on the open market for U$50K and sub U$10K genomes will probably be generally available this year."

"Length is critical to genome sequencing and RNA-seq experiments, but really short reads in huge numbers are what counts for DGE/SAGE and many of the functional tag sequencing methods. Technologies with really long reads tend not to give as many, and with all of them you can always choose a much shorter run to enable the machine to be turned over to another job sooner – if your application doesn’t need long reads."
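The cost-per-genome trade-off in the PacBio quote is a simple calculation: the number of runs is genome size × coverage ÷ yield per run, and cost is runs × cost per run. In the Python sketch below, the 3 Gb genome and the 10 Mb per-run yield are my assumptions, picked because they reproduce the quoted 12,000 runs at 40X. Taking the quoted figures literally, 12,000 runs at ~U$100 each multiplies out to ~U$1.2M rather than ~U$120K, so one of the numbers in the quote may be off by a factor of ten.

```python
def runs_needed(genome_bp, coverage, yield_per_run_bp):
    """Smallest whole number of runs whose combined yield reaches the target coverage."""
    target_bp = genome_bp * coverage
    return -(-target_bp // yield_per_run_bp)  # integer ceiling division


def cost_per_genome(runs, cost_per_run_usd):
    """Total consumables cost for the given number of runs."""
    return runs * cost_per_run_usd


runs = runs_needed(3_000_000_000, 40, 10_000_000)  # 12000
cost = cost_per_genome(runs, 100)                  # 1200000 (USD)
```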

Tuesday, 13 April 2010

ABI's pipeline Bioscope possible cloud service?

Hmmm, I saw an obscure reference to possible cloud hosting for ABI's BioScope software in a poster on the main page.

BioScope™ 1.2: An Applications Framework for SOLiD™ Sequence Data Analysis. (PDF, 172 KB) Suri, P., et al. (AGBT 2010)

"...Similar performance has been observed for BioScope™ software deployed on cloud (SOLiDBioScope.com)."

But googling for the link yielded nothing.

Wednesday, 7 April 2010

Comparing NGS platforms, 454, Solexa, SOLiD

Inspired by Albert's work at http://ngsbuzz.blogspot.com/

Please post discrepancies or views in the comments.

Tuesday, 22 December 2009

Simulated ABI Solid data sets

Finally found a link that describes how you can generate a test data set for ABI SOLiD runs! It's done using SAMtools.
Now to put my spanking new cluster to the test.
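For a quick smoke test without any external tools, a toy simulator is enough. The Python below is a hypothetical stand-in, not the SAMtools-based recipe from the link. It samples fixed-length base-space reads uniformly from a reference and adds independent substitution errors. It has no indels, no qualities and no color space, so it is only good for exercising a pipeline, not for benchmarking SOLiD-specific behaviour.

```python
import random


def simulate_reads(reference, n_reads, read_len, error_rate=0.0, seed=0):
    """Return (start, read) pairs sampled uniformly from reference with substitution errors."""
    if read_len > len(reference):
        raise ValueError("read_len is longer than the reference")
    rng = random.Random(seed)  # seeded so test data sets are reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for i, b in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute a different base chosen uniformly.
                bases[i] = rng.choice([x for x in "ACGT" if x != b])
        reads.append((start, "".join(bases)))
    return reads
```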

Thursday, 10 December 2009

A sign of things to come.

NGS really takes data sizes to new heights. Even downloading the sample data sets for small RNA analysis from the ABI website (Human Small RNA Data Set) takes this long:

Downloaded: 4 files, 14G in 1d 4h 35m 3s (146 KB/s)

Well, in truth the files were 26 Gb in total, but due to my network issues I had to retry the download.

There are no md5 checksums, so I will only know in a day or two whether the downloads went smoothly.
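When a provider does publish checksums, verifying a multi-gigabyte download is cheap. Here's a minimal Python sketch using the standard library's hashlib. It reads the file in chunks so memory use stays flat regardless of file size. The function names are my own.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large files don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 matches the published checksum (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```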
