http://schatzlab.cshl.edu/presentations/2013-02-20.AGBT.Assembling%20Crop%20Genomes.pdf
if you need an intro
"In a talk during the evening session, Mike Schatz, an assistant professor at Cold Spring Harbor Laboratory, spoke about “Assembling Crop Genomes with Single Molecule Sequencing.” Crops are important to sequence — 15 crops represent 90% of the world’s food, Schatz said — but are notoriously difficult to study because of their large genome size, high repeat content, and higher ploidy. Along with Sergey Koren and Adam Phillippy, he has built a pipeline to create hybrid genome assemblies using PacBio long reads combined with shorter-read sequence — either CCS reads from PacBio or data from another sequencing platform. In an example he offered of a rice strain, an attempted genome assembly using just Illumina reads yielded an N50 contig of 16Kb, but adding PacBio long reads to that boosted the N50 contig to 25Kb. Ultimately, Schatz said, he expects that as PacBio's readlength improves, this kind of approach could routinely generate megabase-size contigs or even pull plant chromosomes into single contigs.
For more information on Mike Schatz’s work using SMRT Sequencing, check out this case study describing an automated pipeline for genome finishing with PacBio long reads."
source: http://blog.pacificbiosciences.com/2013/02/notes-from-agbt-long-read-sequence-data.html
The slides include a snippet of code that answers this question from Twitter:
'What's the longest single contig from a de Bruijn assembler without PE or a jumping library?'
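# Generate a random 100 Mbp "genome"
$ perl -e 'print ">random\n"; @D=split //,"ACGT";
    for (1..100000000){print $D[int(rand(4))];}
    print "\n"' | fold > random.fa
# Simulate 50 million error-free 100 bp reads (the 1 bp mate goes to /dev/null)
$ wgsim -r 0 -e 0 -N 50000000 -1 100 -2 1 \
    random.fa random.reads.fq /dev/null
# Assemble with k=63, then list the contig lengths
$ SOAPdenovo-63mer all -s random.cfg -K 63 -o random.63
$ getlengths random.63.contig
1 99999990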
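The getlengths step is not part of a stock shell, so here is a minimal awk sketch of a stand-in for anyone who wants to check contig lengths without it. The function name fasta_lengths is made up for illustration, and the "id length" output format is only an assumption based on the "1 99999990" line shown with the commands; the real tool may print something different.

```shell
# fasta_lengths: print "<record id> <sequence length>" for each FASTA record.
# Hypothetical stand-in for getlengths; the real tool's output may differ.
fasta_lengths() {
  awk '
    /^>/ {
      if (seen) print id, len        # flush the previous record
      id = substr($1, 2)             # header up to first space, minus ">"
      len = 0
      seen = 1
      next
    }
    { len += length($0) }            # add up wrapped sequence lines
    END { if (seen) print id, len }  # flush the last record
  ' "$@"
}
```

Running `fasta_lengths random.63.contig` after the assembly should list one line per contig; a single line close to the genome size would match the result reported on the slide.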