Monday, 25 February 2013

Michael Schatz: Assembling Crop Genomes With SMS

PDF of the presentation given on Feb 22, 2013, at AGBT, Marco Island, FL

If you need an intro:

"In a talk during the evening session, Mike Schatz, an assistant professor at Cold Spring Harbor Laboratory, spoke about “Assembling Crop Genomes with Single Molecule Sequencing.” Crops are important to sequence — 15 crops represent 90% of the world’s food, Schatz said — but are notoriously difficult to study because of their large genome size, high repeat content, and higher ploidy. Along with Sergey Koren and Adam Phillippy, he has built a pipeline to create hybrid genome assemblies using PacBio long reads combined with shorter-read sequence — either CCS reads from PacBio or data from another sequencing platform. In an example he offered of a rice strain, an attempted genome assembly using just Illumina reads yielded an N50 contig of 16Kb, but adding PacBio long reads to that boosted the N50 contig to 25Kb. Ultimately, Schatz said, he expects that as PacBio's readlength improves, this kind of approach could routinely generate megabase-size contigs or even pull plant chromosomes into single contigs.

For more information on Mike Schatz’s work using SMRT Sequencing, check out this case study describing an automated pipeline for genome finishing with PacBio long reads."


He includes a snippet of code to answer this question from Twitter:
'What's the longest single contig from a de Bruijn assembler without PE or a jumping library?'

$ perl -e 'print ">random\n"; @D=split //,"ACGT";
    for (1...100000000){print $D[int(rand(4))];}
    print "\n"' | fold > random.fa
$ wgsim -r 0 -e 0 -N 50000000 -1 100 -2 1 \
    random.fa random.reads.fq /dev/null
$ SOAPdenovo-63mer all -s random.cfg -K 63 -o random.63
$ getlengths random.63.contig
1 99999990
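The last step of the snippet reports contig lengths, and the quoted summary compares assemblies by N50 contig size. Here is a minimal Python sketch of both calculations. It assumes getlengths prints one id and length per FASTA record, which is what the output above suggests. The function names fasta_lengths and n50 are made up for illustration and are not part of any of the tools used in the talk.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence id: total bases} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The id is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence lines may be wrapped (e.g. by `fold`), so accumulate.
            lengths[current] += len(line)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no contigs at all


if __name__ == "__main__":
    # Tiny made-up input, not real assembly output.
    demo = [">1 example", "ACGTAC", "GT", ">2", "ACG", ">3", "A"]
    table = fasta_lengths(demo)
    for name, length in table.items():
        print(name, length)
    print("N50:", n50(table.values()))
```

For an assembly with a single contig, as in the random-genome experiment, the N50 is simply the length of that contig.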

