Showing posts with label PacBio. Show all posts

Saturday, 29 September 2018

Koala Genome assembled on AWS



Excerpted from AWS blog 
 
Five years ago, a research team led by Dr. Rebecca Johnson (Director of the Australian Museum Research Institute) set out to learn more about koala populations, genetics, and diseases. Because the koala is a biologically unique animal with a limited appetite, keeping its population both healthy and genetically diverse is key to any conservation plan. In addition to characterizing the genetic diversity of koala populations, the team wanted to strengthen Australia’s ability to lead large-scale genome sequencing projects.
Inside the Koala Genome
Last month the team published their results in Nature Genetics. Their paper (Adaptation and Conservation Insights from the Koala Genome) identifies the genomic basis for the koala’s unique biology. 


This work was performed on AWS. The research team used cfnCluster to create multiple clusters, each with 500 to 1000 vCPUs and each running Falcon from Pacific Biosciences. All in all, the team used 3 million EC2 core hours, most of them on EC2 Spot Instances.

Thursday, 22 October 2015

THE FUTURE OF GENOMICS & INFORMATICS IN THE NEXT 5 YEARS



Tina Graves-Lindsay, Leader of the Reference Genomes Group at the McDonnell Genome Institute (MGI) at Washington University in St. Louis, kicked off the session by talking about the research involved in achieving the best human whole genome assembly. As a member of the Genome Reference Consortium, Tina’s team has been working to improve the current reference, GRCh38, and to fix a few genes that are not optimally represented for all individuals or ancestries.

The sequencing plan starts with generating 60x coverage of PacBio long read data for a de novo assembly. From there, MGI incorporates BioNano or Dovetail data to create scaffolds that in some cases nearly cover entire chromosome arms. Since MGI is targeting difficult-to-assemble regions of the genome, they sequence bacterial artificial chromosomes (BACs) to fill the targeted regions and then combine all of this data to generate a very high quality whole genome assembly labeled the “Gold Genome”.
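To give a feel for what 60x coverage means in raw data, here is a rough back-of-the-envelope sketch in Python. The genome size (about 3.1 Gb for human) and the mean read length of 10 kb are assumptions chosen for illustration; neither figure comes from the talk, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate of the data behind "60x coverage".
# ASSUMPTIONS (not from the talk): genome size and mean read length.

GENOME_SIZE_BP = 3_100_000_000   # assumed approximate human genome size
MEAN_READ_LENGTH_BP = 10_000     # assumed mean long-read length, illustrative only


def bases_needed(genome_size_bp, coverage):
    """Total sequenced bases required so each position is covered `coverage` times on average."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp, coverage, mean_read_length_bp):
    """Number of reads of the given mean length needed to reach that coverage (rounded up)."""
    total = bases_needed(genome_size_bp, coverage)
    return -(-total // mean_read_length_bp)  # ceiling division on integers


if __name__ == "__main__":
    total = bases_needed(GENOME_SIZE_BP, 60)
    reads = reads_needed(GENOME_SIZE_BP, 60, MEAN_READ_LENGTH_BP)
    print(f"bases needed: {total:,}")
    print(f"reads needed: {reads:,}")
```

Under these assumptions the target works out to roughly 186 billion bases; a different read length changes the read count but not the total bases.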

http://blog.dnanexus.com/2015-10-15-post-ashg-the-future-of-genomics-informatics-in-the-next-5-years/


Monday, 25 February 2013

Michael Schatz: Assembling Crop Genomes With SMS

PDF of the presentation given on Feb 22, 2013 at AGBT, Marco Island, FL

http://schatzlab.cshl.edu/presentations/2013-02-20.AGBT.Assembling%20Crop%20Genomes.pdf

If you need an intro:

"In a talk during the evening session, Mike Schatz, an assistant professor at Cold Spring Harbor Laboratory, spoke about “Assembling Crop Genomes with Single Molecule Sequencing.” Crops are important to sequence — 15 crops represent 90% of the world’s food, Schatz said — but are notoriously difficult to study because of their large genome size, high repeat content, and higher ploidy. Along with Sergey Koren and Adam Phillippy, he has built a pipeline to create hybrid genome assemblies using PacBio long reads combined with shorter-read sequence — either CCS reads from PacBio or data from another sequencing platform. In an example he offered of a rice strain, an attempted genome assembly using just Illumina reads yielded an N50 contig of 16Kb, but adding PacBio long reads to that boosted the N50 contig to 25Kb. Ultimately, Schatz said, he expects that as PacBio's readlength improves, this kind of approach could routinely generate megabase-size contigs or even pull plant chromosomes into single contigs.

For more information on Mike Schatz’s work using SMRT Sequencing, check out this case study describing an automated pipeline for genome finishing with PacBio long reads."

source: http://blog.pacificbiosciences.com/2013/02/notes-from-agbt-long-read-sequence-data.html
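The excerpt compares assemblies by their N50 contig size. For readers who have not met the metric, here is a minimal Python sketch of the usual definition: the length of the contig at which, going from longest to shortest, the running total first reaches half of the assembly. The contig lengths below are made-up numbers, not data from the rice example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    contigs = [80, 70, 50, 40, 30, 20]
    print(n50(contigs))
```

A higher N50 means that more of the assembly sits in long contigs, which is why adding long reads to a short-read assembly is reported as an N50 improvement.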

He includes a snippet of code to answer this question from Twitter:
'What's the longest single contig from a de Bruijn assembler without PE or a jumping library?'


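# Generate a random 100 Mbp genome
$ perl -e 'print ">random\n"; @D=split //,"ACGT";
    for (1..100000000){print $D[int(rand(4))];}
    print "\n"' | fold > random.fa
# Simulate 50 million error-free, mutation-free 100 bp reads; the second read of each pair is discarded
$ wgsim -r 0 -e 0 -N 50000000 -1 100 -2 1 \
    random.fa random.reads.fq /dev/null
# Assemble with k=63
$ SOAPdenovo-63mer all -s random.cfg -K 63 -o random.63
# Report the contig lengths
$ getlengths random.63.contig
1 99999990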
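One plausible reading of why this experiment yields a single contig is that a random sequence almost never repeats a long k-mer, so the de Bruijn graph has no branches to break the assembly. The Python sketch below checks that intuition on a much smaller random sequence. It is an illustration only, not the method from the slides: the genome is scaled down to 100 kbp, reverse complements are ignored, and the function names are hypothetical.

```python
import random


def random_genome(length, seed=0):
    """Build a random DNA string; the seed is fixed so runs are repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def count_repeated_kmers(seq, k):
    """Count distinct k-mers that occur more than once in seq (forward strand only)."""
    seen = set()
    repeated = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            repeated.add(kmer)
        else:
            seen.add(kmer)
    return len(repeated)


if __name__ == "__main__":
    # ASSUMPTION: 100 kbp stands in for the 100 Mbp genome used in the snippet above.
    genome = random_genome(100_000)
    for k in (5, 31, 63):
        print(k, count_repeated_kmers(genome, k))
```

Short k-mers repeat constantly, while at k=63 repeats are vanishingly unlikely in random sequence. Real crop genomes are the opposite case, which is why their high repeat content makes them hard to assemble.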

Datanami, Woe be me