Whole-genome sequencing of 644 elephant tissue samples using the HiSeq 2500 System identified multiple copies of TP53. Compared to human cells, elephant cells demonstrated increased apoptotic response following DNA damage, which could account for the low incidence of cancer (4.81%) in elephant populations.
Related Links
What elephants can teach scientists about fighting cancer in humans http://www.latimes.com/science/sciencenow/la-sci-sn-elephant-cancer-story-20151007-story.html
How elephants avoid cancer http://www.nature.com/news/how-elephants-avoid-cancer-1.18534
Potential Mechanisms for Cancer Resistance in Elephants and Comparative Cellular Response to DNA Damage in Humans. Journal of the American Medical Association, DOI: 10.1001/jama.2015.13134
http://jama.jamanetwork.com/article.aspx?articleid=2456041
Showing posts with label Illumina. Show all posts
Monday, 14 March 2016
Wednesday, 24 July 2013
Illumina produces 3k of 8500 bp reads on HiSeq using Moleculo Technology
Keith blogged in Jan 2013 about how super-long-read sequencing methods would be a threat to Illumina. Today, Illumina can openly acknowledge the shortcomings of its short reads for various applications, such as
- assembly of complex genomes (polyploid, containing excessive long repeat regions, etc.),
- accurate transcript assembly,
- metagenomics of complex communities,
- and phasing of long haplotype blocks.
The reason?
This latest set of data released on BaseSpace:
Figure: Read length distribution of synthetic long reads for a D. melanogaster library
The data set, available as a single project in BaseSpace, can be accessed here.
image source: http://blog.basespace.illumina.com/2013/07/22/first-data-set-from-fasttrack-long-reads-early-access-service/
With the integration of Moleculo, they have managed to generate ~30 Gb of raw sequence data. They have refrained from talking about the 'key analysis metrics' that are available in the PDF report. Perhaps it's much easier to let the blogosphere and data scientists dissect the new data themselves.
Am wondering when the 454 versus Illumina Long Reads side-by-side comparison will pop up
UPDATE:
Can't find the 'key analysis metrics' in the PDF report files. Perhaps they're still being uploaded? *shrugs* So please update me if you see them; otherwise I'll just have to run something on the data myself.
These are the files that I have now
total 512M
259M Jul 18 01:01 mol-32-2832.fastq.gz
44K Jul 24 2013 FastTrackLongReads_dmelanogaster_281c.pdf
149K Jul 24 2013 mol-32-281c-scaffolds.txt
44K Jul 24 2013 FastTrackLongReads_dmelanogaster_2832.pdf
151K Jul 24 2013 mol-32-2832-scaffolds.txt
253M Jul 24 2013 mol-32-281c.fastq.gz
md5sums
6845fc3a4da9f93efc3a52f288e2d7a0 FastTrackLongReads_dmelanogaster_281c.pdf
02f5de4f7e15bbcd96ada6e78f659fdb FastTrackLongReads_dmelanogaster_2832.pdf
586599bb7fca3c20ba82a82921e8ba3f mol-32-281c-scaffolds.txt
b25010e9e5e13dc7befc43b5dff8c3d6 mol-32-281c.fastq.gz
6822cfbd3eb2a535a38a5022c1d3c336 mol-32-2832-scaffolds.txt
873f09080cdf59ed37b3676cddcbe26f mol-32-2832.fastq.gz
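If you grab the same files, a quick way to check them against the md5sums above is something like the sketch below. The helper names (`md5_of_file`, `verify_checksums`) are my own, and it assumes the six files sit in the current directory; adjust as needed.

```python
import hashlib
from pathlib import Path

# Checksums copied from the md5sums listing in this post.
EXPECTED = {
    "FastTrackLongReads_dmelanogaster_281c.pdf": "6845fc3a4da9f93efc3a52f288e2d7a0",
    "FastTrackLongReads_dmelanogaster_2832.pdf": "02f5de4f7e15bbcd96ada6e78f659fdb",
    "mol-32-281c-scaffolds.txt": "586599bb7fca3c20ba82a82921e8ba3f",
    "mol-32-281c.fastq.gz": "b25010e9e5e13dc7befc43b5dff8c3d6",
    "mol-32-2832-scaffolds.txt": "6822cfbd3eb2a535a38a5022c1d3c336",
    "mol-32-2832.fastq.gz": "873f09080cdf59ed37b3676cddcbe26f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(expected, directory="."):
    """Map each filename to True (match), False (mismatch) or None (file missing)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = md5_of_file(path) == checksum.lower()
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "MISSING"}
    for name, status in verify_checksums(EXPECTED).items():
        print(f"{labels[status]:8s} {name}")
```

A MISMATCH most likely means a truncated download, so re-fetch the file before doing anything else with it.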
I ran FastQC (v0.10.1) on both samples; the images below are from 281c.
You can download the full HTML reports here:
https://www.dropbox.com/sh/5unu3zba9u21ywj/JT4HdkzfOP/mol-32-281c_fastqc.zip
https://www.dropbox.com/s/mpxa5wx51iqmiz3/mol-32-2832_fastqc.zip
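FastQC gives a length distribution plot, but if you just want the raw numbers (how many reads, how many are really long, N50) straight from the fastq.gz files, a rough sketch like this would do. It assumes plain 4-line FASTQ records (no wrapped sequence lines), the function names are my own, and the 8500 bp cutoff is simply the figure from the post title, not an official threshold.

```python
import gzip
import sys


def read_lengths(handle):
    """Yield sequence lengths from a text handle holding 4-line FASTQ records."""
    for index, line in enumerate(handle):
        if index % 4 == 1:  # the second line of each record is the sequence
            yield len(line.rstrip("\r\n"))


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarise(lengths, long_cutoff=8500):
    """Collect basic length statistics for an iterable of read lengths."""
    lengths = list(lengths)
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "bases": total,
        "mean": total / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
        "at_least_cutoff": sum(1 for length in lengths if length >= long_cutoff),
    }


if __name__ == "__main__":
    # Usage: python fastq_lengths.py mol-32-281c.fastq.gz mol-32-2832.fastq.gz
    for path in sys.argv[1:]:
        with gzip.open(path, "rt") as handle:
            print(path, summarise(read_lengths(handle)))
```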
Reading about the Moleculo sample prep method, it seems like it's just a rather ingenious way to stitch barcoded short reads together to form a single long contig. If that is the case, then I am not sure the base quality scores here are meaningful anymore, since each read is a mini-assembly. I presume this also takes out any quantitative value of the number of reads, so accurate quantification of long RNA molecules or splice variants isn't possible. Nevertheless, it's an interesting development on the Illumina platform. Looking forward to seeing more news about it.
Other links
Illumina Long-Read Sequencing Service
Moleculo technology: synthetic long reads for genome phasing, de novo sequencing
CoreGenomics: Genome partitioning: my moleculo-esque idea
Moleculo and Haplotype Phasing - The Next Generation Technologist
Abstract: Production Of Long (1.5kb – 15.0kb), Accurate, DNA Sequencing Reads Using An Illumina HiSeq2000 To Support De Novo Assembly Of The Blue Catfish Genome (Plant and Animal Genome XXI Conference)
http://www.moleculo.com/ (no info on this page though)
Illumina Announces Phasing Analysis Service for Human Whole-Genome Sequencing - MarketWatch
Illumina Announces Moleculo Long Read Technology and Phasing As Service
First publication using the Long Read Seq (LRseq): The genome sequence of the colonial chordate, Botryllus schlosseri | eLife. Contains a diagram explaining the LRSeq protocol. This experiment yielded ~1000 6.3kb fragments
Patent information on the Long Read technology
https://docs.google.com/viewer?url=patentimages.storage.googleapis.com/pdfs/US20130079231.pdf
Labels:
454,
assembly,
de novo,
Illumina,
Illumina long read,
long reads,
LRSeq,
metagenomics,
Moleculo,
Moleculo Long Read,
NGS,
phasing,
sequencing
Tuesday, 6 December 2011
Complete Khoisan and Bantu genomes from southern Africa : Article : Nature
http://www.nature.com/nature/journal/v463/n7283/full/nature08795.html
Just attended a very good lecture by Stephan Schuster, entitled "African Genomes: Charting Human Diversity"
He offered unbiased views / charts on the platform differences between 454, GAIIx, HiSeq, SOLiD for NGS sequencing coverage (which I think I should not repeat here). It points to the need to do sequencing on 2 different platforms to get a more accurate SNP list.
He also gave compelling reasons for getting 20x coverage of the human genome done on 454 to complete the human genome (457 gaps in hg19).
Yes, I often forget that the media / lay person thinks that the human genome is 'complete'. It's an often-ignored 'secret' that it actually isn't. Maybe the next marketing ploy (I mean strategy) for emerging sequencing platforms would be to be THE ONE that actually finishes the human genome.
Friday, 4 March 2011
Guide/tutorial for the analysis of RNA-seq data
link in seqanswers
Excellent starting point for those confused about the RNA-seq data analysis procedure.
Hello,
I've written a guide to the analysis of RNA-seq data, for the purpose of differential expression analysis. It currently lives on our internal wiki that can't be viewed outside of our division, although printouts have been used at workshops. It is by no means perfect and very much a work in progress, but a number of people have found it helpful, so I thought it would be useful to have it somewhere more publicly accessible.
I've attached a pdf version of the guide, although really what I was hoping was that someone here could suggest somewhere where it could be publicly hosted as a wiki. This area is so multifaceted and fast-moving that the only way such a guide can remain useful is if it can be constantly extended and updated.
If anyone has any suggestions about potential hosting, they can contact me at myoung @wehi.edu.au
Cheers
Matt
Update: I've put a few extra things on our local Wiki and seeing as people here seem to be finding this useful I thought I'd post an updated version. I'm also an author on a review paper on Differential Expression using RNA-seq which people who find the guide useful, might also find relevant...
RNA-seq Review
Saturday, 15 January 2011
DNAvision offers Human WGS for 7,500 euros
I had imagined exome sequencing would still have a good run for the next 2-3 years, but seeing how commercial service providers are throwing caution to the wind and offering whole-genome sequencing at ever-decreasing costs, I think many will soon switch to WGS instead. Exome sequencing kits will have a lot of catching up to do in terms of price and useful data if they are to match quickly plummeting WGS prices.
DNAVision to Offer $10K Human Genome Sequencing Services; Purchases Four SOLiDs
Landed: First Illumina HiSeq Machines Advertised (By Nick Loman on February 10, 2010)
Tuesday, 12 October 2010
Human Whole genome sequencing at 11x coverage
http://genomebiology.com/2010/11/9/R91
Just saw this paper, Sequencing and analysis of an Irish human genome. AFAIK WGS is usually done at 30x coverage. In this paper, the authors “describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information.” I thought it was pretty good considering that they had 99.3% of the reference genome covered at 10.6x coverage. That leaves only about 21 Mb missing.
For those interested in the tech details
Four single-end and five paired-end DNA libraries were generated and sequenced using a GAII Illumina Genome Analyzer. The read lengths of the single-end libraries were 36, 42, 45 and 100 bp and those of the paired end were 36, 40, 76, and 80 bp, with the span sizes of the paired-end libraries ranging from 300 to 550 bp (± 35 bp). In total, 32.9 gigabases of sequence were generated (Table 1). Ninety-one percent of the reads mapped to a unique position in the reference genome (build 36.1) and in total 99.3% of the bases in the reference genome were covered by at least one read, resulting in an average 10.6-fold coverage of the genome.
...
At 11-fold genome coverage, approximately 99.3% of the reference genome was covered and more than 3 million SNPs were detected, of which 13% were novel and may include specific markers of Irish ancestry.
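For anyone who wants to check the back-of-envelope numbers, here is a small sketch of the arithmetic. The 3.1 Gb genome size is my own round-number assumption (the paper only says build 36.1), and the Poisson estimate is the idealised textbook model of uniformly random reads, not anything taken from the paper.

```python
import math

GENOME_SIZE_BP = 3.1e9  # assumption: round figure for the human reference
TOTAL_BASES = 32.9e9    # from the paper: 32.9 gigabases generated
FRACTION_COVERED = 0.993  # from the paper: 99.3% covered by at least one read


def mean_coverage(total_bases, genome_size):
    """Average fold coverage = bases sequenced / genome size."""
    return total_bases / genome_size


def uncovered_bases(fraction_covered, genome_size):
    """Number of reference bases with no read on them."""
    return (1.0 - fraction_covered) * genome_size


def poisson_uncovered_fraction(coverage):
    """Fraction of bases expected to get zero reads under ideal random sampling."""
    return math.exp(-coverage)


if __name__ == "__main__":
    cov = mean_coverage(TOTAL_BASES, GENOME_SIZE_BP)
    print(f"mean coverage: {cov:.1f}x")
    print(f"uncovered: {uncovered_bases(FRACTION_COVERED, GENOME_SIZE_BP) / 1e6:.1f} Mb")
    print(f"ideal uncovered fraction at that depth: {poisson_uncovered_fraction(cov):.1e}")
```

With those inputs, the mean coverage comes out at about 10.6x and the uncovered portion at roughly 21.7 Mb, in line with the figures above. The idealised Poisson fraction is far smaller than the observed 0.7%, which suggests the missing bases are mostly down to hard-to-map or hard-to-sequence regions rather than insufficient depth.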
Labels:
Illumina,
journal,
news,
Next Generation Sequencing,
resequencing,
Solexa,
whole genome
SRMA: tool for Improved variant discovery through local re-alignment of short-read next-generation sequencing data
Have a look at this tool http://genomebiology.com/2010/11/10/R99/abstract
It is a realigner for NGS reads that doesn't use a lot of RAM. Not too sure how it compares to GATK's local realignment around indels, as that is not mentioned, but the authors used reads aligned with the popular BWA or BFAST as input. (Bowtie was left out though.)
Excerpted
SRMA was able to improve the ultimate variant calling using a variety of measures on the simulated data from two different popular aligners (BWA and BFAST). These aligners were selected based on their sensitivity to insertions and deletions (BFAST and BWA), since a property of SRMA is that it produces a better consensus around indel positions. The initial alignments from BFAST allow local SRMA re-alignment using the original color sequence and qualities to be assessed as BFAST retains this color space information. This further reduces the bias towards calling the reference allele at SNP positions in ABI SOLiD data, and reduces the false discovery rate of new variants. Thus, local re-alignment is a powerful approach to improving genomic sequencing with next generation sequencing technologies. The alignments to the reference genome were implicitly split into 1Mb regions and processed in parallel on a large computer cluster; the re-alignments from each region were then merged in a hierarchical fashion. This allows for the utilization of multi-core computers, with one re-alignment per core, as well as parallelization across a computer cluster or a cloud. The average peak memory utilization per process was 876Mb (on a single-core), with a maximum peak memory utilization of 1.25GB. On average, each 1Mb region required approximately 2.58 minutes to complete, requiring approximately 86.17 hours total running time for the whole U87MG genome. SRMA also supports re-alignment within user-specified regions for efficiency, so that only regions of interest need to be re-aligned. This is particularly useful for exome-sequencing or targeted re-sequencing data.
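The "split into 1Mb regions and process in parallel" idea from the excerpt is easy to picture with a toy sketch. To be clear, this is not SRMA's code or command line: `split_into_regions`, `process_regions` and the placeholder `realign_region` are hypothetical names of mine, and the worker here only formats a region string where a real pipeline would launch the re-aligner on that region.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_regions(chrom_lengths, region_size=1_000_000):
    """Yield (chrom, start, end) tuples, 1-based inclusive, covering each chromosome."""
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, region_size):
            yield chrom, start + 1, min(start + region_size, length)


def realign_region(region):
    """Placeholder worker: a real one would run the re-aligner on this region."""
    chrom, start, end = region
    return f"{chrom}:{start}-{end}"


def process_regions(regions, worker=realign_region, max_workers=4):
    """Run the worker over all regions in parallel, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, regions))


if __name__ == "__main__":
    # Hypothetical chromosome lengths, just to show the splitting.
    toy_genome = {"chrA": 2_500_000, "chrB": 800_000}
    for done in process_regions(split_into_regions(toy_genome)):
        print("finished", done)
```

Threads are enough here because a real worker would mostly wait on an external process; the per-region outputs would still need merging afterwards, which the excerpt says was done hierarchically.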
Saturday, 5 June 2010
Illumina Cuts Price of Personal Genome Sequencing Service by At Least 60 Percent
For individuals, the new price will be $19,500, while groups of five or more participants using the same ordering physician will pay $14,500 per person. In addition, individuals with serious medical conditions for whom whole-genome sequencing could be of clinical value will pay $9,500 to have their genome sequenced. Read full article
Sunday, 30 May 2010
Cofactor genomics on the different NGS platforms
Original post here
They are a commercial company that offers NGS on ABI and Illumina platforms, and since this is on their company page I guess it's their official stance on what rocks on each platform.
Excerpted.
Applied Biosystems SOLiD 3
The Applied Biosystems SOLiD 3 has the shortest but also the highest quantity of reads. The SOLiD produces up to 240 million 50bp reads per slide per end. As with the Illumina, Mate-Pairs produce double the output by duplicating the read length on each end, and the SOLiD supports a variety of insert lengths like the 454. The SOLiD can also run 2 slides at once to again double the output. SOLiD has the lowest *raw* base qualities but the highest processed base qualities when using a reference due to its 2-base encoding. Because of the number of reads and more advanced library types, we recommend the SOLiD for all RNA and bisulfite sequencing projects.
Solexa/Illumina
The Solexa/Illumina generates shorter reads at 36-75bp but produces up to 160 million reads per run. All reads are of similar length. The Illumina has the highest *raw* quality scores and its errors are mostly base substitutions. Paired-end reads with ~200 bp inserts are possible with high efficiency and double the output of the machine by duplicating the read length on each end. Paired-end Illumina reads are suitable for de novo assemblies, especially in combination with 454. The large number of reads makes the Illumina appropriate for de novo transcriptome studies with simultaneous discovery and quantification of RNAs at qRT-PCR accuracy.
Roche/454 FLX
The Roche/454 FLX with Titanium chemistry generates the longest reads (350-500bp) and the most contiguous assemblies, can phase SNPs or other features into blocks, and has the shortest run times. However, 454 also produces the fewest total reads (~1 million) at the highest cost per base. Read lengths are variable. Errors occur mostly at the ends of long same-nucleotide stretches. Libraries can be constructed with many insert sizes (8kb - 20kb) but at half of the read length for each end and with low efficiency.
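To put the quoted read counts and lengths side by side, here is a quick sketch that multiplies them out. The numbers are taken straight from the excerpt; the dictionary layout and the function name `yield_range_gb` are my own. Note the units differ (SOLiD is quoted per slide per end, the others per run), so treat this as a rough comparison only.

```python
# (maximum reads, shortest read length in bp, longest read length in bp), as quoted above
PLATFORMS = {
    "SOLiD 3 (per slide, per end)": (240_000_000, 50, 50),
    "Solexa/Illumina (per run)": (160_000_000, 36, 75),
    "Roche/454 FLX Titanium (per run)": (1_000_000, 350, 500),
}


def yield_range_gb(reads, min_len, max_len):
    """Return (low, high) raw yield in gigabases for a read count and length range."""
    return reads * min_len / 1e9, reads * max_len / 1e9


if __name__ == "__main__":
    for name, (reads, min_len, max_len) in PLATFORMS.items():
        low, high = yield_range_gb(reads, min_len, max_len)
        print(f"{name}: {low:.2f} to {high:.2f} Gb")
```

Multiplied out, SOLiD and Illumina both top out at around 12 Gb, while 454 stays under 1 Gb, which fits the excerpt's point that 454 trades total output for read length.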
Labels:
454,
ABI,
comparison,
Illumina,
Next Generation Sequencing,
pyrosequencing,
Solexa,
SOLiD
Friday, 28 May 2010
Illumina: an alternative to 454 in metagenomics?
Check out this BMC Bioinformatics paper entitled "Short clones or long clones? A simulation study on the use of paired reads in metagenomics"
"This paper addresses the problem of taxonomical analysis of paired reads. We describe a new feature of our metagenome analysis software MEGAN that allows one to process sequencing reads in pairs and makes assignments of such reads based on the combined bit scores of their matches to reference sequences. Using this new software in a simulation study, we investigate the use of Illumina paired-sequencing in taxonomical analysis and compare the performance of single reads, short clones and long clones. In addition, we also compare against simulated Roche-454 sequencing runs."
"Our study suggests that a higher percentage of Illumina paired reads than of Roche-454 single reads are correctly assigned to species."
"The gain of long-clone data (75 bp paired reads) over long single-read data (250 bp reads) is still significant at ≈ 4% (not shown)."
of course more importantly
"The authors declare that they have no competing interests."
I am not sure if such a program exists, but I wonder if there is an aligner that takes into account the expected distance between mate pairs / paired ends. Theoretically it should improve mapping, but by how much is unknown.
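To make the "combined bit scores" idea from the abstract a bit more concrete, here is a toy sketch of how I picture it. This is my own simplification, not MEGAN's actual algorithm: the function names are hypothetical, and the input is assumed to be one dictionary per mate, mapping a reference name to that mate's best bit score.

```python
from collections import defaultdict


def combine_pair_scores(hits_mate1, hits_mate2):
    """Sum the bit scores of both mates for every reference hit by either one."""
    combined = defaultdict(float)
    for hits in (hits_mate1, hits_mate2):
        for reference, score in hits.items():
            combined[reference] += score
    return dict(combined)


def best_reference(hits_mate1, hits_mate2):
    """Return the reference with the highest combined score, or None if no hits."""
    combined = combine_pair_scores(hits_mate1, hits_mate2)
    if not combined:
        return None
    return max(combined, key=combined.get)


if __name__ == "__main__":
    # Made-up scores: mate 1 alone slightly prefers species_B,
    # but mate 2 only matches species_A.
    mate1 = {"species_A": 60.0, "species_B": 62.0}
    mate2 = {"species_A": 58.0}
    print(best_reference(mate1, {}))
    print(best_reference(mate1, mate2))
```

In the made-up example, the first mate on its own points at species_B, but the pair together points at species_A, which is the kind of gain the paper reports for paired reads.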
Wednesday, 26 May 2010
A scientific spectator's guide to next-generation sequencing
ROFL
I love the title!
A scientific spectator's guide to next-generation sequencing
Dr Keith not only looks at next gen sequencing but also the emerging technologies of single molecule sequencing. Interesting read!
My fave parts of the review
"Finally, there is the cost per base, generally expressed in a cost per human genome sequenced at approximately 40X coverage. To show one example of how these trade off, the new PacBio machine has a great cost per sample (~U$100) and per run (you can run just one sample) but a poor cost per human genome – you’d need around 12,000 of those runs to sequence a human genome (~U$120K). In contrast, one can buy a human genome on the open market for U$50K and sub U$10K genomes will probably be generally available this year."
"Length is critical to genome sequencing and RNA-seq experiments, but really short reads in huge numbers are what counts for DGE/SAGE and many of the functional tag sequencing methods. Technologies with really long reads tend not to give as many, and with all of them you can always choose a much shorter run to enable the machine to be turned over to another job sooner – if your application doesn’t need long reads."
Wednesday, 7 April 2010
Comparing NGS platforms, 454, Solexa, SOLiD
Inspired by Albert's work at http://ngsbuzz.blogspot.com/
Please post discrepancies or views in comments
Friday, 26 February 2010
Illumina assembly using velvet
UC Davis has a wiki on this.
Covers Single End and Paired End
Labels:
assembly,
de novo,
Illumina,
Next Generation Sequencing,
Solexa