Thursday, 31 May 2012
Getting Genetics Done: Monday Links: 23andMe, RStudio, PacBio+Galaxy, Data Science One-Liners, Post-Linkage RFA, SSH
And finally, simple instructions for setting up password-free SSH login http://linuxproblem.org/art_9.html
Wednesday, 30 May 2012
Sichuan Agricultural University and LC Sciences Uncover the Epigenetics of Obesity - Houston Chronicle
PRWeb
Published 09:06 a.m., Tuesday, May 29, 2012
Press Release
Hangzhou, China (PRWEB) May 29, 2012
In a new study published online in Nature Communications, researchers from Sichuan Agricultural University and LC Sciences report the miRNAome in porcine adipose and muscle tissues. The report provides a valuable epigenomic source for obesity prediction and prevention and furthers the development of pig as a model organism for human obesity research[1].
Scientists now know that the genetic code alone isn't responsible for adult phenotype or even the offspring of these adults. Epigenetics refers to changes in gene expression affecting phenotype that don't involve changes to the DNA nucleotide sequence itself, and yet are heritable. DNA methylation, histone modification and microRNA (miRNA) expression are examples of epigenetic mechanisms that have recently been identified as important regulators of gene expression in many biological systems.
Obesity is a huge problem worldwide. Recently, the World Health Organization reported that obesity levels doubled in every region of the world between 1980 and 2008, spurring rates of non-communicable diseases such as diabetes and cancer that now account for almost two out of three deaths globally. It has become evident that epigenetic factors, such as DNA methylation and miRNA expression, have essential roles in obesity development.
Now, a team led by researchers at the Institute of Animal Genetics and Breeding, Sichuan Agricultural University, China, has used a pig model to investigate the systematic association between epigenetic regulators and obesity. Pigs are an excellent model system for studying obesity because their physiology is similar to ours, including metabolic features, cardiovascular systems, and proportional organ sizes. The researchers generated a genome-wide DNA methylation map as well as miRNA expression and gene expression maps for adipose and muscle tissues from three pig breeds living within comparable environments but displaying distinct fat levels.
Genome-wide identification and expression analysis of heat-responsive and novel microRNAs in Populus tomentosa.
Gene. 2012 May 24. [Epub ahead of print]
Chen L, Ren Y, Zhang Y, Xu J, Sun F, Zhang Z, Wang Y.
Abstract
Plant microRNAs have a vital role in various abiotic stress responses by regulating gene expression. Heat stress is one of the most severe abiotic stresses, and affects plant growth and development, even leading to death. To identify heat-responsive miRNAs at the genome-wide level in Populus, Solexa sequencing was employed to sequence two libraries from Populus tomentosa, treated and untreated by heat stress. Sequence analysis identified 134 conserved miRNAs belonging to 30 miRNA families, and 16 novel miRNAs belonging to 14 families. Among these miRNAs, 52 miRNAs from 15 families were responsive to heat stress and most of them were down-regulated. qRT-PCR analysis confirmed that the conserved and novel miRNAs were expressed in P. tomentosa, and revealed similar expression trends to the Solexa sequencing results obtained under heat stress. One hundred and nine targets of the novel miRNAs were predicted. This study opens up a new avenue for understanding the regulatory mechanisms of miRNAs involvement in the heat stress response of trees. Copyright © 2012 Elsevier B.V. All rights reserved.
PMID: 22634103 [PubMed - as supplied by publisher]
Tuesday, 29 May 2012
TaxMan: a server to trim rRNA reference databases and inspect taxonomic coverage.
Nucleic Acids Res. 2012 May 22. [Epub ahead of print]
Brandt BW, Bonder MJ, Huse SM, Zaura E.
Source
Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands, Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands and Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA, USA.
Abstract
Amplicon sequencing of the hypervariable regions of the small subunit ribosomal RNA gene is a widely accepted method for identifying the members of complex bacterial communities. Several rRNA gene sequence reference databases can be used to assign taxonomic names to the sequencing reads using BLAST, USEARCH, GAST or the RDP classifier. Next-generation sequencing methods produce ample reads, but they are short, currently ∼100-450 nt (depending on the technology), as compared to the full rRNA gene of ∼1550 nt. It is important, therefore, to select the right rRNA gene region for sequencing. The primers should amplify the species of interest and the hypervariable regions should differentiate their taxonomy. Here, we introduce TaxMan: a web-based tool that trims reference sequences based on user-selected primer pairs and returns an assessment of the primer specificity by taxa. It allows interactive plotting of taxa, both amplified and missed in silico by the primers used. Additionally, using the trimmed sequences improves the speed of sequence matching algorithms. The smaller database greatly improves run times (up to 98%) and memory usage, not only of similarity searching (BLAST), but also of chimera checking (UCHIME) and of clustering the reads (UCLUST). TaxMan is available at http://www.ibi.vu.nl/programs/taxmanwww/.
PMID: 22618877 [PubMed - as supplied by publisher]
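To make the "trim reference sequences based on primer pairs" idea concrete, here is a minimal Python sketch of in silico trimming. It is not TaxMan's implementation: it assumes exact primer matches only (no mismatches or degenerate bases), and the function names are made up for illustration.

```python
from typing import Optional

# Watson-Crick complements for unambiguous DNA bases only (an assumption of this sketch)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_to_amplicon(reference: str, forward: str, reverse: str) -> Optional[str]:
    """Return the region between the two primers, or None if either primer
    has no exact match (i.e. the reference is 'missed' by this primer pair)."""
    ref = reference.upper()
    start = ref.find(forward.upper())
    if start == -1:
        return None
    inner_start = start + len(forward)
    # The reverse primer anneals to the opposite strand, so on the reference
    # strand we look for its reverse complement, downstream of the forward primer.
    end = ref.find(reverse_complement(reverse.upper()), inner_start)
    if end == -1:
        return None
    return ref[inner_start:end]


# Toy reference: TT + ACGT (forward primer) + AAGGCCTTAA + GGTCA + CC
amplicon = trim_to_amplicon("TTACGTAAGGCCTTAAGGTCACC", "ACGT", "TGACC")
```

Running this over every sequence in a reference database and grouping the `None` results by taxon would give a crude version of the "amplified vs. missed" view the abstract describes.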
VarioWatch: providing large-scale and comprehensive annotations on human genomic variants in the next generation sequencing era.
Nucleic Acids Res. 2012 May 22. [Epub ahead of print]
Cheng YC, Hsiao FC, Yeh EC, Lin WJ, Tang CY, Tseng HC, Wu HT, Liu CK, Chen CC, Chen YT, Yao A.
Source
National Center for Genome Medicine and Institute of Biomedical Sciences, Academia Sinica, Taiwan 11529, R.O.C.
Abstract
VarioWatch (http://genepipe.ncgm.sinica.edu.tw/variowatch/) has been vastly improved since its former publication GenoWatch in the 2008 Web Server Issue. It is now at least 10 000-times faster in annotating a variant. Drastic speed increase, through complete re-design of its working mechanism, makes VarioWatch capable of annotating millions of human genomic variants generated from next generation sequencing in minutes, if not seconds. While using MegaQuery of VarioWatch to quickly annotate variants, users can apply various filters to retrieve a subgroup of variants according to the risk levels, interested regions, etc. that satisfy users' requirements. In addition to performance leap, many new features have also been added, such as annotation on novel variants, functional analyses on splice sites and in/dels, detailed variant information in tabulated form, plus a risk level decision tree regarding the analyzed variant. Up to 1000 target variants can be visualized with our carefully designed Genome View, Gene View, Transcript View and Variation View. Two commonly used reference versions, NCBI build 36.3 and NCBI build 37.2, are supported. VarioWatch is unique in its ability to annotate comprehensively and efficiently millions of variants online, immediately delivering the results in real time, plus visualizes up to 1000 annotated variants.
PMID: 22618869 [PubMed - as supplied by publisher]
Haploinsufficiency of CELF4 at 18q12.2 is associated with developmental and behavioral disorders, seizures, eye manifestations, and obesity.
Eur J Hum Genet. 2012 May 23. doi: 10.1038/ejhg.2012.92. [Epub ahead of print]
Halgren C, Bache I, Bak M, Myatt MW, Anderson CM, Brøndum-Nielsen K, Tommerup N.
Source
Department of Cellular and Molecular Medicine, Wilhelm Johannsen Centre for Functional Genome Research, University of Copenhagen, Faculty of Health Sciences, Copenhagen, Denmark.
Abstract
Only 20 patients with deletions of 18q12.2 have been reported in the literature and the associated phenotype includes borderline intellectual disability, behavioral problems, seizures, obesity, and eye manifestations. Here, we report a male patient with a de novo translocation involving chromosomes 12 and 18, with borderline IQ, developmental and behavioral disorders, myopia, obesity, and febrile seizures in childhood. We characterized the rearrangement with Affymetrix SNP 6.0 Array analysis and next-generation mate pair sequencing and found truncation of CELF4 at 18q12.2. This second report of a patient with a neurodevelopmental phenotype and a translocation involving CELF4 supports that CELF4 is responsible for the phenotype associated with deletion of 18q12.2. Our study illustrates the utility of high-resolution genome-wide techniques in identifying neurodevelopmental and neurobehavioral genes, and it adds to the growing evidence, including a transgenic mouse model, that CELF4 is important for human brain development. European Journal of Human Genetics advance online publication, 23 May 2012; doi:10.1038/ejhg.2012.92.
PMID: 22617346 [PubMed - as supplied by publisher]
pypeFLOW is light weight and reusable make / flow data process library written in Python.
https://github.com/cschin/pypeFLOW
What is pypeFLOW
pypeFLOW is a lightweight, reusable make / flow data processing library written in Python.
Most bioinformatics or general data analyses involve many steps: combining data files, converting files between formats, and calculating statistics with a variety of tools. Ian Holmes has a great summary of, and opinions about, bioinformatics workflows at http://biowiki.org/BioinformaticsWorkflows. Interestingly, such an analysis workflow closely resembles building software without an IDE. Using a makefile to manage a bioinformatics workflow is a great way to produce reproducible and reusable analysis procedures. Combined with a proper version control tool, it lets one manage a diverse set of data and tools over the life of a project, especially when there are complicated dependencies among the data, tools and custom code for the analysis tasks.
However, using "make" and a makefile implies that every data analysis step is carried out by a command-line tool. If you have custom analysis tasks, you have to write scripts and turn them into command-line tools. In my personal experience, it is convenient to bypass that burden and combine those quick and simple steps in a single script. The only caveat is that if an analyst does not save the results of intermediate steps, he or she has to repeat the computation from the beginning for every step. This wastes a lot of computation cycles and personal time. The solution is simple: just as in the traditional software build process, one has to track the dependencies, analyze them, and reprocess only the parts necessary to get up-to-date final results.
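The dependency-tracking idea in that last paragraph can be sketched in a few lines of plain Python. This is not pypeFLOW's API (see the GitHub repo for that); it is a make-style illustration using file modification times, and `is_stale` and `run_if_stale` are hypothetical names.

```python
import os
from typing import Callable, Sequence


def is_stale(inputs: Sequence[str], outputs: Sequence[str]) -> bool:
    """True if any output is missing or older than the newest input."""
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return True
    newest_input = max((os.path.getmtime(p) for p in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input


def run_if_stale(inputs: Sequence[str], outputs: Sequence[str],
                 task: Callable[[], None]) -> bool:
    """Run task only when its outputs are out of date; return whether it ran."""
    if is_stale(inputs, outputs):
        task()
        return True
    return False
```

Each step of an analysis script would be wrapped in `run_if_stale(...)`, so that re-running the whole script only recomputes the steps whose inputs changed. Timestamps are the same heuristic make uses; content hashes would be more robust but cost more to compute.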
How Not To Be A Bioinformatician Source Code for Biology and Medicine 2012, 7:3 doi:10.1186/1751-0473-7-3
LMAO
"Be unreachable and isolated. Configure your contact email to either bounce back or
Was this even necessary to put in the paper?
BPS: Men with brown eyes are perceived as more dominant, but it's not because their eyes are brown
Sixty-two student participants, half of them female, rated the dominance and/or attractiveness of the photographed faces of forty men and forty women. All models were Caucasian, and all of them were holding a neutral expression. Men with brown eyes were rated consistently as more dominant than blue-eyed men. No such effect of eye-colour was found for the photos of women. Eye colour also bore no association to the attractiveness ratings.
Next the researchers used Photoshop to give the brown-eyed men blue eyes and the blue-eyed men brown eyes. The photos were then rated by a new batch of participants. The intriguing finding here was that the dominance ratings were left largely unaffected by the eye colour manipulation. The men who really had brown eyes, but thanks to Photoshop appeared with blue eyes, still tended to be rated as more dominant.
Monday, 28 May 2012
a visual dictionary of R Graphs with code and thumbnails!
SCORE-Seq: Score-Type Tests for Detecting Disease Associations With Rare Variants in Sequencing Studies
PROGRAM: seqtk for sampling, trimming, fastq2fasta, subsequence, reverse complement and more
Seqtk supports both fasta and fastq input files, which can be optionally gzip compressed. Each module is perhaps the most efficient among tools of the same functionality. For example, I know fasta-to-fastq is 10X faster than another converter, while being more flexible.
Seqtk is implemented in a single .c file and two header files and only depends on zlib. The source code is freely available here (MIT license):
https://github.com/lh3/seqtk
Heng
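For anyone curious what a FASTQ-to-FASTA conversion involves, here is a rough Python sketch. It is in no way seqtk's C implementation (and will be far slower); it assumes strict 4-line FASTQ records, and the function names are mine.

```python
import gzip
from typing import IO, Iterator, Tuple


def open_maybe_gzip(path: str) -> IO[str]:
    """Open a plain or gzip-compressed text file, detected by its magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: IO[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual


def fastq_to_fasta(in_path: str, out_path: str) -> int:
    """Write every read as a FASTA record; return the number of reads written."""
    count = 0
    with open_maybe_gzip(in_path) as src, open(out_path, "w") as dst:
        for name, seq, _qual in read_fastq(src):
            dst.write(f">{name}\n{seq}\n")
            count += 1
    return count
```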
Sunday, 27 May 2012
The Three Sexy Skills of Data Geeks « Dataspora
"The sexy job in the next ten years will be statisticians… The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that's going to be a hugely important skill."
http://www.dataspora.com/2009/05/sexy-data-geeks/
Fwd: A new approach for detecting low-level mutations in next-generation sequence data.
Genome Biol. 2012 May 23;13(5):R34. [Epub ahead of print]
Li M, Stoneking M.
Abstract
We propose a new method that incorporates population re-sequencing data, distribution of reads, and strand bias in detecting low-level mutations. The method can accurately identify low-level mutations down to a level of 2.3%, with an average coverage of 500x, and with a false discovery rate of less than 1%. In addition, we also discuss other problems in detecting low-level mutations, including chimeric reads and sample cross-contamination, and provide possible solutions to them.
PMID: 22621726 [PubMed - as supplied by publisher]
[Velvet-users] Velvet 1.2.06 no need for interleaving of paired end reads
Date: 24 May 2012 10:18
Subject: [Velvet-users] Velvet 1.2.06
Dear Velvet users,
Torsten Seemann and David Powell from Monash University have been
cleaning up Velvet code, available as usual on github or
www.ebi.ac.uk/~zerbino/velvet/velvet_latest.tgz
They cleaned up the parsing code, added some unit tests, but especially
added a feature which many people have clamored for a long time: the
interleaving of paired-end files is no longer necessary. By default,
Velvet's behavior stays the same but with the '-separate' flag, you can
now provide pairs of files, as in:
velveth Assem 31 -shortPaired -fasta -separate left.fa right.fa
Many thanks to Torsten and David for their work,
Best regards,
Daniel
_______________________________________________
Velvet-users mailing list
http://listserver.ebi.ac.uk/mailman/listinfo/velvet-users
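For context, this is roughly the interleaving step that the '-separate' flag makes unnecessary: merging left and right FASTA files so that mates alternate in one file. A minimal Python sketch, assuming both files list mates in the same order; the function names are illustrative and not part of Velvet.

```python
from itertools import zip_longest
from typing import IO, Iterator, Tuple


def read_fasta(handle: IO[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, joining multi-line sequences."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def interleave(left_path: str, right_path: str, out_path: str) -> int:
    """Write left/right mates alternately; return the number of pairs written."""
    pairs = 0
    with open(left_path) as left, open(right_path) as right, \
            open(out_path, "w") as out:
        for l_rec, r_rec in zip_longest(read_fasta(left), read_fasta(right)):
            if l_rec is None or r_rec is None:
                raise ValueError("left and right files contain different numbers of reads")
            for name, seq in (l_rec, r_rec):
                out.write(f">{name}\n{seq}\n")
            pairs += 1
    return pairs
```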
Saturday, 26 May 2012
What can long reads tell us about centromere evolution?
Thursday, 24 May 2012
Have you heard of ReadCube? PDF / journal organizer
Looks interesting, though I use Foxit for PDF annotation.
http://www.readcube.com/
- Let ReadCube organize your article collection
- Import article PDFs from your computer.
- Your articles immediately become full-text searchable so you can find what you want.
- ReadCube will automatically identify the author, title, and journal citation information of every article.
Differential confounding of rare and common variants in spatially structured populations.
Nat Genet. 2012 Feb 5;44(3):243-6. doi: 10.1038/ng.1074.
Mathieson I, McVean G.
Source
Wellcome Trust Centre for Human Genetics, University of Oxford, UK. mathii@well.ox.ac.uk
Abstract
Well-powered genome-wide association studies, now made possible through advances in technology and large-scale collaborative projects, promise to characterize the contribution of rare variants to complex traits and disease. However, while population structure is a known confounder of association studies, it remains unknown whether methods developed to control stratification are equally effective for rare variants. Here, we demonstrate that rare variants can show a stratification that is systematically different from, and typically stronger than, common variants, and this is not necessarily corrected by existing methods. We show that the same process leads to inflation for load-based tests and can obscure signals at truly associated variants. Furthermore, we show that populations can display spatial structure in rare variants, even when Wright's fixation index F(ST) is low, but that allele frequency-dependent metrics of allele sharing can reveal localized stratification. These results underscore the importance of collecting and integrating spatial information in the genetic analysis of complex traits.
PMID: 22306651 [PubMed - indexed for MEDLINE]
Wednesday, 23 May 2012
An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People.
Science. 2012 May 17. [Epub ahead of print]
Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, Verzilli C, Shen J, Tang Z, Bacanu SA, Fraser D, Warren L, Aponte J, Zawistowski M, Liu X, Zhang H, Zhang Y, Li J, Li Y, Li L, Woollard P, Topp S, Hall MD, Nangle K, Wang J, Abecasis G, Cardon LR, Zöllner S, Whittaker JC, Chissoe SL, Novembre J, Mooser V.
Source
Quantitative Sciences, GlaxoSmithKline, RTP, NC, USA; Upper Merion, PA, USA; and Stevenage, UK.
Abstract
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (one every 17 bases) and geographically localized, such that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. Overall, we conclude that, due to rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
PMID: 22604722 [PubMed - as supplied by publisher]
Cyber-T web server: differential analysis of high-throughput data.
Tuesday, 22 May 2012
annoyance at 'case-insensitiveness' of Mac terminal shell
BUT all this is gone in MacOS :/
data visualisation: Getting matplotlib on MacOS X
But omg, I thought installing stuff on Ubuntu was troublesome; Macs actually take the trouble up a notch.
Gonna leave it unfinished .. but here's what I tried, in chronological order of discovery:
python eggs = FAILED (not sure why the script insists that it doesn't have write permissions to create files)
next up was trying this helpful post
Installing matplotlib in Lion
http://the.taoofmac.com/space/blog/2011/07/24/2222
oh okay I need homebrew
http://mxcl.github.com/homebrew/
https://github.com/mxcl/homebrew/wiki/installation
(oh wow I didn't know ruby is installed by default)
but hit another snag as per below ..
continue another day ..
Press enter to continue
==> /usr/bin/sudo /bin/chmod g+rwx /usr/local/. /usr/local/bin /usr/local/lib
==> /usr/bin/sudo /usr/bin/chgrp admin /usr/local/. /usr/local/bin /usr/local/lib
==> Downloading and Installing Homebrew...
==> Installation successful!
You should run `brew doctor' *before* you install anything.
Now type: brew help
------------------------------------------------------------------------------------------------------------------------------------ 03:07:56
k@k:~$ brew doctor
Error: You have no /usr/bin/cc.
This means you probably can't build *anything*. You need to install the Command
Line Tools for Xcode. You can either download this from http://connect.apple.com
or install them from inside Xcode's Download preferences. Homebrew does not
require all of Xcode! You only need the Command Line Tools package!
Error: Git could not be found in your PATH.
Homebrew uses Git for several internal functions, and some formulae use Git
checkouts instead of stable tarballs. You may want to install Git:
brew install git
Error: Your compilers are different from the standard versions for your Xcode.
If you have Xcode 4.3 or newer, you should install the Command Line Tools for
Xcode from within Xcode's Download preferences.
Otherwise, you should reinstall Xcode.
Error: Your Xcode is configured with an invalid path.
You should change it to the correct path. Please note that there is no correct
path at this time if you have *only* installed the Command Line Tools for Xcode.
If your Xcode is pre-4.3 or you installed the whole of Xcode 4.3 then one of
these is (probably) what you want:
sudo xcode-select -switch /Developer
sudo xcode-select -switch /Applications/Xcode.app/Contents/Developer
DO NOT SET / OR EVERYTHING BREAKS!
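Note to self: the first two errors boil down to "no C compiler, no git". A quick Python sketch to check for both before retrying. It only looks on the PATH, whereas brew doctor checks /usr/bin/cc specifically, so treat it as a rough pre-check; the function names are mine.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def report(found: Dict[str, Optional[str]]) -> List[str]:
    """Format one human-readable status line per tool."""
    return [
        f"{name}: {'found at ' + path if path else 'MISSING'}"
        for name, path in found.items()
    ]


if __name__ == "__main__":
    # The two tools brew doctor complained about above
    for line in report(find_tools(["cc", "git"])):
        print(line)
```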
Saturday, 19 May 2012
[Denovoassembler-users] Ray v2.0.0-rc7 is available online !
From: Sébastien Boisvert
Date: Thu, May 17, 2012 at 11:02 PM
Subject: [Denovoassembler-users] Ray v2.0.0-rc7 is available online !
Hello !
I am proud to announce the immediate availability of the Ray assembler
version 2.0.0 release candidate 7, code name "Dark Astrocyte of Knowledge".
This version ships with RayPlatform v1.0.2, code name "Timely Gate of
Yields".
Link for download: http://denovoassembler.
Changes in Ray
* The CMakeList file was updated.
* GC content for contigs are dumped in XML files.
* New option -one-color-per-file for graph coloring.
* Optimized file system input/output operations.
* Network testing is more verbose.
* Fixed an integer overflow bug in the scaffolder.
* New guide in Documentation/ for software message routing.
* Fixed an integer overflow bug in the profiler.
* Fixed a synchronization bug in the coloring algorithm.
* Increased the sensitivity of the biological profiling algorithms.
* Disabled the plugin for neighbourhoods.
* New plugin to compute gene ontology profiles.
* Added various missing code headers.
* Simplified the plugin creation process.
* Fixed some divisions by 0.
* Fixed a synchronization bug for gene ontology.
* Added simple profile files for sequence abundance, taxonomy profiles
and gene ontology profiles.
* A bug that caused k-mers with >= 65536 coverage to have less coverage
was fixed.
---> This was a long-standing bug that caused some issues.
* Added some datatypes.
Changes in RayPlatform
* Command line arguments can be obtained.
* Simplified the plugin creation process.
* Fixed two divisions by 0.
* Added some datatypes.
seb
______________________________
Denovoassembler-users mailing list
https://lists.sourceforge.net/
Friday, 18 May 2012
[BioRuby] New biogems for IonTorrent, pileup files, pfam and hmmer
> Hi guys,
>
> Here's some blatant advertising for some code I've recently written in
> biogem form.
>
> bio-gag: "gag error" is the term I've coined to describe an error that
> various people have observed on certain sequencing kits with IonTorrent,
> though it has not previously been characterised very well that I know of
> (we noticed that the errors seemed to occur at GAG positions in the reads
> that were supposed to be GAAG). This biogem tries to find and fix these
> errors. It isn't benchmarked for accuracy but worked well enough for my
> lab's own purposes. Actually to be honest we've only used an older version
> of the software on real data and the logic has changed a little since, given some
> recent evidence we have, but I thought I'd push it out with the latest and
> greatest error model.
> https://github.com/wwood/bioruby-gag
>
> bio-pileup_iterator: To find gag errors bio-gag iterates through pileup
> files looking for particular patterns e.g. strand bias of insertions. This
> gem can be used to iterate through pileup files one position (one line) at
> a time, building up the sequence of each read as it goes, recording their
> direction etc. Probably not the fastest piece of code in the world, sorry.
> I'm not sure whether this should/can be incorporated into bio-samtools? It
> adds functionality - there's no duplication (I don't think).
> https://github.com/wwood/bioruby-pileup_iterator
>
> bio-hmmer_model: This is a parser of HMM files e.g. from PFAM according to
> the hmmer v3 manual.
> https://github.com/wwood/bioruby-hmmer_model
>
> bio-hmmer3_report: Parsing of HMMER3 result files. Currently only handles
> tabular format files - the guts of this were written by Christian - see
> yesterday's thread for details. I'm hoping to add regular (non-tabular)
> format parsing in the near future, but no promises.
> https://github.com/wwood/bioruby-hmmer3_report
>
> I'm sure there are bugs and deficiencies - apologies in advance.
>
> Enjoy,
> ben
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> http://lists.open-bio.org/mailman/listinfo/bioruby
Francesco
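To get a feel for what "iterating through pileup files looking for strand bias of insertions" means, here is a small Python sketch (the gems themselves are Ruby, and this is not their API). It assumes the samtools pileup text format: tab-separated chromosome, position, reference base, depth, read bases, qualities, with insertions written as +<length><sequence> in upper case for forward-strand reads and lower case for reverse-strand reads. All names are made up for illustration.

```python
import re
from typing import IO, Iterator, NamedTuple, Tuple

DIGITS = re.compile(r"\d+")


class InsertionCounts(NamedTuple):
    chrom: str
    pos: int
    forward: int
    reverse: int


def count_insertions(bases: str) -> Tuple[int, int]:
    """Count insertions on the forward and reverse strand in a read-bases column."""
    forward = reverse = 0
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2  # skip '^' and the mapping-quality character after it
        elif ch in "+-":
            match = DIGITS.match(bases, i + 1)
            if match is None:
                i += 1  # malformed indel marker; skip it
                continue
            length = int(match.group())
            indel_seq = bases[match.end():match.end() + length]
            if ch == "+":
                if indel_seq.isupper():
                    forward += 1
                else:
                    reverse += 1
            i = match.end() + length
        else:
            i += 1
    return forward, reverse


def insertions_by_position(handle: IO[str]) -> Iterator[InsertionCounts]:
    """Yield per-position insertion counts, one pileup line at a time."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos, bases = fields[0], fields[1], fields[4]
        fwd, rev = count_insertions(bases)
        if fwd or rev:
            yield InsertionCounts(chrom, int(pos), fwd, rev)
```

A position where nearly all insertions sit on one strand would be a candidate for the kind of systematic error described above, though what threshold to use is left open here.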
Thursday, 17 May 2012
Ion Torrent complains to Nature Biotech about bias in Loman paper - SEQanswers
Nick Loman's original paper comparing desktop sequencers (which we've discussed here before) has been the subject of controversy. LifeTech apparently didn't like how the study was presented, and wrote a letter to Nature about the study.
Other takes on it:
- GenomeWeb article
- flxlex's blog
PLoS: Gene Mapping via Bulked Segregant RNA-Seq (BSR-Seq)
Read the open-access, full-text article here:
http://dx.plos.org/10.1371/journal.pone.0036406
Abstract:
Bulked segregant analysis (BSA) is an efficient method to rapidly and efficiently map genes responsible for mutant phenotypes. BSA requires access to quantitative genetic markers that are polymorphic in the mapping population. We have developed a modification of BSA (BSR-Seq) that makes use of RNA-Seq reads to efficiently map genes even in populations for which no polymorphic markers have been previously identified. Because of the digital nature of next-generation sequencing (NGS) data, it is possible to conduct de novo SNP discovery and quantitatively genotype BSA samples by analyzing the same RNA-Seq data using an empirical Bayesian approach. In addition, analysis of the RNA-Seq data provides information on the effects of the mutant on global patterns of gene expression at no extra cost. In combination these results greatly simplify gene cloning experiments. To demonstrate the utility of this strategy BSR-Seq was used to clone the glossy3 (gl3) gene of maize. Mutants of the glossy loci exhibit altered accumulation of epicuticular waxes on juvenile leaves. By subjecting the reference allele of gl3 to BSR-Seq, we were able to map the gl3 locus to an ~2 Mb interval. The single gene located in the ~2 Mb mapping interval whose expression was down-regulated in the mutant pool was subsequently demonstrated to be the gl3 gene via the analysis of multiple independent transposon induced mutant alleles. The gl3 gene encodes a putative myb transcription factor, which directly or indirectly affects the expression of a number of genes involved in the biosynthesis of very-long-chain fatty acids.
Copy number variation detection and genotyping from exome sequence data.
Source
University of Washington
Abstract
While exome sequencing is readily amenable to single-nucleotide variant discovery, the sparse and non-uniform nature of the exome capture reaction has hindered exome-based detection and characterization of genic copy number variation. We developed a novel method using singular value decomposition (SVD) normalization to discover rare genic copy number variants (CNVs) as well as genotype copy number polymorphic (CNP) loci with high sensitivity and specificity from exome sequencing data. We estimate the precision of our algorithm using 122 trios (366 exomes) and show that this method can be used to reliably predict (94% overall precision) both de novo and inherited rare CNVs involving three or more consecutive exons. We demonstrate that exome-based genotyping of CNPs strongly correlates with whole-genome data (median r2 = 0.91), especially for loci with fewer than eight copies, and can estimate the absolute copy number of multi-allelic genes with high accuracy (78% call level). The resulting user-friendly computational pipeline, CoNIFER (copy number inference from exome reads), can reliably be used to discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.
PMID: 22585873 [PubMed - as supplied by publisher]
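The core trick in the abstract, SVD normalization, can be sketched with NumPy. This is my own minimal reading of the idea, not the CoNIFER pipeline: it assumes `depth` is a samples x exons matrix of already length- and library-size-normalized read depth, and how many components to remove (`n_remove`) is left to the caller.

```python
import numpy as np


def svd_normalize(depth: np.ndarray, n_remove: int) -> np.ndarray:
    """Remove the n_remove strongest singular components from a
    samples x exons depth matrix, returning standardized residual signal."""
    # Standardize each exon (column) across samples so exons are comparable
    mean = depth.mean(axis=0)
    std = depth.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    z = (depth - mean) / std

    # Decompose, zero out the largest singular values, and reconstruct
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    s = s.copy()
    s[:n_remove] = 0.0
    return (u * s) @ vt
```

The intuition: the strongest components mostly capture capture-efficiency and batch effects shared across many exons, so removing them should leave rare, sample-specific deviations (candidate CNVs) standing out. Runs of consecutive exons with large positive or negative residuals would then be the calls.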
Tackling formalin-fixed, paraffin-embedded tumor tissue with next-generation sequencing.
Source
Departments of Pathology and Molecular and Medical Genetics, and Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon.
Abstract
Most tumor samples available for clinical genotyping are formalin-fixed and paraffin-embedded (FFPE), but there has been relatively little published on the suitability of such samples for next-generation sequencing approaches. A new study by Wagle and colleagues shows that a combination of hybridization-capture and deep sequencing yields high-quality data from FFPE specimens. Cancer Discovery; 2(1); 23-4. ©2012 AACR.
PMID: 22585165 [PubMed - in process]