Tuesday, 31 July 2012
Picard release 1.74
http://picard.sourceforge.net/
30 July 2012
- Added a new "ProgressLogger" class that facilitates more useful and
standard progress logging for any program that iterates through a stream
of SAMRecords. Adapted most command line programs to use it.
- Add support for targetedPcrMetrics and collected common HsMetrics and
TargetedPcrMetrics behavior into TargetMetricsCollector
- New program CollectTargetedPcrMetrics
- MultiHitAlignedReadIterator.java: Handle case where an alignment
record has no cigar elements that consume both the read and the
reference (e.g. the read is all soft-clipped)
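The all-soft-clipped case in the last item can be illustrated with a short check on a CIGAR string. This is only a sketch, not Picard's code: the function name `has_aligned_bases` is hypothetical, and the operator set follows the SAM specification, where M, = and X are the operators that consume both the read and the reference.

```python
import re

# Per the SAM specification, these CIGAR operators consume both the read
# (query) and the reference: M (alignment match), = (match), X (mismatch).
CONSUMES_BOTH = {"M", "=", "X"}

# One CIGAR element is a length followed by a single operator character.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def has_aligned_bases(cigar: str) -> bool:
    """Return True if any CIGAR element consumes both read and reference.

    Hypothetical helper for illustration; "*" means the CIGAR is unavailable.
    """
    if cigar == "*":
        return False
    return any(op in CONSUMES_BOTH for _, op in CIGAR_ELEMENT.findall(cigar))


# A fully soft-clipped read has no such element, a normal alignment does.
print(has_aligned_bases("36S"))    # all soft-clipped
print(has_aligned_bases("5S31M"))  # soft clip followed by aligned bases
```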
Monday, 30 July 2012
bioawk: AWK for gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names
Alerted to this on biostars.org
About bioawk
awk -c fastq '{ print $seq }' test.fq
awk -c bed '{ print $end - $start }' test.bed
man ./awk.1
Usage Examples
- Extract unmapped reads without header:
awk -c sam 'and($flag,4)' aln.sam.gz
- Extract mapped reads with header:
awk -c sam -H '!and($flag,4)'
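For readers without bioawk, the same flag test can be sketched in Python. This is an illustration under assumptions, not part of bioawk: the function name `filter_sam` and its parameters are made up here, and the only fact relied on is that bit 0x4 of the SAM FLAG field (second column) marks an unmapped read.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def filter_sam(lines, want_mapped, keep_header=False):
    """Yield SAM lines that are mapped (or unmapped), optionally with header.

    Hypothetical helper: `lines` is any iterable of SAM text lines.
    """
    for line in lines:
        if line.startswith("@"):  # header lines start with '@'
            if keep_header:
                yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        is_unmapped = bool(flag & UNMAPPED)
        if is_unmapped != want_mapped:
            yield line


# Tiny in-memory demo with made-up records.
demo = ["@HD\tVN:1.0\n", "r1\t4\t*\n", "r2\t0\tchr1\n"]
print(list(filter_sam(demo, want_mapped=False)))
print(list(filter_sam(demo, want_mapped=True, keep_header=True)))
```

For a gzip'ed file such as aln.sam.gz, the lines could come from `gzip.open("aln.sam.gz", "rt")` in the standard library.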
Saturday, 28 July 2012
Speed up Chromium downloading in Ubuntu with PyAxelWS Accelerator
http://ubuntuguide.net/speed-up-chromium-downloading-in-ubuntu-with-pyaxelws-accelerator
The original pyaxel is a CLI-based download accelerator in Python that works seamlessly in networks that are behind proxy servers and for protocols like http and ftp. Features present in pyaxel 0.1 are accelerated downloads, persistent reconnection, resumable downloads, download speed limiting, and download progress indication. Pyaxelws is a clone of pyaxel that introduces new features such as HTML5 Websocket server implementation, a Javascript library that provides an interface to the server, as well as a client application designed for the Chrome web browser.
Reduced Bams, neatest thing out of GATK 2.0
Reducing BAMs to minimize file sizes and improve calling performance
ReduceReads is a novel (perhaps even breakthrough?) GATK2 data compression algorithm. The purpose of ReduceReads is to take a BAM file with NGS data and reduce it down to just the information necessary to make accurate SNP and indel calls, as well as genotype reference sites (hard to achieve) using GATK tools like UnifiedGenotyper or HaplotypeCaller. ReduceReads accepts as an input a BAM file and produces a valid BAM file (it works in IGV!) but with a few extra tags that the GATK can use to make accurate calls. You can find more information about reduced reads in some of our presentations in the archive.
ReduceReads works well for exomes or high-coverage (at least 20x average coverage) whole genome BAM files. In this case we highly recommend using ReduceReads to minimize the file sizes. Note that ReduceReads performs a lossy compression of the sequencing data that works well with the downstream GATK tools, but may not be supported by external tools. Also, we recommend that you archive your original BAM file, or at least a copy of your original FASTQs, as ReduceReads is highly lossy and doesn't qualify as an archive data compression format.
Using ReduceReads on your BAM files will cut down the sizes to approximately 1/100 of their original sizes, allowing the GATK to process tens of thousands of samples simultaneously without excessive IO and processing burdens. Even for single samples ReduceReads cuts the memory requirements, IO burden, and CPU costs of downstream tools significantly (10x or more) and so we recommend you preprocess analysis-ready BAM files with ReduceReads.
for each sample:
    sample.reduced.bam <- ReduceReads(sample.bam)
Thursday, 26 July 2012
Interactive Plotting with Manipulate / Advanced Topics / Knowledge Base - RStudio Support
http://support.rstudio.org/help/kb/advanced/interactive-plotting-with-manipulate
Basic Usage
The manipulate function accepts a plotting expression and a set of controls (e.g. slider, picker, or checkbox) which are used to dynamically change values within the expression. When a value is changed using its corresponding control the expression is automatically re-executed and the plot is redrawn.
For example, to create a plot that enables manipulation of a parameter using a slider control you could use syntax like this:
library(manipulate)
manipulate(plot(1:x), x = slider(1, 100))
After this code is executed the plot is drawn using an initial value of 1 for x. A manipulator panel is also opened adjacent to the plot which contains a slider control used to change the value of x from 1 to 100.
Slider Control
The slider control enables manipulation of plot variables along a numeric range. For example:
manipulate( plot(cars, xlim=c(0,x.max)), x.max=slider(15,25))
Results in this plot and manipulator:
Slider controls also support custom labels and step increments.
Wednesday, 25 July 2012
GeneTalk: an expert exchange platform for assessing rare sequence variants in personal genomes.
Bioinformatics. 2012 Jul 23. [Epub ahead of print]
GeneTalk: an expert exchange platform for assessing rare sequence variants in personal genomes.
Kamphans T, Krawitz PM.
Source: GeneTalk, Finckensteinallee 84, 12205 Berlin, Germany.
Abstract
Summary: Next-generation sequencing (NGS) has become a powerful tool in personalized medicine. Exomes or even whole genomes of patients suffering from rare diseases are screened for sequence variants. After filtering out common polymorphisms, the assessment and interpretation of detected personal variants in the clinical context is an often time consuming effort. We have developed GeneTalk, a web-based platform that serves as an expert exchange network for the assessment of personal and potentially disease relevant sequence variants. GeneTalk assists a clinical geneticist who is searching for information about specific sequence variants and connects this user to other users with expertise for the same sequence variant.
AVAILABILITY: GeneTalk is available at www.gene-talk.de. Users can login without registering in a demo account.
CONTACT: peter.krawitz@gene-talk.de.
PMID: 22826540 [PubMed - as supplied by publisher]
howto SKAT R library
Tuesday, 24 July 2012
Win an iPad by identifying the causal variant anyone?
You will be provided with an Analysis Case that includes existing Complete Genomics whole human genome sequencing data and phenotypic information. Use Ingenuity Variant Analysis to determine the causal variant. When you decide which variant is most likely causing the symptoms in the Case, submit your answer via the Feedback button within the application. All correct entries will be entered into a drawing for a chance to win an Apple iPad!
http://pages.ingenuity.com/CaseofMonthMay2012_Landingpage2.html
Saturday, 21 July 2012
MolBioLib: A C++11 Framework for Rapid - PubMed Mobile
Abstract MOTIVATION: We developed MolBioLib to address the need for adaptable next-generation sequencing analysis tools. The result is a compact, portable, and extensively tested C++11 software framework and set of applications tailored to the demands of next-generation sequencing data and applicable to many other applications. MolBioLib is designed to work with common file formats and data types used both in genomic analysis and general data analysis. A central relational-database-like Table class is a flexible and powerful object to intuitively represent and work with a wide variety of tabular datasets, ranging from alignment data to annotations. MolBioLib has been used to identify causative SNPs in whole genome sequencing, detect balanced chromosomal rearrangements, and compute enrichment of mRNAs on microtubules, typically requiring applications of under 200 lines of code.
Sequencing the genome of an entire population | ScienceNordic
The FarGen project is preparing to sequence the genetic material of the entire 50,000 population of the Faroe Islands, and could become a model for personalised medicine throughout the world. "We will not only be creating a genetic biobank but a completely new health system," says program director Bogi Eliasen. ScienceNordic
Galaxy July 20, 2012 Distribution & News Brief
Complete News Brief
http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_07_20
http://wiki.g2.bx.psu.edu/News/Jul202012%20Distribution%20News%20Brief
- Freebayes has moved from the Galaxy distribution to the Galaxy's Main Tool Shed
- EMBOSS version 5.0.0 tool dependencies in the emboss_5 repository of the Galaxy Main Tool Shed updated to include information for automatically installing.
- Tool Shed now also supports specifying the third party tool dependencies to be automatically installed in new repositories
- Admin Genome Indexing is now in BETA. Download, index, and track progress right from the admin UI!
- Improved Error Handling that captures EXIT codes, STDOUT, and STDERR from tools in XML. Be sure to read full details.
- TopHat2/Bowtie2 latest support includes option to 'report discordant pairs', updated tests, and more preset options.
- Trackster new parameter space visualization. Includes BRAND NEW Features!! More details coming soon, but give a test drive now.
new: % hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist
upgrade: % hg pull -u -r ec29ce8e27a1
Thanks for using Galaxy!
We hope to see everyone in Chicago @ GCC2012!!
The Galaxy Team
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
Crossbow 1.2.0 released
Version 1.2.0 - July 20, 2012
* Added support for Hadoop version 0.20.205.
* Dropped support for Hadoop versions prior to 0.20.
* Updated default Hadoop version for EMR jobs to 0.20.205.
* Updated Bowtie version used to 0.12.8.
* Fixed issues with streaming jar version parsing
* Fixed documentation bugs regarding --sra-toolkit option, which is
superseded by the --fastq-dump option.
http://bowtie-bio.sourceforge.net/crossbow
Thanks,
Ben
------------------------------------------------------------------------------
_______________________________________________
Bowtie-bio-announce mailing list
https://lists.sourceforge.net/lists/listinfo/bowtie-bio-announce
Friday, 20 July 2012
GATK 2.0 On July 23rd, 2012
GATK 2.0
On July 23rd, 2012, the Genome Sequencing and Analysis (GSA) team will release a beta of GATK 2.0. GATK 2.0 includes all of the original GATK 1.x tools as well as many newer and more advanced tools for error modeling, data compression, and variant calling:
- Base quality score recalibration (BQSR) v2, an upgrade to BQSR that generates a base substitution, insertion, and deletion error model.
- ReduceReads, a BAM compression algorithm that reduces file sizes by 20x-100x while preserving all information necessary for accurate SNP and indel calling. ReduceReads enables the GATK to call tens of thousands of deeply sequenced NGS samples simultaneously.
- The HaplotypeCaller, a multi-sample local de novo assembly and integrated SNP, indel, and short SV caller.
- Powerful extensions to the Unified Genotyper to support variant calling of pooled samples, mitochondrial DNA, and non-diploid organisms. Additionally, the extended Unified Genotyper introduces a novel error modeling approach that uses a reference sample to build a site-specific error model for SNPs and indels that vastly improves calling accuracy.
Mixed open/closed source model
is there an "average" chromosome or a good abridged chromosome?
I have often wondered about the practice of using the first 10 Mb of chr1 as a test alignment target for sequencing runs on the SOLiD machine, and how informative that might be.
I wonder since someone probably has some sort of summary statistics of individual chromosomes, is there a particular chromosome that's representative of the whole genome or perhaps a chimera that represents the rest of the chromosomes that one might be good to run through various sequencing platforms to validate results or compare sequencing platform error profiles ...
just a random thought
I've always thought chr20 is the most gentlemanly. 1,6,9 : crazy heterochromatin. 19 - a zoo of zinc fingers. [1/2]
Stay away from the acrocentrics (13,14,15,21,22), and chr17 has >> duplications. X and Y obviously odd.
Myrna 1.2.0 released
Version 1.2.0 - July 19, 2012
* Added support for Hadoop version 0.20.205.
* Dropped support for Hadoop versions prior to 0.20.
* Updated default Hadoop version for EMR jobs to 0.20.205.
* Updated Bowtie version used to 0.12.8.
* Updated R version used to 2.14.2.
* Updated jar files to use Ensembl v67 (used to be v61). In the
process, fixed an issue whereby $MYRNA_HOME/reftools scripts
would die due to unexpected new format of Ensembl database schema.
* Fixed issues with streaming jar version parsing
* Fixed documentation bugs regarding --sra-toolkit option, which is
superseded by the --fastq-dump option.
* Removed some diagnostic counters because Hadoop began to enforce
an upper limit on the number of counters allowed per job. For
instance, per-label summary statistics are no longer printed in
the Normalize step.
Thanks,
Ben
Thursday, 19 July 2012
Efficiency and power as a function of sequence coverage, SNP array density, and imputation.
PLoS Comput Biol. 2012 Jul;8(7):e1002604. Epub 2012 Jul 12.
Efficiency and power as a function of sequence coverage, SNP array density, and imputation.
Flannick J, Korn JM, Fontanillas P, Grant GB, Banks E, Depristo MA, Altshuler D.
Source: Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America.
Abstract
High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms ([Formula: see text]), when low coverage sequence reads are added to dense genome-wide SNP arrays - the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling.
PMID: 22807667 [PubMed - in process]
PMCID: PMC3395607
GRCh38 in the summer of 2013! Genome Reference Consortium
See our blog for more information on why we think this is important.
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/
FAQs » Mobile Element Insertion Detection
Mobile Element Insertion (MEI) Detection
- Are MEIs detected using the same method as SV detection?
- What is the resolution of insertion site detection? Does Complete Genomics assemble the insertion site?
- What MEI type does Complete Genomics detect?
- How should I filter for high-confidence MEIs?
- Does Complete Genomics identify somatic MEIs?
- Is the zygosity of events reported?
- Can I get access to the reference data used to create the MEI baseline?
[galaxy-user] Galaxy intro webinar
Slim-Filter: An interactive windows-based application for Illumina Genome Analyzer data assessment and manipulation
G. Golovko1,2, K. Khanipov1, M. Rojas1,2, A. Martinez-Alcántara1, J. J. Howard1, E. Ballesteros1, S. Gupta1, W. Widger1,3, and Y. Fofanov1,2,3
1Center for BioMedical and Environmental Genomics, University of Houston, Houston, TX, USA. Department of Computer Science2 and the Department of Biology and Biochemistry3, University of Houston, Houston, TX, USA 77204
The emergence of Next Generation Sequencing technologies has made it possible for individual investigators to generate gigabases of sequencing data per week. Effective analysis and manipulation of these data is limited due to large file sizes, so even simple tasks such as data filtration and quality assessment have to be performed in several steps. This requires (potentially problematic) interaction between the investigator and a bioinformatics/computational service provider. Furthermore, such services are often performed using specialized computational facilities.
We present a Windows-based application, Slim-Filter, designed to interactively examine the statistical properties of sequencing reads produced by Illumina Genome Analyzer and to perform a broad spectrum of data manipulation tasks including: filtration of low quality and low complexity reads; filtration of reads containing undesired subsequences (such as parts of adapters and PCR primers used during the sample and sequencing libraries preparation steps); excluding duplicated reads (while keeping each read's copy number information in a specialized data format); and sorting reads by copy numbers allowing for easy access and manual editing of the resulting files. Slim-Filter is organized as a sequence of windows summarizing the statistical properties of the reads. Each data manipulation step has roll-back abilities, allowing for return to previous steps of the data analysis process. Slim-Filter is written in C++ and is compatible with fasta, fastq, and specialized AS file formats.
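The duplicate-exclusion and copy-number sorting steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Slim-Filter's implementation or its AS file format; the function name `collapse_duplicates` and the example reads are made up.

```python
from collections import Counter


def collapse_duplicates(reads):
    """Collapse identical reads, keeping each read's copy number.

    Returns (sequence, copy_number) pairs sorted by copy number, highest
    first; ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up reads for illustration.
reads = ["ACGT", "TTTT", "ACGT", "GGGG", "ACGT", "TTTT"]
for sequence, copies in collapse_duplicates(reads):
    print(sequence, copies)
```

Holding every distinct read in memory like this is the simplest approach; for tens of millions of reads a tool would need something more careful.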
Slim-Filter performance was estimated using the following computer configurations:
Component | Windows | Linux |
OS | Windows 2008 Server SP1 | CentOS 5.6 |
CPU | Dual Quad Core Intel™ Xeon W5590 3.33GHz, 8M L3 | AMD Magny Cours 6128 8-Core Processor, 2.0 GHz, 12MB Cache |
RAM | 128GB, DDR3 RDIMM, 1066MHz, ECC | 512 GB DDR3 1333MHz ECC |
HDD | 2TB SATA 3.0Gb/s, 7200 RPM | 2TB SATA 3.0Gb/s, 7200 RPM |
Number of reads (36 bases long) | RAM required, Linux (MB) | RAM required, Windows (MB) | Time to apply all possible filter settings, Windows (seconds) | Time to apply all possible filter settings, Linux (seconds) |
10,000 | 10 | 32-40 | <1 | <1 |
100,000 | 25 | 85-100 | 6.5 | 4.5 |
1,000,000 | 300 | 600-800 | 66 | 40 |
10,000,000 | 2,000 | 3,500 | 590 | 442 |
50,000,000 | 13,000 | 45,000 | 3,000 | 2,156 |
Crowd-funded Exome Sequencing for Rare Genetic Diseases
http://massgenomics.org/2012/07/crowd-funded-exome-sequencing-for-rare-genetic-diseases.html
Crowdfunding and Families with Rare Diseases
That being said, I'm writing about this because it's a good story. Friends, relatives, and total strangers made cash donations, in tough economic times, to help this little girl.
As Daniel MacArthur (@dgmacarthur) put it on Twitter, this story makes me feel good about humanity.
Exome and whole-genome sequencing have enabled the discovery of many causal variants behind rare disorders, but this is the first time it's been accomplished by raising funds on the internet. The "crowd-funding" model, as it's called, may offer some hope to the thousands of families dealing with a rare genetic disorder. The majority of them won't have the opportunity to be studied in government-funded research. And next-gen sequencing isn't usually covered by health insurance. If they can raise the funds on their own, however, a genetic diagnosis may be possible.
It will not come easy. The analysis and interpretation of sequence data requires considerable time and expertise. And as I've recently written, exome sequencing does not guarantee an answer even for Mendelian diseases. Even so, discoveries are possible. That likely provides a glimmer of hope for those with rare genetic disorders.
Wednesday, 18 July 2012
we all have these days ... identify-indel-regions.pl - popoolation2 - Allows comparison of allele frequencies between two or more populations - Google Project Hosting
my $nucs="";
while(@ar)
{
my $cov=shift @ar;
my $n=shift @ar;
my $q=shift @ar;
die "mpileup fucked" unless(defined($q));
$nucs.=$n;
}
#
lol, chanced on this while trying to find out why mpileup segfaulted on me with signal 139 ...
Principal Components Analysis Using R - P1 - YouTube
http://www.youtube.com/watch?v=5zk93CpKYhg&feature=related
Part 1 - This video tutorial guides the user through a manual
principal components analysis of some simple data. The goal is to
acquaint the viewer with the underlying concepts and terminology
associated with the PCA process. This will be helpful when the user
employs one of the "canned" R procedures to do PCA (e.g. princomp,
prcomp), which requires some knowledge of concepts such as loadings
and scores. You may download the R code used in this tutorial from
http://www.bimcore.emory.edu/BB_phys_stats_ex1.R
Tuesday, 17 July 2012
SNAP Sequence Aligner
http://snap.cs.berkeley.edu/
Sunday, 15 July 2012
Udacity - 21st Century University
Hmm seem to be on a roll of OOT blog posts today .. chanced on this link for the perennial FAQ of where do I learn python/R
Udacity is a totally new kind of learning experience. You learn by solving challenging problems and pursuing udacious projects with world-renowned university instructors (not by watching long, boring lectures). At Udacity, we put you, the student, at the center of the universe.
Psychopathy Prediction Based on Twitter Usage - Kaggle
would be cool if it was a model based on this AND sequence data ...
https://www.kaggle.com/c/twitter-psychopathy-prediction
The aim of the competition is to determine to what degree it's possible to predict people with a sufficiently high degree of Psychopathy based on Twitter usage and Linguistic Inquiry.
The organizers provide all interested participants an anonymised dataset of users' self-assessed psychopathy scores together with 337 variables derived from functions of Twitter information, usage and linguistic analysis. Psychopathy scores are based on a checklist developed by Professor Del Paulhus at the University of British Columbia.
The model should aim to identify people scoring high in Psychopathy, for the purpose of this competition defined as 2 SDs above a mean of 1.98. This accounts for roughly 3% of the entire sample, and therefore the challenge with this dataset is developing a model to work with a highly imbalanced dataset.
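The labelling rule and the resulting class imbalance can be sketched as follows. This is a hedged illustration only: the mean of 1.98 comes from the description above, but the standard deviation is not stated there, so the value 0.8 and the scores below are made-up placeholders, and the function names are hypothetical.

```python
def label_high_scorers(scores, mean, sd, n_sd=2.0):
    """Mark scores strictly above mean + n_sd * sd as the positive class."""
    threshold = mean + n_sd * sd
    return [score > threshold for score in scores]


def positive_fraction(labels):
    """Fraction of samples in the positive class, a quick imbalance check."""
    return sum(labels) / len(labels)


# Mean is from the competition text; the SD and the scores are assumptions.
ASSUMED_SD = 0.8
scores = [1.0, 2.0, 3.7, 1.5]
labels = label_high_scorers(scores, mean=1.98, sd=ASSUMED_SD)
print(labels, positive_fraction(labels))
```

With a positive class of only a few percent, plain accuracy says little, which is why the imbalance is called out as the main challenge.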
The best performing model(s) will be formally cited in a future paper/papers. The authors of the winning model may also be invited to attend future conferences to discuss their model.
Saturday, 14 July 2012
Fwd: An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people.
Science. 2012 Jul 6;337(6090):100-4. Epub 2012 May 17.
An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people.
Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, Verzilli C, Shen J, Tang Z, Bacanu SA, Fraser D, Warren L, Aponte J, Zawistowski M, Liu X, Zhang H, Zhang Y, Li J, Li Y, Li L, Woollard P, Topp S, Hall MD, Nangle K, Wang J, Abecasis G, Cardon LR, Zöllner S, Whittaker JC, Chissoe SL, Novembre J, Mooser V.
Source: Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Abstract
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
PMID: 22604722 [PubMed - in process]
Friday, 13 July 2012
Archon Genomics X PRIZE presented by Express Scripts, an incentivized prize competition that will award $10 million
Grant R. Campany | Senior Director & Prize Lead Archon Genomics X PRIZE
Thursday, 12 July 2012
SQLite Python tutorial
To work with this tutorial, we must have Python language, SQLite database, pysqlite language binding and the sqlite3 command line tool installed on the system. If we have Python 2.5+ then we only need to install the sqlite3 command line tool. Both the SQLite library and the pysqlite language binding are built into the Python language.
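As a quick check that the built-in binding works, something like the following can be run. It is a minimal sketch using the standard library `sqlite3` module and an in-memory database, not an excerpt from the tutorial; the table name, column names and rows are made up.

```python
import sqlite3


def demo_query():
    """Create a throwaway in-memory table and query it back."""
    con = sqlite3.connect(":memory:")  # nothing is written to disk
    cur = con.cursor()
    cur.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT, coverage REAL)")
    # Parameterised inserts; the rows are placeholder values.
    cur.executemany(
        "INSERT INTO samples (name, coverage) VALUES (?, ?)",
        [("sample_a", 30.5), ("sample_b", 28.1)],
    )
    con.commit()
    cur.execute("SELECT name, coverage FROM samples WHERE coverage > ? ORDER BY name", (29,))
    rows = cur.fetchall()
    con.close()
    return rows


print(sqlite3.sqlite_version)  # version of the bundled SQLite library
print(demo_query())
```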
http://zetcode.com/db/sqlitepythontutorial/
Monday, 9 July 2012
Cufflinks 2.0.2 released
2.0.2 release - 7/8/2012
This release fixes several bugs:
Some users were experiencing a crash on exit in Cufflinks when run with bias correction. The source of the crash has been fixed.
A few minor fixes in the estimation routines for cross-replicate variability.
Providing the same BAM file multiple times was producing inconsistent expression values. This has been corrected.
Saturday, 7 July 2012
Installing breakdancer.sourceforge.net
Dependencies for breakdancer-1.1_2011_02_21
sudo perl -MCPAN -e 'install Statistics::Descriptive'
sudo perl -MCPAN -e 'install Math::CDF'
sudo perl -MCPAN -e 'install GD::Graph::histogram'
# If you haven't done it already, install samtools and add the binary to your PATH; bam2cfg.pl needs to call it.
Writing and Optimizing Parallel Programs — A complete example | Future Chips
I am using the histogram kernel because it is very simple and clearly demonstrates some very important concepts in parallel programming: thread spawning, critical sections, atomic operations, barriers, false sharing, and thread join. Here is our problem statement:
Problem: Count the number of times each ASCII character occurs on a page of text.
Input: ASCII text stored as an array of characters.
Output: A histogram with 128 buckets –one for each ascii character– where each entry stores the number of occurrences of the corresponding ascii character on the page.
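The problem statement can be sketched in Python as below. This is not the article's code: it illustrates one of the listed ideas (spawn threads, let each fill a private histogram, then join and merge) so that no critical section or atomic update is needed. The function names are made up, and in CPython the global interpreter lock means this shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 128  # one bucket per ASCII character


def count_chunk(chunk: bytes) -> list:
    """Build a private histogram for one slice of the text."""
    local = [0] * NUM_BUCKETS
    for byte in chunk:
        if byte < NUM_BUCKETS:  # ignore any non-ASCII bytes
            local[byte] += 1
    return local


def ascii_histogram(text: bytes, num_threads: int = 4) -> list:
    """Count ASCII characters using per-thread histograms merged at the end."""
    chunk_size = max(1, -(-len(text) // num_threads))  # ceiling division
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(count_chunk, chunks))  # waits for all threads
    histogram = [0] * NUM_BUCKETS
    for partial in partials:
        for bucket, count in enumerate(partial):
            histogram[bucket] += count
    return histogram


hist = ascii_histogram(b"hello world")
print(hist[ord("l")], hist[ord("o")])
```

Private histograms trade a little extra memory and a final merge for the absence of shared writes during counting.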
http://www.futurechips.org/tips-for-power-coders/writing-optimizing-parallel-programs-complete.html
OTT: Stephen Wolfram Blog : A Moment for Particle Physics: The End of a 40-Year Story?
http://blog.stephenwolfram.com/2012/07/a-moment-for-particle-physics-the-end-of-a-40-year-story/
Friday, 6 July 2012
OmegaPlus: A Scalable Tool for Rapid Detection of Selective Sweeps in Whole-Genome Datasets.
Bioinformatics. 2012 Jul 3. [Epub ahead of print]
OmegaPlus: A Scalable Tool for Rapid Detection of Selective Sweeps in Whole-Genome Datasets.
Alachiotis N, Stamatakis A, Pavlidis P.
Source: The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies.
Abstract
MOTIVATION: Recent advances in sequencing technologies have led to the rapid accumulation of molecular sequence data. Analyzing whole-genome data (as obtained from next-generation sequencers) from intra-species samples allows to detect signatures of positive selection along the genome and therefore identify potentially advantageous genes in the course of the evolution of a population. We introduce OmegaPlus, an open-source tool for rapid detection of selective sweeps in whole-genome data based on linkage dis-equilibrium. The tool is up to two orders of magnitude faster than existing programs for this purpose and also exhibits up to two orders of magnitude smaller memory requirements.
AVAILABILITY: OmegaPlus is available under GNU GPL at http://www.exelixis-lab.org/software.html.
CONTACT: pavlos.pavlidis@h-its.org
SUPPLEMENTARY INFORMATION: Available at Bioinformatics online.
PMID: 22760304 [PubMed - as supplied by publisher]
http://bioinformatics.oxfordjournals.org/content/early/2012/07/03/bioinformatics.bts419.long
JoVE: Detection of Rare Genomic Variants from Pooled Sequencing Using SPLINTER
Wednesday, 4 July 2012
STAT-Seq: Rapid WGS on the HiSeq 2500 - Implications for a Neonatal Intensive Care Unit recorded webinar