Friday, 5 July 2013
Ah! I didn't know there was a Windows 8.1!
The ISOs should be helpful if you wish to 'future-proof' your spanking-new application on the latest Windows, or to test existing apps to see if they might break in the new Win 8.1.
and *cough* using the ISOs as VMs in your preferred Linux env, but you kinda need a Windoze machine to do those tasks that you can't do in Linux cos other programmers haven't heard of building for multi-platform machines *cough*
Well, another good reason to use it: I am pretty sure this ain't happening on Mac or Linux.
Microsoft is adding native support for 3D printing as part of the Windows 8.1 update, making it possible to print directly from an app to a 3D printer. The company is announcing the new feature this morning, working with partners including MakerBot Industries, 3D Systems, Afinia, AutoDesk, Netfabb and others.
http://www.geekwire.com/2013/dimension-windows-microsoft-adds-3d-printing-support/
:)
Go http://msdn.microsoft.com/en-us/windows/apps/bg182409 now!
loving the 1.5 Mb/s download here
Wednesday, 6 February 2013
Handling R packages: Feb 2013 issue of Linux Journal
The kind folks at http://www.linuxjournal.com/ have provided me with the Feb 2013 issue. I can't tell you how much Linux I have picked up from there, with its easy prose and graphical how-tos. The Feb 2013 issue focuses on the theme of system administration. There are definitely useful things inside for the starting bioinformatician who wishes to dabble with working directly off a *nix machine :)
Other topics in this issue include:
In the February 2013 issue:
- Manage Your Virtual Deployment with ConVirt
- Use Fabric for Sysadmin Tasks on Remote Machines
- Spin up Linux VMs on Azure
- Make Your Android Device Play with Your Linux Box
- Create a Colocated Server with Raspberry Pi
You can check out a preview of the contents here
February 2013 Issue of Linux Journal: System Administration
By Shawn Powers | Feb 01, 2013
Sunday, 11 September 2011
Differential expression in RNA-seq: A matter of depth [Genome Res. 2011] - PubMed - NCBI
http://www.ncbi.nlm.nih.gov/pubmed/21903743
Abstract: Next Generation Sequencing (NGS) technologies are revolutionizing genome research and in particular, their application to transcriptomics (RNA-seq) is increasingly being used for gene expression profiling as a replacement for microarrays. However, the properties of RNA-seq data have not yet been fully established and additional research is needed for understanding how these data respond to differential expression analysis. In this work we set out to gain insights into the characteristics of RNA-seq data analysis by studying an important parameter of this technology: the sequencing depth. We have analyzed how sequencing depth affects the detection of transcripts and their identification as differentially expressed, looking at aspects such as transcript biotype, length, expression level and fold-change. We have evaluated different algorithms available for the analysis of RNA-seq and proposed a novel approach, NOISeq, that differs from existing methods in that it is data-adaptive and non-parametric. Our results reveal that most existing methodologies suffer from a strong dependency on sequencing depth for their differential expression calls and that this results in a considerable number of false positives that increases as the number of reads grows. In contrast, our proposed method models the noise distribution from the actual data, can therefore better adapt to the size of the dataset and is more effective in controlling the rate of false discoveries. This work discusses the true potential of RNA-seq for studying regulation at low expression ranges, the noise within RNA-seq data and the issue of replication.
PMID: 21903743 [PubMed - as supplied by publisher]
Friday, 12 August 2011
Are 12 million 90 bp transcriptome reads enough for transcriptome assembly?
Posted a PubMed link recently; the authors "report the use of next-generation massively parallel sequencing technologies and de novo transcriptome assembly to gain a comprehensive overview of the H. brasiliensis transcriptome. The sequencing output generated more than 12 million reads with an average length of 90 nt. In total 48,768 unigenes (mean size = 436 bp, median size = 328 bp) were assembled through de novo transcriptome assembly."
Do you think such an assembly is truly useful for research, or would higher coverage have been better?
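For a rough feel of the numbers, here is a back-of-the-envelope in Python. The read count and lengths come straight from the quoted abstract; the ~21 Mbp assembly size is my own multiplication of unigene count by mean size, so treat it as a rough guide only.

total_bases = 12e6 * 90        # >12 million reads x 90 nt average = ~1.08 Gbp of raw sequence
assembly_bp = 48768 * 436      # unigene count x mean size = ~21.3 Mbp assembled
print("raw sequence: %.2f Gbp" % (total_bases / 1e9))
print("implied average depth: %.0fx" % (total_bases / assembly_bp))   # ~51x

On paper ~51x sounds comfortable, but transcriptome coverage is anything but uniform: highly expressed transcripts soak up most of the reads while rare ones stay shallow, which is really what the question above is getting at.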
Monday, 8 August 2011
Braintrust: What Neuroscience Tells Us about Morality.
What can science tell us about morality?
While the slippery and subjective nature of morality makes it a troubling specimen, it remains a crucial part of our lives—and therefore a topic ripe for scientific research.
However, scientists are skilled at describing what is—the circumstances under which people are more likely to lie, for instance—which is not the same as describing how we ought to live our lives, like when it’s OK to lie. So it’s not entirely clear what scientists can offer here without overstepping their bounds.
Yet in Braintrust: What Neuroscience Tells Us about Morality, Patricia Churchland carefully leads the reader through scientific findings with implications for morality and ethics, well aware of the pitfalls and rewards she may encounter along the way. Churchland, a professor of philosophy at the University of California, San Diego, quickly informs the reader that science cannot tell us what we ought to do to be moral, but that a review of findings from psychology and biology may explain how or why we do it. Her goal is to draw on these findings to build an objective framework in which to understand human morality.
I think the challenge is to actually link human genomics with neurochemistry, though I am not sure anyone is prepared to face the ramifications of such studies.
Full review article here
Tuesday, 12 July 2011
A 3rd party evaluation of Ion Torrent's 316 chip data
Dan Koboldt (from massgenomics) has posted what I know to be the first independent look at data from Ion Torrent's 316 chip.
Granted, the data was handed to him in a 'shiny report with color images', but he has bravely ignored that to give an honest look at the raw data itself.
The 316 chip gives a throughput that nicely covers WGS resequencing experiments for bacterial-sized genomes. "The E. coli reference genome totals about 4.69 Mbp. With 175 Mbp of data, the theoretical coverage is around 37.5-fold across the E. coli genome."
For those wary of dry reviews, fear not, easily comprehensible graphs are posted within!
read the full post here
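The quoted coverage figure is easy to sanity-check in Python:

genome_mbp = 4.69                # E. coli reference genome, Mbp
data_mbp = 175.0                 # throughput reported for the 316 chip, Mbp
print("theoretical coverage: %.1fx" % (data_mbp / genome_mbp))   # ~37.3x, roughly the 37.5-fold quoted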
Labels:
bacterial,
Ion Torrent,
Next Generation Sequencing,
PGM,
real-time PCR,
review,
whole genome
Saturday, 30 April 2011
Evaluation of next-generation sequencing software in mapping and assembly.
J Hum Genet. 2011 Apr 28;
Authors: Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ
Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages when compared with traditional sequencing methods in terms of higher sequencing speed, lower per run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, thereby restricting the applications of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies meet and particularly illustrated various bioinformatics attempts on mapping and assembly for problem solving. We then compared the performance of several programs in these two fields, and further provided advice on selecting suitable tools for specific biological applications. Journal of Human Genetics advance online publication, 28 April 2011; doi:10.1038/jhg.2011.43.
PMID: 21525877 [PubMed - as supplied by publisher]
Labels:
journal,
Next Generation Sequencing,
next-generation_sequencing,
NGS,
pubmed,
review,
software
Friday, 3 December 2010
When playing games is working (if you are a biologist, that is)
Check out this flash game Phylo
If you are thinking it's related to phylogenetics, then bingo! Kudos for an excellent idea and excellent graphics and interface, but I wish they had a better name and a less verbose introduction for laymen.
Waiting eagerly for the iPhone/iPod version to come out.
from http://phylo.cs.mcgill.ca/eng/about.html
What's Phylo all about?
Though it may appear to be just a game, Phylo is actually a framework for harnessing the computing power of mankind to solve a common problem: multiple sequence alignments.
What is a Multiple Sequence Alignment? A sequence alignment is a way of arranging the sequences of DNA, RNA or protein to identify regions of similarity. These similarities may be consequences of functional, structural, or evolutionary relationships between the sequences.
From such an alignment, biologists may infer shared evolutionary origins, identify functionally important sites, and illustrate mutation events. More importantly, biologists can trace the source of certain genetic diseases.
The Problem Traditionally, multiple sequence alignment algorithms use computationally complex heuristics to align the sequences.
Unfortunately, the use of heuristics does not guarantee global optimization, as it would be prohibitively computationally expensive to achieve an optimal alignment. This is due in part to the sheer size of the genome, which consists of roughly three billion base pairs, and the increasing computational complexity resulting from each additional sequence in an alignment.
Our Approach Humans have evolved to recognize patterns and solve visual problems efficiently.
By abstracting multiple sequence alignment to manipulating patterns consisting of coloured shapes, we have adapted the problem to benefit from human capabilities.
By taking data which has already been aligned by a heuristic algorithm, we allow the user to optimize where the algorithm may have failed.
The Data All alignments were generously made available through UCSC Genome Browser.
In fact, all alignments contain sections of human DNA which have been speculated to be linked to various genetic disorders, such as breast cancer.
Every alignment is received, analyzed, and stored in a database, where it will eventually be re-introduced back into the global alignment as an optimization.
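For the curious, the objective the players are nudging along is roughly an alignment score like the textbook sum-of-pairs scheme sketched below in Python. This is a generic illustration, not Phylo's actual scoring function (which also deals with affine gap costs and phylogenetic weighting).

# Toy sum-of-pairs scorer for a gapped multiple sequence alignment:
# every pair of residues in a column earns a reward for a match and
# pays a penalty for a mismatch or a gap.
MATCH, MISMATCH, GAP = 1, -1, -2

def sum_of_pairs(alignment):
    score = 0
    for col in zip(*alignment):               # walk the alignment column by column
        for i in range(len(col)):
            for j in range(i + 1, len(col)):  # every pair of rows in the column
                a, b = col[i], col[j]
                if a == '-' or b == '-':
                    score += GAP if a != b else 0   # gap-gap pairs are free
                elif a == b:
                    score += MATCH
                else:
                    score += MISMATCH
    return score

aln = ["ACG-T",
       "ACGGT",
       "A-GGT"]
print(sum_of_pairs(aln))   # rearranging gaps to raise this number is the game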
Tuesday, 30 November 2010
Why can't Bioscope / mapreads write to BAM natively?
Spotted this small fact in Bioscope 1.3.1 release notes.
There is significant disk space required for converting ma to BAM
when the option output.filter=none is used, which roughly needs
2TB peak disk space for converting a 500 million reads ma file.
Other options do not need such large peak disk space. The disk
space required per node is smaller if more jobs are dispatched to
more nodes.
I would love to see the calculation of how they arrived at the figure of 2 TB. I am glad that they moved to BAM in the Bioscope workflow, but I am not entirely sure of the reason for keeping the .ma file format when they are the only ones using it.
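Here is my own guess at a back-of-the-envelope that lands in that ballpark; the per-read multipliers below are pure assumptions on my part, not anything from the release notes.

reads = 500e6          # reads in the .ma file (from the release notes)
hits_per_read = 20     # ASSUMED average alignments kept per read when output.filter=none
bytes_per_hit = 200    # ASSUMED size of one uncompressed SAM-like text record
peak_bytes = reads * hits_per_read * bytes_per_hit
print("peak intermediate size: %.1f TB" % (peak_bytes / 1e12))   # 2.0 TB with these guesses

With those made-up multipliers the 2 TB figure is at least plausible; the real drivers would be how many multi-mapped hits survive filtering and how verbose the intermediate text format is.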
Monday, 8 November 2010
Trimming adaptor seq in colorspace (SOLiD)
Needed to do research on small RNA seq using SOLiD.
Wasn't clear on the adaptor trimming procedure (it's dead easy with base-space FASTQ files, but SOLiD has directionality, and read lengths don't really matter for small RNA).
novoalign suggests the use of cutadapt as a colorspace adaptor trimming tool.
I was going to script one in Python if it didn't exist.
Check their wiki page
Sadly on CentOS I most probably will get this
If you get this error:
File "./cutadapt", line 62 print("# There are %7d sequences in this data set." % stats.n, file=outfile) ^ SyntaxError: invalid syntax
Then your Python is too old. At least Python 2.6 is needed for cutadapt.
Have to dig up how to have two versions of Python on a CentOS box...
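In the meantime, a sketch of the workaround I have in mind, in Python itself: probe for a newer interpreter and hand cutadapt off to it, so you get a clear message rather than the SyntaxError above. This wrapper is my own invention, not part of cutadapt, and the interpreter names are guesses that vary by distro.

#!/usr/bin/env python
# Hypothetical wrapper: find a Python >= 2.6 on this box and re-run cutadapt with it.
import subprocess
import sys

for exe in ("python2.7", "python2.6", "python26"):    # candidate names; vary by distro
    try:
        subprocess.check_call([exe, "-c", "import sys; assert sys.version_info >= (2, 6)"])
    except (OSError, subprocess.CalledProcessError):
        continue                                      # not installed or too old; try the next one
    sys.exit(subprocess.call([exe, "cutadapt"] + sys.argv[1:]))

sys.stderr.write("No Python >= 2.6 found; install one alongside the system python\n")
sys.exit(1)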
Labels:
adaptor,
colorspace,
Next Generation Sequencing,
review,
software,
SOLiD,
trimming
Wednesday, 27 October 2010
Tophat adds support for strand-specific RNA-Seq alignment and colorspace
Hooray!
testing Tophat 1.1.2 now
Version 1.1.1, on an 8 GB RAM CentOS box, managed to align 1 million reads to hg18 in 33 mins and 2 million reads in 59 mins, using 4 threads.
Nice scalability! But it was slower than I was used to with bowtie. I kept killing my full set of 90 million reads thinking there was something wrong. Guess I need to be more patient and wait for 45 hours.
I do wonder if the process can be spread across separate nodes to speed it up.
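The 45-hour estimate falls straight out of the timings above. A quick extrapolation in Python, assuming the per-read rate holds for the whole run (which, for a multi-stage pipeline like Tophat, is itself an assumption):

# Observed: 1 million reads in 33 min, 2 million in 59 min.
rate_min_per_million = (33 / 1.0 + 59 / 2.0) / 2   # average of the two runs, ~31 min per million reads
total_min = rate_min_per_million * 90              # full set of 90 million reads
print("estimated wall time: %.0f hours" % (total_min / 60))   # ~47 hours, close to the 45 above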
Monday, 6 September 2010
Evaluation of next generation sequencing platforms for population targeted sequencing studies
I came across this paper earlier but didn't have time to blog much about it.
Papers that compare the sequencing platforms are getting rarer as the hype for NGS dies down and people are more interested in the next-next-gen sequencing machines (usually termed single-molecule sequencing).
Targeted resequencing is a popular use of NGS, as the price of human whole-genome resequencing is still not within reach for most (see Exome sequencing: the sweet spot before whole genomes).
There are inherent biases that people should be aware of before they jump right into it.
1) The NGS technologies generate a large amount of sequence but, for the platforms that produce short-sequence reads, greater than half of this sequence is not usable.
- On average, 55% of the Illumina GA reads pass quality filters, of which approximately 77% align to the reference sequence
- For ABI SOLiD, approximately 35% of the reads pass quality filters, and subsequently 96% of the filtered reads align to the reference sequence
- In contrast to the platforms generating short read lengths, approximately 95% of the Roche 454 reads uniquely align to the target sequence.
Admittedly, the numbers have changed for this now that Illumina has longer read lengths. (the paper tested 36 bp vs 35 bp )
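A quick sanity check of the "greater than half is not usable" claim, multiplying the pass-filter and alignment fractions from the bullets above:

illumina_usable = 0.55 * 0.77   # pass filter x align to reference = ~0.42 of raw reads usable
solid_usable = 0.35 * 0.96      # = ~0.34 of raw reads usable
print("Illumina GA usable fraction: %.2f" % illumina_usable)
print("ABI SOLiD usable fraction: %.2f" % solid_usable)
# both well under 0.5, consistent with the paper's claim for the short-read platforms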
2) For PCR-based targeted sequencing, they observed that the mapped sequences corresponding to the 50 bp at the ends and the overlapping intervals of the amplicons have extremely high coverage.
- These regions, representing about 2.3% (approximately 6 kb) of the targeted intervals, account for up to 56% of the sequenced base pairs for Illumina GA technology.
- For the ABI SOLiD platform an amplicon end depletion protocol was employed to remove the overrepresented amplicon ends; this was partially successful and resulted in the ends accounting for up to 11% of the sequenced base pairs.
- For the Roche 454 technology, overrepresentation of amplicon ends versus internal bases is substantially less, with the ends composing only 5% of the total sequenced bases; this is likely due to library preparation process differences between Roche 454 and the short-read length platforms.
I am not sure if this has changed since.
Note: Will update thoughts when I have more time.
Other Interesting papers
WGS vs exome seq
Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations.
Identification by whole-genome resequencing of gene defect responsible for severe hypercholesterolemia.
Exome sequencing: the sweet spot before whole genomes.
Whole human exome capture for high-throughput sequencing.
Screening the human exome: a comparison of whole genome and whole transcriptome sequencing.
Novel multi-nucleotide polymorphisms in the human genome characterized by whole genome and exome sequencing.
Family-based analysis and exome seq
Molecular basis of a linkage peak: exome sequencing and family-based analysis identify a rare genetic variant in the ADIPOQ gene in the IRAS Family Study.
Wednesday, 18 August 2010
Playing with NFS & GlusterFS on Amazon cc1.4xlarge EC2 instance types
I wish I had time to do the kind of stuff they do at bioteam.
Benchmarking the Amazon cc1.4xlarge EC2 instance.
These are the questions they aimed to answer
We are asking very broad questions and testing assumptions along the lines of:
- Does the hot new 10 Gigabit non-blocking networking fabric backing the new instance types really mean that “legacy” compute farm and HPC cluster architectures which make heavy use of network filesharing are possible?
- How does filesharing between nodes look and feel on the new network and instance types?
- Are the speedy ephemeral disks on the new instance types suitable for bundling into NFS shares or aggregating into parallel or clustered distributed filesystems?
- Can we use the replication features in GlusterFS to mitigate some of the risks of using ephemeral disk for storage?
- Should the shared storage built from ephemeral disk be assigned to “/scratch” or other non-critical duties due to the risks involved? What can we do to mitigate the risks?
- At what scale is NFS the easiest and most suitable sharing option? What are the best NFS server and client tuning parameters to use?
- When using parallel or cluster filesystems like GlusterFS, what rough metrics can we use to figure out how many data servers to dedicate to a particular cluster size or workflow profile?
Tuesday, 10 August 2010
PyroNoise: Accurate determination of microbial diversity from 454 pyrosequencing data
Using 454 to do microbial ecology / metagenomics of environmental / soil samples?
Then I think you should take a look at this paper.
Quince, C., Lanzén, A., Curtis, T., Davenport, R., Hall, N., Head, I., Read, L., & Sloan, W. (2009). Accurate determination of microbial diversity from 454 pyrosequencing data Nature Methods, 6 (9), 639-641 DOI: 10.1038/nmeth.1361
The Pathogens blog has a good summary post on it.
Wednesday, 14 July 2010
The nuts and bolts behind ABI's SAET
I really do not like using tools when I have no idea what they are trying to do.
ABI's SOLiD™ Accuracy Enhancer Tool (SAET) is one example that had extremely brief documentation beyond what it promised to do:
- The SOLiD™ Accuracy Enhancer Tool (SAET) uses raw data generated by SOLiD™ Analyzer to correct miscalls within reads prior to mapping or contig assembly.
- Use of SAET, on various datasets of whole or sub-genomes of < 200 Mbp in size and of varying complexities, readlengths, and sequence coverages, has demonstrated improvements in mapping, SNP calling, and de novo assembly results.
- For denovo applications, the tool reduces miscall rate substantially
Recently I attended an ABI talk, and finally someone explained it with a nice diagram. It is akin to SoftGenetics' condensation tool (I made the link). Basically, it groups reads by similarity, and where it finds a mismatch that is not supported by high-quality reads, it corrects the low-quality read to reach a 'consensus'. I see it as a batch correction of sequencing errors which one can typically do by eye (for small regions). This correction isn't without its flaws; I now understand why such error correction isn't implemented on the instrument and is presented as a user choice instead. My rough experience with this tool is that it increases mapping by ~10%; how this 10% would affect your results is debatable.
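In that spirit, here is a toy Python version of the idea as I understand it: quality-aware majority correction within a group of similar reads. This is my own sketch of the concept, definitely not SAET's actual algorithm, whose details ABI has not published.

# Toy consensus correction: given reads covering the same locus (already
# grouped by similarity) and per-base qualities, flip any low-quality base
# that disagrees with the consensus of high-quality bases at that position.
HIGH_Q = 20

def correct(reads, quals):
    fixed = []
    for i, (read, qual) in enumerate(zip(reads, quals)):
        bases = list(read)
        for pos in range(len(bases)):
            # high-quality bases seen at this position in the *other* reads
            support = [r[pos] for j, (r, q) in enumerate(zip(reads, quals))
                       if j != i and q[pos] >= HIGH_Q]
            if support and qual[pos] < HIGH_Q:
                consensus = max(set(support), key=support.count)
                if bases[pos] != consensus:
                    bases[pos] = consensus            # outvoted: correct the miscall
        fixed.append("".join(bases))
    return fixed

reads = ["ACGT", "ACGT", "ACTT"]
quals = [[30, 30, 30, 30], [30, 30, 30, 30], [30, 30, 5, 30]]
print(correct(reads, quals))   # the low-quality T in the third read becomes G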
Wednesday, 26 May 2010
A scientific spectator's guide to next-generation sequencing
ROFL
I love the title!
A scientific spectator's guide to next-generation sequencing
Dr Keith looks not only at next-gen sequencing but also at the emerging technologies of single-molecule sequencing. Interesting read!
My fave parts of the review
"Finally, there is the cost per base, generally expressed in a cost per human genome sequenced at approximately 40X coverage. To show one example of how these trade off, the new PacBio machine has a great cost per sample (~U$100) and per run (you can run just one sample) but a poor cost per human genome – you’d need around 12,000 of those runs to sequence a human genome (~U$120K). In contrast, one can buy a human genome on the open market for U$50K and sub U$10K genomes will probably be generally available this year."
"Length is critical to genome sequencing and RNA-seq experiments, but really short reads in huge numbers are what counts for DGE/SAGE and many of the functional tag sequencing methods. Technologies with really long reads tend not to give as many, and with all of them you can always choose a much shorter run to enable the machine to be turned over to another job sooner – if your application doesn’t need long reads."
Wednesday, 19 May 2010
What do you use for citation / bibliography / reference in writing?
Am looking at
http://www.zotero.org/
also exploring
http://www.wizfolio.com/
Found this on the web as well
http://www.easybib.com/
While I like that Zotero is well integrated with my browser and has OpenOffice plugins, keeping a backup of the references and keeping them synced is a problem. I would much rather have my references in the cloud, which makes for easier sharing. Suggestions, anyone?
Not EndNote, please. I seldom work on Windows machines.
Tuesday, 18 May 2010
Book review:Programming Collective Intelligence
Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran
Permalink: http://amzn.com/0596529325
I have always wanted to explore classification methods and their theory to see how I can apply them to bioinformatics, but so far I had yet to encounter a book or website that explains the topic well with examples you can work through yourself. It's a bonus that the examples are written in Python, a language I know, and the code is highly readable for those who don't.
Although the examples are not from biology, it is easy to see how some classical biological problems could be solved with an SVM.
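To give a flavour of the from-scratch style the book favours, here is a tiny nearest-centroid classifier on made-up expression profiles; the data and labels are invented purely for illustration, and for harder problems the book would reach for an SVM instead.

# Classify made-up 3-gene expression profiles as 'tumour' or 'normal'
# with a nearest-centroid rule, the simplest relative of the kNN and
# SVM classifiers the book walks through.
def centroid(rows):
    n = float(len(rows))
    return [sum(col) / n for col in zip(*rows)]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

training = {
    "tumour": [[8.1, 1.2, 5.5], [7.9, 0.9, 6.0]],
    "normal": [[2.0, 4.8, 1.1], [2.4, 5.1, 0.9]],
}
centroids = dict((label, centroid(rows)) for label, rows in training.items())

sample = [7.5, 1.0, 5.2]      # an unseen profile
label = min(centroids, key=lambda lab: dist2(sample, centroids[lab]))
print(label)                   # -> tumour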
p.s. This Amazon Associates widget is cool! It throws up relevant books based on the words in my blog post!
Friday, 14 May 2010
Lincoln Stein makes his case for moving genome informatics to the Cloud
Matthew Dublin summarizes Lincoln's paper in Making the Case for Cloud Computing & Genomics in genomeweb
excerpt "....
Stein walks the reader through a nice explanation of what exactly cloud computing is, the benefits of using a compute solution that grows and shrinks as needed, and makes an attempt at tackling the question of the cloud's economic viability when compared to purchasing and managing local compute resources.
The take away is that Moore's Law and its effect on sequencing technology will soon force researchers to analyze their mountains of sequencing data in a paradigm where the software comes to the data rather than the current, and opposite, approach. Stein says that this means now more than ever, cloud computing is a viable and attractive option..... "
Yet to read it (my weekend bedtime story); will post comments here.
excerpt "....
Stein walks the reader through an nice explanation of what exactly cloud computing is, the benefits of using a compute solution that grows and shrinks as needed, and makes an attempt at tackling the question of the cloud's economic viability when compared to purchasing and managing local compute resources.
The take away is that Moore's Law and its effect on sequencing technology will soon force researchers to analyze their mountains of sequencing data in a paradigm where the software comes to the data rather than the current, and opposite, approach. Stein says that this means now more than ever, cloud computing is a viable and attractive option..... "
Yet to read it (my weekend bedtime story) will post comments here.
Labels:
cloud computing,
genome,
journal,
Next Generation Sequencing,
review