Friday, 30 November 2012

GalaxyUpdates/2012_12 - Galaxy Wiki

Enis Afgan, Brad Chapman and James Taylor, "CloudMan as a platform for tool, data, and analysis distribution." BMC Bioinformatics 2012, 13:315

Jeremy Goecks, Nate Coraor, The Galaxy Team, Anton Nekrutenko & James Taylor, "NGS analyses by visualization with Trackster." Nature Biotechnology 30, 1036–1039 (2012)

Samantha Baldwin, Roopashree Revanna, Susan Thomson, et al., "A Toolkit for bulk PCR-based marker design from next-generation sequence data: application for development of a framework linkage map in bulb onion (Allium cepa L.)," BMC Genomics, Vol. 13, No. 1. (2012), 637

Jeremy C. Morgan, Robert W. Chapman, Paul E. Anderson, "A next generation sequence processing and analysis platform with integrated cloud-storage and high performance computing resources." Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Bo Liu, Borja Sotomayor, Ravi Madduri, Kyle Chard, "Deploying Bioinformatics Workflows on Clouds with Galaxy and Globus Provision." Third International Workshop on Data Intensive Computing in the Clouds (DataCloud 2012)
These papers were among 37 papers added to the Galaxy CiteULike group since the last Galaxy Update.

Article: Gattaca Alert? Or Should We Welcome the New Age of Eugenics?

Sent via Flipboard

Bleh, eugenics. I think Nature has its own way of bringing balance back. I have seen a lot of doomsday predictions that a species with so few living members will soon breed itself into extinction because it no longer represents a viable population. Hence the argument against eugenics (that it creates a homogeneous population that isn't resilient to change) seems void.

On the flip side, I think attempts at making faster, stronger, smarter and better-looking humans are also doomed to failure. At least, what I observe is that people who are doing well seem to prefer having fewer kids.

If anything, you would have thought that thousands of years of "selective breeding" would have brought us closer to being perfect as a species.

Random late-night thoughts.

Thursday, 29 November 2012

Article: Dell releases powerful, well-supported Linux Ultrabook

I want this over a MacBook!

In our recent ZaReason UltraLap 430 review, Ars alum Ryan Paul lamented that even though putting Linux on laptops is easier today than ever, it's still not perfect. Some things (particularly components like trackpads and Wi-Fi chips) take some fiddling to get working. Major OEMs aren't yet puttin...

Article: The Dyslexia Candidate Locus on 2p12 Is Associated with General Cognitive Ability and White Matter Structure

Open Access

Research Article

Thomas S. Scerri, Fahimeh Darki, Dianne F. Newbury, Andrew J. O. Whitehouse, Myriam Peyrard-Janvid, Hans Matsson, Qi W. Ang, Craig E. Pennell, Susan Ring, John Stein, Andrew P. Morris, Anthony P. Monaco, Juha Kere, Joel B. Talcott, Torkel Kling...

Article: Hack could let browsers use cloud to carry out big attacks on the cheap

Scientists have devised a browser-based exploit that allows them to carry out large-scale computations on cloud-based services for free, a hack they warn could be used to wage powerful online attacks cheaply and anonymously.

The method, described in a research paper scheduled to be presented at...

Wednesday, 28 November 2012

Detecting Rare Variant Effects Using Extreme... [Genet Epidemiol. 2012] - PubMed - NCBI

Genet Epidemiol. 2012 Nov 26. doi: 10.1002/gepi.21699. [Epub ahead of print]

Detecting Rare Variant Effects Using Extreme Phenotype Sampling in Sequencing Association Studies.


Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts.


In the increasing number of sequencing studies aimed at identifying rare variants associated with complex traits, the power of the test can be improved by guided sampling procedures. We confirm both analytically and numerically that sampling individuals with extreme phenotypes can enrich the presence of causal rare variants and can therefore lead to an increase in power compared to random sampling. Although application of traditional rare variant association tests to these extreme phenotype samples requires dichotomizing the continuous phenotypes before analysis, the dichotomization procedure can decrease the power by reducing the information in the phenotypes. To avoid this, we propose a novel statistical method based on the optimal Sequence Kernel Association Test that allows us to test for rare variant effects using continuous phenotypes in the analysis of extreme phenotype samples. The increase in power of this method is demonstrated through simulation of a wide range of scenarios as well as in the triglyceride data of the Dallas Heart Study.
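To convince myself of the enrichment claim, here is a quick simulation sketch (mine, not the authors' analysis): one hypothetical causal rare variant with an additive effect on a continuous trait, comparing how often carriers show up in the top 5% of the trait versus in a random sample of the same size. The allele frequency, effect size and tail cut-off are arbitrary assumptions, and it only samples the upper tail, whereas real extreme phenotype designs usually take both tails. It says nothing about the SKAT-O-based test the paper actually proposes.

```python
import random

def simulate_extreme_sampling(n=100_000, maf=0.01, beta=1.5, tail=0.05, seed=1):
    """Return (carrier frequency in a random sample, carrier frequency in the
    upper `tail` of the trait) for one hypothetical causal rare variant."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # Minor-allele count (0, 1 or 2) under Hardy-Weinberg sampling
        g = (rng.random() < maf) + (rng.random() < maf)
        # Additive model: each minor allele shifts the trait by beta SD units
        y = beta * g + rng.gauss(0.0, 1.0)
        people.append((y, g))

    k = int(n * tail)
    # Extreme sample: the k individuals with the highest trait values
    extreme = sorted(people, key=lambda p: p[0], reverse=True)[:k]
    # Random sample of the same size, for comparison
    random_sample = rng.sample(people, k)

    def carrier_freq(sample):
        return sum(1 for _, g in sample if g > 0) / len(sample)

    return carrier_freq(random_sample), carrier_freq(extreme)

if __name__ == "__main__":
    rand_f, ext_f = simulate_extreme_sampling()
    print(f"carriers in random sample: {rand_f:.3f}")
    print(f"carriers in upper 5% tail: {ext_f:.3f}")
```

With these settings the upper tail should hold several times more carriers than the random sample; shrinking beta towards zero makes the enrichment disappear, which is the intuition behind the power gain.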

Sent from my phone

Tuesday, 27 November 2012

I want to seq every martian for $1000

This is a pretty funny poke at the Ion Torrent (and Proton) vs. MiSeq debate!

Other memorable quotes

"Now you're telling me Martians with long stretches of repeats can't be sequenced?"

I wonder if there will be a video retort from the other camp.

Monday, 26 November 2012

One for all, all for one « Wellcome Trust Sanger Institute Blog

Did you know?

One key reason for this discrepancy is that HIV-1 is one of the most genetically diverse viruses known. The HIV-1 diversity within just one infected person at any one time is as great as the diversity of influenza viruses worldwide in an entire year. For example, there are as many as four genetic groups of HIV-1, nine subtypes and 55 circulating recombinant forms, or forms that have swapped their genetic material. This extensive genetic diversity has limited our ability to rapidly and cost-effectively sequence HIV-1 genomes from different populations and geographical regions.

I also didn't know that, plus that the diversity of human influenza viruses numbers ~90,000. I must admit that I had always assumed microbial genomes are easy to work with, but the diversity within a single species sounds like it could make that a mind-boggling task!

New Galaxy CloudMan Release

From: Enis Afgan
Date: 26 November 2012 11:16
Subject: [galaxy-user] New Galaxy CloudMan Release
To: Galaxy-user <galaxy-user>

We just released an update to CloudMan. CloudMan offers an easy way to get a personal and completely functional instance of Galaxy in the cloud in just a few minutes, without any manual configuration.

This update brings a large number of updates and new features, the most prominent ones being:
- Support for Eucalyptus cloud middleware. Thanks to Alex Richter. Also, CloudMan can now run on the HPcloud in basic mode (note that there is no public image available on the HPcloud at the moment and one would thus need to be built by you).
- Added a new file system management interface on the CloudMan Admin page, allowing control and providing insight into each available file system
- Added quite a few new user data options. See the UserData page for details. Thanks to John Chilton.
- Galaxy can now be run in multi-process mode. Thanks to John Chilton.
- Added Galaxy Reports app as a CloudMan service. Thanks to John Chilton.
- Introduced a new format for cluster configuration persistence, allowing more flexibility in how services are maintained
- Added a new file system service for the instance's transient storage, allowing it to be used across the cluster over NFS. The file system is available at /mnt/transient_nfs; just note that any data stored there will not be preserved after a cluster is terminated.
- Support for Ubuntu 12.10
- Worker instances are now also SGE submit hosts

This update comes as a result of 175 code changesets; for a complete list of changes, see the commit messages

Any new cluster will automatically start using this version of CloudMan. Existing clusters will be given an option to do an automatic update once the main interface page is refreshed.

Let us know what you think,

FAQ: How much sequencing is needed for ... [soil metagenomics]?

Titus (Living in an Ivory Basement) blogs about the amount of sequencing coverage required for various NGS applications and explains how he arrives at the numbers (via this spreadsheet).

Fig. 2, which shows the coverage required (on a non-log Y scale), really makes his main point: soil metagenomes need a crazy amount of coverage, which would make Illumina a very happy company if everyone started sequencing them.
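I haven't reproduced Titus's spreadsheet here. As a rough back-of-envelope sketch of the same kind of question (my simplification, not his model), the total sequence needed for an organism at relative abundance a to reach mean coverage C is about C × G / a, where G is its genome size; the Poisson (Lander-Waterman) expectation 1 - e^(-C) then gives the fraction of that genome hit by at least one read. The 5 Mbp genome and 0.01% abundance below are hypothetical numbers.

```python
import math

def bases_needed(target_coverage, genome_size_bp, relative_abundance):
    """Total bases to sequence so that an organism making up `relative_abundance`
    of the sampled DNA reaches `target_coverage` on average."""
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    return target_coverage * genome_size_bp / relative_abundance

def fraction_covered(coverage):
    """Poisson (Lander-Waterman) expectation of the fraction of a genome
    covered by at least one read at the given mean coverage."""
    return 1.0 - math.exp(-coverage)

if __name__ == "__main__":
    # Hypothetical: 10x coverage of a 5 Mbp genome present at 0.01% abundance
    total = bases_needed(10, 5_000_000, 1e-4)
    print(f"{total / 1e9:.0f} Gbp of sequence needed")
    print(f"expected fraction covered at 10x: {fraction_covered(10):.5f}")
```

For that hypothetical organism this works out to 5 × 10^11 bases (500 Gbp): the rarer the organism you care about, the faster the total climbs.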

Frankly, I am surprised that marine samples, and even human gut samples, differ so vastly in species diversity from soil.

Definitely thought-provoking when you think about the search for extraterrestrial life: perhaps it's easier to assume we will find alien bacteria before anything else. (The X-Files would be terribly less exciting, though, if it were all about sending space soil probes to retrieve alien bacteria that eat your flesh.)

Haha, sorry, random thought on a Monday...

Wednesday, 21 November 2012

A unified method for detecting secondar - PubMed Mobile

Next-generation sequencing has made possible the detection of rare variant (RV) associations with quantitative traits (QT). Due to high sequencing cost, many studies can only sequence a modest number of selected samples with extreme QT. Therefore association testing in individual studies can be underpowered. Besides the primary trait, many clinically important secondary traits are often measured. It is highly beneficial if multiple studies can be jointly analyzed for detecting associations with commonly measured traits. However, analyzing secondary traits in selected samples can be biased if sample ascertainment is not properly modeled. Some methods exist for analyzing secondary traits in selected samples, where some burden tests can be implemented. However p-values can only be evaluated analytically via asymptotic approximations, which may not be accurate. Additionally, potentially more powerful sequence kernel association tests, variable selection-based methods, and burden tests that require permutations cannot be incorporated. To overcome these limitations, we developed a unified method for analyzing secondary trait associations with RVs (STAR) in selected samples, incorporating all RV tests. Statistical significance can be evaluated either through permutations or analytically. STAR makes it possible to apply more powerful RV tests to analyze secondary trait associations. It also enables jointly analyzing multiple cohorts ascertained under different study designs, which greatly boosts power. The performance of STAR and commonly used RV association tests were comprehensively evaluated using simulation studies. STAR was also implemented to analyze a dataset from the SardiNIA project where samples with extreme low-density lipoprotein levels were sequenced. A significant association between LDLR and systolic blood pressure was identified, which is supported by pharmacogenetic studies. 
In summary, for sequencing studies, STAR is an important tool for detecting secondary-trait RV associations.
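The abstract mentions burden tests that need permutations. As a generic illustration of what that means (a textbook-style sketch, not STAR, and ignoring the sample-ascertainment modelling that is the whole point of the paper), here is a simple burden test: collapse each person's rare alleles into one count, correlate it with the trait, and get a p-value by shuffling the trait. The simulated genotypes, allele frequency and effect size are made up.

```python
import random

def burden_scores(genotypes):
    """Collapse each person's rare-variant genotypes (minor-allele counts)
    into one burden score: the total number of rare alleles carried."""
    return [sum(row) for row in genotypes]

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_burden_test(genotypes, trait, n_perm=999, seed=0):
    """Two-sided permutation p-value for burden score vs. a quantitative trait.
    Shuffling the trait breaks any genotype-trait link while keeping both
    marginal distributions intact."""
    rng = random.Random(seed)
    burden = burden_scores(genotypes)
    observed = abs(correlation(burden, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(correlation(burden, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 500 people, 10 rare variants at MAF 0.02
    genotypes = [[(rng.random() < 0.02) + (rng.random() < 0.02) for _ in range(10)]
                 for _ in range(500)]
    trait = [0.8 * b + rng.gauss(0, 1) for b in burden_scores(genotypes)]
    print(permutation_burden_test(genotypes, trait))
```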

Tuesday, 20 November 2012

Amazon Glacier stores data for as little as $0.01 per gigabyte per month

Cheap cloud storage! 

Amazon Web Services

Dear Amazon Web Services Customer,

We are pleased to introduce a new storage option for Amazon S3 that enables you to utilize Amazon Glacier's extremely low-cost storage service for data archival. Amazon Glacier stores data for as little as $0.01 per gigabyte per month, and is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. With the new Amazon Glacier storage option for Amazon S3, you can define rules to automatically archive sets of Amazon S3 objects to Amazon Glacier for even lower cost storage.

To store Amazon S3 objects using the Amazon Glacier storage option, you define archival rules for a set of objects in your Amazon S3 bucket, specifying a prefix and a time period. The prefix (e.g. "logs/") identifies the object(s) subject to the rule, and the time period specifies either the number of days from object creation date (e.g. 180 days) or the specified date after which the object(s) should be archived (e.g. June 1st 2013). Going forward, any Amazon S3 standard or Reduced Redundancy Storage objects past the specified time period and having names beginning with the specified prefix are then archived to Amazon Glacier. To restore Amazon S3 data stored using the Amazon Glacier option, you first initiate a restore job using the Amazon S3 API or the Amazon S3 Management Console. Restore jobs typically complete in 3 to 5 hours. Once the job is complete, you can access your data through an Amazon S3 GET request.
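The rule logic described above boils down to a prefix match plus an age or date check. Here is a small Python model of it (my own sketch of the described behaviour, not AWS code; whether an object sitting exactly at the cut-off gets archived is an assumption), plus a helper that builds the equivalent rule in the dictionary shape used by the present-day boto3 `put_bucket_lifecycle_configuration` call. The rule ID and bucket name are hypothetical.

```python
from datetime import date, timedelta

def should_archive(key, created, today, prefix, days=None, on_date=None):
    """Model of the described rule: archive an object whose key starts with
    `prefix` once it is `days` days old, or once `on_date` has been reached.
    Treating the cut-off day itself as 'past' is an assumption."""
    if (days is None) == (on_date is None):
        raise ValueError("specify exactly one of days or on_date")
    if not key.startswith(prefix):
        return False
    if days is not None:
        return today >= created + timedelta(days=days)
    return today >= on_date

def glacier_rule(rule_id, prefix, days):
    """The same age-based rule as a boto3 lifecycle rule dict.
    Usage (hypothetical bucket name):
        boto3.client("s3").put_bucket_lifecycle_configuration(
            Bucket="my-bucket",
            LifecycleConfiguration={"Rules": [glacier_rule("archive-logs", "logs/", 180)]})
    """
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }

if __name__ == "__main__":
    # A log file created on 1 Jan 2012, checked on 1 Jul 2012 against a 180-day rule
    print(should_archive("logs/app.log", date(2012, 1, 1), date(2012, 7, 1), "logs/", days=180))
```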

You can easily configure rules to archive your Amazon S3 objects to the new Amazon Glacier storage option by opening the Amazon S3 Management Console and following these simple steps:

  1. Select the Amazon S3 bucket containing the objects that you wish to archive to Amazon Glacier.
  2. Click on "Properties." Under the "Lifecycle" tab, click "Add rule."
  3. Enter an object prefix in the "Object prefix:" input box. This rule is now applicable to all objects with names that start with the specified prefix.
  4. Choose whether you want to archive your objects based on the age of a given object or based on a specified date. Click the "Add Transition" button and specify the age or date value. Click the "Save" button.

The Amazon Glacier storage option for Amazon S3 is currently available in the US-Standard, US-West (N. California), US-West (Oregon), EU-West (Ireland), and Asia Pacific (Japan) Regions. You can learn more by visiting the Amazon S3 Developer Guide or joining our Dec 12 webinar.

The Amazon S3 Team

Amazon Web Services, Inc. is a subsidiary of Amazon.com, Inc. Amazon.com is a registered trademark of Amazon.com, Inc. This message produced and distributed by Amazon Web Services, Inc., 410 Terry Ave. North, Seattle, WA 98109-5210.

9.2% of Singapore's males had a childhood dream to be a scientist



For males, 11.4 per cent of those surveyed in Singapore wanted to be engineers; 9.2 per cent wanted to be scientists; and 8.5 per cent, airplane or helicopter pilots. Following close are those in the health profession – doctors/nurses/paramedics came in at 6.3 per cent. Police officers made the cut-off at 5.5 per cent. 

A total of 8,000 professionals all over the world took part in the survey.

Globally, males wanted to be engineers (10.9 per cent), pilots (10 per cent), scientists (7.7 per cent), doctors/nurses/paramedics (5.3 per cent) and astronauts (4 per cent). Apart from the last entry, Singaporeans, it seems, reach for similar stars. 

Singapore-based females seem to be the nurturing sort, with 'teacher' leading the surveyed pack at 14.8 per cent. Doctors/nurses/paramedics follow with 13 per cent; lawyers at 8.7 per cent; journalist/novelist at 4.3 per cent; and fashion designer and stylist at 4.3 per cent. 

Internationally, females aspired to be teachers (10.7 per cent); doctors/nurses/paramedics (9.5 per cent); journalists/novelists (6.8 per cent); veterinarians (5.4 per cent); and lawyers (5.2 per cent). 

Though aspiring, 'Superhero', 'Prince/Princess' and 'Ninja' clocked in at 1.3 per cent, 0.5 per cent and 0.3 per cent respectively. 

Article: VirusSeq: Software to identify viruses and their integration sites using next-generation sequencing of human cancer tissue

Sunday, 18 November 2012

uBiome -- Sequencing Your Microbiome for $59!

This should be fun! Love the simplicity of "Science press here"
I have signed up as an early adopter and am looking forward to receiving my sample kit ;)

Join me in helping make it happen for uBiome -- Sequencing Your Microbiome on @indiegogo

uBiome -- Sequencing Your Microbiome
By joining uBiome, you can explore your own microbiome and participate in the exciting scientific discovery to unlock this mystery. In order for it to work, we need lots of samples and a little bit of money. By pulling together as a group, we can do cutting-edge biomedical research at a fraction of the normal cost. It's called citizen science.

Friday, 16 November 2012

Article: PLOS Genetics: Lessons from Model Organisms: Phenotypic Robustness and Missing Heritability in Complex Disease

Fascinating read. 

Article: 7 Python Libraries you should know about

Life Optimizations and the Temptation to optimize

Down with the flu, so I am kinda doing the next best thing to working: reading random articles on the net.

Anyway, I chanced on this article on the Gliffy blog about optimizing for programmer time, which led me to a recorded talk on programming by Jonathan Blow. It's an interesting talk, general enough to offer casual programmers useful nuggets of info, and the anecdote about asset loading in Doom (the video game) was a blast from the past, omg...

Stuff I took away from the talk: the "industry average" for programmers is to generate ~3,250 lines of code per year.

Optimization is not ALWAYS bad.

It is good when you are optimizing things that actually matter.

It is only bad when you are optimizing for the wrong thing.

Hence, when you optimize for speed and space, don't forget to optimize for life: "years of my life per program implementation".
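Taking the "industry average" quoted in this post at face value, the "years of my life per program implementation" metric is just estimated lines divided by 3,250. A tiny sketch; the 250 working days per year is my own assumption, and lines of code are of course a crude proxy for effort.

```python
LINES_PER_YEAR = 3_250        # "industry average" figure quoted from the talk
WORKING_DAYS_PER_YEAR = 250   # assumption: roughly 50 five-day weeks

def years_of_life(estimated_lines):
    """Years of programmer time a program of `estimated_lines` costs at the quoted rate."""
    return estimated_lines / LINES_PER_YEAR

def lines_per_working_day():
    """The same rate expressed per working day."""
    return LINES_PER_YEAR / WORKING_DAYS_PER_YEAR

if __name__ == "__main__":
    # Hypothetical 10,000-line analysis pipeline
    print(f"{years_of_life(10_000):.1f} years, at {lines_per_working_day():.0f} lines/day")
```

Thirteen lines a day sounds tiny until you remember it has to cover design, debugging and everything you end up deleting.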

I am so guilty of this: I fixate so much on how to 'efficiently' max out the nodes/cores of our shared cluster that I forget I could actually use the 'wasted' time to continue on another side project, catch up on emails, or even take that coffee/toilet break I have been postponing.
Trying to see my time as a limited resource, where everything has to fit within the ~3,250 lines of code I can generate in a year, should be an interesting paradigm shift.
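One way to make "optimize for the right thing" concrete is a break-even check: is the programmer time spent on an optimization smaller than the waiting time it saves over all future runs? The sketch below is my own framing with hypothetical numbers; the wait_cost knob captures the point that waiting for the cluster isn't necessarily wasted if you can do something else meanwhile.

```python
def hours_saved(runtime_before_h, runtime_after_h, runs):
    """Total wall-clock hours saved over all future runs of a job."""
    return (runtime_before_h - runtime_after_h) * runs

def worth_optimizing(dev_hours, runtime_before_h, runtime_after_h, runs, wait_cost=1.0):
    """Return True if the optimization pays for the programmer time it costs.

    wait_cost is what an hour of waiting is worth relative to an hour of coding:
    1.0 means waiting is pure waste; a lower value means the wait can go to a
    side project, e-mail or a coffee break instead.
    """
    return wait_cost * hours_saved(runtime_before_h, runtime_after_h, runs) > dev_hours

if __name__ == "__main__":
    # Hypothetical: 16 h of work to cut a 6 h cluster job down to 4 h
    print(worth_optimizing(16, 6, 4, runs=5))                   # 10 h saved < 16 h spent
    print(worth_optimizing(16, 6, 4, runs=20))                  # 40 h saved > 16 h spent
    print(worth_optimizing(16, 6, 4, runs=20, wait_cost=0.25))  # wait mostly not wasted
```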

Jonathan makes the rather daring claim that "almost all applied CS research papers are bad" and that "this isn't fooling anyone any more..."

The reason for this is that the papers:

  • propose adding a lot of complexity for a very marginal benefit
  • don't work in all cases (limited inputs, robustness)
  • are "supported" by bogus numbers and unfair comparisons

Hmm, sounds like some papers I have seen lately...

His list of criteria for someone he would like to hire is something I feel should be in every job description:

  • gets things done quickly
  • gets things done robustly
  • makes things simple
  • finishes what he writes (for real)
  • broad knowledge of advanced ideas and techniques (but only uses them when genuinely helpful)

Datanami, Woe be me