Tuesday, 29 November 2016

Verily (Google) is hiring Computational Biologists


Interestingly, the role is listed under 'hardware engineering'. The preferred qualifications are very loose:

  • Demonstrated knowledge of core concepts in machine learning or probability and statistics.
  • Willingness to learn molecular and cell biology, computer science, and statistics.
  • Demonstrated effective written and verbal communication skills.

I bet they will be inundated with submissions! 

Wednesday, 9 November 2016

Compiling BWA on Ubuntu 16.04.1 LTS

# install the zlib headers first, otherwise make fails with:
#   utils.c:33:18: fatal error: zlib.h: No such file or directory
$ sudo apt-get install zlib1g-dev

# download the source (0.7.12 here) and compile
$ wget http://downloads.sourceforge.net/project/bio-bwa/bwa-0.7.12.tar.bz2
$ tar jxvf bwa-0.7.12.tar.bz2
$ cd bwa-0.7.12/
$ make

Wednesday, 18 May 2016

CIViC is an open access, open source, community-driven web resource for Clinical Interpretation of Variants in Cancer

CIViC's Role in Precision Medicine

Realizing precision medicine will require this information to be centralized, debated and interpreted for application in the clinic. CIViC is an open access, open source, community-driven web resource for Clinical Interpretation of Variants in Cancer. Our goal is to enable precision medicine by providing an educational forum for dissemination of knowledge and active discussion of the clinical significance of cancer genome alterations.
CIViC is a community-edited forum for discussion and interpretation of peer-reviewed publications pertaining to the clinical relevance of variants (or biomarker alterations) in cancer. These interpretations may include associations between molecular alterations (or lack of alteration) and one or more drugs, diagnoses, prognoses or other treatment decisions. These interpretations of clinical significance (or lack of clinical significance) are purely for research purposes. A finding of no interpretation does not necessarily indicate lack of relevance for any specific variant or biomarker alteration. Interpretations are not presented in ranked order of potential or predicted importance. These interpretations make no promise or guarantee of any clinical benefit (or lack of clinical benefit).

Thursday, 5 May 2016

Verily Life Sciences new hires


I guess scouting a company's recruitment page to understand its projects is a universally common thing. Interestingly, this page even has a word cloud built from the hires' LinkedIn profiles, showing where they worked previously.

The new hires reflect the scope of the few Verily projects Google/Alphabet has allowed to escape to the public so far: a contact lens venture with Novartis AG for diabetics to track blood glucose levels, its buyout of a company with a spoon that counters shaking by Parkinson's disease patients and the big picture Baseline study, a deep research project designed to define a healthy human being.

"The future of biotech, medical and tech is going to coalesce. We've seen that with medical sensors that do more than count steps, artificial intelligence and virtual reality," Topol said. "All these things are going to have a big impact in medicine. It's a natural evolution."

Excerpted from @SFBusinessTimes

Friday, 29 April 2016

FDA launching the second precisionFDA challenge.

PrecisionFDA Truth Challenge

The challenge begins with two precisionFDA-provided input datasets, corresponding to whole-genome sequencing of the HG001 (NA12878) and HG002 (NA24385) human samples. Both samples were sequenced under similar sequencing conditions and instruments, at the same sequencing site. Your mission is to process these two FASTQ datasets through your mapping and variation calling pipeline and create VCF files. You can generate those results on your own environment, and upload them to precisionFDA, or you can reconstruct your pipeline on precisionFDA and run it there. Regardless of how you generate your VCF files, you will subsequently submit them as your entry to the challenge.
For HG002, the truth data will not be known during the challenge. After submissions close on May 26, GiaB will publish their reference VCF file for HG002. The precisionFDA team will then run and publish comparisons between each contestant’s HG002 VCF file and the GiaB HG002 reference VCF. This will publicly reveal how similar is each result to the GiaB HG002 reference.
For HG001, the reference VCF is already available. You are therefore asked to conduct a comparison between your VCF and the GiaB HG001 (NA12878) reference VCF, and include it in your submission entry, for the following reasons:
  1. to ensure that your VCF files are compatible with the comparison process (remember that we won’t be able to check on your HG002 VCF until after the end of submissions, so you are using your HG001 VCF as a check that your files can be compared without issues)
  2. for the community to be able to contrast your performance on a previously known sample (HG001) versus a previously unknown (HG002), and to evaluate any overfitting on HG001
Your entry to the challenge comprises your submitted HG001 and HG002 VCFs, your submitted HG001 comparison, and the HG002 comparison conducted by precisionFDA. Each comparison outputs several metrics (such as precision*, recall*, f-measure, or number of common variants). Selected participants and winners** will be recognized on the precisionFDA website. Therefore, we hope you are willing to share your experience with others to further enhance the community's effort to ensure accuracy and consistency of tests.
The challenge runs until May 26, 2016.
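
To get a feel for what the comparison metrics mean, here is a rough sketch of how precision, recall and f-measure could be computed from two sets of variant calls. This is not the precisionFDA comparison itself: it naively treats a variant as matching only when chromosome, position, REF and ALT are identical, and it ignores genotypes and differing representations of the same variant. The function names and the toy records are made up for illustration.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines.

    Header lines (starting with '#') and blank lines are skipped.
    Multi-allelic records are split into one key per ALT allele.
    """
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare(query_keys, truth_keys):
    """Naive exact-match comparison of a query call set against a truth set."""
    tp = len(query_keys & truth_keys)   # called and in the truth set
    fp = len(query_keys - truth_keys)   # called but not in the truth set
    fn = len(truth_keys - query_keys)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "common": tp,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy records (hypothetical, not taken from any GiaB file).
truth_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t50\t.\tG\tA,C",
]
query_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr2\t50\t.\tG\tA",
    "chr3\t10\t.\tT\tC",
]

metrics = compare(variant_keys(query_vcf), variant_keys(truth_vcf))
print(metrics)
```

A real comparison would need to normalise variant representation and take genotypes into account, so treat the numbers from a sketch like this as a sanity check only.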

Wednesday, 27 April 2016

Free Imputation servers

Free imputation servers will allow anyone to use the full haplotype reference panel to impute missing genotypes in their data.  Users will be able to upload (pre-phased or unphased) genotype data to the server. Imputation will be carried out remotely on the server, and the imputed data will then be made available to the user. 

Prototype imputation servers are already available at 

A prototype phasing server for phasing high coverage sequenced samples is available at

Source: http://www.haplotype-reference-consortium.org/data-access

Wednesday, 13 April 2016

Best Disclaimer on sharing personal genetic information thus far

My vote goes to https://opensnp.org/signup

This sentence sums it all up: "There is zero privacy anyway, get over it"

Copied from the above URL:

By signing up for openSNP you declare that you have understood the possible risks and side-effects that can occur by making your genetical and medical information available on this platform. In short:
  • Data uploaded to the internet can not be fully deleted, there may always be a backup somewhere
  • By publishing data you expose information about you and your next of kin worldwide
  • Genetic and medical information can be used by employers, insurance companies and the government to know more about you than you would like
  • new findings about your genotypes can be negative
What has been seen can not be unseen
You agree that all data you upload to openSNP will be freely available online (well, except your mail-address and password) under a Creative Commons Zero license. The data can be viewed and downloaded through this webpage, RSS-feeds, in future maybe via an API and via FTP. Although you can delete your data from openSNP this does not guarantee that no one else did already create a backup of the data (who may re-publish the data somewhere else).
There is zero privacy anyway, get over it
Although you can upload your data using a pseudonym, there is no way to anonymously submit data. Statistically speaking it is really unlikely that your medical and genetic information matches that of someone else. By uploading you do not only disclose information about yourself, but also about your next kinship (parents and siblings), that shares half of a genome with you. Before uploading any genetical data you should make sure that those people approve of you doing so. This is especially important if you have monozygotic twin, who shares all of your genome!
Jobs, insurance, the government
Medical and genetic data can be used to discriminate people. Due to medical or genetic information an employer may not give you a job, an insurance company may request higher payments and who knows what any evil™ government will do with your data? Although some countries have laws against genetic discrimination, these laws certainly will not cover possible discrimination scenarios and could change in the future. Again: These are side effects and risks which also can apply to your kinship, if you chose to upload this information.
Knowledge about genes and SNPs is not static
Nearly every week there are new scientific publications that find new associations between certain traits (like diseases) with existing genetic information. Because of this you should not publish your data just because it currently looks harmless and unsuspicious. It may be true that your genotyping data is of no greater interest for your employer, your insurance company or the government right now, but this can easily change (Remember: One of the reasons to upload your data here in the first place is, to enable everyone to find such new associations).
Think of the hypothetical SNP rs666. One day after you upload your genotyping-data to this website, a new publication finds that your genotype at rs666 will give you, your siblings and your parents a fatal disease that will most certainly strike all of you. Due to this disease you (and you kin) may lose your jobs and your insurance. Chances for a association of this kind may be small, but by uploading the data you are nonetheless taking this risk!
Accounts which only serve advertising will be deleted.
