Tuesday, 20 March 2018

JD: Sr. Software DevOps Engineer at Guardant Health

Gotta love this line:
“We wanted flying cars and instead we got 140 characters” is a much-repeated complaint about Silicon Valley. But with all due respect to flying cars, we believe that our mission is even more critical. 

Notable skills in the JD to pursue:
Ansible / Chef

This paragraph sounds exactly like what I face on a daily basis:

Your troubleshooting skills are excellent, and you enjoy a good daily challenge in supporting rapid growth and a diverse set of end user needs. You have the ability to maintain day to day support while running various key projects that move the business forward by automating and creating new tools that facilitate management of the environment.

Friday, 23 February 2018

Exploring the 1000 genome dataset with Hail on Amazon EMR and Amazon Athena

 Blog post from Roy Hasson


Genomics analysis has taken off in recent years as organizations continue to adopt the cloud for its elasticity, durability, and cost. With the AWS Cloud, customers have a number of performant options to choose from. These options include AWS Batch in conjunction with AWS Lambda and AWS Step Functions; AWS Glue, a serverless extract, transform, and load (ETL) service; and of course, the AWS big data and machine learning workhorse Amazon EMR.
For this task, we use Hail, an open source framework for exploring and analyzing genomic data that uses the Apache Spark framework. In this post, we use Amazon EMR to run Hail. We walk through the setup, configuration, and data processing. Finally, we generate an Apache Parquet–formatted variant dataset and explore it using Amazon Athena.

Friday, 8 December 2017


Spotted this ad on Biostars, posted by cshevlin: https://www.biostars.org/p/210796/

The IBM Watson Health business division is now looking for talented individuals destined to usher in the next era of healthcare. We live in a moment of remarkable change and opportunity. The convergence of data and technology is transforming healthcare and life sciences organizations in every way. New roles are being created that never existed before to meet the demands of this transformation.
We are now looking for a Genomic Data Scientist to join our team.
You will have an opportunity to work directly with the team building new healthcare solutions using genomic analytics and serving oncologists, pathologists and other specialists caring for patients. You will help define, design, and build those solutions and apply your expertise to work in different analytical and statistical models.
Key Responsibilities:
  • Develop tools to transform, load, and validate data
  • Strategize new uses for data and its interaction with data design
  • Perform data studies of new and diverse data sources
  • Find new uses for existing data sources
  • Discover “stories” told by the data and present them to other scientists and business managers
  • Generate algorithms and create computer models
Ideal candidates will possess the following:
  • First and foremost, a strong background in data mining and statistics
  • Hands-on background in programming and in using databases and tools to mine data, including practical experience in extracting, transforming, and loading data, as well as developing statistical and analytical models
  • Demonstrated capacity to adapt to demanding, high-pressure projects and to clients’ needs
  • Background in bioinformatics
  • Experience in healthcare or life sciences
Learn more about IBM Watson Health and what we are doing… and apply now to explore this opportunity with us!
  • U.S. Department of Veterans Affairs Enlists IBM’s Watson in the War on Cancer: Public-Private Partnership Will Help Doctors Scale Precision Medicine Access for up to 10,000 VA Cancer Patients (http://www-03.ibm.com/press/us/en/pressrelease/50061.wss)
  • IBM and New York Genome Center’s new cancer tumor repository aims to revolutionize treatment
  • IBM's Watson to help doctors devise optimal cancer treatment
Required Technical and Professional Expertise:
  • At least 2 years of experience in data mining
  • At least 1 year of experience with one or more data/statistics tools, such as Python, R, SPSS, Perl
  • At least 1 year of programming experience
  • Demonstrated effective communication skills
  • Fluent in English
Preferred Technical and Professional Experience:
  • 3 years of experience with one or more data/statistics tools, such as Python, R, SPSS
  • 1 year of experience with relational databases, such as DB2, NoSQL, etc.

Sunday, 13 August 2017

Meet Nephele: Harness the Power of the Cloud for Your Microbiome Data Analysis

Nephele is a project from the National Institutes of Health (NIH) that brings together microbiome data and analysis tools in a cloud computing environment. It aims to address a major challenge facing researchers today, namely analyzing, transferring, and storing biomedical "big data", through the use of cloud-based resources.

 Why Use Nephele?

  • Liberating: Nephele enables you to break free from constraints imposed on high-throughput computational analysis
  • Simple: Nephele is designed to be a no-hassle, easy-to-use tool to support your research
  • Sophisticated: Nephele is the most intuitive, advanced and secure microbiome analysis platform designed by our experienced computational biologists and software development team to provide exceptional capability with little effort on your part
  • Fast: Nephele speeds up microbiome data analysis and paves the path to getting to your results
  • Economical: Nephele's on-demand, pay-as-you-go setup offers a cost-effective alternative to using dedicated resources for your microbiome data analysis
Ready to get started? Visit https://nephele.niaid.nih.gov/ and enter your email address. Check your inbox for a message with the subject "Your Nephele Promotional Codes."
Stay in touch! Email nephele@mail.nih.gov with your questions and feedback. You can also visit our Google+ community page to connect with other researchers in the microbiome community (https://plus.google.com/communities/107278901311674483366).

Source: https://www.biostars.org/p/204081/

Demo BAM file for download: Ion Torrent 314 chip, E. coli 400 bp run

BAM file of B22-730 (314v2 E. coli 400 bp run)
Ion Torrent PGM 314v2 run with a mode read length of 400bp and per-base raw read accuracy >99%.


Source: https://apps.thermofisher.com/apps/publiclib/#/datasets

Wednesday, 2 August 2017

Creating filtered fastq files of ONLY mapped reads from a BAM file

Filtering BAM files for mapped or unmapped reads

To get the unmapped reads from a BAM file, use the following (the output will be in SAM format):
samtools view -f 4 file.bam > unmapped.sam
To get the output in BAM format instead, use:
samtools view -b -f 4 file.bam > unmapped.bam
To get only the mapped reads, use the -F parameter, which works like grep -v: it skips alignments that have the specified flag bit set.
samtools view -b -F 4 file.bam > mapped.bam

Source: https://www.biostars.org/p/56246/ Sukhdeep Singh
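To make the -f/-F logic concrete, here is a minimal Python sketch (my own illustration, not samtools code) that applies the same rule to plain SAM text: -f INT keeps a record only if all bits of INT are set in FLAG (column 2), and -F INT drops it if any of those bits are set. Bit 0x4 means "segment unmapped". The function name filter_by_flag and the sample records are hypothetical.

```python
# Sketch of samtools view -f/-F FLAG filtering, applied to plain SAM text lines.
from typing import Iterable, Iterator

UNMAPPED = 0x4  # SAM FLAG bit 0x4: segment unmapped


def filter_by_flag(sam_lines: Iterable[str], require: int = 0,
                   exclude: int = 0) -> Iterator[str]:
    """Yield alignment lines whose FLAG has every `require` bit set
    (like -f) and none of the `exclude` bits set (like -F).

    Header lines (starting with '@') are dropped, as samtools view
    does when -h is not given.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        if (flag & require) == require and (flag & exclude) == 0:
            yield line


if __name__ == "__main__":
    # Hypothetical toy records: r1 forward-mapped, r2 unmapped, r3 reverse-mapped.
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
        "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tGGCA\tIIII",
    ]
    for line in filter_by_flag(sam, require=UNMAPPED):  # like -f 4
        print("unmapped:", line.split("\t")[0])
    for line in filter_by_flag(sam, exclude=UNMAPPED):  # like -F 4
        print("mapped:", line.split("\t")[0])
```

One thing this makes visible: -F 4 looks at each record on its own, so with paired-end data a mapped read is kept even when its mate is unmapped, which is why keeping pairs together is awkward with plain samtools filtering.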

To do this as efficiently as possible, using BBTools:
reformat.sh in=reads.sam out=mapped.fq mappedonly
Also, BBMap has a lot of options designed for filtering, so it can output in FASTQ format and separate mapped from unmapped reads, preventing the creation of intermediate SAM files. This approach also keeps pairs together, which is not very easy using samtools for filtering.

bbmap.sh ref=reference.fa in=reads.fq outm=mapped.fq outu=unmapped.fq
Source: https://www.biostars.org/p/127992/ Brian Bushnell
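The BBTools commands do the filtering and the SAM-to-FASTQ conversion in one step. To show what that conversion involves, here is a minimal single-end Python sketch (my own illustration, not how reformat.sh is implemented; the name mapped_sam_to_fastq is hypothetical). It relies on the SAM convention that reads aligned to the reverse strand (FLAG 0x10) have SEQ stored reverse-complemented, so it undoes that to recover the original read. It also skips secondary (0x100) and supplementary (0x800) records so each read is written once, which matches the default exclusion in samtools fastq. It assumes every kept record has real SEQ and QUAL fields (not '*') and does not attempt BBMap-style pair handling.

```python
# Sketch: write FASTQ for mapped reads only, restoring original read orientation.
from typing import Iterable, Iterator

UNMAPPED = 0x4         # segment unmapped
REVERSE = 0x10         # SEQ is reverse-complemented relative to the read
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def mapped_sam_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record per mapped read from SAM text lines.

    Assumes each kept record has real SEQ and QUAL fields (not '*').
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        # Drop unmapped reads, plus secondary/supplementary records so each
        # mapped read is written once, from its primary alignment.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if flag & REVERSE:
            # SAM stores reverse-strand reads reverse-complemented; undo it.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Hypothetical toy records: mapped forward, unmapped, mapped reverse, secondary.
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
        "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tGGCA\tABCD",
        "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\t*\t*",
    ]
    for record in mapped_sam_to_fastq(sam):
        print(record, end="")
```

Here r3 comes out as TGCC with its qualities reversed to DCBA, i.e. back in the orientation the sequencer produced it.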

Wednesday, 12 April 2017

Control a fleet of embedded unix systems (eg Raspberry Pi, Orange Pi) using saltstack

HAHAHA, I share the same name as a software project. Bizarre discovery today!


Datanami, Woe be me