Tuesday, 6 September 2011

Where to find information on Ion Torrent - a useful guide!

For those who have trouble finding useful information amid the tangled web of the Ion Community, here's a good guide that arrived in my inbox.

Where to find Information

The various Ion Community sites are the best place to start when looking for information quickly. There is a good range of technical documents, troubleshooting advice and how-to videos compiled by both PGM users and staff. The Ion Community consists of 4 sites:

Bioinformatics Videos on the Ion Community


General Documentation


Was the Run Successful?

The wet lab training covers how to read the run report and gives a basic understanding of the metrics it contains. A complete description of the information in the report, how it is generated and what it means can be found in the Torrent Browser Analysis Report Guide. There is also a video.

Data Analysis Software

Currently, most PGM users are making complete alignments on their Torrent Server. These alignments are made by default in order to produce run statistics. Reads can be randomly sampled from the FASTQ file before alignment, but because the output from the 314 and 316 chips is manageable, sampling is turned off by default (meaning that 100% of reads are used in the alignment). A summary of the files that are automatically created on the Torrent Server can be found here.
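The Torrent Server's own sampling code isn't described in this guide, so the following is only a minimal Python sketch of one standard way to sample reads: single-pass reservoir sampling over a FASTQ file with four lines per record. The function names are illustrative, not part of Torrent Suite.

```python
import random
from typing import Iterator, List, TextIO, Tuple

# (read name, bases, quality string)
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain FASTQ stream with exactly four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def sample_reads(records: Iterator[Record], k: int, seed: int = 0) -> List[Record]:
    """Reservoir sampling (Algorithm R): a uniform random sample of k reads in one pass."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)  # fill the reservoir with the first k reads
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = rec  # read i is kept with probability k / (i + 1)
    return reservoir
```

For example, passing an open file handle to `read_fastq` and the result to `sample_reads(..., 100000)` keeps 100,000 reads without ever holding the whole file in memory, which is the main reason to prefer a reservoir over shuffling.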
New in Torrent Suite 1.4 is an Alignment Plugin that can perform a new alignment after the run is complete (instructional video). There is also a Variant Calling Plugin, which uses mpileup from SAMtools to call SNPs and indels in VCF format. However, the plugins in Torrent Suite do not expose the full range of options available for these tools; the extended options are available if you run the tools from the command line on your offline server.
The standalone versions of this software are freely available, but since they do not have a graphical interface, they are intended for bioinformaticians and researchers with Linux experience.
  • TMAP - Aligns PGM reads to a reference. This is the software used to make alignments on the Torrent Server. A description of the software can be found here and a discussion of parameters for offline analysis can be found here. There is even a manual.
  • SAMtools - A collection of SAM/BAM manipulation tools and a variant caller
Since Ion Torrent produces data in a commonly used standard format (FASTQ), a range of 3rd-party commercial software is available. Demo licences are also available for some of these packages; please contact the vendor for details. These companies can provide information on the particular capabilities of their software and will also provide software support.

Life Technologies Partners

Other Vendors


Technology Overview

A high-level overview of the technology, along with some videos, can be found here.

Hardware

This is a breakdown of the various hardware components of the PGM and Torrent Server.

PGM

  • Wind River Linux Operating System - http://www.bsdi.com/
  • Two 2TB drives
    • One is used for the operating system and the other for results storage

Torrent Server

  • Ubuntu (10.04 LTS) Lucid Lynx Linux Operating System - http://www.ubuntu.com
  • Eight 2TB drives
    • Formatted as RAID5, leaving 11TB of usable space, and can withstand two simultaneous drive failures
  • Two 6-core CPUs
  • 48GB RAM
  • More information can be found here
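The "11TB usable" figure is easier to interpret with the standard RAID capacity arithmetic in hand. Below is a quick Python sketch for eight 2TB drives. It assumes the drive size is in decimal terabytes, that the usable figure is reported in binary units (TiB), and it ignores filesystem overhead; the layouts compared are generic examples, not a statement of how the Torrent Server array is actually configured.

```python
def raid_usable_tb(n_drives: int, drive_tb: float, parity_drives: int,
                   hot_spares: int = 0) -> float:
    """Usable capacity in decimal TB: the drives left after parity and hot spares."""
    data_drives = n_drives - parity_drives - hot_spares
    if data_drives <= 0:
        raise ValueError("not enough drives for this layout")
    return data_drives * drive_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


# (label, parity drives, hot spares, simultaneous drive failures tolerated)
layouts = [
    ("RAID5", 1, 0, 1),
    ("RAID5 + 1 hot spare", 1, 1, 1),
    ("RAID6", 2, 0, 2),
]
for label, parity, spares, tolerated in layouts:
    usable = raid_usable_tb(8, 2.0, parity, spares)
    print(f"{label}: {usable:.0f} TB = {tb_to_tib(usable):.1f} TiB, "
          f"tolerates {tolerated} simultaneous failure(s)")
```

Under these assumptions, roughly 11TB matches either RAID6 or RAID5 with one hot spare (about 10.9 TiB each). Surviving two simultaneous drive failures requires two parity drives, as in RAID6; plain RAID5 survives only one failure at a time.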

Key Files and Locations

A quick summary of the location of key files and the file structure on the Torrent Server.
  • /results/[PGM]/[Run]/ - Storage location for primary data files copied from the PGM
    • DAT files - the raw voltage files copied over from the PGM
    • explog.txt - Contains name-value pairs with run level information that was entered on the PGM
      • The presence of this file and of the last DAT file is what causes ionCrawler to insert the new run into the database
    • explog_final.txt - Same as explog.txt
      • The presence of this file signifies the end of the analysis and, if enabled, can be used as a trigger for the PGM to auto-remove the run from the instrument
  • /results/analysis/output/Home/[Analysis]/ - Storage location for primary analysis data and the BAM file, if mapping was done
    • *.sff.zip - Zip file containing the SFF file
    • *.fastq.zip - Zip file containing the FASTQ file
    • *.bam - BAM file containing alignment data against the specified reference
    • *.support.zip - Archive containing system-level information and logs from when the analysis was done
  • /results/analysis/output/Home/[Analysis]/plugin_out/[Plugin]/ - Storage location for plugin specific output files
    • ./Alignment_out/ - Storage location for the realignment plugin
      • *.bam - BAM file containing alignment data against the specified secondary reference
    • ./variantCalling_out/ - Storage location for the variant caller plugin
      • *.vcf.gz - VCF file containing variant calls from the variant call pipeline
  • /opt/ion/iondb/ - Primary location for Torrent Suite Software executables
    • TLScript.py - Master script which drives the initial analysis from DAT to BAM
  • /etc/init.d/ - Primary location for Torrent Server service scripts
    • ionCrawler - Responsible for monitoring the local file structure, identifying when a new run has been copied over (by looking for the explog.txt file and the last DAT file) and inserting the new run into the database
    • ionJobServer - Responsible for submitting primary analysis jobs to SGE
    • ionPlugin - Same as ionJobServer, except that it drives the plugin analyses
  • /var/log/ion/ - Primary storage location for Ion-specific process logs
    • crawl.log - ionCrawler service log
    • jobserver.log - ionJobServer service log
    • ionPlugin.log - ionPlugin service log
    • iarchive.log - Archive service log
    • tsconf - TSconfig service log
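To make the ionCrawler rule concrete (a run is picked up once explog.txt and the last DAT file are both present), here's a hypothetical Python sketch. It assumes explog.txt holds one `Name: value` pair per line, that a `Flows` entry gives the number of flows, and that DAT files are named `acq_0000.dat`, `acq_0001.dat` and so on with one file per flow. Those formats and names are assumptions for illustration, not taken from the Torrent Suite source.

```python
from typing import Dict, Iterable


def parse_explog(text: str) -> Dict[str, str]:
    """Parse 'Name: value' lines into a dict; lines without a colon are skipped.

    The one-pair-per-line, colon-separated layout is an assumption.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue
        fields[name.strip()] = value.strip()
    return fields


def run_is_ready(filenames: Iterable[str], explog_text: str) -> bool:
    """Hypothetical crawler check: explog.txt is present and so is the last DAT file.

    Assumes a 'Flows' entry in explog.txt and DAT files named acq_NNNN.dat,
    numbered from 0000 with one file per flow.
    """
    names = set(filenames)
    if "explog.txt" not in names:
        return False
    try:
        flows = int(parse_explog(explog_text).get("Flows", "0"))
    except ValueError:
        return False  # non-numeric flow count
    if flows <= 0:
        return False  # missing flow count: cannot tell which DAT file is last
    return f"acq_{flows - 1:04d}.dat" in names
```

Checking for the last DAT file rather than any DAT file matters because the files arrive one at a time during the copy from the PGM; starting analysis early would see a partial run.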

Data Formats

More information is available at Ion Torrent File Formats.

DAT

More information can be found at File Format - Raw Image Acquisition File (DAT).

WELLS

More information can be found at File Format - 1.wells files (WELLS).

SFF

Standard Flowgram Format (SFF) is a standard binary format used to encode sequence data in flow space. More information can be found at File Format - Sequence Files (SFF and FASTQ) and here.
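Flow space can be confusing at first: nucleotides are flowed over the chip in a repeating order, and the signal in each flow is roughly proportional to the number of bases incorporated (so a homopolymer of three Gs gives about three times the signal of a single G). The Python sketch below is a deliberately naive decoder that just rounds each flow signal, with no phasing or signal correction, so it illustrates the idea rather than the Ion base caller. The flow order "TACG" and the signal values are made up for the example.

```python
from typing import Sequence


def flowgram_to_bases(flow_values: Sequence[float], flow_order: str) -> str:
    """Naive flow-space decoding: round each flow signal to a homopolymer length."""
    bases = []
    for i, signal in enumerate(flow_values):
        nucleotide = flow_order[i % len(flow_order)]  # the flow order repeats cyclically
        length = max(0, round(signal))  # e.g. 2.1 -> 2 bases; negative noise -> 0
        bases.append(nucleotide * length)
    return "".join(bases)


# Flows: T=1, A=0, C=2, G=1, T=0, A=1, C=0, G=3
print(flowgram_to_bases([1.02, 0.05, 2.10, 0.97, 0.0, 1.1, 0.2, 3.05], "TACG"))
```

Note that a flow with zero signal still carries information: it says the next base is not that nucleotide.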

FASTQ

A standard text-based format for storing raw sequence information.
@J16EU:4:72
CCTCACCCGCCGTCACGTGATGAAAGGATTACTGCTGTTGCTCGGCGCTGGCGGAGGCTGGCAGCTCTGGCAGTC
+
4144655,7715777778997788,3366168858777377755/444551563773654.100.'.....0...
More information can be found at File Format - Sequence Files (SFF and FASTQ) and here.
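Each character in the fourth line of a FASTQ record is the quality of the corresponding base. Assuming the standard Sanger (Phred+33) encoding, a short Python sketch can decode the example record above into Phred scores and error probabilities; the function names are illustrative only.

```python
from typing import List


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 (Sanger-style) FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Phred scale: Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# The example record from above, one FASTQ line per list element
record = """@J16EU:4:72
CCTCACCCGCCGTCACGTGATGAAAGGATTACTGCTGTTGCTCGGCGCTGGCGGAGGCTGGCAGCTCTGGCAGTC
+
4144655,7715777778997788,3366168858777377755/444551563773654.100.'.....0...""".splitlines()
header, seq, plus, qual = record
scores = phred33_scores(qual)
assert len(scores) == len(seq)  # one quality character per base
print(header[1:], len(seq), "bases; first scores:", scores[:4])
print("lowest score:", min(scores), "error probability:", round(error_probability(min(scores)), 2))
```

The lower scores towards the end of this read reflect the usual drop in quality along a read.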

SAM/BAM

Sequence Alignment / Map (SAM) format is a standard text-based format for storing alignment results. BAM is the binary equivalent of SAM.
J16EU:755:498	0	gi|49175990|ref|NC_000913.2|	8	70	39M1D42M16S	*	0	0	CATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCGTGCTGAGACTGACAG	?<?;??=???>??;?>>6;77;;:;<;=;=<==;?;??;?????':8:==<==>><=<9;9:94:9<<6:;:/4052/2...9;;;=<;;:752/..	RG:Z:J16EU	PG:Z:tmap	MD:Z:39^A42	NM:i:1	AS:i:74	XA:Z:map3-1	XS:i:-2147483647	XT:i:56
More information can be found at File Format - Alignment Files (SAM) and here.
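The CIGAR string (39M1D42M16S) and the MD tag (39^A42) in this record describe how the read lines up with the reference. Here's a minimal Python sketch, written from the SAM specification's definitions of which operations consume read and reference bases, that recovers the read length, the reference span and the position of the deleted reference base. The function names are illustrative, and this is not how TMAP or SAMtools do it internally.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")      # operations that consume read bases
REFERENCE_OPS = set("MDN=X")  # operations that consume reference bases
MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")  # matches | ^deleted | mismatch


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    return ops


def query_length(cigar: str) -> int:
    """Number of read bases, including soft clips (should equal len(SEQ))."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS)


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REFERENCE_OPS)


def md_events(md: str, pos: int) -> List[Tuple[int, str, str]]:
    """List (1-based reference position, kind, reference bases) from an MD tag."""
    events = []
    ref = pos
    for match_len, deleted, mismatch in MD_TOKEN.findall(md):
        if match_len:
            ref += int(match_len)  # a run of matching bases
        elif deleted:
            events.append((ref, "deletion", deleted))
            ref += len(deleted)
        else:
            events.append((ref, "mismatch", mismatch))
            ref += 1
    return events


cigar, pos, md = "39M1D42M16S", 8, "39^A42"
print(query_length(cigar), reference_span(cigar), md_events(md, pos))
```

For this record the read is 97 bases long (matching the length of the SEQ field), of which 16 are soft-clipped, and it covers 82 reference bases; the single deleted A is consistent with NM:i:1.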

VCF

Variant Call Format (VCF) is a standard text-based format for storing information on genomic variations (e.g., SNPs and Indels).
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	snappqc
gi|49175990|ref|NC_000913.2|	547694	.	A	G	59	.	DP=9;AF1=1;CI95=1,1;DP4=0,0,2,7;MQ=56;FQ=-54	GT:PL:GQ	1/1:92,27,0:51
More information can be found here.
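The INFO column packs several annotations into one semicolon-separated field. In SAMtools mpileup-style output like this, DP4 holds the counts of high-quality reference-forward, reference-reverse, alternate-forward and alternate-reverse bases, and the FORMAT column (GT:PL:GQ) names the colon-separated values in the sample column. A small Python sketch of pulling these apart with plain string handling (not a full VCF parser; the function names are illustrative):

```python
from typing import Dict, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO column into key -> value; flags without '=' map to True."""
    fields: Dict[str, InfoValue] = {}
    if info == ".":
        return fields  # '.' means no INFO annotations
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_sample(fmt: str, sample: str) -> Dict[str, str]:
    """Pair FORMAT keys (e.g. GT:PL:GQ) with one sample's colon-separated values."""
    return dict(zip(fmt.split(":"), sample.split(":")))


# The example record above; VCF columns are tab-separated
line = ("gi|49175990|ref|NC_000913.2|\t547694\t.\tA\tG\t59\t.\t"
        "DP=9;AF1=1;CI95=1,1;DP4=0,0,2,7;MQ=56;FQ=-54\tGT:PL:GQ\t1/1:92,27,0:51")
chrom, pos, vid, ref, alt, qual, filt, info, fmt, sample = line.split("\t")
info_fields = parse_info(info)
ref_fwd, ref_rev, alt_fwd, alt_rev = map(int, str(info_fields["DP4"]).split(","))
print(ref, ">", alt, "at", pos, "genotype", parse_sample(fmt, sample)["GT"],
      "alt reads:", alt_fwd + alt_rev)
```

Here all 9 high-quality bases support G (2 forward, 7 reverse), which fits the homozygous-alternate 1/1 genotype call.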

Data Flow

A high-level overview of the data flow on and between the PGM and Torrent Server can be found at Torrent Data Flow and Analysis Quick Start and a more detailed description can be found at Technical Note - Analysis Pipeline.

Ion Specific Algorithm Overview

The documents referenced here are meant to provide a high-level overview of what is occurring at each step of the analysis pipeline.

Image Processing: DAT To WELLS

More information can be found at Technical Note - Analysis Pipeline.

Signal Processing: WELLS To SFF

More information can be found at Technical Note - Analysis Pipeline.

Base Calling: SFF To FASTQ

More information can be found at Technical Note - Analysis Pipeline.

7 comments:

  1. Very good information, where did you get all of this?
  2. Very useful. Thank you for the effort.
  3. Thanks for the information!

    I wonder if there is any Ion Proton sequence data available yet?
  4. Thank you so much!
  5. An Ion Torrent employee, 28 July 2015 at 02:26

    Thanks for the blog! In case anyone is still trying to use the above links and finds them broken, the Ion Community is now hosted here: https://ioncommunity.lifetechnologies.com/welcome
    By the end of 2015 it will migrate to https://ioncommunity.thermofisher.com/
    1. Dear Ion Torrent employee,
      very good remark. Do you by any chance know if the document describing the DAT file format (File Format - Raw Image Acquisition File (DAT)) is still available somewhere?

      Thanks! Elain
  6. Dear Kevin,
    very nice info collection!

    Unfortunately, this link to the File Format - Raw Image Acquisition File (DAT) is broken (and the file format description does not seem to be on the new repositories above). Do you have an idea where I can get some info about the DAT file structure?

    Thanks a lot for your help, Elain!