Thursday 9 February 2012

ea-utils - FASTQ processing utilities - Google Project Hosting

http://code.google.com/p/ea-utils/

Suite of processing tools for sequencing output: barcode demultiplexing, adapter trimming, etc.
Primarily written to support an Illumina-based pipeline, but should work with any FASTQs.

Overview:

  • fastq-mcf - Scans a sequence file for adapters and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.
  • fastq-multx - Demultiplexes a FASTQ. Can auto-determine barcode IDs based on a master set of fields. Keeps multiple reads in sync during demultiplexing, can verify that the reads are in sync, and fails if they're not.
  • fastq-join - Similar to audy's stitch program, but written in C, more efficient, and supports some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.
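For readers new to adapter clipping, here is a minimal Python sketch of the basic idea, assuming exact matching against a single 3' adapter. The function `clip_adapter` and its `min_len` parameter are hypothetical names for illustration; fastq-mcf's own log-scaled threshold, mismatch tolerance, skew detection and quality filtering are not modelled.

```python
def clip_adapter(read: str, adapter: str, min_len: int = 3) -> str:
    """Remove a 3' adapter from `read` using exact matching only.

    Hypothetical helper: this is NOT fastq-mcf's algorithm, just the
    basic "find the adapter, cut the read there" idea.
    """
    if not adapter:
        return read
    # Case 1: the whole adapter occurs inside the read; cut at its start.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read runs off into the adapter, so only an adapter
    # prefix is visible at the 3' end. Try the longest prefix first and
    # ignore matches shorter than min_len (too likely to be chance).
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, max(min_len, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter sequence, for illustration
    print(clip_adapter("ACGTACGTAGATCGGAAG", adapter))  # prints ACGTACGT
```

The `min_len` floor matters because almost any read ends in one or two bases that happen to match the start of the adapter; clipping those would shave real sequence off most reads.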
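Barcode demultiplexing with mates kept in sync can be sketched like this. It assumes the barcode sits at the start of read 1 and allows up to `max_mismatch` mismatches (Hamming distance); `assign_barcode` and `demultiplex` are hypothetical helpers, not fastq-multx's interface, and fastq-multx's auto-detection of barcodes from a master set is not shown.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(seq: str, barcodes: Dict[str, str],
                   max_mismatch: int = 1) -> Optional[str]:
    """Return the sample whose barcode best matches the start of `seq`.

    Returns None when no barcode is within `max_mismatch`, or when two
    barcodes tie for the best distance (an ambiguous read).
    """
    best_id: Optional[str] = None
    best_d = max_mismatch + 1
    tie = False
    for sample_id, bc in barcodes.items():
        if len(seq) < len(bc):
            continue
        d = hamming(seq[:len(bc)], bc)
        if d < best_d:
            best_id, best_d, tie = sample_id, d, False
        elif d == best_d:
            tie = True
    return None if tie else best_id


def demultiplex(pairs: Iterable[Tuple[str, str]], barcodes: Dict[str, str],
                max_mismatch: int = 1) -> Dict[str, List[Tuple[str, str]]]:
    """Bin read pairs by the barcode found on read 1 (hypothetical helper).

    Both mates always go to the same bin, so per-sample outputs stay in sync.
    """
    bins: Dict[str, List[Tuple[str, str]]] = {"unmatched": []}
    for r1, r2 in pairs:
        sample = assign_barcode(r1, barcodes, max_mismatch)
        key = sample if sample is not None else "unmatched"
        bins.setdefault(key, []).append((r1, r2))
    return bins
```

Rejecting ties, rather than picking one arbitrarily, keeps a read that sits equally close to two barcodes from being silently assigned to the wrong sample.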
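The core of read joining is finding where the end of read 1 overlaps the reverse complement of read 2. The post doesn't spell out the "squared distance" score fastq-join uses, so this sketch swaps in a plain mismatch fraction instead; `join_pair`, `min_overlap` and `max_diff` are hypothetical names, and sequences are assumed to be uppercase ACGTN.

```python
from typing import Optional

# Complement table for uppercase ACGTN (assumed alphabet).
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMP)[::-1]


def join_pair(r1: str, r2: str, min_overlap: int = 6,
              max_diff: float = 0.1) -> Optional[str]:
    """Merge a read pair whose ends overlap; None if no acceptable overlap.

    Score = mismatches / overlap length (a stand-in, NOT fastq-join's
    squared-distance score). Lower is better; must be <= max_diff.
    """
    r2rc = revcomp(r2)
    best_len: Optional[int] = None
    best_score: Optional[float] = None
    # Try the longest possible overlap first, down to min_overlap.
    for olen in range(min(len(r1), len(r2rc)), max(min_overlap, 1) - 1, -1):
        tail, head = r1[-olen:], r2rc[:olen]
        score = sum(a != b for a, b in zip(tail, head)) / olen
        if score <= max_diff and (best_score is None or score < best_score):
            best_len, best_score = olen, score
    if best_len is None:
        return None
    # Inside the overlap keep read 1's bases (a simplification; a
    # quality-aware joiner would pick the better-quality base per position).
    return r1 + r2rc[best_len:]
```

Scanning from the longest overlap down and only replacing on a strictly better score means ties go to the longer overlap.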

Other Stuff:

  • sam-stats - Basic SAM/BAM stats. Like other tools, but produces what I want to look at, in a format suitable for passing to other programs.
  • fastq-stats - Basic FASTQ stats. Counts duplicates. Option for per-cycle stats, or not (irrelevant for many sequencers).
  • determine-phred - Returns the phred scale of the input file. Works with SAM, FASTQ, and pileup files, including gzipped ones.
  • Chrdex.pm - Indexes a delimited file by chromosome start/stop. There are lots of tools for this; this one works pretty well if you're a Perl user. It handles overlapping regions reasonably well and uses RAM comparable to the size of the annotation file.
  • Sqldex.pm - Just like Chrdex.pm, except it uses a disk-based B-tree. Not as fast, but close, and uses very little RAM.
  • qsh - Runs a bash script file like a "cluster-aware makefile": it only processes things whose inputs are newer, dies if something goes wrong, and sends big jobs to a queue manager. That way you don't have to write makefiles or wrap every little program in a "qsub" call. Not really ready yet.
  • grun - Fast, lightweight grid queue software that keeps the job queue on disk at all times. Works well by now.
  • gwrap - Bash wrapper shell that downloads any dependencies that are not on the local system; good for EC2 nodes. Linux only. Will use it if we ever move to EC2.
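A rough sketch of fastq-stats-style counting: parse 4-line FASTQ records, count duplicates (here defined as reads whose exact sequence already appeared) and the mean quality per cycle. It assumes unwrapped 4-line records and a Phred+33 offset by default; the function names are hypothetical and this is not fastq-stats' source or output format.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from plain 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            next(it)  # the '+' separator line
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header) from None
        yield header[1:], seq, qual


def fastq_stats(records: Iterable[Tuple[str, str, str]],
                phred_offset: int = 33) -> Dict[str, object]:
    """Read count, duplicate count and mean quality per cycle."""
    seq_counts: Counter = Counter()
    qual_sum: List[int] = []
    qual_n: List[int] = []
    for _name, seq, qual in records:
        seq_counts[seq] += 1
        for cycle, ch in enumerate(qual):
            if cycle == len(qual_sum):  # first read reaching this cycle
                qual_sum.append(0)
                qual_n.append(0)
            qual_sum[cycle] += ord(ch) - phred_offset
            qual_n[cycle] += 1
    reads = sum(seq_counts.values())
    return {
        "reads": reads,
        # reads whose exact sequence was already seen earlier in the file
        "duplicates": reads - len(seq_counts),
        "mean_qual_per_cycle": [s / n for s, n in zip(qual_sum, qual_n)],
    }
```

Holding every distinct sequence in a `Counter` is simple but uses memory proportional to the number of unique reads, so a real tool on a full run would likely sample or hash instead.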
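determine-phred's actual rule isn't described here; a common heuristic for FASTQ quality strings looks at the lowest character seen. This sketch assumes only Phred+33 and Phred+64 (including Solexa+64) encodings are in play; `guess_phred_offset` is a hypothetical name.

```python
from typing import Iterable, Optional


def guess_phred_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Heuristic: Phred+64 and Solexa+64 data never use characters below
    ';' (59), so anything lower means Phred+33. If the lowest character
    is '@' (64) or above, Phred+64 is assumed. In between is ambiguous.
    """
    lowest = min((ord(ch) for q in quality_strings for ch in q), default=None)
    if lowest is None:
        return None  # no quality data at all
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None  # ';' to '?' could be either encoding
```

On a small or uniformly high-quality Phred+33 sample the lowest character can sit at '@' or above, so in practice you'd scan enough reads for a low-quality base to show up.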
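Chrdex.pm itself is Perl; this is a Python sketch of the same kind of in-memory chromosome/position index that copes with overlapping regions. `ChromIndex` is a hypothetical class, coordinates are assumed inclusive on both ends, and loading the delimited annotation file is left out.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class ChromIndex:
    """Hypothetical in-memory index of (chrom, start, end, data) regions."""

    def __init__(self) -> None:
        self._regions: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
        self._starts: Dict[str, List[int]] = {}   # sorted starts per chrom
        self._max_len: Dict[str, int] = {}        # longest region per chrom

    def add(self, chrom: str, start: int, end: int, data: str) -> None:
        self._regions[chrom].append((start, end, data))
        self._starts.pop(chrom, None)  # invalidate the sorted view

    def _prepare(self, chrom: str) -> None:
        if chrom not in self._starts:
            regs = self._regions[chrom]
            regs.sort()
            self._starts[chrom] = [s for s, _, _ in regs]
            self._max_len[chrom] = max((e - s for s, e, _ in regs), default=0)

    def lookup(self, chrom: str, pos: int) -> List[str]:
        """Data of every region on `chrom` that contains `pos`."""
        if chrom not in self._regions:
            return []
        self._prepare(chrom)
        regs, starts = self._regions[chrom], self._starts[chrom]
        # Regions starting after pos cannot contain it.
        hi = bisect.bisect_right(starts, pos)
        # A region starting before pos - max_len ends before pos.
        lo = bisect.bisect_left(starts, pos - self._max_len[chrom])
        return [d for s, e, d in regs[lo:hi] if e >= pos]


if __name__ == "__main__":
    idx = ChromIndex()
    idx.add("chr1", 100, 200, "geneA")
    idx.add("chr1", 150, 300, "geneB")
    print(idx.lookup("chr1", 160))  # prints ['geneA', 'geneB']
```

Tracking the longest region per chromosome bounds how far back a lookup has to scan, so overlapping regions are still found without walking every earlier entry; one very long region does widen that window for the whole chromosome.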
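The "only processing newer things" part of qsh is the same staleness test make uses. A minimal sketch, assuming file modification times are the signal; `needs_rebuild` is a hypothetical helper, not part of qsh.

```python
import os
from typing import Sequence


def needs_rebuild(inputs: Sequence[str], outputs: Sequence[str]) -> bool:
    """Make-style check: True if any output is missing or older than
    the newest input. A missing input raises FileNotFoundError."""
    if not outputs:
        return True
    try:
        oldest_out = min(os.path.getmtime(p) for p in outputs)
    except FileNotFoundError:
        return True  # at least one output doesn't exist yet
    newest_in = max((os.path.getmtime(p) for p in inputs), default=0.0)
    return newest_in > oldest_out
```

Comparing against the oldest output means a step whose outputs were only partly regenerated is still treated as stale.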

2 comments:

  1. Hi, I'm a new fastq-mcf user and I'm having trouble running the program on OS X. The version of ea-utils I downloaded has fastq-mcf.c, which isn't recognized at my command line. What should I do?

    This is the command line:
    rsb-33-073:ea-utils.1.1.2-537 jaime$ fastq-mcf.c -l 50 -q 30 -P 32 -C 100000 -o trimming test ../adapter.fa Desktop/RNAseq/ inoculated libraries/M1_I_ATCACG_L001_R1_003.fastq
    -bash: fastq-mcf.c: command not found

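  2. You have to run "make" first. That compiles the C source (fastq-mcf.c) into the fastq-mcf program, which is what you run; the .c file itself isn't a command.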

