Monday, 10 May 2010

A plethora of solid2fastq or csfasta convertors to fastq

I hadn't realised that there's an accumulation of programs/scripts that do the same task. At last count there are 4 of these in my tool closet.

The C binary from bfast

solid2fastq 0.6.4a

Usage: solid2fastq [options]
        -c              produce no output.
        -n      INT     number of reads per file.
        -o      STRING  output prefix.
        -j              input files are bzip2 compressed.
        -z              input files are gzip compressed.
        -J              output files are bzip2 compressed.
        -Z              output files are gzip compressed.
        -t      INT     trim INT bases from the 3' end of the reads.
        -h              print this help message.

 send bugs to bfast-help@lists

The Perl script from bfast-0.6.4a, with a note in the script referring to the above:
# Author: Nils Homer
# Please see the C implementation of this script.
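Conceptually, each of these converters pairs every record in a .csfasta file with the matching line of quality values in the .qual file and writes one FASTQ record. The following is a minimal Python sketch of that idea, plus rough equivalents of the trimming (-t), reads-per-file (-n) and gzip/bzip2 (-z/-Z, -j/-J) options from the help text above. It is not the bfast code. The helper names (`open_text`, `read_records`, `to_fastq`, `chunked`) are hypothetical. The sketch assumes one sequence line and one quality line per record, keeps the colour-space sequence unchanged, writes qualities as Phred+33 characters and clamps negative values (assumed to mark missing colour calls) to 0.

```python
import bz2
import gzip
from typing import IO, Iterable, Iterator, List, Optional, Tuple


def open_text(path: str, mode: str = "rt", compression: str = "") -> IO[str]:
    """Open plain, gzip ("gz") or bzip2 ("bz2") text; roughly what -z/-j and -Z/-J toggle."""
    if compression == "gz":
        return gzip.open(path, mode)
    if compression == "bz2":
        return bz2.open(path, mode)
    return open(path, mode)


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, data_line) pairs, skipping blank lines and '#' headers such as '# Title:'."""
    read_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, line
            read_id = None


def to_fastq(cs: Iterable[str], qual: Iterable[str], trim: int = 0) -> Iterator[str]:
    """Pair csfasta and qual records into FASTQ text, dropping `trim` calls from the 3' end."""
    # zip() stops at the shorter input, so truncated files end the output early.
    for (cs_id, colours), (q_id, values) in zip(read_records(cs), read_records(qual)):
        if cs_id != q_id:
            raise ValueError(f"files out of sync: {cs_id} vs {q_id}")
        scores = [max(int(v), 0) for v in values.split()]  # negative QVs clamped to 0
        if trim > 0:
            colours, scores = colours[:-trim], scores[:-trim]
        yield f"@{cs_id}\n{colours}\n+\n{''.join(chr(s + 33) for s in scores)}\n"


def chunked(records: Iterable[str], per_file: int) -> Iterator[List[str]]:
    """Group records into batches of at most `per_file`, one batch per output file (cf. -n)."""
    batch: List[str] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == per_file:
            yield batch
            batch = []
    if batch:
        yield batch
```

For example, with `open_text("SolF3.csfasta")` and `open_text("SolF3_QV.qual")` as inputs, `"".join(to_fastq(cs, qv, trim=2))` gives the FASTQ text with two colour calls trimmed from each read. The header-skipping in `read_records` is there because stray `#` lines are a plausible reason for a converter to emit nothing.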

EDIT: Thanks to iceman for his reminder in the comments:
"Make sure that you use the BWA's if you are going to use BWA as it "double-encodes" the reads."

The script from bwa-0.5.7:

Note: is the string showed in the `# Title:' line of a
      ".csfasta" read file. Then F3.csfasta is read sequence
      file and F3_QV.qual is the quality file. If
      R3.csfasta is present, this script assumes reads are
      paired; otherwise reads will be regarded as single-end.

      The read name will be :panel_x_y/[12] with `1' for R3
      tag and `2' for F3. Usually you may want to use short
      to save diskspace. Long also causes troubles to maq.

# Author: lh3
# Note: Ideally, this script should be written in C. It is a bit slow at present.
# Also note that this script is different from the one contained in MAQ.
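The note above describes how the input files are located from the title string, when a run is treated as paired, and how reads are renamed; iceman's comment adds that BWA wants "double-encoded" reads. Below is a minimal Python sketch of those ideas, not lh3's Perl code. The helpers `input_layout`, `fastq_name` and `double_encode` are hypothetical. It assumes the title string is prepended to the file names (as in the SolF3.csfasta / SolF3_QV.qual pair a commenter mentions further down), that the R3 quality file mirrors the F3 name, and that double encoding is the usual substitution 0/1/2/3 to A/C/G/T with '.' as N. Primer-base handling is left out.

```python
import re
from typing import Collection, Dict, Tuple

# SOLiD read IDs look like "853_15_1039_F3": panel, x, y, then the tag.
SOLID_ID = re.compile(r"^(\d+)_(\d+)_(\d+)_([FR]3)$")

# Mate numbering as stated in the note above: R3 -> 1, F3 -> 2.
TAG_TO_MATE = {"R3": 1, "F3": 2}

# Assumed double encoding: colour digits written as bases, '.' (no call) as N.
DOUBLE_ENCODE = str.maketrans("0123.", "ACGTN")


def input_layout(title: str, available: Collection[str]) -> Tuple[Dict[str, Tuple[str, str]], bool]:
    """Return expected (csfasta, qual) names per tag and whether the run looks paired."""
    files = {"F3": (f"{title}F3.csfasta", f"{title}F3_QV.qual")}
    r3 = f"{title}R3.csfasta"
    paired = r3 in available  # paired only if an R3 read file is present
    if paired:
        files["R3"] = (r3, f"{title}R3_QV.qual")  # assumed to mirror the F3 naming
    return files, paired


def fastq_name(solid_id: str, prefix: str) -> str:
    """Rename e.g. '853_15_1039_F3' to 'prefix:853_15_1039/2'."""
    m = SOLID_ID.match(solid_id)
    if m is None:
        raise ValueError(f"unexpected SOLiD read id: {solid_id!r}")
    panel, x, y, tag = m.groups()
    return f"{prefix}:{panel}_{x}_{y}/{TAG_TO_MATE[tag]}"


def double_encode(colours: str) -> str:
    """Map colour calls 0/1/2/3/. to A/C/G/T/N."""
    return colours.translate(DOUBLE_ENCODE)
```

A short prefix keeps every renamed read small, which is the disk-space point the note makes.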



Note: is the string showed in the `# Title:' line of a
      ".csfasta" read file. Then F3.csfasta is read sequence
      file and F3_QV.qual is the quality file. If
      R3.csfasta is present, this script assumes reads are
      paired; otherwise reads will be regarded as single-end.

      The read name will be :panel_x_y/[12] with `1' for F3
      tag and `2' for R3. Usually you may want to use short
      to save diskspace. Long also causes troubles to maq.

# Author: lh3
# Note: Ideally, this script should be written in C. It is a bit slow at present.


  1. Use the BFAST C-version for speed, but the perl version is easier to customize. Make sure that you use the BWA's if you are going to use BWA as it "double-encodes" the reads.

  2. The latest version of the solid2fastq program in the bfast+bwa branch of bfast includes a few additional options. Some relevant ones:

    -b encodes the output in the format bwa wants (double encoded, read order reversed)

    -w leaves the output in color space, but splits the reads into read1, read2, and single files.

    There's also a script, which will convert and split the files in parallel on an SGE cluster. It works surprisingly well if splitting reads is useful for you. (Disclaimer: I wrote the script.)

  3. Hey,

    I am using MAQ for the first time. I attempted to use and it executes without any errors, however, the files it produces are empty.

    I have PE reads "_F3.csfasta" and "_R3.csfasta" with their corresponding Qual files.

    I have no clue why this is happening. Any ideas??

  4. Hi Anonymous, I would advise you to try BWA if you can.
    Perhaps you could paste your entire command along with the names of your files?

  5. Hi, I have bwa-0.5.9. I have two files, SolF3.csfasta and SolF3_QV.qual, which I want to convert to FASTQ. After running the command:

    perl /usr/local/ngs/bwa/bwa-0.5.9/ Sol SolTest

    I am getting the file SolTest.single.fastq.gz, but there are no reads in it after I unzip it, even though my input file has plenty of reads. Can you explain the reason, if you have any idea?

    Strangely, the same command works with another set of files...

    You can reply to me at my email address.

  6. @braj Sounds like a problem I had before. Did you delete the comment headers in the csfasta and qual files? I vaguely recall those gave me problems with some scripts.

  7. I want to convert a csfasta file to .fastq format, so please help.

    vijay sharma

  8. The first script is correct in its R3 / F3 conversion to 1 / 2.

    " The read name will be :panel_x_y/[12] with `1' for R3 tag and `2' for F3. "

    The second is incorrect in that it names F3 as 1 and R3 as 2.
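The -w option described in comment 2 splits reads into read1, read2 and single files. Assuming the F3 and R3 reads have already been converted and indexed by their panel_x_y key, that split could look like the minimal Python sketch below. It is not the bfast code, and `split_pairs` and `panel_order` are hypothetical helpers. Mate numbering follows the first lh3 note (R3 as read 1, F3 as read 2), the one comment 8 calls correct; swapping the two dictionaries gives the other convention.

```python
from typing import Dict, List, Tuple


def panel_order(key: str) -> Tuple[int, ...]:
    """Sort 'panel_x_y' keys numerically rather than as strings."""
    return tuple(int(part) for part in key.split("_"))


def split_pairs(f3: Dict[str, str], r3: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Split records keyed by panel_x_y into read1 (R3), read2 (F3) and single lists.

    A read whose mate is missing from the other file goes to `single`.
    """
    read1: List[str] = []
    read2: List[str] = []
    single: List[str] = []
    for key in sorted(set(f3) | set(r3), key=panel_order):
        if key in f3 and key in r3:
            read1.append(r3[key])  # R3 tag -> mate 1
            read2.append(f3[key])  # F3 tag -> mate 2
        else:
            single.append(f3[key] if key in f3 else r3[key])
    return read1, read2, single
```

Keeping both mate lists in the same key order is what lets aligners pair line N of read1 with line N of read2.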


Datanami, Woe be me