Thursday 31 December 2009

Use csplit to split FASTA files

Got this off the net a looong while back. Sorry I can't attribute the source; please drop a comment if you know the original author. Naming the output files with an increasing counter is a godsend if you want to batch-submit qsub/PBS jobs.
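Here is a more concrete version of the qsub/PBS point. In an array job every task gets an integer index, so when the split files are named with a counter, each task can work out its own input file without a lookup table. The helper below is a hypothetical sketch. The function name `input_for_task`, the `split_out` folder and the `padded` switch are illustrative. It also assumes Torque-style arrays, where `qsub -t` sets `PBS_ARRAYID` (PBS Pro uses `qsub -J` and `PBS_ARRAY_INDEX` instead).

```shell
#!/bin/sh
# Map an array-task index to the file csplit wrote for that task.
# input_for_task and its "padded" switch are illustrative, not part of csplit or PBS.
input_for_task() {
    idx=$1      # integer task index: 0, 1, 2, ...
    dir=$2      # the DESTDIR the FASTA file was split into
    padded=$3   # "yes" if split with csplit's default 2-digit suffix (sequence00, ...)
    if [ "$padded" = yes ]; then
        printf '%s/sequence%02d\n' "$dir" "$idx"
    else
        printf '%s/sequence%d\n' "$dir" "$idx"
    fi
}

# Torque sets PBS_ARRAYID for array jobs (e.g. submitted with qsub -t 0-9);
# outside a scheduler we fall back to 0 just for illustration.
task=${PBS_ARRAYID:-0}
input_for_task "$task" split_out no    # unpadded name, e.g. split_out/sequence0 for task 0
input_for_task "$task" split_out yes   # padded name,   e.g. split_out/sequence00 for task 0
```

With unpadded names the index can be pasted straight into the file name, which is why the script below uses `-n 1`. The padded variant needs the extra `printf` formatting step.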

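#!/bin/sh
# fsplit: split a multi-sequence FASTA file into one file per sequence
#
# Usage: fsplit SEQFILE DESTDIR

if [ $# -gt 1 ]
then
 seqfile="$1"
 destdir="$2"
else
 echo "Use: fsplit SEQFILE DESTDIR" >&2
 echo "     Splits FASTA file SEQFILE into separate files in DESTDIR folder" >&2
 exit 1
fi

# create the output folder (no error if it already exists)
mkdir -p "$destdir" || exit 1

# %^>%      skip anything before the first header line (no file is written for it)
# /^>/ {*}  start a new file at every following header line ({*} is a GNU csplit extension)
# -s        don't print the byte count of each piece
# names the files sequence00, sequence01, ... i.e. with padding (csplit's default 2 digits)
#csplit -s -f "$destdir/sequence" "$seqfile" '%^>%' '/^>/' '{*}'
# names the files sequence0, sequence1, ... i.e. without padding
csplit -s -n 1 -f "$destdir/sequence" "$seqfile" '%^>%' '/^>/' '{*}'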
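A quick way to see what the csplit call does is to run it on a tiny FASTA file. The sketch below assumes GNU csplit, since `{*}` is not part of POSIX csplit. It works in a throwaway temp directory, and the file contents are made up for illustration.

```shell
#!/bin/sh
# Demo: split a small, invented FASTA file the same way the script above does.
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the temp directory when the script exits

# one non-header line before the first record, then three records
cat > "$workdir/demo.fa" <<'EOF'
; a comment line before the first header
>seq1 first record
ACGTACGT
>seq2 second record
GGCC
TTAA
>seq3 third record
AT
EOF

mkdir -p "$workdir/out"
csplit -s -n 1 -f "$workdir/out/sequence" "$workdir/demo.fa" '%^>%' '/^>/' '{*}'

ls "$workdir/out"                    # lists sequence0, sequence1 and sequence2
head -n 1 "$workdir/out/sequence1"   # prints ">seq2 second record"
```

The leading comment line is skipped by `%^>%` and is not written to any file. Numbering starts at `sequence0`, so the files line up with array-job indices that also start at 0.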

Datanami, Woe be me