To split your BED file by chromosome, this simple GNU AWK script will create "my.chrNNN.bed" file for each chromosome:
awk '{ print >> "my." $1 ".bed" }' my.bed
BSD's AWK
awk '{ file = "TFBS." $1 ".bed" ; print >> file }' TFBS.bed
credit: Assaf Gordon on BedTools mailing list for pointing this out
On a side note,
To split your BAM file by chromosome, you can use "bamtools split" ( bamtools here: https://github.com/
There is a SAMtools filter option as well.
Anyone benchmarked both to see which runs faster?
this discussion arose from someone trying to compute the mean coverage
bedtools coverage -abam my.bam -b my.bed -d | sort -k1,1 -k2,2n | groupby -g 1,2,3,4,5,6 -c 8 -o mean > my.txt
it doesn't scale too well with the no. of entries in the bed file
1,000 lines takes a few minutes
1,200,000 lines " it's been running for 12 hours and still not done yet " on a Mac Pro with speed of 2.66 GHz and 8 GB of Memory.
So splitting the file by chromosome helps to parallelize the process although the job might scale linearly
What if I want to keep headers?
ReplyDelete