Friday, 29 January 2010

gawk Sequence Alignment/Map (SAM) Format files for mapping info

There is a lot you can do with sed, awk, and grep on Linux.
Below are a few commands, modified from the samtools help list, that may help others sieve mapping info out of SAM files.
They work by checking the flag in column 2 of the SAM file to see whether a read is mapped or unmapped (the 0x4 bit is set for unmapped reads). Look at the above link for more info on the SAM format.

#extract unmapped reads from sam file
gawk '!/^@/&&and($2,4)' aln.sam > aln.sam.unmapped.reads

#extract the read name and the name of the reference sequence each mapped read aligns to
gawk '!/^@/&&!and($2,4){print $1, $3}' aln.sam > aln.sam.mapped.reads

#count the number of reads mapped to each reference sequence (column 2 of the file above)
gawk '{print $2}' aln.sam.mapped.reads|sort |uniq -c |sort -n > aln.sam.mapped.reads.stats
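
If gawk is not installed, the same flag test can be sketched with plain awk arithmetic in place of gawk's and() function. The function names below (sam_unmapped, sam_mapped_counts) are made up for illustration and are not part of samtools; the sketch assumes a whitespace-separated SAM file with the flag in column 2 and the reference name in column 3.

```shell
# Hypothetical helper: print unmapped reads using plain awk arithmetic
# instead of gawk's and(). int(flag / 4) % 2 is 1 when the 0x4 bit is set.
sam_unmapped() {
    awk '!/^@/ && int($2 / 4) % 2 == 1' "$1"
}

# Hypothetical helper: count mapped reads per reference sequence (column 3),
# reading the SAM file directly rather than an intermediate file.
sam_mapped_counts() {
    awk '!/^@/ && int($2 / 4) % 2 == 0 { print $3 }' "$1" | sort | uniq -c | sort -n
}
```

Usage would be something like `sam_unmapped aln.sam > aln.sam.unmapped.reads` and `sam_mapped_counts aln.sam > aln.sam.mapped.reads.stats`.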

Tuesday, 26 January 2010

Mass seq of MRSA with NGS

Omics! Omics!: A plethora of MRSA sequences

I chanced on the above post.

Harris SR, Feil EJ, Holden MT, Quail MA, Nickerson EK, Chantratita N, Gardete S, Tavares A, Day N, Lindsay JA, Edgeworth JD, de Lencastre H, Parkhill J, Peacock SJ, Bentley SD.
Science. 2010 Jan 22;327(5964):469-74. PMID: 20093474 [PubMed - in process]

It would be fantastic if the cost could reach $320 USD per sample. That would definitely be an incentive to do this routinely, or as an additional pipeline, for 'personalized' medicine. Of course, in this example the 'person' is the type of MRSA, and the specific treatment would be targeted at that strain of MRSA.

Saturday, 2 January 2010

linux sort file by file size

I keep forgetting this useful line of code. As written, it shows the biggest file last; add -r to sort to reverse the order.

/bin/ls -l $1|awk '{print $5,  $9}' |sort -n -k 1
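
A minimal sketch of the reversed (largest first) variant, wrapped in a function. The name ls_by_size_desc is made up for illustration; the sketch assumes the usual nine-column `ls -l` layout and does not handle file names containing spaces.

```shell
# Hypothetical helper: list "size name" for the entries in a directory,
# largest first. LC_ALL=C keeps the ls -l date in the three-field form so
# the file name stays in column 9. "command ls" bypasses any ls alias or function.
# NF >= 9 skips the leading "total N" line that ls -l prints for a directory.
ls_by_size_desc() {
    LC_ALL=C command ls -l "${1:-.}" | awk 'NF >= 9 { print $5, $9 }' | sort -rn -k 1
}
```

Calling `ls_by_size_desc` with no argument lists the current directory; pipe it into `head` to see only the largest few files.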

Which Browser for NGS?

There is an interesting discussion on a LinkedIn forum about which browser to use for NGS data visualization.
