Monday, 12 December 2011

CNAnorm is a Bioconductor package to estimate Copy Number Aberrations (CNA) in cancer samples.

http://www.precancer.leeds.ac.uk/software-and-datasets/cnanorm/

CNAnorm

It is described in the paper:

Gusnanto, A., Wood, H.M., Pawitan, Y., Rabbitts, P. and Berri, S. Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next generation sequence data. 2011. Bioinformatics, epub ahead of print.

CNAnorm computes tumour-to-control ratios, corrects them for GC content and normalizes data obtained from very low coverage (one read every 100-10,000 bp) high-throughput sequencing. It performs a "discrete" normalization that searches for the ploidy of the genome. It also estimates tumour content if at least two ploidy states can be found.

Availability

Get the latest version of CNAnorm and its documentation from Bioconductor. Prerequisites: a Fortran compiler, make, and the DNAcopy package from Bioconductor.

You can also download the Perl script bam2windows.pl (version 0.3.3) to convert SAM/BAM files to the text files required by CNAnorm. For documentation on usage, run the script without arguments:

perl bam2windows.pl

For further information on both programs, please contact Stefano Berri.

Additional data files

GC content

We provide gc1000Base.txt.gz, an example GC content file (build GRCh37/hg19) that can optionally be used with bam2windows.pl. It gives the average GC content for every 1000 bp. The window size in the GC content file should be at least an order of magnitude smaller than the window used for CNAnorm, to minimise boundary effects. If you require higher resolution, you can download the gc5Base tables from UCSC and/or make your own. The smaller the window size in the GC content file, the larger the file will be, and the longer bam2windows.pl will take to process it.
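
The window-size rule above can be checked with a few lines of shell before starting a long run. This is only a minimal sketch: it reads "an order of magnitude" as a factor of 10, and the function name gc_window_ok and the 50000 bp analysis window are hypothetical values chosen for illustration, not part of CNAnorm or bam2windows.pl.

```shell
# gc_window_ok GC_WINDOW CNANORM_WINDOW   (hypothetical helper, sizes in bp)
# Succeeds when the GC window is at least 10 times smaller than
# the window used for CNAnorm.
gc_window_ok() {
    [ $(( $1 * 10 )) -le "$2" ]
}

gc_window=1000         # window size of gc1000Base.txt.gz
cnanorm_window=50000   # assumed analysis window, for illustration only

if gc_window_ok "$gc_window" "$cnanorm_window"; then
    echo "GC window of ${gc_window} bp is fine for ${cnanorm_window} bp windows"
else
    echo "GC window too coarse: use a finer GC content file" >&2
fi
```

With these values the check passes. It would fail for an analysis window below 10000 bp, in which case a finer GC content file would be needed.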

LS041 bam files

We provide the BAM files used to produce the dataset included in CNAnorm:
LS041_tumour.bam (139 MB)
LS041_control.bam (130 MB)

To produce a text file suitable as input for CNAnorm, you can enter the following:

perl bam2windows.pl --gc_file gc1000Base.txt.gz LS041_tumour.bam LS041_control.bam > LS041.tab

This will produce the file LS041.tab.
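
Because the conversion reads several large files, it can help to confirm they are all present before starting. The wrapper below is a sketch that assumes the script, the GC content file and both BAM files sit in the current directory. The helper require_files is a hypothetical name introduced here and is not part of bam2windows.pl.

```shell
# require_files FILE...   (hypothetical helper)
# Succeeds only if every named file exists and is non-empty;
# reports each missing or empty file on stderr.
require_files() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Run the conversion only when all inputs are in place.
if require_files bam2windows.pl gc1000Base.txt.gz \
        LS041_tumour.bam LS041_control.bam; then
    perl bam2windows.pl --gc_file gc1000Base.txt.gz \
        LS041_tumour.bam LS041_control.bam > LS041.tab
fi
```

If any input is missing, the wrapper names it and skips the conversion, so no partial LS041.tab is written.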

If your input files are in BAM format, samtools must be installed in a directory in your $PATH.
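
One way to confirm the samtools requirement is to ask the shell where it would find the program. The snippet below is a small sketch using the standard command -v lookup. The helper name have_tool is hypothetical.

```shell
# have_tool NAME   (hypothetical helper)
# Succeeds if NAME can be found through $PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool samtools; then
    echo "samtools found at: $(command -v samtools)"
else
    echo "samtools not found in \$PATH; BAM input will not work" >&2
fi
```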
