Gusnanto, A., Wood, H.M., Pawitan, Y., Rabbitts, P. and Berri, S. Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next generation sequence data. 2011. Bioinformatics, epub ahead of print.
CNAnorm computes ratios, corrects for GC content and normalizes data obtained from very low coverage (one read every 100-10,000 bp) high-throughput sequencing. It performs a "discrete" normalization that looks for the ploidy of the genome. It also estimates tumour content if at least two ploidy states can be found.
Get the latest version of CNAnorm and its documentation from Bioconductor. Prerequisites: a Fortran compiler, make, and the DNAcopy package from Bioconductor.
You can also download the Perl script bam2windows.pl (version 0.3.3) to convert SAM/BAM files to the text files required by CNAnorm. For documentation on usage, run the script without arguments.
For further information on both programs, please contact Stefano Berri.
Additional data files
We provide gc1000Base.txt.gz, an example file for GC content (build GRCh37/hg19) that can optionally be used with bam2windows.pl. It provides the average GC content every 1000 bp. The window size in the GC content file should be at least an order of magnitude smaller than the window used for CNAnorm, to minimise boundary effects. If you require higher resolution, you can download the gc5Base tables from UCSC and/or make your own. The smaller the window size in the GC content file, the larger the file will be, and the longer bam2windows.pl will take to process it.
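The window-size rule can be sketched as follows. This is a minimal Python illustration and not part of bam2windows.pl: the function name aggregate_gc and the in-memory list of GC values are assumptions, and the actual layout of gc1000Base.txt.gz is not modelled here. The sketch rejects a GC window that is less than ten times smaller than the analysis window, then averages the fine GC values into each analysis window.

```python
def aggregate_gc(gc_values, gc_window, analysis_window):
    """Average fine-grained GC values into larger analysis windows.

    Hypothetical helper for illustration.
    gc_values: GC fraction of consecutive windows of gc_window bp.
    analysis_window: size in bp of the windows used for the analysis.
    """
    if gc_window <= 0 or analysis_window <= 0:
        raise ValueError("window sizes must be positive")
    # Enforce the "order of magnitude smaller" recommendation.
    if analysis_window < 10 * gc_window:
        raise ValueError(
            "GC window should be at least 10 times smaller "
            "than the analysis window"
        )
    # Number of whole GC windows per analysis window. If the sizes do
    # not divide evenly, the remainder is ignored in this sketch.
    per_window = analysis_window // gc_window
    averaged = []
    for start in range(0, len(gc_values), per_window):
        chunk = gc_values[start:start + per_window]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


if __name__ == "__main__":
    # Twenty 1000 bp GC values aggregated into two 10,000 bp windows.
    fine = [0.5] * 10 + [0.25] * 10
    print(aggregate_gc(fine, 1000, 10000))
```

When the analysis window is not a multiple of the GC window, part of a GC window is assigned to the wrong analysis window. The smaller the GC window relative to the analysis window, the less this misassignment matters.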