CNAnorm
CNAnorm is a Bioconductor package to estimate Copy Number Aberrations (CNA) in cancer samples.
It is described in the paper:
Gusnanto, A., Wood, H.M., Pawitan, Y., Rabbitts, P. and Berri, S. Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next generation sequence data. 2011. Bioinformatics, epub ahead of print.
CNAnorm performs ratio calculation, GC content correction and normalization of data obtained from very low coverage (one read every 100-10,000 bp) high-throughput sequencing. It performs a "discrete" normalization by looking for the ploidy of the genome. It also estimates tumour content if at least two ploidy states can be found.
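As a rough illustration of the ratio step only, the sketch below computes a per-window ratio of tumour to control read counts, scaled by each sample's total number of reads. It is a minimal sketch under stated assumptions, not CNAnorm's implementation: the function name `window_ratios`, the input lists and the handling of windows with no control reads are all hypothetical, and neither GC correction nor the ploidy-based normalization is shown.

```python
from typing import List, Optional


def window_ratios(tumour: List[int], control: List[int]) -> List[Optional[float]]:
    """Per-window tumour/control ratio, scaled by library size.

    Hypothetical helper for illustration; not part of CNAnorm.
    Windows with zero control reads give None (ratio undefined).
    """
    if len(tumour) != len(control):
        raise ValueError("tumour and control must have the same number of windows")
    total_t = sum(tumour)
    total_c = sum(control)
    if total_t == 0 or total_c == 0:
        raise ValueError("each sample needs at least one read")
    scale = total_c / total_t  # compensates for different sequencing depth
    return [(t / c) * scale if c > 0 else None for t, c in zip(tumour, control)]


# Made-up read counts for four windows
ratios = window_ratios([10, 20, 30, 0], [10, 10, 10, 0])
```

With these made-up counts the tumour sample has twice as many reads as the control, so the raw ratios are halved before any further processing.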
Availability
Get the latest version of CNAnorm and its documentation from Bioconductor. Prerequisites: you need a Fortran compiler, make and DNAcopy from Bioconductor.
You can also download the perl script bam2windows.pl (version 0.3.3) to convert sam/bam files to the text files required by CNAnorm. For documentation on usage, run the script without arguments:
perl bam2windows.pl
For further information on both programs, please contact Stefano Berri
Additional data files
GC content
We provide gc1000Base.txt.gz, an example file for GC content (build GRCh37/hg19) to optionally use with bam2windows.pl. It provides the average GC content every 1000 bp. The window size in the GC content file should be at least an order of magnitude smaller than the window used for CNAnorm, to minimise boundary effects. If you require higher resolution, you can download the gc5Base tables from UCSC and/or make your own. The smaller the window size in the GC content file, the larger the file will be, and the longer bam2windows.pl will take to process it.
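If you make your own GC content data, the underlying calculation is simply the fraction of G and C bases in each fixed-size window. The following is a minimal sketch, assuming the sequence is already loaded as a string; the function name `gc_per_window` and the treatment of uncalled bases are hypothetical choices. The layout that bam2windows.pl expects in its GC file is not described here, so the sketch stops at computing the values and does not write a file.

```python
from typing import List, Tuple


def gc_per_window(seq: str, window: int = 1000) -> List[Tuple[int, float]]:
    """Return (window_start, gc_fraction) for consecutive windows.

    Hypothetical helper for illustration. Bases other than A, C, G, T
    (e.g. N) are ignored; a window with no called bases gets 0.0.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    result = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        result.append((start, gc / called if called else 0.0))
    return result


# Toy sequence with a 4 bp window
toy = gc_per_window("GGCCAATTGCNN", window=4)
```

Halving the window size doubles the number of values produced, which is why higher-resolution GC files are larger and slower to process.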
LS041 bam files
We provide the bam files used to produce the dataset included in CNAnorm:
LS041_tumour.bam (139 MB)
LS041_control.bam (130 MB)
To produce a text file suitable as input for CNAnorm, you can enter the following:
perl bam2windows.pl --gc_file gc1000Base.txt.gz LS041_tumour.bam LS041_control.bam > LS041.tab
This will produce the file LS041.tab.
If your input files are in bam format, you need samtools installed in a directory in your $PATH.
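One quick way to confirm that samtools is reachable before converting bam files is to look it up on the $PATH. This is a small sketch using Python's standard library; the helper name `tool_on_path` is hypothetical, and the check only confirms that an executable with that name exists, not which version it is.

```python
import shutil


def tool_on_path(name: str) -> bool:
    """True if an executable called `name` is found in a $PATH directory."""
    return shutil.which(name) is not None


if not tool_on_path("samtools"):
    print("samtools not found in $PATH; install it before using bam input")
```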