Monday, 28 May 2012

SCORE-Seq: Score-Type Tests for Detecting Disease Associations With Rare Variants in Sequencing Studies


SCORE-Seq is a command-line program that implements the methods of Lin and Tang (2011) for detecting disease associations with rare variants in sequencing studies. The mutation information is aggregated across multiple variant sites of a gene through a weighted linear combination and then related to disease phenotypes through appropriate regression models. The weights can be constant or can depend on allele frequencies and phenotypes. Association testing is based on score-type statistics. The allele-frequency threshold can be fixed or variable. Statistical significance can be assessed by an asymptotic normal approximation or by resampling. A detailed description of the methods is given in Lin and Tang (2011). The current release covers binary and continuous traits with arbitrary covariates under case-control and cross-sectional sampling. The newest version was released on May 21, 2012, with some new features. We are working intensely to improve the capabilities of SCORE-Seq, so please check back frequently for updates.
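To make the idea of a score-type test on an aggregated genetic score concrete, here is a minimal Python sketch. It is not SCORE-Seq's code and not necessarily the exact statistic of Lin and Tang (2011): it assumes no covariates, takes the per-subject genetic scores as already computed, and uses the textbook score test for the slope in a regression of the trait on the score. The function name score_test and its arguments are hypothetical. For a 0/1 trait the same formula coincides with the covariate-free logistic score test.

```python
import math

def score_test(y, s):
    """Textbook score test of 'no association' between trait y and score s.

    Assumptions (illustrative only): no covariates, one genetic score per
    subject, and a two-sided p-value from the normal approximation.
    """
    n = len(y)
    y_bar = sum(y) / n
    s_bar = sum(s) / n
    # Score statistic: covariation between trait residuals and genetic scores.
    u = sum((yi - y_bar) * si for yi, si in zip(y, s))
    # Variance of u under the null hypothesis.
    sigma2 = sum((yi - y_bar) ** 2 for yi in y) / n
    v = sigma2 * sum((si - s_bar) ** 2 for si in s)
    if v == 0:
        raise ValueError("trait or genetic score has no variation")
    z = u / math.sqrt(v)
    # Two-sided p-value: P(|N(0,1)| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Toy data: four subjects, two of whom carry one rare mutation in the gene.
z, p = score_test([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

Adjusting for covariates or replacing the normal approximation with resampling would change how the null variance and p-value are obtained; this sketch covers only the simplest case.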
General information
SCORE-Seq is a command-line program written in the C language to implement the methods of Lin and Tang (2011) for detecting disease associations with rare variants in sequencing studies. In the software, various tests are conducted for each gene.
There are options for the minor allele frequency (MAF) upper bound, the call rate (CR) lower bound and the minor allele count (MAC) lower bound. A variant is deleted if its MAF is greater than the MAF upper bound or its CR is lower than the CR lower bound. A gene is excluded from the analysis if the MAFs of all its variants are greater than the MAF upper bound, the CRs of all its variants are less than the CR lower bound or its MAC is less than the MAC lower bound. By default, the MAF upper bound is 0.05, the CR lower bound is 0 and the MAC lower bound is 1. The MAFs may be determined internally (i.e., calculated from the genotype file) or externally (i.e., input in the mapping file).
Under the additive genetic model (default), the test statistics are based on one or several sets of genetic scores, each calculated as a weighted sum of mutation counts for each subject. A set of genetic scores corresponds to a specifically defined weight function. A description of the genetic score and weight function for each test is given in the OPTIONS section below. A fixed-threshold test involves only one set of genetic scores in the test statistic, while a variable-threshold test involves multiple sets of genetic scores.
We perform three fixed-threshold tests (T1, T5 and Fp) plus one variable-threshold test (VT test). T1 and T5 pertain to the MAF thresholds of 1% and 5%, respectively. The user may request any threshold less than 5% (e.g., 3% or 0.5%) by setting the MAF upper bound to the desired threshold. Asymptotic p-values are provided by default, while resampling p-values can be generated by using the option -resample.
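The filtering rules and the weighted-sum genetic score described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's implementation: the function names are hypothetical, MAFs are computed internally from the genotypes, a gene's MAC is taken to be the total minor allele count over its retained variants, missing genotypes contribute 0 to the score, and the weights are simply supplied by the caller (the actual weight functions for T1, T5, Fp and VT are those given in the program's documentation).

```python
from typing import List, Optional

# Rows are subjects, columns are variants. Each entry is a minor-allele
# count (0, 1 or 2), or None when the genotype call is missing.
Genotypes = List[List[Optional[int]]]

def variant_stats(geno: Genotypes, j: int):
    """Return (MAF, call rate, minor allele count) for variant j."""
    calls = [row[j] for row in geno if row[j] is not None]
    mac = sum(calls)
    cr = len(calls) / len(geno)
    maf = mac / (2 * len(calls)) if calls else 0.0
    return maf, cr, mac

def keep_variants(geno: Genotypes, maf_ub: float = 0.05, cr_lb: float = 0.0):
    """Drop a variant if its MAF exceeds maf_ub or its CR is below cr_lb."""
    kept = []
    for j in range(len(geno[0])):
        maf, cr, _ = variant_stats(geno, j)
        if maf > maf_ub or cr < cr_lb:
            continue
        kept.append(j)
    return kept

def gene_is_analyzed(geno: Genotypes, kept: List[int], mac_lb: int = 1) -> bool:
    """Assumed rule: the gene needs a retained variant and enough total MAC."""
    total_mac = sum(variant_stats(geno, j)[2] for j in kept)
    return bool(kept) and total_mac >= mac_lb

def genetic_scores(geno: Genotypes, kept: List[int], weights: List[float]):
    """Additive model: weighted sum of mutation counts for each subject."""
    return [sum(weights[j] * (row[j] or 0) for j in kept) for row in geno]

# Toy gene with 4 subjects and 3 variants; the third variant is common.
geno = [[0, 1, 2], [0, 0, 2], [1, 0, 1], [0, 0, None]]
kept = keep_variants(geno, maf_ub=0.2)           # the toy sample needs a loose bound
scores = genetic_scores(geno, kept, [1.0, 1.0, 1.0])
```

With constant weights, each subject's score is just the number of rare mutations carried in the gene; frequency-dependent weights would be passed in through the same weights argument.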
The software also outputs the p-value of the EREC test (for detecting variants with opposite effects) if resampling is turned on. In addition, the T1, T5 and VT tests under the dominant genetic model can be obtained by using the option -dominant; in that case, all tests based on the additive genetic model are suppressed. Besides the rare-variant analysis described above, the user can conduct single-variant analysis for common SNPs by using the option -com. To suppress the rare-variant analysis, use the option -noRare.
