It is widely believed that both common and rare variants contribute to the risk of common diseases and to complex traits, and that the cumulative effects of multiple rare variants can explain a significant proportion of trait variance. Advances in high-throughput DNA sequencing technologies allow us to genotype rare causal variants and investigate their effects on complex traits. We developed an adaptive ridge regression method to analyze the collective effects of multiple variants in the same gene or the same functional unit. Our model focuses on continuous traits and incorporates covariates to remove potential confounding effects. The proposed method estimates and tests multiple rare variants collectively but does not assume that all rare variant effects act in the same direction. Compared with the Bayesian hierarchical generalized linear model (BhGLM) approach, the state-of-the-art method of rare variant detection, the proposed method is easy to implement, yet it has higher statistical power. Application of the new method is demonstrated using the well-known data from the Dallas Heart Study.
Citation: Zhan H, Xu S (2012) Adaptive Ridge Regression for Rare Variant Detection. PLoS ONE 7(8): e44173. doi:10.1371/journal.pone.0044173
Editor: Momiao Xiong, University of Texas School of Public Health, United States of America
Received: May 28, 2012; Accepted: July 30, 2012; Published: August 28, 2012
We believe that the high power of the BhGLM is due to (1) an appropriate combination of the individual rare variants (the new score) and (2) the assignment of hierarchical priors. After a thorough evaluation of these new methods, we found that there is still some room for improvement. The new score of the BhGLM method is a first moment parameter (shared effect). An alternative score may be a second moment parameter (shared variance). Yi and Zhi's method requires prior distributions, and thus different priors may produce different results. The hyper-parameters involved in the prior distributions may also affect the results. In this study, we proposed to use a shared variance among rare variants as the new score. The method was originally called ridge regression. It is further modified to discriminate against rare variants with small effects. This modified ridge regression is called the adaptive ridge regression. The adaptive ridge regression (ARR) is performed under the maximum likelihood framework, and thus prior distributions of parameters are not required, which is equivalent to assuming independent uniform priors for all parameters.
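The idea of rescaling variants so that small effects are penalized more heavily can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names `solve`, `ridge`, and `adaptive_ridge` are hypothetical, the weight rule (each variant column is rescaled by the absolute value of its current estimate) is one plausible reading of the adaptive step, the ridge penalty `lam` is held fixed instead of being estimated by maximum likelihood, and covariates are omitted by assuming a centered phenotype.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # row with largest pivot
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def ridge(z, y, lam):
    """Ridge estimate (Z'Z + lam*I)^(-1) Z'y; z is a list of rows, y is centered."""
    p = len(z[0])
    ztz = [[sum(row[i] * row[j] for row in z) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    zty = [sum(row[i] * yi for row, yi in zip(z, y)) for i in range(p)]
    return solve(ztz, zty)


def adaptive_ridge(z, y, lam, n_iter=20):
    """Iteratively reweighted ridge (assumed rule: weight = |current estimate|)."""
    p = len(z[0])
    gamma = ridge(z, y, lam)  # start from the ordinary ridge estimate
    for _ in range(n_iter):
        w = [abs(g) for g in gamma]                            # per-variant weights
        zw = [[row[k] * w[k] for k in range(p)] for row in z]  # rescaled genotypes
        g_star = ridge(zw, y, lam)                             # effects on rescaled scale
        gamma = [w[k] * g_star[k] for k in range(p)]           # back to original scale
    return gamma


if __name__ == "__main__":
    # Toy orthogonal design: one variant with a large effect, one with a small effect.
    z = [[1.0, 0.0], [0.0, 1.0]]
    y = [3.0, 0.1]
    print("ridge:   ", ridge(z, y, 1.0))
    print("adaptive:", adaptive_ridge(z, y, 1.0))
```

In this toy example, ordinary ridge shrinks both effects by the same factor, whereas the reweighted version shrinks the large effect less and drives the small effect toward zero. On the rescaled scale the retained effects all have magnitude close to one, which is one way to read the "homogenizing" role of the adaptive step.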
The method is called the sequence kernel association test (SKAT). After reading this paper, we agreed that our approach is similar to SKAT. However, SKAT only gives the score test and provides no parameter estimates, which explains why SKAT is computationally fast. The adaptive ridge regression has three major advantages. First, a large score test statistic does not mean that the effects are large; it may be caused by small effects combined with a large sample size, and the score test cannot tell the difference. Our method provides not only a test but also an estimate of the group variance, so we can report the total proportion of the phenotypic variance contributed by the rare variants. Second, we introduced an adaptive step to the original ridge regression. This step plays the role of the "weighting" in the SKAT method, but it can also "homogenize" the effect of each rare variant within a group. Ridge regression performs better under the "homogenized" rare variant effect assumption. Third, our method works for both rare and common variants. In contrast, SKAT was designed specifically for rare variants because the "weights" for the common variants will be almost zero (excluded from the model), according to the authors of that paper. It is possible to use the score test under our adaptive ridge regression framework. The estimation procedure would remain the same, but the likelihood ratio test would simply be replaced by the score test, with the "weights" obtained from the adaptive ridge regression used in the score test. This needs to be further investigated.
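The distinction between testing and estimating the group variance can be made concrete with the standard random-effects formulation of ridge regression. The notation below is an assumption for illustration and is not taken from the paper: $y$ is the phenotype vector of $n$ individuals, $X$ holds the covariates, $Z$ holds the genotype indicators of $m$ variants in the group, $\phi^2$ is the shared variance, and $s_k^2$ is the sample variance of the $k$-th column of $Z$. The last line is only one possible definition of the variance proportion; it assumes uncorrelated variants and ignores the covariates.

```latex
\begin{align}
  y &= X\beta + Z\gamma + \varepsilon, \qquad
      \gamma \sim N(0,\, I_m\phi^2), \qquad
      \varepsilon \sim N(0,\, I_n\sigma^2), \\
  \operatorname{Var}(y) &= ZZ^{\top}\phi^2 + I_n\sigma^2, \\
  \Lambda &= -2\left[\ell(\hat\beta_0,\, 0,\, \hat\sigma_0^2)
             - \ell(\hat\beta,\, \hat\phi^2,\, \hat\sigma^2)\right]
             \qquad \text{(likelihood ratio test of } H_0\colon \phi^2 = 0\text{)}, \\
  \hat h^2 &= \frac{\hat\phi^2 \sum_{k=1}^{m} s_k^2}
                   {\hat\phi^2 \sum_{k=1}^{m} s_k^2 + \hat\sigma^2}.
\end{align}
```

Under this formulation the test statistic $\Lambda$ tends to grow with the sample size for any fixed $\phi^2 > 0$, whereas $\hat h^2$ estimates a quantity that does not depend on $n$. The link to ridge regression is that the best linear unbiased prediction of $\gamma$ equals the ridge estimate with penalty $\lambda = \sigma^2/\phi^2$.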
The new method is developed for continuous traits under the linear mixed model framework. In many situations, however, the trait of interest may be binary. The generalized linear mixed model (GLMM), which is an extension of the linear mixed model, can be used to analyze the association between multiple rare variants and a binary trait. This extension is straightforward because the methodology of GLMM is well established. The extension includes the adaptive steps.
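As a sketch of what such an extension might look like, the shared-variance model can be written with a link function. The logit link and the symbols below are assumptions chosen for illustration (a probit link would serve equally well); $X_j$ and $Z_j$ denote the covariate and genotype rows of individual $j$, and $\phi^2$ is the shared variance of the variant effects.

```latex
\begin{align}
  \operatorname{logit}\,\Pr(y_j = 1 \mid \gamma) &= X_j\beta + Z_j\gamma,
      \qquad \gamma \sim N(0,\, I_m\phi^2), \\
  H_0 &\colon \phi^2 = 0 .
\end{align}
```

The group of variants would again be tested through the single variance parameter $\phi^2$, with the adaptive reweighting applied to the columns of $Z$ as in the continuous-trait case.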
Finally, we performed all the analyses using an R program. The R package, called Adaptive Ridge Regression (ARR), can be downloaded from the authors' personal website: www.statgen.ucr.edu.