Genome-wide association has been a powerful tool for detecting common disease variants. However, this approach is underpowered for identifying variation that is poorly represented on commercial SNP arrays because it is too rare or population-specific. Recent multipoint methods, including SNP tagging and imputation, boost the power to detect and localize the true causal variant by leveraging common haplotypes in a densely typed panel of reference samples. However, these methods are limited by the need to obtain a robust population-specific reference panel sampled deeply enough to observe a rare variant of interest. We set out to overcome these challenges by using long stretches of genomic sharing that are identical by descent (IBD). We use such evident sharing between pairs and small subsets of individuals to recover the underlying shared haplotypes that these individuals have co-inherited.
We have created a software tool, DASH (DASH Associates Shared Haplotypes), that builds on pairwise IBD segments to infer clusters of individuals who are IBD with one another. Briefly, for each locus, DASH constructs a graph whose links represent IBD sharing at that locus, and uses an iterative min-cut approach to identify clusters. These clusters are densely connected components, each sharing a haplotype. As DASH slides the local window along the genome, links representing new shared segments are added and old ones expire; these changes cause the resulting connected components to grow and shrink. We code the corresponding haplotypes as genetic markers and use them for association testing.
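To make the windowed clustering idea concrete, the following is a minimal Python sketch and not DASH's actual implementation. It assumes pairwise IBD segments are given as hypothetical tuples `(sample_a, sample_b, start, end)` with half-open coordinates, rebuilds the graph from scratch at each queried position instead of updating it incrementally, and stops at plain connected components; the iterative min-cut refinement that enforces dense connectivity is omitted. The function names `clusters_at` and `code_markers` are illustrative, and markers are coded per individual (carrier or non-carrier) as a simplification.

```python
from collections import defaultdict


def clusters_at(segments, pos):
    """Return connected components of the IBD graph at one position.

    segments: iterable of (sample_a, sample_b, start, end), half-open [start, end).
    This is an assumed input layout, not DASH's file format.
    """
    # Build the graph from segments that cover this position.
    adj = defaultdict(set)
    for a, b, start, end in segments:
        if start <= pos < end:
            adj[a].add(b)
            adj[b].add(a)

    # Depth-first search to collect each connected component.
    seen, clusters = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        clusters.append(frozenset(comp))
    return clusters


def code_markers(clusters, samples):
    """Code each cluster as a 0/1 marker: 1 if the sample is in the cluster."""
    return {c: [1 if s in c else 0 for s in samples] for c in clusters}


# Toy data: three pairwise segments among five samples.
segments = [
    ("A", "B", 0, 100),
    ("B", "C", 50, 150),
    ("D", "E", 0, 60),
]
samples = ["A", "B", "C", "D", "E"]

for pos in (25, 75, 125):
    print(pos, [sorted(c) for c in clusters_at(segments, pos)])

markers = code_markers(clusters_at(segments, 75), samples)
```

In this toy input the cluster containing A and B gains C once the B-C segment begins and loses A once the A-B segment expires, which mirrors the growing and shrinking of components described above. The resulting 0/1 vectors could then be passed to any standard single-marker association test.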