Bioinformatics

PBOOST: a GPU-based tool for parallel permutation tests in genome-wide association studies

G. Yang, W. Jiang, Q. Yang, W. Yu.

Motivation: The importance of testing associations allowing for interactions has been demonstrated by Marchini et al. (2005). A fast method detecting associations allowing for interactions has been proposed by Wan et al. (2010a). The method is based on likelihood ratio test with the assumption that the statistic follows the 2 distribution. Many single nucleotide polymorphism (SNP) pairs with significant associations allowing for interactions have been detected using their method. However, the assumption of 2 test requires the expected values in each cell of the contingency table to be at least five. This assumption is violated in some identified SNP pairs. In this case, likelihood ratio test may not be applicable any more. Permutation test is an ideal approach to checking the P-values calculated in likelihood ratio test because of its non-parametric nature. The P-values of SNP pairs having significant associations with disease are always extremely small. Thus, we need a huge number of permutations to achieve correspondingly high resolution for the P-values. In order to investigate whether the P-values from likelihood ratio tests are reliable, a fast permutation tool to accomplish large number of permutations is desirable.

Results: We developed a permutation tool named PBOOST. It is based on GPU with highly reliable P-value estimation. By using simulation data, we found that the P-values from likelihood ratio tests will have relative error of >100% when 50% cells in the contingency table have expected count less than five or when there is zero expected count in any of the contingency table cells. In terms of speed, PBOOST completed 107 permutations for a single SNP pair from the Wellcome Trust Case Control Consortium (WTCCC) genome data (Wellcome Trust Case Control Consortium, 2007) within 1 min on a single Nvidia Tesla M2090 device, while it took 60 min in a single CPU Intel Xeon E5-2650 to finish the same task. More importantly, when simultaneously testing 256 SNP pairs for 107 permutations, our tool took only 5 min, while the CPU program took 10 h. By permuting on a GPU cluster consisting of 40 nodes, we completed 1012 permutations for all 280 SNP pairs reported with P-values smaller than $$1.6\times {10}^{-12}$$ in the WTCCC datasets in 1 week.

Availability and implementation: The source code and sample data are available at http://bioinformatics.ust.hk/PBOOST.zip.

Contact: gyang@ust.hk; eeyu@ust.hk

Supplementary information: Supplementary data are available at Bioinformatics online.