Motivation: An important problem in cancer genomics research is that a solid tissue sample obtained in a clinical setting is always a mixture of cancer and normal cells. This mixture complicates data analysis and leads to biased findings if it is not correctly accounted for. Estimating tumor purity is therefore of great interest, and a number of methods have been developed that use gene expression, copy number variation or point mutation data.
Results: We find that, in cancer samples, the distributions of data from the Illumina Infinium 450k methylation microarray are highly correlated with tumor purity. We develop a simple but effective method to estimate purity from these microarray data. Analyses of the TCGA lung cancer data demonstrate favorable performance of the proposed method.
Availability: The method is implemented in InfiniumPurify, which is freely available at https://bitbucket.org/zhengxiaoqi/infiniumpurify.
Contact: xqzheng@shnu.edu.cn; hao.wu@emory.edu