Motivation: High-throughput sequencing technologies have provided a powerful tool for studying the microbial organisms living in various environments. Characterizing microbial interactions can give us insights into how these organisms live and work together as a community. Metagenomic data are usually summarized in a compositional fashion because sampling and sequencing depths vary from one sample to another. We study the co-occurrence patterns of microbial organisms using their relative abundance information. Analyzing compositional data with conventional correlation methods has been shown to be prone to bias that leads to artifactual correlations.
Results: We propose a novel method, REBACCA, to identify significant co-occurrence patterns by finding sparse solutions to a rank-deficient system. Specifically, we construct the system using log ratios of count data and solve it using the l1-norm shrinkage method. Our comprehensive simulation studies show that REBACCA (1) generally achieves higher accuracy than the existing methods when a sparsity condition is satisfied; (2) controls the false positives at a pre-specified level, while other methods fail to do so in various cases; and (3) runs considerably faster than the existing comparable method. REBACCA is also applied to several real metagenomic datasets.
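The following minimal Python sketch illustrates why log ratios are a natural starting point and why the resulting system is rank deficient. It is not the REBACCA implementation. The function names (`log_ratio_variances`, `variance_system`), the simulated data, and the particular system shown (log-ratio variances written as var_i + var_j - 2 cov_ij of the unobserved log abundances) are assumptions made for illustration only. The l1-norm shrinkage step is not shown.

```python
import numpy as np

def log_ratio_variances(abundances):
    """Return T with T[i, j] = Var(log(x_i / x_j)) across samples.

    `abundances` has one row per sample and one column per taxon.
    """
    logx = np.log(np.asarray(abundances, dtype=float))
    # log(x_i / x_j) = log x_i - log x_j, shape (samples, taxa, taxa)
    diffs = logx[:, :, None] - logx[:, None, :]
    return diffs.var(axis=0, ddof=1)

def variance_system(n_taxa):
    """Design matrix of the (assumed) system T_ij = w_ii + w_jj - 2 w_ij.

    Columns: n_taxa variances first, then one covariance per taxon pair.
    """
    pairs = [(i, j) for i in range(n_taxa) for j in range(i + 1, n_taxa)]
    A = np.zeros((len(pairs), n_taxa + len(pairs)))
    for row, (i, j) in enumerate(pairs):
        A[row, i] = 1.0              # variance of taxon i
        A[row, j] = 1.0              # variance of taxon j
        A[row, n_taxa + row] = -2.0  # covariance between taxa i and j
    return A

# Simulated (hypothetical) absolute abundances and per-sample depths.
rng = np.random.default_rng(0)
n_samples, n_taxa = 50, 4
basis = rng.lognormal(mean=3.0, sigma=0.5, size=(n_samples, n_taxa))
depth = rng.uniform(0.5, 2.0, size=(n_samples, 1))

# Log ratios cancel the per-sample depth factor, so both matrices agree.
T_basis = log_ratio_variances(basis)
T_scaled = log_ratio_variances(basis * depth)
depth_invariant = np.allclose(T_basis, T_scaled)

# Fewer equations (taxon pairs) than unknowns (variances + covariances).
A = variance_system(n_taxa)
rank = np.linalg.matrix_rank(A)
```

Because the system has fewer independent equations than unknowns, it has infinitely many solutions. A sparsity assumption, enforced through an l1-norm penalty, is one way to single out a solution.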
Availability: The R code for the proposed method is available at http://faculty.wcas.northwestern.edu/~hji403/REBACCA.htm
Contact: hongmei@northwestern.edu
Supplementary information: