Bioinformatics

UProC: tools for ultra-fast protein domain classification

P. Meinicke.

Motivation: With rapidly increasing volumes of biological sequence data the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics.

Results: The ultrafast protein classification (UProC) toolbox implements a novel algorithm (‘Mosaic Matching’) for large-scale sequence analysis. UProC is by three orders of magnitude faster than profile-based methods and in a metagenome simulation study achieved up to 80% higher sensitivity on unassembled 100 bp reads.

Availability and implementation: UProC is available as an open-source software at https://github.com/gobics/uproc. Precompiled databases (Pfam) are linked on the UProC homepage: http://uproc.gobics.de/.

Contact: peter@gobics.de.

Supplementary information: Supplementary data are available at Bioinformatics online.