Motivation: With rapidly increasing volumes of biological sequence data the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics.
Results: The ultrafast protein classification (UProC) toolbox implements a novel algorithm (‘Mosaic Matching’) for large-scale sequence analysis. UProC is by three orders of magnitude faster than profile-based methods and in a metagenome simulation study achieved up to 80% higher sensitivity on unassembled 100 bp reads.
Availability and implementation: UProC is available as an open-source software at https://github.com/gobics/uproc. Precompiled databases (Pfam) are linked on the UProC homepage: http://uproc.gobics.de/.
Contact: peter@gobics.de.
Supplementary information: Supplementary data are available at Bioinformatics online.