Bioinformatics

protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences

Xiao, N., Cao, D.-S., Zhu, M.-F., Xu, Q.-S.

Summary: Structural and physicochemical descriptors derived from amino acid sequences are used extensively to study the structural, functional, expression and interaction profiles of proteins and peptides. We developed protr, a comprehensive R package for generating various numerical representation schemes of proteins and peptides from their amino acid sequences. The package calculates eight descriptor groups composed of 22 types of commonly used descriptors, which together comprise about 22 700 descriptor values. It allows users to select amino acid properties from the AAindex database and to use self-defined properties to construct customized descriptors. For proteochemometric modeling, it calculates six types of scale-based descriptors derived by various dimensionality reduction methods. The protr package also computes similarity scores within a list of proteins, based on protein sequence alignment and on Gene Ontology (GO) semantic similarity measures, and calculates profile-based protein features from the position-specific scoring matrix (PSSM). We also developed ProtrWeb, a user-friendly web server for calculating the descriptors implemented in the protr package.
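To make "numerical representation scheme" concrete, the sketch below computes two of the simplest sequence-derived descriptors: amino acid composition (20 values, the fraction of each standard residue) and dipeptide composition (400 values, the fraction of each ordered pair of adjacent residues). This is a minimal Python illustration of the general idea, not protr's R implementation; the function names are hypothetical, and protr's own functions may differ in naming, input validation and output format.

```python
from itertools import product

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _check(seq: str, min_len: int) -> str:
    """Upper-case seq and reject short sequences or non-standard residues."""
    seq = seq.upper()
    if len(seq) < min_len or any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError(
            f"sequence must have >= {min_len} residues, all standard amino acids"
        )
    return seq


def amino_acid_composition(seq: str) -> dict[str, float]:
    """Hypothetical helper: fraction of each standard amino acid (20 values)."""
    seq = _check(seq, 1)
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """Hypothetical helper: fraction of each ordered residue pair (400 values).

    Pairs are taken from overlapping windows of length 2, so a sequence of
    length L yields L - 1 pairs. (str.count would skip overlapping matches,
    e.g. "AAA".count("AA") == 1, so pairs are enumerated explicitly.)
    """
    seq = _check(seq, 2)
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    n_pairs = len(seq) - 1
    for i in range(n_pairs):
        counts[seq[i:i + 2]] += 1
    return {pair: c / n_pairs for pair, c in counts.items()}


if __name__ == "__main__":
    aac = amino_acid_composition("MKVLAA")
    dpc = dipeptide_composition("MKVLAA")
    print(len(aac), aac["A"])   # 20 values; "A" occurs 2 times in 6 residues
    print(len(dpc), dpc["AA"])  # 400 values; "AA" is 1 of 5 adjacent pairs
```

Each descriptor vector has a fixed length regardless of sequence length, which is what allows sequences of different lengths to be compared or fed into the same machine-learning model.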

Availability and implementation: The protr package is freely available from CRAN at http://cran.r-project.org/package=protr, and ProtrWeb is freely available at http://protrweb.scbdd.com/.

Contact: oriental-cds@163.com or dasongxu@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.