BMC Bioinformatics 2015, 16:178

Ambivalent covariance models

Stefan Janssen, Robert Giegerich.
Background: Evolutionary variations let us define a set of similar nucleic acid sequences as a family if these different molecules execute a common function. Capturing their sequence variation, e.g. with position-specific scoring matrices, significantly improves the sensitivity of detection tools. Members of a functional (non-coding) RNA family are affected by these variations not only on the sequence level but also on the structural level. For example, some transfer RNAs exhibit a fifth helix in addition to the typical cloverleaf structure. Current covariance models, the unrivaled homology search approach for structured RNA, do not benefit from structural variation within a family, but rather penalize it. This leads to artificial subdivision of families and loss of information in the Rfam database.

Results: We propose an extension to the fundamental architecture of covariance models that allows for several compatible consensus structures. The resulting models are called ambivalent covariance models. Evaluation on several Rfam families shows that coalescing structural variation within a family by using ambivalent consensus models is superior to subdividing the family into multiple classical covariance models.

Conclusion: A prototype and source code are available at http://bibiserv.cebitec.uni-bielefeld.de/acms.