Bioinformatics

Learning HMMs for nucleotide sequences from amino acid alignments

Fischer, C. N., Carareto, C. M. A., dos Santos, R. A. C., Cerri, R., Costa, E., Schietgat, L., Vens, C.

Profile hidden Markov models (profile HMMs) are known to efficiently predict whether an amino acid (AA) sequence belongs to a specific protein family. Profile HMMs can also be used to search for protein domains in genome sequences. In this case, HMMs are typically learned from AA sequences and then used to search the six-frame translation of nucleotide (NT) sequences. However, this approach requires additional processing of both the original data and the search results. Here, we propose an alternative, more direct method that converts an AA alignment into an NT alignment, after which an NT-based HMM is trained and applied directly to a genome.
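
To make the idea of converting an AA alignment into an NT alignment concrete, the following minimal sketch threads codons onto an aligned AA sequence. It is an illustration under stated assumptions, not necessarily the procedure used in this work: it assumes that the coding NT sequence of every aligned protein is available, that each residue corresponds to exactly one codon, and that an AA gap becomes three NT gap characters. The names `thread_codons`, `aa_alignment_to_nt`, and the toy sequences are hypothetical.

```python
from typing import Dict


def thread_codons(aligned_aa: str, cds: str, gap: str = "-") -> str:
    """Map one aligned AA sequence onto its coding NT sequence, codon by codon.

    Assumption: `cds` holds one codon per non-gap residue of `aligned_aa`,
    in the same order and in frame.
    """
    residues = sum(1 for ch in aligned_aa if ch != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is too short for the aligned residues")

    columns = []
    pos = 0  # index of the next unused codon in cds
    for ch in aligned_aa:
        if ch == gap:
            # An AA gap spans a whole codon, so emit three NT gaps.
            columns.append(gap * 3)
        else:
            columns.append(cds[pos:pos + 3])
            pos += 3
    return "".join(columns)


def aa_alignment_to_nt(aa_alignment: Dict[str, str],
                       coding_seqs: Dict[str, str]) -> Dict[str, str]:
    """Convert every row of an AA alignment into the matching NT row."""
    return {name: thread_codons(aligned, coding_seqs[name])
            for name, aligned in aa_alignment.items()}


# Toy data (hypothetical): two aligned proteins and their coding sequences.
aa_alignment = {"seq1": "MK-L", "seq2": "MKAL"}
coding_seqs = {"seq1": "ATGAAACTG", "seq2": "ATGAAGGCTCTG"}

nt_alignment = aa_alignment_to_nt(aa_alignment, coding_seqs)
for name, row in nt_alignment.items():
    print(name, row)
```

Every row of the resulting NT alignment is three times as long as its AA row, so the column structure of the original alignment is preserved. Such an alignment could then be passed to a profile HMM training tool for nucleotide sequences.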

Contact: carlos@rc.unesp.br

Supplementary information: Supplementary data are available at Bioinformatics online.