Molecular Biology and Evolution

Reassessing the "Duon" Hypothesis of Protein Evolution

Xing, K., He, X.

A genome contains two distinct types of DNA sequences: coding sequences and regulatory sequences. A recent study of the occupancy of transcription factors (TFs) in human cells suggested that protein-coding sequences also serve as codes for TF occupancy, and proposed a "duon" hypothesis in which up to 15% of the codons of human protein-coding genes are constrained by additional coding requirements that regulate gene expression. This hypothesis challenges our basic understanding of the human genome. We reanalyzed the data and found that the previous study was confounded by an ascertainment bias related to base composition. Using an unbiased comparison in which G/C and A/T sites are considered separately, we find a similar level of conservation between TF-bound and TF-depleted codons, suggesting that TF occupancy imposes little or no additional purifying selection on the codons of human genes. Given the generally short binding motifs of TFs and the open chromatin structure during transcription, we argue that the occupancy of TFs on protein-coding sequences is mostly passive and evolutionarily neutral, with any function in the regulation of gene expression yet to be determined.
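To illustrate why considering G/C and A/T sites separately matters, the following is a minimal Python sketch of a pooled versus stratified conservation comparison. It is not the authors' pipeline: the function names `conservation_rates` and `make_sites`, the tuple layout, and all counts are hypothetical, and the toy numbers are arbitrary and do not reflect the direction or magnitude of any effect in the real data. The sketch only shows that when two site classes differ in conservation and TF-bound sites are enriched for one class, a pooled comparison can differ between bound and depleted sites even if the per-class rates are identical.

```python
from collections import defaultdict


def conservation_rates(sites):
    """Fraction of conserved sites per (stratum, tf_bound) group.

    sites: iterable of (base, tf_bound, conserved) tuples (hypothetical layout).
    Strata are "GC", "AT", and "pooled" (base composition ignored).
    """
    counts = defaultdict(lambda: [0, 0])  # [conserved count, total count]
    for base, tf_bound, conserved in sites:
        stratum = "GC" if base.upper() in ("G", "C") else "AT"
        for key in ((stratum, tf_bound), ("pooled", tf_bound)):
            counts[key][0] += int(conserved)
            counts[key][1] += 1
    return {key: c / n for key, (c, n) in counts.items()}


def make_sites(base, tf_bound, n_conserved, n_total):
    """Build toy sites: the first n_conserved of n_total are conserved."""
    return [(base, tf_bound, i < n_conserved) for i in range(n_total)]


# Arbitrary toy data: both classes have the same conservation rate in
# bound and depleted sites, but bound sites are enriched for G/C.
toy = (
    make_sites("G", True, 64, 80)     # bound, G/C
    + make_sites("A", True, 10, 20)   # bound, A/T
    + make_sites("C", False, 16, 20)  # depleted, G/C
    + make_sites("T", False, 40, 80)  # depleted, A/T
)

rates = conservation_rates(toy)
for key in sorted(rates, key=str):
    print(key, round(rates[key], 2))
```

In this toy set, the pooled rates differ between bound and depleted sites, whereas the rates within the G/C stratum and within the A/T stratum are the same for both groups. The difference in the pooled comparison therefore comes entirely from base composition.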