Nucleic Acids Research

Improving small RNA-seq by using a synthetic spike-in set for size-range quality control together with a set for data normalization

Locati, M. D., Terpstra, I., de Leeuw, W. C., Kuzak, M., Rauwerda, H., Ensink, W. A., van Leeuwen, S., Nehrdich, U., Spaink, H. P., Jonker, M. J., Breit, T. M., Dekker, R. J..

There is an increasing interest in complementing RNA-seq experiments with small-RNA (sRNA) expression data to obtain a comprehensive view of a transcriptome. Currently, two main experimental challenges concerning sRNA-seq exist: how to check the size distribution of isolated sRNAs, given the sensitive size-selection steps in the protocol; and how to normalize data between samples, given the low complexity of sRNA types. We here present two separate sets of synthetic RNA spike-ins for monitoring size-selection and for performing data normalization in sRNA-seq. The size-range quality control (SRQC) spike-in set, consisting of 11 oligoribonucleotides (10–70 nucleotides), was tested by intentionally altering the size-selection protocol and verified via several comparative experiments. We demonstrate that the SRQC set is useful to reproducibly track down biases in the size-selection in sRNA-seq. The external reference for data-normalization (ERDN) spike-in set, consisting of 19 oligoribonucleotides, was developed for sample-to-sample normalization in differential-expression analysis of sRNA-seq data. Testing and applying the ERDN set showed that it can reproducibly detect differential expression over a dynamic range of 218. Hence, biological variation in sRNA composition and content between samples is preserved while technical variation is effectively minimized. Together, both spike-in sets can significantly improve the technical reproducibility of sRNA-seq.