Bioinformatics

IVA: accurate de novo assembly of RNA virus genomes

Hunt, M., Gall, A., Ong, S. H., Brener, J., Ferns, B., Goulder, P., Nastouli, E., Keane, J. A., Kellam, P., Otto, T. D.

Motivation: An accurate genome assembly from short-read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging because viral population diversity combines with extremely uneven read depth. The uneven depth is caused by amplification bias in the reverse transcription and polymerase chain reaction amplification steps that current methods require.

Results: We developed a new de novo assembler called IVA (Iterative Virus Assembler) designed specifically for read pairs sequenced at highly variable depth from RNA virus samples. We tested IVA on datasets from 140 sequenced samples from human immunodeficiency virus-1 or influenza-virus-infected people and demonstrated that IVA outperforms all other virus de novo assemblers.

Availability and implementation: The software runs under Linux, is released under the GPLv3 licence and is freely available from http://sanger-pathogens.github.io/iva

Contact: iva@sanger.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.