Bioinformatics

QTLMiner: QTL database curation by mining tables in literature

J. Peng, X. Shi, Y. Sun, D. Li, B. Liu, F. Kong, X. Yuan.

Motivation: Figures and tables in biomedical literature record vast amounts of important experimental results. Quantitative trait locus (QTL) information, for example, is usually presented in tables. However, most popular text-mining methods focus on extracting knowledge from unstructured free text. To our knowledge, no published work addresses mining tables in biomedical literature. In this article, we propose a method to extract QTL information from both tables and plain text in the literature. Heterogeneous and complex tables are converted into a structured database and combined with information extracted from plain text. Our method could greatly reduce the manual labor involved in database curation.

Results: We applied our method to the curation of a soybean QTL database, extracting 2278 records from 228 papers with a precision of 96.9% and a recall of 83.3%; the F value of the method is 89.6%.
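
The reported F value is consistent with the balanced F-measure, i.e. the harmonic mean of precision and recall. The abstract does not state which F variant was used, so the sketch below assumes the balanced form; the function name f_measure is illustrative and is not part of QTLMiner.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's "F value" is the balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Results paragraph.
precision = 0.969
recall = 0.833

f_value = f_measure(precision, recall)
print(f"F value: {f_value:.1%}")  # F value: 89.6%
```

Because the harmonic mean is pulled toward the lower of the two scores, the F value sits closer to the recall of 83.3% than a simple average of the two would.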

Availability and implementation: QTLMiner is available at www.soyomics.com/qtlminer/.

Contact: yuanxh@iga.ac.cn