Colorectal cancer drug target prediction using ontology-based inference and network analysis

Identification of novel drug targets is a critical step in drug development. Many recent studies have produced multiple types of data, which provide an opportunity to mine the relationships among them to predict drug targets. In this study, we present a novel integrative approach that combines ontology reasoning with network-assisted gene ranking to predict new drug targets. We used colorectal cancer (CRC) as a proof-of-concept use case to illustrate the approach. Starting from FDA-approved CRC drugs and the relationships among disease, drug, gene, pathway, and SNP in an ontology representing PharmGKB data, we inferred 113 potential CRC drug targets. We further prioritized these genes based on their relationships with CRC disease genes in the context of human protein–protein interaction networks. Thus, among the 113 potential drug targets, 15 were selected as promising drug targets, including some genes that are supported by previous studies. Among them, EGFR, TOP1 and VEGFA are known targets of FDA-approved drugs. Additionally, CCND1 (cyclin D1) and PTGS2 (prostaglandin-endoperoxide synthase 2) have been reported to be relevant to CRC or to be potential drug targets, based on a literature search. These results indicate that our approach is promising for drug target prediction for CRC treatment and might be useful for other cancer therapeutics.

  • Database
  • 9 years ago
  • Original Article

LOTUS-DB: an integrative and interactive database for Nelumbo nucifera study

Besides its significance in plant taxonomy and phylogeny, sacred lotus (Nelumbo nucifera Gaertn.) might also hold the key to the secrets of aging, which attracts increasing attention from researchers all over the world. Genetic and molecular studies on this species depend on its genome information. In 2013, two publications reported the sequencing of its full genome, based on which we constructed a database named LOTUS-DB. It provides comprehensive information on annotation, gene function and expression for the sacred lotus. This information helps users to efficiently query and browse genes, graphically visualize the genome and download a variety of data on genomic DNA, coding sequences (CDS), transcripts, peptide sequences, promoters and markers. It will accelerate research on gene cloning and functional identification in sacred lotus, and hence promote studies on this species and on plant genomics as well.

Database URL: http://lotus-db.wbgcas.cn.

  • Database
  • 9 years ago
  • Original Article

Standard development at the Human Variome Project

The Human Variome Project (HVP) is a world organization working towards facilitating the collection, curation, interpretation and free and open sharing of genetic variation information. A key component of HVP activities is the development of standards and guidelines. HVP Standards are systems, procedures and technologies that the HVP Consortium has determined must be used by HVP-affiliated data sharing infrastructure and should be used by the broader community. HVP guidelines are considered to be beneficial for HVP affiliated data sharing infrastructure and the broader community to adopt. The HVP also maintains a process for assessing systems, processes and tools that implement HVP Standards and Guidelines. Recommended System Status is an accreditation process designed to encourage the adoption of HVP Standards and Guidelines. Here, we describe the HVP standards development process and discuss the accepted standards, guidelines and recommended systems as well as those under acceptance. Certain HVP Standards and Guidelines are already widely adopted by the community and there are committed users for the others.

  • Database
  • 9 years ago
  • Original Article

SCAN database: facilitating integrative analyses of cytosine modification and expression QTL

Functional annotation of genetic variants including single nucleotide polymorphisms (SNPs) and copy number variations (CNV) promises to greatly improve our understanding of human complex traits. Previous transcriptomic studies involving individuals from different global populations have investigated the genetic architecture of gene expression variation by mapping expression quantitative trait loci (eQTL). Functional interpretation of genome-wide association studies (GWAS) has identified enrichment of eQTL in top signals from GWAS of human complex traits. The SCAN (SNP and CNV Annotation) database was developed as a web-based resource of genetical genomic studies including eQTL detected in the HapMap lymphoblastoid cell line samples derived from apparently healthy individuals of European and African ancestry. Considering the critical roles of epigenetic gene regulation, cytosine modification quantitative trait loci (mQTL) are expected to add a crucial layer of annotation to existing functional genomic information. Here, we describe the new features of the SCAN database that integrate comprehensive mQTL mapping results generated in the HapMap CEU (Caucasian residents from Utah, USA) and YRI (Yoruba people from Ibadan, Nigeria) LCL samples and demonstrate the utility of the enhanced functional annotation system.

Database URL: http://www.scandb.org/

  • Database
  • 9 years ago
  • Database Update

Determining similarity of scientific entities in annotation datasets

Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures.

Database URL: http://www.yeastgenome.org/
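The 1–1 maximum weight bipartite matching idea described for AnnSim can be illustrated on a toy case. The following is a minimal sketch, not the authors' implementation: the pairwise term similarities in `TOY_SIM`, the term names, and the normalization (twice the matched weight divided by the total number of annotations) are assumptions made for illustration, and the brute-force search stands in for the efficient solvers the abstract refers to.

```python
from itertools import permutations


def max_weight_matching(a_terms, b_terms, sim):
    """Brute-force 1-1 maximum weight bipartite matching (toy sizes only)."""
    if len(a_terms) > len(b_terms):
        # Match the shorter side into the longer one, then restore pair order.
        pairs, weight = max_weight_matching(b_terms, a_terms, lambda x, y: sim(y, x))
        return [(a, b) for b, a in pairs], weight
    best_pairs, best_weight = [], 0.0
    for perm in permutations(b_terms, len(a_terms)):
        weight = sum(sim(a, b) for a, b in zip(a_terms, perm))
        if weight > best_weight:
            best_pairs, best_weight = list(zip(a_terms, perm)), weight
    return best_pairs, best_weight


def annotation_similarity(a_terms, b_terms, sim):
    """Assumed normalization: 2 * matched weight / (|A| + |B|)."""
    if not a_terms and not b_terms:
        return 0.0
    _, weight = max_weight_matching(list(a_terms), list(b_terms), sim)
    return 2.0 * weight / (len(a_terms) + len(b_terms))


# Hypothetical term-to-term similarities, invented for this example.
TOY_SIM = {("t1", "u1"): 0.9, ("t1", "u2"): 0.4,
           ("t2", "u1"): 0.8, ("t2", "u2"): 0.1}


def toy_sim(x, y):
    return 1.0 if x == y else TOY_SIM.get((x, y), 0.0)


pairs, weight = max_weight_matching(["t1", "t2"], ["u1", "u2"], toy_sim)
print(pairs, weight)
print(annotation_similarity(["t1", "t2"], ["u1", "u2"], toy_sim))
```

In this toy case a greedy choice (pairing `t1` with `u1` first) would reach a total weight of 1.0, whereas the optimal matching pairs `t1` with `u2` and `t2` with `u1` for a total of about 1.2. This is why the problem is posed as a maximum weight matching and not as a nearest-term lookup.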

  • Database
  • 9 years ago
  • Original Article

A new approach for annotation of transposable elements using small RNA mapping

Transposable elements (TEs) are mobile genomic DNA sequences found in most organisms. They so densely populate the genomes of many eukaryotic species that they are often the major constituents. With the rapid generation of many plant genome sequencing projects over the past few decades, there is an urgent need for improved TE annotation as a prerequisite for genome-wide studies. Analogous to the use of RNA-seq for gene annotation, we propose a new method for de novo TE annotation that uses as a guide 24 nt-siRNAs that are a part of TE silencing pathways. We use this new approach, called TASR (for Transposon Annotation using Small RNAs), for de novo annotation of TEs in Arabidopsis, rice and soybean and demonstrate that this strategy can be successfully applied for de novo TE annotation in plants.

Executable PERL is available for download from: http://tasr-pipeline.sourceforge.net/

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

RiceNet v2: an improved network prioritization server for rice genes

Rice is the most important staple food crop and a model grass for studies of bioenergy crops. We previously published a genome-scale functional network server called RiceNet, constructed by integrating diverse genomics data, and demonstrated the use of the network in genetic dissection of rice biotic stress responses and its usefulness for other grass species. Since the initial construction of the network, there has been a significant increase in the amount of publicly available rice genomics data. Here, we present an updated network prioritization server for Oryza sativa ssp. japonica, RiceNet v2 (http://www.inetbio.org/ricenet), which provides a network of 25 765 genes (70.1% of the coding genome) and 1 775 000 co-functional links. RiceNet v2 also provides two complementary methods for network prioritization based on: (i) network direct neighborhood and (ii) context-associated hubs. RiceNet v2 can use genes of the related subspecies O. sativa ssp. indica and the reference plant Arabidopsis for versatility in generating hypotheses. We demonstrate that RiceNet v2 effectively identifies candidate genes involved in rice root/shoot development and defense responses, demonstrating its usefulness for the grass research community.
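Prioritization by network direct neighborhood can be pictured as scoring each candidate gene by the strength of its links to a set of known "guide" genes. The sketch below is an assumption-laden illustration, not RiceNet's scoring scheme: the sum-of-link-weights score, the gene names and the edge weights are all hypothetical.

```python
from collections import defaultdict


def rank_by_direct_neighborhood(edges, seeds):
    """Rank non-seed genes by the summed weight of their links to seed genes.

    edges: iterable of (gene_a, gene_b, weight) for an undirected network.
    seeds: genes already known to be involved in the process of interest.
    """
    seeds = set(seeds)
    scores = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        if gene_a in seeds and gene_b not in seeds:
            scores[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            scores[gene_a] += weight
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical co-functional links (gene names and weights are invented).
toy_edges = [
    ("geneA", "geneX", 2.0),
    ("geneB", "geneX", 1.5),
    ("geneA", "geneY", 1.0),
    ("geneY", "geneZ", 3.0),  # geneZ has no direct link to a seed gene
    ("geneA", "geneB", 4.0),  # seed-to-seed link, ignored for ranking
]
print(rank_by_direct_neighborhood(toy_edges, seeds=["geneA", "geneB"]))
```

Genes with no direct link to any seed, such as `geneZ` here, receive no score under this scheme. That limitation is one reason a complementary method, such as the context-associated hubs mentioned above, can be useful.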

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

Post-conversion targeted capture of modified cytosines in mammalian and plant genomes

We present a capture-based approach for bisulfite-converted DNA that allows interrogation of pre-defined genomic locations, allowing quantitative and qualitative assessments of 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) at CG dinucleotides and in non-CG contexts (CHG, CHH) in mammalian and plant genomes. We show the technique works robustly and reproducibly using as little as 500 ng of starting DNA, with results correlating well with whole genome bisulfite sequencing data, and demonstrate that human DNA can be tested in samples contaminated with microbial DNA. This targeting approach will allow cell type-specific designs to maximize the value of 5mC and 5hmC sequencing.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

Intramolecular circularization increases efficiency of RNA sequencing and enables CLIP-Seq of nuclear RNA from human cells

RNA sequencing (RNA-Seq) is a powerful tool for analyzing the identity of cellular RNAs but is often limited by the amount of material available for analysis. In spite of extensive efforts employing existing protocols, we observed that it was not possible to obtain useful sequencing libraries from nuclear RNA derived from cultured human cells after crosslinking and immunoprecipitation (CLIP). Here, we report a method for obtaining strand-specific small RNA libraries for RNA sequencing that requires picograms of RNA. We employ an intramolecular circularization step that increases the efficiency of library preparation and avoids the need for intermolecular ligations of adaptor sequences. Other key features include random priming for full-length cDNA synthesis and gel-free library purification. Using our method, we generated CLIP-Seq libraries from nuclear RNA that had been UV-crosslinked and immunoprecipitated with anti-Argonaute 2 (Ago2) antibody. Computational protocols were developed to enable analysis of raw sequencing data and we observe substantial differences between recognition by Ago2 of RNA species in the nucleus relative to the cytoplasm. This RNA self-circularization approach to RNA sequencing (RC-Seq) allows data to be obtained using small amounts of input RNA that cannot be sequenced by standard methods.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

EpiToolKit--a web-based workbench for vaccine design

Summary: EpiToolKit is a virtual workbench for immunological questions with a focus on vaccine design. It offers an array of immunoinformatics tools covering MHC genotyping, epitope and neo-epitope prediction, epitope selection for vaccine design, and epitope assembly. In its recently re-implemented version 2.0, EpiToolKit provides a range of new functionality and for the first time allows combining tools into complex workflows. For inexperienced users it offers simplified interfaces to guide the users through the analysis of complex immunological data sets.

Availability and implementation: http://www.epitoolkit.de

Contact: schubert@informatik.uni-tuebingen.de

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

tEFMA: computing thermodynamically feasible elementary flux modes in metabolic networks

Summary: Elementary flux modes (EFMs) are important structural tools for the analysis of metabolic networks. It is known that many topologically feasible EFMs are biologically irrelevant. Therefore, tools are needed to find the relevant ones. We present thermodynamic EFM analysis (tEFMA), which uses the cellular metabolome to avoid the enumeration of thermodynamically infeasible EFMs. Specifically, given a metabolic network and a not necessarily complete metabolome, tEFMA efficiently returns the full set of thermodynamically feasible EFMs consistent with the metabolome. Compared with standard approaches, tEFMA strongly reduces the memory consumption and the overall runtime. Thus tEFMA provides a new way to analyze, in an unbiased manner, large-scale metabolic networks that were hitherto inaccessible.

Availability and implementation: https://github.com/mpgerstl/tEFMA

Contact: christian.jungreuthmayer@boku.ac.at or juergen.zanghellini@boku.ac.at

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution

Motivation: With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings.

Results: Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs; whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources.

Availability and implementation: Software is freely available through the Evol component of ProDy API.

Contact: bahar@pitt.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
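Mutual information, named above as a traditional coevolution measure, can be computed directly from two columns of a multiple sequence alignment. The sketch below is a plain, uncorrected version for illustration only: it does not use the ProDy Evol API, it applies none of the refinements (such as shuffling) discussed in the abstract, and the toy columns are invented.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two aligned MSA columns.

    col_i, col_j: sequences of residues, one per aligned sequence.
    No gap handling or background correction is applied in this sketch.
    """
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_i[a] * count_j[b]))
    return mi


# Perfectly coupled toy columns: A always pairs with L, V always with I.
print(mutual_information("AAVV", "LLII"))
# Statistically independent toy columns.
print(mutual_information("AAVV", "LILI"))
```

With only four sequences per column these values are not meaningful estimates; the abstract's point about false positives from insufficient MSA size applies directly to a raw statistic like this one.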

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Folding RaCe: a robust method for predicting changes in protein folding rates upon point mutations

Motivation: Protein engineering methods are commonly employed to decipher the folding mechanism of proteins and enzymes. However, such experiments are exceedingly time and resource intensive. It would therefore be advantageous to develop a simple computational tool to predict changes in folding rates upon mutations. Such a method should be able to rapidly provide the sequence position and chemical nature to modulate through mutation, to effect a particular change in rate. This can be of importance in protein folding, function or mechanistic studies.

Results: We have developed a robust knowledge-based methodology to predict the changes in folding rates upon mutations, formulated from amino acid properties using a multiple linear regression approach. We benchmarked this method against an experimental database of 790 point mutations from 26 two-state proteins. Mutants were first classified according to secondary structure, accessible surface area and position along the primary sequence. Three prime amino acid features eliciting the best relationship with folding rate change were then shortlisted for each class along with an optimized window length. We obtained a self-consistent mean absolute error (MAE) of 0.36 s⁻¹ and a mean Pearson correlation coefficient (PCC) of 0.81. A jack-knife test resulted in an MAE of 0.42 s⁻¹ and a PCC of 0.73. Moreover, our method highlights the importance of outlier detection and of studying the implications of outliers for the folding mechanism.

Availability and implementation: A web server ‘Folding RaCe’ has been developed and is available at http://www.iitm.ac.in/bioinfo/proteinfolding/foldingrace.html.

Contact: gromiha@iitm.ac.in

Supplementary information: Supplementary data are available at Bioinformatics online.
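The two statistics reported above, mean absolute error (MAE) and Pearson correlation coefficient (PCC), compare predicted with observed changes in folding rate. A minimal sketch of both follows; the numbers are invented toy values, not data from the study.

```python
from math import sqrt


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_correlation(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


# Hypothetical observed vs. predicted folding rate changes.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(mean_absolute_error(observed, predicted))
print(pearson_correlation(observed, predicted))
```

The two measures answer different questions: MAE reports how far predictions are from the measurements on average, while PCC reports how well the predictions track the trend, which is why both are quoted.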

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

GeNOSA: inferring and experimentally supporting quantitative gene regulatory networks in prokaryotes

Motivation: The establishment of quantitative gene regulatory networks (qGRNs) through existing network component analysis (NCA) approaches suffers from shortcomings such as usage limitations of problem constraints and the instability of inferred qGRNs. The proposed GeNOSA framework uses a global optimization algorithm (OptNCA) to cope with the stringent limitations of NCA approaches in large-scale qGRNs.

Results: OptNCA performs well against existing NCA-derived algorithms in terms of utilization of connectivity information and reconstruction accuracy of inferred GRNs using synthetic and real Escherichia coli datasets. For comparisons with other non-NCA-derived algorithms, OptNCA without using known qualitative regulations is also evaluated in terms of qualitative assessments using a synthetic Saccharomyces cerevisiae dataset of the DREAM3 challenges. We successfully demonstrate GeNOSA in several applications including deducing condition-dependent regulations, establishing high-consensus qGRNs and validating a sub-network experimentally for dose–response and time–course microarray data, and discovering and experimentally confirming a novel regulation of CRP on AscG.

Availability and implementation: All datasets and the GeNOSA framework are freely available from http://e045.life.nctu.edu.tw/GeNOSA.

Contact: syho@mail.nctu.edu.tw

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

WGBSSuite: simulating whole-genome bisulphite sequencing data and benchmarking differential DNA methylation analysis tools

Motivation: As the number of studies looking at differences between DNA methylation increases, there is a growing demand to develop and benchmark statistical methods to analyse these data. To date no objective approach for the comparison of these methods has been developed and as such it remains difficult to assess which analysis tool is most appropriate for a given experiment. As a result, there is an unmet need for a DNA methylation data simulator that can accurately reproduce a wide range of experimental setups, and can be routinely used to compare the performance of different statistical models.

Results: We have developed WGBSSuite, a flexible stochastic simulation tool that generates single-base resolution DNA methylation data genome-wide. Several simulator parameters can be derived directly from real datasets provided by the user in order to mimic real case scenarios. Thus, it is possible to choose the most appropriate statistical analysis tool for a given simulated design. To show the usefulness of our simulator, we also report a benchmark of commonly used methods for differential methylation analysis.

Availability and implementation: WGBS code and documentation are available under GNU licence at http://www.wgbssuite.org.uk/

Contact: owen.rackham@imperial.ac.uk or l.bottolo@imperial.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Robust network structure of the Sln1-Ypd1-Ssk1 three-component phospho-relay prevents unintended activation of the HOG MAPK pathway in Saccharomyces cerevisiae

Background: The yeast Saccharomyces cerevisiae relies on the high-osmolarity glycerol (HOG) signaling pathway to respond to increases in external osmolarity. The HOG pathway is rapidly activated under conditions of elevated osmolarity and regulates transcriptional and metabolic changes within the cell. Under normal growth conditions, however, a three-component phospho-relay consisting of the histidine kinase Sln1, the transfer protein Ypd1, and the response regulator Ssk1 represses HOG pathway activity by phosphorylation of Ssk1. This inhibition of the HOG pathway is essential for cellular fitness in normal osmolarity. Nevertheless, the extent to which, and the mechanisms by which, inhibition is robust to fluctuations in the concentrations of the phospho-relay components have received little attention.

Results: We established that the Sln1-Ypd1-Ssk1 phospho-relay is robust: it is able to maintain inhibition of the HOG pathway even after significant changes in the levels of its three components. We then developed a biochemically realistic mathematical model of the phospho-relay, which suggested that robustness is due to buffering by a large excess pool of Ypd1. We confirmed experimentally that depletion of the Ypd1 pool results in inappropriate activation of the HOG pathway.

Conclusions: We identified buffering by an intermediate component in excess as a novel mechanism through which a phospho-relay can achieve robustness. This buffering requires multiple components and is therefore unavailable to two-component systems, suggesting one important advantage of multi-component relays.

  • BMC Systems Biology 2015, null:17
  • 9 years ago

RAPTR-SV: a hybrid method for the detection of structural variants

Motivation: Identification of structural variants (SVs) in sequence data results in a large number of false positive calls using existing software, which overburdens subsequent validation.

Results: Simulations using RAPTR-SV and other, similar algorithms for SV detection revealed that RAPTR-SV had superior sensitivity and precision, as it recovered 66.4% of simulated tandem duplications with a precision of 99.2%. When compared with calls made by Delly and LUMPY on available datasets from the 1000 genomes project, RAPTR-SV showed superior sensitivity for tandem duplications, as it identified 2-fold more duplications than Delly, while making ~85% fewer duplication predictions.

Availability and implementation: RAPTR-SV is written in Java and uses new features in the collections framework in the latest release of the Java version 8 language specifications. A compiled version of the software, instructions for usage and test results files are available on the GitHub repository page: https://github.com/njdbickhart/RAPTR-SV.

Contact: derek.bickhart@ars.usda.gov

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Fast, accurate, and reliable molecular docking with QuickVina 2

Motivation: The need for efficient molecular docking tools for high-throughput screening is growing alongside the rapid growth of drug-fragment databases. AutoDock Vina (‘Vina’) is a widely used docking tool with parallelization for speed. QuickVina (‘QVina 1’) then further enhanced the speed via a heuristic, but it required high exhaustiveness. With low exhaustiveness, its accuracy was compromised. We present in this article the latest version of QuickVina (‘QVina 2’), which inherits both the speed of QVina 1 and the reliability of the original Vina.

Results: We tested the efficacy of QVina 2 on the core set of PDBbind 2014. With the default exhaustiveness level of Vina (i.e. 8), a maximum of 20.49-fold and an average of 2.30-fold acceleration with a correlation coefficient of 0.967 for the first mode and 0.911 for the sum of all modes were attained over the original Vina. A tendency for higher acceleration with increased number of rotatable bonds as the design variables was observed. On the accuracy, Vina wins over QVina 2 on 30% of the data with average energy difference of only 0.58 kcal/mol. On the same dataset, GOLD produced RMSD smaller than 2 Å on 56.9% of the data while QVina 2 attained 63.1%.

Availability and implementation: The C++ source code of QVina 2 is available at www.qvina.org.

Contact: aalhossary@pmail.ntu.edu.sg

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Automated benchmarking of peptide-MHC class I binding predictions

Motivation: Numerous in silico methods predicting peptide binding to major histocompatibility complex (MHC) class I molecules have been developed over the last decades. However, the multitude of available prediction tools makes it non-trivial for the end-user to select which tool to use for a given task. To provide a solid basis on which to compare different prediction tools, we here describe a framework for the automated benchmarking of peptide-MHC class I binding prediction tools. The framework runs weekly benchmarks on data that are newly entered into the Immune Epitope Database (IEDB), giving the public access to frequent, up-to-date performance evaluations of all participating tools. To overcome potential selection bias in the data included in the IEDB, a strategy was implemented that suggests a set of peptides for which different prediction methods give divergent predictions as to their binding capability. Upon experimental binding validation, these peptides entered the benchmark study.

Results: The benchmark has run for 15 weeks and includes evaluation of 44 datasets covering 17 MHC alleles and more than 4000 peptide-MHC binding measurements. Inspection of the results allows the end-user to make educated selections between participating tools. Of the four participating servers, NetMHCpan performed the best, followed by ANN, SMM and finally ARB.

Availability and implementation: Up-to-date performance evaluations of each server can be found online at http://tools.iedb.org/auto_bench/mhci/weekly. All prediction tool developers are invited to participate in the benchmark. Sign-up instructions are available at http://tools.iedb.org/auto_bench/mhci/join.

Contact: mniel@cbs.dtu.dk or bpeters@liai.org

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Extending P450 site-of-metabolism models with region-resolution data

Motivation: Cytochrome P450s are a family of enzymes responsible for the metabolism of approximately 90% of FDA-approved drugs. Medicinal chemists often want to know which atoms of a molecule—its metabolized sites—are oxidized by Cytochrome P450s in order to modify their metabolism. Consequently, there are several methods that use literature-derived, atom-resolution data to train models that can predict a molecule’s sites of metabolism. There is, however, much more data available at a lower resolution, where the exact site of metabolism is not known, but the region of the molecule that is oxidized is known. Until now, no site-of-metabolism models made use of region-resolution data.

Results: Here, we describe XenoSite-Region, the first reported method for training site-of-metabolism models with region-resolution data. Our approach uses the Expectation Maximization algorithm to train a site-of-metabolism model. Region-resolution metabolism data was simulated from a large site-of-metabolism dataset, containing 2000 molecules with 3400 metabolized and 30 000 un-metabolized sites and covering nine Cytochrome P450 isozymes. When training on the same molecules (but with only region-level information), we find that this approach yields models almost as accurate as models trained with atom-resolution data. Moreover, we find that atom-resolution trained models are more accurate when also trained with region-resolution data from additional molecules. Our approach, therefore, opens up a way to extend the applicable domain of site-of-metabolism models into larger regions of chemical space. This meets a critical need in drug development by tapping into underutilized data commonly available in most large drug companies.

Availability and implementation: The algorithm, data and a web server are available at http://swami.wustl.edu/xregion.

Contact: swamidass@wustl.edu

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Sambamba: fast processing of NGS alignment formats

Summary: Sambamba is a high-performance, robust tool and library for working with SAM, BAM and CRAM sequence alignment files, the most common file formats for aligned next generation sequencing data. Sambamba is a faster alternative to samtools that exploits multi-core processing and dramatically reduces processing time. Sambamba is being adopted at sequencing centers, not only because of its speed, but also because of additional functionality, including coverage analysis and powerful filtering capability.

Availability and implementation: Sambamba is free and open source software, available under a GPLv2 license. Sambamba can be downloaded and installed from http://www.open-bio.org/wiki/Sambamba.

Sambamba v0.5.0 was released with doi:10.5281/zenodo.13200.

Contact: j.c.p.prins@umcutrecht.nl

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

CycleFreeFlux: efficient removal of thermodynamically infeasible loops from flux distributions

Motivation: Constraint-based metabolic modeling methods such as Flux Balance Analysis (FBA) are routinely used to predict metabolic phenotypes, e.g. growth rates, ATP yield or the fitness of gene knockouts. One frequent difficulty of constraint-based solutions is the inclusion of thermodynamically infeasible loops (or internal cycles), which add nonbiological fluxes to the predictions.

Results: We propose a simple postprocessing of constraint-based solutions, which removes internal cycles from any given flux distribution $v^{(0)}$ without disturbing other fluxes not involved in the loops. This new algorithm, termed CycleFreeFlux, works by minimizing the sum of absolute fluxes $\|v\|_1$ while (i) conserving the exchange fluxes and (ii) using the fluxes of the original solution to bound the new flux distribution. This strategy reduces internal fluxes until at least one reaction of every possible internal cycle is inactive, a necessary and sufficient condition for the thermodynamic feasibility of a flux distribution. If alternative representations of the input flux distribution in terms of elementary flux modes exist that differ in their inclusion of internal cycles, then CycleFreeFlux is biased towards solutions that maintain the direction given by $v^{(0)}$ and towards solutions with lower total flux $\|v\|_1$. Our method requires only one additional linear optimization, making it computationally very efficient compared to alternative strategies.

Availability and implementation: We provide freely available R implementations for the enumeration of thermodynamically infeasible cycles as well as for cycle-free FBA solutions, flux variability calculations and random sampling of solution spaces.

Contact: lercher@cs.uni-duesseldorf.de
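The optimization described in the Results can be written out as a single linear program. The formulation below is a sketch reconstructed from the abstract's wording, not taken from the paper: the stoichiometric matrix $S$, the steady-state constraint $Sv = 0$, and the index set $E$ of exchange reactions are assumed notation that the abstract does not define.

```latex
\begin{aligned}
\min_{v}\quad & \|v\|_1 = \sum_i |v_i| \\
\text{s.t.}\quad & S v = 0, \\
& v_i = v_i^{(0)} \quad \text{for } i \in E, \\
& 0 \le v_i \le v_i^{(0)} \quad \text{if } v_i^{(0)} \ge 0, \\
& v_i^{(0)} \le v_i \le 0 \quad \text{if } v_i^{(0)} < 0 .
\end{aligned}
```

Because the bounds fix the sign of every $v_i$ to that of $v_i^{(0)}$, each $|v_i|$ equals $\operatorname{sign}(v_i^{(0)})\, v_i$, so the objective is linear in $v$. This is consistent with the statement that only one additional linear optimization is needed.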

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Evolutionary profiles improve protein-protein interaction prediction from sequence

Motivation: Many methods predict the physical interaction between two proteins (protein-protein interactions; PPIs) from sequence alone. Their performance drops substantially for proteins not used for training.

Results: Here, we introduce a new approach to predict PPIs from sequence alone which is based on evolutionary profiles and profile-kernel support vector machines. It improved over the state-of-the-art, in particular for proteins that are sequence-dissimilar to proteins with known interaction partners. Filtering by gene expression data increased accuracy further for the few, most reliably predicted interactions (low recall). The overall improvement was so substantial that we compiled a list of the most reliably predicted PPIs in human. Our method makes a significant difference for biology because it improves most for the majority of proteins without experimental annotations.

Availability and implementation: Implementation and most reliably predicted human PPIs available at https://rostlab.org/owiki/index.php/Profppikernel.

Contact: rost@in.tum.de

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data

Summary: High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to assess statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data.

Availability and implementation: The R package with its source code and documentation are freely available at http://www-personal.umich.edu/~jianghui/rseqnp/.

Contact: jianghui@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
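The permutation-testing idea can be shown with a generic two-group example. This is a minimal sketch, not the rSeqNP package or its test statistics (the abstract does not specify them): it assumes a simple absolute difference in group means as the statistic, and the expression values are invented.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled at random; the p-value is the fraction of
    shuffles whose statistic is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values for one gene in two conditions.
condition_1 = [10, 11, 12, 13, 14, 15]
condition_2 = [1, 2, 3, 4, 5, 6]
print(permutation_p_value(condition_1, condition_2, n_permutations=2000))
```

Since the null distribution comes from the data themselves, no distributional model for the read counts is required, which is the sense in which such a test is non-parametric.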

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE