KeBABS: an R package for kernel-based analysis of biological sequences

Summary: KeBABS provides a powerful, flexible and easy to use framework for kernel-based analysis of biological sequences in R. It includes efficient implementations of the most important sequence kernels, also including variants that allow for taking sequence annotations and positional information into account. KeBABS seamlessly integrates three common support vector machine (SVM) implementations with a unified interface. It allows for hyperparameter selection by cross validation, nested cross validation and also features grouped cross validation. The biological interpretation of SVM models is supported by (1) the computation of weights of sequence patterns and (2) prediction profiles that highlight the contributions of individual sequence positions or sections.
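To make the idea of a sequence kernel concrete, here is a minimal Python sketch of a spectrum kernel, one of the standard sequence kernels. It is not the KeBABS implementation or its R interface; the function names `kmer_counts` and `spectrum_kernel` and the default `k=3` are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3, normalized=True):
    """Dot product of the k-mer count vectors of x and y.

    With normalized=True the value is divided by the norms of both
    count vectors (cosine similarity), so identical sequences score 1.0.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for missing k-mers, so unshared k-mers add nothing.
    dot = sum(n * cy[kmer] for kmer, n in cx.items())
    if not normalized:
        return dot
    norm = sqrt(sum(n * n for n in cx.values()) * sum(n * n for n in cy.values()))
    return dot / norm if norm else 0.0
```

A kernel matrix built from such pairwise values is what an SVM would consume; position-aware and annotation-aware variants would weight or restrict the k-mer matches.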

Availability and implementation: The R package kebabs is available via the Bioconductor project: http://bioconductor.org/packages/release/bioc/html/kebabs.html. Further information and the R code of the example in this paper are available at http://www.bioinf.jku.at/software/kebabs/.

Contact: kebabs@bioinf.jku.at or bodenhofer@bioinf.jku.at

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

Probing the binding affinity of amyloids to reduce toxicity of oligomers in diabetes

Motivation: Amyloids play a role in the degradation of β-cells in diabetes patients. In particular, short amyloid oligomers inject themselves into the membranes of these cells and create pores that disrupt the strictly controlled flow of ions through the membranes. This leads to cell death. Getting rid of the short oligomers either by a deconstruction process or by elongating them into longer fibrils will reduce this toxicity and allow the β-cells to live longer.

Results: We develop a computational method to probe the binding affinity of amyloid structures and produce an amylin analog that binds to oligomers and extends their length. The binding and extension lower toxicity and β-cell death. The amylin analog is designed through a parsimonious selection of mutations and is to be administered with the pramlintide drug, but not to interact with it. The mutations (T9K L12K S28H T30K) produce a stable native structure, strong binding affinity to oligomers, and long fibrils. We present an extended mathematical model for the insulin–glucose relationship and demonstrate how affecting the concentration of oligomers with such analog is strictly coupled with insulin release and β-cell fitness.

Availability and implementation: SEMBA, the tool to probe the binding affinity of amyloid proteins and generate the binding affinity scoring matrices and R-scores is available at: http://amyloid.cs.mcgill.ca

Contact: jeromew@cs.mcgill.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy

Motivation: Multiple sequence alignment (MSA) is important work, but bottlenecks arise in the massive MSA of homologous DNA or genome sequences. Most of the available state-of-the-art software tools cannot address large-scale datasets, or they run rather slowly. The similarity of homologous DNA sequences is often ignored. Lack of parallelization is still a challenge for MSA research.

Results: We developed two software tools to address the DNA MSA problem. The first employed trie trees to accelerate the centre star MSA strategy. The expected time complexity was decreased to linear time from square time. To address large-scale data, parallelism was applied using the hadoop platform. Experiments demonstrated the performance of our proposed methods, including their running time, sum-of-pairs scores and scalability. Moreover, we supplied two massive DNA/RNA MSA datasets for further testing and research.
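The first step of the centre star strategy, choosing the sequence closest to all others, can be sketched as follows. This is a naive illustration in Python that compares every pair with plain edit distance; it does not reproduce HAlign's trie-tree acceleration or its Hadoop parallelism, and the names `edit_distance` and `choose_centre` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def choose_centre(seqs):
    """Index of the sequence with the smallest summed distance to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return min(range(len(seqs)), key=totals.__getitem__)
```

In the full strategy every other sequence is then aligned pairwise to the chosen centre and the pairwise alignments are merged; the all-pairs comparison shown here is the costly part that a faster method has to avoid.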

Availability and implementation: The codes, tools and data are accessible free of charge at http://datamining.xmu.edu.cn/software/halign/.

Contact: zouquan@nclab.net or ghwang@hit.edu.cn

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Reducing the search space for causal genetic variants with VASP

Motivation: Increasingly, cost-effective high-throughput DNA sequencing technologies are being utilized to sequence human pedigrees to elucidate the genetic cause of a wide variety of human diseases. While numerous tools exist for variant prioritization within a single genome, the ability to concurrently analyze variants within pedigrees remains a challenge, especially should there be no prior indication of the underlying genetic cause of the disease. Here, we present a tool, variant analysis of sequenced pedigrees (VASP), a flexible data integration environment capable of producing a summary of pedigree variation, providing relevant information such as compound heterozygosity, genome phasing and disease inheritance patterns. Designed to aggregate data across a sequenced pedigree, VASP allows both powerful filtering and custom prioritization of both single nucleotide variants (SNVs) and small indels. Hence, clinical and research users with prior knowledge of a disease are able to dramatically reduce the variant search space based on a wide variety of custom prioritization criteria.

Availability and implementation: Source code available for academic non-commercial research purposes at https://github.com/mattmattmattmatt/VASP.

Contact: matt.field@anu.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

The Performance of the Date-Randomization Test in Phylogenetic Analyses of Time-Structured Virus Data

Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results.
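The two pass criteria described above can be written down directly. The following Python sketch assumes that rate estimates are already available as a posterior mean and 95% credible intervals given as `(lower, upper)` tuples; the function names are hypothetical and no phylogenetic inference is performed here.

```python
def passes_drt(mean_rate, randomized_intervals):
    """Standard criterion: the mean rate estimated with the correct sampling
    times lies outside the 95% credible interval of every randomized replicate."""
    return all(not (lo <= mean_rate <= hi) for lo, hi in randomized_intervals)


def passes_drt_conservative(interval, randomized_intervals):
    """Conservative criterion: the 95% credible interval obtained with the
    correct sampling times overlaps none of the randomized intervals."""
    lo, hi = interval
    return all(hi < r_lo or r_hi < lo for r_lo, r_hi in randomized_intervals)


# Illustrative numbers only (not taken from the study).
randomized = [(1e-5, 5e-4), (2e-5, 6e-4)]
print(passes_drt(1e-3, randomized))                      # standard criterion
print(passes_drt_conservative((4e-4, 2e-3), randomized))  # conservative criterion
```

With these made-up values the estimate passes the standard criterion but fails the conservative one, because its interval overlaps the randomized intervals even though its mean lies outside them.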

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Centromeres Off the Hook: Massive Changes in Centromere Size and Structure Following Duplication of CenH3 Gene in Fabeae Species

In most eukaryotes, the centromere is determined by the presence of the centromere-specific histone variant CenH3. Two types of chromosome morphology are generally recognized with respect to centromere organization. Monocentric chromosomes possess a single CenH3-containing domain in the primary constriction, whereas holocentric chromosomes lack the primary constriction and display a dispersed distribution of CenH3. Recently, metapolycentric chromosomes have been reported in Pisum sativum, representing an intermediate type of centromere organization characterized by multiple CenH3-containing domains distributed across large parts of chromosomes that still form a single constriction. In this work, we show that this type of centromere is also found in other Pisum and closely related Lathyrus species, whereas Vicia and Lens genera, which belong to the same legume tribe Fabeae, possess only monocentric chromosomes. We observed extensive variability in the size of the primary constriction and the arrangement of CenH3 domains both between and within individual Pisum and Lathyrus species, with no obvious correlation to genome or chromosome size. A search for CenH3 gene sequences revealed two paralogous variants, CenH3-1 and CenH3-2, which originated from a duplication event in the common ancestor of Fabeae species. The CenH3-1 gene was subsequently lost or silenced in the lineage leading to Vicia and Lens, whereas both genes are retained in Pisum and Lathyrus. Both of these genes appear to have evolved under purifying selection and produce functional CenH3 proteins which are fully colocalized. The findings described here provide the first evidence for a highly dynamic centromere structure within a group of closely related species, challenging previous concepts of centromere evolution.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Exploring the Phenotypic Space and the Evolutionary History of a Natural Mutation in Drosophila melanogaster

A major challenge of modern biology is elucidating the functional consequences of natural mutations. Although we have a good understanding of the effects of laboratory-induced mutations on the molecular- and organismal-level phenotypes, the study of natural mutations has lagged behind. In this work, we explore the phenotypic space and the evolutionary history of a previously identified adaptive transposable element insertion. We first combined several tests that capture different signatures of selection to show that there is evidence of positive selection in the regions flanking the FBti0019386 insertion. We then explored several phenotypes related to known phenotypic effects of nearby genes, and having plausible connections to fitness variation in nature. We found that flies with the FBti0019386 insertion had a shorter developmental time and were more sensitive to stress, which are likely to be the adaptive effect and the cost of selection of this mutation, respectively. Interestingly, these phenotypic effects are not consistent with a role of FBti0019386 in temperate adaptation as has been previously suggested. Indeed, a global analysis of the population frequency of FBti0019386 showed that climatic variables explain the FBti0019386 frequency patterns well only in Australia. Finally, although the FBti0019386 insertion could be inducing the formation of heterochromatin by recruiting the HP1a (Heterochromatin Protein 1a) protein, the insertion is associated with upregulation of sra in adult females. Overall, our integrative approach allowed us to shed light on the evolutionary history, the relevant fitness effects, and the likely molecular mechanisms of an adaptive mutation and highlights the complexity of natural genetic variants.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

iTagPlot: an accurate computation and interactive drawing tool for tag density plot

Motivation: Tag density plots are very important to intuitively reveal biological phenomena from capture-based sequencing data by visualizing the normalized read depth in a region.

Results: We have developed iTagPlot to compute tag density across functional features in parallel using multicores and a grid engine and to interactively explore it in a graphical user interface. It allows us to stratify features by defining groups based on biological function and measurement, summary statistics and unsupervised clustering.
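As a rough illustration of what a tag density profile is, the Python sketch below bins read positions around a set of feature anchors and scales the counts. Everything specific here is an assumption made for the example: the function name `tag_density`, the flank and bin sizes, and the normalization to reads per million mapped reads per feature. It is a single-threaded toy and says nothing about how iTagPlot itself computes or parallelizes the plot.

```python
def tag_density(read_positions, anchors, flank=1000, bin_size=100):
    """Average normalized read depth in bins spanning +/- flank around anchors.

    read_positions: genomic coordinates of reads (one chromosome assumed).
    anchors: coordinates of the features (e.g. transcription start sites).
    Returns one value per bin, in reads per million reads per feature.
    """
    n_bins = 2 * flank // bin_size
    counts = [0] * n_bins
    for anchor in anchors:
        for pos in read_positions:
            offset = pos - anchor + flank      # 0 at the left edge of the window
            if 0 <= offset < 2 * flank:
                counts[offset // bin_size] += 1
    scale = 1e6 / len(read_positions) / len(anchors)
    return [c * scale for c in counts]
```

Stratifying features into groups, as the abstract describes, would amount to calling such a function once per group of anchors and plotting the resulting profiles together.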

Availability and implementation: http://sourceforge.net/projects/itagplot/.

Contact: jechoi@gru.edu and jeochoi@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

A robust approach for identifying differentially abundant features in metagenomic samples

Motivation: The analysis of differential abundance for features (e.g. species or genes) can provide us with a better understanding of microbial communities and their behaviors. However, it could also mislead us about the characteristics of microbial communities if the abundances or counts of features on different scales are not properly normalized within and between communities prior to the analysis of differential abundance. Normalization methods used in the differential analysis typically try to adjust counts on different scales to a common scale using the total sum, mean or median of representative features across all samples. These methods often yield undesirable results when the difference in total counts of differentially abundant features (DAFs) across different conditions is large.

Results: We develop a novel method, Ratio Approach for Identifying Differential Abundance (RAIDA), which utilizes the ratio between features in a modified zero-inflated lognormal model. RAIDA removes possible problems associated with counts on different scales within and between conditions. As a result, its performance is not affected by the amount of difference in total abundances of DAFs across different conditions. Through comprehensive simulation studies, the performance of our method is consistently powerful, and under some situations, RAIDA greatly surpasses other existing methods. We also apply RAIDA on real datasets of type II diabetes and find interesting results consistent with previous reports.

Availability and implementation: An R package for RAIDA can be accessed from http://cals.arizona.edu/%7Eanling/sbg/software.htm.

Contact: anling@email.arizona.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Asymmetric Context-Dependent Mutation Patterns Revealed through Mutation-Accumulation Experiments

Despite the general assumption that site-specific mutation rates are independent of the local sequence context, a growing body of evidence suggests otherwise. To further examine context-dependent patterns of mutation, we amassed 5,645 spontaneous mutations in wild-type (WT) and mismatch-repair deficient (MMR) mutation–accumulation (MA) lines of the gram-positive model organism Bacillus subtilis. We then analyzed >7,500 spontaneous base-substitution mutations across B. subtilis, Escherichia coli, and Mesoplasma florum WT and MMR MA lines, finding a context-dependent mutation pattern that is asymmetric around the origin of replication. Different neighboring nucleotides can alter site-specific mutation rates by as much as 75-fold, with sites neighboring G:C base pairs or dimers involving alternating pyrimidine–purine and purine–pyrimidine nucleotides having significantly elevated mutation rates. The influence of context-dependent mutation on genome architecture is strongest in M. florum, consistent with the reduced efficiency of selection in organisms with low effective population size. If not properly accounted for, the disparities arising from patterns of context-dependent mutation can significantly influence interpretations of positive and purifying selection.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

MAGNA++: Maximizing Accuracy in Global Network Alignment via both node and edge conservation

Motivation: Network alignment aims to find conserved regions between different networks. Existing methods aim to maximize total similarity over all aligned nodes (i.e. node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Thus, we recently introduced MAGNA (Maximizing Accuracy in Global Network Alignment) to directly maximize edge conservation while producing alignments and showed its superiority over the existing methods. Here, we extend the original MAGNA with several important algorithmic advances into a new MAGNA++ framework.

Results: MAGNA++ introduces several novelties: (i) it simultaneously maximizes any one of three different measures of edge conservation (including our recent superior $S^{3}$ measure) and any desired node conservation measure, which further improves alignment quality compared with maximizing only node conservation or only edge conservation; (ii) it speeds up the original MAGNA algorithm by parallelizing it to automatically use all available resources, as well as by reimplementing the edge conservation measures more efficiently; (iii) it provides a friendly graphical user interface for easy use by domain (e.g. biological) scientists; and (iv) at the same time, MAGNA++ offers source code for easy extensibility by computational scientists.
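For readers unfamiliar with edge conservation, the Python sketch below scores a given alignment with the symmetric substructure score. The abstract does not state the formula; the definition used here (conserved edges divided by the edges of the first network plus the edges of the second network induced on the aligned nodes, minus the conserved edges) is recalled from the original MAGNA publication and should be checked against it. The function name `s3_score` and the input format are assumptions, and no alignment search is performed.

```python
def s3_score(edges1, edges2, mapping):
    """Symmetric substructure score of an injective node mapping G1 -> G2.

    edges1, edges2: iterables of undirected edges given as (u, v) pairs.
    mapping: dict sending every node of G1 to a distinct node of G2.
    """
    e2 = {frozenset(e) for e in edges2}
    # Images of G1's edges under the alignment.
    mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges1}
    conserved = len(mapped & e2)
    # Edges of G2 whose endpoints are both images of G1 nodes.
    image = set(mapping.values())
    induced = sum(1 for e in e2 if e <= image)
    denom = len(mapped) + induced - conserved
    return conserved / denom if denom else 0.0


g1 = [("a", "b"), ("b", "c")]
g2 = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(s3_score(g1, g2, {"a": 1, "b": 2, "c": 3}))  # 2 conserved / (2 + 3 - 2)
```

An optimizer in the spirit of point (i) would combine such an edge score with a node similarity score into one objective; how MAGNA++ weights the two is not specified in the abstract.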

Availability and implementation: http://www.nd.edu/~cone/MAGNA++/

Contact: tmilenko@nd.edu

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

Test set bias affects reproducibility of gene signatures

Motivation: Prior to applying genomic predictors to clinical samples, the genomic data must be properly normalized to ensure that the test set data are comparable to the data upon which the predictor was trained. The most effective normalization methods depend on data from multiple patients. From a biomedical perspective, this implies that predictions for a single patient may change depending on which other patient samples they are normalized with. This test set bias will occur when any cross-sample normalization is used before clinical prediction.

Results: We demonstrate that results from existing gene signatures which rely on normalizing test data may be irreproducible when the patient population changes composition or size using a set of curated, publicly available breast cancer microarray experiments. As an alternative, we examine the use of gene signatures that rely on ranks from the data and show why signatures using rank-based features can avoid test set bias while maintaining highly accurate classification, even across platforms.
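The effect can be reproduced with a toy example. In the Python sketch below, per-gene z-scoring across the cohort stands in for cross-sample normalization in general (the abstract does not name a specific method), and a simple "gene A higher than gene B" comparison stands in for a rank-based feature. The function names and all expression values are invented for illustration.

```python
from statistics import mean, pstdev


def zscore_in_cohort(values, index):
    """Z-score of one patient's expression value relative to a cohort.

    values: expression of a single gene across all patients in the batch.
    index: position of the patient of interest in that list.
    """
    mu, sd = mean(values), pstdev(values)
    return (values[index] - mu) / sd


def rank_feature(sample, gene_a, gene_b):
    """Within-sample comparison: is gene_a expressed higher than gene_b?"""
    return sample[gene_a] > sample[gene_b]


# The same patient (value 5.0) normalized together with two different cohorts.
print(zscore_in_cohort([5.0, 4.0, 3.0], 0))   # positive: looks over-expressed
print(zscore_in_cohort([5.0, 6.0, 7.0], 0))   # negative: looks under-expressed

# The rank-based feature uses only the patient's own profile.
print(rank_feature({"A": 5.0, "B": 3.2}, "A", "B"))
```

The normalized value changes sign depending on which other patients are in the batch, while the within-sample comparison cannot change, which is the property that lets rank-based signatures avoid test set bias.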

Availability and implementation: The code, data and instructions necessary to reproduce our entire analysis are available at https://github.com/prpatil/testsetbias.

Contact: jtleek@gmail.com or bhaibeka@uhnresearch.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R

Summary: Precision-recall (PR) and receiver operating characteristic (ROC) curves are valuable measures of classifier performance. Here, we present the R-package PRROC, which allows for computing and visualizing both PR and ROC curves. In contrast to available R-packages, PRROC allows for computing PR and ROC curves and areas under these curves for soft-labeled data using a continuous interpolation between the points of PR curves. In addition, PRROC provides a generic plot function for generating publication-quality graphics of PR and ROC curves.
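For orientation, the Python sketch below computes the points of a ROC curve and the area under it for ordinary hard-labeled data. It is not the PRROC package or its R interface: the function names are hypothetical, the area uses plain trapezoids, and neither soft labels nor the continuous interpolation between PR points described above is reproduced. It assumes at least one positive and one negative example.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the threshold over the scores.

    labels are 1 (positive) or 0 (negative); tied scores share one point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every example that shares the current score.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Linear interpolation is appropriate between ROC points but not between PR points, which is why a PR curve needs the dedicated interpolation that the abstract mentions.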

Availability and implementation: PRROC is available from CRAN and is licensed under GPL 3.

Contact: grau@informatik.uni-halle.de

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

SANSparallel: interactive homology search against Uniprot

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server issue

MTiOpenScreen: a web server for structure-based virtual screening

Open screening endeavors play, and will continue to play, a key role in facilitating the identification of new bioactive compounds, fostering innovation and improving the effectiveness of chemical biology and drug discovery processes. To this end, we developed the new web server MTiOpenScreen, dedicated to small-molecule docking and virtual screening. It includes two services, MTiAutoDock and MTiOpenScreen, which allow docking into a user-defined binding site or blind docking using AutoDock 4.2, and automated virtual screening with AutoDock Vina. MTiOpenScreen provides valuable starting collections for screening: two in-house prepared drug-like chemical libraries containing 150 000 PubChem compounds, the Diverse-lib containing diverse molecules and the iPPI-lib enriched in molecules likely to inhibit protein–protein interactions. In addition, MTiOpenScreen offers users the possibility to screen up to 5000 small molecules selected outside our two libraries. The predicted binding poses and energies of up to 1000 top-ranked ligands can be downloaded. In this way, MTiOpenScreen enables researchers to apply virtual screening using different chemical libraries on traditional or more challenging protein targets such as protein–protein interactions. The MTiOpenScreen web server is free and open to all users at http://bioserv.rpbs.univ-paris-diderot.fr/services/MTiOpenScreen/.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server issue

The EBI Search engine: providing search and retrieval functionality for biological data from EMBL-EBI

The European Bioinformatics Institute (EMBL-EBI—https://www.ebi.ac.uk) provides free and unrestricted access to data across all major areas of biology and biomedicine. Searching and extracting knowledge across these domains requires a fast and scalable solution that addresses the requirements of domain experts as well as casual users. We present the EBI Search engine, referred to here as ‘EBI Search’, an easy-to-use fast text search and indexing system with powerful data navigation and retrieval capabilities. API integration provides access to analytical tools, allowing users to further investigate the results of their search. The interconnectivity that exists between data resources at EMBL-EBI provides easy, quick and precise navigation and a better understanding of the relationship between different data types including sequences, genes, gene products, proteins, protein domains, protein families, enzymes and macromolecular structures, together with relevant life science literature.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server issue

TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction

This article introduces the Transitive Consistency Score (TCS) web server, a service that makes it possible to estimate the local reliability of protein multiple sequence alignments (MSAs) using the TCS index. The evaluation can be used to identify the aligned positions most likely to contain structurally analogous residues and also most likely to support an accurate phylogenetic reconstruction. The TCS scoring scheme has been shown to be an accurate predictor of structural alignment correctness among commonly used methods. It has also been shown to outperform common filtering schemes like Gblocks or trimAl when doing MSA post-processing prior to phylogenetic tree reconstruction. The web server is available from http://tcoffee.crg.cat/tcs.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server issue

Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature

Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles.

Database URL: http://www.ontologyfingerprint.org

  • Database
  • 10 years ago
  • Original Article

miRGate: a curated database of human, mouse and rat miRNA-mRNA targets

MicroRNAs (miRNAs) are small non-coding elements involved in the post-transcriptional down-regulation of gene expression through base pairing with messenger RNAs (mRNAs). Through this mechanism, several miRNA–mRNA pairs have been described as critical in the regulation of multiple cellular processes, including early embryonic development and pathological conditions. Many of these pairs (such as miR-15b/BCL2 in apoptosis or BART-6/BCL6 in diffuse large B-cell lymphomas) were experimentally discovered and/or computationally predicted. Available tools for target prediction are usually based on sequence matching, thermodynamics and conservation, among other approaches. Nevertheless, the main issue in miRNA–mRNA pair prediction is the limited overlap among the results of different prediction methods, or even with lists of experimentally validated pairs, despite the fact that all rely on similar principles. To circumvent this problem, we have developed miRGate, a database containing novel computationally predicted miRNA–mRNA pairs that are calculated using well-established algorithms. In addition, it includes an updated and complete dataset of sequences for both miRNAs and mRNA 3'-untranslated regions from human (including human viruses), mouse and rat, as well as experimentally validated data from four well-known databases. The underlying methodology of miRGate has been successfully applied to independent datasets, providing predictions that were convincingly validated by functional assays. miRGate is an open resource available at http://mirgate.bioinfo.cnio.es. For programmatic access, we have provided a representational state transfer web service application programming interface that allows accessing the database at http://mirgate.bioinfo.cnio.es/API/

Database URL: http://mirgate.bioinfo.cnio.es

  • Database
  • 10 years ago
  • Database Tool

Multiscale reaction-diffusion simulations with Smoldyn

Summary: Smoldyn is a software package for stochastic modelling of spatial biochemical networks and intracellular systems. It was originally developed with an accurate off-lattice particle-based model at its core. This has recently been enhanced with the addition of a computationally efficient on-lattice model, which can be run stand-alone or coupled together for multiscale simulations using both models in regions where they are most required, increasing the applicability of Smoldyn to larger molecule numbers and spatial domains. Simulations can switch between models with only small additions to their configuration file, enabling users with existing Smoldyn configuration files to run the new on-lattice model with any reaction, species or surface descriptions they might already have.

Availability and Implementation: Source code and binaries freely available for download at www.smoldyn.org, implemented in C/C++ and supported on Linux, Mac OSX and MS Windows.

Contact: martin.robinson@maths.ox.ac.uk

Supplementary Information: Supplementary data are available at Bioinformatics online and include additional details on model specification and modelling of surfaces, as well as the Smoldyn configuration file used to generate Figure 1.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

LBIBCell: a cell-based simulation environment for morphogenetic problems

Motivation: The simulation of morphogenetic problems requires the simultaneous and coupled simulation of signalling and tissue dynamics. A cellular resolution of the tissue domain is important to adequately describe the impact of cell-based events, such as cell division, cell–cell interactions and spatially restricted signalling events. A tightly coupled cell-based mechano-regulatory simulation tool is therefore required.

Results: We developed an open-source software framework for morphogenetic problems. The environment offers core functionalities for the tissue and signalling models. In addition, the software offers great flexibility to add custom extensions and biologically motivated processes. Cells are represented as highly resolved, massless elastic polygons; the viscous properties of the tissue are modelled by a Newtonian fluid. The Immersed Boundary method is used to model the interaction between the viscous and elastic properties of the cells, thus extending on the IBCell model. The fluid and signalling processes are solved using the Lattice Boltzmann method. As application examples we simulate signalling-dependent tissue dynamics.

Availability and implementation: The documentation and source code are available on http://tanakas.bitbucket.org/lbibcell/index.html

Contact: simon.tanaka@bsse.ethz.ch or dagmar.iber@bsse.ethz.ch

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein-protein interfaces

Motivation: The interpretation of cancer-related single-nucleotide variants (SNVs) considering the protein features they affect, such as known functional sites, protein–protein interfaces, or relation with already annotated mutations, might complement the annotation of genetic variants in the analysis of NGS data. Current tools that annotate mutations fall short on several aspects, including the ability to use protein structure information or the interpretation of mutations in protein complexes.

Results: We present the Structure–PPi system for the comprehensive analysis of coding SNVs based on 3D protein structures of protein complexes. The 3D repository used, Interactome3D, includes experimental and modeled structures for proteins and protein–protein complexes. Structure–PPi annotates SNVs with features extracted from UniProt, InterPro, APPRIS, dbNSFP and COSMIC databases. We illustrate the usefulness of Structure–PPi with the interpretation of 1 027 122 non-synonymous SNVs from COSMIC and the 1000G Project that provides a collection of ~172 700 SNVs mapped onto the protein 3D structure of 8726 human proteins (43.2% of the 20 214 SwissProt-curated proteins in UniProtKB release 2014_06) and protein–protein interfaces with potential functional implications.

Availability and implementation: Structure–PPi, along with a user manual and examples, is available at http://structureppi.bioinfo.cnio.es/Structure; the code for local installations is available at https://github.com/Rbbt-Workflows

Contact: tpons@cnio.es

Supplementary Information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

ASBench: benchmarking sets for allosteric discovery

Summary: Allostery allows for the fine-tuning of protein function. Targeting allosteric sites is gaining increasing recognition as a novel strategy in drug design. The key challenge in the discovery of allosteric sites has strongly motivated the development of computational methods and thus high-quality, publicly accessible standard data have become indispensable. Here, we report benchmarking data for experimentally determined allosteric sites through a complex process, including a ‘Core set’ with 235 unique allosteric sites and a ‘Core-Diversity set’ with 147 structurally diverse allosteric sites. These benchmarking sets can be exploited to develop efficient computational methods to predict unknown allosteric sites in proteins and reveal unique allosteric ligand–protein interactions to guide allosteric drug design.

Availability and implementation: The benchmarking sets are freely available at http://mdl.shsmu.edu.cn/asbench.

Contact: jian.zhang@sjtu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE