Preparation of compact biocompatible quantum dots using multicoordinating molecular-scale ligands based on a zwitterionic hydrophilic motif and lipoic acid anchors

To be useful for biological assays, quantum dots need to be made soluble and stable in biologically relevant matrices. In this protocol, a ligand containing a lipoic acid–zwitterion moiety capable of photoligation to the quantum dot is used.

  • Nature Protocols 10, 859 (2015)
  • 10 years ago
  • Protocol

SELPHI: correlation-based identification of kinase-associated networks from global phospho-proteomics data sets

While phospho-proteomics studies have shed light on the dynamics of cellular signaling, they mainly describe global effects and rarely explore mechanistic details, such as kinase/substrate relationships. Tools and databases, such as NetworKIN and PhosphoSitePlus, provide valuable regulatory details on signaling networks but rely on prior knowledge. They therefore provide limited information on less studied kinases and fewer unexpected relationships given that better studied signaling events can mask condition- or cell-specific ‘network wiring’.

SELPHI is a web-based tool providing in-depth analysis of phospho-proteomics data that is intuitive and accessible to non-bioinformatics experts. It uses correlation analysis of phospho-sites to extract kinase/phosphatase and phospho-peptide associations, and highlights the potential flow of signaling in the system under study. We illustrate SELPHI via analysis of phospho-proteomics data acquired in the presence of erlotinib—a tyrosine kinase inhibitor (TKI)—in cancer cells expressing TKI-resistant and -sensitive variants of the Epidermal Growth Factor Receptor. In this data set, SELPHI revealed information overlooked by the reporting study, including the known role of MET and EPHA2 kinases in conferring resistance to erlotinib in TKI sensitive strains. SELPHI can significantly enhance the analysis of phospho-proteomics data contributing to improved understanding of sample-specific signaling networks. SELPHI is freely available via http://llama.mshri.on.ca/SELPHI.
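
As a rough illustration of the correlation step described above, the sketch below computes Pearson correlations between phospho-site intensity profiles and keeps the strongly correlated kinase/peptide pairs. The site names, intensity values and the 0.9 cutoff are hypothetical, and this is not SELPHI's actual implementation.

```python
from itertools import product
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical log-intensities measured across five conditions.
kinase_sites = {"MET_Y1234": [1.0, 2.0, 3.0, 4.0, 5.0]}
peptide_sites = {
    "SUB1_S10": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with the kinase site
    "SUB2_T55": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as the kinase site rises
    "SUB3_S7": [1.0, 3.0, 1.0, 3.0, 1.0],   # uncorrelated with the kinase site
}

def associations(kinases, peptides, cutoff=0.9):
    """Return (kinase_site, peptide_site, r) triples with |r| >= cutoff."""
    hits = []
    for (k, kp), (p, pp) in product(kinases.items(), peptides.items()):
        r = pearson(kp, pp)
        if abs(r) >= cutoff:
            hits.append((k, p, round(r, 3)))
    return hits

result = associations(kinase_sites, peptide_sites)
```

Both positive and negative correlations are kept here, since an anti-correlated site may be as informative as a co-regulated one; whether the tool treats them the same way is not stated in the abstract.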

  • Nucleic Acids Research
  • 10 years ago
  • Web Server Issue

Reshaping the epigenetic landscape during early flower development: induction of attractor transitions by relative differences in gene decay rates

Background: Gene regulatory network (GRN) dynamical models are standard systems biology tools for the mechanistic understanding of developmental processes and are enabling the formalization of the epigenetic landscape (EL) model. Methods: In this work we propose a modeling framework which integrates standard mathematical analyses to extend the simple GRN Boolean model in order to address questions regarding the impact of gene specific perturbations in cell-fate decisions during development. Results: We systematically tested the propensity of individual genes to produce qualitative changes to the EL induced by modification of gene characteristic decay rates reflecting the temporal dynamics of differentiation stimuli. By applying this approach to the flower specification GRN (FOS-GRN) we uncovered differences in the functional (dynamical) role of their genes. The observed dynamical behavior correlates with biological observables. We found a relationship between the propensity of undergoing attractor transitions between attraction basins in the EL and the direction of differentiation during early flower development - being less likely to induce up-stream attractor transitions as the course of development progresses. Our model also uncovered a potential mechanism at play during the transition from EL basins defining inflorescence meristem to those associated to flower organs meristem. Additionally, our analysis provided a mechanistic interpretation of the homeotic property of the ABC genes, being more likely to produce both an induced inter-attractor transition and to specify a novel attractor. Finally, we found that there is a close relationship between a gene’s topological features and its propensity to produce attractor transitions. 
Conclusions: The study of how the state-space associated with a dynamical model of a GRN can be restructured by modulation of genes’ characteristic expression times is an important aid for understanding underlying mechanisms occurring during development. Our contribution offers a simple framework to approach such problems, as exemplified here by the case of flower development. Different GRN models and the effect of diverse inductive signals can be explored within the same framework. We speculate that the dynamical role of specific genes within a GRN, as uncovered here, might give information about which genes are more likely to link a module to other regulatory circuits and signal transduction pathways.
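
To make the notions of attractor and attraction basin concrete, here is a minimal sketch of a Boolean network under synchronous update. The three-gene network and its rules are hypothetical (it is not the flower specification GRN), and the sketch covers only the plain Boolean model, not the decay-rate extension proposed in the study.

```python
from itertools import product

# Hypothetical 3-gene network: A and B repress each other; C follows A.
def step(state):
    a, b, c = state
    return (int(not b), int(not a), a)

def attractor_of(state):
    """Iterate synchronously until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    cycle = seen[seen.index(state):]
    # Rotate so the smallest state comes first, giving each cycle one canonical form.
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])

# Map each attractor to the list of initial states that end up in it (its basin).
basins = {}
for s in product((0, 1), repeat=3):
    basins.setdefault(attractor_of(s), []).append(s)
```

In this toy case the eight states split into two fixed points with basins of two states each and one two-state cycle with a basin of four states. Changing an update rule and recomputing `basins` shows how a perturbation reshapes the landscape.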

  • BMC Systems Biology 2015, null:20
  • 10 years ago

Genomic Signature of Selective Sweeps Illuminates Adaptation of Medicago truncatula to Root-Associated Microorganisms

Medicago truncatula is a model legume species used to investigate plant–microorganism interactions, notably root symbioses. Massive population genomic and transcriptomic data now available for this species open the way for a comprehensive investigation of genomic variations associated with adaptation of M. truncatula to its environment. Here we performed a fine-scale genome scan of selective sweep signatures in M. truncatula using more than 15 million single nucleotide polymorphisms identified on 283 accessions from two populations (Circum and Far West), and exploited annotation and published transcriptomic data to identify biological processes associated with molecular adaptation. We identified 58 swept genomic regions with a 15 kb average length and comprising 3.3 gene models on average. The unimodal sweep state probability distribution in these regions enabled us to focus on the best single candidate gene per region. We detected two unambiguous species-wide selective sweeps, one of which appears to underlie morphological adaptation. Population genomic analyses of the remaining 56 sweep signatures indicate that sweeps identified in the Far West population are less population-specific and probably more ancient than those identified in the Circum population. Functional annotation revealed a predominance of immunity-related adaptations in the Circum population. Transcriptomic data from accessions of the Far West population allowed inference of four clusters of coregulated genes putatively involved in the adaptive control of symbiotic carbon flow and nodule senescence, as well as in other root adaptations upon infection with soil microorganisms. We demonstrate that molecular adaptations in M. truncatula were primarily triggered by selective pressures from root-associated microorganisms.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Maintenance and Loss of Duplicated Genes by Dosage Subfunctionalization

Whole-genome duplications (WGDs) have contributed to gene-repertoire enrichment in many eukaryotic lineages. However, most duplicated genes are eventually lost and it is still unclear why some duplicated genes are evolutionarily successful whereas others quickly become pseudogenes. Here, we show that dosage constraints are major factors opposing post-WGD gene loss in several Paramecium species that share a common ancestral WGD. We propose a model where a majority of WGD-derived duplicates preserve their ancestral function and are retained to produce enough of the proteins performing this same ancestral function. Under this model, the expression level of individual duplicated genes can evolve neutrally as long as they maintain a roughly constant summed expression, and this allows random genetic drift toward uneven contributions of the two copies to total expression. Our analysis suggests that once a high level of imbalance is reached, which can require substantial lengths of time, the copy with the lowest expression level contributes a small enough fraction of the total expression that selection no longer opposes its loss. Extension of our analysis to yeast species sharing a common ancestral WGD yields similar results, suggesting that duplicated-gene retention for dosage constraints followed by divergence in expression level and eventual deterministic gene loss might be a universal feature of post-WGD evolution.
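
The proposed model can be caricatured as a random walk: the summed expression of the two copies stays fixed while the share contributed by each copy drifts, until one copy contributes so little that its loss is no longer opposed. The sketch below simulates this under assumptions of our own (Gaussian steps, a 5% threshold, arbitrary units); none of these parameters come from the study.

```python
import random

def drift_until_imbalance(total=100.0, step_sd=0.02, threshold=0.05,
                          max_generations=100_000, seed=1):
    """Drift the expression share of copy 1 while keeping the sum constant.

    Returns (generation, expr_copy1, expr_copy2); generation is None if the
    weaker copy never drops below `threshold` of the total.
    """
    rng = random.Random(seed)
    frac = 0.5  # both copies start with equal expression
    for gen in range(1, max_generations + 1):
        # Neutral step, clamped so the share stays within [0, 1].
        frac = min(1.0, max(0.0, frac + rng.gauss(0.0, step_sd)))
        if min(frac, 1.0 - frac) < threshold:
            return gen, frac * total, (1.0 - frac) * total
    return None, frac * total, (1.0 - frac) * total

gen, e1, e2 = drift_until_imbalance()
```

Smaller values of `step_sd` lengthen the waiting time before the threshold is crossed, in line with the abstract's remark that reaching a high imbalance can take a substantial length of time.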

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Moving the mountain: analysis of the effort required to transform comparative anatomy into computable anatomy

The diverse phenotypes of living organisms have been described for centuries, and though they may be digitized, they are not readily available in a computable form. Using over 100 morphological studies, the Phenoscape project has demonstrated that by annotating characters with community ontology terms, links between novel species anatomy and the genes that may underlie them can be made. But given the enormity of the legacy literature, how can this largely unexploited wealth of descriptive data be rendered amenable to large-scale computation? To identify the bottlenecks, we quantified the time involved in the major aspects of phenotype curation as we annotated characters from the vertebrate phylogenetic systematics literature. This involves attaching fully computable logical expressions consisting of ontology terms to the descriptions in character-by-taxon matrices. The workflow consists of: (i) data preparation, (ii) phenotype annotation, (iii) ontology development and (iv) curation team discussions and software development feedback. Our results showed that the completion of this work required two person-years by a team of two post-docs, a lead data curator, and students. Manual data preparation required close to 13% of the effort. This part in particular could be reduced substantially with better community data practices, such as depositing fully populated matrices in public repositories. Phenotype annotation required ~40% of the effort. We are working to make this more efficient with Natural Language Processing tools. Ontology development (40%), however, remains a highly manual task requiring domain (anatomical) expertise and use of specialized software. The large overhead required for data preparation and ontology development contributed to a low annotation rate of approximately two characters per hour, compared with 14 characters per hour when activity was restricted to character annotation. 
Unlocking the potential of the vast stores of morphological descriptions requires better tools for efficiently processing natural language, and better community practices towards a born-digital morphology.

Database URL: http://kb.phenoscape.org

  • Database
  • 10 years ago
  • Original Article

AtmiRNET: a web-based resource for reconstructing regulatory networks of Arabidopsis microRNAs

Compared with animal microRNAs (miRNAs), our knowledge of how miRNAs are involved in significant biological processes in plants is still limited. AtmiRNET is a novel resource geared toward plant scientists for reconstructing regulatory networks of Arabidopsis miRNAs. By drawing on miRNA studies of target recognition, functional enrichment of target genes, promoter identification and detection of cis- and trans-elements, AtmiRNET allows users to explore mechanisms of transcriptional regulation and miRNA functions in Arabidopsis thaliana, which have rarely been investigated so far. High-throughput next-generation sequencing datasets from transcriptional start site (TSS)-relevant experiments as well as five core promoter elements were collected to establish a support vector machine-based prediction model for Arabidopsis miRNA TSSs. High-confidence transcription factors that participate in the transcriptional regulation of Arabidopsis miRNAs are then provided based on a statistical approach. Furthermore, both experimentally verified and putative miRNA-target interactions, whose validity is supported by the correlations between the expression levels of miRNAs and their targets, are elucidated for functional enrichment analysis. The inferred regulatory networks give users an intuitive insight into the pivotal roles of Arabidopsis miRNAs through the crosstalk between miRNA transcriptional regulation (upstream) and miRNA-mediated (downstream) gene circuits. The visually oriented information in AtmiRNET addresses the scant understanding of plant miRNAs and will be useful for further research (e.g. on the ABA-miR167c-auxin signaling pathway).

Database URL: http://AtmiRNET.itps.ncku.edu.tw/

  • Database
  • 10 years ago
  • Original Article

Core promoter sequence in yeast is a major determinant of expression level

The core promoter is the regulatory sequence to which RNA polymerase is recruited and where it acts to initiate transcription. Here, we present the first comprehensive study of yeast core promoters, providing massively parallel measurements of core promoter activity and of TSS locations and relative usage for thousands of native and designed sequences. We found core promoter activity to be highly correlated to the activity of the entire promoter, and that sequence variation in different core promoter regions substantially tunes its activity in a predictable way. We also show that location, orientation and flanking bases critically affect TATA element function, that transcription initiation in highly active core promoters is focused within a narrow region, that poly(dA:dT) orientation has functional consequence at the 3' end of promoters, and that orthologous core promoters across yeast species have conserved activities. Our results demonstrate the importance of core promoters in the quantitative study of gene regulation.

  • Genome Research
  • 10 years ago
  • RESEARCH

GalaxyPepDock: a protein-peptide docking tool based on interaction similarity and energy optimization

Protein–peptide interactions are involved in a wide range of biological processes and are attractive targets for therapeutic purposes because of their small interfaces. Therefore, effective protein–peptide docking techniques can provide the basis for potential therapeutic applications by enabling an atomic-level understanding of protein interactions. With the increasing number of protein–peptide structures deposited in the protein data bank, the prediction accuracy of protein-peptide docking can be enhanced by utilizing the information provided by the database. The GalaxyPepDock web server, which is freely accessible at http://galaxy.seoklab.org/pepdock, performs similarity-based docking by finding templates from the database of experimentally determined structures and building models using energy-based optimization that allows for structural flexibility. The server can therefore effectively model the structural differences between the template and target protein–peptide complexes. The performance of GalaxyPepDock is superior to those of the other currently available web servers when tested on the PeptiDB set and on recently released complex structures. When tested on the CAPRI target 67, GalaxyPepDock generates models that are more accurate than the best server models submitted during the CAPRI blind prediction experiment.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server Issue

IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks

IMP (Integrative Multi-species Prediction), originally released in 2012, is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides biologists with a framework to analyze their candidate gene sets in the context of functional networks, expanding or refining their sets using functional relationships predicted from integrated high-throughput data. IMP 2.0 integrates updated prior knowledge and data collections from the last three years in the seven supported organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, and Saccharomyces cerevisiae) and extends function prediction coverage to include human disease. IMP identifies homologs with conserved functional roles for disease knowledge transfer, allowing biologists to analyze disease contexts and predictions across all organisms. Additionally, IMP 2.0 implements a new flexible platform for experts to generate custom hypotheses about biological processes or diseases, making sophisticated data-driven methods easily accessible to researchers. IMP does not require any registration or installation and is freely available for use at http://imp.princeton.edu.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server Issue

The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides

TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We hereby present a major update to the server, with some substantial improvements, including the following: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server now is even slightly faster, although a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional feature of submitting batch files and accessing the server programmatically using standard interfaces, making it thus ideal for proteome-wide analyses. Indicatively, the user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented achieves an overall increase in performance by 4% as compared to the currently available best-scoring methods and TOPCONS is the only method that can identify signal peptides and still maintain a state-of-the-art performance in topology predictions.
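
The idea of a consensus prediction can be illustrated with a per-residue majority vote over the topology strings returned by several predictors. This is a deliberately simplified stand-in and should not be read as the TOPCONS algorithm; the label alphabet (i = inside, M = membrane, o = outside) and the three example predictions are assumptions made for the sketch.

```python
from collections import Counter

def majority_consensus(predictions):
    """Per-residue majority vote over equal-length topology strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # Break ties by taking the label that appears first in this column.
        consensus.append(next(c for c in column if counts[c] == best))
    return "".join(consensus)

# Hypothetical outputs of three predictors for a 12-residue sequence.
preds = [
    "iiiiMMMMMooo",
    "iiiMMMMMMooo",
    "iiiiMMMMoooo",
]
consensus = majority_consensus(preds)
print(consensus)  # iiiiMMMMMooo
```

A plain vote can produce implausibly short membrane segments when predictors disagree at helix boundaries, which is one reason a real consensus method needs more than column-wise counting.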

  • Nucleic Acids Research
  • 10 years ago
  • Web Server Issue

Human nucleolar protein Nop52 (RRP1/NNP-1) is involved in site 2 cleavage in internal transcribed spacer 1 of pre-rRNAs at early stages of ribosome biogenesis

During the early steps of ribosome biogenesis in mammals, the two ribosomal subunits 40S and 60S are produced via splitting of the large 90S pre-ribosomal particle (90S) into pre-40S and pre-60S pre-ribosomal particles (pre-40S and pre-60S). We previously proposed that replacement of fibrillarin by Nop52 (RRP1/NNP-1) for the binding to p32 (C1QBP) is a key event that drives this splitting process. However, how the replacement by RRP1 is coupled with the endo- and/or exo-ribonucleolytic cleavage of pre-rRNA remains unknown. In this study, we demonstrate that RRP1 deficiency suppressed site 2 cleavage on ITS1 of 47S/45S, 41S and 36S pre-rRNAs in human cells. RRP1 was also present in 90S and was localized in the dense fibrillar component of the nucleolus in a manner dependent on active RNA polymerase I transcription. In addition, double knockdown of XRN2 and RRP1 revealed that RRP1 accelerated the site 2 cleavage of 47S, 45S and 41S pre-rRNAs. These data suggest that RRP1 is involved not only in competitive binding with fibrillarin to C1QBP on 90S but also in site 2 cleavage in ITS1 of pre-rRNAs at early stages of human ribosome biogenesis; thus, it is likely that RRP1 integrates the cleavage of site 2 with the physical split of 90S into pre-40S and pre-60S.

  • Nucleic Acids Research
  • 10 years ago
  • Molecular Biology

ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap

Principal Component Analysis (PCA) is a widely used method of reducing the dimensionality of high-dimensional data, often followed by visualizing two of the components on a scatterplot. Although widely used, the method is lacking an easy-to-use web interface that scientists with limited programming skills could use to make plots of their own data. The same applies to creating heatmaps: it is possible to add conditional formatting for Excel cells to show colored heatmaps, but for more advanced features such as clustering and experimental annotations, more sophisticated analysis tools have to be used. We present a web tool called ClustVis that aims to have an intuitive user interface. Users can upload data from a simple delimited text file that can be created in a spreadsheet program. It is possible to modify data processing methods and the final appearance of the PCA and heatmap plots by using drop-down menus, text boxes, sliders etc. Appropriate defaults are given to reduce the time needed by the user to specify input parameters. As an output, users can download the PCA plot and heatmap in one of the preferred file formats. This web server is freely available at http://biit.cs.ut.ee/clustvis/.
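
For readers unfamiliar with what PCA computes, the following sketch finds the first principal component of a small table by power iteration on the covariance matrix, using only the standard library. It illustrates the method in general and is unrelated to how ClustVis is implemented; the function name and the example data are our own.

```python
from math import sqrt

def first_principal_component(rows, iterations=200):
    """Return (unit direction, scores) of PC1 via power iteration.

    Assumes the data have non-zero variance and that the start vector is
    not orthogonal to PC1, which holds for the example below.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the component to get its score.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Four samples lying exactly on the line y = 2x.
direction, scores = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
```

Here `direction` comes out proportional to (1, 2), the line along which the samples vary, and `scores` gives each sample's coordinate along it. A scatterplot of the first two components would repeat the procedure after removing the first component from the data.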

  • Nucleic Acids Research
  • 10 years ago
  • Web Server Issue

Structural and sequencing analysis of local target DNA recognition by MLV integrase

Target-site selection by retroviral integrase (IN) proteins profoundly affects viral pathogenesis. We describe the solution nuclear magnetic resonance structure of the Moloney murine leukemia virus IN (M-MLV) C-terminal domain (CTD) and a structural homology model of the catalytic core domain (CCD). In solution, the isolated MLV IN CTD adopts an SH3 domain fold flanked by a C-terminal unstructured tail. We generated a concordant MLV IN CCD structural model using SWISS-MODEL, MMM-tree and I-TASSER. Using the X-ray crystal structure of the prototype foamy virus IN target capture complex together with our MLV domain structures, residues within the CCD α2 helical region and the CTD β1-β2 loop were predicted to bind target DNA. The role of these residues was analyzed in vivo through point mutants and motif interchanges. Viable viruses with substitutions at the IN CCD α2 helical region and the CTD β1-β2 loop were tested for effects on integration target site selection. Next-generation sequencing and analysis of integration target sequences indicate that the CCD α2 helical region, in particular P187, interacts with the sequences distal to the scissile bonds whereas the CTD β1-β2 loop binds to residues proximal to it. These findings validate our structural model and disclose IN-DNA interactions relevant to target site selection.

  • Nucleic Acids Research
  • 10 years ago
  • Structural Biology

Comparative Genomics Reveals the Origins and Diversity of Arthropod Immune Systems

Insects are an important model for the study of innate immune systems, but remarkably little is known about the immune system of other arthropod groups despite their importance as disease vectors, pests, and components of biological diversity. Using comparative genomics, we have characterized the immune system of all the major groups of arthropods beyond insects for the first time—studying five chelicerates, a myriapod, and a crustacean. We found clear traces of an ancient origin of innate immunity, with some arthropods having Toll-like receptors and C3-complement factors that are more closely related in sequence or structure to vertebrates than other arthropods. Across the arthropods some components of the immune system, such as the Toll signaling pathway, are highly conserved. However, there is also remarkable diversity. The chelicerates apparently lack the Imd signaling pathway and beta-1,3 glucan binding proteins—a key class of pathogen recognition receptors. Many genes have large copy number variation across species, and this may sometimes be accompanied by changes in function. For example, we find that peptidoglycan recognition proteins have frequently lost their catalytic activity and switch between secreted and intracellular forms. We also find that there has been widespread and extensive duplication of the cellular immune receptor Dscam (Down syndrome cell adhesion molecule), which may be an alternative way to generate the high diversity produced by alternative splicing in insects. In the antiviral short interfering RNAi pathway Argonaute 2 evolves rapidly and is frequently duplicated, with a highly variable copy number. Our results provide a detailed analysis of the immune systems of several important groups of animals for the first time and lay the foundations for functional work on these groups.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

ms-data-core-api: an open-source, metadata-oriented library for computational proteomics

Summary: The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types included in common proteomics experimental workflows, going from spectra to peptide/protein identifications to quantitative results. The library contains readers for three of the most used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it also supports other common mass spectra data formats: dta, ms2, mgf, pkl, apl (text-based), mzXML and mzData (XML-based). Also, it can be used to read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the simplicity of developing applications using the library.

Availability and implementation: The software is freely available at https://github.com/PRIDE-Utilities/ms-data-core-api.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: juan@ebi.ac.uk

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

PDBest: a user-friendly platform for manipulating and enhancing protein structures

Summary: PDBest (PDB Enhanced Structures Toolkit) is a user-friendly, freely available platform for acquiring, manipulating and normalizing protein structures in a high-throughput and seamless fashion. With an intuitive graphical interface it allows users with no programming background to download and manipulate their files. The platform also exports protocols, enabling users to easily share PDB searching and filtering criteria, enhancing analysis reproducibility.

Availability and implementation: PDBest installation packages are freely available for several platforms at http://www.pdbest.dcc.ufmg.br

Contact: wellisson@dcc.ufmg.br, dpires@dcc.ufmg.br, raquelcm@dcc.ufmg.br

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

Genome-scale strain designs based on regulatory minimal cut sets

Motivation: Stoichiometric and constraint-based methods of computational strain design have become an important tool for rational metabolic engineering. One of those relies on the concept of constrained minimal cut sets (cMCSs). However, like most other techniques, cMCSs may consider only reaction (or gene) knockouts to achieve a desired phenotype.

Results: We generalize the cMCSs approach to constrained regulatory MCSs (cRegMCSs), where up/downregulation of reaction rates can be combined along with reaction deletions. We show that flux up/downregulations can virtually be treated as cuts allowing their direct integration into the algorithmic framework of cMCSs. Because of vastly enlarged search spaces in genome-scale networks, we developed strategies to (optionally) preselect suitable candidates for flux regulation and novel algorithmic techniques to further enhance efficiency and speed of cMCSs calculation. We illustrate the cRegMCSs approach by a simple example network and apply it then by identifying strain designs for ethanol production in a genome-scale metabolic model of Escherichia coli. The results clearly show that cRegMCSs combining reaction deletions and flux regulations provide a much larger number of suitable strain designs, many of which are significantly smaller relative to cMCSs involving only knockouts. Furthermore, with cRegMCSs, one may also enable the fine tuning of desired behaviours in a narrower range. The new cRegMCSs approach may thus accelerate the implementation of model-based strain designs for the bio-based production of fuels and chemicals.
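
A toy example may help with the cut-set idea. In the sketch below, each pathway (mode) is a set of reactions, and a constrained cut set is a set of knockouts that disables every undesired mode while leaving at least one desired mode intact. This brute-force enumeration reflects our reading of the cMCS concept, not the authors' algorithm; the reaction names are hypothetical, and the regulatory extension (cRegMCSs) is not shown.

```python
from itertools import combinations

def constrained_mcs(target_modes, desired_modes, max_size=3):
    """Enumerate minimal knockout sets that hit all target modes
    while keeping at least one desired mode untouched."""
    reactions = sorted(set().union(*target_modes))
    found = []
    for size in range(1, max_size + 1):
        for cut in map(frozenset, combinations(reactions, size)):
            if any(prev <= cut for prev in found):
                continue  # a subset already works, so this cut is not minimal
            hits_all_targets = all(cut & mode for mode in target_modes)
            keeps_a_desired = any(not (cut & mode) for mode in desired_modes)
            if hits_all_targets and keeps_a_desired:
                found.append(cut)
    return found

# Hypothetical network: two byproduct routes to block, one product route to keep.
targets = [{"uptake", "byprod_A"}, {"uptake", "byprod_B"}]
desired = [{"uptake", "product"}]
cuts = constrained_mcs(targets, desired)
```

Knocking out `uptake` alone would be the smallest way to stop both byproduct routes, but it also destroys the product route, so the only constrained solution is to remove `byprod_A` and `byprod_B` together. Exhaustive enumeration like this is only feasible for tiny networks, which is why dedicated algorithms are needed at genome scale.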

Availability and implementation: MATLAB code and the examples can be downloaded at http://www.mpi-magdeburg.mpg.de/projects/cna/etcdownloads.html.

Contact: krishna.mahadevan@utoronto.ca or klamt@mpi-magdeburg.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

More challenges for machine-learning protein interactions

Motivation: Machine learning may be the most popular computational tool in molecular biology. Providing sustained performance estimates is challenging. The standard cross-validation protocols usually fail in biology. Park and Marcotte found that even refined protocols fail for protein–protein interactions (PPIs).

Results: Here, we sketch additional problems for the prediction of PPIs from sequence alone. First, it matters not only whether proteins A or B of a target interaction A–B are similar to proteins of training interactions (positives), but also whether A or B are similar to proteins of non-interactions (negatives). Second, training on multiple interaction partners per protein did not improve performance for new proteins (not used to train). On the contrary, a strictly non-redundant training that ignored good data slightly improved the prediction of difficult cases. Third, which prediction method appears to be best crucially depends on the sequence similarity between the test and the training set, how many true interactions should be found and the expected ratio of negatives to positives. The correct assessment of performance is the most complicated task in the development of prediction methods. Our analyses suggest that PPIs square the challenge for this task.
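
The first point can be made concrete by stratifying test pairs according to how many of their proteins already occur in the training positives and negatives. The sketch below uses exact identifier matches as a stand-in for sequence similarity, which is a simplification of ours; the protein identifiers and the function names are hypothetical.

```python
def proteins_in(pairs):
    """Set of all proteins occurring in a list of (A, B) pairs."""
    return {p for pair in pairs for p in pair}

def overlap_class(test_pair, train_pos, train_neg):
    """Count how many proteins of a test pair were seen in training
    positives and in training negatives (0, 1 or 2 each)."""
    seen_pos = proteins_in(train_pos)
    seen_neg = proteins_in(train_neg)
    a, b = test_pair
    return {
        "in_positives": (a in seen_pos) + (b in seen_pos),
        "in_negatives": (a in seen_neg) + (b in seen_neg),
    }

# Hypothetical training data.
train_pos = [("P1", "P2"), ("P3", "P4")]
train_neg = [("P1", "P5")]

seen_both_ways = overlap_class(("P1", "P6"), train_pos, train_neg)
unseen = overlap_class(("P7", "P8"), train_pos, train_neg)
```

Reporting performance separately for each combination of counts, rather than pooling all test pairs, shows whether a method generalizes to new proteins or mainly recognizes proteins it has already seen.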

Availability and implementation: Datasets used in our analyses are available at https://rostlab.org/owiki/index.php/PPI_challenges

Contact: rost@in.tum.de

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • SEQUENCE ANALYSIS