Discriminating precursors of common fragments for large-scale metabolite profiling by triple quadrupole mass spectrometry

Motivation: The goal of large-scale metabolite profiling is to compare the relative concentrations of as many metabolites extracted from biological samples as possible. This is typically accomplished by measuring the abundances of thousands of ions with high-resolution and high mass accuracy mass spectrometers. Although the data from these instruments provide a comprehensive fingerprint of each sample, identifying the structures of the thousands of detected ions is still challenging and time intensive. An alternative, less-comprehensive approach is to use triple quadrupole (QqQ) mass spectrometry to analyze predetermined sets of metabolites (typically fewer than several hundred). This is done using authentic standards to develop QqQ experiments that specifically detect only the targeted metabolites, with the advantage that the need for ion identification after profiling is eliminated.

Results: Here, we propose a framework to extend the application of QqQ mass spectrometers to large-scale metabolite profiling. We aim to provide a foundation for designing QqQ multiple reaction monitoring (MRM) experiments for each of the 82 696 metabolites in the METLIN metabolite database. First, we identify common fragmentation products from the experimental fragmentation data in METLIN. Then, we model the likelihoods of each precursor structure in METLIN producing each common fragmentation product. With these likelihood estimates, we select ensembles of common fragmentation products that minimize our uncertainty about metabolite identities. We demonstrate encouraging performance and, based on our results, we suggest how our method can be integrated with future work to develop large-scale MRM experiments.

Availability and implementation: Our predictions, Supplementary results, and the code for estimating likelihoods and selecting ensembles of fragmentation reactions are made available on the lab website at http://pattilab.wustl.edu/FragPred.

Contact: gjpattij@wustl.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Why Selection Might Be Stronger When Populations Are Small: Intron Size and Density Predict within and between-Species Usage of Exonic Splice Associated cis-Motifs

The nearly neutral theory predicts that small effective population size provides the conditions for weakened selection. This is postulated to explain why our genome is more "bloated" than that of, for example, yeast, ours having large introns and large intergene spacer. If a bloated genome is also an error-prone genome, might it, however, be the case that selection for error-mitigating properties is stronger in our genome? We examine this notion using splicing as an exemplar, not least because large introns can predispose to noisy splicing. We thus ask whether, owing to genomic decay, selection for splice error-control mechanisms is stronger, not weaker, in species with large introns and small populations. In humans much information defining splice sites is in cis-exonic motifs, most notably exonic splice enhancers (ESEs). These act as splice-error control elements. Here then we ask whether within- and between-species intron size is a predictor of the commonality of exonic cis-splicing motifs. We show that, as predicted, the proportion of synonymous sites that are ESE-associated and under selection in humans is weakly positively correlated with the size of the flanking intron. In a phylogenetically controlled framework, we observe, also as expected, that mean intron size is both predicted by Ne and is a good predictor of cis-motif usage across species, this usage coevolving with splice site definition. Unexpectedly, however, across taxa intron density is a better predictor of cis-motif usage than intron size. We propose that selection for splice-related motifs is driven by a need to avoid decoy splice sites that will be more common in genes with many and large introns. That intron number and density predict ESE usage within human genes is consistent with this, as is the finding of intragenic heterogeneity in ESE density. As intronic content and splice site usage across species is also well predicted by Ne, the result also suggests an unusual circumstance in which selection (for cis-modifiers of splicing) might be stronger when population sizes are smaller, as here splicing is noisier, resulting in a greater need to control error-prone splicing.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Ancient Origin and Recent Innovations of RNA Polymerase IV and V

Small RNA-mediated chromatin modification is a conserved feature of eukaryotes. In flowering plants, the short interfering (si)RNAs that direct transcriptional silencing are abundant and subfunctionalization has led to specialized machinery responsible for synthesis and action of these small RNAs. In particular, plants possess polymerase (Pol) IV and Pol V, multi-subunit homologs of the canonical DNA-dependent RNA Pol II, as well as specialized members of the RNA-dependent RNA Polymerase (RDR), Dicer-like (DCL), and Argonaute (AGO) families. Together these enzymes are required for production and activity of Pol IV-dependent (p4-)siRNAs, which trigger RNA-directed DNA methylation (RdDM) at homologous sequences. p4-siRNAs accumulate highly in developing endosperm, a specialized tissue found only in flowering plants, and are rare in nonflowering plants, suggesting that the evolution of flowers might coincide with the emergence of specialized RdDM machinery. Through comprehensive identification of RdDM genes from species representing the breadth of the land plant phylogeny, we describe the ancient origin of Pol IV and Pol V, suggesting that a nearly complete and functional RdDM pathway could have existed in the earliest land plants. We also uncover innovations in these enzymes that are coincident with the emergence of seed plants and flowering plants, and recent duplications that might indicate additional subfunctionalization. Phylogenetic analysis reveals rapid evolution of Pol IV and Pol V subunits relative to their Pol II counterparts and suggests that duplicates were retained and subfunctionalized through Escape from Adaptive Conflict. Evolution within the carboxy-terminal domain of the Pol V largest subunit is particularly striking, where illegitimate recombination facilitated extreme sequence divergence.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Computational modeling of the expansion of human cord blood CD133+ hematopoietic stem/progenitor cells with different cytokine combinations

Motivation: Many important problems in cell biology require dense non-linear interactions between functional modules to be considered. The importance of computer simulation in understanding cellular processes is now widely accepted, and a variety of simulation algorithms useful for studying certain subsystems have been designed. Expansion of hematopoietic stem and progenitor cells (HSC/HPC) in ex vivo culture with cytokines and small molecules is a method to increase the restricted numbers of stem cells found in umbilical cord blood (CB), while also enhancing the content of early engrafting neutrophil and platelet precursors. The efficacy of the expanded product depends on the composition of the cocktail of cytokines and small molecules used for culture. Testing the influence of a cytokine or small molecule on the expansion of HSC/HPC is a laborious and expensive process. We therefore developed a computational model based on cellular signaling interactions that predicts the influence of a cytokine on the survival, duplication and differentiation of the CD133+ HSC/HPC subset from human umbilical CB.

Results: We have used results from in vitro expansion cultures with different combinations of one or more cytokines to develop an ordinary differential equation model that includes the effect of cytokines on survival, duplication and differentiation of the CD133+ HSC/HPC. Comparing the results of in vitro and in silico experiments, we show that the model can predict the effect of a cytokine on the fold expansion and differentiation of CB CD133+ HSC/HPC after 8-day culture on a 3D scaffold.

Availability and implementation: The model is available at the following URL: http://www.francescopappalardo.net/Bioinformatics_CD133_Model.

Contact: francesco.pappalardo@unict.it or suzanne.watt@nhsbt.nhs.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

A novel motif-discovery algorithm to identify co-regulatory motifs in large transcription factor and microRNA co-regulatory networks in human

Motivation: Interplays between transcription factors (TFs) and microRNAs (miRNAs) in gene regulation are implicated in various physiological processes. It is thus important to identify biologically meaningful network motifs involving both types of regulators to understand the key co-regulatory mechanisms underlying the cellular identity and function. However, existing motif finders do not scale well for large networks and are not designed specifically for co-regulatory networks.

Results: In this study, we propose a novel algorithm, CoMoFinder, to accurately and efficiently identify composite network motifs in genome-scale co-regulatory networks. We define composite network motifs as network patterns involving at least one TF, one miRNA and one target gene that are statistically more significant than expected. Using two published disease-related co-regulatory networks, we show that CoMoFinder outperforms existing methods in both accuracy and robustness. We then applied CoMoFinder to a human TF-miRNA co-regulatory network derived from The Encyclopedia of DNA Elements project and identified 44 recurring composite network motifs of size 4. The functional analysis revealed that genes involved in the 44 motifs are enriched for a significantly higher number of biological processes or pathways compared with non-motifs. We further analyzed the identified composite bi-fan motif and showed that gene pairs involved in this motif structure tend to physically interact and are functionally more similar to each other than expected.

Availability and implementation: CoMoFinder is implemented in Java and available for download at http://www.cs.utoronto.ca/~yueli/como.html.

Contact: luojiawei@hnu.edu.cn or zhaolei.zhang@utoronto.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

SimSeq: a nonparametric approach to simulation of RNA-sequence datasets

Motivation: RNA sequencing analysis methods are often derived by relying on hypothetical parametric models for read counts that are not likely to be precisely satisfied in practice. Methods are often tested by analyzing data that have been simulated according to the assumed model. This testing strategy can result in an overly optimistic view of the performance of an RNA-seq analysis method.

Results: We develop a data-based simulation algorithm for RNA-seq data. The vector of read counts simulated for a given experimental unit has a joint distribution that closely matches the distribution of a source RNA-seq dataset provided by the user. We conduct simulation experiments based on the negative binomial distribution and our proposed nonparametric simulation algorithm. We compare performance between the two simulation experiments over a small subset of statistical methods for RNA-seq analysis available in the literature. We use as a benchmark the ability of a method to control the false discovery rate. Not surprisingly, methods based on parametric modeling assumptions seem to perform better with respect to false discovery rate control when data are simulated from parametric models rather than using our more realistic nonparametric simulation strategy.

Availability and implementation: The nonparametric simulation algorithm developed in this article is implemented in the R package SimSeq, which is freely available under the GNU General Public License (version 2 or later) from the Comprehensive R Archive Network (http://cran.rproject.org/).

Contact: sgbenidt@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

4DGenome: a comprehensive database of chromatin interactions

Motivation: The 3D structure of the genome plays a critical role in regulating gene expression. Recent progress in mapping technologies for chromatin interactions has led to a rapid increase in this kind of interaction data. This trend will continue as research in this burgeoning field intensifies.

Results: We describe the 4DGenome database that stores chromatin interaction data compiled through comprehensive literature curation. The database currently covers both low- and high-throughput assays, including 3C, 4C-Seq, 5C, Hi-C, ChIA-PET and Capture-C. To complement the set of interactions detected by experimental assays, we also include interactions predicted by a recently developed computational method with demonstrated high accuracy. The database currently contains ~8 million records, covering 102 cell/tissue types in five organisms. Records in the database are described using a standardized file format, facilitating data exchange. The vast majority of the interactions were assigned a confidence score. Using the web interface, users can query and download database records via a number of annotation dimensions. Query results can be visualized along with other genomics datasets via links to the UCSC genome browser. We anticipate that 4DGenome will be a valuable resource for investigating the spatial structure-and-function relationship of genomes.

Availability and Implementation: 4DGenome is freely accessible at http://4dgenome.int-med.uiowa.edu. The database and web interface are implemented in MySQL, Apache and JavaScript with all major browsers supported.

Contact: kai-tan@uiowa.edu

Supplementary Information: Supplementary Materials are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Two-group comparisons of zero-inflated intensity values: the choice of test statistic matters

Motivation: A special characteristic of data from molecular biology is the frequent occurrence of zero intensity values which can arise either by true absence of a compound or by a signal that is below a technical limit of detection.

Results: While so-called two-part tests compare mixture distributions between groups, one-part tests treat the zero-inflated distributions as left-censored. The left-inflated mixture model combines these two approaches. Both types of distributional assumptions and combinations of both are considered in a simulation study to compare power and estimation of log fold change. We discuss issues of application using an example from peptidomics.

The considered tests generally perform best in scenarios satisfying their respective distributional assumptions. In the absence of distributional assumptions, the two-part Wilcoxon test or the empirical likelihood ratio test is recommended. Assuming a log-normal subdistribution the left-inflated mixture model provides estimates for the proportions of the two considered types of zero intensities.
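
To make the idea of a two-part test concrete, the following is a minimal Python sketch of a two-part Wilcoxon statistic in the commonly used form: a squared z statistic for the difference in the proportion of zeros plus a squared rank-sum z statistic for the non-zero values, referred to a chi-square distribution. The function names, the normal approximation, and the absence of continuity corrections are assumptions made for illustration; the authors' R code may differ in these details.

```python
import math


def rank_sum_z(x, y):
    """Wilcoxon rank-sum z statistic for x versus y (midranks, tie-corrected variance)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2  # midrank of sorted positions i+1..j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return (w - n1 * (n + 1) / 2) / math.sqrt(var) if var > 0 else 0.0


def two_part_wilcoxon(a, b):
    """Return (statistic, p_value) for two groups of non-negative intensities."""
    n1, n2 = len(a), len(b)
    pos_a = [v for v in a if v > 0]
    pos_b = [v for v in b if v > 0]
    zeros_a, zeros_b = n1 - len(pos_a), n2 - len(pos_b)
    p0 = (zeros_a + zeros_b) / (n1 + n2)

    # Part 1: pooled two-sample z test on the proportion of zeros.
    z_zero, df = 0.0, 0
    if 0 < p0 < 1:
        diff = zeros_a / n1 - zeros_b / n2
        z_zero = diff / math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
        df += 1

    # Part 2: rank-sum test restricted to the non-zero values.
    z_pos = 0.0
    if pos_a and pos_b:
        z_pos = rank_sum_z(pos_a, pos_b)
        df += 1

    stat = z_zero ** 2 + z_pos ** 2
    if df == 2:
        p_value = math.exp(-stat / 2)  # chi-square survival function, 2 df
    elif df == 1:
        p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    else:
        p_value = 1.0
    return stat, p_value
```

Calling `two_part_wilcoxon(group_a, group_b)` on two lists of intensities returns the combined statistic and its approximate p-value; when only one part is defined (for example, no zeros in either group), the sketch falls back to a single degree of freedom.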

Availability: R code is available at http://cemsiis.meduniwien.ac.at/en/kb/science-research/software/

Contact: georg.heinze@meduniwien.ac.at

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

A Protocol for Diagnosing the Effect of Calibration Priors on Posterior Time Estimates: A Case Study for the Cambrian Explosion of Animal Phyla

We present a procedure to test the effect of calibration priors on estimated times, which applies a recently developed calibration-free method (RelTime) that produces relative divergence times for all nodes in the tree. We illustrate this protocol by applying it to a timetree of metazoan diversification (Erwin DH, Laflamme M, Tweedt SM, Sperling EA, Pisani D, Peterson KJ. 2011. The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science 334:1091–1097.), which placed the divergence of animal phyla close to the time of the Cambrian explosion inferred from the fossil record. These analyses revealed that the two maximum-only calibration priors in the pre-Cambrian are the primary determinants of the young divergence times among animal phyla in this study. In fact, these two maximum-only calibrations produce divergence times that severely violate minimum boundaries of almost all of the other 22 calibration constraints. The use of these 22 calibrations produces dates for metazoan divergences that are hundreds of millions of years earlier in the Proterozoic. Our results encourage the use of calibration-free approaches to identify the most influential calibration constraints and to evaluate their impact in order to achieve biologically robust interpretations.

  • Molecular Biology and Evolution
  • 10 years ago
  • Letter

High Recombinant Frequency in Extraintestinal Pathogenic Escherichia coli Strains

Homologous recombination promotes genetic diversity by facilitating the integration of foreign DNA and intrachromosomal gene shuffling. It has been hypothesized that if recombination is variable among strains, selection should favor higher recombination rates among pathogens, as they face additional selection pressures from host defenses. To test this hypothesis we have developed a plasmid-based method for estimating the rate of recombination independently of other factors such as DNA transfer, selective processes, and mutational interference. Our results with 160 human commensal and extraintestinal pathogenic Escherichia coli (ExPEC) isolates show that the recombinant frequencies are extremely diverse (ranging 9 orders of magnitude) and plastic (they are profoundly affected by growth in urine, a condition commonly encountered by ExPEC). We find that the frequency of recombination is biased by strain lifestyle, as ExPEC isolates display strikingly higher recombination rates than their commensal counterparts. Furthermore, the presence of virulence factors is positively associated with higher recombination frequencies. These results suggest selection for high homologous recombination capacity, which may result in a higher evolvability for pathogens compared with commensals.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Evolutionary Advantage Conferred by an Eukaryote-to-Eukaryote Gene Transfer Event in Wine Yeasts

Although an increasing number of horizontal gene transfers have been reported in eukaryotes, experimental evidence for their adaptive value is lacking. Here, we report the recent transfer of a 158-kb genomic region between Torulaspora microellipsoides and Saccharomyces cerevisiae wine yeasts or closely related strains. This genomic region has undergone several rearrangements in S. cerevisiae strains, including gene loss and gene conversion between two tandemly duplicated FOT genes encoding oligopeptide transporters. We show that FOT genes confer a strong competitive advantage during grape must fermentation by increasing the number and diversity of oligopeptides that yeast can utilize as a source of nitrogen, thereby improving biomass formation, fermentation efficiency, and cell viability. Thus, the acquisition of FOT genes has favored yeast adaptation to the nitrogen-limited wine fermentation environment. This finding indicates that anthropic environments offer substantial ecological opportunity for evolutionary diversification through gene exchange between distant yeast species.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

U6 snRNA Pseudogenes: Markers of Retrotransposition Dynamics in Mammals

Transposable elements comprise more than 45% of the human genome and long interspersed nuclear element 1 (LINE-1 or L1) is the only autonomous mobile element remaining active. Since its identification, it has been proposed that L1 contributes to the mobilization and amplification of other cellular RNAs and, more recently, experimental demonstrations of this function have been described for many transcripts such as Alu, a nonautonomous mobile element, cellular mRNAs, or small noncoding RNAs. Detailed examination of the mobilization of various cellular RNAs revealed distinct pathways by which they could be recruited during retrotransposition: template choice or template switching. Here, by analyzing genomic structures and retrotransposition signatures associated with small nuclear RNA (snRNA) sequences, we identified distinct recruiting steps during the L1 retrotransposition cycle for the formation of snRNA-processed pseudogenes. Interestingly, some of the identified recruiting steps take place in the nucleus. Moreover, after comparison to other vertebrate genomes, we established that snRNA amplification by template switching is common to many LINE families from several LINE clades. Finally, we suggest that U6 snRNA copies can serve as markers of L1 retrotransposition dynamics in mammalian genomes.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

The EMBL-EBI bioinformatics web and programmatic tools framework

Since 2009 the EMBL-EBI Job Dispatcher framework has provided free access to a range of mainstream sequence analysis applications. These include sequence similarity search services (https://www.ebi.ac.uk/Tools/sss/) such as BLAST, FASTA and PSI-Search, multiple sequence alignment tools (https://www.ebi.ac.uk/Tools/msa/) such as Clustal Omega, MAFFT and T-Coffee, and other sequence analysis tools (https://www.ebi.ac.uk/Tools/pfa/) such as InterProScan. Through these services users can search mainstream sequence databases such as ENA, UniProt and Ensembl Genomes, utilising a uniform web interface or systematically through Web Services interfaces (https://www.ebi.ac.uk/Tools/webservices/) using common programming languages, and obtain enriched results with novel visualisations. Integration with EBI Search (https://www.ebi.ac.uk/ebisearch/) and the dbfetch retrieval service (https://www.ebi.ac.uk/Tools/dbfetch/) further expands the usefulness of the framework. New tools and updates such as NCBI BLAST+, InterProScan 5 and PfamScan, new categories such as RNA analysis tools (https://www.ebi.ac.uk/Tools/rna/), new databases such as ENA non-coding, WormBase ParaSite, Pfam and Rfam, and new workflow methods, together with the retirement of deprecated services, ensure that the framework remains relevant to today's biological community.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server issue

eIF3 targets cell-proliferation messenger RNAs for translational activation or repression

Regulation of protein synthesis is fundamental for all aspects of eukaryotic biology by controlling development, homeostasis and stress responses. The 13-subunit, 800-kilodalton eukaryotic initiation factor 3 (eIF3) organizes initiation factor and ribosome interactions required for productive translation. However, current understanding of eIF3 function does not explain genetic evidence correlating eIF3 deregulation with tissue-specific cancers and developmental defects. Here we report the genome-wide discovery of human transcripts that interact with eIF3 using photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP). eIF3 binds to a highly specific program of messenger RNAs involved in cell growth control processes, including cell cycling, differentiation and apoptosis, via the mRNA 5′ untranslated region. Surprisingly, functional analysis of the interaction between eIF3 and two mRNAs encoding the cell proliferation regulators c-JUN and BTG1 reveals that eIF3 uses different modes of RNA stem–loop binding to exert either translational activation or repression. Our findings illuminate a new role for eIF3 in governing a specialized repertoire of gene expression and suggest that binding of eIF3 to specific mRNAs could be targeted to control carcinogenesis.

  • Nature 522, 7554 (2015)
  • 10 years ago
  • Letter

Efficient visualization of high-throughput targeted proteomics experiments: TAPIR

Motivation: Targeted mass spectrometry comprises a set of powerful methods to obtain accurate and consistent protein quantification in complex samples. To fully exploit these techniques, a cross-platform and open-source software stack based on standardized data exchange formats is required.

Results: We present TAPIR, a fast and efficient Python visualization software for chromatograms and peaks identified in targeted proteomics experiments. The input formats are open, community-driven standardized data formats (mzML for raw data storage and TraML encoding the hierarchical relationships between transitions, peptides and proteins). TAPIR is scalable to proteome-wide targeted proteomics studies (as enabled by SWATH-MS), allowing researchers to visualize high-throughput datasets. The framework integrates well with existing automated analysis pipelines and can be extended beyond targeted proteomics to other types of analyses.

Availability and implementation: TAPIR is available for all computing platforms under the 3-clause BSD license at https://github.com/msproteomicstools/msproteomicstools.

Contact: lars@imsb.biol.ethz.ch

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

Cardinal: an R package for statistical analysis of mass spectrometry-based imaging experiments

Cardinal is an R package for statistical analysis of mass spectrometry-based imaging (MSI) experiments of biological samples such as tissues. Cardinal supports both Matrix-Assisted Laser Desorption/Ionization (MALDI) and Desorption Electrospray Ionization-based MSI workflows, and experiments with multiple tissues and complex designs. The main analytical functionalities include (1) image segmentation, which partitions a tissue into regions of homogeneous chemical composition, selects the number of segments and the subset of informative ions, and characterizes the associated uncertainty and (2) image classification, which assigns locations on the tissue to pre-defined classes, selects the subset of informative ions, and estimates the resulting classification error by (cross-) validation. The statistical methods are based on mixture modeling and regularization.

Contact: o.vitek@neu.edu

Availability and implementation: The code, the documentation, and examples are available open-source at www.cardinalmsi.org under the Artistic-2.0 license. The package is available at www.bioconductor.org.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

L-GRAAL: Lagrangian graphlet-based network aligner

Motivation: Discovering and understanding patterns in networks of protein–protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding as they uncover important information, such as evolutionarily conserved pathways, protein complexes and functional orthologs. A few methods have been proposed for global PPI network alignments, but because of the NP-completeness of the underlying sub-graph isomorphism problem, producing topologically and biologically accurate alignments remains a challenge.

Results: We introduce a novel global network alignment tool, Lagrangian GRAphlet-based ALigner (L-GRAAL), which directly optimizes both the protein and the interaction functional conservations, using a novel alignment search heuristic based on integer programming and Lagrangian relaxation. We compare L-GRAAL with the state-of-the-art network aligners on the largest available PPI networks from BioGRID and observe that L-GRAAL uncovers the largest common sub-graphs between the networks, as measured by edge-correctness and symmetric sub-structure scores, which allow transferring more functional information across networks. We assess the biological quality of the protein mappings using the semantic similarity of their Gene Ontology annotations and observe that L-GRAAL best uncovers functionally conserved proteins. Furthermore, we introduce for the first time a measure of the semantic similarity of the mapped interactions and show that L-GRAAL also best uncovers functionally conserved interactions. In addition, we illustrate on the PPI networks of baker's yeast and human the ability of L-GRAAL to predict new PPIs. Finally, L-GRAAL's results are the first to show that topological information is more important than sequence information for uncovering functionally conserved interactions.
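
Edge correctness is commonly defined as the fraction of edges in the smaller network that an alignment maps onto edges of the larger network. The short Python sketch below computes that score under this standard definition; the function name and the toy networks are illustrative assumptions and are not taken from the L-GRAAL software.

```python
def edge_correctness(edges_small, edges_large, mapping):
    """Fraction of edges of the smaller network conserved by a node mapping.

    edges_small, edges_large: iterables of (u, v) pairs of undirected edges
    (simple graphs, no self-loops assumed).
    mapping: dict sending each node of the smaller network to a distinct
    node of the larger network (an injective alignment).
    """
    large = {frozenset(edge) for edge in edges_large}
    small = {frozenset(edge) for edge in edges_small}
    if not small:
        raise ValueError("the smaller network has no edges")
    # An edge is conserved when the images of its endpoints are linked
    # in the larger network.
    conserved = sum(
        1 for edge in small
        if frozenset(mapping[node] for node in edge) in large
    )
    return conserved / len(small)


# Toy example: a triangle aligned onto a path of four nodes.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
path = [(1, 2), (2, 3), (3, 4)]
alignment = {"a": 1, "b": 2, "c": 3}
score = edge_correctness(triangle, path, alignment)
```

In the toy example, two of the three triangle edges land on path edges, so the score is 2/3.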

Availability and implementation: L-GRAAL is coded in C++. Software is available at: http://bio-nets.doc.ic.ac.uk/L-GRAAL/.

Contact: n.malod-dognin@imperial.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

QSLiMFinder: improved short linear motif prediction using specific query protein data

Motivation: The sensitivity of de novo short linear motif (SLiM) prediction is limited by the number of patterns (the motif space) being assessed for enrichment. QSLiMFinder uses specific query protein information to restrict the motif space and thereby increase the sensitivity and specificity of predictions.

Results: QSLiMFinder was extensively benchmarked using known SLiM-containing proteins and simulated protein interaction datasets of real human proteins. Exploiting prior knowledge of a query protein likely to be involved in a SLiM-mediated interaction increased the proportion of true positives correctly returned and reduced the proportion of datasets returning a false positive prediction. The biggest improvement was seen if a short region of the query protein flanking the interaction site was known.

Availability and implementation: All the tools and data used in this study, including QSLiMFinder and the SLiMBench benchmarking software, are freely available under a GNU license as part of SLiMSuite, at: http://bioware.soton.ac.uk.

Contact: richard.edwards@unsw.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Characterizing spatial distributions of astrocytes in the mammalian retina

Motivation: In addition to being involved in retinal vascular growth, astrocytes play an important role in diseases and injuries, such as glaucomatous neuro-degeneration and retinal detachment. Studying astrocytes, their morphological cell characteristics and their spatial relationships to the surrounding vasculature in the retina may elucidate their role in these conditions.

Results: Our results show that in normal healthy retinas, the distribution of observed astrocyte cells does not follow a uniform distribution. The cells are significantly more densely packed around the blood vessels than a uniform distribution would predict. We also show that compared with the distribution of all cells, large cells are more dense in the vicinity of veins and toward the optic nerve head whereas smaller cells are often more dense in the vicinity of arteries. We hypothesize that since venous astrocytes are known to transport toxic metabolic waste away from neurons they may be more critical than arterial astrocytes and therefore require larger cell bodies to process waste more efficiently.

Availability and implementation: A 1/8th size down-sampled version of the seven retinal image mosaics described in this article can be found on BISQUE (Kvilekval et al., 2010) at http://bisque.ece.ucsb.edu/client_service/view?resource=http://bisque.ece.ucsb.edu/data_service/dataset/6566968.

Contact: arunaj@ece.ucsb.edu or manj@ece.ucsb.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Patterns of Reproductive Isolation in Eucalyptus--A Phylogenetic Perspective

We assess phylogenetic patterns of hybridization in the speciose, ecologically and economically important genus Eucalyptus, in order to better understand the evolution of reproductive isolation. Eucalyptus globulus pollen was applied to 99 eucalypt species, mainly from the large commercially important subgenus, Symphyomyrtus. In the 64 species that produce seeds, hybrid compatibility was assessed at two stages, hybrid-production (at approximately 1 month) and hybrid-survival (at 9 months), and compared with phylogenies based on 8,350 genome-wide DArT (diversity arrays technology) markers. Model fitting was used to assess the relationship between compatibility and genetic distance, and whether or not the strength of incompatibility "snowballs" with divergence. There was a decline in compatibility with increasing genetic distance between species. Hybridization was common within two closely related clades (one including E. globulus), but rare between E. globulus and species in two phylogenetically distant clades. Of three alternative models tested (linear, slowdown, and snowball), we found consistent support for a snowball model, indicating that the strength of incompatibility accelerates relative to genetic distance. Although we can only speculate about the genetic basis of this pattern, it is consistent with a Dobzhansky–Muller-model prediction that incompatibilities should snowball with divergence due to negative epistasis. Different rates of compatibility decline in the hybrid-production and hybrid-survival measures suggest that early-acting postmating barriers developed first and are stronger than later-acting barriers. We estimated that complete reproductive isolation can take up to 21–31 My in Eucalyptus. Practical implications for hybrid eucalypt breeding and genetic risk assessment in Australia are discussed.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Quantitative Description of a Protein Fitness Landscape Based on Molecular Features

Understanding the driving forces behind protein evolution requires the ability to correlate the molecular impact of mutations with organismal fitness. To address this issue, we use metallo-β-lactamases, Zn(II)-dependent enzymes that mediate antibiotic resistance, as a model system. We present a study of all the possible evolutionary pathways leading to a metallo-β-lactamase variant optimized by directed evolution. By studying the activity, stability and Zn(II)-binding capabilities of all mutants in the preferred evolutionary pathways, we show that this local fitness landscape is strongly conditioned by epistatic interactions arising from the pleiotropic effect of mutations on the different molecular features of the enzyme. Activity and stability assays in purified enzymes do not provide explanatory power. Instead, measurement of these molecular features in an environment resembling the native one provides an accurate description of the observed antibiotic resistance profile. We report that optimization of the Zn(II)-binding abilities of metallo-β-lactamases during evolution is more critical for enhancing fitness than stabilization of the protein. A global analysis of these parameters allows us to connect genotype with fitness based on quantitative biochemical and biophysical parameters.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

dbPSP: a curated database for protein phosphorylation sites in prokaryotes

As one of the most important post-translational modifications, phosphorylation is involved in almost all biological processes through the temporal and spatial modification of substrate proteins. Recently, phosphorylation in prokaryotes has attracted much attention for its critical roles in various cellular processes such as signal transduction. Thus, an integrative data resource for prokaryotic phosphorylation will be useful for further analysis. In this study, we present a curated database of phosphorylation sites in prokaryotes (dbPSP, Database URL: http://dbpsp.biocuckoo.org) for 96 prokaryotic organisms, which belong to 11 phyla in two domains, bacteria and archaea. From the scientific literature, we manually collected experimentally identified phosphorylation sites on seven types of residues: serine, threonine, tyrosine, aspartic acid, histidine, cysteine and arginine. In total, the dbPSP database contains 7391 phosphorylation sites in 3750 prokaryotic proteins. With this dataset, we analyzed the sequence preferences of the phosphorylation sites and the functional annotations of the phosphoproteins, and the results show obvious differences among phosphorylation in bacteria, archaea and eukaryotes. All the phosphorylation sites are annotated with original references and other descriptions in the database, which can be easily accessed through a user-friendly website interface with various search and browse options. Taken together, the dbPSP database provides a comprehensive data resource for further studies of protein phosphorylation in prokaryotes.

Database URL: http://dbpsp.biocuckoo.org

  • Database
  • 10 years ago
  • Original Article

Generating a focused view of disease ontology cancer terms for pan-cancer data integration and analysis

Bio-ontologies provide terminologies for the scientific community to describe biomedical entities in a standardized manner. There are multiple initiatives that are developing biomedical terminologies for the purpose of providing better annotation, data integration and mining capabilities. Terminology resources devised for multiple purposes inherently diverge in content and structure. A major issue of biomedical data integration is the development of overlapping terms, ambiguous classifications and inconsistencies represented across databases and publications. The disease ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to be a focused view of cancer terms within the DO. The DO cancer project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms represented by 63 top-level DO cancer terms. For example, the COSMIC term ‘kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma’ and TCGA term ‘Kidney renal clear cell carcinoma’ were both grouped to the term ‘Disease Ontology Identification (DOID):4467 / renal clear cell carcinoma’, which was mapped to the TopNodes_DOcancerslim term ‘DOID:263 / kidney cancer’. Mapping of diverse cancer terms to DO and the use of top-level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of the cancer term sources, where pan-cancer means including or relating to all or multiple types of cancer. The terms can be browsed from the DO web site (http://www.disease-ontology.org) and downloaded from the DO’s Apache Subversion or GitHub repositories.
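The two-step grouping described in the abstract (source term to DO term, then DO term to top-level slim term) can be illustrated with a small lookup. This is only a sketch: the dictionaries hold just the single kidney example quoted above, and the names `SOURCE_TO_DO`, `DO_TO_SLIM`, `to_slim` and `group_by_slim` are hypothetical and are not part of any DO tooling.

```python
# Illustrative sketch only; contains just the one example given in the abstract.
# All names here are hypothetical and not taken from the DO project's software.

# Step 1: (source, source term) -> DO term ID
SOURCE_TO_DO = {
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"): "DOID:4467",
    ("TCGA", "Kidney renal clear cell carcinoma"): "DOID:4467",
}

# Step 2: DO term ID -> top-level DO cancer slim term ID
DO_TO_SLIM = {"DOID:4467": "DOID:263"}

# Human-readable labels quoted in the abstract
DO_NAMES = {
    "DOID:4467": "renal clear cell carcinoma",
    "DOID:263": "kidney cancer",
}


def to_slim(source, term):
    """Return the top-level slim ID for a source term, or None if unmapped."""
    doid = SOURCE_TO_DO.get((source, term))
    if doid is None:
        return None
    return DO_TO_SLIM.get(doid)


def group_by_slim(records):
    """Group (source, term) records by their top-level slim ID."""
    groups = {}
    for source, term in records:
        groups.setdefault(to_slim(source, term), []).append((source, term))
    return groups


records = [
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"),
    ("TCGA", "Kidney renal clear cell carcinoma"),
]
for slim, members in group_by_slim(records).items():
    print(slim, DO_NAMES.get(slim), len(members))
```

Grouping records from different sources under one slim ID is what would allow datasets annotated with different vocabularies to be pooled for pan-cancer analysis.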

Database URL: http://www.disease-ontology.org

  • Database
  • 10 years ago
  • Database Tool