PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem

PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, developed and maintained by the US National Institutes of Health (NIH). PubChem contains more than 180 million depositor-provided chemical substance descriptions, 60 million unique chemical structures and 225 million bioactivity assay results, covering more than 9000 unique protein target sequences. As an information resource for the chemical biology research community, it routinely receives more than 1 million requests per day from what is estimated to be more than 1 million unique users per month. Programmatic access to this vast amount of data is provided by several different systems, including the US National Center for Biotechnology Information (NCBI)'s Entrez Utilities (E-Utilities or E-Utils) and the PubChem Power User Gateway (PUG), a common gateway interface (CGI) that exchanges data through Extensible Markup Language (XML). Further simplifying programmatic access, PubChem provides two additional general-purpose web services: PUG-SOAP, which uses the simple object access protocol (SOAP), and PUG-REST, which is a Representational State Transfer (REST)-style interface. These interfaces can be harnessed in combination to access the data contained in PubChem, which is integrated with the more than thirty databases available within the NCBI Entrez system.
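
As a rough illustration of the REST-style interface described above, the sketch below assembles a PUG-REST request URL from an input specification, an operation and an output format. The abstract does not spell out the URL layout; the `/rest/pug` base path, the `compound/name` input form and the `MolecularFormula` property are assumptions based on PubChem's public documentation, and `pug_rest_url` is a hypothetical helper. No network request is made.

```python
from urllib.parse import quote

# Assumed base path of the PUG-REST service (not given in the abstract).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(domain, namespace, identifier, operation, output="JSON"):
    """Assemble a PUG-REST style URL: <input spec>/<operation>/<output format>."""
    # Percent-encode the identifier so that names with spaces remain valid;
    # commas are kept because they separate multiple identifiers.
    safe_id = quote(str(identifier), safe=",")
    parts = [domain, namespace, safe_id, operation, output]
    return PUG_REST_BASE + "/" + "/".join(parts)


# Hypothetical query: molecular formula of the compound named "aspirin".
url = pug_rest_url("compound", "name", "aspirin", "property/MolecularFormula")
print(url)
```

The resulting string can be passed to any HTTP client; keeping URL construction separate from the request makes the input/operation/output structure easy to inspect.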

  • Nucleic Acids Research
  • 10 years ago
  • Web Server Issue

Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package.
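
The down-weighting idea can be illustrated with a small sketch. It assumes that a sample's variance factor is converted to a weight by taking its inverse and that this is multiplied by the observation-level weight; the abstract only says the two are "combined", so the multiplicative form, the function names and the numbers below are all illustrative assumptions rather than the limma implementation.

```python
def combine_weights(obs_weights, sample_var_factors):
    """Combine observation-level weights with inverse sample variance factors.

    Assumption: the combination is multiplicative (weight / variance factor).
    """
    return [w / v for w, v in zip(obs_weights, sample_var_factors)]


def weighted_mean(values, weights):
    """Weighted average of one gene's log-CPM values across samples."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Made-up log-CPM values for one gene; the fourth sample is noisy.
log_cpm = [5.0, 5.2, 4.9, 8.0]
obs_weights = [1.0, 1.0, 1.0, 1.0]      # placeholder observation-level weights
var_factors = [1.0, 1.0, 1.0, 4.0]      # fourth sample is four times as variable

combined = combine_weights(obs_weights, var_factors)
print(weighted_mean(log_cpm, obs_weights))  # every sample counts equally
print(weighted_mean(log_cpm, combined))     # noisy sample is down-weighted
```

The noisy sample still contributes to the estimate, but with a quarter of the influence, which is the compromise between discarding it and treating it like the others.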

  • Nucleic Acids Research
  • 10 years ago
  • Methods Online

i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly

i-cisTarget is a web tool to predict regulators of a set of genomic regions, such as ChIP-seq peaks or co-regulated/similar enhancers. i-cisTarget can also be used to identify upstream regulators and their target enhancers starting from a set of co-expressed genes. Whereas the original version of i-cisTarget was focused on Drosophila data, the 2015 update also provides support for human and mouse data. i-cisTarget detects transcription factor motifs (position weight matrices) and experimental data tracks (e.g. from ENCODE, Roadmap Epigenomics) that are enriched in the input set of regions. As experimental data tracks we include transcription factor ChIP-seq data, histone modification ChIP-seq data and open chromatin data. The underlying processing method is based on a ranking-and-recovery procedure, allowing accurate determination of enrichment across heterogeneous datasets, while also discriminating direct from indirect target regions through a ‘leading edge’ analysis. We illustrate i-cisTarget on various Ewing sarcoma datasets to identify EWS-FLI1 targets starting from ChIP-seq, differential ATAC-seq, differential H3K27ac and differential gene expression data. Use of i-cisTarget is free and open to all, and there is no login requirement. Address: http://gbiomed.kuleuven.be/apps/lcb/i-cisTarget.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server issue

PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server issue

NGL Viewer: a web application for molecular visualization

The NGL Viewer (http://proteinformatics.charite.de/ngl) is a web application for the visualization of macromolecular structures. By fully adopting capabilities of modern web browsers, such as WebGL, for molecular graphics, the viewer can interactively display large molecular complexes and is also unaffected by the retirement of third-party plug-ins like Flash and Java Applets. Generally, the web application offers comprehensive molecular visualization through a graphical user interface so that life scientists can easily access and profit from available structural data. It supports common structural file-formats (e.g. PDB, mmCIF) and a variety of molecular representations (e.g. ‘cartoon, spacefill, licorice’). Moreover, the viewer can be embedded in other web sites to provide specialized visualizations of entries in structural databases or results of structure-related calculations.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server issue

TET1 is controlled by pluripotency-associated factors in ESCs and downmodulated by PRC2 in differentiated cells and tissues

Ten-eleven translocation (Tet) genes encode a family of hydroxymethylase enzymes involved in regulating DNA methylation dynamics. Tet1 is highly expressed in mouse embryonic stem cells (ESCs), where it plays a critical role in pluripotency maintenance. Tet1 is also involved in cell reprogramming events and in cancer progression. Although the functional role of Tet1 has been extensively studied, its regulation is poorly understood. Here we show that the Tet1 gene is regulated, in both mouse and human ESCs, by the stemness-specific factors Oct3/4 and Nanog and by Myc. Thus Tet1 is integrated in the pluripotency transcriptional network of ESCs. We found that Tet1 is switched off by cell proliferation in adult cells and tissues, with a consequent genome-wide reduction of 5hmC, which is more evident in hypermethylated regions and promoters. Tet1 downmodulation is mediated by the Polycomb repressive complex 2 (PRC2) through H3K27me3 histone mark deposition. This study expands the knowledge about Tet1 involvement in stemness circuits in ESCs and provides evidence for a transcriptional relationship between Tet1 and PRC2 in adult proliferating cells, improving our understanding of the crosstalk between the epigenetic events mediated by these factors.

  • Nucleic Acids Research
  • 10 years ago
  • Gene regulation, Chromatin and Epigenetics

Follicle Online: an integrated database of follicle assembly, development and ovulation

Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and for curing the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we report a database, ‘Follicle Online’, that provides the experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (up to 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript, and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration.

Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php

  • Database
  • 10 years ago
  • Original Article

NeuroPep: a comprehensive resource of neuropeptides

Neuropeptides play a variety of roles in many physiological processes and serve as potential therapeutic targets for the treatment of some nervous-system disorders. In recent years, there has been a tremendous increase in the number of identified neuropeptides. Therefore, we have developed NeuroPep, a comprehensive resource of neuropeptides, which holds 5949 non-redundant neuropeptide entries originating from 493 organisms belonging to 65 neuropeptide families. In NeuroPep, the number of neuropeptides in invertebrates and vertebrates is 3455 and 2406, respectively. It is currently the most complete neuropeptide database. We extracted entries deposited in UniProt, the database (www.neuropeptides.nl) and NeuroPedia, and used text mining methods to retrieve entries from the MEDLINE abstracts and full text articles. All the entries in NeuroPep have been manually checked. Of the 5949 neuropeptide sequences, 2069 (35%) were collected from the scientific literature. Moreover, NeuroPep contains detailed annotations for each entry, including source organisms, tissue specificity, families, names, post-translational modifications, 3D structures (if available) and literature references. Information derived from these peptide sequences, such as amino acid compositions, isoelectric points, molecular weight and other physicochemical properties of peptides, is also provided. A quick search feature allows users to search the database with keywords such as sequence, name, family, etc., and an advanced search page helps users to combine queries with logical operators like AND/OR. In addition, user-friendly web tools like browsing, sequence alignment and mapping are also integrated into the NeuroPep database.

Database URL: http://isyslab.info/NeuroPep

  • Database
  • 10 years ago
  • Database Tool

novPTMenzy: a database for enzymes involved in novel post-translational modifications

With the recent discoveries of novel post-translational modifications (PTMs) which play important roles in signaling and biosynthetic pathways, identification of such PTM catalyzing enzymes by genome mining has been an area of major interest. In contrast to well-known PTMs such as phosphorylation, glycosylation and SUMOylation, no bioinformatics resources are available for enzymes associated with novel and unusual PTMs. Therefore, we have developed the novPTMenzy database, which catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. Based on a comprehensive analysis of the sequence and structural features of these known PTM catalyzing enzymes, we have created Hidden Markov Model profiles for the identification of similar PTM catalyzing enzymatic domains in genomic sequences. We have also created predictive rules for grouping them into functional subfamilies and deciphering their mechanistic details by structure-based analysis of their active site pockets. These analytical modules have been made available as user-friendly search interfaces of the novPTMenzy database. It also has a specialized analysis interface for some PTMs like AMPylation and Eliminylation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes.

Database URL: http://www.nii.ac.in/novptmenzy.html

  • Database
  • 10 years ago
  • Original Article

Stratifying tumour subtypes based on copy number alteration profiles using next-generation sequence data

Motivation: The role of personalized medicine and targeted treatment in the clinical management of cancer patients has become increasingly important in recent years. This has made the task of precise histological substratification of cancers crucial. Increasingly, genomic data are being seen as a valuable classifier. Specifically, copy number alteration (CNA) profiles generated by next-generation sequencing (NGS) can become a determinant for tumour subtyping. The principal purpose of this study is to devise a model with good prediction capability for the tumours' histological subtypes as a function of both the patients' covariates and their genome-wide CNA profiles from NGS data.

Results: We investigate a logistic regression for modelling tumour histological subtypes as a function of the patients’ covariates and their CNA profiles, in a mixed model framework. The covariates, such as age and gender, are considered as fixed predictors and the genome-wide CNA profiles are considered as random predictors. We illustrate the application of this model in lung and oral cancer datasets, and the results indicate that the tumour histological subtypes can be modelled with a good fit. Our cross-validation indicates that the logistic regression exhibits the best prediction relative to other classification methods we considered in this study. The model also exhibits the best agreement in the prediction between smooth-segmented and circular binary-segmented CNA profiles.

Availability and implementation: An R package to run a logistic regression is available at http://www1.maths.leeds.ac.uk/~arief/R/CNALR/.

Contact: a.gusnanto@leeds.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Regulatory network inferred using expression data of small sample size: application and validation in erythroid system

Motivation: Modeling regulatory networks using expression data observed in a differentiation process may help identify context-specific interactions. The outcome of the current algorithms highly depends on the quality and quantity of a single time-course dataset, and the performance may be compromised for datasets with a limited number of samples.

Results: In this work, we report a multi-layer graphical model that is capable of leveraging many publicly available time-course datasets, as well as a cell lineage-specific dataset with small sample size, to model regulatory networks specific to a differentiation process. First, a collection of network inference methods is used to predict the regulatory relationships in individual public datasets. Then, the inferred directional relationships are weighted and integrated together by evaluating against the cell lineage-specific dataset. To test the accuracy of this algorithm, we collected a time-course RNA-Seq dataset during human erythropoiesis to infer regulatory relationships specific to this differentiation process. The resulting erythroid-specific regulatory network reveals novel regulatory relationships activated in erythropoiesis, which were further validated by genome-wide TR4 binding studies using ChIP-seq. These erythropoiesis-specific regulatory relationships were not identifiable by single dataset-based methods or context-independent integrations. Analysis of the predicted targets reveals that they are all closely associated with hematopoietic lineage differentiation.

Availability and implementation: The predicted erythroid regulatory network is available at http://guanlab.ccmb.med.umich.edu/data/inferenceNetwork/.

Contact: gyuanfan@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Conformational sampling and structure prediction of multiple interacting loops in soluble and β-barrel membrane proteins using multi-loop distance-guided chain-growth Monte Carlo method

Motivation: Loops in proteins are often involved in biochemical functions. Their irregularity and flexibility make experimental structure determination and computational modeling challenging. Most current loop modeling methods focus on modeling single loops. In protein structure prediction, multiple loops often need to be modeled simultaneously. As interactions among loops in spatial proximity can be rather complex, sampling the conformations of multiple interacting loops is a challenging task.

Results: In this study, we report a new method called multi-loop Distance-guided Sequential chain-Growth Monte Carlo (M-DiSGro) for prediction of the conformations of multiple interacting loops in proteins. Our method achieves an average RMSD of 1.93 Å for lowest energy conformations of 36 pairs of interacting protein loops with the total length ranging from 12 to 24 residues. We further constructed a data set containing proteins with 2, 3 and 4 interacting loops. For the most challenging target proteins with four loops, the average RMSD of the lowest energy conformations is 2.35 Å. Our method is also tested for predicting multiple loops in β-barrel membrane proteins. For outer-membrane protein G, the lowest energy conformation has an RMSD of 2.62 Å for the three extracellular interacting loops with a total length of 34 residues (12, 12 and 10 residues in each loop).
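
The RMSD figures quoted above measure how far a predicted loop lies from the native one. A minimal sketch of the calculation follows, assuming the two coordinate sets list the same atoms in the same order and are already in a common reference frame; the abstract does not say which atoms are compared or how structures are superimposed, so no superposition step is included, and the coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)   # matching atoms
        for a, b in zip(p, q)                 # x, y, z components
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates in angstroms; the "predicted" loop is the native one
# shifted rigidly by the vector (3, 4, 0), so every atom deviates by 5.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.5)]
predicted = [(x + 3.0, y + 4.0, z) for x, y, z in native]

print(rmsd(native, native))
print(rmsd(native, predicted))
```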

Availability and implementation: The software is freely available at: tanto.bioe.uic.edu/m-DiSGro.

Contact: jinfeng@stat.fsu.edu or jliang@uic.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data

Motivation: Principal component analysis (PCA) is a basic tool often used in bioinformatics for visualization and dimension reduction. However, it is known that PCA may not consistently estimate the true direction of maximal variability in high-dimensional, low sample size settings, which are typical for molecular data. Assuming that the underlying signal is sparse, i.e. that only a fraction of features contribute to a principal component (PC), this estimation consistency can be retained. Most existing sparse PCA methods use L1-penalization, i.e. the lasso, to perform feature selection. However, the lasso is known to lack variable selection consistency in high dimensions, and therefore a subsequent interpretation of selected features can give misleading results.

Results: We present S4VDPCA, a sparse PCA method that incorporates a subsampling approach, namely stability selection. S4VDPCA can consistently select the truly relevant variables contributing to a sparse PC while also consistently estimating the direction of maximal variability. The performance of S4VDPCA is assessed in a simulation study and compared to other PCA approaches, as well as to a hypothetical oracle PCA that ‘knows’ the truly relevant features in advance and thus finds optimal, unbiased sparse PCs. S4VDPCA is computationally efficient and performs best in simulations regarding parameter estimation consistency and feature selection consistency. Furthermore, S4VDPCA is applied to a publicly available gene expression data set of medulloblastoma brain tumors. Features contributing to the first two estimated sparse PCs represent genes significantly over-represented in pathways typically deregulated between molecular subgroups of medulloblastoma.
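
The subsampling idea behind stability selection can be sketched generically: run a feature selector on many random subsamples and keep only the features chosen in a large fraction of them. The code below is not the S4VDPCA algorithm; the selector is a toy stand-in (top-k features by mean absolute value), and the function names, subsample size, threshold and data are illustrative assumptions.

```python
import random


def top_k_by_mean_abs(rows, k):
    """Toy selector (assumption): the k features with the largest mean |value|."""
    n_features = len(rows[0])
    scores = [sum(abs(r[j]) for r in rows) / len(rows) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def stability_selection(rows, selector, n_subsamples=50, threshold=0.8, seed=0):
    """Return features selected in at least `threshold` of the subsamples."""
    rng = random.Random(seed)
    half = max(1, len(rows) // 2)            # draw half of the samples each time
    counts = [0] * len(rows[0])
    for _ in range(n_subsamples):
        subsample = rng.sample(rows, half)   # sampling without replacement
        for j in selector(subsample):
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= threshold]
    return stable, freqs


# Made-up data: rows are samples, columns are features.
# Features 0 and 1 carry signal, features 2 and 3 are small noise.
data = [
    [5.0, -4.0, 0.1, 0.2],
    [4.5, 4.2, -0.3, 0.1],
    [-5.2, 3.9, 0.2, -0.2],
    [4.8, -4.1, 0.0, 0.3],
    [5.1, 4.0, -0.1, 0.1],
    [-4.9, -3.8, 0.3, 0.0],
]

stable, freqs = stability_selection(data, lambda rows: top_k_by_mean_abs(rows, 2))
print(stable, freqs)
```

Because the selector is passed in as a function, a penalized decomposition step could be substituted for the toy rule without changing the subsampling loop.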

Availability and implementation: Software is available at https://github.com/mwsill/s4vdpca.

Contact: m.sill@dkfz.de

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments

Motivation: With improvements in next-generation sequencing technologies and reductions in price, ordered RNA-seq experiments are becoming common. Of primary interest in these experiments is identifying genes that are changing over time or space, for example, and then characterizing the specific expression changes. A number of robust statistical methods are available to identify genes showing differential expression among multiple conditions, but most assume conditions are exchangeable and thereby sacrifice power and precision when applied to ordered data.

Results: We propose an empirical Bayes mixture modeling approach called EBSeq-HMM. In EBSeq-HMM, an auto-regressive hidden Markov model is implemented to accommodate dependence in gene expression across ordered conditions. As demonstrated in simulation and case studies, the output proves useful in identifying differentially expressed genes and in specifying gene-specific expression paths. EBSeq-HMM may also be used for inference regarding isoform expression.
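
To illustrate how a hidden Markov model can assign an expression path across ordered conditions, the sketch below runs generic Viterbi decoding over two hidden states, "Up" and "Down", given the observed direction of change between consecutive conditions. It is not the EBSeq-HMM implementation: the state names, the observation encoding and every probability are invented for illustration, whereas the actual method estimates its quantities from the data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden state sequence for a discrete-emission HMM."""
    if not observations:
        raise ValueError("at least one observation is required")
    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative parameters (assumptions, not estimates from any dataset).
states = ["Up", "Down"]
start_p = {"Up": 0.5, "Down": 0.5}
trans_p = {"Up": {"Up": 0.8, "Down": 0.2}, "Down": {"Up": 0.2, "Down": 0.8}}
emit_p = {"Up": {"+": 0.9, "-": 0.1}, "Down": {"+": 0.1, "-": 0.9}}

# Observed sign of the expression change between consecutive conditions.
changes = ["+", "+", "-", "-"]
path = viterbi(changes, states, start_p, trans_p, emit_p)
print(path)
```

The transition probabilities favour staying in the same state, which is one simple way to encode dependence between neighbouring conditions.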

Availability and implementation: An R package containing examples and sample datasets is available at Bioconductor.

Contact: kendzior@biostat.wisc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability

Despite intense work, incorporating constraints on protein native structures into the mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter that we fix by maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models obtained, taking into account misfolded conformations, yield larger likelihood than models that only consider the native state, because their average hydrophobicity is more realistic, and they produce on average stable sequences for most proteins. We evaluated the mean-field model with respect to empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented smaller Kullback–Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family.
We found that the mean-field model performs better than other structurally constrained models with similar or higher complexity. With respect to the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented in the computer program Prot_Evol, which is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
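
The comparison of observed site profiles with model distributions rests on the Kullback–Leibler divergence, which can be sketched as follows. The direction of the divergence (observed profile relative to model) is an assumption read from the wording of the abstract, and the three-letter alphabet and all frequencies are made up.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever
    p_i is positive, otherwise the divergence is undefined.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up frequencies over a toy three-letter alphabet at one site.
observed = [0.5, 0.5, 0.0]
model_a = [0.45, 0.45, 0.10]   # hypothetical site-specific model
model_b = [0.25, 0.25, 0.50]   # hypothetical site-independent model

print(kl_divergence(observed, model_a))
print(kl_divergence(observed, model_b))
```

A smaller value means the model distribution is closer to the observed profile, which is the sense in which the abstract ranks the two kinds of model.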

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

A Phylogenetic Analysis of 34 Chloroplast Genomes Elucidates the Relationships between Wild and Domestic Species within the Genus Citrus

The genus Citrus includes some of the most important cultivated fruit trees worldwide. Although the genus has been extensively studied because of its commercial relevance, the origin of cultivated citrus species and the history of their domestication remain open questions. Here, we present a phylogenetic analysis of the chloroplast genomes of 34 citrus genotypes, which constitutes the most comprehensive and detailed study to date on the evolution and variability of the genus Citrus. A statistical model was used to estimate divergence times between the major citrus groups. Additionally, a complete map of the variability across the genome of different citrus species was produced, including single nucleotide variants, heteroplasmic positions, indels (insertions and deletions), and large structural variants. The distribution of all these variants provided further independent support to the phylogeny obtained. An unexpected finding was the high level of heteroplasmy found in several of the analyzed genomes. The use of the complete chloroplast DNA not only paves the way for a better understanding of the phylogenetic relationships within the Citrus genus but also provides original insights into other elusive evolutionary processes, such as chloroplast inheritance, heteroplasmy, and gene selection.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics

Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation.

Availability and implementation: The http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure.

W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB).

Contact: contact@workflow4metabolomics.org

  • Bioinformatics
  • 10 years ago
  • GENE EXPRESSION

RNA-Rocket: an RNA-Seq analysis resource for infectious disease research

Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers.

Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data.

Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal.

Contact: anwarren@vt.edu

Supplementary information: Supplementary materials are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • GENE EXPRESSION

The Cyni framework for network inference in Cytoscape

Motivation: Research on methods for the inference of networks from biological data is making significant advances, but the adoption of network inference in biomedical research practice is lagging behind. Here, we present Cyni, an open-source ‘fill-in-the-algorithm’ framework that provides common network inference functionality and user interface elements. Cyni allows the rapid transformation of Java-based network inference prototypes into apps of the popular open-source Cytoscape network analysis and visualization ecosystem. Merely placing the resulting app in the Cytoscape App Store makes the method accessible to a worldwide community of biomedical researchers by mouse click. In a case study, we illustrate the transformation of an ARACNE implementation into a Cytoscape app.

Availability and implementation: Cyni, its apps, user guides, documentation and sample code are available from the Cytoscape App Store http://apps.cytoscape.org/apps/cynitoolbox

Contact: benno.schwikowski@pasteur.fr

  • Bioinformatics
  • 10 years ago
  • SYSTEMS BIOLOGY

mirPub: a database for searching microRNA publications

Summary: Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications.

Availability and Implementation: mirPub is freely available at http://www.microrna.gr/mirpub/.

Contact: vergoulis@imis.athena-innovation.gr or dalamag@imis.athena-innovation.gr

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • DATABASES AND ONTOLOGIES

diXa: a data infrastructure for chemical safety assessment

Motivation: The field of toxicogenomics (the application of ‘-omics’ technologies to risk assessment of compound toxicities) has expanded in the last decade, partly driven by new legislation aimed at reducing animal testing in chemical risk assessment, but mainly as a result of a paradigm change in toxicology towards the use and integration of genome-wide data. Many research groups worldwide have generated large amounts of such toxicogenomics data. However, there is no centralized repository for archiving these data and making them, together with the associated analysis tools, easily available.

Results: The Data Infrastructure for Chemical Safety Assessment (diXa) is a robust and sustainable infrastructure storing toxicogenomics data. A central data warehouse is connected to a portal with links to chemical information and molecular and phenotype data. diXa is publicly available through a user-friendly web interface. New data can be readily deposited into diXa using guidelines and templates available online. Analysis descriptions and tools for interrogating the data are available via the diXa portal.

Availability and implementation: http://www.dixa-fp7.eu

Contact: d.hendrickx@maastrichtuniversity.nl; info@dixa-fp7.eu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • DATABASES AND ONTOLOGIES

MACE: mutation-oriented profiling of chemical response and gene expression in cancers

Summary: The mutational status of specific cancer lineages can affect the sensitivity to or resistance against cancer drugs. The MACE database provides web-based interactive tools for interpreting large chemical screening and gene expression datasets of cancer cell lines in terms of mutation and lineage categories. GI50 data of chemicals against individual NCI60 cell lines were normalized and organized to statistically identify mutation- or lineage-specific chemical responses. Similarly, DNA microarray data on NCI60 cell lines were processed to analyze mutation- or lineage-specific gene expression signatures. A combined analysis of GI50 and gene expression data to find potential associations between chemicals and genes is also a capability of this system. This database will provide extensive, systematic information to identify lineage- or mutation-specific anticancer agents and related gene targets.

Availability and implementation: The MACE web database is available at http://mace.sookmyung.ac.kr/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: yoonsj@sookmyung.ac.kr

  • Bioinformatics
  • 10 years ago
  • DATABASES AND ONTOLOGIES

The QDREC web server: determining dose-response characteristics of complex macroparasites in phenotypic drug screens

Summary: Neglected tropical diseases (NTDs) caused by helminths constitute some of the most common infections of the world’s poorest people. The etiological agents are complex and recalcitrant to standard techniques of molecular biology. Drug screening against helminths has often been phenotypic and typically involves manual description of drug effect and efficacy. A key challenge is to develop automated, quantitative approaches to drug screening against helminth diseases. The quantal dose–response calculator (QDREC) constitutes a significant step in this direction. It can be used to automatically determine quantitative dose–response characteristics and half-maximal effective concentration (EC50) values using image-based readouts from phenotypic screens, thereby allowing rigorous comparisons of the efficacies of drug compounds. QDREC has been developed and validated in the context of drug screening for schistosomiasis, one of the most important NTDs. However, it is equally applicable to general phenotypic screening involving helminths and other complex parasites.

Availability and implementation: QDREC is publicly available at: http://haddock4.sfsu.edu/qdrec2/. Source code and datasets are at: http://tintin.sfsu.edu/projects/phenotypicAssays.html.

Contact: rahul@sfsu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • BIOIMAGE INFORMATICS