Dissecting the nascent human transcriptome by analysing the RNA content of transcription factories

While mapping total and poly-adenylated human transcriptomes has now become routine, characterizing nascent transcripts remains challenging, largely because nascent RNAs have such short half-lives. Here, we describe a simple, fast and cost-effective method to isolate RNA associated with transcription factories, the sites responsible for the majority of nuclear transcription. Following stimulation of human endothelial cells with the pro-inflammatory cytokine TNFα, we isolate and analyse the RNA content of factories by sequencing. Comparison with total, poly(A)+ and chromatin RNA fractions reveals that sequencing of purified factory RNA maps the complete nascent transcriptome; it is rich in intronic unprocessed transcript, as well as long intergenic non-coding (lincRNAs) and enhancer-associated RNAs (eRNAs), micro-RNA precursors and repeat-derived RNAs. Hence, we verify that transcription factories produce most nascent RNA and confer a regulatory role via their association with a set of specifically-retained non-coding transcripts.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

Babelomics 5.0: functional interpretation for new generations of genomic data

Babelomics has been running for more than a decade, offering a user-friendly interface for the functional analysis of gene expression and genomic data. Here we present its fifth release, which includes support for Next Generation Sequencing data, including gene expression (RNA-seq) and exome or genome resequencing. Babelomics has simplified its interface, which is now more intuitive. Improved visualization options, such as a genome viewer and an interactive network viewer, have been implemented. New technical enhancements on both the client and server sides make the user experience faster and more dynamic. Babelomics offers user-friendly access to a full range of methods that cover: (i) primary data analysis, (ii) a variety of tests for different experimental designs and (iii) different enrichment and network analysis algorithms for the interpretation of the results of such tests in the proper functional context. In addition to the public server, local copies of Babelomics can be downloaded and installed. Babelomics is freely available at: http://www.babelomics.org.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

MetaboAnalyst 3.0--making metabolomics more meaningful

MetaboAnalyst (www.metaboanalyst.ca) is a web server designed to permit comprehensive metabolomic data analysis, visualization and interpretation. It supports a wide range of complex statistical calculations and high quality graphical rendering functions that require significant computational resources. First introduced in 2009, MetaboAnalyst has experienced more than a 50X growth in user traffic (>50 000 jobs processed each month). In order to keep up with the rapidly increasing computational demands and a growing number of requests to support translational and systems biology applications, we performed a substantial rewrite and major feature upgrade of the server. The result is MetaboAnalyst 3.0. By completely re-implementing the MetaboAnalyst suite using the latest web framework technologies, we have been able to substantially improve its performance, capacity and user interactivity. Three new modules have also been added: (i) a module for biomarker analysis based on the calculation of receiver operating characteristic curves; (ii) a module for sample size estimation and power analysis for improved planning of metabolomics studies and (iii) a module to support integrative pathway analysis for both genes and metabolites. In addition, popular features found in existing modules have been significantly enhanced by upgrading the graphical output, expanding the compound libraries and adding support for more diverse organisms.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

The iceLogo web server and SOAP service for determining protein consensus sequences

The iceLogo web server and SOAP service implement the previously published iceLogo algorithm. iceLogo builds on probability theory to visualize protein consensus sequences in a format resembling sequence logos. Peptide sequences are compared against a reference sequence set that can be tailored to the studied system and the used protocol. As such, not only over- but also underrepresented residues can be visualized in a statistically sound manner, which further allows the user to easily analyse and interpret conserved sequence patterns in proteins. The web application and SOAP service can be found free and open to all users without the need for a login on http://iomics.ugent.be/icelogoserver/main.html.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

miRiadne: a web tool for consistent integration of miRNA nomenclature

miRBase is the official miRNA repository, which keeps the annotation updated on newly discovered miRNAs; it is also used as a reference for the design of miRNA profiling platforms. Nomenclature ambiguities generated by loosely updated platforms and design errors lead to incompatibilities among platforms, even from the same vendor. Published miRNA lists are thus generated with different profiling platforms that refer to diverse and outdated annotations. This greatly compromises searches, comparisons and analyses that rely on miRNA names only without taking into account the mature sequences, which is particularly critical when such analyses are carried out automatically. In this paper we introduce miRiadne, a web tool to harmonize miRNA nomenclature, which takes into account the original miRBase versions from 10 up to 21, and annotations of 25 common profiling platforms from nine brands that we manually curated. miRiadne uses the miRNA mature sequence to link miRBase versions and/or platforms to prevent nomenclature ambiguities. miRiadne was designed to simplify and support the work of biologists and bioinformaticians in re-annotating their own miRNA lists and/or data sets. As Ariadne helped Theseus in escaping the mythological maze, miRiadne will help the miRNA researcher in escaping the nomenclature maze. miRiadne is freely accessible from the URL http://www.miriadne.org.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

The BioMart community portal: an innovative alternative to large, centralized data repositories

The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

pyDockSAXS: protein-protein complex structure by SAXS and computational docking

Structural characterization of protein–protein interactions at molecular level is essential to understand biological processes and identify new therapeutic opportunities. However, atomic resolution structural techniques cannot keep pace with current advances in interactomics. Low-resolution structural techniques, such as small-angle X-ray scattering (SAXS), can be applied at larger scale, but they miss atomic details. For efficient application to protein–protein complexes, low-resolution information can be combined with theoretical methods that provide energetic description and atomic details of the interactions. Here we present the pyDockSAXS web server (http://life.bsc.es/pid/pydocksaxs) that provides an automatic pipeline for modeling the structure of a protein–protein complex from SAXS data. The method uses FTDOCK to generate rigid-body docking models that are subsequently evaluated by a combination of pyDock energy-based scoring function and their capacity to describe SAXS data. The only required input files are structural models for the interacting partners and a SAXS curve. The server automatically provides a series of structural models for the complex, sorted by the pyDockSAXS scoring function. The user can also upload a previously computed set of docking poses, which opens the possibility to filter the docking solutions by potential interface residues or symmetry restraints. The server is freely available to all users without restriction.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

Synthesis of 2'-Fluoro RNA by Syn5 RNA polymerase

The substitution of 2'-fluoro for 2'-hydroxyl moieties in RNA substantially improves the stability of RNA. RNA stability is a major issue in RNA research and applications involving RNA. We report that the RNA polymerase from the marine cyanophage Syn5 has intrinsically low discrimination against the incorporation of 2'-fluoro dNMPs during transcription elongation. The presence of both magnesium and manganese ions at high concentrations further reduces this discrimination without decreasing the efficiency of incorporation. We have constructed a Syn5 RNA polymerase in which tyrosine 564 is replaced with phenylalanine (Y564F) that further decreases the discrimination against 2'-fluoro-dNTPs during RNA synthesis. Sequence elements in DNA templates that affect the yield of RNA and incorporation of 2'-fluoro-dNMPs by Syn5 RNA polymerase have been identified.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution

Two draft sequences of Gossypium hirsutum, the most widely cultivated cotton species, provide insights into genome structure, genome rearrangement, gene evolution and cotton fiber biology.

  • Nature Biotechnology 33, 524 (2015)
  • 9 years ago
  • Research

Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement

Two draft sequences of Gossypium hirsutum, the most widely cultivated cotton species, provide insights into genome structure, genome rearrangement, gene evolution and cotton fiber biology.

  • Nature Biotechnology 33, 531 (2015)
  • 9 years ago
  • Research

Alternative 3′ UTRs act as scaffolds to regulate membrane protein localization

About half of human genes use alternative cleavage and polyadenylation (ApA) to generate messenger RNA transcripts that differ in the length of their 3′ untranslated regions (3′ UTRs) while producing the same protein. Here we show in human cell lines that alternative 3′ UTRs differentially regulate the localization of membrane proteins. The long 3′ UTR of CD47 enables efficient cell surface expression of CD47 protein, whereas the short 3′ UTR primarily localizes CD47 protein to the endoplasmic reticulum. CD47 protein localization occurs post-translationally and independently of RNA localization. In our model of 3′ UTR-dependent protein localization, the long 3′ UTR of CD47 acts as a scaffold to recruit a protein complex containing the RNA-binding protein HuR (also known as ELAVL1) and SET to the site of translation. This facilitates interaction of SET with the newly translated cytoplasmic domains of CD47 and results in subsequent translocation of CD47 to the plasma membrane via activated RAC1 (ref. 5). We also show that CD47 protein has different functions depending on whether it was generated by the short or long 3′ UTR isoforms. Thus, ApA contributes to the functional diversity of the proteome without changing the amino acid sequence. 3′ UTR-dependent protein localization has the potential to be a widespread trafficking mechanism for membrane proteins because HuR binds to thousands of mRNAs, and we show that the long 3′ UTRs of CD44, ITGA1 and TNFRSF13C, which are bound by HuR, increase surface protein expression compared to their corresponding short 3′ UTRs. We propose that during translation the scaffold function of 3′ UTRs facilitates binding of proteins to nascent proteins to direct their transport or function—and this role of 3′ UTRs can be regulated by ApA.

  • Nature
  • 9 years ago
  • Letter

Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells

Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections, and comprise nearly 8% of the human genome. The most recently acquired human ERV is HERVK(HML-2), which repeatedly infected the primate lineage both before and after the divergence of the human and chimpanzee common ancestor. Unlike most other human ERVs, HERVK retained multiple copies of intact open reading frames encoding retroviral proteins. However, HERVK is transcriptionally silenced by the host, with the exception of in certain pathological contexts such as germ-cell tumours, melanoma or human immunodeficiency virus (HIV) infection. Here we demonstrate that DNA hypomethylation at long terminal repeat elements representing the most recent genomic integrations, together with transactivation by OCT4 (also known as POU5F1), synergistically facilitate HERVK expression. Consequently, HERVK is transcribed during normal human embryogenesis, beginning with embryonic genome activation at the eight-cell stage, continuing through the emergence of epiblast cells in preimplantation blastocysts, and ceasing during human embryonic stem cell derivation from blastocyst outgrowths. Remarkably, we detected HERVK viral-like particles and Gag proteins in human blastocysts, indicating that early human development proceeds in the presence of retroviral products. We further show that overexpression of one such product, the HERVK accessory protein Rec, in a pluripotent cell line is sufficient to increase IFITM1 levels on the cell surface and inhibit viral infection, suggesting at least one mechanism through which HERVK can induce viral restriction pathways in early embryonic cells. Moreover, Rec directly binds a subset of cellular RNAs and modulates their ribosome occupancy, indicating that complex interactions between retroviral proteins and host factors can fine-tune pathways of early human development.

  • Nature
  • 9 years ago
  • Letter

A fast- and positively photoswitchable fluorescent protein for ultralow-laser-power RESOLFT nanoscopy

Kohinoor is a fast- and positively switching fluorescent protein with high photostability over many photoswitching cycles. Its improved photophysical properties enable nanoscopy with low phototoxicity and allow use of a simplified RESOLFT setup.

  • Nature Methods
  • 9 years ago
  • Brief Communication

Isotope-targeted glycoproteomics (IsoTaG): a mass-independent platform for intact N- and O-glycopeptide discovery and analysis

Metabolically labeling proteins with glycans that enable attachment of an isotopically encoded tag allows for the identification of N- and O- glycopeptides and their glycan structures.

  • Nature Methods
  • 9 years ago
  • Article

Single-cell, locus-specific bisulfite sequencing (SLBS) for direct detection of epimutations in DNA methylation patterns

Stochastic epigenetic changes drive biological processes, such as development, aging and disease. Yet, epigenetic information is typically collected from millions of cells, thereby precluding a more precise understanding of cell-to-cell variability and the pathogenic history of epimutations. Here we present a novel procedure for directly detecting epimutations in DNA methylation patterns using single-cell, locus-specific bisulfite sequencing (SLBS). We show that within gene promoter regions of mouse hepatocytes the epimutation rate is two orders of magnitude higher than the mutation rate.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

Three minimal sequences found in Ebola virus genomes and absent from human DNA

Motivation: Ebola virus causes high-mortality hemorrhagic fevers, with more than 25 000 cases and 10 000 deaths in the current outbreak. Only experimental therapies are available; thus, novel diagnostic tools and druggable targets are needed.

Results: Analysis of Ebola virus genomes from the current outbreak reveals the presence of short DNA sequences that appear nowhere in the human genome. We identify the shortest such sequences with lengths between 12 and 14. Only three absent sequences of length 12 exist and they consistently appear at the same location on two of the Ebola virus proteins, in all Ebola virus genomes, but nowhere in the human genome. The alignment-free method used is able to identify pathogen-specific signatures for quick and precise action against infectious agents, of which the current Ebola virus outbreak provides a compelling example.
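
The idea of sequences that are present in a pathogen genome but absent from a host genome can be illustrated with a small sketch. This is not the EAGLE implementation: the function names, the toy sequences and the brute-force set difference are assumptions made for illustration only, and a real analysis would also have to consider reverse complements and genome-scale data structures.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def absent_in_host(pathogen, host, k):
    """Return k-mers that occur in the pathogen but nowhere in the host."""
    return kmers(pathogen, k) - kmers(host, k)


def shortest_absent(pathogen, host, k_max):
    """Find the smallest k (up to k_max) with at least one absent k-mer.

    Returns (k, set_of_kmers), or (None, empty set) if none is found.
    """
    for k in range(1, k_max + 1):
        found = absent_in_host(pathogen, host, k)
        if found:
            return k, found
    return None, set()


# Toy sequences (hypothetical, far shorter than any real genome).
pathogen = "ACGTTGCA"
host = "ACGTACGTGCA"
k, words = shortest_absent(pathogen, host, 4)
print(k, sorted(words))
```

On these toy inputs every single nucleotide occurs in both sequences, so the search moves on to k = 2, where "TT" is the only pathogen 2-mer missing from the host.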

Availability and Implementation: EAGLE is freely available for non-commercial purposes at http://bioinformatics.ua.pt/software/eagle.

Contact: raquelsilva@ua.pt; pratas@ua.pt

Supplementary Information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • DISCOVERY NOTE

Outlier detection at the transcriptome-proteome interface

Background: In high-throughput experimental biology, it is widely acknowledged that while expression levels measured at the levels of transcriptome and the corresponding proteome do not, in general, correlate well, messenger RNA levels are used as convenient proxies for protein levels. Our interest is in developing data-driven computational models that can bridge the gap between these two levels of measurement at which different mechanisms of regulation may act on different molecular species causing any observed lack of correlations. To this end, we build data-driven predictors of protein levels using mRNA levels and known proxies of translation efficiencies as covariates. Previous work showed that in such a setting, outliers with respect to the model are reliable candidates for post-translational regulation.

Results: Here, we present and compare two novel formulations for deriving a protein concentration predictor from which outliers may be extracted in a systematic manner. The first approach, outlier rejecting regression, allows explicit specification of a certain fraction of the data as outliers. In a regression setting, this is a non-convex optimization problem which we solve by deriving a difference of convex functions algorithm (DCA). With post-translationally regulated proteins, one expects their concentrations to be affected primarily by disruption of protein stability. Our second algorithm exploits this observation by minimizing an asymmetric loss using quantile regression and extracts outlier proteins whose measured concentrations are lower than what a genome-wide regression would predict. We validate the two approaches on a dataset of yeast transcriptome and proteome. Functional annotation checks on detected outliers demonstrate that the methods are able to identify post-translationally regulated genes with high statistical confidence.
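
The asymmetric loss behind quantile regression can be sketched as follows. This is a generic pinball loss plus a simple rule that flags measurements falling well below a prediction; it is not the authors' code, and the function names, the `margin` parameter and the toy numbers are assumptions for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau in (0, 1).

    A residual r = y - p costs tau * r when the prediction is too low
    (r >= 0) and (1 - tau) * |r| when it is too high (r < 0).
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = y - p
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y_true)


def flag_low_outliers(y_true, y_pred, margin):
    """Return indices whose measured value is more than `margin` below the prediction."""
    return [i for i, (y, p) in enumerate(zip(y_true, y_pred)) if p - y > margin]


# Toy data (hypothetical): measured vs. predicted log protein levels.
measured = [5.0, 4.8, 2.0, 6.1]
predicted = [5.1, 4.7, 4.5, 6.0]
low = flag_low_outliers(measured, predicted, margin=1.0)
print(low, pinball_loss(measured, predicted, tau=0.5))
```

With `tau = 0.5` the loss is symmetric (half the mean absolute error); moving `tau` away from 0.5 penalizes over- and under-prediction differently, which is what makes the fitted line track a chosen quantile rather than the median.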

Contact: mn@ecs.soton.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, based on simultaneous alignment and folding, suffers from an extreme time complexity of $O(n^{6})$. Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity ($\geq$ quartic time).

Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics.

Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE.

Contact: backofen@informatik.uni-freiburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes

DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380 000 associations between >16 000 genes and 13 000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/

  • Database
  • 9 years ago
  • Database Tool

Using neighborhood cohesiveness to infer interactions between protein domains

Motivation: In recent years, large-scale studies have been undertaken to describe, at least partially, protein-protein interaction maps, or interactomes, for a number of relevant organisms, including human. However, current interactomes provide a somewhat limited picture of the molecular details involving protein interactions, mostly because essential experimental information, especially structural data, is lacking. Indeed, the gap between structural and interactomics information is widening and thus, for most interactions, key experimental information is missing. We elaborate on the observation that many interactions between proteins involve a pair of their constituent domains and, thus, the knowledge of how protein domains interact adds very significant information to any interactomic analysis.

Results: In this work, we describe a novel use of the neighborhood cohesiveness property to infer interactions between protein domains given a protein interaction network. We have shown that some clustering coefficients can be extended to measure a degree of cohesiveness between two sets of nodes within a network. Specifically, we used the meet/min coefficient to measure the proportion of interacting nodes between two sets of nodes and the fraction of common neighbors. This approach extends previous work where homolog coefficients were first defined around network nodes and later around edges. The proposed approach substantially increases both the number of predicted domain-domain interactions and their accuracy compared with current methods.
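
As a rough illustration of the meet/min coefficient, the sketch below computes it for two sets and applies it to the neighborhoods of two node sets in a small graph. It uses the standard definition |A ∩ B| / min(|A|, |B|); how the authors combine it with the proportion of interacting nodes is not reproduced here, and the helper names and toy network are assumptions.

```python
def meet_min(a, b):
    """Meet/min coefficient: |A ∩ B| / min(|A|, |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def neighbors(graph, nodes):
    """Union of the neighbors of every node in `nodes` (graph: node -> set of nodes)."""
    result = set()
    for node in nodes:
        result |= graph.get(node, set())
    return result


# Toy undirected interaction network (hypothetical protein identifiers).
graph = {
    "a": {"x", "y"},
    "b": {"y", "z"},
    "x": {"a"},
    "y": {"a", "b"},
    "z": {"b"},
}

# Fraction of common neighbors between the node sets {"a"} and {"b"}.
shared = meet_min(neighbors(graph, {"a"}), neighbors(graph, {"b"}))
print(shared)
```

Here each node set has two neighbors and they share one ("y"), so the coefficient is 0.5; normalizing by the smaller set keeps the score from being diluted when one neighborhood is much larger than the other.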

Availability and implementation: http://dimero.cnb.csic.es

Contact: jsegura@cnb.csic.es

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

DREAM: a webserver for the identification of editing sites in mature miRNAs using deep sequencing data

Summary: DREAM (detecting RNA editing associated with microRNAs) is a webserver for the identification of mature microRNA editing events using deep sequencing data. Raw microRNA sequencing reads can be provided as input; the reads are aligned against the genome, and custom scripts process the data, search for potential editing sites and assess the statistical significance of the findings. The output is a text file with the location and the statistical description of all the putative editing sites detected.

Availability and implementation: DREAM is freely available on the web at http://www.cs.tau.ac.il/~mirnaed/.

Contact: elieis@post.tau.ac.il

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

A statistical physics perspective on alignment-independent protein sequence comparison

Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly.

Results: Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from ‘first passage probability distribution’ to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach.

Contact: d.r.flower@aston.ac.uk

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival

Motivation: Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is increasingly common to have multiple omics data measured from each individual. Furthermore, there are rich functional data such as predicted impact of mutations on protein coding and gene/protein networks. However, integration of the complex information across the different omics and functional data is still challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures.

Results: An analysis pipeline is built for integrating genomic and transcriptomic alterations from whole-exome and RNA sequence data and functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of mutated potential driver genes found within and across patients. A driver-gene score (DGscore) is developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently mutated, with high or moderate mutational impact at protein level, exhibiting an extreme expression and functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline is applied to 60 matched tumor and normal samples of the same patient from The Cancer Genome Atlas breast-cancer project. In clinical validation, patients with high DGscores have worse survival than those with low scores (P = 0.001). Furthermore, the DGscore outperforms the established expression-based signatures MammaPrint and PAM50 in predicting patient survival. In conclusion, integration of mutation, expression and functional data allows identification of clinically relevant potential driver genes in cancer.

Availability and implementation: The documented pipeline including annotated sample scripts can be found in http://fafner.meb.ki.se/biostatwiki/driver-genes/.

Contact: yudi.pawitan@ki.se

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Identification of a small set of plasma signalling proteins using neural network for prediction of Alzheimer's disease

Motivation: Alzheimer’s disease (AD) is a dementia that gets worse with time resulting in loss of memory and cognitive functions. The life expectancy of AD patients following diagnosis is ~7 years. In 2006, researchers estimated that 0.40% of the world population (range 0.17–0.89%) was afflicted by AD, and that the prevalence rate would be tripled by 2050. Usually, examination of brain tissues is required for definite diagnosis of AD. So, it is crucial to diagnose AD at an early stage via some alternative methods. As the brain controls many functions via releasing signalling proteins through blood, we analyse blood plasma proteins for diagnosis of AD.

Results: Here, we use a radial basis function (RBF) network for feature selection, called the feature selection RBF (FSRBF) network, to select plasma proteins that can help in the diagnosis of AD. We have identified a set of plasma proteins, smaller than that of a previous study, with comparable prediction accuracy. We have also analysed mild cognitive impairment (MCI) samples with our selected proteins. We have used neural networks and support vector machines as classifiers. Principal component analysis, Sammon projection and a heat map of the selected proteins have been used to demonstrate the proteins’ discriminating power for diagnosis of AD. We have also found a set of plasma signalling proteins that can distinguish incipient AD from MCI at an early stage. A literature survey strongly supports the AD diagnosis capability of the selected plasma proteins.

Availability and implementation: The FSRBF code is available at https://sites.google.com/site/agarwalswapna/publications.

Contact: agarwal.swapna@gmail.com or swapna_r@isical.ac.in

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER