Alternative splicing and co-option of transposable elements: the case of TMPO/LAP2α and ZNF451 in mammals

Summary: Transposable elements constitute a large fraction of vertebrate genomes and, during evolution, may be co-opted for new functions. Exonization of transposable elements inserted within or close to host genes is one possible way to generate new genes, and alternative splicing of the new exons may represent an intermediate step in this process. The genes TMPO and ZNF451 are present in all vertebrate lineages. Although they are not evolutionarily related, mammalian TMPO and ZNF451 do have something in common—they both code for splice isoforms that contain LAP2alpha domains. We found that these LAP2alpha domains have sequence similarity to repetitive sequences in non-mammalian genomes, which are in turn related to the first ORF from a DIRS1-like retrotransposon. This retrotransposon domestication happened separately and resulted in proteins that combine retrotransposon and host protein domains. The alternative splicing of the retrotransposed sequence allowed the production of both the new and the untouched original isoforms, which may have contributed to the success of the colonization process. The LAP2alpha-specific isoform of TMPO (LAP2α) has been co-opted for important roles in the cell, whereas the ZNF451 LAP2alpha isoform is evolving under strong purifying selection but remains uncharacterized.

Contact: mtress@cnio.es or valencia@cnio.es

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • DISCOVERY NOTE

Mining viral proteins for antimicrobial and cell-penetrating drug delivery peptides

Motivation: The need for more effective and safer pharmaceuticals is persistent. Microbial adaptation, for instance, creates a continual need to develop new antimicrobials such as antimicrobial peptides (AMPs). Similarly, intracellular delivery of drugs is still a challenge, and translocation across membranes for drug delivery is an area of intense research. Peptides can be used both as AMP drug leads and as drug carrier systems for intracellular delivery. Multifunctional proteins are abundant in viruses but, surprisingly, have never been thoroughly screened for bioactive peptide sequences.

Results: Using the AMPA and CellPPD online tools, we evaluated the propensity of viral proteins to contain AMPs or cell-penetrating peptides (CPPs). Capsid proteins from both enveloped and non-enveloped viruses, and membrane and envelope proteins from enveloped viruses, totalling 272 proteins from 133 viruses, were screened for potential AMP and CPP sequences. A pool of 2444 CPP and 426 AMP sequences was discovered. The capsids of flaviviruses are the best sources of these peptides, reaching more than 80% CPP sequence coverage per protein. Experimental tests of selected sequences validated the results. Overall, this study reveals that viruses form a natural multivalent biotechnological platform that is still underexplored in drug discovery, and the heterogeneous abundance of CPP/AMP sequences among viral families opens new avenues in viral biology research.

Contacts: aveiga@medicina.ulisboa.pt or macastanho@medicina.ulisboa.pt

Supplementary information: supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • DISCOVERY NOTE

LCR-eXXXplorer: a web platform to search, visualize and share data for low complexity regions in protein sequences

Motivation: Local compositionally biased and low complexity regions (LCRs) in amino acid sequences initially attracted the interest of researchers because they generate artifacts in sequence database searches. There is accumulating evidence of the biological significance of LCRs in both physiological and pathological situations. Nonetheless, LCR-related algorithms and tools have not gained wide appreciation across the research community, partly because only a handful of user-friendly software tools are currently freely available.

Results: We developed LCR-eXXXplorer, an extensible online platform attempting to fill this gap. LCR-eXXXplorer offers tools for displaying LCRs from the UniProt/SwissProt knowledgebase, in combination with other relevant protein features, predicted or experimentally verified. Moreover, users may perform powerful queries against a custom designed sequence/LCR-centric database. We anticipate that LCR-eXXXplorer will be a useful starting point in research efforts for the elucidation of the structure, function and evolution of proteins with LCRs.

Availability and implementation: LCR-eXXXplorer is freely available at http://repeat.biol.ucy.ac.cy/lcr-exxxplorer.

Contact: vprobon@ucy.ac.cy

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

High-throughput assay and engineering of self-cleaving ribozymes by sequencing

Self-cleaving ribozymes are found in all domains of life and are believed to play important roles in biology. Additionally, self-cleaving ribozymes have been the subject of extensive engineering efforts for applications in synthetic biology. These studies often involve laborious assays of multiple individual variants that are either designed rationally or discovered through selection or screening. However, these assays provide only a limited view of the large sequence space relevant to the ribozyme function. Here, we report a strategy that allows quantitative characterization of greater than 1000 ribozyme variants in a single experiment. We generated a library of predefined ribozyme variants that were converted to DNA and analyzed by high-throughput sequencing. By counting the number of cleaved and uncleaved reads of every variant in the library, we obtained a complete activity profile of the ribozyme pool which was used to both analyze and engineer allosteric ribozymes.
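The counting step described above can be illustrated with a minimal Python sketch. It assumes that each sequencing read has already been assigned to a variant and labelled as cleaved or uncleaved; the function name `fraction_cleaved` and the input layout are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fraction_cleaved(reads):
    """Return the cleaved fraction per variant.

    `reads` is an iterable of (variant_id, is_cleaved) pairs, one per read.
    This input layout is an assumption made for illustration.
    """
    cleaved = Counter()
    total = Counter()
    for variant_id, is_cleaved in reads:
        total[variant_id] += 1
        if is_cleaved:
            cleaved[variant_id] += 1
    # Every variant in `total` has at least one read, so no division by zero.
    return {variant: cleaved[variant] / total[variant] for variant in total}


# Toy data: variant v1 has 2 cleaved reads out of 3, v2 has 1 out of 4.
reads = [
    ("v1", True), ("v1", True), ("v1", False),
    ("v2", False), ("v2", False), ("v2", True), ("v2", False),
]
activity = fraction_cleaved(reads)
```

In a real experiment the read counts per variant would also determine how much confidence each fraction deserves, which this sketch does not model.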

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

A new method to improve network topological similarity search: applied to fold recognition

Motivation: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of similarity search has increased steadily in recent years, a high percentage of sequences in the protein universe remain uncharacterized. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution is based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests instead that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations of existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large scale similarity searches in bioinformatics.

Results: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network.

Availability and implementation: Source code is freely available upon request.

Contact: lxie@iscb.org

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

GlycanAnalysis Plug-in: a database search tool for N-glycan structures using mass spectrometry

Summary: Tandem mass spectrometry (MS/MS or MSn) is a potent technique for characterizing N-glycan structures. GlycanAnalysis searches a glycan database to support the identification of glycan structures from MS/MS spectra. It also calculates diagnostic ions of glycan structures registered in a glycan database (GlycomeDB or KEGG GLYCAN) and searches for MS/MS spectra of N-glycans that match diagnostic ions to determine the structures. This program functions as a plug-in for Mass++, a freeware mass spectrum visualization and analysis program.

Availability and implementation: The executable files of Mass++ are available for free at http://www.first-ms3d.jp/english/. The GlycanAnalysis plug-in is included in the standard package of Mass++ for Windows.

Contact: k-morimt@shimadzu.co.jp or nishikaz@shimadzu.co.jp or acyshzw@shimadzu.co.jp

Supplementary information: Supplementary material is available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

specL--an R/Bioconductor package to prepare peptide spectrum matches for use in targeted proteomics

Motivation: Targeted data extraction methods are attractive ways to obtain quantitative peptide information from a proteomics experiment. Sequential Window Acquisition of all Theoretical Spectra (SWATH) and Data Independent Acquisition (DIA) methods increase the reproducibility of acquired data because the classical precursor selection is omitted and all precursors present are fragmented. However, especially for targeted data extraction, MS coordinates (retention time information, precursor and fragment masses) are required for the particular entities (peptide ions). If these coordinates are not available in public spectral library repositories, they are usually generated in a so-called discovery experiment earlier in the project. The quality of the assay panel is crucial to ensure appropriate downstream analysis. This requires a method to create spectral libraries and to export customizable assay panels.

Results: Here, we present a versatile set of functions to generate assay panels from spectral libraries for use in targeted data extraction methods (SWATH/DIA) in the area of proteomics.

Availability and implementation: specL is implemented in the R language and has been available under an open-source license (GPL-3) in Bioconductor since BioC 3.0 (R-3.1) at http://www.bioconductor.org (Trachsel et al., 2015). A vignette with a complete tutorial describing data import/export and analysis is included in the package and can also be found in the supplementary material of this article.

Contact: cp@fgcz.ethz.ch or jg@fgcz.ethz.ch

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

NxTrim: optimized trimming of Illumina mate pair reads

Motivation: Mate pair protocols add to the utility of paired-end sequencing by boosting the genomic distance spanned by each pair of reads, potentially allowing larger repeats to be bridged and resolved. The Illumina Nextera Mate Pair (NMP) protocol uses a circularization-based strategy that leaves behind 38-bp adapter sequences, which must be computationally removed from the data. While ‘adapter trimming’ is a well-studied area of bioinformatics, existing tools do not fully exploit the particular properties of NMP data and discard more data than is necessary.

Results: We present NxTrim, a tool that strives to discard as little sequence as possible from NMP reads. NxTrim makes full use of the sequence on both sides of the adapter site to build ‘virtual libraries’ of mate pairs, paired-end reads and single-ended reads. For bacterial data, we show that aggregating these datasets allows a single NMP library to yield an assembly whose quality compares favourably to that obtained from regular paired-end reads.

Availability and implementation: The source code is available at https://github.com/sequencing/NxTrim

Contact: acox@illumina.com

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

EpIC: a rational pipeline for epitope immunogenicity characterization

Summary: Efforts to develop peptide-based vaccines, in particular those requiring site-specific targeting of self-proteins, rely on the ability to optimize the immunogenicity of the peptide epitopes. Currently, screening of candidate vaccines is typically performed through low-throughput, high-cost animal trials. To improve on this we present the program EpIC, which enables high-throughput prediction of peptide immunogenicity based on the endogenous occurrence of B-cell epitopes within native protein sequences. This information informs rational selection of immunogenicity-optimized epitopes for peptide vaccines.

Availability and implementation: EpIC is available as a web server at http://saphire.usask.ca/saphire/epic.

Contact: kdm449@mail.usask.ca

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Deep sequencing analysis of viral infection and evolution allows rapid and detailed characterization of viral mutant spectrum

Motivation: The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations.

Results: Applying ViVan to deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical (detection of resistance mutations and effects of antiviral treatments) to the more theoretical (temporal characterization of the population in evolutionary studies).

Availability and implementation: Freely available on the web at http://www.vivanbioinfo.org

Contact: nshomron@post.tau.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

14-3-3-Pred: improved methods to predict 14-3-3-binding phosphopeptides

Motivation: The 14-3-3 family of phosphoprotein-binding proteins regulates many cellular processes by docking onto pairs of phosphorylated Ser and Thr residues in a constellation of intracellular targets. Therefore, there is a pressing need to develop new prediction methods that use an updated set of 14-3-3-binding motifs for the identification of new 14-3-3 targets and to prioritize the downstream analysis of >2000 potential interactors identified in high-throughput experiments.

Results: Here, a comprehensive set of 14-3-3-binding targets from the literature was used to develop 14-3-3-binding phosphosite predictors. Position-specific scoring matrix, support vector machine (SVM) and artificial neural network (ANN) classification methods were trained to discriminate experimentally determined 14-3-3-binding motifs from non-binding phosphopeptides. The ANN, position-specific scoring matrix and SVM methods showed the best performance for a motif window spanning from –6 to +4 around the binding phosphosite, achieving a Matthews correlation coefficient of up to 0.60. Blind prediction showed that all three methods outperform two popular 14-3-3-binding site predictors, Scansite and ELM. The new methods were used to predict 14-3-3-binding phosphosites in the human proteome. Experimental analysis of high-scoring predictions in the FAM122A and FAM122B proteins confirms the predictions and suggests that the new 14-3-3 predictors will be generally useful.
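Two ingredients named in the results, the –6 to +4 motif window and the Matthews correlation coefficient, can be sketched in Python as follows. The helper names `motif_window` and `matthews_corrcoef`, the 0-based site index and the `-` padding character are assumptions made for illustration; they are not the interface of 14-3-3-Pred.

```python
import math


def motif_window(sequence, site, left=6, right=4, pad="-"):
    """Return the residues from site-left to site+right around a 0-based site.

    Positions beyond the protein termini are filled with `pad`
    (a hypothetical convention chosen for this sketch).
    """
    padded = pad * left + sequence + pad * right
    # Original index i sits at padded index i + left, so the window starts at `site`.
    return padded[site:site + left + right + 1]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


window = motif_window("ABCDEFGHIJKLMNOP", site=6)
perfect = matthews_corrcoef(tp=10, tn=10, fp=0, fn=0)
inverted = matthews_corrcoef(tp=0, tn=0, fp=10, fn=10)
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect), which is why it is often preferred to accuracy when binding and non-binding examples are unbalanced.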

Availability and implementation: A standalone prediction web server is available at http://www.compbio.dundee.ac.uk/1433pred. Human candidate 14-3-3-binding phosphosites were integrated in ANIA: ANnotation and Integrated Analysis of the 14-3-3 interactome database.

Contact: cmackintosh@dundee.ac.uk or gjbarton@dundee.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Pattern recognition methods to relate time profiles of gene expression with phenotypic data: a comparative study

Motivation: Comparing time courses of gene expression with time courses of phenotypic data may provide new insights in cellular mechanisms. In this study, we compared the performance of five pattern recognition methods with respect to their ability to relate genes and phenotypic data: one classical method (k-means) and four methods especially developed for time series [Short Time-series Expression Miner (STEM), Linear Mixed Model mixtures, Dynamic Time Warping for -Omics and linear modeling with R/Bioconductor limma package]. The methods were evaluated using data available from toxicological studies that had the aim to relate gene expression with phenotypic endpoints (i.e. to develop biomarkers for adverse outcomes). Additionally, technical aspects (influence of noise, number of time points and number of replicates) were evaluated on simulated data.
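As background for one of the compared families of methods, the following Python sketch shows the textbook dynamic time warping distance between two time courses. It is a generic illustration and not the implementation of the Dynamic Time Warping for -Omics tool; the function name `dtw_distance` and the absolute-difference local cost are assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost and allows match,
    insertion and deletion steps (no window constraint).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A gene profile and a phenotype profile with the same shape but a delayed rise.
gene_profile = [0, 1, 2]
phenotype_profile = [0, 1, 1, 2]
distance = dtw_distance(gene_profile, phenotype_profile)
```

Because the warping absorbs the delay, the two toy profiles are treated as identical in shape, which is the property that makes this kind of distance attractive for relating expression and phenotype time courses sampled at different rates.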

Results: None of the methods outperforms the others in terms of biology. Linear modeling with limma is mostly influenced by noise. STEM is mostly influenced by the number of biological replicates in the dataset, whereas k-means and linear modeling with limma are mostly influenced by the number of time points. In most cases, the results of the methods complement each other. We therefore provide recommendations to integrate the five methods.

Availability: The Matlab code for the simulations performed in this research is available in the Supplementary Data (Word file). The microarray data analysed in this paper are available at ArrayExpress (E-TOXM-22 and E-TOXM-23) and Gene Expression Omnibus (GSE39291). The phenotypic data are available in the Supplementary Data (Excel file). Links to the pattern recognition tools compared in this paper are provided in the main text.

Contact: d.hendrickx@maastrichtuniversity.nl

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Detection of significant protein coevolution

Motivation: The evolution of proteins cannot be fully understood without taking into account the coevolutionary linkages entangling them. From a practical point of view, coevolution between protein families has been used as a way of detecting protein interactions and functional relationships from genomic information. The most common approach to inferring protein coevolution involves the quantification of phylogenetic tree similarity using a family of methodologies termed mirrortree. In spite of their success, a fundamental problem of these approaches is the lack of an adequate statistical framework to assess the significance of a given coevolutionary score (tree similarity). As a consequence, a number of ad hoc filters and arbitrary thresholds are required in an attempt to obtain a final set of confident coevolutionary signals.

Results: In this work, we developed a method for associating confidence estimators (P values) with the tree-similarity scores, using a null model specifically designed for the tree comparison problem. We show how this approach substantially improves the quality and coverage (number of pairs that can be evaluated) of the detected coevolution at all stages of the mirrortree workflow, independently of the starting genomic information. This leads not only to a better understanding of protein coevolution and its biological implications, but also to a highly reliable and comprehensive network of predicted interactions, as well as information on the substructure of macromolecular complexes, using only genomic information.
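The general idea of scoring tree similarity and attaching a P value can be sketched in Python as follows. The score is the Pearson correlation between the inter-species distances of two protein families, which is the usual mirrortree-style measure. The null model here is a plain label permutation, swapped in for illustration only: it is not the null model developed in the paper, and all function names and toy distances are hypothetical.

```python
import math
import random
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric dict-of-dicts distance matrix from {(a, b): d}."""
    dist = {}
    for (a, b), value in pairs.items():
        dist.setdefault(a, {})[b] = value
        dist.setdefault(b, {})[a] = value
    return dist


def upper_triangle(dist, order):
    """Flatten the pairwise distances following the given species order."""
    return [dist[a][b] for a, b in combinations(order, 2)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(dist_a, dist_b, species, n_perm=1000, seed=0):
    """Tree-similarity score plus a P value from shuffling species labels.

    Illustrative null model only (an assumption of this sketch).
    """
    rng = random.Random(seed)
    x = upper_triangle(dist_a, species)
    observed = pearson(x, upper_triangle(dist_b, species))
    hits = 0
    for _ in range(n_perm):
        shuffled = species[:]
        rng.shuffle(shuffled)
        if pearson(x, upper_triangle(dist_b, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


species = ["s1", "s2", "s3", "s4"]
pairs_a = {
    ("s1", "s2"): 0.10, ("s1", "s3"): 0.40, ("s1", "s4"): 0.50,
    ("s2", "s3"): 0.35, ("s2", "s4"): 0.45, ("s3", "s4"): 0.20,
}
# Family B evolves twice as fast but with the same tree shape.
pairs_b = {pair: 2 * d for pair, d in pairs_a.items()}
score, p_value = permutation_p_value(symmetric(pairs_a), symmetric(pairs_b), species)
```

With only four species the number of distinct relabelings is small, so the attainable P values are coarse; this is one reason a purpose-built null model is attractive.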

Availability and implementation: The software and datasets used in this work are freely available at: http://csbg.cnb.csic.es/pMT/.

Contact: pazos@cnb.csic.es

Supplementary Information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

A trimming-and-retrieving alignment scheme for reduced representation bisulfite sequencing

Summary: Currently available bisulfite sequencing tools frequently suffer from low mapping rates and low methylation calls, especially for data generated from the Illumina sequencer, NextSeq. Here, we introduce a sequential trimming-and-retrieving alignment approach for investigating DNA methylation patterns, which significantly improves the number of mapped reads and covered CpG sites. The method is implemented in an automated analysis toolkit for processing bisulfite sequencing reads.

Availability and implementation: http://mysbfiles.stonybrook.edu/~xuefenwang/software.html and https://github.com/xfwang/BStools.

Contact: xuefeng.wang@stonybrook.edu

Supplementary information: Supplementary materials are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

GlycoProfileAssigner: automated structural assignment with error estimation for glycan LC data

Motivation: Sequencing glycan structures is a difficult problem that requires the use of multiple experimental approaches. One powerful approach to glycan sequencing is the combination of liquid chromatography with sequential exoglycosidase digestions; however, interpreting this can be difficult and time-consuming. To aid this process, we introduce GlycoProfileAssigner, software for automated structural assignment of glycan profile data from liquid chromatography experiments.

Results: GlycoProfileAssigner has been tested on human IgG data, and can retrieve the correct structure in 14 out of 16 peaks tested.

Availability and Implementation: The programme and its source code are available at https://bitbucket.org/fergaljd/glycoprofileassigner

Contact: pauline.rudd@nibrt.ie

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Mammalian genome evolution is governed by multiple pacemakers

Genomic evolution is shaped by a dynamic combination of mutation, selection and genetic drift. These processes lead to evolutionary rate variation across loci and among lineages. In turn, interactions between these two forms of rate variation can produce residual effects, whereby the pattern of among-lineage rate heterogeneity varies across loci. The nature of rate variation is encapsulated in the pacemaker models of genome evolution, which differ in the degree of importance assigned to residual effects: none (Universal Pacemaker), some (Multiple Pacemaker) or total (Degenerate Multiple Pacemaker). Here we use a phylogenetic method to partition the rate variation across loci, allowing comparison of these pacemaker models. Our analysis of 431 genes from 29 mammalian taxa reveals that rate variation across these genes can be explained by 13 pacemakers, consistent with the Multiple Pacemaker model. We find no evidence that these pacemakers correspond to gene function. Our results have important consequences for understanding the factors driving genomic evolution and for molecular-clock analyses.

Availability and implementation: ClockstaR-G is freely available for download from github (https://github.com/sebastianduchene/clockstarg).

Contact: simon.ho@sydney.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • DISCOVERY NOTE

Application of clinical text data for phenome-wide association studies (PheWASs)

Motivation: Genome-wide association studies (GWASs) are effective for describing genetic complexities of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Disease version 9 (ICD9) codes are used frequently to define the phenome, but using ICD9 codes alone misses other clinically relevant information from the EHR that can be used for PheWAS analyses and discovery.

Results: As an alternative to ICD9 coding, a text-based phenome was defined by 23 384 clinically relevant terms extracted from Marshfield Clinic’s EHR. Five single nucleotide polymorphisms (SNPs) with known phenotypic associations were genotyped in 4235 individuals and associated across the text-based phenome. All five SNPs genotyped were associated with expected terms (P < 0.02), most at or near the top of their respective PheWAS ranking. Raw association results indicate that text data performed equivalently to ICD9 coding and demonstrate the utility of information beyond ICD9 coding for application in PheWAS.

Contact: hebbring.scott@mcrf.mfldclin.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis

Summary: High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and therefore typically disregarded in subsequent statistical analyses.

We introduce RepExplore, a web service dedicated to exploiting the information captured in the technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing together with interactive ranking tables, whisker plots, heat maps and principal component analysis visualizations to interpret omics data and derived statistics.

Availability and implementation: Freely available at http://www.repexplore.tk

Contact: enrico.glaab@uni.lu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Population-scale three-dimensional reconstruction and quantitative profiling of microglia arbors

Motivation: The arbor morphologies of brain microglia are important indicators of cell activation. This article fills the need for accurate, robust, adaptive and scalable methods for reconstructing 3-D microglial arbors and quantitatively mapping microglia activation states over extended brain tissue regions.

Results: Thick rat brain sections (100–300 µm) were multiplex immunolabeled for IBA1 and Hoechst, and imaged by step-and-image confocal microscopy with automated 3-D image mosaicing, producing seamless images of extended brain regions (e.g. 5903 x 9874 x 229 voxels). An over-complete dictionary-based model was learned for the image-specific local structure of microglial processes. The microglial arbors were reconstructed seamlessly using an automated and scalable algorithm that exploits microglia-specific constraints. This method detected 80.1 and 92.8% more centered arbor points, and 53.5 and 55.5% fewer spurious points than existing vesselness and LoG-based methods, respectively, and the traces were 13.1 and 15.5% more accurate based on the DIADEM metric. The arbor morphologies were quantified using Scorcioni’s L-measure. Coifman’s harmonic co-clustering revealed four morphologically distinct classes that concord with known microglia activation patterns. This enabled us to map spatial distributions of microglial activation and cell abundances.

Availability and implementation: Experimental protocols, sample datasets and a scalable open-source multi-threaded software implementation (C++, MATLAB) are available in the electronic supplement and on the website (www.farsight-toolkit.org): http://www.farsight-toolkit.org/wiki/Population-scale_Three-dimensional_Reconstruction_and_Quantitative_Profiling_of_Microglia_Arbors

Contact: broysam@central.uh.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Comprehensive data resources and analytical tools for pathological association of aminoacyl tRNA synthetases with cancer

Mammalian cells have cytoplasmic and mitochondrial aminoacyl-tRNA synthetases (ARSs) that catalyze aminoacylation of tRNAs during protein synthesis. Beyond their housekeeping functions in protein synthesis, ARSs and ARS-interacting multifunctional proteins (AIMPs) have recently been shown to play important roles in disease pathogenesis through their interactions with disease-related molecules. However, there is a lack of data resources and analytical tools that can be used to examine disease associations of ARS/AIMPs. Here, we developed an Integrated Database for ARSs (IDA), a resource database including cancer genomic/proteomic and interaction data of ARS/AIMPs. IDA includes mRNA expression, somatic mutation, copy number variation and phosphorylation data of ARS/AIMPs and their interacting proteins in various cancers. IDA further includes an array of analytical tools for exploration of disease association of ARS/AIMPs, identification of disease-associated ARS/AIMP interactors and reconstruction of ARS-dependent disease-perturbed network models. IDA therefore provides both comprehensive data resources and analytical tools for understanding potential roles of ARS/AIMPs in cancers.

Database URL: http://ida.biocon.re.kr/, http://ars.biocon.re.kr/

  • Database
  • 9 years ago
  • Original Article

BioXpress: an integrated RNA-seq-derived gene expression database for pan-cancer analysis

BioXpress is a gene expression and cancer association database in which the expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, International Cancer Genome Consortium, Expression Atlas and publications. The BioXpress database includes expression data from 64 cancer types, 6361 patients and 17 469 genes with 9513 of the genes displaying differential expression between tumor and normal samples. In addition to data directly retrieved from RNA-seq data repositories, manual biocuration of publications supplements the available cancer association annotations in the database. All cancer types are mapped to Disease Ontology terms to facilitate a uniform pan-cancer analysis. The BioXpress database is easily searched using HUGO Gene Nomenclature Committee gene symbol, UniProtKB/RefSeq accession or, alternatively, can be queried by cancer type with specified significance filters. This interface along with availability of pre-computed downloadable files containing differentially expressed genes in multiple cancers enables straightforward retrieval and display of a broad set of cancer-related genes.

Database URL: http://hive.biochemistry.gwu.edu/tools/bioxpress

  • Database
  • 9 years ago
  • Database Tool

The Listeria monocytogenes strain 10403S BioCyc database

Listeria monocytogenes is a food-borne pathogen of humans and other animals. Its striking ability to survive several stresses commonly used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, and (ii) upload their own data, such as differential expression data, to visualize them in the scope of predicted pathways and regulatory networks and to carry out enrichment analyses using several different annotations available within the database.

Database URL: http://biocyc.org/organism-summary?object=10403S_RAST

  • Database
  • 9 years ago
  • Database Tool

IntFOLD: an integrated server for modelling protein structures and functions from amino acid sequences

IntFOLD is an independent web server that integrates our leading methods for structure and function prediction. The server provides a simple unified interface that aims to make complex protein modelling data more accessible to life scientists. The server web interface is designed to be intuitive and integrates a complex set of quantitative data, so that 3D modelling results can be viewed on a single page and interpreted by non-expert modellers at a glance. The only required input to the server is an amino acid sequence for the target protein. Here we describe major performance and user interface updates to the server, which comprises an integrated pipeline of methods for: tertiary structure prediction, global and local 3D model quality assessment, disorder prediction, structural domain prediction, function prediction and modelling of protein-ligand interactions. The server has been independently validated during numerous CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiments, as well as being continuously evaluated by the CAMEO (Continuous Automated Model Evaluation) project. The IntFOLD server is available at: http://www.reading.ac.uk/bioinf/IntFOLD/

  • Nucleic Acids Research
  • 9 years ago
  • Web Server Issue

New insights into the performance of human whole-exome capture platforms

Whole exome sequencing (WES) is increasingly used in research and diagnostics. WES users expect coverage of the entire coding region of known genes as well as sufficient read depth for the covered regions. It is, however, unknown which recent WES platform is most suitable to meet these expectations. We present insights into the performance of the most recent standard exome enrichment platforms from Agilent, NimbleGen and Illumina applied to six different DNA samples by two sequencing vendors per platform. Our results suggest that both Agilent and NimbleGen overall perform better than Illumina and that the high enrichment performance of Agilent is stable among samples and between vendors, whereas NimbleGen is only able to achieve vendor- and sample-specific best exome coverage. Moreover, the recent Agilent platform overall captures more coding exons with sufficient read depth than NimbleGen and Illumina. Due to considerable gaps in effective exome coverage, however, the three platforms cannot capture all known coding exons alone or in combination, requiring improvement. Our data emphasize the importance of evaluation of updated platform versions and suggest that enrichment-free whole genome sequencing can overcome the limitations of WES in sufficiently covering coding exons, especially GC-rich regions, and in characterizing structural variants.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online