Predicting internal cell fluxes at sub-optimal growth

Background: Flux Balance Analysis (FBA) is a widely used tool for modeling metabolic behavior and cellular function. Applications of FBA span a breadth of research, from synthetic engineering of biofuels to understanding evolutionary adaptations. FBA predicts metabolic reaction fluxes that optimize a given objective. For unicellular organisms, this objective is generally defined by a theoretical reaction that simulates biomass production. FBA has been extremely successful at predicting E. coli growth rates under different media and gene essentiality, among other things. To improve predictions, additional constraints are coupled with optimization of the biomass function. Studies have suggested, however, that unicellular organisms, like multicellular organisms, do not grow at optimal rates. To further improve FBA predictions, particularly of internal cell fluxes, new techniques to explore the sub-optimal solution space need to be developed.

Results: We present an innovative FBA method called corsoFBA based on the optimization of protein cost at sub-optimal objective levels. Our method shows good agreement with experimental data of E. coli grown at different dilution rates. Maintaining the objective function close to its maximum value predicts metabolic states that closely resemble low dilution rates, while higher dilution rates can be mirrored by lowering the biomass production value. By using a modified version of Extreme Pathways, we are also able to quantify the energy production and overall protein cost for all possible pathways in the central carbon metabolism.

Conclusion: Metabolic flux distributions at the optimal objective can be substantially different from the near-optimal distributions. Importantly, the behavior of E. coli central carbon metabolism can be better predicted by exploring the sub-optimal FBA solution space. The corsoFBA method presented here is able to predict the behavior of PEP Carboxylase, the glyoxylate shunt and the Entner-Doudoroff pathway at different glucose levels, a behavior not predicted by the minimization of metabolic steps and FBA alone. This technique can be used to better predict internal cell fluxes under different conditions, and corsoFBA will be of great help for the study of cells from multicellular organisms using Flux Balance Analysis.
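
The two-stage idea described in this abstract can be written down roughly as follows. This is only a sketch under stated assumptions, not the exact corsoFBA formulation: S is the stoichiometric matrix, v the flux vector with bounds l and u, and c the objective (biomass) vector, as in standard FBA. The per-reaction cost weights w_i and the fraction γ of the optimal objective are illustrative symbols introduced here; the abstract does not specify how protein cost is computed.

```latex
\begin{aligned}
\text{Step 1:}\quad & Z^{*} = \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\;\; l \le v \le u \\
\text{Step 2:}\quad & \min_{v}\; \sum_{i} w_i \lvert v_i \rvert
  \quad \text{s.t.}\quad S v = 0,\;\; l \le v \le u,\;\;
  c^{\top} v = \gamma Z^{*},\;\; 0 < \gamma \le 1
\end{aligned}
```

Setting γ close to 1 keeps the objective near its maximum, while smaller values of γ explore the sub-optimal solution space.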

  • BMC Systems Biology 2015, null:18
  • 9 years ago

SplicePie: a novel analytical approach for the detection of alternative, non-sequential and recursive splicing

Alternative splicing is a powerful mechanism present in eukaryotic cells to obtain a wide range of transcripts and protein isoforms from a relatively small number of genes. The mechanisms regulating (alternative) splicing and the paradigm of consecutive splicing have recently been challenged, especially for genes with a large number of introns. RNA-Seq, a powerful technology using deep sequencing in order to determine transcript structure and expression levels, is usually performed on mature mRNA, therefore not allowing detailed analysis of splicing progression. Sequencing pre-mRNA at different stages of splicing potentially provides insight into mRNA maturation. Although the number of tools that analyze total and cytoplasmic RNA in order to elucidate the transcriptome composition is rapidly growing, there are no tools specifically designed for the analysis of nuclear RNA (which contains mixtures of pre- and mature mRNA). We developed dedicated algorithms to investigate the splicing process. In this paper, we present a new classification of RNA-Seq reads based on three major stages of splicing: pre-, intermediate- and post-splicing. Applying this novel classification we demonstrate the possibility to analyze the order of splicing. Furthermore, we uncover the potential to investigate the multi-step nature of splicing, assessing various types of recursive splicing events. We provide the data that gives biological insight into the order of splicing, show that non-sequential splicing of certain introns is reproducible and coinciding in multiple cell lines. We validated our observations with independent experimental technologies and showed the reliability of our method. The pipeline, named SplicePie, is freely available at: https://github.com/pulyakhina/splicing_analysis_pipeline. The example data can be found at: https://barmsijs.lumc.nl/HG/irina/example_data.tar.gz.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

Ancient Duplications and Expression Divergence in the Globin Gene Superfamily of Vertebrates: Insights from the Elephant Shark Genome and Transcriptome

Comparative analyses of vertebrate genomes continue to uncover a surprising diversity of genes in the globin gene superfamily, some of which have very restricted phyletic distributions despite their antiquity. Genomic analysis of the globin gene repertoire of cartilaginous fish (Chondrichthyes) should be especially informative about the duplicative origins and ancestral functions of vertebrate globins, as divergence between Chondrichthyes and bony vertebrates represents the most basal split within the jawed vertebrates. Here, we report a comparative genomic analysis of the vertebrate globin gene family that includes the complete globin gene repertoire of the elephant shark (Callorhinchus milii). Using genomic sequence data from representatives of all major vertebrate classes, integrated analyses of conserved synteny and phylogenetic relationships revealed that the last common ancestor of vertebrates possessed a repertoire of at least seven globin genes: single copies of androglobin and neuroglobin, four paralogous copies of globin X, and the single-copy progenitor of the entire set of vertebrate-specific globins. Combined with expression data, the genomic inventory of elephant shark globins yielded four especially surprising findings: 1) there is no trace of the neuroglobin gene (a highly conserved gene that is present in all other jawed vertebrates that have been examined to date), 2) myoglobin is highly expressed in heart, but not in skeletal muscle (reflecting a possible ancestral condition in vertebrates with single-circuit circulatory systems), 3) elephant shark possesses two highly divergent globin X paralogs, one of which is preferentially expressed in gonads, and 4) elephant shark possesses two structurally distinct α-globin paralogs, one of which is preferentially expressed in the brain. 
Expression profiles of elephant shark globin genes reveal distinct specializations of function relative to orthologs in bony vertebrates and suggest hypotheses about ancestral functions of vertebrate globins.

  • Molecular Biology and Evolution
  • 9 years ago
  • Research Article

Neighboring Genes Show Correlated Evolution in Gene Expression

When considering the evolution of a gene’s expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression-profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression or is gene expression evolution autonomous? To address this here we consider evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking.

  • Molecular Biology and Evolution
  • 9 years ago
  • Research Article

An Arabidopsis Transcriptional Regulatory Map Reveals Distinct Functional and Evolutionary Features of Novel Transcription Factors

Transcription factors (TFs) play key roles in both development and stress responses. By integrating into and rewiring original systems, novel TFs contribute significantly to the evolution of transcriptional regulatory networks. Here, we report a high-confidence transcriptional regulatory map covering 388 TFs from 47 families in Arabidopsis. Systematic analysis of this map revealed the architectural heterogeneity of developmental and stress response subnetworks and identified three types of novel network motifs that are absent from unicellular organisms and essential for multicellular development. Moreover, TFs of novel families that emerged during plant landing present higher binding specificities and are preferentially wired into developmental processes and these novel network motifs. Further unveiled connection between the binding specificity and wiring preference of TFs explains the wiring preferences of novel-family TFs. These results reveal distinct functional and evolutionary features of novel TFs, suggesting a plausible mechanism for their contribution to the evolution of multicellular organisms.

  • Molecular Biology and Evolution
  • 9 years ago
  • Letter

Next-generation sequencing reveals the biological significance of the N2,3-ethenoguanine lesion in vivo

Etheno DNA adducts are a prevalent type of DNA damage caused by vinyl chloride (VC) exposure and oxidative stress. Etheno adducts are mutagenic and may contribute to the initiation of several pathologies; thus, elucidating the pathways by which they induce cellular transformation is critical. Although N2,3-ethenoguanine (N2,3-G) is the most abundant etheno adduct, its biological consequences have not been well characterized in cells due to its labile glycosidic bond. Here, a stabilized 2'-fluoro-2'-deoxyribose analog of N2,3-G was used to quantify directly its genotoxicity and mutagenicity. A multiplex method involving next-generation sequencing enabled a large-scale in vivo analysis, in which both N2,3-G and its isomer 1,N2-ethenoguanine (1,N2-G) were evaluated in various repair and replication backgrounds. We found that N2,3-G potently induces G to A transitions, the same mutation previously observed in VC-associated tumors. By contrast, 1,N2-G induces various substitutions and frameshifts. We also found that N2,3-G is the only etheno lesion that cannot be repaired by AlkB, which partially explains its persistence. Both G lesions are strong replication blocks and DinB, a translesion polymerase, facilitates the mutagenic bypass of both lesions. Collectively, our results indicate that N2,3-G is a biologically important lesion and may have a functional role in VC-induced or inflammation-driven carcinogenesis.

  • Nucleic Acids Research
  • 9 years ago
  • Genome integrity, repair and replication

Estimating the proportion of true null hypotheses when the statistics are discrete

Motivation: In high-dimensional testing problems, π0, the proportion of null hypotheses that are true, is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support and the null distribution may depend on an ancillary statistic such as a table margin that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems.

Results: This article introduces a number of π0 estimators, the regression and ‘T’ methods, that perform well with discrete test statistics and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data.
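
For orientation, here is a minimal sketch of the kind of estimator designed for continuous tests that the abstract contrasts against: a Storey-type π0 estimate, which assumes null P values are uniform on [0, 1]. It is not one of the regression or ‘T’ methods introduced in the article; the function name `storey_pi0` and the default threshold `lam=0.5` are illustrative choices.

```python
def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Assumes null P values are uniform on [0, 1], so roughly a fraction
    (1 - lam) of the nulls should fall above the threshold lam. This
    assumption is what can fail for discrete test statistics.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one P value")
    # Count P values above the threshold; these are assumed to be mostly nulls.
    above = sum(1 for p in p_values if p > lam)
    # Rescale by the expected null fraction above lam, and cap at 1.
    return min(1.0, above / (m * (1.0 - lam)))


if __name__ == "__main__":
    p = [0.001, 0.002, 0.01, 0.02, 0.03, 0.04, 0.6, 0.7, 0.8, 0.9]
    print(storey_pi0(p))
```

With discrete statistics, the P values can pile up at a few attainable values (often including 1), so the count above `lam` no longer reflects a uniform null, which is the motivation the abstract gives for dedicated estimators.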

Availability and implementation: implemented in R

Contact: nsa1@psu.edu or naomi@psu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Multispecies Analysis of Expression Pattern Diversification in the Recently Expanded Insect Ly6 Gene Family

Gene families often consist of members with diverse expression domains reflecting their functions in a wide variety of tissues. However, how the expression of individual members, and thus their tissue-specific functions, diversified during the course of gene family expansion is not well understood. In this study, we approached this question through the analysis of the duplication history and transcriptional evolution of a rapidly expanding subfamily of insect Ly6 genes. We analyzed different insect genomes and identified seven Ly6 genes that have originated from a single ancestor through sequential duplication within the higher Diptera. We then determined how the original embryonic expression pattern of the founding gene diversified by characterizing its tissue-specific expression in the beetle Tribolium castaneum, the butterfly Bicyclus anynana, and the mosquito Anopheles stephensi and those of its duplicates in three higher dipteran species, representing various stages of the duplication history (Megaselia abdita, Ceratitis capitata, and Drosophila melanogaster). Our results revealed that frequent neofunctionalization episodes contributed to the increased expression breadth of this subfamily and that these events occurred after duplication and speciation events at comparable frequencies. In addition, at each duplication node, we consistently found asymmetric expression divergence. One paralog inherited most of the tissue-specificities of the founder gene, whereas the other paralog evolved drastically reduced expression domains. Our approach attests to the power of combining a well-established duplication history with a comprehensive coverage of representative species in acquiring unequivocal information about the dynamics of gene expression evolution in gene families.

  • Molecular Biology and Evolution
  • 9 years ago
  • Research Article

VisualCNA: a GUI for interactive constraint network analysis and protein engineering for improving thermostability

Summary: Constraint network analysis (CNA) is a graph theory-based rigidity analysis approach for linking a biomolecule’s structure, flexibility, (thermo)stability and function. Results from CNA are highly information-rich and require intuitive, synchronized and interactive visualization for a comprehensive analysis. We developed VisualCNA, an easy-to-use PyMOL plug-in that allows setup of CNA runs and analysis of CNA results linking plots with molecular graphics representations. From a practical viewpoint, the most striking feature of VisualCNA is that it facilitates interactive protein engineering aimed at improving thermostability.

Availability and Implementation: VisualCNA and its dependencies (CNA and FIRST software) are available free of charge under GPL and academic licenses, respectively. VisualCNA and CNA are available at http://cpclab.uni-duesseldorf.de/software; FIRST is available at http://flexweb.asu.edu.

Contact: gohlke@uni-duesseldorf.de

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

IVA: accurate de novo assembly of RNA virus genomes

Motivation: An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused by amplification bias in the inevitable reverse transcription and polymerase chain reaction amplification process of current methods.

Results: We developed a new de novo assembler called IVA (Iterative Virus Assembler) designed specifically for read pairs sequenced at highly variable depth from RNA virus samples. We tested IVA on datasets from 140 sequenced samples from human immunodeficiency virus-1 or influenza-virus-infected people and demonstrated that IVA outperforms all other virus de novo assemblers.

Availability and implementation: The software runs under Linux, has the GPLv3 licence and is freely available from http://sanger-pathogens.github.io/iva

Contact: iva@sanger.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Guidance for RNA-seq co-expression network construction and analysis: safety in numbers

Motivation: RNA-seq co-expression analysis is in its infancy and reasonable practices remain poorly defined. We assessed a variety of RNA-seq expression data to determine factors affecting functional connectivity and topology in co-expression networks.

Results: We examine RNA-seq co-expression data generated from 1970 RNA-seq samples using a Guilt-By-Association framework, in which genes are assessed for the tendency of co-expression to reflect shared function. Minimal experimental criteria to obtain performance on par with microarrays were >20 samples with read depth >10 M per sample. While the aggregate network constructed shows good performance (area under the receiver operator characteristic curve ~0.71), the dependency on number of experiments used is nearly identical to that present in microarrays, suggesting thousands of samples are required to obtain ‘gold-standard’ co-expression. We find a major topological difference between RNA-seq and microarray co-expression in the form of low overlaps between hub-like genes from each network due to changes in the correlation of expression noise within each technology.
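
The performance figure quoted above is an area under the receiver operator characteristic curve (AUROC). As a rough illustration of what that number measures, the sketch below computes AUROC from per-gene scores and binary labels. It is an assumption that, in a guilt-by-association setting, the score would be something like a gene's connectivity to known members of a function and the label whether the gene truly belongs to it; the function name `auroc` and the toy inputs are illustrative and not taken from the paper.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two genes in the function (label 1), two outside it (label 0).
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported ~0.71 in context.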

Contact: jgillis@cshl.edu or sballouz@cshl.edu

Supplementary information: Networks are available at: http://gillislab.labsites.cshl.edu/supplements/rna-seq-networks/ and supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

EW_dmGWAS: edge-weighted dense module search for genome-wide association studies and gene expression profiles

Summary: We previously developed dmGWAS to search for dense modules in a human protein–protein interaction (PPI) network; it has since become a popular tool for network-assisted analysis of genome-wide association studies (GWAS). dmGWAS weights nodes by using GWAS signals. Here, we introduce an upgraded algorithm, EW_dmGWAS, to boost GWAS signals in a node- and edge-weighted PPI network. In EW_dmGWAS, we utilize condition-specific gene expression profiles for edge weights. Specifically, differential gene co-expression is used to infer the edge weights. We applied EW_dmGWAS to two diseases and compared it with other relevant methods. The results suggest that EW_dmGWAS is more powerful in detecting disease-associated signals.

Availability and implementation: The algorithm of EW_dmGWAS is implemented in the R package dmGWAS_3.0 and is available at http://bioinfo.mc.vanderbilt.edu/dmGWAS.

Contact: zhongming.zhao@vanderbilt.edu or peilin.jia@vanderbilt.edu

Supplementary information: Supplementary materials are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Hi-Jack: a novel computational framework for pathway-based inference of host-pathogen interactions

Motivation: Pathogens infect their host and hijack the host machinery to produce more progeny pathogens. Obligate intracellular pathogens, in particular, require resources of the host to replicate. Therefore, infections by these pathogens lead to alterations in the metabolism of the host, shifting in favor of pathogen protein production. Some computational identification of mechanisms of host–pathogen interactions have been proposed, but it seems the problem has yet to be approached from the metabolite-hijacking angle.

Results: We propose a novel computational framework, Hi-Jack, for inferring pathway-based interactions between a host and a pathogen that relies on the idea of metabolite hijacking. Hi-Jack searches metabolic network data from hosts and pathogens, and identifies candidate reactions where hijacking occurs. A novel scoring function ranks candidate hijacked reactions and identifies pathways in the host that interact with pathways in the pathogen, as well as the associated frequent hijacked metabolites. We also describe host–pathogen interaction principles that can be used in the future for subsequent studies. Our case study on Mycobacterium tuberculosis (Mtb) revealed pathways in human—e.g. carbohydrate metabolism, lipids metabolism and pathways related to amino acids metabolism—that are likely to be hijacked by the pathogen. In addition, we report interesting potential pathway interconnections between human and Mtb such as linkage of human fatty acid biosynthesis with Mtb biosynthesis of unsaturated fatty acids, or linkage of human pentose phosphate pathway with lipopolysaccharide biosynthesis in Mtb.

Availability and implementation: Datasets and codes are available at http://cloud.kaust.edu.sa/Pages/Hi-Jack.aspx

Contact: Dimitrios.Kleftogiannis@kaust.edu.sa

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

GeLL: a generalized likelihood library for phylogenetic models

Summary: Phylogenetic models are an important tool in molecular evolution allowing us to study the pattern and rate of sequence change. The recent influx of new sequence data in the biosciences means that to address evolutionary questions, we need a means for rapid and easy model development and implementation. Here we present GeLL, a Java library that lets users use text to quickly and efficiently define novel forms of discrete data and create new substitution models that describe how those data change on a phylogeny. GeLL allows users to define general substitution models and data structures in a way that is not possible in other existing libraries, including mixture models and non-reversible models. Classes are provided for calculating likelihoods, optimizing model parameters and branch lengths, ancestral reconstruction and sequence simulation.
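
To make "substitution model" and "calculating likelihoods" concrete, here is a small sketch of the simplest nucleotide model, Jukes-Cantor (JC69), and the log-likelihood of two aligned sequences separated by a branch of length t (expected substitutions per site). It is written in Python purely for illustration and does not use or imitate GeLL's Java API; the function names are hypothetical.

```python
import math


def jc69_prob(same, t):
    """JC69 probability that a site ends in a given base after branch length t.

    same=True: probability the base is unchanged;
    same=False: probability of one specific different base.
    """
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def pair_log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of two aligned sequences under JC69.

    Each site contributes (stationary frequency 1/4) * P(a -> b | t).
    Sites are treated as independent; t must be > 0 if any site differs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(math.log(0.25 * jc69_prob(a == b, t))
               for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    print(pair_log_likelihood("ACGTAC", "ACGTTC", 0.1))
```

A general-purpose library would replace the fixed JC69 probabilities with user-defined rate matrices and evaluate the likelihood over a full tree; this sketch only covers the two-sequence case.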

Availability and implementation: http://phylo.bio.ku.edu/GeLL under a GPL v3 license.

Contact: daniel.money@dal.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

MUGBAS: a species free gene-based programme suite for post-GWAS analysis

Genome Wide Association Studies between molecular markers and phenotypes are now routinely run in model and non-model species. However, tools to estimate the probability of association of functional units (e.g. genes) containing multiple markers are not developed for species other than humans. Here we introduce MUGBAS (MUlti species Gene-Based Association Suite), software that estimates the P-value of a gene using information on annotation, single marker GWA results and genotype. The software is species and annotation independent, fast, highly parallelized and ready for high-density marker studies.

Availability and implementation: https://bitbucket.org/capemaster/mugbas

Contact: capemaster@gmail.com

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information

Summary: Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode.

Availability and Implementation: MSA-PAD is available as a web application (https://recasgateway.ba.infn.it/) and as two Taverna workflows corresponding to two alignment modes (Gene mode: http://www.myexperiment.org/workflows/4549.html; Genome Mode: http://www.myexperiment.org/workflows/4551.html).

Contact: g.pesole@ibbe.cnr.it

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Robust meta-analysis of gene expression using the elastic net

Meta-analysis of gene expression has enabled numerous insights into biological systems, but current methods have several limitations. We developed a method to perform a meta-analysis using the elastic net, a powerful and versatile approach for classification and regression. To demonstrate the utility of our method, we conducted a meta-analysis of lung cancer gene expression based on publicly available data. Using 629 samples from five data sets, we trained a multinomial classifier to distinguish between four lung cancer subtypes. Our meta-analysis-derived classifier included 58 genes and achieved 91% accuracy on leave-one-study-out cross-validation and on three independent data sets. Our method makes meta-analysis of gene expression more systematic and expands the range of questions that a meta-analysis can be used to address. As the amount of publicly available gene expression data continues to grow, our method will be an effective tool to help distill these data into knowledge.
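
The leave-one-study-out cross-validation mentioned above can be sketched as follows: each fold holds out every sample from one data set and trains on the rest, so accuracy reflects generalization to an unseen study. This is a minimal illustration of the splitting step only, with no classifier attached; the function name and the toy study labels are hypothetical.

```python
def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) for each study.

    study_ids[i] names the data set that sample i came from. All samples
    of the held-out study go to the test fold; everything else is training.
    """
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    ids = ["A", "A", "B", "C", "C"]
    for study, train, test in leave_one_study_out(ids):
        print(study, train, test)
```

Splitting by study rather than by random sample keeps study-specific batch effects out of the training data for the fold that is being evaluated.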

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system

Protein phosphorylation is a reversible post-translational modification where a protein kinase adds a phosphate group to a protein, potentially regulating its function, localization and/or activity. Phosphorylation can affect protein–protein interactions (PPIs), abolishing interaction with previous binding partners or enabling new interactions. Extracting phosphorylation information coupled with PPI information from the scientific literature will facilitate the creation of phosphorylation interaction networks of kinases, substrates and interacting partners, toward knowledge discovery of functional outcomes of protein phosphorylation. Increasingly, PPI databases are interested in capturing the phosphorylation state of interacting partners. We have previously developed the eFIP (Extracting Functional Impact of Phosphorylation) text mining system, which identifies phosphorylated proteins and phosphorylation-dependent PPIs. In this work, we present several enhancements for the eFIP system: (i) text mining for full-length articles from the PubMed Central open-access collection; (ii) the integration of the RLIMS-P 2.0 system for the extraction of phosphorylation events with kinase, substrate and site information; (iii) the extension of the PPI module with new trigger words/phrases describing interactions and (iv) the addition of the iSimp tool for sentence simplification to aid in the matching of syntactic patterns. We enhance the website functionality to: (i) support searches based on protein roles (kinases, substrates, interacting partners) or using keywords; (ii) link protein entities to their corresponding UniProt identifiers if mapped and (iii) support visual exploration of phosphorylation interaction networks using Cytoscape. The evaluation of eFIP on full-length articles achieved 92.4% precision, 76.5% recall and 83.7% F-measure on 100 article sections. 
To demonstrate eFIP for knowledge extraction and discovery, we constructed phosphorylation-dependent interaction networks involving 14-3-3 proteins identified from cancer-related versus diabetes-related articles. Comparison of the phosphorylation interaction network of kinases, phosphoproteins and interactants obtained from eFIP searches, along with enrichment analysis of the protein set, revealed several shared interactions, highlighting common pathways discussed in the context of both diseases.
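
The three evaluation figures reported in this abstract are related: assuming the F-measure is the balanced F1 score, it is the harmonic mean of precision and recall. The short check below reproduces the reported value from the other two; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision 92.4% and recall 76.5%, as reported in the abstract.
    print(round(100 * f_measure(0.924, 0.765), 1))
```

Because the harmonic mean is pulled toward the smaller value, the lower recall weighs on the combined score more than the high precision lifts it.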

Database URL: http://proteininformationresource.org/efip

  • Database
  • 9 years ago
  • Original Article

CHOPIN: a web resource for the structural and functional proteome of Mycobacterium tuberculosis

Tuberculosis kills more than a million people annually and presents increasingly high levels of resistance against current first line drugs. Structural information about Mycobacterium tuberculosis (Mtb) proteins is a valuable asset for the development of novel drugs and for understanding the biology of the bacterium; however, only about 10% of the ~4000 proteins have had their structures determined experimentally. The CHOPIN database assigns structural domains and generates homology models for 2911 sequences, corresponding to ~73% of the proteome. A sophisticated pipeline allows multiple models to be created using conformational states characteristic of different oligomeric states and ligand binding, such that the models reflect various functional states of the proteins. Additionally, CHOPIN includes structural analyses of mutations potentially associated with drug resistance. Results are made available at the web interface, which also serves as an automatically updated repository of all published Mtb experimental structures. Its RESTful interface allows direct and flexible access to structures and metadata via intuitive URLs, enabling easy programmatic use of the models.

Database URL: http://structure.bioc.cam.ac.uk/chopin

  • Database
  • 9 years ago
  • Original Article

BioSurfDB: knowledge and algorithms to support biosurfactants and biodegradation studies

Crude oil extraction, transportation and use provoke the contamination of countless ecosystems. Therefore, bioremediation through surfactants mobilization or biodegradation is an important subject, both economically and environmentally. Bioremediation research had a great boost with the recent advances in Metagenomics, as it enabled the sequencing of uncultured microorganisms providing new insights on surfactant-producing and/or oil-degrading bacteria. Many research studies are making available genomic data from unknown organisms obtained from metagenomics analysis of oil-contaminated environmental samples. These new datasets are presently demanding the development of new tools and data repositories tailored for the biological analysis in a context of bioremediation data analysis. This work presents BioSurfDB, www.biosurfdb.org, a curated relational information system integrating data from: (i) metagenomes; (ii) organisms; (iii) biodegradation relevant genes; proteins and their metabolic pathways; (iv) bioremediation experiments results, with specific pollutants treatment efficiencies by surfactant producing organisms; and (v) a biosurfactant-curated list, grouped by producing organism, surfactant name, class and reference. The main goal of this repository is to gather information on the characterization of biological compounds and mechanisms involved in biosurfactant production and/or biodegradation and make it available in a curated way and associated with a number of computational tools to support studies of genomic and metagenomic data.

Database URL: www.biosurfdb.org

  • Database
  • 9 years ago
  • Original Article

CyanOmics: an integrated database of omics for the model cyanobacterium Synechococcus sp. PCC 7002

Cyanobacteria are an important group of organisms that carry out oxygenic photosynthesis and play vital roles in both the carbon and nitrogen cycles of the Earth. The annotated genome of Synechococcus sp. PCC 7002, an ideal model cyanobacterium, is available. A series of transcriptomic and proteomic studies of Synechococcus sp. PCC 7002 cells grown under different conditions have been reported. However, no database integrating such omics studies has been constructed. Here we present CyanOmics, a database based on the results of Synechococcus sp. PCC 7002 omics studies. CyanOmics comprises one genomic dataset, 29 transcriptomic datasets and one proteomic dataset and should prove useful for systematic and comprehensive analysis of all those data. Powerful browsing and searching tools are integrated to help users directly access information of interest with enhanced visualization of the analytical results. Furthermore, to increase CyanOmics’s usefulness, BLAST is included for sequence-based similarity searching, and Cluster 3.0 and the R hclust function are provided for cluster analyses. To the best of our knowledge, it is the first integrated omics analysis database for cyanobacteria. This database should further the understanding of the transcriptional patterns and proteomic profiles of Synechococcus sp. PCC 7002 and other cyanobacteria. Additionally, the entire database framework is applicable to any sequenced prokaryotic genome and could be applied to other integrated omics analysis projects.

Database URL: http://lag.ihb.ac.cn/cyanomics

  • Database
  • 9 years ago
  • Original Article

Lateral Gene Transfer and Gene Duplication Played a Key Role in the Evolution of Mastigamoeba balamuthi Hydrogenosomes

Lateral gene transfer (LGT) is an important mechanism of evolution for protists adapting to oxygen-poor environments. Specifically, modifications of energy metabolism in anaerobic forms of mitochondria (e.g., hydrogenosomes) are likely to have been associated with gene transfer from prokaryotes. An interesting question is whether the products of transferred genes were directly targeted into the ancestral organelle or initially operated in the cytosol and subsequently acquired organelle-targeting sequences. Here, we identified key enzymes of hydrogenosomal metabolism in the free-living anaerobic amoebozoan Mastigamoeba balamuthi and analyzed their cellular localizations, enzymatic activities, and evolutionary histories. Additionally, we characterized 1) several canonical mitochondrial components including respiratory complex II and the glycine cleavage system, 2) enzymes associated with anaerobic energy metabolism, including an unusual D-lactate dehydrogenase and acetyl CoA synthase, and 3) a sulfate activation pathway. Intriguingly, components of anaerobic energy metabolism are present in at least two gene copies. For each component, one copy possesses a mitochondrial targeting sequence (MTS), whereas the other lacks an MTS, yielding parallel cytosolic and hydrogenosomal extended glycolysis pathways. Experimentally, we confirmed that the organelle targeting of several proteins is fully dependent on the MTS. Phylogenetic analysis of all extended glycolysis components suggested that these components were acquired by LGT. We propose that the transformation from an ancestral organelle to a hydrogenosome in the M. balamuthi lineage involved the lateral acquisition of genes encoding extended glycolysis enzymes that initially operated in the cytosol and that established a parallel hydrogenosomal pathway after gene duplication and MTS acquisition.

  • Molecular Biology and Evolution
  • 9 years ago
  • Discoveries

Reassessing the "Duon" Hypothesis of Protein Evolution

A genome contains two distinct types of DNA sequences: coding sequences and regulatory sequences. A recent study of the occupancy of transcription factors (TFs) in human cells suggested that protein-coding sequences also serve as the codes of TF occupancy, and proposed a "duon" hypothesis in which up to 15% of codons of human protein genes are constrained by the additional coding requirements that regulate gene expression. This hypothesis challenges our basic understanding of the human genome. We reanalyzed the data and found that the previous study was confounded by ascertainment bias related to base composition. Using an unbiased comparison in which G/C and A/T sites are considered separately, we found a similar level of conservation between TF-bound codons and TF-depleted codons, suggesting that TF occupancy imposes largely no extra purifying selection on the codons of human genes. Given the generally short binding motifs of TFs and the open chromatin structure during transcription, we argue that the occupancy of TFs on protein-coding sequences is mostly passive and evolutionarily neutral, with functions in the regulation of gene expression that remain to be determined.
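
The base-composition confound described in this abstract can be illustrated with a minimal sketch. The counts below are invented toy numbers, not data from the study, and the function names are hypothetical. The sketch assumes, purely for illustration, that TF-bound sites are enriched in G/C and that G/C sites are more often conserved than A/T sites. Under those assumptions a pooled comparison shows a difference between bound and depleted sites that disappears once G/C and A/T sites are compared separately.

```python
from collections import defaultdict

# Hypothetical toy counts, invented for illustration only (NOT from the study):
# (occupancy class, base class) -> (conserved sites, total sites)
TOY_COUNTS = {
    ("bound", "GC"): (640, 800),
    ("bound", "AT"): (120, 200),
    ("depleted", "GC"): (160, 200),
    ("depleted", "AT"): (480, 800),
}


def pooled_rates(counts):
    """Conserved fraction per occupancy class, ignoring base composition."""
    totals = defaultdict(lambda: [0, 0])
    for (occupancy, _base), (conserved, total) in counts.items():
        totals[occupancy][0] += conserved
        totals[occupancy][1] += total
    return {occ: conserved / total for occ, (conserved, total) in totals.items()}


def stratified_rates(counts):
    """Conserved fraction within each (occupancy, base class) stratum."""
    return {key: conserved / total for key, (conserved, total) in counts.items()}


pooled = pooled_rates(TOY_COUNTS)
strata = stratified_rates(TOY_COUNTS)

for occupancy, rate in sorted(pooled.items()):
    print(f"pooled {occupancy}: {rate:.2f}")
for (occupancy, base), rate in sorted(strata.items()):
    print(f"{base} sites, {occupancy}: {rate:.2f}")
```

With these toy counts the pooled conserved fractions differ (0.76 for bound versus 0.64 for depleted), while the fractions within each base class are identical (0.80 for G/C, 0.60 for A/T). The apparent extra conservation of bound sites in this example comes entirely from their base composition.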

  • Molecular Biology and Evolution
  • 9 years ago
  • Discoveries