An integrated database of wood-formation related genes in plants

Wood, which consists mainly of plant cell walls, is an extremely important resource in daily life. Genes whose products participate in the processes of cell wall and wood formation are therefore major subjects of plant science research. The Wood-Formation Related Genes database (WFRGdb, http://me.lzu.edu.cn/woodformation/) serves as a data resource center for genes involved in wood formation. To create this database, we collected plant genome data published in other online databases and predicted all cell wall and wood formation related genes using BLAST and HMMER. To date, 47 gene families and 33 transcription factors from 57 genomes (28 herbaceous, 22 woody and 7 non-vascular plants) have been covered, and more than 122,000 genes have been checked and recorded. To provide easy access to these data, we have developed several search methods, which make it easy to download targeted genes or groups of genes free of charge in FASTA format. Sequence and phylogenetic analyses are also available online. WFRGdb brings together cell wall and wood formation related genes from all available plant genomes, and provides an integrative platform for gene inquiry, downloading and analysis. This database will therefore be extremely useful for those who focus on cell wall and wood research.

  • Scientific Reports 5
  • 9 years ago
  • Article

The evolution and devolution of cognitive control: The costs of deliberation in a competitive world

Dual-system theories of human cognition, under which fast automatic processes can complement or compete with slower deliberative processes, have not typically been incorporated into larger scale population models used in evolutionary biology, macroeconomics, or sociology. However, doing so may reveal important phenomena at the population level. Here, we introduce a novel model of the evolution of dual-system agents using a resource-consumption paradigm. By simulating agents with the capacity for both automatic and controlled processing, we illustrate how controlled processing may not always be selected over rigid, but rapid, automatic processing. Furthermore, even when controlled processing is advantageous, frequency-dependent effects may exist whereby the spread of control within the population undermines this advantage. As a result, the level of controlled processing in the population can oscillate persistently, or even go extinct in the long run. Our model illustrates how dual-system psychology can be incorporated into population-level evolutionary models, and how such a framework can be used to examine the dynamics of interaction between automatic and controlled processing that transpire over an evolutionary time scale.

  • Scientific Reports 5
  • 9 years ago
  • Article

TPpred3 detects and discriminates mitochondrial and chloroplastic targeting peptides in Eukaryotic proteins

Motivation: Molecular recognition of N-terminal targeting peptides is the most common mechanism controlling the import of nuclear-encoded proteins into mitochondria and chloroplasts. When experimental information is lacking, computational methods can annotate targeting peptides, and determine their cleavage sites for characterizing protein localization, function, and mature protein sequences. The problem of discriminating mitochondrial from chloroplastic propeptides is particularly relevant when annotating proteomes of photosynthetic Eukaryotes, endowed with both types of sequences.

Results: Here, we introduce TPpred3, a computational method that, given any Eukaryotic protein sequence, performs three different tasks: i) the detection of targeting peptides; ii) their classification as mitochondrial or chloroplastic; and iii) the precise localization of the cleavage sites in an organelle-specific framework. Our implementation builds on our previously introduced TPpred. Here we integrate a new N-to-1 Extreme Learning Machine specifically designed for the classification task (ii). For the last task, we introduce an organelle-specific Support Vector Machine that exploits sequence motifs retrieved with an extensive motif-discovery analysis of a large set of mitochondrial and chloroplastic proteins. We show that TPpred3 outperforms the state-of-the-art methods in all three tasks.

Availability: The method server and datasets are available at http://tppred3.biocomp.unibo.it

Contact: gigi@biocomp.unibo.it

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Investigating microbial co-occurrence patterns based on metagenomic compositional data

Motivation: High-throughput sequencing technologies have provided a powerful tool to study the microbial organisms living in various environments. Characterizing microbial interactions can give us insights into how they live and work together as a community. Metagenomic data are usually summarized in a compositional fashion due to varying sampling/sequencing depths from one sample to another. We study the co-occurrence patterns of microbial organisms using their relative abundance information. Analyzing compositional data using conventional correlation methods has been shown to be prone to bias that leads to artifactual correlations.

Results: We propose a novel method, REBACCA, to identify significant co-occurrence patterns by finding sparse solutions to a system with a deficient rank. To be specific, we construct the system using log ratios of count data and solve the system using the l1-norm shrinkage method. Our comprehensive simulation studies show that REBACCA 1) achieves higher accuracy in general than the existing methods when a sparse condition is satisfied; 2) controls the false positives at a pre-specified level, while other methods fail in various cases; and 3) runs considerably faster than the existing comparable method. REBACCA is also applied to several real metagenomic datasets.
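The abstract states that the system is built from log ratios of count data. The following Python sketch is not REBACCA itself and does not include its sparse l1-norm solver; it only illustrates, under simplifying assumptions, why log ratios suit compositional data: they are unchanged by sequencing depth and by conversion to relative abundances. The function names and the counts are hypothetical, and all counts are assumed to be strictly positive.

```python
import math


def log_ratios(values):
    """Map each taxon pair (i, j), i < j, to log(values[i] / values[j])."""
    n = len(values)
    return {
        (i, j): math.log(values[i] / values[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


def to_relative(counts):
    """Convert raw counts to relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def same_ratios(a, b, tol=1e-9):
    """Return True if two log-ratio maps agree pair by pair."""
    return a.keys() == b.keys() and all(
        math.isclose(a[k], b[k], abs_tol=tol) for k in a
    )


# Hypothetical counts for three taxa: the same community sequenced
# at two depths (the second sample is ten times deeper).
shallow = [10, 40, 50]
deep = [100, 400, 500]

# Depth cancels in every ratio, so both comparisons hold.
print(same_ratios(log_ratios(shallow), log_ratios(deep)))
print(same_ratios(log_ratios(shallow), log_ratios(to_relative(shallow))))
```

Real count tables contain zeros, which this sketch does not handle; a pseudocount or another zero-replacement scheme would be needed before taking logarithms.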

Availability: The R code for the proposed method is available at http://faculty.wcas.northwestern.edu/~hji403/REBACCA.htm

Contact: hongmei@northwestern.edu

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

TENET: Topological Feature-based Target Characterization in Signaling Networks

Motivation: Target characterization for a biochemical network is a heuristic evaluation process that produces a characterization model that may aid in predicting the suitability of each molecule for drug targeting. These approaches are typically used in drug research to identify novel potential targets using insights from known targets. Traditional approaches that characterize targets based on their molecular characteristics and biological function require extensive experimental study of each protein and are infeasible for evaluating larger networks with poorly-understood proteins. Moreover, they fail to exploit network connectivity information which is now available from systems biology methods. Adopting a network-based approach by characterizing targets using network features provides greater insights that complement these traditional techniques. To this end, we present TENET, a network-based approach that characterizes known targets in signaling networks using topological features.

Results: TENET first computes a set of topological features and then leverages an SVM-based approach to identify predictive topological features that characterize known targets. The resulting characterization model specifies which topological features are important for discriminating the targets and how these features should be combined to quantify the likelihood of a node being a target. We empirically study the performance of TENET from a wide variety of aspects, using several signaling networks from BioModels with real-world curated outcomes. Results demonstrate its effectiveness and superiority in comparison to state-of-the-art approaches.

Availability and Implementation: Our software is available freely for non-commercial purposes from: https://sites.google.com/site/cosbyntu/softwares/tenet

Contact: hechua@ntu.edu.sg or assourav@ntu.edu.sg

Supplementary Information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

JSBML 1.0: providing a smorgasbord of options to encode systems biology models

Summary: JSBML, the official pure Java programming library for the SBML format, has evolved with the advent of different modeling formalisms in systems biology and their ability to be exchanged and represented via extensions of SBML. JSBML has matured into a major, active open-source project with contributions from a growing, international team of developers who not only maintain compatibility with SBML, but also drive steady improvements to the Java interface and promote ease-of-use with end users.

Availability: Source code, binaries and documentation for JSBML can be freely obtained under the terms of the LGPL 2.1 from the website http://sbml.org/Software/JSBML.

Supplementary Information: More information about JSBML can be found in the user guide at http://sbml.org/Software/JSBML/docs/.

Contact: jsbml-development@googlegroups.com

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Sex and parasites: genomic and transcriptomic analysis of Microbotryum lychnidis-dioicae, the biotrophic and plant-castrating anther smut fungus

Background: The genus Microbotryum includes plant pathogenic fungi afflicting a wide variety of hosts with anther smut disease. Microbotryum lychnidis-dioicae infects Silene latifolia and replaces host pollen with fungal spores, exhibiting biotrophy and necrosis associated with altering plant development.

Results: We determined the haploid genome sequence for M. lychnidis-dioicae and analyzed whole transcriptome data from plant infections and other stages of the fungal lifecycle, revealing the inventory and expression level of genes that facilitate pathogenic growth. Compared to related fungi, an expanded number of major facilitator superfamily transporters and secretory lipases were detected; lipase gene expression was found to be altered by exposure to lipid compounds, which signaled a switch to dikaryotic, pathogenic growth. In addition, while enzymes to digest cellulose, xylan, xyloglucan, and highly substituted forms of pectin were absent, along with depletion of peroxidases and superoxide dismutases that protect the fungus from oxidative stress, the repertoire of glycosyltransferases and of enzymes that could manipulate host development has expanded. A total of 14% of the genome was categorized as repetitive sequences. Transposable elements have accumulated in mating-type chromosomal regions and were also associated across the genome with gene clusters of small secreted proteins, which may mediate host interactions.

Conclusions: The unique absence of enzyme classes for plant cell wall degradation, together with the maintenance of enzymes that break down components of pollen tubes and flowers, provides a striking example of biotrophic host adaptation.

  • BMC Genomics 2015, null:461
  • 9 years ago

Integrated web visualizations for protein-protein interaction databases

Background: Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has produced a large number of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks.

Results: We selected M = 10 out of N = 53 resources supporting visualization, and we tested them against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available as a supplementary table at http://tinyurl.com/PPI-DB-Comparison-2015.

Conclusions: Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. The challenges identified are data comprehensiveness, confidence, interactive features and the maturity of visualization.

  • BMC Bioinformatics 2015, null:195
  • 9 years ago

De novo assembly of the Carcinus maenas transcriptome and characterization of innate immune system pathways

Background: The European shore crab, Carcinus maenas, is used widely in biomonitoring, ecotoxicology and for studies into host-pathogen interactions. It is also an important invasive species in numerous global locations. However, the genomic resources for this organism are still sparse, limiting research progress in these fields. To address this resource shortfall we produced a C. maenas transcriptome, enabled by the progress in next-generation sequencing technologies, and applied this to assemble information on the innate immune system in this species.

Results: We isolated and pooled RNA from twelve different tissues and organs of C. maenas individuals and sequenced the RNA using next-generation sequencing on an Illumina HiSeq 2500 platform. After de novo assembly a transcriptome was generated encompassing 212,427 transcripts (153,699 loci). The transcripts were filtered, annotated and characterised using a variety of tools (including BLAST, MEGAN and RSEM) and databases (including NCBI, Gene Ontology and KEGG). There were differential patterns of expression for between 1,223 and 2,741 transcripts across tissues and organs with over-represented Gene Ontology terms relating to their specific function. Based on sequence homology to immune system components in other organisms, we show both the presence of transcripts for a series of known pathogen recognition receptors and response proteins that form part of the innate immune system, and transcripts representing the RNAi, Toll-like receptor signalling, IMD and JAK/STAT pathways.

Conclusions: We have produced an assembled transcriptome for C. maenas that provides a significant molecular resource for wide-ranging studies in this species. Analysis of the transcriptome has revealed the presence of a series of known targets and functional pathways that form part of the innate immune system and has illustrated tissue-specific differences in their expression patterns.

  • BMC Genomics 2015, null:458
  • 9 years ago

Layering genetic circuits to build a single cell, bacterial half adder

Background: Gene regulation in biological systems is impacted by the cellular and genetic context-dependent effects of the biological parts which comprise the circuit. Here, we have sought to elucidate the limitations of engineering biology from an architectural point of view, with the aim of compiling a set of engineering solutions for overcoming failure modes during the development of complex, synthetic genetic circuits.

Results: Using a synthetic biology approach that is supported by computational modelling and rigorous characterisation, AND, OR and NOT biological logic gates were layered in both parallel and serial arrangements to generate a repertoire of Boolean operations that include NIMPLY, XOR, half adder and half subtractor logics in a single cell. Subsequent evaluation of these near-digital biological systems revealed critical design pitfalls that triggered genetic context-dependent effects, including 5′ UTR interferences and uncontrolled switch-on behaviour of the supercoiled σ54 promoter. In particular, the presence of seven consecutive hairpins immediately downstream of the promoter transcription start site resulted in severe impediment of gene expression.

Conclusions: As synthetic biology moves forward with greater focus on scaling the complexity of engineered genetic circuits, studies which thoroughly evaluate failure modes and engineering solutions will serve as important references for future design and development of synthetic biological systems. This work describes a representative case study of the debugging of genetic context-dependent effects through principles elucidated herein, thereby providing a rational design framework to integrate multiple genetic circuits in a single prokaryotic cell.
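The Boolean operations named in the abstract can be written out as a short Python sketch. This is plain logic only: the particular way the gates are composed here is one standard textbook construction and is an assumption, not a description of how the authors wired their genetic parts.

```python
def AND(a: bool, b: bool) -> bool:
    return a and b


def OR(a: bool, b: bool) -> bool:
    return a or b


def NOT(a: bool) -> bool:
    return not a


def nimply(a: bool, b: bool) -> bool:
    """A NIMPLY B is true only when A is true and B is false."""
    return AND(a, NOT(b))


def xor(a: bool, b: bool) -> bool:
    """XOR built from two NIMPLY gates joined by an OR gate."""
    return OR(nimply(a, b), nimply(b, a))


def half_adder(a: bool, b: bool) -> tuple:
    """Return (sum, carry) for a + b."""
    return xor(a, b), AND(a, b)


def half_subtractor(a: bool, b: bool) -> tuple:
    """Return (difference, borrow) for a - b."""
    return xor(a, b), nimply(b, a)


# Print the full truth table for both circuits.
for a in (False, True):
    for b in (False, True):
        print(a, b, half_adder(a, b), half_subtractor(a, b))
```

The half adder and the half subtractor share the XOR output and differ only in the second output (AND for carry, NIMPLY for borrow), which is why layering a small set of gates yields both.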

  • BMC Biology 2015, null:40
  • 9 years ago

Genome wide interactions of wild-type and activator bypass forms of σ54

Enhancer-dependent transcription involving the promoter specificity factor σ54 is widely distributed amongst bacteria and commonly associated with cell envelope function. For transcription initiation, σ54-RNA polymerase yields open promoter complexes through its remodelling by cognate AAA+ ATPase activators. Since activators can be bypassed in vitro, bypass transcription in vivo could be a source of emergent gene expression along evolutionary pathways yielding new control networks and transcription patterns. At a single test promoter in vivo bypass transcription was not observed. We now use genome-wide transcription profiling, genome-wide mutagenesis and gene over-expression strategies in Escherichia coli, to (i) scope the range of bypass transcription in vivo and (ii) identify genes which might alter bypass transcription in vivo. We find little evidence for pervasive bypass transcription in vivo with only a small subset of σ54 promoters functioning without activators. Results also suggest no one gene limits bypass transcription in vivo, arguing bypass transcription is strongly kept in check. Promoter sequences subject to repression by σ54 were evident, indicating loss of rpoN (encoding σ54) rather than creating rpoN bypass alleles would be one evolutionary route for new gene expression patterns. Finally, cold-shock promoters showed unusual σ54-dependence in vivo not readily correlated with conventional σ54 binding-sites.

  • Nucleic Acids Research
  • 9 years ago
  • Gene regulation, Chromatin and Epigenetics

RNASequel: accurate and repeat tolerant realignment of RNA-seq reads

RNA-seq is a key technology for understanding the biology of the cell because of its ability to profile transcriptional and post-transcriptional regulation at single-nucleotide resolution. Compared to DNA sequencing alignment algorithms, RNA-seq alignment algorithms have a diminished ability to accurately detect and map base pair substitutions, gaps, discordant pairs and repetitive regions. These shortcomings adversely affect experiments that require a high degree of accuracy, notably the ability to detect RNA editing. We have developed RNASequel, a software package that runs as a post-processing step in conjunction with an RNA-seq aligner and systematically corrects common alignment artifacts. Its key innovations are a two-pass splice junction alignment system that includes de novo splice junctions and the use of an empirically determined estimate of the fragment size distribution when resolving read pairs. Using two simulated datasets, we demonstrate that RNASequel produces improved alignments when used in conjunction with STAR or TopHat2. We then show that RNASequel improves the identification of adenosine to inosine RNA editing sites on biological datasets. This software will be useful in applications requiring the accurate identification of variants in RNA sequencing data, the discovery of RNA editing sites and the analysis of alternative splicing.
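The idea of using an empirically estimated fragment size distribution to resolve read pairs can be illustrated with a minimal Python sketch. It is not RNASequel's algorithm, whose scoring is not described in the abstract; the function names, the probability floor and the example sizes are all hypothetical, and the selection rule (pick the candidate whose implied fragment size is most frequent in the observed data) is an assumption made for illustration.

```python
from collections import Counter


def empirical_distribution(fragment_sizes):
    """Estimate P(size) as the observed frequency of each fragment size."""
    counts = Counter(fragment_sizes)
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()}


def resolve_pair(candidates, dist, floor=1e-9):
    """Pick the candidate pairing with the most probable fragment size.

    candidates: list of (label, implied_fragment_size) tuples.
    Sizes never observed get a small floor probability (an assumption).
    """
    return max(candidates, key=lambda c: dist.get(c[1], floor))


# Hypothetical fragment sizes taken from unambiguously aligned pairs.
observed = [200, 200, 210, 190, 200, 500]
dist = empirical_distribution(observed)

# Two hypothetical placements of the same read pair.
best = resolve_pair([("placement_A", 500), ("placement_B", 200)], dist)
print(best)
```

A practical implementation would smooth or bin the distribution and combine this term with alignment scores; the sketch keeps only the fragment size term.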

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

An ecosystem of cancer cell line factories to support a cancer dependency map

Jesse Boehm and Todd Golub call for an international effort to establish >10,000 cancer cell line models as a community resource. Cancer cell line factories will facilitate the creation of a cancer dependency map, connecting cancer genomics to therapeutic dependencies.

  • Nature Reviews Genetics 16, 373 (2015)
  • 9 years ago
  • Comment

Technology: A drop in single-cell challenges

Single-cell high-throughput RNA sequencing (scRNA-seq) approaches are providing unprecedented insights into the constituent cell types and intercellular heterogeneity underlying various tissues. However, one limitation is that the cost or labour intensiveness of current approaches for generating single-cell sequencing libraries using valve-based microfluidic chips or microplates

  • Nature Reviews Genetics 16, 376 (2015)
  • 9 years ago
  • Research Highlight

Anchor-based classification and type-C inhibitors for tyrosine kinases

Tyrosine kinases regulate various biological processes and are drug targets for cancers. At present, the design of selective and anti-resistant inhibitors of kinases is a pressing task. Here, we inferred specific site-moiety maps containing two specific anchors to uncover a new binding pocket in the C-terminal hinge region by docking 4,680 kinase inhibitors into 51 protein kinases, and this finding provides an opportunity for the development of kinase inhibitors with high selectivity and anti-drug resistance. We present an anchor-based classification for tyrosine kinases and, by screening 118,759 natural compounds, discover two type-C inhibitors, namely rosmarinic acid (RA) and EGCG, which occupy two specific anchors and one specific anchor, respectively. Our profiling reveals that RA and EGCG selectively inhibit 3% (EGFR and SYK) and 14% of 64 kinases, respectively. Guided by our anchor model, we synthesized three RA derivatives with better potency. These type-C inhibitors are able to maintain activities for drug-resistant EGFR and decrease the invasion ability of breast cancer cells. Our results show that the type-C inhibitors occupying a new pocket are promising for cancer treatments due to their kinase selectivity and anti-drug resistance.

  • Scientific Reports 5
  • 9 years ago
  • Article

Towards a molecular understanding of microRNA-mediated gene silencing

MicroRNAs (miRNAs) are a conserved class of small non-coding RNAs that assemble with Argonaute proteins into miRNA-induced silencing complexes (miRISCs) to direct post-transcriptional silencing of complementary mRNA targets. Silencing is accomplished through a combination of translational repression and mRNA destabilization, with the latter contributing to

  • Nature Reviews Genetics 16, 421 (2015)
  • 9 years ago
  • Review

Identification of the gene defect responsible for severe hypercholesterolaemia using whole-exome sequencing

Familial hypercholesterolaemia (FH) is a serious genetic metabolic disease. We identified a family in which the proband had the typical homozygous FH phenotype, but no mutations could be detected in the usual pathogenic genes using traditional sequencing. This study is the first attempt to use whole-exome sequencing (WES) to identify the pathogenic genes in Chinese FH. Routine examinations were performed on all family members, and WES on 5 members. We used bioinformatics methods to assemble the sequence data and filter for the pathogenic gene. Finally, Sanger sequencing and cDNA sequencing were used to verify the candidate genes. Half of the family members had hypercholesterolaemia. WES identified LDLR IVS8[−10] as a candidate mutation from 222,267 variations. Sanger sequencing showed that the proband had a homozygous mutation inherited from his parents, and that this locus cosegregated with the FH phenotype. cDNA sequencing revealed that this mutation caused abnormal splicing. This mutation was identified for the first time in Chinese patients, and this homozygous mutation is a new genetic type of FH. This is the first time that WES has been used in Chinese FH patients. We detected a novel genetic type of LDLR homozygous mutation. WES is a powerful tool for identifying specific FH families with potentially pathogenic gene mutations.

  • Scientific Reports 5
  • 9 years ago
  • Article

Collembolan Transcriptomes Highlight Molecular Evolution of Hexapods and Provide Clues on the Adaptation to Terrestrial Life

by A. Faddeeva, R. A. Studer, K. Kraaijeveld, D. Sie, B. Ylstra, J. Mariën, H. J. M. op den Camp, E. Datema, J. T. den Dunnen, N. M. van Straalen, D. Roelofs

Background

Collembola (springtails) represent a soil-living lineage of hexapods in between insects and crustaceans. Consequently, their genomes may hold key information on the early processes leading to evolution of Hexapoda from a crustacean ancestor.

Method

We assembled and annotated transcriptomes of the Collembola Folsomia candida and Orchesella cincta, and performed comparative analysis with protein-coding gene sequences of three crustaceans and three insects to identify adaptive signatures associated with the evolution of hexapods within the pancrustacean clade.

Results

Assembly of the springtail transcriptomes resulted in 37,730 transcripts with predicted open reading frames for F. candida and 32,154 for O. cincta, of which 34.2% were functionally annotated for F. candida and 38.4% for O. cincta. Subsequently, we predicted orthologous clusters among eight species and applied the branch-site test to detect episodic positive selection in the Hexapoda and Collembola lineages. A subset of 250 genes showed significant positive selection along the Hexapoda branch and 57 in the Collembola lineage. Gene Ontology categories enriched in these genes include metabolism, stress response (i.e. DNA repair, immune response), ion transport, ATP metabolism, regulation and development-related processes (i.e. eye development, neurological development).

Conclusions

We suggest that the identified gene families represent processes that have played a key role in the divergence of hexapods within the pancrustacean clade that eventually evolved into the most species-rich group of all animals, the hexapods. Furthermore, some adaptive signatures in collembolans may provide valuable clues to understand evolution of hexapods on land.

  • PLoS ONE
  • 9 years ago

Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures

by Yinghan Fu, Zhenjiang Zech Xu, Zhi J. Lu, Shan Zhao, David H. Mathews

Recently, non-coding RNAs (ncRNAs) have been discovered with novel functions, and it has been appreciated that there is pervasive transcription of genomes. Moreover, many novel ncRNAs are not conserved on the primary sequence level. Therefore, de novo computational ncRNA detection that is accurate and efficient is desirable. The purpose of this study is to develop a ncRNA detection method based on conservation of structure in more than two genomes. A new method called Multifind, using Multilign, was developed. Multilign predicts the common secondary structure for multiple input sequences. Multifind then uses measures of structure conservation to estimate the probability that the input sequences are a conserved ncRNA using a classification support vector machine. Multilign is based on Dynalign, which folds and aligns two sequences simultaneously using a scoring scheme that does not include sequence identity; its structure prediction quality is therefore not affected by input sequence diversity. Additionally, ensemble defect was introduced to Multifind as an additional discriminating feature that quantifies the compactness of the folding space for a sequence. Benchmarks showed Multifind performs better than RNAz and LocARNATE+RNAz, a method that uses RNAz on structure alignments generated by LocARNATE, on testing sequences extracted from the Rfam database. For de novo ncRNA discovery in three genomes, Multifind and LocARNATE+RNAz had an advantage over RNAz in low similarity regions of genome alignments. Additionally, Multifind and LocARNATE+RNAz found different subsets of known ncRNA sequences, suggesting the two approaches are complementary.

  • PLoS ONE
  • 9 years ago

Defining the Human Brain Proteome Using Transcriptomics and Antibody-Based Profiling with a Focus on the Cerebral Cortex

by Evelina Sjöstedt, Linn Fagerberg, Björn M. Hallström, Anna Häggmark, Nicholas Mitsios, Peter Nilsson, Fredrik Pontén, Tomas Hökfelt, Mathias Uhlén, Jan Mulder

The mammalian brain is a complex organ composed of many specialized cells, harboring both common, widely distributed proteins and specialized, discretely localized ones. Here we focus on the human brain, utilizing transcriptomics and publicly available Human Protein Atlas (HPA) data to analyze brain-enriched (frontal cortex) polyadenylated messenger RNA and long non-coding RNA and generate a genome-wide draft of global and cellular expression patterns of the brain. Based on transcriptomics analysis of altogether 27 tissues, we have estimated that approximately 3% (n=571) of all protein coding genes and 13% (n=87) of the long non-coding genes expressed in the human brain are enriched, having at least five times higher expression levels in brain as compared to any of the other analyzed peripheral tissues. Based on gene ontology analysis and detailed annotation using antibody-based tissue micro array analysis of the corresponding proteins, we found the majority of brain-enriched protein coding genes to be expressed in astrocytes, oligodendrocytes or in neurons with molecular properties linked to synaptic transmission and brain development. Detailed analysis of the transcripts and the genetic landscape of brain-enriched coding and non-coding genes revealed brain-enriched splice variants. Several clusters of neighboring brain-enriched genes were also identified, suggesting regulation of gene expression on the chromatin level. This multi-angle approach uncovered the brain-enriched transcriptome and linked genes to cell types and functions, providing novel insights into the molecular foundation of this highly specialized organ.
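The enrichment criterion quoted in the abstract (at least five times higher expression in brain than in any other analyzed tissue) can be expressed as a small Python check. This is a sketch of the stated rule only: the function name, the example values and the requirement that the brain value be above zero are assumptions, and the study's actual pipeline (expression units, detection cut-offs) is not given in the abstract.

```python
from typing import Mapping


def is_enriched(expression: Mapping[str, float], tissue: str,
                fold: float = 5.0) -> bool:
    """Return True if `tissue` is at least `fold` times higher than every other tissue.

    expression maps tissue name -> expression level for one gene.
    A gene that is not expressed in `tissue` is never called enriched
    (an assumption, to avoid 0 >= 5 * 0 counting as enrichment).
    """
    others = [level for name, level in expression.items() if name != tissue]
    if not others or expression[tissue] <= 0:
        return False
    return expression[tissue] >= fold * max(others)


# Hypothetical expression levels for a single gene.
gene = {"brain": 50.0, "liver": 10.0, "kidney": 2.0}
print(is_enriched(gene, "brain"))
```

Comparing against the maximum of the other tissues is what "any of the other analyzed tissues" requires: a single tissue above one fifth of the brain level is enough to reject the gene.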

  • PLoS ONE
  • 9 years ago

'SEEDY' (Simulation of Evolutionary and Epidemiological Dynamics): An R Package to Follow Accumulation of Within-Host Mutation in Pathogens

by Colin J. Worby, Timothy D. Read

Genome sequencing is an increasingly common component of infectious disease outbreak investigations. However, the relationship between pathogen transmission and observed genetic data is complex, and dependent on several uncertain factors. As such, simulation of pathogen dynamics is an important tool for interpreting observed genomic data in an infectious disease outbreak setting, in order to test hypotheses and to explore the range of outcomes consistent with a given set of parameters. We introduce ‘seedy’, an R package for the simulation of evolutionary and epidemiological dynamics (http://cran.r-project.org/web/packages/seedy/). Our software implements stochastic models for the accumulation of mutations within hosts, as well as individual-level disease transmission. By allowing variables such as the transmission bottleneck size, within-host effective population size and population mixing rates to be specified by the user, our package offers a flexible framework to investigate evolutionary dynamics during disease outbreaks. Furthermore, our software provides theoretical pairwise genetic distance distributions to provide a likelihood of person-to-person transmission based on genomic observations, and using this framework, implements transmission route assessment for genomic data collected during an outbreak. Our open source software provides an accessible platform for users to explore pathogen evolution and outbreak dynamics via simulation, and offers tools to assess observed genomic data in this context.


  • PloS one
  • 9 years ago

Combined Targeted DNA Sequencing in Non-Small Cell Lung Cancer (NSCLC) Using UNCseq and NGScopy, and RNA Sequencing Using UNCqeR for the Detection of Genetic Aberrations in NSCLC


by Xiaobei Zhao, Anyou Wang, Vonn Walter, Nirali M. Patel, David A. Eberhard, Michele C. Hayward, Ashley H. Salazar, Heejoon Jo, Matthew G. Soloway, Matthew D. Wilkerson, Joel S. Parker, Xiaoying Yin, Guosheng Zhang, Marni B. Siegel, Gary B. Rosson, H. Shelton Earp, Norman E. Sharpless, Margaret L. Gulley, Karen E. Weck, D. Neil Hayes, Stergios J. Moschos

The recent FDA approval of the MiSeqDx platform provides a unique opportunity to develop targeted next generation sequencing (NGS) panels for human disease, including cancer. We have developed a scalable, targeted panel-based assay termed UNCseq, which involves an NGS panel of over 200 cancer-associated genes and a standardized downstream bioinformatics pipeline for detection of single nucleotide variations (SNV) as well as small insertions and deletions (indels). In addition, we developed a novel algorithm, NGScopy, designed for samples with sparse sequencing coverage, to detect large-scale copy number variations (CNV), similar to those detected by the human SNP Array 6.0, as well as small-scale intragenic CNV. We applied this assay to 100 snap-frozen lung cancer specimens lacking same-patient germline DNA (07–0120 tissue cohort) and validated our results against Sanger sequencing, SNP Array, and our recently published integrated DNA-seq/RNA-seq assay, UNCqeR, in which RNA-seq of same-patient tumor specimens confirmed SNV detected by DNA-seq when RNA-seq coverage depth was adequate. In addition, we applied the UNCseq assay to an independent lung cancer tumor tissue collection with available same-patient germline DNA (11–1115 tissue cohort) and confirmed mutations using assays performed in a CLIA-certified laboratory. We conclude that UNCseq can identify SNV, indels, and CNV in tumor specimens lacking germline DNA in a cost-efficient fashion.


  • PloS one
  • 9 years ago

Exploring Differentially Expressed Genes and Natural Antisense Transcripts in Sheep (Ovis aries) Skin with Different Wool Fiber Diameters by Digital Gene Expression Profiling


by Yaojing Yue, Tingting Guo, Jianbin Liu, Jian Guo, Chao Yuan, Ruilin Feng, Chune Niu, Xiaoping Sun, Bohui Yang

Wool fiber diameter (WFD) is the most important economic trait of wool. However, the genes specifically controlling WFD remain elusive. In this study, the expression profiles of skin from two groups of Gansu Alpine merino sheep with different WFD (a super-fine wool group [FD = 18.0 ± 0.5 μm, n = 3] and a fine wool group [FD = 23.0 ± 0.5 μm, n = 3]) were analyzed using next-generation sequencing–based digital gene expression profiling. A total of 40 significant differentially expressed genes (DEGs) were detected, including 9 up-regulated genes and 31 down-regulated genes. Further expression profile analysis of natural antisense transcripts (NATs) showed that more than 30% of the genes present in sheep skin expression profiles had NATs. A total of 7 NATs with significant differential expression were detected, and all were down-regulated. Among the 40 DEGs, 3 (AQP8, Bos d2, and SPRR) had significant NATs, all of which were significantly down-regulated in the super-fine wool group. The DEGs and NATs were classified into 3 main GO categories and 38 subcategories. Within the molecular function, cellular component and biological process categories, binding, cell part and metabolic process were the most dominant subcategories, respectively. However, no significant enrichment of GO terms was found (corrected P-value > 0.05). The pathways significantly enriched with significant DEGs and NATs were mainly the lipoic acid metabolism, bile secretion, salivary secretion, ribosome, and phenylalanine metabolism pathways (P < 0.05). The results indicated that the expression of NATs and that of gene transcripts were correlated, suggesting a role for NATs in gene regulation. The discovery of these DEGs and NATs could facilitate enhanced selection for super-fine wool sheep through gene-assisted selection or targeted gene manipulation in the future.


  • PloS one
  • 9 years ago