Structural and biochemical studies of the distinct activity profiles of Rai1 enzymes

Recent studies showed that Rai1 and its homologs are a crucial component of the mRNA 5'-end capping quality control mechanism. They can possess RNA 5'-end pyrophosphohydrolase (PPH), decapping, and 5'-3' exonuclease (toward 5'-monophosphate RNA) activities, which help to degrade mRNAs with incomplete 5'-end capping. A single active site in the enzyme supports these apparently distinct activities. However, each Rai1 protein studied so far has a unique set of activities, and the molecular basis for these differences is not known. Here, we have characterized the highly diverse activity profiles of Rai1 homologs from a collection of fungal organisms and identified a new activity for these enzymes, 5'-end triphosphonucleotide hydrolase (TPH), in place of PPH activity. Crystal structures of two of these enzymes bound to RNA oligonucleotides reveal differences in their RNA binding modes. Structure-based mutations of these enzymes, changing residues that contact the RNA but are poorly conserved, have substantial effects on their activity, providing a framework to begin understanding the molecular basis for the different activity profiles.

  • Nucleic Acids Research
  • 9 years ago
  • Structural Biology

ECHO-liveFISH: in vivo RNA labeling reveals dynamic regulation of nuclear RNA foci in living tissues

Elucidating the dynamic organization of nuclear RNA foci is important for understanding and manipulating these functional sites of gene expression in both physiological and pathological states. However, such studies have been difficult to perform in vivo because suitable RNA imaging methods have been lacking. Here, we describe a high-resolution fluorescence RNA imaging method, ECHO-liveFISH, to label endogenous nuclear RNA in living mice and chicks. Upon in vivo electroporation, exciton-controlled sequence-specific oligonucleotide probes revealed focally concentrated endogenous 28S rRNA and U3 snoRNA at nucleoli and poly(A) RNA at nuclear speckles. Time-lapse imaging revealed steady-state stability of these RNA foci and dynamic dissipation of 28S rRNA concentrations upon polymerase I inhibition in native brain tissue. Confirming the validity of this technique in a physiological context, the in vivo RNA labeling neither interfered with the function of the target RNA nor caused noticeable cytotoxicity or perturbation of cellular behavior.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

The de-ubiquitylating enzymes USP26 and USP37 regulate homologous recombination by counteracting RAP80

The faithful repair of DNA double-strand breaks (DSBs) is essential to safeguard genome stability. DSBs elicit a signaling cascade involving the E3 ubiquitin ligases RNF8/RNF168 and the ubiquitin-dependent assembly of the BRCA1-Abraxas-RAP80-MERIT40 complex. The association of BRCA1 with ubiquitin conjugates through RAP80 is known to be inhibitory to DSB repair by homologous recombination (HR). However, the precise regulation of this mechanism remains poorly understood. Through genetic screens, we identified USP26 and USP37 as key de-ubiquitylating enzymes (DUBs) that limit the repressive impact of RNF8/RNF168 on HR. Both DUBs are recruited to DSBs, where they actively remove RNF168-induced ubiquitin conjugates. Depletion of USP26 or USP37 disrupts the execution of HR, and this effect is alleviated by the simultaneous depletion of RAP80. We demonstrate that USP26 and USP37 prevent excessive spreading of RAP80-BRCA1 from DSBs. On the other hand, we also found that USP26 and USP37 promote the efficient association of BRCA1 with PALB2. This suggests that these DUBs limit the ubiquitin-dependent sequestration of BRCA1 via the BRCA1-Abraxas-RAP80-MERIT40 complex, while promoting complex formation and cooperation of BRCA1 with PALB2-BRCA2-RAD51 during HR. These findings reveal a novel ubiquitin-dependent mechanism that regulates distinct BRCA1-containing complexes for efficient repair of DSBs by HR.

  • Nucleic Acids Research
  • 9 years ago
  • Genome Integrity, Repair and Replication

Post-translational environmental switch of RadA activity by extein-intein interactions in protein splicing

Post-translational control based on an environmentally sensitive intervening intein sequence is described. Inteins are invasive genetic elements that self-splice at the protein level from the flanking host protein, the exteins. Here we show in Escherichia coli and in vitro that splicing of the intein located in the ATPase domain of RadA from the hyperthermophilic archaeon Pyrococcus horikoshii is strongly regulated by the native exteins, which lock the intein in an inactive state. High temperature or solution conditions can unlock the intein for full activity, as can remote extein point mutations. Notably, this splicing trap occurs through interactions in three-dimensional space between distant residues in the native exteins and the intein. The exteins might thereby serve as an environmental sensor, releasing the intein for full activity only at optimal growth conditions for the native organism, while sparing ATP consumption under cold-shock conditions. This partnership between the intein and its exteins, which implies coevolution of the parasitic intein and its host protein, may provide a novel means of post-translational control.

  • Nucleic Acids Research
  • 9 years ago
  • Synthetic Biology and Bioengineering

BioJazz: in silico evolution of cellular networks with unbounded complexity using rule-based modeling

Systems biologists aim to decipher the structure and dynamics of signaling and regulatory networks underpinning cellular responses; synthetic biologists can use this insight to alter existing networks or engineer de novo ones. Both tasks will benefit from an understanding of which structural and dynamic features of networks can emerge from evolutionary processes, through which intermediary steps these arise, and whether they embody general design principles. As natural evolution at the level of network dynamics is difficult to study, in silico evolution of network models can provide important insights. However, current tools used for in silico evolution of network dynamics are limited to ad hoc computer simulations and models. Here we introduce BioJazz, an extendable, user-friendly tool for simulating the evolution of dynamic biochemical networks. Unlike previous tools for in silico evolution, BioJazz allows for the evolution of cellular networks with unbounded complexity by combining rule-based modeling with an encoding of networks that is akin to a genome. We show that BioJazz can be used to implement biologically realistic selective pressures and allows exploration of the space of network architectures and dynamics that implement prescribed physiological functions. BioJazz is provided as an open-source tool to facilitate its further development and use. Source code and user manuals are available at: http://oss-lab.github.io/biojazz and http://osslab.lifesci.warwick.ac.uk/BioJazz.aspx.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

microRNAs and the evolution of complex multicellularity: identification of a large, diverse complement of microRNAs in the brown alga Ectocarpus

There is currently convincing evidence that microRNAs have evolved independently in at least six different eukaryotic lineages: animals, land plants, chlorophyte green algae, demosponges, slime molds and brown algae. MicroRNAs from different lineages are not homologous, but some structural features are strongly conserved across the eukaryotic tree, allowing the application of stringent criteria to identify novel microRNA loci. A large set of 63 microRNA families was identified in the brown alga Ectocarpus based on mapping of RNA-seq data, and nine microRNAs were confirmed by northern blotting. The Ectocarpus microRNAs are highly diverse at the sequence level, with few multi-gene families, and do not tend to occur in clusters, but they exhibit some highly conserved structural features, such as the presence of a uracil at the first position. No homologues of Ectocarpus microRNAs were found in other stramenopile genomes, indicating that they emerged late in stramenopile evolution and are perhaps specific to the brown algae. The large number of microRNA loci in Ectocarpus is consistent with the developmental complexity of many brown algal species and supports a proposed link between the emergence and expansion of microRNA regulatory systems and the evolution of complex multicellularity.

  • Nucleic Acids Research
  • 9 years ago
  • Genomics

Engineered CRISPR-Cas9 nucleases with altered PAM specificities

Although CRISPR-Cas9 nucleases are widely used for genome editing, the range of sequences that Cas9 can recognize is constrained by the need for a specific protospacer adjacent motif (PAM). As a result, it can often be difficult to target double-stranded breaks (DSBs) with the precision that is necessary for various genome-editing applications. The ability to engineer Cas9 derivatives with purposefully altered PAM specificities would address this limitation. Here we show that the commonly used Streptococcus pyogenes Cas9 (SpCas9) can be modified to recognize alternative PAM sequences using structural information, bacterial selection-based directed evolution, and combinatorial design. These altered PAM specificity variants enable robust editing of endogenous gene sites in zebrafish and human cells not currently targetable by wild-type SpCas9, and their genome-wide specificities are comparable to wild-type SpCas9 as judged by GUIDE-seq analysis. In addition, we identify and characterize another SpCas9 variant that exhibits improved specificity in human cells, possessing better discrimination against off-target sites with non-canonical NAG and NGA PAMs and/or mismatched spacers. We also find that two smaller-size Cas9 orthologues, Streptococcus thermophilus Cas9 (St1Cas9) and Staphylococcus aureus Cas9 (SaCas9), function efficiently in the bacterial selection systems and in human cells, suggesting that our engineering strategies could be extended to Cas9s from other species. Our findings provide broadly useful SpCas9 variants and, more importantly, establish the feasibility of engineering a wide range of Cas9s with altered and improved PAM specificities.
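To make the PAM constraint concrete, here is a minimal Python sketch that lists candidate SpCas9 target sites in a DNA string. It assumes a 20-nt protospacer immediately 5' of the PAM, scans the forward strand only, and uses exact IUPAC matching. `pam_regex` and `find_targets` are hypothetical helpers, not part of any published tool. Changing the `pam` argument from the canonical NGG to a non-canonical motif such as NGA shows how a variant with altered PAM specificity changes which sites can be targeted.

```python
import re

# IUPAC nucleotide codes needed for the PAMs discussed here (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def pam_regex(pam: str) -> str:
    """Translate an IUPAC PAM string such as 'NGG' into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(seq: str, pam: str, spacer_len: int = 20):
    """Return (start, protospacer, pam_site) for every forward-strand site
    where `spacer_len` bases are immediately followed by a PAM match."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(f"(?=([ACGT]{{{spacer_len}}})({pam_regex(pam)}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Usage on a hypothetical toy sequence: canonical NGG vs. non-canonical NGA.
toy = "A" * 20 + "TGG"
print(find_targets(toy, "NGG"))  # one site at position 0 with PAM 'TGG'
print(find_targets(toy, "NGA"))  # no sites
```

Scanning the reverse strand would require running the same search on the reverse complement and converting coordinates; that step is omitted here for brevity.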

  • Nature
  • 9 years ago
  • Letter

Redox rhythm reinforces the circadian clock to gate immune response

Recent studies have shown that in addition to the transcriptional circadian clock, many organisms, including Arabidopsis, have a circadian redox rhythm driven by the organism’s metabolic activities. It has been hypothesized that the redox rhythm is linked to the circadian clock, but the mechanism and the biological significance of this link have only begun to be investigated. Here we report that the master immune regulator NPR1 (non-expressor of pathogenesis-related gene 1) of Arabidopsis is a sensor of the plant’s redox state and regulates transcription of core circadian clock genes even in the absence of pathogen challenge. Surprisingly, acute perturbation in the redox status triggered by the immune signal salicylic acid does not compromise the circadian clock but rather leads to its reinforcement. Mathematical modelling and subsequent experiments show that NPR1 reinforces the circadian clock without changing the period by regulating both the morning and the evening clock genes. This balanced network architecture helps plants gate their immune responses towards the morning and minimize costs on growth at night. Our study demonstrates how a sensitive redox rhythm interacts with a robust circadian clock to ensure proper responsiveness to environmental stimuli without compromising fitness of the organism.

  • Nature
  • 9 years ago
  • Letter

Edge co-occurrences can account for rapid categorization of natural versus animal images

Making a judgment about the semantic category of a visual scene, such as whether it contains an animal, is typically assumed to involve high-level associative brain areas. Previous explanations require progressively analyzing the scene hierarchically at increasing levels of abstraction, from edge extraction to mid-level object recognition and then object categorization. Here we show that the statistics of edge co-occurrences alone are sufficient to perform a rough yet robust (translation, scale, and rotation invariant) scene categorization. We first extracted the edges from images using a scale-space analysis coupled with a sparse coding algorithm. We then computed the “association field” for different categories (natural, man-made, or containing an animal) by computing the statistics of edge co-occurrences. These differed strongly, with animal images having more curved configurations. We show that this geometry alone is sufficient for categorization, and that the pattern of errors made by humans is consistent with this procedure. Because these statistics could be measured as early as the primary visual cortex, the results challenge widely held assumptions about the flow of computations in the visual system. The results also suggest new algorithms for image classification and signal processing that exploit correlations between low-level structure and the underlying semantic category.
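Edge co-occurrence statistics can be sketched in a few lines of Python. This is a simplified illustration, not the authors' pipeline. It assumes edges have already been extracted as `(x, y, theta)` tuples, and it histograms only the orientation difference between edge pairs (a fuller association field would also bin their relative distance and direction). It then classifies by the nearest prototype histogram under L1 distance. All function names are hypothetical.

```python
import math
from itertools import combinations


def wrap_orientation(delta: float) -> float:
    """Map an orientation difference into [-pi/2, pi/2).

    Edges are undirected, so orientations are only defined modulo pi."""
    return (delta + math.pi / 2) % math.pi - math.pi / 2


def cooccurrence_histogram(edges, n_bins=6, max_dist=None):
    """Normalized histogram of orientation differences over all edge pairs.

    `edges` is a list of (x, y, theta) tuples, theta in radians. Pairs
    farther apart than `max_dist` (when given) are ignored.
    """
    counts = [0] * n_bins
    for (x1, y1, t1), (x2, y2, t2) in combinations(edges, 2):
        if max_dist is not None and math.hypot(x2 - x1, y2 - y1) > max_dist:
            continue
        delta = wrap_orientation(t2 - t1)
        idx = int((delta + math.pi / 2) / math.pi * n_bins)
        counts[min(idx, n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def closest_category(hist, prototypes):
    """Pick the category whose prototype histogram is nearest in L1 distance."""
    return min(prototypes,
               key=lambda k: sum(abs(a - b) for a, b in zip(hist, prototypes[k])))


# Three parallel (collinear) edges: every pair falls in the zero-difference bin.
parallel = [(0, 0, 0.0), (1, 0, 0.0), (2, 0, 0.0)]
print(cooccurrence_histogram(parallel))
```

With prototype histograms averaged over labeled images of each category, a new image would be assigned to the category whose prototype is closest. Curved configurations, which the abstract associates with animal images, would spread mass into the non-zero orientation-difference bins.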

  • Scientific Reports 5
  • 9 years ago
  • Article

Genetic drift of human coronavirus OC43 spike gene during adaptive evolution

Coronaviruses (CoVs) continuously threaten human health. However, the evolutionary mechanisms that govern CoV strain persistence in human populations are not yet fully understood. In this study, we characterized the evolution of the spike (S) gene, which encodes the major antigen, in the most prevalent human coronavirus (HCoV), OC43, using phylogenetic and phylodynamic analysis. Among the five known HCoV-OC43 genotypes (A to E), higher substitution rates and dN/dS values, as well as more positive selection sites, were detected in the S gene of genotype D, corresponding to the most dominant HCoV epidemic in recent years. Further analysis showed that the majority of substitutions were located in the S1 subunit. Among them, seven positive selection sites were chronologically traced along the temporal evolution routes of genotype D, and six of these were located around the critical sugar-binding region in the N-terminal domain (NTD) of the S protein, an important sugar-binding domain of CoVs. These findings suggest that genetic drift of the S gene may play an important role in genotype persistence in human populations, providing insights into the mechanisms of HCoV-OC43 adaptive evolution.
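The dN/dS ratio (ω) compares the rate of nonsynonymous substitutions per nonsynonymous site (dN) with the rate of synonymous substitutions per synonymous site (dS); ω > 1 is conventionally read as a signature of positive selection. The abstract does not state how these values were computed, so the sketch below only shows the final step of a simple counting approach in the style of Nei and Gojobori, with a Jukes–Cantor correction, assuming the site and difference counts have already been tallied. The counts in the usage line are hypothetical, and identifying individual positively selected sites generally requires more elaborate codon-based methods than this gene-wide ratio.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, syn_diffs, nonsyn_sites, syn_sites):
    """omega = dN/dS from counted differences and (possibly fractional) site numbers."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return dn / ds


# Hypothetical counts for one pair of S-gene sequences.
omega = dn_ds(nonsyn_diffs=30, syn_diffs=5, nonsyn_sites=200, syn_sites=100)
print(round(omega, 2))  # above 1, i.e. in the direction of positive selection
```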

  • Scientific Reports 5
  • 9 years ago
  • Article

Peptide-modified Substrate for Modulating Gland Tissue Growth and Morphology In Vitro

In vitro fabricated biological tissue would be a valuable tool for screening newly synthesized drugs or understanding the tissue development process. Several studies have attempted to fabricate biological tissue in vitro. However, controlling the growth and morphology of the fabricated tissue remains a challenge. Therefore, new techniques are required to modulate tissue growth. RGD (arginine-glycine-aspartic acid), an integrin-binding domain of fibronectin, has been found to enhance cell adhesion and survival; it has been used to modify substrates for in vitro cell culture studies or as a component of tissue engineering scaffolds. Here, we show novel functions of the RGD peptide: it enhances tissue growth and modulates tissue morphology in vitro. When an isolated submandibular gland (SMG) was cultured on an RGD-modified alginate hydrogel sheet, SMG growth, including bud expansion and cleft formation, was dramatically enhanced. Furthermore, we prepared small RGD-modified alginate beads and placed them on the growing SMG tissue. These RGD-modified beads successfully induced cleft formation at the bead position, guiding the SMG toward the desired morphology. Thus, this RGD-modified material might be a promising tool to modulate tissue growth and morphology in vitro for biological tissue fabrication.

  • Scientific Reports 5
  • 9 years ago
  • Article

Open and Lys–His Hexacoordinated Closed Structures of a Globin with Swapped Proximal and Distal Sites

Globins are haem-binding proteins with a conserved fold made up of α-helices and can possess diverse properties. A putative globin-coupled sensor from Methylacidiphilum infernorum, HGbRL, contains an N-terminal globin domain whose open and closed structures reveal an atypical dimeric architecture. Helices E and F fuse into an elongated helix, resulting in a novel site-swapped globin fold made up of helices A–E (hence the distal site) from one subunit and helices F–H (the proximal site) from another. The open structure possesses a large cavity binding an imidazole molecule, while the closed structure forms a unique Lys–His hexacoordinated species, with the first turn of helix E unravelling to allow Lys52(E10) to bind to the haem. Ligand binding induces reorganization of loop CE, which is stabilized in the closed form, and of helix E, triggering a large conformational movement in the open form. These changes provide mechanistic insight into how a signal may be relayed between the globin domain and the C-terminal Roadblock/LC7 domain of HGbRL. Comparison with HGbI, a closely related globin, further underlines the high degree of structural versatility of the globin fold, enabling it to perform a diversity of functions.

  • Scientific Reports 5
  • 9 years ago
  • Article

MetaQuery: a web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome

Summary: Microbiome researchers frequently want to know how abundant a particular microbial gene or pathway is across different human hosts, including its association with disease and its co-occurrence with other genes or microbial taxa. With thousands of publicly available metagenomes, these questions should be easy to answer. However, computational barriers prevent most researchers from conducting such analyses. We address this problem with MetaQuery, a web application for rapid and quantitative analysis of specific genes in the human gut microbiome. The user inputs one or more query genes, and our software returns the estimated abundance of these genes across 1,267 publicly available faecal metagenomes from American, European, and Chinese individuals. Additionally, our application performs downstream statistical analyses to identify features that are associated with gene variation, including other query genes (i.e. gene co-variation), taxa, clinical variables (e.g. inflammatory bowel disease and diabetes), and average genome size. The speed and accessibility of MetaQuery are a step towards democratizing metagenomics research, which should allow many researchers to query the abundance and variation of specific genes in the human gut microbiome.

Availability: http://metaquery.docpollard.org.

Contact: snayfach@gmail.com

Supplementary Information: Available at Bioinformatics online.
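The kind of per-sample gene abundance described above can be illustrated with a depth and genome-size normalization. The sketch below is an assumption about the general approach, not MetaQuery's documented algorithm. It converts reads mapped to a gene into average copies per cell by dividing gene depth by the number of genome equivalents (total sequenced bases divided by average genome size, one of the covariates the summary mentions). It then measures gene co-variation across samples with a Spearman rank correlation. Function names and numbers are hypothetical.

```python
def copies_per_cell(mapped_reads, read_length, gene_length, total_bases, avg_genome_size):
    """Estimate average copies of a gene per cell in one metagenome.

    Gene depth (mapped bases / gene length) is divided by the number of
    genome equivalents (total bases / average genome size).
    """
    gene_depth = mapped_reads * read_length / gene_length
    genome_equivalents = total_bases / avg_genome_size
    return gene_depth / genome_equivalents


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical sample: 1,000 reads of 100 bp on a 1 kb gene, 500 Mb sequenced,
# 5 Mb average genome size.
print(copies_per_cell(1000, 100, 1000, 5e8, 5e6))  # 1.0

# Hypothetical abundances of two query genes in four metagenomes.
print(spearman([0.8, 1.2, 0.3, 2.0], [0.5, 0.9, 0.1, 1.7]))  # 1.0
```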

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

FoldNucleus: Web server for the prediction of RNA and protein folding nuclei from their 3D structures

Motivation: To gain insight into how biopolymers fold as quickly as they do, it is useful to determine which structural elements limit the rate of RNA/protein folding.

Summary: We have created a new web server, FoldNucleus. Using this server, it is possible to calculate the folding nucleus for RNA molecules with known 3D structures (including pseudoknots, tRNAs, hairpins, and ribozymes) and for protein molecules with known 3D structures, provided they have fewer than 200 amino acid residues. Researchers can determine and understand which elements of the structure limit the folding process for various types of RNA and protein molecules. Experimental Φ-values for 21 proteins, for comparison with those determined by our method, are available at: http://bioinfo.protres.ru/resources/phi_values.htm.

Availability: http://bioinfo.protres.ru/foldnucleus/

Contact: ogalzit@vega.protres.ru
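Φ-values are the usual experimental yardstick for folding nuclei: Φ = ΔΔG‡/ΔΔG_eq, the ratio of a mutation's effect on the folding transition state to its effect on overall stability. The sketch below computes an experimental Φ-value from wild-type and mutant folding rate constants and a measured ΔΔG_eq. It does not reproduce FoldNucleus's structure-based prediction method, which the summary does not describe. The function names are hypothetical, and the formula assumes a two-state folder at constant temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_transition(k_wt: float, k_mut: float, temp_k: float = 298.15) -> float:
    """Change in folding activation free energy (kcal/mol) from folding rate constants."""
    return R_KCAL * temp_k * math.log(k_wt / k_mut)


def phi_value(k_wt: float, k_mut: float, ddg_eq: float, temp_k: float = 298.15) -> float:
    """Phi = ddG(transition state) / ddG(equilibrium).

    Values near 1 suggest the mutated residue is already structured in the
    folding transition state (part of the nucleus); values near 0 suggest it
    is still unstructured there.
    """
    if ddg_eq == 0:
        raise ValueError("ddg_eq must be non-zero")
    return ddg_transition(k_wt, k_mut, temp_k) / ddg_eq


# Hypothetical mutant that folds 10x slower and is destabilized by 2.0 kcal/mol.
print(round(phi_value(k_wt=100.0, k_mut=10.0, ddg_eq=2.0), 2))
```

At 25 °C, a tenfold drop in the folding rate corresponds to ΔΔG‡ of about 1.36 kcal/mol. Φ becomes unreliable when ΔΔG_eq is small, because experimental error then dominates the ratio.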

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Factors affecting the accuracy of a class prediction model in gene expression data

Background: Class prediction models have been shown to have varying performances in clinical gene expression datasets. Previous evaluation studies, mostly done in the field of cancer, showed that the accuracy of class prediction models differs from dataset to dataset and depends on the type of classification function. While a substantial amount of information is known about the characteristics of classification functions, little has been done to determine which characteristics of gene expression data affect the performance of a classifier. This study aims to empirically identify data characteristics that affect the predictive accuracy of classification models outside the field of cancer. Results: Datasets from twenty-five studies meeting predefined inclusion and exclusion criteria were downloaded. Nine classification functions were chosen, falling within the following categories: discriminant analyses or Bayes classifiers, tree-based methods, regularization and shrinkage methods, and nearest-neighbor methods. Nine class prediction models were then built for each dataset using the same procedure, and their performances were evaluated by calculating their accuracies. The characteristics of each experiment (i.e., observed disease, medical question, tissue/cell types and sample size) were recorded together with characteristics of the gene expression data, namely the number of differentially expressed genes, the fold changes and the within-class correlations. Their effects on the accuracy of a class prediction model were statistically assessed by random effects logistic regression. The number of differentially expressed genes and the average fold change had a significant impact on the accuracy of a classification model and individually explained up to 72% and 57% of the variation in prediction accuracy, respectively.
Multivariable random effects logistic regression with forward selection identified these two factors and the within-class correlation as factors affecting the accuracy of classification functions, explaining 91.5% of the between-study variation. Conclusions: We evaluated study- and data-related factors that might explain the varying performances of classification functions in non-cancerous datasets. Our results showed that the number of differentially expressed genes, the fold change, and the correlation in gene expression data significantly affect the accuracy of class prediction models.
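The accuracy measure discussed above can be illustrated with one of the method families the study names, a nearest-neighbour classifier, scored by leave-one-out cross-validation. The validation scheme is an assumption (the abstract does not state it), and the toy profiles are hypothetical. They only illustrate the reported trend that larger between-class differences make samples easier to classify.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loocv_accuracy_1nn(samples, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    `samples` holds one expression profile (list of values) per sample;
    `labels` gives each sample's class.
    """
    correct = 0
    for i, profile in enumerate(samples):
        # Predict sample i from the closest *other* sample.
        nearest = min((j for j in range(len(samples)) if j != i),
                      key=lambda j: euclidean(profile, samples[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical two-gene profiles with a large between-class difference...
separated = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
print(loocv_accuracy_1nn(separated, ["ctrl", "ctrl", "case", "case"]))  # 1.0

# ...versus classes whose profiles interleave.
mixed = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(loocv_accuracy_1nn(mixed, ["ctrl", "case", "ctrl", "case"]))  # 0.0
```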

  • BMC Bioinformatics 2015, article 199
  • 9 years ago

RNA-Seq analysis of resistant and susceptible potato varieties during the early stages of potato virus Y infection

Background: Potato virus Y (PVY) is one of the most important plant viruses affecting potato production. The interactions between potato and PVY are complex, and their outcome depends on the potato genotype, the PVY strain, and the environmental conditions. A potato cultivar can be resistant to one PVY strain yet susceptible to another. How a single potato cultivar responds to PVY in both compatible and incompatible interactions is not clear. Results: In this study, we used RNA sequencing (RNA-Seq) to investigate and compare the transcriptional changes in potato leaves upon inoculation with PVY. We used two potato varieties: Premier Russet, which is resistant to PVY strain O (PVYO) but susceptible to strain NTN (PVYNTN), and Russet Burbank, which is susceptible to all PVY strains that have been tested. Leaves were inoculated with PVYO or PVYNTN, and samples were collected 4 and 10 h post inoculation (hpi). More differentially expressed (DE) genes were found in the compatible reactions than in the incompatible reaction. For all treatments, the majority of DE genes were down-regulated at 4 hpi and up-regulated at 10 hpi. Gene Ontology enrichment analysis showed enrichment of the biological process GO term “Photosynthesis, light harvesting” specifically in PVYO-inoculated Premier Russet leaves, while the GO term “nucleosome assembly” was largely overrepresented in PVYNTN-inoculated Premier Russet leaves and PVYO-inoculated Russet Burbank leaves but not in PVYO-inoculated Premier Russet leaves. Fewer genes were DE more than 4-fold in the incompatible reaction than in the compatible reactions. Amongst these, five genes were DE only in PVYO-inoculated Premier Russet leaves, and all five were down-regulated.
These genes are predicted to encode a putative ABC transporter, a MYC2 transcription factor, a VQ-motif-containing protein, a non-specific lipid-transfer protein, and a xyloglucan endotransglucosylase/hydrolase. Conclusions: Our results show that the incompatible and compatible reactions in Premier Russet shared more similarities, in particular during the initial response, than the compatible reactions in the two different hosts. Our results identify potential key processes and genes that determine the fate of the reaction, compatible or incompatible, between PVY and its host.

  • BMC Genomics 2015, article 472
  • 9 years ago

Mesenchymal Adenomatous Polyposis Coli plays critical and diverse roles in regulating lung development

Background: Adenomatous Polyposis Coli (Apc) is a tumor suppressor that inhibits Wnt/Ctnnb1. Mutations of Apc lead not only to familial adenomatous polyposis, an epithelial lesion, but also to aggressive fibromatosis in mesenchymal cells. However, the roles of Apc in regulating mesenchymal cell biology and organogenesis during development are unknown. Results: We specifically deleted the Apc gene in lung mesenchymal cells during early lung development in mice. Loss of Apc function resulted in immediate mesenchymal cell hyper-proliferation through abnormal activation of Wnt/Ctnnb1, followed by inhibition of cell proliferation due to cell cycle arrest at G0/G1 through a mechanism independent of Wnt/Ctnnb1. Meanwhile, abrogation of Apc also disrupted lung mesenchymal cell differentiation, resulting in fewer airway and vascular smooth muscle cells, the presence of Sox9-positive mesenchymal cells in the peripheral lung, and excessive versican production. Moreover, lung epithelial branching morphogenesis was drastically inhibited due to disrupted Bmp4-Fgf10 morphogen production and regulation in the surrounding lung mesenchyme. Lastly, lung mesenchyme-specific Apc conditional knockout also altered lung vasculogenesis and disrupted pulmonary vascular continuity through a paracrine mechanism, leading to massive pulmonary hemorrhage and lethality at mid-gestation, when the pulmonary circulation should have started. Conclusions: Our study suggests that Apc in the lung mesenchyme plays central roles in coordinating the proper development of several quite different cellular compartments, including lung epithelial branching and pulmonary vascular circulation, during lung organogenesis.

  • BMC Biology 2015, article 42
  • 9 years ago

Sequencing strategies and characterization of 721 vervet monkey genomes for future genetic analyses of medically relevant traits

Background: We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available. Results: We identified genome-wide single nucleotide polymorphisms (SNPs) by WGS of 721 members of an extended pedigree from the VRC. From high-depth WGS data, we identified more than 4 million polymorphic unequivocal segregating sites; by pruning these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs, we constructed genome-wide panels suitable for genetic association (~500,000 SNPs) and linkage analysis (~150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a more stringently pruned subset of the linkage panel to generate multipoint identity by descent (MIBD) matrices. Conclusions: The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.
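LD-based pruning of the kind described above can be sketched as a greedy pass over SNPs in positional order. This is an illustration under stated assumptions, not the authors' pipeline: r² is approximated as the squared Pearson correlation of genotype dosages (0/1/2) rather than estimated from phased haplotypes, and the window size and r² threshold are hypothetical defaults. Dedicated tools such as PLINK implement more refined sliding-window variants of this idea.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype-dosage vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        # Monomorphic site: r^2 is undefined. Such sites would normally be
        # removed upstream by a heterozygosity filter.
        return 0.0
    return cov * cov / (v1 * v2)


def ld_prune(positions, genotypes, window_bp=50_000, r2_max=0.5):
    """Greedy LD pruning: walk SNPs in positional order and keep a SNP only if
    its r^2 with every already-kept SNP within `window_bp` is below `r2_max`.
    Returns the indices of kept SNPs."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    kept = []
    for i in order:
        if all(r_squared(genotypes[i], genotypes[k]) < r2_max
               for k in kept if positions[i] - positions[k] <= window_bp):
            kept.append(i)
    return kept


# Hypothetical genotypes for five animals at three nearby SNPs; SNP 1 duplicates SNP 0.
snps = [[0, 1, 2, 0, 1], [0, 1, 2, 0, 1], [2, 0, 1, 1, 0]]
print(ld_prune([100, 200, 300], snps))  # [0, 2]
```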

  • BMC Biology 2015, article 41
  • 9 years ago

Ferret: a sentence-based literature scanning system

Background: The rapid pace of bioscience research makes it very challenging to track relevant articles in one’s area of interest. MEDLINE, a primary source for biomedical literature, offers access to more than 20 million citations, with three-quarters of a million new ones added each year. Thus, it is not surprising to see active research in building new document retrieval and sentence retrieval systems. We present Ferret, a prototype retrieval system designed to retrieve and rank sentences (and their documents) conveying gene-centric relationships of interest to a scientist. The prototype has several features. For example, it is designed to handle gene name ambiguity and to perform query expansion. Inputs can be a list of genes with an optional list of keywords. Sentences are retrieved across species, but the species discussed in the records are identified. Results are presented in the form of a heat map, and sentences corresponding to specific cells of the heat map may be selected for display. Ferret is designed to assist bioscientists at different stages of research, from early idea exploration to advanced analysis of results from bench experiments. Results: Three live case studies in the field of plant biology, related to Arabidopsis thaliana, are presented. The first aims to discover genes that may relate to the open immature flower phenotype in Arabidopsis. The second aims to find associations reported between ethylene signaling and a set of 300+ Arabidopsis genes. The third searches for potential gene targets of an Arabidopsis transcription factor hypothesized to be involved in plant stress responses. Ferret was successful in finding valuable information in all three cases. In the first case, the bZIP family of genes was identified. In the second case, sentences indicating relevant associations were found in other species such as potato and jasmine. In the third case, the retrieved sentences led to new research questions about the plant hormone salicylic acid.
Conclusions: Ferret successfully retrieved relevant gene-centric sentences from PubMed records. The three case studies demonstrate end-user satisfaction with the system.
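Ferret's heat map can be pictured as a gene-by-keyword table of co-mentioning sentences. The sketch below is a hypothetical simplification, not Ferret's implementation. It splits text into sentences with a naive regular expression, treats a gene as mentioned if any of its listed synonyms appears as a whole word (a crude form of query expansion), and counts sentences that mention both the gene and the keyword. The synonym lists and toy text are illustrative only.

```python
import re


def split_sentences(text: str):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence: str, terms) -> bool:
    """True if any term occurs as a whole word (case-insensitive)."""
    return any(re.search(rf"\b{re.escape(t)}\b", sentence, re.IGNORECASE) for t in terms)


def heat_map(text, gene_synonyms, keywords):
    """Count sentences mentioning each gene (via any synonym) together with
    each keyword. Returns {gene: {keyword: count}}."""
    sentences = split_sentences(text)
    return {
        gene: {kw: sum(1 for s in sentences if mentions(s, syns) and mentions(s, [kw]))
               for kw in keywords}
        for gene, syns in gene_synonyms.items()
    }


toy_text = ("EIN2 is required for ethylene signaling. CTR1 acts upstream of EIN2. "
            "Ethylene also affects ERS1.")
synonyms = {"EIN2": ["EIN2", "ETHYLENE INSENSITIVE 2"], "CTR1": ["CTR1"]}
print(heat_map(toy_text, synonyms, ["ethylene"]))
```

Each cell count would then be rendered as a heat-map colour, and the underlying sentences kept so that clicking a cell can display them.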

  • BMC Bioinformatics 2015, article 198
  • 9 years ago

RDML-Ninja and RDMLdb for standardized exchange of qPCR data

Background: The universal qPCR data exchange file format RDML is now well accepted by the scientific community, is part of the MIQE guidelines and is implemented in many qPCR instruments. With the increased use of RDML, new challenges emerge. The flexibility of the RDML format resulted in some implementations that did not meet the consortium's expectations regarding the level of support or the use of elements. Results: In the current RDML version 1.2, the description of the elements was sharpened. The open source editor RDML-Ninja was released (http://sourceforge.net/projects/qpcr-ninja/). RDML-Ninja allows users to visualize, edit and validate RDML files and thus clarifies the use of RDML elements. Furthermore, RDML-Ninja serves as the reference implementation for RDML and enables migration between RDML versions independent of the instrument software. The database RDMLdb will serve as an online repository for RDML files and facilitate the exchange of RDML data (http://www.rdmldb.org). Authors can upload their RDML files and reference them in publications by the unique identifier provided by RDMLdb. The MIQE guidelines propose a rich set of information required to document each qPCR run. RDML provides the vehicle to store and maintain this information, and current development aims at further integration of MIQE requirements into the RDML format. Conclusions: The editor RDML-Ninja and the database RDMLdb enable scientists to evaluate and exchange qPCR data in the instrument-independent RDML format. We are confident that this infrastructure will lay the foundation for standardized qPCR data exchange among scientists, research groups, and during publication.

  • BMC Bioinformatics 2015, null:197
  • 9 years ago

LFQC: A lossless compression algorithm for FASTQ files

Motivation: Next-Generation Sequencing (NGS) technologies have revolutionized genomic research by reducing the cost of whole-genome sequencing. One of the biggest challenges posed by modern sequencing technology is the economical storage of NGS data. Storing raw data is infeasible because of its enormous size and high redundancy. In this paper we address the problem of storing and transmitting large FASTQ files using innovative compression techniques.

Results: We introduce a new lossless, non-reference-based FASTQ compression algorithm named LFQC. We have compared our algorithm with other state-of-the-art big data compression algorithms, namely gzip, bzip2, fastqz (Bonfield and Mahoney (2013)), fqzcomp (Bonfield and Mahoney (2013)), Quip (Jones et al. (2012)) and DSRC2 (Roguski and Deorowicz (2014)). This comparison reveals that our algorithm achieves better compression ratios on LS454 and SOLiD datasets.
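The abstract does not describe LFQC's internals. The Python sketch below illustrates one idea common to non-reference FASTQ compressors and makes no claim about the ratios reported above. It splits records into separate header, sequence, separator and quality streams, compresses each stream with the general-purpose `lzma` module, and checks that decompression is lossless. The helper names are hypothetical, and this is not LFQC's algorithm. The sketch assumes newline-terminated, ASCII, 4-line records with `\n` line endings.

```python
import lzma


def split_streams(fastq_text):
    """Split newline-terminated 4-line FASTQ records into four parallel streams."""
    if not fastq_text.endswith("\n"):
        raise ValueError("expected newline-terminated FASTQ text")
    lines = fastq_text[:-1].split("\n")
    if len(lines) % 4:
        raise ValueError("expected complete 4-line FASTQ records")
    # Streams: headers, bases, '+' separator lines, quality strings.
    return [lines[i::4] for i in range(4)]


def compress_fastq(fastq_text):
    """Compress each stream separately so each sees a more homogeneous alphabet."""
    return [lzma.compress("\n".join(stream).encode("ascii"))
            for stream in split_streams(fastq_text)]


def decompress_fastq(blobs):
    """Invert compress_fastq by re-interleaving the four decompressed streams."""
    streams = [lzma.decompress(b).decode("ascii").split("\n") for b in blobs]
    out = []
    for record in zip(*streams):
        out.extend(record)
    return "\n".join(out) + "\n"


# Usage: a two-record toy file.
fastq = ("@read1\nACGTACGTAC\n+\nIIIIIHHHHH\n"
         "@read2\nACGTTCGTAA\n+\nIIIHHHHHGG\n")
blobs = compress_fastq(fastq)
assert decompress_fastq(blobs) == fastq  # lossless round trip
print(sum(len(b) for b in blobs), len(lzma.compress(fastq.encode("ascii"))))
```

On such a tiny input the per-stream container overhead dominates. Any meaningful size comparison would need real FASTQ files.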

Availability: The implementations are freely available for noncommercial purposes. They can be downloaded from http://engr.uconn.edu/~rajasek/lfqc-v1.1.zip.

Contact: rajasek@engr.uconn.edu

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

Chimira: Analysis of small RNA Sequencing data and microRNA modifications

Summary: Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size-selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, Chimira can identify epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3’-modifications (e.g. uridylation, adenylation), 5’-modifications, and internal modifications or variations (ADAR editing or SNPs). Besides cleaning input sequences and mapping them to miRNAs (Griffiths-Jones et al., 2008), Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary material). These tools support three analyses: visual study of the differential expression between two specific samples or sets of samples; identification of the most highly expressed miRNAs within sample pairs (or sets of samples); and projection of the modification profile of specific miRNAs across all samples. Other tools for various types of small RNA-Seq analysis have been published previously, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, as well as CPSS for modification identification. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary material. Chimira outperforms all of these tools in total execution speed. It aims to facilitate simple, fast and reliable analysis of small RNA-Seq data, and for the first time it allows identification of global microRNA modification profiles in a simple, intuitive interface.
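The Python sketch below walks through the cleaning, size-selection and mapping steps described above. It extends them to report untemplated 3' additions; uridylation, for example, appears as T in DNA-alphabet reads. It assumes exact string matching, a known 3' adapter and a single hairpin. The adapter, the hairpin, the length limits and the function names are all hypothetical, and Chimira's actual pipeline may differ.

```python
def trim_adapter(read, adapter, min_overlap=6):
    """Cut the 3' adapter: a full copy anywhere, else a partial adapter prefix at the read end."""
    i = read.find(adapter)
    if i != -1:
        return read[:i]
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def map_with_tail(insert, hairpin, min_match=16):
    """Exact-match the longest insert prefix to the hairpin; the rest is an untemplated 3' tail."""
    for end in range(len(insert), min_match - 1, -1):
        pos = hairpin.find(insert[:end])
        if pos != -1:
            return pos, insert[end:]
    return None  # unmapped


def analyse_read(read, adapter, hairpin, min_len=18, max_len=26):
    """Clean, size-select and map one read; returns (hairpin offset, 3' tail) or None."""
    insert = trim_adapter(read, adapter)
    if not min_len <= len(insert) <= max_len:
        return None  # removed by size selection
    return map_with_tail(insert, hairpin)


# Usage with made-up sequences.
ADAPTER = "TGGAATTCTCGG"               # hypothetical 3' adapter
MATURE = "TGAGGTAGTAGGTTGTATAGTT"       # hypothetical mature miRNA (DNA alphabet)
HAIRPIN = "AAAC" + MATURE + "GCCC"      # hypothetical hairpin context
print(analyse_read(MATURE + "TT" + ADAPTER, ADAPTER, HAIRPIN))  # (4, 'TT'): two untemplated U/T
print(analyse_read(MATURE + "TGGAAT", ADAPTER, HAIRPIN))        # (4, ''): partial adapter trimmed
```

A tail nucleotide that happens to match the hairpin is counted as templated here. That ambiguity is inherent to exact matching against the precursor.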

Availability: Chimira has been developed as a web application and is accessible at http://wwwdev.ebi.ac.uk/enright-srv/chimira.

Contact: aje@ebi.ac.uk

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

An Efficient Bayesian Inference Framework for Coalescent-Based Nonparametric Phylodynamics

Motivation: The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been used extensively in many areas of biology, but is particularly useful for studying the spread of quickly evolving infectious disease agents, e.g., influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of randomly sampled individuals from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on the population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. While this approach is quite powerful, large data sets collected during infectious disease surveillance challenge the state of the art of Bayesian phylodynamics and demand inferential methods with relatively low computational cost.
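The coalescent density referred to above takes the following form for the standard Kingman coalescent with isochronous sampling (all $n$ samples taken at time 0) and a time-varying effective population size $N_e(t)$. Here $t_k$ is the time, measured backwards from the present, at which the number of lineages drops from $k$ to $k-1$, so that $0 = t_{n+1} < t_n < \dots < t_2$. The term $C_k = \binom{k}{2}$ is the number of lineage pairs while $k$ lineages remain:

$$
p(t_2,\dots,t_n \mid N_e) = \prod_{k=2}^{n} \frac{C_k}{N_e(t_k)} \exp\!\left(-\int_{t_{k+1}}^{t_k} \frac{C_k}{N_e(t)}\,dt\right)
$$

A Gaussian process prior is then commonly placed on $\log N_e(t)$ to keep $N_e$ positive. This transformation is a standard modelling choice rather than something stated in the abstract. Heterochronous sampling, typical of surveillance data, adds bookkeeping of sampling times that is omitted here.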

Results: To satisfy this demand, we provide a computationally efficient Bayesian inference framework for coalescent process models based on Hamiltonian Monte Carlo. Moreover, we show that splitting the Hamiltonian function can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on the elliptical slice sampler and the Metropolis-adjusted Langevin algorithm.
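The sketch below shows a generic Hamiltonian Monte Carlo transition in plain Python: identity mass matrix, leapfrog integrator, Metropolis accept/reject. It is a textbook HMC step, not the authors' split-Hamiltonian implementation. The function `hmc_step` and the Gaussian target are illustrative assumptions. In the phylodynamic setting, `x` would hold the discretized log population sizes, and `log_p` would combine the coalescent log-likelihood with the Gaussian process log-prior.

```python
import math
import random


def hmc_step(x, log_p, grad_log_p, step_size, n_leapfrog, rng):
    """One HMC transition for target density p(x), with identity mass matrix.

    Returns (new_state, accepted_flag).
    """
    p0 = [rng.gauss(0.0, 1.0) for _ in x]  # fresh Gaussian momentum
    x_new = list(x)
    # Leapfrog: half momentum step, alternating full steps, final half step.
    p = [pi + 0.5 * step_size * gi for pi, gi in zip(p0, grad_log_p(x_new))]
    for i in range(n_leapfrog):
        x_new = [xi + step_size * pi for xi, pi in zip(x_new, p)]
        scale = step_size if i < n_leapfrog - 1 else 0.5 * step_size
        p = [pi + scale * gi for pi, gi in zip(p, grad_log_p(x_new))]
    # Hamiltonian H = -log p(x) + |p|^2 / 2; accept with probability min(1, exp(H0 - H1)).
    h0 = -log_p(x) + 0.5 * sum(pi * pi for pi in p0)
    h1 = -log_p(x_new) + 0.5 * sum(pi * pi for pi in p)
    if rng.random() < math.exp(min(0.0, h0 - h1)):
        return x_new, True
    return list(x), False


# Usage: sample a 2-D standard normal, where log p(x) = -|x|^2 / 2 up to a constant.
rng = random.Random(1)
x, draws, accepted = [0.0, 0.0], [], 0
for _ in range(5000):
    x, ok = hmc_step(x, lambda v: -0.5 * sum(a * a for a in v),
                     lambda v: [-a for a in v], 0.2, 10, rng)
    accepted += ok
    draws.append(x[0])
print(sum(draws) / len(draws), accepted / 5000)
```

The gradient-guided trajectories let each proposal move far while keeping a high acceptance rate. This is the property that makes HMC attractive for the high-dimensional latent trajectories of Gaussian process models.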

Availability: The R code for all simulation studies and real data analyses conducted in this paper is publicly available at http://www.ics.uci.edu/~slan/lanzi/CODES.html and in the R package phylodyn, available at https://github.com/mdkarcher/phylodyn.

Contact: Shiwei Lan – S.Lan@warwick.ac.uk, Babak Shahbaba – babaks@uci.edu

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

GenomeD3Plot: A library for rich, interactive visualizations of genomic data in web applications

Motivation: A simple static image of genomes and associated metadata is very limiting, as researchers expect rich, interactive tools similar to the web applications found in the post-Web 2.0 world. GenomeD3Plot is a lightweight visualization library written in JavaScript using the D3 library. GenomeD3Plot provides a rich API that allows rapid visualization of complex genomic data using a convenient, standards-based JSON configuration file. When integrated into existing web services, GenomeD3Plot allows researchers to interact with data, dynamically alter the view, or even resize or reposition the visualization in their browser window. In addition, GenomeD3Plot has built-in functionality to export any resulting genome visualization in PNG or SVG format for easy inclusion in manuscripts or presentations.

GenomeD3Plot is used in the recently released Islandviewer 3 (www.pathogenomics.sfu.ca/islandviewer/) to visualize predicted genomic islands alongside other genome annotation data. Its features, however, make it more widely applicable to the dynamic visualization of genomic data in general.

Availability: GenomeD3Plot is licensed under the GNU GPL v3 and is available at https://github.com/brinkmanlab/GenomeD3Plot/

Contact: brinkman@sfu.ca

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE