protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences

Summary: Amino acid sequence-derived structural and physicochemical descriptors are widely used to study the structural, functional, expression and interaction profiles of proteins and peptides. We developed protr, a comprehensive R package for generating various numerical representation schemes of proteins and peptides from their amino acid sequences. The package calculates eight descriptor groups comprising 22 types of commonly used descriptors, about 22 700 descriptor values in total. It allows users to select amino acid properties from the AAindex database and to use self-defined properties to construct customized descriptors. For proteochemometric modeling, it calculates six types of scales-based descriptors derived by various dimensionality reduction methods. The protr package also computes similarity scores within a list of proteins, based on either protein sequence alignment or Gene Ontology semantic similarity measures, and calculates profile-based protein features from the position-specific scoring matrix (PSSM). We also developed ProtrWeb, a user-friendly web server for calculating the descriptors implemented in the protr package.
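
Two of the simplest descriptor types in this family are amino acid composition (20 values: the fraction of each standard residue) and dipeptide composition (400 values: the fraction of each ordered pair of adjacent residues). The Python sketch below restates those textbook definitions; it is not protr's R implementation, and it assumes the input sequence contains only the 20 standard amino acids.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each ordered amino acid pair among adjacent residues (400 values).

    Assumes seq has at least two residues.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    n_pairs = len(seq) - 1
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are not counted
            counts[pair] += 1
    return {pair: c / n_pairs for pair, c in counts.items()}


print(amino_acid_composition("ACCA")["C"])  # 0.5
```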

Availability and implementation: The protr package is freely available from CRAN at http://cran.r-project.org/package=protr, and ProtrWeb is freely available at http://protrweb.scbdd.com/.

Contact: oriental-cds@163.com or dasongxu@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • SYSTEMS BIOLOGY

NetExplore: a web server for modeling small network motifs

Motivation: Quantitative and qualitative assessment of biological data often yields small, essential recurrent networks of 3–5 components, called network motifs. Model solutions for such small network motifs are therefore of great interest.

Results: The software package NetExplore was created to generate, classify and analyze solutions for network motifs with up to six components. NetExplore can plot and visualize the phase spaces and bifurcation diagrams of these solutions.

Availability and implementation: The current version of NetExplore has been implemented in Perl-CGI and is accessible at the following locations: http://line.bioinfolab.net/nex/NetExplore.htm and http://nex.autosome.ru/nex/NetExplore.htm.

Contact: dmitri.papatsenko@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • SYSTEMS BIOLOGY

Dizzy-Beats: a Bayesian evidence analysis tool for systems biology

Motivation: Model selection and parameter inference are complex problems of long-standing interest in systems biology. The need to choose between competing models arises often, because the underlying biochemical mechanisms are frequently not fully known and alternative models must therefore be considered. Parameter inference reveals how tightly the data and the model constrain the parameter values.

Results: We report Dizzy-Beats, a graphical Java tool for Bayesian evidence analysis that implements nested sampling, an algorithm that estimates the logarithm of the Bayesian evidence Z together with the moments of the model parameters, thereby addressing two outstanding challenges in systems modelling. The likelihood function is based on the L1 norm because this is generically applicable to replicated time-series data.
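
Nested sampling itself fits in a few dozen lines. The Python sketch below is a generic, textbook version of Skilling's algorithm, not Dizzy-Beats' Java implementation: it assumes a cheap likelihood and draws replacement points by naive rejection from the prior, which is only practical for low-dimensional toy problems. The `l1_log_likelihood` helper is a hypothetical illustration of an L1-norm (Laplace) likelihood. The toy check uses one observation with a uniform prior, for which the exact evidence is 1 - exp(-10), so the estimate of log Z should be close to 0.

```python
import math
import random


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def l1_log_likelihood(residuals, scale):
    """Laplace log-likelihood: a linear penalty on the L1 norm of the residuals."""
    return (-sum(abs(r) for r in residuals) / scale
            - len(residuals) * math.log(2 * scale))


def nested_sampling(log_like, sample_prior, n_live=200, tol=1e-2, seed=0):
    """Estimate log Z with basic nested sampling.

    Replacement points come from naive rejection sampling of the prior.
    """
    rng = random.Random(seed)
    live_ll = [log_like(sample_prior(rng)) for _ in range(n_live)]
    log_z = -math.inf
    log_x_prev = 0.0  # log of the enclosed prior volume; X_0 = 1
    i = 0
    # Stop once the live points can add at most about tol * Z to the evidence.
    while max(live_ll) + log_x_prev >= log_z + math.log(tol):
        i += 1
        worst = min(range(n_live), key=live_ll.__getitem__)
        ll_min = live_ll[worst]
        log_x = -i / n_live  # expected shrinkage: X_i = exp(-i / N)
        log_width = math.log(math.exp(log_x_prev) - math.exp(log_x))
        log_z = logaddexp(log_z, ll_min + log_width)
        log_x_prev = log_x
        # Replace the discarded point with a prior draw above the threshold.
        while True:
            cand_ll = log_like(sample_prior(rng))
            if cand_ll > ll_min:
                live_ll[worst] = cand_ll
                break
    # Share the remaining prior volume equally among the final live points.
    for ll in live_ll:
        log_z = logaddexp(log_z, ll + log_x_prev - math.log(n_live))
    return log_z


# Toy check: uniform prior on [0, 1], one observation y = 0.5, scale 0.05.
log_z = nested_sampling(lambda theta: l1_log_likelihood([0.5 - theta], 0.05),
                        lambda rng: rng.random())
print(log_z)  # should be near log(1 - exp(-10)), i.e. close to 0
```

Posterior moments, which the abstract says Dizzy-Beats also reports, would come from keeping the discarded points and weighting each by its contribution to Z; the sketch tracks only likelihood values for brevity.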

Availability and implementation: http://sourceforge.net/p/bayesevidence/home/Home/

Contact: s.aitken@ed.ac.uk

  • Bioinformatics
  • 10 years ago
  • SYSTEMS BIOLOGY

TIMMA-R: an R package for predicting synergistic multi-targeted drug combinations in cancer cell lines or patient-derived samples

Summary: Network pharmacology-based prediction of multi-targeted drug combinations is becoming a promising strategy to improve anticancer efficacy and safety. We developed a logic-based network algorithm, called Target Inhibition Interaction using Maximization and Minimization Averaging (TIMMA), which predicts the effects of drug combinations based on their binary drug-target interactions and single-drug sensitivity profiles in a given cancer sample. Here, we report the R implementation of the algorithm (TIMMA-R), which is much faster than the original MATLAB code. The major extensions include modeling of multiclass drug-target profiles and network visualization. We also show that the TIMMA-R predictions are robust to the intrinsic noise in the experimental data, thus making it a promising high-throughput tool to prioritize drug combinations in various cancer types for follow-up experimentation or clinical applications.

Availability and implementation: TIMMA-R source code is freely available at http://cran.r-project.org/web/packages/timma/.

Contact: jing.tang@helsinki.fi

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • SYSTEMS BIOLOGY

CypRules: a rule-based P450 inhibition prediction server

Summary: Cytochrome P450 enzymes (CYPs) are the major enzymes involved in drug metabolism and bioactivation. Inhibition models were constructed for five of the most important enzymes of the CYP superfamily in the human liver. The five enzymes chosen for this study, namely CYP1A2, CYP2D6, CYP2C19, CYP2C9 and CYP3A4, account for 90% of xenobiotic and drug metabolism in the human body. CYP enzymes can be inhibited or induced by various drugs or chemical compounds. In this work, we created CypRules, a rule-based online server for CYP inhibition prediction, built on predictive models generated by the rule-based C5.0 algorithm. For each compound uploaded to the server, CypRules predicts CYP inhibition and provides the corresponding structural rulesets. Because it executes quickly, it can be used for virtual high-throughput screening (VHTS) of large sets of test compounds.

Availability and implementation: CypRules is freely accessible at http://cyprules.cmdm.tw/ and models, descriptor and program files for all compounds are publicly available at http://cyprules.cmdm.tw/sources/sources.rar.

Contact: yjtseng@csie.ntu.edu.tw

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • DATA AND TEXT MINING

ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life

Summary: The association of organisms to their environments is a key issue in exploring biodiversity patterns. This knowledge has traditionally been scattered, but textual descriptions of taxa and their habitats are now being consolidated in centralized resources. However, structured annotations are needed to facilitate large-scale analyses. Therefore, we developed ENVIRONMENTS, a fast dictionary-based tagger capable of identifying Environment Ontology (ENVO) terms in text. We evaluate the accuracy of the tagger on a new manually curated corpus of 600 Encyclopedia of Life (EOL) species pages. We use the tagger to associate taxa with environments by tagging EOL text content monthly, and integrate the results into the EOL to disseminate them to a broad audience of users.
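
A dictionary-based tagger of this kind can be sketched as a greedy longest-match lookup over word tokens. The Python example below is an illustration under assumptions, not the ENVIRONMENTS code: the tiny `ENV_DICT` is hypothetical and maps surface forms to term names only, whereas a real tagger would load ENVO term names and synonyms together with their identifiers and handle far more orthographic variation.

```python
import re

# Hypothetical mini-dictionary: surface form -> environment term name.
ENV_DICT = {
    "coral reef": "coral reef",
    "coral reefs": "coral reef",
    "freshwater": "freshwater",
    "fresh water": "freshwater",
    "soil": "soil",
    "sea": "sea",
}


def tag_environments(text, dictionary=ENV_DICT):
    """Return (start, end, term) for greedy longest dictionary matches, case-insensitive."""
    tokens = [(m.start(), m.end(), m.group().lower())
              for m in re.finditer(r"[A-Za-z]+", text)]
    max_len = max(len(key.split()) for key in dictionary)
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for _, _, tok in tokens[i:i + n])
            if phrase in dictionary:
                hits.append((tokens[i][0], tokens[i + n - 1][1], dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no match starts here; move to the next token
    return hits


print(tag_environments("Found on coral reefs and in fresh water near the sea."))
```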

Availability and implementation: The software and the corpus are available under the open-source BSD and the CC-BY-NC-SA 3.0 licenses, respectively, at http://environments.hcmr.gr

Contact: pafilis@hcmr.gr or lars.juhl.jensen@cpr.ku.dk

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • DATA AND TEXT MINING

SPARQL-enabled identifier conversion with Identifiers.org

Motivation: On the semantic web, and in the life sciences in particular, data are often distributed across multiple resources. Each of these sources is likely to use its own Internationalized Resource Identifier (IRI) for what is conceptually the same resource or database record. The lack of correspondence between identifiers is a barrier to executing federated SPARQL queries across life science data.

Results: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate multiple identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data.
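
The variant-generation idea can be sketched as follows: strip a known URI prefix to recover the local identifier, then re-attach every other prefix registered for the same collection. This Python example is an illustration under assumptions, not the Identifiers.org service: the `PREFIXES` table is a hypothetical, hand-written stand-in for the patterns the service reads from the Registry.

```python
# Hypothetical prefix table; the real service derives these from the Registry.
PREFIXES = {
    "uniprot": [
        "http://identifiers.org/uniprot/",
        "http://purl.uniprot.org/uniprot/",
        "http://www.uniprot.org/uniprot/",
    ],
}


def identifier_variants(uri, prefixes=PREFIXES):
    """Return every known URI form for the record that uri points to."""
    for forms in prefixes.values():
        for prefix in forms:
            if uri.startswith(prefix):
                local_id = uri[len(prefix):]
                return [p + local_id for p in forms]
    return [uri]  # unknown scheme: nothing to convert


for variant in identifier_variants("http://purl.uniprot.org/uniprot/P12345"):
    print(variant)
```

The resulting list could, for instance, be supplied to a SPARQL 1.1 `VALUES` block so that a federated query matches whichever URI form each endpoint uses.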

Availability and implementation: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql.

Contact: sarala@ebi.ac.uk

  • Bioinformatics
  • 10 years ago
  • DATABASES AND ONTOLOGIES

Deterministic identification of specific individuals from GWAS results

Motivation: Genome-wide association studies (GWASs) are commonly applied to human genomic data to identify the gene combinations statistically associated with certain diseases. Patients involved in these GWASs could be re-identified when the studies release statistical information on a large number of single-nucleotide polymorphisms. Subsequent work, however, found that such privacy attacks, though theoretically possible, are unsuccessful and unconvincing in real settings.

Results: We derive the first practical privacy attack that can successfully identify specific individuals from limited published associations from the Wellcome Trust Case Control Consortium (WTCCC) dataset. For GWAS results computed over 25 randomly selected loci, our algorithm always pinpoints at least one patient from the WTCCC dataset. Moreover, the number of re-identified patients grows rapidly with the number of published genotypes. Finally, we discuss prevention methods to disable the attack, thus providing a solution for enhancing patient privacy.

Availability and implementation: Proofs of the theorems and additional experimental results are available in the online supporting documents. The attack algorithm code is publicly available at https://sites.google.com/site/zhangzhenjie/GWAS_attack.zip. The genomic dataset used in the experiments is available on request at http://www.wtccc.org.uk/.

Contact: winslett@illinois.edu or zhenjie@adsc.com.sg

Supplementary information: Supplementary data are available from Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • GENOME ANALYSIS

CNOGpro: detection and quantification of CNVs in prokaryotic whole-genome sequencing data

Motivation: The explosion of whole-genome sequencing (WGS) as a tool for mapping and understanding genomes has been accompanied by an equally rapid proliferation of tools and pipelines for analyzing DNA copy number variation (CNV). Most currently available tools are designed specifically for human genomes, and comparatively little literature is devoted to CNVs in prokaryotic organisms. However, prokaryotic WGS data have several idiosyncrasies. This work proposes a step-by-step approach for detecting and quantifying copy number variants specifically in prokaryotes.

Results: After aligning WGS reads to a reference genome, we count the reads in a sliding window and normalize these counts for the bias introduced by differences in GC content. We then investigate the coverage in two fundamentally different ways: (i) with a hidden Markov model and (ii) by repeated sampling with replacement (bootstrapping) on each individual gene. The latter bypasses the complex problem of breakpoint determination. To demonstrate our method, we apply it to real and simulated WGS data and benchmark it against two popular methods for CNV detection. In some cases, the proposed methodology is significantly more accurate than other current methods.
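
The GC normalization and per-gene bootstrap can be sketched in Python. CNOGpro itself is written in R, and the normalization shown here (rescaling each window so that every GC-content bin shares the genome-wide median count) is a common scheme assumed for illustration rather than CNOGpro's documented method; the hidden Markov model route is not sketched. The `single_copy_depth` argument, the expected count for a single-copy window, is likewise an assumed input.

```python
import random
import statistics


def gc_bin(gc, n_bins):
    """Map a GC fraction in [0, 1] to one of n_bins equal-width bins."""
    return min(int(gc * n_bins), n_bins - 1)


def gc_normalize(counts, gc_fractions, n_bins=10):
    """Rescale window read counts so every GC bin shares the genome-wide median."""
    global_median = statistics.median(counts)
    by_bin = {}
    for c, gc in zip(counts, gc_fractions):
        by_bin.setdefault(gc_bin(gc, n_bins), []).append(c)
    bin_median = {b: statistics.median(v) for b, v in by_bin.items()}
    normalized = []
    for c, gc in zip(counts, gc_fractions):
        m = bin_median[gc_bin(gc, n_bins)]
        normalized.append(c * global_median / m if m > 0 else float(c))
    return normalized


def bootstrap_copy_number(gene_counts, single_copy_depth, n_boot=1000, seed=0):
    """Percentile 95% interval for a gene's copy number by resampling its windows."""
    rng = random.Random(seed)
    estimates = sorted(
        statistics.fmean(rng.choices(gene_counts, k=len(gene_counts))) / single_copy_depth
        for _ in range(n_boot)
    )
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


print(gc_normalize([100, 200, 100, 200], [0.3, 0.6, 0.3, 0.6]))  # [150.0, 150.0, 150.0, 150.0]
```

Because the interval comes from resampling the windows inside one gene, no breakpoint has to be located, which matches the motivation given above for the bootstrap route.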

Availability and implementation: CNOGpro is written entirely in the R programming language and is available from the CRAN repository (http://cran.r-project.org) under the GNU General Public License.

Contact: ola.brynildsrud@nmbu.no

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • GENOME ANALYSIS

GPU MrBayes V3.1: MrBayes on graphics processing units for protein sequence data

We present a modified GPU version of MrBayes, called ta(MC)3 (GPU MrBayes V3.1), for Bayesian phylogenetic inference on protein datasets. Our main contributions are (a) using 64-bit variables, which enables ta(MC)3 to process larger datasets than MrBayes, and (b) using Kahan summation to improve accuracy, convergence rates and, consequently, runtime. Compared with the fastest current software, we achieve a speedup of up to about 2.5 (and up to about 90 over serial MrBayes), with further gains on multi-GPU hardware. GPU MrBayes V3.1 is available from http://sourceforge.net/projects/mrbayes-gpu/.
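
Kahan summation is short enough to show in full. The Python sketch below is the textbook algorithm, not the ta(MC)3 GPU code; in a likelihood computation the summed values might be, for example, per-site log-likelihoods (an assumption here). The example adds one large value and many tiny ones, where plain left-to-right addition loses every small term.

```python
import math


def naive_sum(values):
    """Plain left-to-right summation (an explicit loop, because Python 3.12+
    already compensates inside the built-in sum() for floats)."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation  # correct the next term by the previous error
        t = total + y  # low-order bits of y may be lost here
        compensation = (t - total) - y  # algebraically zero; captures the rounding error
        total = t
    return total


values = [1.0] + [1e-16] * 10  # each 1e-16 is below half an ulp of 1.0
print(naive_sum(values))  # 1.0: every small term is rounded away
print(kahan_sum(values))  # close to the correctly rounded result below
print(math.fsum(values))  # correctly rounded reference from the standard library
```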

  • Molecular Biology and Evolution
  • 10 years ago
  • Brief Communication

A Coevolutionary Arms Race between Hosts and Viruses Drives Polymorphism and Polygenicity of NK Cell Receptors

Natural killer cell receptors (NKRs) monitor the expression of major histocompatibility class I (MHC-I) and stress molecules to detect unhealthy tissue, such as infected or tumor cells. The NKR gene family shows a remarkable genetic diversity, containing several genes encoding receptors with activating and inhibiting signaling, and varying in gene content and allelic polymorphism. The expansion of the NKR genes is species-specific, with different species evolving alternative expanded NKR genes, which encode structurally different proteins, yet perform comparable functions. So far, the biological function of this expansion within the NKR cluster has remained poorly understood. To study the evolution of NKRs, we have developed an agent-based model implementing a coevolutionary scenario between hosts and herpes-like viruses that are able to evade the immune response by downregulating the expression of MHC-I on the cell surface. We show that hosts evolve specific inhibitory NKRs, specialized to particular MHC-I alleles in the population. Viruses in our simulations readily evolve proteins mimicking the MHC molecules of their host, even in the absence of MHC-I downregulation. As a result, the NKR locus becomes polygenic and polymorphic, encoding both specific inhibiting and activating receptors to optimally protect the hosts from coevolving viruses.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Evolution Is an Experiment: Assessing Parallelism in Crop Domestication and Experimental Evolution: (Nei Lecture, SMBE 2014, Puerto Rico)

In this commentary, I make inferences about the level of repeatability and constraint in the evolutionary process, based on two sets of replicated experiments. The first experiment is crop domestication, which has been replicated across many different species. I focus on results of whole-genome scans for genes selected during domestication and ask whether genes are, in fact, selected in parallel across different domestication events. If genes are selected in parallel, it implies that the number of genetic solutions to the challenge of domestication is constrained. However, I find no evidence for parallel selection events either between species (maize vs. rice) or within species (two domestication events within beans). These results suggest that there are few constraints on genetic adaptation, but conclusions must be tempered by several complicating factors, particularly the lack of explicit design standards for selection screens. The second experiment involves the evolution of Escherichia coli to thermal stress. Unlike domestication, this highly replicated experiment detected a limited set of genes that appear prone to modification during adaptation to thermal stress. However, the number of potentially beneficial mutations within these genes is large, such that adaptation is constrained at the genic level but much less so at the nucleotide level. Based on these two experiments, I make the general conclusion that evolution is remarkably flexible, despite the presence of epistatic interactions that constrain evolutionary trajectories. I also posit that evolution is so rapid that we should establish a Speciation Prize, to be awarded to the first researcher who demonstrates speciation with a sexual organism in the laboratory.

  • Molecular Biology and Evolution
  • 10 years ago
  • Editorial

Key Role of Amino Acid Repeat Expansions in the Functional Diversification of Duplicated Transcription Factors

The high regulatory complexity of vertebrates has been related to two rounds of whole genome duplication (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized for diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families that originated during the 2R-WGD contain LCRs. Of the 550 gene duplicates analyzed, 155 (28%) contain LCRs, about twice the proportion among single-copy genes (15 of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.
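
A quick check of the quoted figures shows that "about twice" compares the proportions of genes with LCRs, not the raw counts:

```latex
\frac{155}{550} \approx 0.282, \qquad
\frac{15}{115} \approx 0.130, \qquad
\frac{155/550}{15/115} = \frac{155 \times 115}{550 \times 15} = \frac{17825}{8250} \approx 2.16
```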

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Relaxed Observance of Traditional Marriage Rules Allows Social Connectivity without Loss of Genetic Diversity

Marriage rules, the community prescriptions that dictate who an individual can or cannot marry, are extremely diverse and universally present in traditional societies. A major focus of research in the early decades of modern anthropology, marriage rules impose social and economic forces that help structure societies and forge connections between them. However, in those early anthropological studies, the biological benefits or disadvantages of marriage rules could not be determined. We revisit this question by applying a novel simulation framework and genome-wide data to explore the effects of Asymmetric Prescriptive Alliance, an elaborate set of marriage rules that has been a focus of research for many anthropologists. Simulations show that strict adherence to these marriage rules reduces genetic diversity on the autosomes, X chromosome and mitochondrial DNA, but relaxed compliance produces genetic diversity similar to random mating. Genome-wide data from the Indonesian community of Rindi, one of the early study populations for Asymmetric Prescriptive Alliance, are more consistent with relaxed compliance than strict adherence. We therefore suggest that, in practice, marriage rules are treated with sufficient flexibility to allow social connectivity without significant degradation of biological diversity.

  • Molecular Biology and Evolution
  • 10 years ago
  • Research Article

Real-time single-molecule studies of the motions of DNA polymerase fingers illuminate DNA synthesis mechanisms

DNA polymerases maintain genomic integrity by copying DNA with high fidelity. A conformational change important for fidelity is the motion of the polymerase fingers subdomain from an open to a closed conformation upon binding of a complementary nucleotide. We previously employed intra-protein single-molecule FRET on diffusing molecules to observe fingers conformations in polymerase–DNA complexes. Here, we used the same FRET ruler on surface-immobilized complexes to observe fingers-opening and closing of individual polymerase molecules in real time. Our results revealed the presence of intrinsic dynamics in the binary complex, characterized by slow fingers-closing and fast fingers-opening. When binary complexes were incubated with increasing concentrations of complementary nucleotide, the fingers-closing rate increased, strongly supporting an induced-fit model for nucleotide recognition. Meanwhile, the opening rate in ternary complexes with complementary nucleotide was 6 s⁻¹, much slower than either fingers closing or the rate-limiting step in the forward direction; this rate balance ensures that, after nucleotide binding and fingers-closing, nucleotide incorporation is overwhelmingly likely to occur. Our results for ternary complexes with a non-complementary dNTP confirmed the presence of a state corresponding to partially closed fingers and suggested a radically different rate balance regarding fingers transitions, which allows polymerase to achieve high fidelity.
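
The rate balance can be made concrete under a standard simplifying assumption that the abstract does not state: once the fingers close on a complementary nucleotide, reopening (rate constant k_open = 6 s⁻¹) and the forward step (a rate constant k_fwd whose value is not given here) compete as first-order processes. The fraction of closed complexes that go on to incorporate is then:

```latex
P_{\mathrm{incorporation}} = \frac{k_{\mathrm{fwd}}}{k_{\mathrm{fwd}} + k_{\mathrm{open}}},
\qquad k_{\mathrm{open}} = 6\ \mathrm{s}^{-1},
\qquad \frac{1}{k_{\mathrm{open}}} \approx 0.17\ \mathrm{s}
```

Here 1/k_open is the mean waiting time for reopening considered on its own. When k_fwd is much larger than k_open, P approaches 1; a purely hypothetical k_fwd = 60 s⁻¹, for instance, would give P = 60/66 ≈ 0.91.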

  • Nucleic Acids Research
  • 10 years ago
  • Genome Integrity, Repair and Replication

Synthesis and triplex-forming properties of oligonucleotides capable of recognizing corresponding DNA duplexes containing four base pairs

A triplex-forming oligonucleotide (TFO) could be a useful molecular tool for gene therapy and specific gene modification. However, unmodified TFOs have two serious drawbacks: low binding affinity and high sequence dependence. In this paper, we propose a strategy that uses a new set of modified nucleobases for four-base recognition by TFOs, thereby overcoming these two drawbacks. TFOs containing a 2’-deoxy-4N-(2-guanidoethyl)-5-methylcytidine (dgC) residue for a C-G base pair have higher binding and base recognition abilities than those containing 2’-OMe-4N-(2-guanidoethyl)-5-methylcytidine (2’-OMegC), 2’-OMe-4N-(2-guanidoethyl)-5-methyl-2-thiocytidine (2’-OMegCs), dgC and 4S-(2-guanidoethyl)-4-thiothymidine (gsT). Further, we observed that N-acetyl-2,7-diamino-1,8-naphthyridine (DANac) has higher binding and base recognition abilities for a T-A base pair than dG and the other DNA derivatives. On the basis of these findings, we synthesized a fully modified TFO containing DANac, dgC, 2’-OMe-2-thiothymidine (2’-OMesT) and 2’-OMe-8-thioxoadenosine (2’-OMesA) with high binding and base recognition abilities. To the best of our knowledge, this is the first report of a fully modified TFO that accurately recognizes a complementary DNA duplex with a mixed sequence under neutral conditions.

  • Nucleic Acids Research
  • 10 years ago
  • Chemical Biology and Nucleic Acid Chemistry

The Cas6e ribonuclease is not required for interference and adaptation by the E. coli type I-E CRISPR-Cas system

CRISPR-Cas systems are small RNA-based adaptive immune systems of prokaryotes that protect cells from foreign DNA or RNA. Type I CRISPR-Cas systems are composed of a multiprotein complex (Cascade) that, when bound to CRISPR RNA (crRNA), can recognize double-stranded DNA targets and recruit the Cas3 nuclease to destroy target-containing DNA. In the Escherichia coli type I-E CRISPR-Cas system, crRNAs are generated from transcripts of CRISPR arrays, which consist of multiple palindromic repeats and intervening spacers, by the Cas6e endoribonuclease, which cleaves the CRISPR array transcript at specific positions within the repeat sequences. Cas6e is also a component of Cascade. Here, we show that when mature unit-sized crRNAs are provided in a Cas6e-independent manner by transcription termination, the CRISPR-Cas system can function without Cas6e. These results should allow facile interrogation of various targets by the type I-E CRISPR-Cas system in E. coli using unit-sized crRNAs generated by transcription.

  • Nucleic Acids Research
  • 10 years ago
  • Molecular Biology

Quantitative characterization of protein-protein complexes involved in base excision DNA repair

Base excision repair (BER) efficiently corrects the most common types of DNA damage in mammalian cells. Step-by-step coordination of BER is facilitated by multiple interactions between the enzymes and accessory proteins involved. Here we quantitatively characterize a number of complexes formed by DNA polymerase β (Polβ), apurinic/apyrimidinic endonuclease 1 (APE1), poly(ADP-ribose) polymerase 1 (PARP1), X-ray repair cross-complementing protein 1 (XRCC1) and tyrosyl-DNA phosphodiesterase 1 (TDP1), using fluorescence- and light scattering-based techniques. Direct physical interactions between the APE1-Polβ, APE1-TDP1, APE1-PARP1 and Polβ-TDP1 pairs have been detected and characterized for the first time. The combined results provide strong evidence that the most stable complex is formed between XRCC1 and Polβ. Model DNA intermediates of BER are shown to induce significant rearrangement of the Polβ complexes with XRCC1 and PARP1, while having no detectable influence on the protein–protein binding affinities. The strength of the APE1 interaction with Polβ, XRCC1 and PARP1 is shown to be modulated by BER intermediates to different extents, depending on the type of DNA damage. The affinity of APE1 for Polβ is higher in the complex with abasic site-containing DNA than after the APE1-catalyzed incision. Our findings advance understanding of the molecular mechanisms underlying coordination and regulation of the BER process.

  • Nucleic Acids Research
  • 10 years ago
  • Genome Integrity, Repair and Replication

DeAnnCNV: a tool for online detection and annotation of copy number variations from whole-exome sequencing data

As costs have decreased, whole-exome sequencing (WES) has become a very popular and powerful tool for identifying genetic variants underlying human diseases. However, integrated tools that precisely detect and systematically annotate copy number variations (CNVs) from WES data are still in great demand. Here, we present an online tool, DeAnnCNV (Detection and Annotation of Copy Number Variations from WES data), to meet these needs of WES users. After users submit a file generated from their WES data by an in-house tool that can be downloaded from our server, DeAnnCNV detects CNVs in each sample and extracts the CNVs shared among multiple samples. DeAnnCNV also provides useful supporting information for the detected CNVs and associated genes to help users find potential candidates for further experimental study. The web server is implemented in PHP + Perl + MATLAB and is freely available online to all users at http://mcg.ustc.edu.cn/db/cnv/.
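
Extracting CNVs shared among samples amounts to intersecting interval sets. DeAnnCNV is implemented in PHP, Perl and MATLAB; the Python sketch below only illustrates one plausible approach and assumes that each call is a `(chrom, start, end, kind)` tuple with half-open coordinates, that calls within one sample do not overlap, and that shared calls must agree in CNV type.

```python
from functools import reduce


def intersect(calls_a, calls_b):
    """Regions covered by a call in both lists with the same chromosome and CNV type."""
    shared = []
    for chrom_a, start_a, end_a, kind_a in calls_a:
        for chrom_b, start_b, end_b, kind_b in calls_b:
            if chrom_a == chrom_b and kind_a == kind_b:
                start, end = max(start_a, start_b), min(end_a, end_b)
                if start < end:  # the half-open intervals [start, end) overlap
                    shared.append((chrom_a, start, end, kind_a))
    return sorted(shared)


def shared_cnvs(samples):
    """CNV regions present, with the same type, in every sample."""
    return reduce(intersect, samples)


sample_1 = [("chr1", 100, 500, "del"), ("chr2", 0, 300, "dup")]
sample_2 = [("chr1", 200, 600, "del"), ("chr2", 100, 200, "del")]
sample_3 = [("chr1", 150, 400, "del")]
print(shared_cnvs([sample_1, sample_2, sample_3]))  # [('chr1', 200, 400, 'del')]
```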

  • Nucleic Acids Research
  • 10 years ago
  • Web Server Issue

Nucleotidyl transferase assisted DNA labeling with different click chemistries

Here, we present a simple, modular and efficient strategy for the 3'-terminal labeling of DNA, regardless of whether it has been chemically or enzymatically synthesized or isolated from natural sources. We first incorporate a range of modified nucleotides at the 3'-terminus using terminal deoxynucleotidyl transferase. In the second step, we convert the incorporated nucleotides using any of four highly efficient click chemistry-type reactions, namely copper-catalyzed azide-alkyne cycloaddition, strain-promoted azide-alkyne cycloaddition, Staudinger ligation or inverse-electron-demand Diels-Alder reaction. Moreover, we create internal modifications by ligation or primer extension after the nucleotidyl transferase step and before the click reaction. We further study the influence of linker variants on the reactivity of azides in different click reactions. We find that different click reactions exhibit distinct substrate preferences, a fact that is often overlooked but should be considered when labeling oligonucleotides or other biomolecules with click chemistry. Finally, our findings allowed us to extend our previously published RNA labeling strategy to a different copper-free click chemistry, namely the Staudinger ligation.

  • Nucleic Acids Research
  • 10 years ago
  • Methods Online

PatternQuery: web application for fast detection of biomacromolecular structural patterns in the entire Protein Data Bank

Well-defined biomacromolecular patterns, such as binding sites, catalytic sites and specific protein or nucleic acid sequences, precisely modulate many important biological phenomena. We introduce PatternQuery, a web-based application designed for the detection and fast extraction of such patterns. The application uses a unique query language with Python-like syntax to define the patterns to be extracted from datasets provided by the user or from the entire Protein Data Bank (PDB). Moreover, the database-wide search can be restricted by a variety of criteria, such as PDB ID, resolution and organism of origin, to provide only relevant data. Extraction generally takes a few seconds for several hundred entries and up to approximately one hour for the whole PDB. The detected patterns can be downloaded for further processing and are also presented in clear tabular and graphical form directly in the browser. The unique design of the language and the provided service could pave the way towards novel PDB-wide analyses that were difficult or infeasible in the past. The application is available free of charge at http://ncbr.muni.cz/PatternQuery.

  • Nucleic Acids Research
  • 10 years ago
  • Web Server Issue

A mass spectrometry-based method for comprehensive quantitative determination of post-transcriptional RNA modifications: the complete chemical structure of Schizosaccharomyces pombe ribosomal RNAs

We present a liquid chromatography–mass spectrometry (LC-MS)-based method for comprehensive quantitative identification of post-transcriptional modifications (PTMs) of RNA. We incorporated an in vitro-transcribed, heavy isotope-labeled reference RNA into a sample RNA solution, digested the mixture with a number of RNases and detected the post-transcriptionally modified oligonucleotides quantitatively based on shifts in retention time and the MS signal in subsequent LC-MS. This allowed the determination and quantitation of all PTMs in Schizosaccharomyces pombe ribosomal (r)RNAs and generated the first complete PTM maps of eukaryotic rRNAs at single-nucleotide resolution. There were 122 modified sites, most of which appear to be located at the interface between the ribosomal subunits, where translation takes place. We also identified PTMs at specific locations in rRNAs whose levels changed in response to the growth conditions of the yeast cells, suggesting that the cells coordinately regulate RNA modification levels.

  • Nucleic Acids Research
  • 10 years ago
  • Methods Online

PopIns: population-scale detection of novel sequence insertions

Motivation: The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, i.e. insertions without similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in detecting them from short-read sequencing data, which inherently involves de novo assembly. De novo assembly is not only computationally challenging but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase the power to detect novel insertions.

Results: We have developed the program PopIns, which can discover and characterize non-reference insertions of 100 bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach.

Availability and implementation: The source code of PopIns is available from http://github.com/bkehr/popins.

Contact: birte.kehr@decode.is

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER