tcR: an R package for T cell receptor repertoire advanced data analysis

Background: Immunoglobulins (IG) and T cell receptors (TR) play a key role in antigen recognition during the adaptive immune response. Recent progress in next-generation sequencing technologies has made deep profiling of T cell receptor repertoires possible. However, the rational analysis of the massive data generated by next-generation sequencing requires specialised software.

Results: Here we introduce tcR, a new R package that provides a platform for the advanced analysis of T cell receptor repertoires. It includes diversity measures, identification of shared T cell receptor sequences, computation of gene usage statistics and other widely used methods. The tool has proven its utility in recent research studies.

Conclusions: tcR is an R package for the advanced analysis of T cell receptor repertoires after the primary TR sequences have been extracted from raw sequencing reads. The stable version can be installed directly from the Comprehensive R Archive Network (http://cran.r-project.org/mirrors.html). The source code and development version are available at the tcR GitHub page (http://imminfo.github.io/tcr/), along with full documentation and typical usage examples.

  • BMC Bioinformatics 2015, null:175
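
The following Python sketch illustrates two of the computations listed for tcR: diversity measures (Shannon entropy and the inverse Simpson index) and identification of sequences shared between two repertoires by exact matching. It is not tcR's API, since tcR is an R package. The sketch assumes that a repertoire is summarized as read counts per CDR3 amino-acid sequence. The function names and example sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Set


def shannon_entropy(counts: Dict[str, int]) -> float:
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def inverse_simpson(counts: Dict[str, int]) -> float:
    """Inverse Simpson index 1 / sum(p_i^2), an 'effective number' of clonotypes."""
    total = sum(counts.values())
    return 1.0 / sum((c / total) ** 2 for c in counts.values())


def shared_sequences(rep_a: Dict[str, int], rep_b: Dict[str, int]) -> Set[str]:
    """CDR3 sequences observed in both repertoires (exact string match)."""
    return set(rep_a) & set(rep_b)


if __name__ == "__main__":
    # Hypothetical CDR3 amino-acid sequences, one entry per sequencing read.
    rep_a = Counter(["CASSLGQETQYF", "CASSLGQETQYF", "CASSPDRGNTEAFF", "CASSPDRGNTEAFF"])
    rep_b = Counter(["CASSLGQETQYF", "CAWSVGTEAFF"])
    print(round(shannon_entropy(rep_a), 3))  # 0.693 (two equally abundant clonotypes)
    print(inverse_simpson(rep_a))            # 2.0
    print(shared_sequences(rep_a, rep_b))    # {'CASSLGQETQYF'}
```

Exact matching is the simplest notion of "shared". Real analyses may also compare nucleotide sequences or V-gene assignments.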

A novel hybrid single molecule approach reveals spontaneous DNA motion in the nucleosome

The structural dynamics of nucleic acids and proteins are an important physical basis of their functions. These motions are often very difficult to synchronize and too fast to be clearly resolved with currently available single molecule methods. Here we demonstrate a novel hybrid single molecule approach that combines stochastic data analysis with fluorescence correlation to enable investigations of sub-ms unsynchronized structural dynamics of macromolecules. Using this method, we report the first direct evidence of spontaneous DNA motions at the nucleosome termini. The nucleosome, comprising DNA and a histone core, is the fundamental packing unit of eukaryotic genes, which must be accessed during various genome transactions. Spontaneous DNA opening at the nucleosome termini has long been hypothesized to enable gene access in the nucleosome but has not yet been directly observed. Our approach reveals that DNA termini in the nucleosome open and close repeatedly at rates of 0.1–1 ms⁻¹. The kinetics depend on salt concentration and DNA–histone interactions but only weakly on DNA sequence. This suggests that these dynamics are universal and impose the kinetic limit on gene access. These results clearly demonstrate that our method provides an efficient and robust means to investigate unsynchronized structural changes of DNA at sub-ms time resolution.

  • Nucleic Acids Research
  • Methods online
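
Fluorescence correlation can report unsynchronized opening and closing. A useful idealization is a two-state model, closed ⇌ open, with assumed rate constants k_open and k_close and an assumed signal difference Δs between the two states. The following is the textbook result for a two-state Markov process, not necessarily the analysis the authors used. The autocorrelation of signal fluctuations decays as a single exponential whose rate is the sum of the two rate constants:

```latex
p_{\mathrm{open}} = \frac{k_{\mathrm{open}}}{k_{\mathrm{open}} + k_{\mathrm{close}}}, \qquad
p_{\mathrm{closed}} = 1 - p_{\mathrm{open}}

\langle \delta s(0)\, \delta s(\tau) \rangle
  = (\Delta s)^2 \, p_{\mathrm{open}} \, p_{\mathrm{closed}} \,
    e^{-(k_{\mathrm{open}} + k_{\mathrm{close}})\,\tau}
```

For instance, if both rate constants were 1 ms⁻¹, the correlation would decay with a time constant of 1/(1 + 1) ms = 0.5 ms.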

Finding the positive feedback loops underlying multi-stationarity

Background: Bistability is ubiquitous in biological systems. For example, bistability is found in many reaction networks that control and execute important biological functions, such as signaling processes. Positive feedback loops, composed of species and reactions, are necessary for bistability, and more generally for multi-stationarity, to occur. These loops are therefore often used to illustrate and pinpoint the parts of a multi-stationary network that are relevant (‘responsible’) for the observed multi-stationarity. However, positive feedback loops are generally abundant in reaction networks, and not all of them are important for understanding the network’s dynamics.

Results: We present an automated procedure to determine the relevant positive feedback loops of a multi-stationary reaction network. The procedure reports only the loops that are relevant for multi-stationarity (that is, loops whose breaking makes multi-stationarity disappear), not all positive feedback loops of the network. We show that the relevant positive feedback loops must be understood in the context of the network: a loop might be relevant in one network but unable to create multi-stationarity in another. Finally, we demonstrate the procedure by applying it to several examples of signaling processes, including a ubiquitination network and an apoptosis network, and to models extracted from the BioModels database. The procedure is implemented in Maple.

Conclusions: We have developed and implemented an automated procedure to find relevant positive feedback loops in reaction networks. The results of the procedure are useful for interpreting and summarizing the network’s dynamics.

  • BMC Systems Biology 2015, null:22
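
The paper's Maple procedure decides which loops are *relevant* for multi-stationarity. The Python sketch below illustrates only the underlying notion of a positive feedback loop, and it is not the authors' code. It enumerates simple cycles in a hypothetical signed interaction graph and keeps those whose edge-sign product is +1. The graph encoding and function names are assumptions. The plain depth-first search is exponential in the worst case; for large networks, a dedicated method such as Johnson's cycle-enumeration algorithm would be preferable.

```python
from typing import Dict, List, Tuple

# Hypothetical signed interaction graph: graph[u] = [(v, sign), ...] with sign +1 or -1.
SignedGraph = Dict[str, List[Tuple[str, int]]]


def simple_cycles(graph: SignedGraph) -> List[Tuple[List[str], int]]:
    """Enumerate simple directed cycles together with the product of their edge signs.

    Each cycle is reported once, rooted at its smallest node: the search from
    `start` only extends paths through nodes that sort after `start`.
    """
    cycles: List[Tuple[List[str], int]] = []
    nodes = sorted(set(graph) | {v for edges in graph.values() for v, _ in edges})

    def dfs(start: str, node: str, path: List[str], sign: int) -> None:
        for nxt, edge_sign in graph.get(node, []):
            if nxt == start:
                cycles.append((list(path), sign * edge_sign))
            elif nxt > start and nxt not in path:
                path.append(nxt)
                dfs(start, nxt, path, sign * edge_sign)
                path.pop()

    for start in nodes:
        dfs(start, start, [start], 1)
    return cycles


def positive_loops(graph: SignedGraph) -> List[List[str]]:
    """A feedback loop is positive when the product of its edge signs is +1."""
    return [cycle for cycle, sign in simple_cycles(graph) if sign > 0]


if __name__ == "__main__":
    # Toy network: mutual inhibition X -| Y -| X (positive loop) and Y -> Z -| Y (negative loop).
    g = {"X": [("Y", -1)], "Y": [("X", -1), ("Z", +1)], "Z": [("Y", -1)]}
    print(positive_loops(g))  # [['X', 'Y']]
```

Mutual inhibition counts as a positive loop because two negative signs multiply to +1. This is the classic motif behind toggle-switch bistability.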

Reproducibility: changing the policies and culture of cell line authentication

Quality control of cell lines used in biomedical research is essential to ensure reproducibility. Although cell line authentication has been widely recommended for many years, misidentification, including cross-contamination, remains a serious problem. We outline a multi-stakeholder, incremental approach and policy-related recommendations to facilitate change in the culture of cell line authentication.

  • Nature Methods 12, 493 (2015)
  • Commentary

Swiss army knives: non-canonical functions of nuclear Drosha and Dicer

The RNase III enzymes Drosha and Dicer are essential for the production of small non-coding RNAs (ncRNAs). In canonical RNAi, microRNAs (miRNAs) regulate gene expression by post-transcriptional gene silencing. In non-canonical RNAi, nuclear RNAi factors generate small ncRNAs that are essential for transcriptional gene silencing.

  • Nature Reviews Molecular Cell Biology 16, 417 (2015)
  • Review

Construction of a liposome dialyzer for the preparation of high-value, small-volume liposome formulations

Performing biochemical reactions within liposomes requires the supply of reagents and removal of products present in very small quantities. By using this protocol, a liposome dialyzer can be easily constructed to enable reagent exchange in small volumes.

  • Nature Protocols 10, 927 (2015)
  • Protocol

Corrigendum: RNAi-based biosynthetic pathway screens to identify in vivo functions of non-nucleic acid–based metabolites such as lipids

Nat. Protoc. 10, 681–700 (2015); doi:10.1038/nprot.2015.031; published online 2 April 2015; corrected after print 27 May 2015. In the version of this article initially published, the text detailing the HPLC elution protocol in the Equipment Setup section (a 2-min

  • Nature Protocols 10, 939 (2015)
  • Corrigendum

Erratum: RGB marking with lentiviral vectors for multicolor clonal cell tracking

Nat. Protoc. 7, 839–849 (2012); doi:10.1038/nprot.2012.026; published online 5 April 2012; corrected after print 3 April 2015. In the version of this article initially published, the text 'RGB-marked 293 T cells.' was missing from the legend of Figure 3

  • Nature Protocols 10, 939 (2015)
  • Erratum

antaRNA - Ant Colony Based RNA Sequence Design

Motivation: RNA sequence design has been studied for at least as long as the classical folding problem. Whereas the latter seeks the functional fold of a given RNA molecule, inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology, reliable RNA sequence design is becoming a crucial step in generating novel biochemical components.

Results: In this article, we present the computational tool antaRNA. It can compile RNA sequences for a given structure that additionally comply with an adjustable, full-range objective GC-content distribution, specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics, and its superior performance is shown on biological datasets.

Availability: http://www.bioinf.uni-freiburg.de/Software/antaRNA

Contact: backofen@informatik.uni-freiburg.de

  • Bioinformatics
  • ORIGINAL PAPER
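
antaRNA's ant colony optimization is not reproduced here. The Python sketch below shows two constraints that an inverse-folding designer has to evaluate for each candidate. It checks whether a sequence is compatible with a nested target structure in dot-bracket notation, allowing Watson-Crick and G-U pairs, and it computes the sequence's GC content. The function names are hypothetical. Compatibility alone does not guarantee that the sequence actually folds into the target; confirming that requires running a folding algorithm.

```python
from typing import List, Tuple

# Canonical Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the (i, j) base pairs encoded by a nested dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def gc_content(sequence: str) -> float:
    """Fraction of G and C nucleotides in the sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every base pair in the structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in parse_dot_bracket(structure))


if __name__ == "__main__":
    seq, db = "GGGAAACCC", "(((...)))"
    print(is_compatible(seq, db), round(gc_content(seq), 3))  # True 0.667
```

A designer would typically score many candidates this way and then keep those whose predicted fold matches the target and whose GC content lies near the requested value.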

xHeinz: An algorithm for mining cross-species network modules under a flexible conservation model

Motivation: Integrative network analysis methods provide robust interpretations of differential high-throughput molecular profile measurements. They are often used in a biomedical context to generate novel hypotheses about the underlying cellular processes or to derive biomarkers for classification and subtyping. The underlying molecular profiles are frequently measured and validated on animal or cellular models. Therefore, the results are not immediately transferable to humans. This is also the case in studies of the recently discovered interleukin-17-producing helper T cells (Th17), which are fundamental for anti-microbial immunity but are also known to contribute to autoimmune diseases.

Results: We propose a mathematical model for finding active subnetwork modules that are conserved between two species. These are sets of genes, one for each species, which (i) induce a connected subnetwork in a species-specific interaction network, (ii) show overall differential behavior and (iii) contain a large number of orthologous genes. We propose a flexible notion of conservation, which turns out to be crucial for the quality of the resulting modules in terms of biological interpretability. We propose an algorithm that finds provably optimal or near-optimal conserved active modules in our model. We apply our algorithm to understand the mechanisms underlying Th17 T cell differentiation in both mouse and human. As a main biological result, we find that the key regulation of Th17 differentiation is conserved between human and mouse.

Availability: xHeinz, an implementation of our algorithm, as well as all input data and results, are available at http://software.cwi.nl/xheinz and as a Galaxy service at http://services.cbib.u-bordeaux2.fr/galaxy in CBiB Tools.

Contact: gunnar.klau@cwi.nl

  • Bioinformatics
  • ORIGINAL PAPER
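
xHeinz's optimization is not reproduced here. The Python sketch below expresses two of the module conditions listed above in code. It tests whether a candidate gene set induces a connected subnetwork, and it counts how many orthologous pairs two species-specific modules share. Scoring differential behaviour and the flexible conservation model are omitted. The adjacency-set encoding, the ortholog list and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# Undirected species-specific interaction network as adjacency sets (hypothetical encoding).
Graph = Dict[str, Set[str]]


def is_connected(graph: Graph, module: Set[str]) -> bool:
    """True if `module` induces a connected subgraph (BFS restricted to module genes)."""
    if not module:
        return False
    start = next(iter(module))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, set()):
            if nbr in module and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen == module


def conserved_pairs(module_a: Set[str], module_b: Set[str],
                    orthologs: Iterable[Tuple[str, str]]) -> int:
    """Count ortholog pairs (a, b) with a in module_a and b in module_b."""
    return sum(1 for a, b in orthologs if a in module_a and b in module_b)


if __name__ == "__main__":
    human = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    # {"A", "C"} is disconnected once B is excluded from the module.
    print(is_connected(human, {"A", "B", "C"}), is_connected(human, {"A", "C"}))  # True False
```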

RVD2: an ultra-sensitive variant detection model for low-depth heterogeneous next-generation sequencing data

Motivation: Next-generation sequencing technology is increasingly being used for clinical diagnostic tests. Clinical samples are often genomically heterogeneous due to low sample purity or the presence of genetic subpopulations. Therefore, a variant calling algorithm that can detect low-frequency polymorphisms in heterogeneous samples is needed.

Results: We present a novel variant calling algorithm that uses a hierarchical Bayesian model to estimate allele frequency and call variants in heterogeneous samples. We show that our algorithm improves upon current classifiers and has higher sensitivity and specificity over a wide range of median read depths and minor allele fractions. Applying our model to a matched clinical breast ductal carcinoma tumor sample, we identify 15 mutated loci in the PAXP1 gene; two of these are likely loss-of-heterozygosity events.

Availability and implementation: http://genomics.wpi.edu/rvd2/.

Contact: pjflaherty@wpi.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • ORIGINAL PAPER
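
RVD2 uses a hierarchical Bayesian model. The Python sketch below is a deliberately flat simplification: a conjugate beta-binomial model at a single position. It shows how a Bayesian caller can compare the allele frequency in a tumor sample with that in a matched normal sample. The uniform Beta(1, 1) prior, the read counts and the function names are assumptions, not RVD2's model or code.

```python
import random
from typing import Tuple


def posterior_params(alt_reads: int, depth: int,
                     a: float = 1.0, b: float = 1.0) -> Tuple[float, float]:
    """Beta(a, b) prior x binomial likelihood -> Beta posterior over allele frequency."""
    return a + alt_reads, b + depth - alt_reads


def prob_case_exceeds_control(case: Tuple[int, int], control: Tuple[int, int],
                              draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(theta_case > theta_control).

    `case` and `control` are (alt_reads, depth) at one position; the two
    posteriors are treated as independent.
    """
    rng = random.Random(seed)
    a1, b1 = posterior_params(*case)
    a2, b2 = posterior_params(*control)
    hits = sum(rng.betavariate(a1, b1) > rng.betavariate(a2, b2) for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    # Hypothetical counts: 20/100 alternative reads in the tumor, 1/100 in the normal.
    p = prob_case_exceeds_control((20, 100), (1, 100))
    print(p)  # very close to 1; a caller might flag the position above a chosen threshold
```

A hierarchical model would additionally pool information, for example across replicates or positions, rather than treating each position in isolation as this sketch does.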

hiHMM: Bayesian non-parametric joint inference of chromatin state maps

Motivation: Genome-wide mapping of chromatin states is essential for defining regulatory elements and inferring their activities in eukaryotic genomes. A number of hidden Markov model (HMM)-based methods have been developed to infer chromatin state maps from genome-wide histone modification data for an individual genome. To perform a principled comparison of evolutionarily distant epigenomes, we must consider species-specific biases such as differences in genome size, strength of signal enrichment and co-occurrence patterns of histone modifications.

Results: Here, we present a new Bayesian non-parametric method called hierarchically linked infinite HMM (hiHMM) to jointly infer chromatin state maps in multiple genomes (different species, cell types and developmental stages) using genome-wide histone modification data. This flexible framework provides a new way to learn a consistent definition of chromatin states across multiple genomes, thus facilitating a direct comparison among them. We demonstrate the utility of this method using synthetic data as well as multiple modENCODE ChIP-seq datasets.

Conclusion: The hierarchical and Bayesian non-parametric formulation in our approach is an important extension to the current set of methodologies for comparative chromatin landscape analysis.

Availability and implementation: Source code is available at https://github.com/kasohn/hiHMM. Chromatin data are available at http://encode-x.med.harvard.edu/data_sets/chromatin/.

Contact: peter_park@harvard.edu or juhan@snu.ac.kr

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • ORIGINAL PAPER
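
hiHMM's hierarchically linked infinite HMM is beyond a short example. The Python sketch below shows the scaled forward algorithm for an ordinary finite HMM with discrete emissions, the basic likelihood computation that HMM-based chromatin segmentation builds on. It is not hiHMM's implementation. The two states, the binarized mark and all parameters are hypothetical; real chromatin-state models work with several histone-modification tracks at once.

```python
import math
from typing import List, Sequence


def forward_log_likelihood(obs: Sequence[int], start: List[float],
                           trans: List[List[float]], emit: List[List[float]]) -> float:
    """log P(obs) for a finite HMM with discrete emissions (scaled forward algorithm).

    start[s]    : initial probability of state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n = len(start)
    log_lik = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [start[s] * emit[s][o] for s in range(n)]
        else:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                     for s in range(n)]
        scale = sum(alpha)                  # P(o_t | o_1 .. o_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik


if __name__ == "__main__":
    # Two hypothetical states (0 = inactive, 1 = active) emitting a binarized mark
    # (0 = absent, 1 = present) along consecutive genomic bins.
    start = [0.6, 0.4]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    emit = [[0.9, 0.1], [0.3, 0.7]]
    print(forward_log_likelihood([0, 0, 1, 1, 1, 0], start, trans, emit))
```

Rescaling at every step keeps the computation stable on long genomic sequences, where unscaled forward probabilities would underflow.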

Seq2pathway: an R/Bioconductor package for pathway analysis of next-generation sequencing data

Summary: Seq2pathway is an R/Python wrapper for pathway (or functional gene-set) analysis of genomic loci, adapted for advances in genome research. Seq2pathway associates the biological significance of genomic loci with their target transcripts and then summarizes the quantified values at the gene level into pathway scores. It is designed to isolate systematic disturbances and common biological underpinnings from next-generation sequencing (NGS) data. Seq2pathway offers Bioconductor users enhanced capability to discover collective pathway effects caused by both coding genes and cis-regulation by non-coding elements.

Availability and implementation: The package is freely available at http://www.bioconductor.org/packages/release/bioc/html/seq2pathway.html.

Contact: xyang2@uchicago.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • APPLICATIONS NOTE
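
The abstract does not say how Seq2pathway aggregates gene-level values. The Python sketch below is therefore only a generic illustration of the idea of summarizing gene-level values into a pathway score. It computes a standardized mean (z-score) of the scores of the genes in one set, relative to all scored genes. The scoring scheme, gene names and function name are assumptions, not Seq2pathway's method or API.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Iterable


def pathway_z_score(gene_scores: Dict[str, float], gene_set: Iterable[str]) -> float:
    """Standardized mean of gene-level scores over the genes of one pathway.

    The set mean is compared with the mean over all scored genes and divided
    by the standard error expected for a random set of the same size
    (a simple parametric approximation that ignores sampling without replacement).
    """
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    if not members:
        raise ValueError("none of the pathway genes has a score")
    background = list(gene_scores.values())
    sd = pstdev(background)
    if sd == 0:
        raise ValueError("gene-level scores have zero variance")
    return (mean(members) - mean(background)) / (sd / math.sqrt(len(members)))


if __name__ == "__main__":
    # Hypothetical gene-level scores, e.g. summarized from loci mapped to each gene.
    scores = {"G1": 2.1, "G2": 1.8, "G3": 0.2, "G4": -0.4, "G5": 0.1}
    print(pathway_z_score(scores, ["G1", "G2"]))
```

Permutation of gene labels is a common, assumption-light alternative to the parametric standard error used here.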

Identification of C2H2-ZF binding preferences from ChIP-seq data using RCADE

Summary: Current methods for motif discovery from chromatin immunoprecipitation followed by sequencing (ChIP-seq) data often identify non-targeted transcription factor (TF) motifs, and are even further limited when peak sequences are similar due to common ancestry rather than common binding factors. The latter aspect particularly affects a large number of proteins from the Cys2His2 zinc finger (C2H2-ZF) class of TFs, as their binding sites are often dominated by endogenous retroelements that have highly similar sequences. Here, we present recognition code-assisted discovery of regulatory elements (RCADE) for motif discovery from C2H2-ZF ChIP-seq data. RCADE combines predictions from a DNA recognition code of C2H2-ZFs with ChIP-seq data to identify models that represent the genuine DNA binding preferences of C2H2-ZF proteins. We show that RCADE is able to identify generalizable binding models even from peaks that are exclusively located within the repeat regions of the genome, where state-of-the-art motif finding approaches largely fail.

Availability and implementation: RCADE is available as a webserver and also for download at http://rcade.ccbr.utoronto.ca/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: t.hughes@utoronto.ca

  • Bioinformatics
  • APPLICATIONS NOTE
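
RCADE's recognition-code step is not reproduced here. It reports motif models of C2H2-ZF binding preferences, and the Python sketch below shows one common way such a motif is applied: scoring DNA with a log-odds position weight matrix on both strands. The 4-position matrix, the uniform background and the function names are hypothetical. Input sequences are assumed to be uppercase and to contain only A, C, G and T.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical 4-position frequency matrix (each column sums to 1).
PFM: List[Dict[str, float]] = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(pfm: List[Dict[str, float]], background: float = 0.25) -> List[Dict[str, float]]:
    """Convert frequencies to log2 odds against a uniform background."""
    return [{b: math.log2(col[b] / background) for b in "ACGT"} for col in pfm]


def best_hit(seq: str, pwm: List[Dict[str, float]]) -> Tuple[float, int, str]:
    """Highest-scoring window on either strand as (score, forward start, strand)."""
    width = len(pwm)
    best = (-math.inf, -1, "+")
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    for strand, s in (("+", seq), ("-", rev)):
        for i in range(len(s) - width + 1):
            score = sum(col[s[i + k]] for k, col in enumerate(pwm))
            if score > best[0]:
                # Map a reverse-strand window back to forward-strand coordinates.
                pos = i if strand == "+" else len(seq) - width - i
                best = (score, pos, strand)
    return best


if __name__ == "__main__":
    print(best_hit("CCTATCCC", log_odds(PFM)))  # best match lies on the reverse strand
```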

SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps

by Manu Setty, Christina S. Leslie

Genome-wide maps of transcription factor (TF) occupancy and regions of open chromatin implicitly contain DNA sequence signals for multiple factors. We present SeqGL, a novel de novo motif discovery algorithm that identifies multiple TF sequence signals from ChIP-, DNase- and ATAC-seq profiles. SeqGL trains a discriminative model using a k-mer feature representation together with group lasso regularization to extract a collection of sequence signals that distinguish peak sequences from flanking regions. Benchmarked on over 100 ChIP-seq experiments, SeqGL outperformed traditional motif discovery tools in discriminative accuracy. Furthermore, SeqGL can be naturally combined with multitask learning to identify genomic and cell-type context determinants of TF binding. SeqGL successfully scales to the large multiplicity of sequence signals in DNase- or ATAC-seq maps. In particular, SeqGL was able to identify a number of ChIP-seq-validated sequence signals that traditional motif discovery algorithms did not find. Thus, compared with widely used motif discovery algorithms, SeqGL demonstrates both greater discriminative accuracy and higher sensitivity for detecting the DNA sequence signals underlying regulatory element maps. SeqGL is available at http://cbio.mskcc.org/public/Leslie/SeqGL/.

  • PLOS Computational Biology
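
SeqGL's group lasso training is not shown here. The Python sketch below illustrates only the k-mer feature representation: each peak or flanking sequence becomes a table of k-mer counts. In this version, each k-mer is merged with its reverse complement so that both strands are counted together. The canonicalization choice and the function names are assumptions; SeqGL's exact features may differ.

```python
from collections import Counter
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_features(seq: str, k: int) -> Dict[str, int]:
    """Counts of canonical k-mers in `seq`, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return dict(counts)


if __name__ == "__main__":
    print(kmer_features("ACGTN", 2))  # {'AC': 2, 'CG': 1}
```

Count tables like these, computed for peaks and for flanking regions, could then serve as feature vectors for a regularized classifier that separates the two classes.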

Native Contact Density and Nonnative Hydrophobic Effects in the Folding of Bacterial Immunity Proteins

by Tao Chen, Hue Sun Chan

The bacterial colicin-immunity proteins Im7 and Im9 fold by different mechanisms. Experimentally, at pH 7.0 and 10°C, Im7 folds in a three-state manner via an intermediate but Im9 folding is two-state-like. Accordingly, Im7 exhibits a chevron rollover, whereas the chevron arm for Im9 folding is linear. Here we address the biophysical basis of their different behaviors by using native-centric models with and without additional transferrable, sequence-dependent energies. The Im7 chevron rollover is not captured by either a pure native-centric model or a model augmented by nonnative hydrophobic interactions with a uniform strength irrespective of residue type. By contrast, a more realistic nonnative interaction scheme that accounts for the difference in hydrophobicity among residues leads simultaneously to a chevron rollover for Im7 and an essentially linear folding chevron arm for Im9. Hydrophobic residues identified by published experiments to be involved in nonnative interactions during Im7 folding are found to participate in the strongest nonnative contacts in this model. Thus our observations support the experimental perspective that the Im7 folding intermediate is largely underpinned by nonnative interactions involving large hydrophobics. Our simulation suggests further that nonnative effects in Im7 are facilitated by a lower local native contact density relative to that of Im9. In a one-dimensional diffusion picture of Im7 folding with a coordinate- and stability-dependent diffusion coefficient, a significant chevron rollover is consistent with a diffusion coefficient that depends strongly on native stability at the conformational position of the folding intermediate.

  • PLOS Computational Biology
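
As background for the chevron terminology above, the following is the standard two-state expression commonly used to analyse chevron plots. It is not the simulation model used in the study. Here [D] is the denaturant concentration, m_f and m_u are the kinetic m-values, and k_f^{H₂O} and k_u^{H₂O} are the folding and unfolding rate constants extrapolated to zero denaturant:

```latex
k_{\mathrm{obs}} = k_f + k_u, \qquad
k_f = k_f^{\mathrm{H_2O}}\, e^{-m_f [D]/RT}, \qquad
k_u = k_u^{\mathrm{H_2O}}\, e^{\,m_u [D]/RT}

\ln k_{\mathrm{obs}} = \ln\!\left( k_f^{\mathrm{H_2O}}\, e^{-m_f [D]/RT} + k_u^{\mathrm{H_2O}}\, e^{\,m_u [D]/RT} \right)
```

At low [D], k_f dominates, so ln k_obs ≈ ln k_f^{H₂O} − m_f[D]/RT, which is linear in [D]. This linear folding arm is the behaviour reported for Im9. A chevron rollover, as seen for Im7, is a downward curvature of that arm at low denaturant that this two-state expression cannot produce.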