Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems

With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well-annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB database, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered, and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation.

Database URL: http://causalbionet.com
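The keyword search over JSON network objects described above can be pictured with a minimal sketch. The document fields (`name`, `description`, `nodes`, `edges`, `pubmed_ids`) and the toy networks are assumptions made for illustration; they are not the actual Causal Biological Network schema, and plain Python stands in for MongoDB queries.

```python
import json

# Hypothetical network documents. Field names and contents are toy
# placeholders, not the real Causal Biological Network schema or data.
NETWORKS_JSON = """
[
  {"name": "Example Network A", "version": 1,
   "description": "toy model of an inflammatory signaling process",
   "nodes": ["p(HGNC:TNF)", "p(HGNC:NFKB1)"],
   "edges": [{"source": "p(HGNC:TNF)", "relation": "increases",
              "target": "p(HGNC:NFKB1)",
              "pubmed_ids": ["PMID-PLACEHOLDER"]}]},
  {"name": "Example Network B", "version": 2,
   "description": "toy model of a cell stress response",
   "nodes": ["p(HGNC:TP53)"],
   "edges": []}
]
"""


def search_networks(networks, term):
    """Return names of networks whose description or nodes mention term."""
    needle = term.lower()
    hits = []
    for net in networks:
        # Search both the free-text description and the node labels.
        haystack = [net["description"]] + net["nodes"]
        if any(needle in text.lower() for text in haystack):
            hits.append(net["name"])
    return hits


networks = json.loads(NETWORKS_JSON)
print(search_networks(networks, "tnf"))     # matches a node label
print(search_networks(networks, "stress"))  # matches a description keyword
```

A real deployment would delegate this matching to database indexes rather than scanning every document in application code.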

  • Database
  • 9 years ago
  • Original Article

CellWhere: graphical display of interaction networks organized on subcellular localizations

Given a query list of genes or proteins, CellWhere produces an interactive graphical display that mimics the structure of a cell, showing the local interaction network organized into subcellular locations. This user-friendly tool helps in the formulation of mechanistic hypotheses by enabling the experimental biologist to explore simultaneously two elements of functional context: (i) protein subcellular localization and (ii) protein–protein interactions or gene functional associations. Subcellular localization terms are obtained from public sources (the Gene Ontology and UniProt—together containing several thousand such terms) then mapped onto a smaller number of CellWhere localizations. These localizations include all major cell compartments, but the user may modify the mapping as desired. Protein–protein interaction listings, and their associated evidence strength scores, are obtained from the Mentha interactome server, or power-users may upload a pre-made network produced using some other interactomics tool. The Cytoscape.js JavaScript library is used in producing the graphical display. Importantly, for a protein that has been observed at multiple subcellular locations, users may prioritize the visual display of locations that are of special relevance to their research domain. CellWhere is at http://cellwhere-myology.rhcloud.com.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

QmRLFS-finder: a model, web server and stand-alone tool for prediction and analysis of R-loop forming sequences

The possible formation of three-stranded RNA and DNA hybrid structures (R-loops) in thousands of functionally important guanine-rich genic and inter-genic regions could suggest their involvement in transcriptional regulation and even development of diseases. Here, we introduce the first freely available R-loop prediction program called Quantitative Model of R-loop Forming Sequence (RLFS) finder (QmRLFS-finder), which predicts RLFSs in nucleic acid sequences based on experimentally supported structural models of RLFSs. QmRLFS-finder operates via a web server or a stand-alone command line tool. This tool identifies and visualizes RLFS coordinates from any natural or artificial DNA or RNA input sequences and creates standards-compliant output files for further annotation and analysis. QmRLFS-finder predicts the detected RLFSs with high accuracy, offering a new perspective for further discoveries in R-loop biology, biotechnology and molecular therapy. QmRLFS-finder is freely available at http://rloop.bii.a-star.edu.sg/?pg=qmrlfs-finder.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

ENCoM server: exploring protein conformational space and the effect of mutations on protein function and stability

ENCoM is a recently introduced coarse-grained normal mode analysis method that, unlike previous such methods, accounts for the nature of amino acids. The inclusion of this layer of information was shown to improve conformational space sampling and made it possible, for the first time, to apply a coarse-grained normal mode analysis method to predict the effect of single point mutations on protein dynamics and on thermostability resulting from vibrational entropy changes. Here we present a web server that gives non-technical users access to ENCoM calculations to predict the effect of mutations on thermostability and dynamics, as well as to generate geometrically realistic conformational ensembles. The server is accessible at http://bcb.med.usherbrooke.ca/encom.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

I-TASSER server: new development for protein structure and function predictions

The I-TASSER server (http://zhanglab.ccmb.med.umich.edu/I-TASSER) is an online resource for automated protein structure prediction and structure-based function annotation. In I-TASSER, structural templates are first recognized from the PDB using multiple threading alignment approaches. Full-length structure models are then constructed by iterative fragment assembly simulations. The functional insights are finally derived by matching the predicted structure models with known proteins in the function databases. Although the server has been widely used for various biological and biomedical investigations, numerous comments and suggestions have been reported from the user community. In this article, we summarize recent developments in the I-TASSER server, which were designed to address the requirements of the user community and to increase the accuracy of modeling predictions. The focus has been on the introduction of new methods for atomic-level structure refinement, local structure quality estimation and biological function annotations. We expect that these new developments will improve the quality of the I-TASSER server and further facilitate its use by the community for high-resolution structure and function prediction.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

The ReproGenomics Viewer: an integrative cross-species toolbox for the reproductive science community

We report the development of the ReproGenomics Viewer (RGV), a multi- and cross-species working environment for the visualization, mining and comparison of published omics data sets for the reproductive science community. The system currently embeds 15 published data sets related to gametogenesis from nine model organisms. Data sets have been curated and conveniently organized into broad categories including biological topics, technologies, species and publications. RGV's modular design for both organisms and genomic tools enables users to upload and compare their data with that from the data sets embedded in the system in a cross-species manner. The RGV is freely available at http://rgv.genouest.org.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

Thioflavin T as a fluorescence probe for monitoring RNA metabolism at molecular and cellular levels

The intrinsically stochastic dynamics of mRNA metabolism have important consequences for gene regulation and non-genetic cell-to-cell variability; however, no generally applicable methods exist for studying such stochastic processes quantitatively. Here, we describe the use of the amyloid-binding probe Thioflavin T (ThT) for monitoring RNA metabolism in vitro and in vivo. ThT fluoresced more strongly in complex with bacterial total RNA than with genomic DNA. ThT bound purine oligoribonucleotides preferentially over pyrimidine oligoribonucleotides and oligodeoxyribonucleotides. This property enabled quantitative real-time monitoring of poly(A) synthesis and phosphorolysis by polyribonucleotide phosphorylase in vitro. Cellular analyses, in combination with genetic approaches and the transcription-inhibitor rifampicin treatment, demonstrated that ThT mainly stained mRNA in actively dividing Escherichia coli cells. ThT also facilitated mRNA metabolism profiling at the single-cell level in diverse bacteria. Furthermore, ThT can also be used to visualise transitions between non-persister and persister cell states, a phenomenon of isogenic subpopulations of antibiotic-sensitive bacteria that acquire tolerance to multiple antibiotics due to stochastically induced dormant states. Collectively, these results suggest that probing mRNA dynamics with ThT is a broadly applicable approach ranging from the molecular level to the single-cell level.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters

Inference of multiple sequence alignments (MSAs) is a critical part of phylogenetic and comparative genomics studies. However, from the same set of sequences different MSAs are often inferred, depending on the methodologies used and the assumed parameters. Much effort has recently been devoted to improving the ability to identify unreliable alignment regions. Detecting such unreliable regions was previously shown to be important for downstream analyses relying on MSAs, such as the detection of positive selection. Here we developed GUIDANCE2, a new integrative methodology that accounts for: (i) uncertainty in the process of indel formation, (ii) uncertainty in the assumed guide tree and (iii) co-optimal solutions in the pairwise alignments, used as building blocks in progressive alignment algorithms. We compared GUIDANCE2 with seven methodologies to detect unreliable MSA regions using extensive simulations and empirical benchmarks. We show that GUIDANCE2 outperforms all previously developed methodologies. Furthermore, GUIDANCE2 also provides a set of alternative MSAs which can be useful for downstream analyses. The novel algorithm is implemented as a web-server, available at: http://guidance.tau.ac.il.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

AGGRESCAN3D (A3D): server for prediction of aggregation properties of protein structures

Protein aggregation underlies an increasing number of disorders and constitutes a major bottleneck in the development of therapeutic proteins. Our present understanding of the molecular determinants of protein aggregation has crystallized in a series of predictive algorithms to identify aggregation-prone sites. A majority of these methods rely only on sequence. Therefore, they have difficulty predicting the aggregation properties of folded globular proteins, where aggregation-prone sites are often not contiguous in sequence or are buried inside the native structure. The AGGRESCAN3D (A3D) server overcomes these limitations by taking into account the protein structure and the experimental aggregation propensity scale from the well-established AGGRESCAN method. Using the A3D server, the identified aggregation-prone residues can be virtually mutated to design variants with increased solubility, or to test the impact of pathogenic mutations. Additionally, the A3D server makes it possible to take into account the dynamic fluctuations of protein structure in solution, which may influence aggregation propensity. This is possible in A3D Dynamic Mode, which exploits the CABS-flex approach for fast simulations of the flexibility of globular proteins. The A3D server can be accessed at http://biocomp.chem.uw.edu.pl/A3D/.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

webSDA: a web server to simulate macromolecular diffusional association

Macromolecular interactions play a crucial role in biological systems. Simulation of diffusional association (SDA) is a software package for carrying out Brownian dynamics simulations that can be used to study the interactions between two or more biological macromolecules. webSDA allows users to run Brownian dynamics simulations with SDA to study bimolecular association and encounter complex formation, to compute association rate constants, and to investigate macromolecular crowding using atomically detailed macromolecular structures. webSDA facilitates and automates the use of the SDA software, and offers user-friendly visualization of results. webSDA currently has three modules: ‘SDA docking’ to generate structures of the diffusional encounter complexes of two macromolecules, ‘SDA association’ to calculate bimolecular diffusional association rate constants, and ‘SDA multiple molecules’ to simulate the diffusive motion of hundreds of macromolecules. webSDA is freely available to all users and there is no login requirement. webSDA is available at http://mcm.h-its.org/webSDA/.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

JPred4: a protein secondary structure prediction server

JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials.
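Three-state accuracy of the kind quoted above is commonly computed as the percentage of residues whose predicted state matches the observed one. The sketch below assumes single-letter labels `H` (α-helix), `E` (β-strand) and `C` (coil) and uses made-up strings; it does not reproduce the JPred4 benchmark or its evaluation protocol.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    # Count positions where the predicted and observed states agree.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up example strings, one label per residue.
predicted = "CHHHHHCCEEEEC"
observed = "CHHHHCCCEEECC"
print(f"{q3_accuracy(predicted, observed):.1f}%")
```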

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

Assessing the impact of mutations found in next generation sequencing data over human signaling pathways

Modern sequencing technologies produce increasingly detailed data on genomic variation. However, conventional methods for relating either individual variants or mutated genes to phenotypes present known limitations given the complex, multigenic nature of many diseases or traits. Here we present PATHiVar, a web-based tool that integrates genomic variation data with gene expression tissue information. PATHiVar constitutes a new generation of genomic data analysis methods that allow variants found in next generation sequencing experiments to be studied in the context of signaling pathways. Simple Boolean models of pathways provide detailed descriptions of the impact of mutations on cell functionality, so that recurrences in functionality failures can easily be related to diseases, even if they are produced by mutations in different genes. Patterns of changes in signal transmission circuits, often unpredictable from the individual mutated genes, correspond to patterns of affected functionalities that can be related to complex traits such as disease progression, drug response, etc. PATHiVar is available at: http://pathivar.babelomics.org.
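The idea that mutations in different genes can produce the same functional failure can be illustrated with a toy Boolean circuit. This is a minimal sketch under stated assumptions: the circuit, the gene names and the rule that a linear circuit transmits only when every node is expressed and undamaged are all hypothetical, and none of it is taken from PATHiVar itself.

```python
# Hypothetical signaling circuit: receptor -> kinase -> transcription factor.
# Gene names are placeholders, not real PATHiVar pathway members.
CIRCUIT = ["RECEPTOR_A", "KINASE_B", "TF_C"]


def signal_transmitted(circuit, expressed, damaged):
    """A linear circuit transmits only if every node is expressed and undamaged."""
    return all(gene in expressed and gene not in damaged for gene in circuit)


# Genes assumed to be expressed in the tissue of interest.
expressed_in_tissue = {"RECEPTOR_A", "KINASE_B", "TF_C"}

healthy = signal_transmitted(CIRCUIT, expressed_in_tissue, damaged=set())
patient_1 = signal_transmitted(CIRCUIT, expressed_in_tissue, damaged={"RECEPTOR_A"})
patient_2 = signal_transmitted(CIRCUIT, expressed_in_tissue, damaged={"TF_C"})

# Two different mutated genes, one shared functional outcome.
print(healthy, patient_1, patient_2)
```

Both hypothetical patients lose the same circuit even though they share no mutated gene, which is the kind of recurrence a gene-by-gene comparison would miss.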

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

FAF-Drugs3: a web server for compound property calculation and chemical library design

Drug attrition late in preclinical or clinical development is a serious economic problem in the field of drug discovery. These problems can be linked, in part, to the quality of the compound collections used during the hit generation stage and to the selection of compounds undergoing optimization. Here, we present FAF-Drugs3, a web server that can be used for drug discovery and chemical biology projects to help in preparing compound libraries and to assist decision-making during the hit selection/lead optimization phase. Since it was first described in 2006, FAF-Drugs has been significantly modified. The tool now applies an enhanced structure curation procedure, can filter or analyze molecules with user-defined or eight predefined physicochemical filters as well as with several simple ADMET (absorption, distribution, metabolism, excretion and toxicity) rules. In addition, compounds can be filtered using an updated list of 154 hand-curated structural alerts while Pan Assay Interference compounds (PAINS) and other, generally unwanted groups are also investigated. FAF-Drugs3 offers access to user-friendly html result pages and the possibility to download all computed data. The server requires as input an SDF file of the compounds; it is open to all users and can be accessed without registration at http://fafdrugs3.mti.univ-paris-diderot.fr.

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

ChEMBL web services: streamlining access to drug discovery data and utilities

ChEMBL is now a well-established resource in the fields of drug discovery and medicinal chemistry research. The ChEMBL database curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Programmatic access to ChEMBL data has been improved by a recent update to the ChEMBL web services (version 2.0.x, https://www.ebi.ac.uk/chembl/api/data/docs), which exposes significantly more data from the underlying database and introduces new functionality. To complement the data-focused services, a utility service (version 1.0.x, https://www.ebi.ac.uk/chembl/api/utils/docs), which provides RESTful access to commonly used cheminformatics methods, has also been concurrently developed. The ChEMBL web services can be used together or independently to build applications and data processing workflows relevant to drug discovery and chemical biology.
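Programmatic access of this kind usually comes down to composing resource URLs. The sketch below only builds URLs and performs no network request. The base path is taken from the documentation link above, while the `molecule` resource, the `.json` suffix and the `max_phase` and `limit` parameters are assumptions that should be checked against that documentation; `CHEMBL25` serves only as an example identifier.

```python
from urllib.parse import urlencode

# Base path derived from the documentation URL in the abstract. Resource
# and filter names used below are assumptions, not a verified API contract.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_query_url(resource, identifier=None, **filters):
    """Build a URL for one record or for a filtered list of records."""
    if identifier is not None:
        return f"{BASE_URL}/{resource}/{identifier}.json"
    # Sort the filters so that the generated URL is deterministic.
    query = urlencode(sorted(filters.items()))
    url = f"{BASE_URL}/{resource}.json"
    return f"{url}?{query}" if query else url


print(build_query_url("molecule", "CHEMBL25"))
print(build_query_url("molecule", max_phase=4, limit=5))
```

The resulting strings can be passed to any HTTP client, which keeps URL construction testable without network access.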

  • Nucleic Acids Research
  • 9 years ago
  • Web Server issue

Telomere-associated proteins add deoxynucleotides to terminal proteins during replication of the telomeres of linear chromosomes and plasmids in Streptomyces

Typical telomeres of linear chromosomes and plasmids of the soil bacteria Streptomyces consist of tightly packed palindromic sequences with a terminal protein (‘TP’) covalently attached to the 5' end of the DNA. Replication of these linear replicons is initiated internally and proceeds bidirectionally toward the telomeres, which leaves single-strand overhangs at the 3' ends. These overhangs are filled by DNA synthesis using the TPs as the primers (‘end patching’). The gene encoding the typical TP, tpg, forms an operon with tap, encoding an essential telomere-associated protein, which binds TP and the secondary structures formed by the 3' overhangs. Previously, one of the two translesion synthesis DNA polymerases, DinB1 or DinB2, was proposed to catalyze the protein-primed synthesis. However, using an in vitro end-patching system, we discovered that Tpg and Tap alone could carry out the protein-primed synthesis to a length of 13 nt. Similarly, an ‘atypical’ terminal protein, Tpc, and its cognate telomere-associated protein, Tac, of the SCP1 plasmid, were sufficient to achieve protein-primed synthesis in the absence of an additional polymerase. These results indicate that these two telomere-associated proteins possess polymerase activities alone or in complex with the cognate TPs.

  • Nucleic Acids Research
  • 9 years ago
  • Genome integrity, repair and replication

DEOD: uncovering dominant effects of cancer-driver genes based on a partial covariance selection method

Motivation: The generation of a large volume of cancer genomes has allowed us to identify disease-related alterations more accurately, which is expected to enhance our understanding regarding the mechanism of cancer development. With genomic alterations detected, one challenge is to pinpoint cancer-driver genes that cause functional abnormalities.

Results: Here, we propose a method for uncovering the dominant effects of cancer-driver genes (DEOD) based on a partial covariance selection approach. Inspired by a convex optimization technique, it estimates the dominant effects of candidate cancer-driver genes on the expression level changes of their target genes. It constructs a gene network as a directed-weighted graph by integrating DNA copy numbers, single nucleotide mutations and gene expressions from matched tumor samples, and estimates partial covariances between driver genes and their target genes. Then, a scoring function to measure the cancer-driver score for each gene is applied. To test the performance of DEOD, a novel scheme is designed for simulating conditional multivariate normal variables (targets and free genes) given a group of variables (driver genes). When we applied the DEOD method to both the simulated data and breast cancer data, DEOD successfully uncovered driver variables in the simulation data, and identified well-known oncogenes in breast cancer. In addition, two highly ranked genes by DEOD were related to survival time. The copy number amplifications of MYC (8q24.21) and TRPS1 (8q23.3) were closely related to the survival time with P-values = 0.00246 and 0.00092, respectively. The results demonstrate that DEOD can efficiently uncover cancer-driver genes.

Availability and implementation: DEOD was implemented in Matlab, and source codes and data are available at http://combio.gist.ac.kr/softwares/.

Contact: hyunjulee@gist.ac.kr

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels

Summary: We present a web server to predict the functional effect of single or multiple amino acid substitutions, insertions and deletions using the prediction tool PROVEAN. The server provides rapid analysis of protein variants from any organisms, and also supports high-throughput analysis for human and mouse variants at both the genomic and protein levels.

Availability and implementation: The web server is freely available and open to all users with no login requirements at http://provean.jcvi.org.

Contact: achan@jcvi.org

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

do_x3dna: a tool to analyze structural fluctuations of dsDNA or dsRNA from molecular dynamics simulations

Summary: The do_x3dna package has been developed to analyze the structural fluctuations of DNA or RNA during molecular dynamics simulations. It extends the capability of the 3DNA package to GROMACS MD trajectories and includes new methods to calculate the global-helical axis of DNA and bending fluctuations during simulations. The package also includes a Python module dnaMD to perform and visualize statistical analyses of complex data obtained from the trajectories.

Availability and Implementation: The source code of do_x3dna is available at https://github.com/rjdkmr/do_x3dna under the GNU GPLv3 license. Detailed documentation, including tutorials and required input data, is freely available at http://rjdkmr.github.io/do_x3dna/.

Contact: rjdkmr@gmail.com

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Global optimization-based inference of chemogenomic features from drug-target interactions

Motivation: Gaining insight into chemogenomic drug–target interactions, such as those involving the substructures of synthetic drugs and protein domains, is important in fragment-based drug discovery and drug repositioning. Previous studies evaluated the interactions locally, thereby ignoring the competitive effects of different substructures or domains, but this could lead to high false-positive estimation, calling for a computational method that presents more predictive power.

Results: A statistical model, termed Global optimization-based InFerence of chemogenomic features from drug–Target interactions, or GIFT, is proposed herein to evaluate substructure-domain interactions globally such that all substructure-domain contributions to drug–target interaction are analyzed simultaneously. Combinations of different chemical substructures were included since they may function as one unit. When compared to previous methods, GIFT showed better interpretive performance, and performance for the recovery of drug–target interactions was good. Among 53 known drug–domain interactions, 81% were accurately predicted by GIFT. Eighteen of the top 100 predicted combined substructure-domain interactions had corresponding drug–target structures in the Protein Data Bank database, and 15 out of the 18 had been proved. GIFT was then implemented to predict substructure-domain interactions based on drug repositioning. For example, the anticancer activities of tazarotene, adapalene, acitretin and raloxifene were identified. In summary, GIFT is a global chemogenomic inference approach and offers fresh insight into drug–target interactions.

Availability and implementation: The source codes can be found at http://bioinfo.au.tsinghua.edu.cn/software/GIFT.

Contact: shaoli@mail.tsinghua.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

RiboTools: a Galaxy toolbox for qualitative ribosome profiling analysis

Motivation: Ribosome profiling provides genome-wide information about translational regulation. However, there is currently no standard tool for the qualitative analysis of Ribo-seq data. We present here RiboTools, a Galaxy toolbox for the analysis of ribosome profiling (Ribo-seq) data. It can be used to detect translational ambiguities, stop codon readthrough events and codon occupancy. It provides a large number of plots for the visualisation of these events.

Availability and implementation: RiboTools is available from https://testtoolshed.g2.bx.psu.edu/view/rlegendre/ribo_tools as part of the Galaxy Project, under the GPLv3 licence. It is written in Python 2.7 and uses standard Python libraries, such as matplotlib and numpy.

Contact: olivier.namy@igmors.u-psud.fr

Supplementary Information: Supplementary data are available from Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

DISSCO: direct imputation of summary statistics allowing covariates

Background: Imputation of individual level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual level genotype data are not available. Two methods (DIST and ImpG-Summary/LD), that assume a multivariate Gaussian distribution for the association summary statistics, have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates.

Methods: We analytically show that in the absence of covariates, correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus serving as a theoretical justification for the recently proposed methods. We go on to prove that in the presence of covariates, correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO).
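The distinction between correlation and partial correlation can be made concrete for the simplest case of a single covariate. The sketch below uses the textbook first-order partial correlation formula and made-up correlation values; it is not the DISSCO implementation, which handles the general setting.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Made-up values: two genotypes correlate at 0.25, and each correlates
# at 0.5 with a covariate z (for example, an ancestry-related variable).
print(partial_correlation(0.25, 0.5, 0.5))

# Without any correlation to the covariate, the plain correlation is kept.
print(partial_correlation(0.3, 0.0, 0.0))
```

In the first case the covariate accounts for the whole genotype correlation, so using the plain correlation would overstate the dependence between the two summary statistics.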

Results: We consider two real-life scenarios where the difference between correlation and partial correlation is likely to matter in practice: (i) association studies in admixed populations; (ii) association studies in the presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9–15.2% for variants with minor allele frequency <5%.

Availability and implementation: http://www.unc.edu/~yunmli/DISSCO.

Contact: yunli@med.unc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

No Evidence that MicroRNAs Coevolve with Genes Located in Copy Number Regions

MicroRNAs (miRNAs) are a widespread class of regulatory noncoding RNAs with key roles in physiology and development, conferring robustness to noise in regulatory networks. Consistent with this buffering function, it was recently suggested that human miRNAs coevolve with genes in copy number regions (copy number variation [CNV] genes) to reduce dosage imbalance. Here, I compare miRNA regulation between CNV and non-CNV genes in four model organisms. miRNA regulation of CNV genes is elevated in human and fly but reduced in nematode and zebrafish. By analyzing 31 human CNV data sets, careful analysis of human and chimpanzee orthologs, resampling genes within species and comparing structural variant types, I show that the apparent coevolution between CNV genes and miRNAs is due to the strong dependency between 3'-untranslated region length and miRNA target prediction. Deciphering the interplay between CNVs and miRNAs will likely require a deeper understanding of how miRNAs are embedded in regulatory circuits.

  • Molecular Biology and Evolution
  • 9 years ago
  • Letter

Halvade: scalable sequence analysis with MapReduce

Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine.

Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50x coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading.

Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR. Its source is available at http://bioinformatics.intec.ugent.be/halvade under GPL license.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: jan.fostier@intec.ugent.be

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER