Genetic testing: Clinical whole-genome sequencing

Taylor et al. consider the utility of whole-genome sequencing for diagnosis of genetic disorders in routine clinical practice as part of the WGS500 project to sequence the whole genomes of 500 patients. The authors examine whole-genome sequencing data from 217 individuals (including 156 independent

  • Nature Reviews Genetics 16, 377 (2015)
  • 10 years ago
  • Research Highlight

Pathogen genetics: Rapid typing of S. Enteritidis clinical isolates

Quick et al. demonstrate the benefits of rapidly available and accurate prospective typing results during the course of an outbreak of Salmonella enterica subsp. enterica serovar Enteritidis (S. Enteritidis) in Birmingham, United Kingdom, in June 2014. The authors obtained 43

  • Nature Reviews Genetics 16, 377 (2015)
  • 10 years ago
  • Research Highlight

Marine microbiology: Deep sequencing of the global oceans

Microorganisms have central roles in biogeochemical processes in marine environments, but our understanding of the composition of these communities and the ecological factors that determine community structure on a global scale is limited. Now, 5 studies report the initial findings of the international Tara Oceans

  • Nature Reviews Genetics 16, 378 (2015)
  • 10 years ago
  • Research Highlight

Genome-Wide Association Study Identifies Novel Pharmacogenomic Loci For Therapeutic Response to Montelukast in Asthma

by Amber Dahlin, Augusto Litonjua, John J. Lima, Mayumi Tamari, Michiaki Kubo, Charles G. Irvin, Stephen P. Peters, Kelan G. Tantisira

Background

Genome-wide association study (GWAS) is a powerful tool to identify novel pharmacogenetic single nucleotide polymorphisms (SNPs). Leukotriene receptor antagonists (LTRAs) are a major class of asthma medications, and genetic factors contribute to variable responses to these drugs. We used GWAS to identify novel SNPs associated with the response to the LTRA, montelukast, in asthmatics.

Methods

Using genome-wide genotype and phenotypic data available from American Lung Association - Asthma Clinical Research Center (ALA-ACRC) cohorts, we evaluated 8-week change in FEV1 related to montelukast administration in a discovery population of 133 asthmatics. The top 200 SNPs from the discovery GWAS were then tested in 184 additional samples from two independent cohorts.

Results

Twenty-eight SNP associations from the discovery GWAS were replicated. Of these, rs6475448 achieved genome-wide significance (combined P = 1.97 × 10⁻⁹), and subjects from all four studies who were homozygous for rs6475448 showed increased ΔFEV1 from baseline in response to montelukast.

Conclusions

Through GWAS, we identified a novel pharmacogenomic locus related to improved montelukast response in asthmatics.

  • PLOS ONE
  • 10 years ago

Genomic Comparison of Non-Typhoidal Salmonella enterica Serovars Typhimurium, Enteritidis, Heidelberg, Hadar and Kentucky Isolates from Broiler Chickens

by Akhilesh S. Dhanani, Glenn Block, Ken Dewar, Vincenzo Forgetta, Edward Topp, Robert G. Beiko, Moussa S. Diarra

Background

Non-typhoidal Salmonella enterica serovars, associated with different foods including poultry products, are important causes of bacterial gastroenteritis worldwide. Colonization of the chicken gut by S. enterica could result in contamination of the environment and the food chain. The aim of this study was to compare the genomes of 25 S. enterica isolates from broiler chicken farms to assess their intra- and inter-serovar genetic variability, with a focus on virulence and antibiotic resistance characteristics.

Methodology/Principal Finding

The genomes of 25 S. enterica isolates covering five serovars (ten Typhimurium including three monophasic 4,[5],12:i:-, four Enteritidis, three Hadar, four Heidelberg and four Kentucky) were sequenced. Most serovars were clustered in strongly supported phylogenetic clades, except for isolates of serovar Enteritidis, which were scattered throughout the tree. Plasmids of varying sizes were detected in several isolates independently of serovar. Genes associated with the IncF plasmid and the IncI1 plasmid were identified in twelve and four isolates, respectively, while genes associated with the IncQ plasmid were found in one isolate. The presence of numerous genes associated with Salmonella pathogenicity islands (SPIs) was also confirmed. Components of the type III and IV secretion systems (T3SS and T4SS) varied among isolates, which could partly explain differences in their pathogenicity in humans and/or persistence in broilers. Conserved clusters of genes in the T3SS were detected that could be used in designing effective strategies (diagnostics, vaccination or treatments) to combat Salmonella. Antibiotic resistance genes (CMY, aadA, ampC, florR, sul1, sulI, tetAB, and srtA) and class I integrons were detected in resistant isolates, while all isolates carried multidrug efflux pump systems regardless of their antibiotic susceptibility profile.

Conclusions/Significance

This study showed that the predominant Salmonella serovars in broiler chickens harbor genes encoding adhesins, flagellar proteins, T3SS, iron acquisition systems, and antibiotic and metal resistance genes that may explain their pathogenicity, colonization ability and persistence in chicken. The existence of mobile genetic elements indicates that isolates from a given serovar could acquire and transfer genetic material. Conserved genes in the T3SS and T4SS that we have identified are promising candidates for identification of diagnostic, antimicrobial or vaccine targets for the control of Salmonella in broiler chickens.

  • PLOS ONE
  • 10 years ago

Genome-Wide Specific Selection in Three Domestic Sheep Breeds

by Huihua Wang, Li Zhang, Jiaxve Cao, Mingming Wu, Xiaomeng Ma, Zhen Liu, Ruizao Liu, Fuping Zhao, Caihong Wei, Lixin Du

Background

Commercial sheep raised for mutton grow faster than traditional Chinese sheep breeds. Here, we aimed to evaluate genetic selection among three different types of sheep breed: two well-known commercial mutton breeds and one indigenous Chinese breed.

Results

We first combined locus-specific branch lengths and di statistical methods to detect candidate regions targeted by selection in the three different populations. The results showed that the genetic distances reached at least medium divergence for each pairwise combination. We found that the results of these two methods were highly correlated, and identified many growth-related candidate genes undergoing artificial selection. For production traits, APOBR and FTO are associated with body mass index. For meat traits, ALDOA, STK32B and FAM190A are related to marbling. For reproduction traits, CCNB2 and SLC8A3 affect oocyte development. We also found that two well-known genes, GHR (which affects meat production and quality) and EDAR (associated with hair thickness), were associated with German mutton merino sheep. Furthermore, four genes (POL, RPL7, MSL1 and SHISA9) were associated with pre-weaning gain in our previous genome-wide association study.
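The locus-specific branch length (LSBL) statistic mentioned above is commonly computed from the three pairwise FST values at a locus. The following is a minimal sketch of that standard three-population formula; the function name and the example FST values are illustrative assumptions and are not taken from the study.

```python
def lsbl(fst_ab, fst_ac, fst_bc):
    """Return the locus-specific branch lengths for populations A, B and C.

    Each branch length is the share of pairwise divergence (FST) at one
    locus that is attributable to a single population.
    """
    branch_a = (fst_ab + fst_ac - fst_bc) / 2
    branch_b = (fst_ab + fst_bc - fst_ac) / 2
    branch_c = (fst_ac + fst_bc - fst_ab) / 2
    return branch_a, branch_b, branch_c


# Hypothetical FST values: population A is strongly diverged at this locus.
a, b, c = lsbl(fst_ab=0.30, fst_ac=0.28, fst_bc=0.04)
print(round(a, 3), round(b, 3), round(c, 3))
```

A locus with an unusually long branch for one breed relative to the genome-wide distribution would be a candidate for breed-specific selection.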

Conclusions

Our results indicate that combining locus-specific branch lengths and di statistical approaches can narrow the search range for specific selection. We also obtained many credible candidate genes that not only confirm the results of previous reports but also provide a suite of novel candidate genes in defined breeds to guide hybridization breeding.

  • PLOS ONE
  • 10 years ago

Deciphering Signaling Pathway Networks to Understand the Molecular Mechanisms of Metformin Action

by Jingchun Sun, Min Zhao, Peilin Jia, Lily Wang, Yonghui Wu, Carissa Iverson, Yubo Zhou, Erica Bowton, Dan M. Roden, Joshua C. Denny, Melinda C. Aldrich, Hua Xu, Zhongming Zhao

A drug exerts its effects typically through a signal transduction cascade, which is non-linear and involves intertwined networks of multiple signaling pathways. Construction of such a signaling pathway network (SPNetwork) can enable identification of novel drug targets and deep understanding of drug action. However, it is challenging to synopsize critical components of these interwoven pathways into one network. To tackle this issue, we developed a novel computational framework, the Drug-specific Signaling Pathway Network (DSPathNet). The DSPathNet amalgamates the prior drug knowledge and drug-induced gene expression via random walk algorithms. Using the drug metformin, we illustrated this framework and obtained one metformin-specific SPNetwork containing 477 nodes and 1,366 edges. To evaluate this network, we performed the gene set enrichment analysis using the disease genes of type 2 diabetes (T2D) and cancer, one T2D genome-wide association study (GWAS) dataset, three cancer GWAS datasets, and one GWAS dataset of cancer patients with T2D on metformin. The results showed that the metformin network was significantly enriched with disease genes for both T2D and cancer, and that the network also included genes that may be associated with metformin-associated cancer survival. Furthermore, from the metformin SPNetwork and common genes to T2D and cancer, we generated a subnetwork to highlight the molecule crosstalk between T2D and cancer. The follow-up network analyses and literature mining revealed that seven genes (CDKN1A, ESR1, MAX, MYC, PPARGC1A, SP1, and STK11) and one novel MYC-centered pathway with CDKN1A, SP1, and STK11 might play important roles in metformin’s antidiabetic and anticancer effects. Some results are supported by previous studies. 
In summary, our study 1) develops a novel framework to construct drug-specific signal transduction networks; 2) provides insights into the molecular mode of metformin; and 3) serves as a model for exploring signaling pathways to facilitate understanding of drug action, disease pathogenesis, and identification of drug targets.
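The abstract says only that DSPathNet uses random walk algorithms, so the following is a generic sketch of one common variant, random walk with restart, on a toy undirected graph. It is not the authors' implementation; the node labels, the restart probability and the function name are assumptions made for illustration.

```python
def random_walk_with_restart(neighbors, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Score every node by its steady-state visiting probability.

    neighbors: dict mapping node -> list of adjacent nodes (toy network).
    seeds: set of nodes the walker jumps back to with probability `restart`.
    """
    nodes = list(neighbors)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            if not neighbors[n]:
                continue  # isolated node: nothing to propagate
            share = (1 - restart) * prob[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        if sum(abs(nxt[n] - prob[n]) for n in nodes) < tol:
            return nxt
        prob = nxt
    return prob


# Hypothetical four-node path g1 - g2 - g3 - g4, seeded at g1.
toy = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
scores = random_walk_with_restart(toy, seeds={"g1"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

Nodes closer to the seed receive higher scores, which is how prior knowledge (the seeds) can be spread over a network to rank candidate genes.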

  • PLOS Computational Biology
  • 10 years ago

Automated High-Throughput Characterization of Single Neurons by Means of Simplified Spiking Models

by Christian Pozzorini, Skander Mensi, Olivier Hagens, Richard Naud, Christof Koch, Wulfram Gerstner

Single-neuron models are useful not only for studying the emergent properties of neural circuits in large-scale simulations, but also for extracting and summarizing in a principled way the information contained in electrophysiological recordings. Here we demonstrate that, using a convex optimization procedure we previously introduced, a Generalized Integrate-and-Fire model can be accurately fitted with a limited amount of data. The model is capable of predicting both the spiking activity and the subthreshold dynamics of different cell types, and can be used for online characterization of neuronal properties. A protocol is proposed that, combined with emerging technologies for automatic patch-clamp recordings, permits automated, in vitro high-throughput characterization of single neurons.
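For readers unfamiliar with integrate-and-fire models, the sketch below simulates a plain leaky integrate-and-fire neuron with forward-Euler steps. It is deliberately simpler than the Generalized Integrate-and-Fire model of the abstract (no spike-triggered current or moving threshold) and does not show the fitting procedure; all parameter values and names are illustrative assumptions.

```python
def simulate_lif(current, dt=0.1, tau=10.0, v_rest=-70.0,
                 v_thresh=-50.0, v_reset=-65.0, resistance=10.0):
    """Simulate a leaky integrate-and-fire neuron.

    current: input current per time step (arbitrary units, scaled by
    `resistance` into mV). Returns the voltage trace and spike times (ms).
    """
    v = v_rest
    trace, spikes = [], []
    for step, i_in in enumerate(current):
        # Leak towards rest plus input drive, integrated with Euler steps.
        v += dt * (-(v - v_rest) + resistance * i_in) / tau
        if v >= v_thresh:
            spikes.append(step * dt)
            v = v_reset  # hard reset after a spike
        trace.append(v)
    return trace, spikes


# 500 ms of constant suprathreshold input (hypothetical amplitude).
trace, spikes = simulate_lif([3.0] * 5000)
print(len(spikes), "spikes")
```

Fitting such a model to recordings amounts to choosing the parameters so that the simulated trace and spike times match the measured ones.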

  • PLOS Computational Biology
  • 10 years ago

Whole-genome cartography of p53 response elements ranked on transactivation potential

Background: Many recent studies using ChIP-seq approaches cross-referenced to transcriptome data and also to potentially unbiased in vitro DNA binding selection experiments are detailing with increasing precision the p53-directed gene regulatory network that, nevertheless, is still expanding. However, most experiments have been conducted in established cell lines subjected to specific p53-inducing stimuli, both factors potentially biasing the results. Results: We developed p53retriever, a pattern search algorithm that maps p53 response elements (REs) and ranks them according to predicted transactivation potentials in five classes. Besides canonical, full site REs, we developed specific pattern searches for non-canonical half sites and 3/4 sites and show that they can mediate p53-dependent responsiveness of associated coding sequences. Using ENCODE data, we also mapped p53 REs in about 44,000 distant enhancers and identified a 16-fold enrichment for high activity REs within those sites in comparison with genomic regions near transcriptional start sites (TSS). Predictions from our pattern search were cross-referenced to ChIP-seq, ChIP-exo, expression, and various literature data sources. Based on the mapping of predicted functional REs near TSS, we examined expression changes of thirteen genes as a function of different p53-inducing conditions, providing further evidence for PDE2A, GAS6, E2F7, APOBEC3H, KCTD1, TRIM32, DICER, HRAS, KITLG and TGFA p53-dependent regulation, while MAP2K3, DNAJA1 and potentially YAP1 were identified as new direct p53 target genes. Conclusions: We provide a comprehensive annotation of canonical and non-canonical p53 REs in the human genome, ranked on predicted transactivation potential. We also establish or corroborate direct p53 transcriptional control of thirteen genes. The entire list of identified and functionally classified p53 REs near all UCSC-annotated genes and within ENCODE mapped enhancer elements is provided.
Our approach is distinct from, and complementary to, existing methods designed to identify p53 response elements. p53retriever is available as an R package at: http://tomateba.github.io/p53retriever.
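To illustrate what a pattern search for full-site REs involves, here is a minimal Python sketch (not the R package itself) that scans a sequence for the classical p53 consensus: two RRRCWWGYYY half sites separated by a short spacer. It only finds perfect matches on the given strand and does no ranking; the grading rules of p53retriever are not described in the abstract, and the function name, the spacer default and the example sequence are assumptions.

```python
import re

# IUPAC codes expanded: R = A/G, W = A/T, Y = C/T.
HALF_SITE = "[AG]{3}C[AT]{2}G[CT]{3}"


def find_full_sites(seq, max_spacer=0):
    """Return (start, matched_sequence) for each perfect full-site match.

    A full site is two half sites separated by 0..max_spacer bases.
    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(
        rf"(?=({HALF_SITE}[ACGT]{{0,{max_spacer}}}?{HALF_SITE}))"
    )
    return [(m.start(1), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence containing one perfect full site at offset 3.
example = "TTT" + "GGGCATGTCC" + "GGGCATGTCC" + "AAA"
print(find_full_sites(example))
```

Half sites and 3/4 sites, which the abstract also covers, would need separate patterns that tolerate a missing or degenerate quarter site.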

  • BMC Genomics 2015, null:464
  • 10 years ago

From ERα66 to ERα36: a generic method for validating a prognosis marker of breast tumor progression

Background: Estrogen receptor alpha36 (ERalpha36), a variant of estrogen receptor alpha (ER), is expressed in about half of breast tumors, independently of the [ER+]/[ER-] status. In vitro, ERalpha36 triggers mitogenic non-genomic signaling and migration ability in response to 17beta-estradiol and tamoxifen. In vivo, tumors with high ERalpha36 expression have a poor outcome, especially when [ER+] tumors are treated with tamoxifen, which in turn enhances ERalpha36 expression. Results: Our study aimed to validate ERalpha36 expression as a reliable prognostic factor for cancer progression from an estrogen-dependent proliferative tumor toward an estrogen-dispensable metastatic disease. In a retrospective study, we tried to decipher the underlying mechanisms of cancer progression by using an original model of the relationships between ERalpha36, other estrogen and growth factor receptors, and metastatic marker expression. Nonlinear correlation analyses and mutual information computations characterized a complex network connecting ERalpha36 to either non-genomic estrogen signaling or the metastatic process. Conclusions: This study identifies ERalpha36 expression level as a relevant classifier that should be taken into account for the clinical characterization of breast tumors and for [ER+] tumor treatment orientation, using a generic approach for the rapid, cheap and relevant evaluation of any candidate gene expression as a predictor of a complex biological process.

  • BMC Systems Biology 2015, null:28
  • 10 years ago

Comparative transcriptome profiling of Pyropia yezoensis (Ueda) M.S. Hwang & H.G. Choi in response to temperature stresses

Background: Pyropia yezoensis is a model organism often used to investigate the mechanisms underlying stress tolerance in intertidal zones. The digital gene expression (DGE) approach was used for a genome-wide comparative analysis of differentially expressed genes (DEGs) that influence physiological, developmental or biochemical processes in samples subjected to 4 treatments: high-temperature stress (HT), chilling stress (CS), freezing stress (FS) and normal temperature (NT). Results: Equal amounts of total RNA collected from 8 samples (two biological replicates per treatment) were sequenced using the Illumina/Solexa platform. Compared with NT, a total of 2202, 1334 and 592 differentially expressed unigenes were detected in HT, CS and FS, respectively. Clustering analysis suggested that P. yezoensis acclimates to low- and high-temperature stress conditions using different mechanisms: under heat stress, unigenes related to DNA replication and repair and to protein processing in the endoplasmic reticulum were active, whereas under low-temperature stresses, unigenes related to carbohydrate metabolism and energy metabolism were active. Differential expression analysis identified four categories of DEGs functioning as temperature sensors: heat shock proteins, H2A, histone deacetylase complex and transcription factors. Heat stress down-regulated chloroplast genes and up-regulated unigenes encoding metacaspases, which are important regulators of PCD. Cold stress caused an increase in the expression of FAD to improve the proportion of polyunsaturated fatty acids. An up-regulated unigene encoding farnesyl pyrophosphate synthase was found under cold stress, indicating that the plant hormone ABA also played an important role in responding to temperature stress in P. yezoensis.
Conclusion: The variation in the number of unigenes and the different gene expression patterns under different temperature stresses indicate complicated and diverse regulation mechanisms in response to temperature stress in P. yezoensis. Several common metabolic pathways were found in both P. yezoensis and higher plants, such as FAD in low-temperature stress and HSP in heat stress. Meanwhile, many chloroplast genes and a unigene related to the synthesis of abscisic acid were detected, revealing a unique temperature-regulation mechanism in this intertidal species. This sequencing dataset and analysis may serve as a valuable resource to study the mechanisms involved in abiotic stress tolerance in intertidal seaweeds.

  • BMC Genomics 2015, null:463
  • 10 years ago

AmyLoad - website dedicated to amyloidogenic protein fragments

Summary: Analyses of amyloidogenic sequence fragments are essential in studies of neurodegenerative diseases. However, there is no single online dataset that collects all the sequences that have been investigated for their amyloidogenicity. Therefore, we have created the AmyLoad website, which collects the amyloidogenic sequences from all major sources. The website allows filtering of the fragments and provides detailed information about each of them. Registered users can both personalize their work with the website and submit their own sequences to the database. To maintain database reliability, submitted sequences are reviewed before being made available to the public. Finally, we re-implemented several amyloidogenic sequence predictors, so the AmyLoad website can also be used as a sequence analysis tool. We encourage researchers working on amyloid proteins to contribute to our service.

Availability and implementation: The AmyLoad website is freely available at http://comprec-lin.iiar.pwr.edu.pl/amyload/.

Contact: malgorzata.kotulska@pwr.edu.pl

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map

Summary: We have developed Rchemcpp, a web service that identifies structurally similar compounds (structural analogs) in large-scale molecule databases. The service allows compounds to be queried in the widely used ChEMBL, DrugBank and the Connectivity Map databases. Rchemcpp utilizes the best performing similarity functions, i.e. molecule kernels, as measures for structural similarity. Molecule kernels have shown superior performance over other similarity measures and are currently excelling at machine learning challenges. In order to considerably reduce computational time, and thereby make it feasible as a web service, a novel efficient prefiltering strategy has been developed, which maintains the sensitivity of the method. By exploiting information contained in public databases, the web service facilitates many applications crucial for the drug development process, such as prioritizing compounds after screening or reducing adverse side-effects during late phases. Rchemcpp was used in the DeepTox pipeline that won the Tox21 Data Challenge and is frequently used by researchers in pharmaceutical companies.

Availability: The web service and the R package are freely available via http://shiny.bioinf.jku.at/Analoging/ and via Bioconductor.

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

libRoadRunner: A High Performance SBML Simulation and Analysis Library

Motivation: This paper presents libRoadRunner, an extensible, high-performance, cross-platform, open-source software library for the simulation and analysis of models expressed using Systems Biology Markup Language (SBML). SBML is the most widely used standard for representing dynamic networks, especially biochemical networks. libRoadRunner is fast enough to support large-scale problems such as tissue models, studies that require large numbers of repeated runs and interactive simulations.

Results: libRoadRunner is a self-contained library, able to run both as a component inside other tools via its C++ and C bindings, and interactively through its Python interface. Its Python Application Programming Interface (API) is similar to the APIs of MATLAB (www.mathworks.com) and SciPy (http://www.scipy.org/), making it fast and easy to learn. libRoadRunner uses a custom Just-In-Time (JIT) compiler built on the widely-used LLVM JIT compiler framework. It compiles SBML-specified models directly into native machine code for a variety of processors, making it appropriate for solving extremely large models or repeated runs. libRoadRunner is flexible, supporting the bulk of the SBML specification (except for delay and nonlinear algebraic equations) including several SBML extensions (composition and distributions). It offers multiple deterministic and stochastic integrators, as well as tools for steady-state analysis, stability analysis and structural analysis of the stoichiometric matrix.

Availability and Implementation: libRoadRunner binary distributions are available for Mac OS X, Linux and Windows. The library is licensed under Apache License Version 2.0. libRoadRunner is also available for ARM based computers such as the Raspberry Pi. http://www.libroadrunner.org provides online documentation, full build instructions, binaries and a git source repository.

Contacts: hsauro@u.washington.edu, somogyie@indiana.edu

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Fast-SL: An efficient algorithm to identify synthetic lethal sets in metabolic networks

Motivation: Synthetic lethal sets are sets of reactions/genes where only the simultaneous removal of all reactions/genes in the set abolishes growth of an organism. Previous approaches to identify synthetic lethal genes in genome-scale metabolic networks have built on the framework of Flux Balance Analysis (FBA), extending it either to exhaustively analyse all possible combinations of genes or formulate the problem as a bi-level Mixed Integer Linear Programming (MILP) problem. We here propose an algorithm, Fast-SL, which surmounts the computational complexity of previous approaches by iteratively reducing the search space for synthetic lethals, resulting in a substantial reduction in running time, even for higher order synthetic lethals.
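The definition of a synthetic lethal set can be made concrete with a brute-force enumeration on a toy network. The sketch below is not Fast-SL and does not use Flux Balance Analysis: growth is approximated by asking whether "biomass" is reachable from a nutrient, and the five-reaction network, reaction names and function names are all hypothetical. It corresponds to the exhaustive approach whose cost Fast-SL is designed to avoid.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (substrates, products).
REACTIONS = {
    "r1": ({"nutrient"}, {"a"}),
    "r2": ({"a"}, {"b"}),
    "r3": ({"a"}, {"c"}),
    "r4": ({"b"}, {"biomass"}),
    "r5": ({"c"}, {"biomass"}),
}


def grows(active, start=frozenset({"nutrient"}), target="biomass"):
    """Return True if `target` can be produced using only `active` reactions."""
    pool = set(start)
    changed = True
    while changed:
        changed = False
        for rxn in active:
            substrates, products = REACTIONS[rxn]
            if substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return target in pool


def lethal_sets(max_size=2):
    """Enumerate minimal lethal reaction sets up to `max_size` by brute force."""
    all_rxns = sorted(REACTIONS)
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(all_rxns, size):
            if any(set(prev) <= set(combo) for prev in found):
                continue  # not minimal: already lethal with fewer deletions
            if not grows(set(all_rxns) - set(combo)):
                found.append(combo)
    return found


print(lethal_sets())
```

In this toy network r1 is lethal on its own, and four pairs are synthetic lethal because each blocks both routes to biomass. The number of combinations grows rapidly with set size, which is why reducing the search space matters for triplets and quadruplets.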

Results: We performed synthetic reaction and gene lethality analysis, using Fast-SL, for genome-scale metabolic networks of Escherichia coli, Salmonella enterica Typhimurium and Mycobacterium tuberculosis. Fast-SL also rigorously identifies synthetic lethal gene deletions, uncovering synthetic lethal triplets that were not reported previously. We confirm that the triple lethal gene sets obtained for the three organisms have a precise match with the results obtained through exhaustive enumeration of lethals performed on a computer cluster. We also parallelised our algorithm, enabling the identification of synthetic lethal gene quadruplets for all three organisms in under six hours. Overall, Fast-SL enables an efficient enumeration of higher order synthetic lethals in metabolic networks, which may help uncover previously unknown genetic interactions and combinatorial drug targets.

Availability: The MATLAB implementation of the algorithm, compatible with COBRA toolbox v2.0 is available at https://github.com/RamanLab/FastSL

Contact: kraman@iitm.ac.in

Supplementary Information: Supplementary data are available online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

Massively parallel quantification of the regulatory effects of non-coding genetic variation in a human cohort

We report a novel high-throughput method to empirically quantify individual-specific regulatory element activity at the population scale. The approach combines targeted DNA capture with a high-throughput reporter gene expression assay. As a demonstration, we measured the activity of more than 100 putative regulatory elements from 95 individuals in a single experiment. We found that, in agreement with previous reports, most genetic variants have weak effects on distal regulatory element activity. Because haplotypes are typically maintained within but not between assayed regulatory elements, the approach can be used to identify causal regulatory haplotypes that likely contribute to human phenotypes. Finally, we demonstrate the utility of the method to functionally fine map causal regulatory variants in regions of high linkage disequilibrium identified by expression quantitative trait loci (eQTL) analyses.

  • Genome Research
  • 10 years ago
  • METHOD

Control of mammalian gene expression by selective mRNA export

Nuclear export of mRNAs is a crucial step in the regulation of gene expression, linking transcription in the nucleus to translation in the cytoplasm. Although important components of the mRNA export machinery are well characterized, such as transcription-export complexes TREX and TREX-2, recent work has

  • Nature Reviews Molecular Cell Biology 16, 431 (2015)
  • 10 years ago
  • Review

Posterior Probability Matching and Human Perceptual Decision Making

by Richard F. Murray, Khushbu Patel, Alan Yee

Probability matching is a classic theory of decision making that was first developed in models of cognition. Posterior probability matching, a variant in which observers match their response probabilities to the posterior probability of each response being correct, is being used increasingly often in models of perception. However, little is known about whether posterior probability matching is consistent with the vast literature on vision and hearing that has developed within signal detection theory. Here we test posterior probability matching models using two tools from detection theory. First, we examine the models’ performance in a two-pass experiment, where each block of trials is presented twice, and we measure the proportion of times that the model gives the same response twice to repeated stimuli. We show that at low performance levels, posterior probability matching models give highly inconsistent responses across repeated presentations of identical trials. We find that practised human observers are more consistent across repeated trials than these models predict, and we find some evidence that less practised observers are more consistent as well. Second, we compare the performance of posterior probability matching models on a discrimination task to the performance of a theoretical ideal observer that achieves the best possible performance. We find that posterior probability matching is very inefficient at low-to-moderate performance levels, and that human observers can be more efficient than is ever possible according to posterior probability matching models. These findings support classic signal detection models, and rule out a broad class of posterior probability matching models for expert performance on perceptual tasks that range in complexity from contrast discrimination to symmetry detection.
However, our findings leave open the possibility that inexperienced observers may show posterior probability matching behaviour, and our methods provide new tools for testing for such a strategy.
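Both tests described in this abstract can be illustrated with a two-alternative trial on which the posterior probability of one response is q. The sketch below is an idealized, noise-free calculation and not the authors' analysis; the function names and the example values of q are assumptions. A matching observer chooses each response with its posterior probability, while a maximum-posterior (ideal) observer always chooses the more probable response.

```python
def matching_agreement(q):
    """Probability that a matching observer repeats its response when the
    identical trial is shown twice (both passes give A, or both give B)."""
    return q * q + (1 - q) * (1 - q)


def matching_accuracy(q):
    """Expected proportion correct for a matching observer: it chooses A
    with probability q, and A is correct with probability q (same for B)."""
    return q * q + (1 - q) * (1 - q)


def map_accuracy(q):
    """Expected proportion correct when always choosing the likelier response."""
    return max(q, 1 - q)


for q in (0.55, 0.6, 0.75, 0.95):
    print(q, round(matching_agreement(q), 4),
          round(matching_accuracy(q), 4), round(map_accuracy(q), 4))
```

When q is close to 0.5, the matching observer agrees with itself barely more than half the time and performs below the maximum-posterior rule, which mirrors the low consistency and low efficiency the abstract reports for matching models at low performance levels.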

  • PLOS Computational Biology
  • 10 years ago

Interlocus gene conversion explains at least 2.7 % of single nucleotide variants in human segmental duplications

Background: Interlocus gene conversion (IGC) is a recombination-based mechanism that results in the unidirectional transfer of short stretches of sequence between paralogous loci. Although IGC is a well-established mechanism of human disease, the extent to which this mutagenic process has shaped overall patterns of segregating variation in multi-copy regions of the human genome remains unknown. One expected manifestation of IGC in population genomic data is the presence of one-to-one paralogous SNPs that segregate identical alleles. Results: Here, I use SNP genotype calls from the low-coverage phase 3 release of the 1000 Genomes Project to identify 15,790 parallel, shared SNPs in duplicated regions of the human genome. My approach for identifying these sites accounts for the potential redundancy of short read mapping in multi-copy genomic regions, thereby effectively eliminating false positive SNP calls arising from paralogous sequence variation. I demonstrate that independent mutation events to identical nucleotides at paralogous sites are not a significant source of shared polymorphisms in the human genome, consistent with the interpretation that these sites are the outcome of historical IGC events. These putative signals of IGC are enriched in genomic contexts previously associated with non-allelic homologous recombination, including clear signals in gene families that form tandem intra-chromosomal clusters. Conclusions: Taken together, my analyses implicate IGC, not point mutation, as the mechanism generating at least 2.7 % of single nucleotide variants in duplicated regions of the human genome.

  • BMC Genomics 2015, null:456
  • 10 years ago

piRNA-like small RNAs mark extended 3’UTRs present in germ and somatic cells

Background: Piwi-interacting RNAs (piRNAs) are a class of small RNAs; distinct types of piRNAs are expressed in the mammalian testis at different stages of development. The function of piRNAs expressed in the adult testis is not well established. We conducted a detailed characterization of piRNAs aligning at or near the 3’ UTRs of protein-coding genes in a deep dataset of small RNAs from adult mouse testis. Results: We identified 2710 piRNA clusters associated with 3’ UTRs, including 1600 that overlapped genes not previously associated with piRNAs. 35 % of the clusters extend beyond the annotated transcript; we find that these clusters correspond to, and are likely derived from, novel polyadenylated mRNA isoforms that contain previously unannotated extended 3’UTRs. Extended 3’ UTRs, and small RNAs derived from them, are also present in somatic tissues; a subset of these somatic 3’UTR small RNA clusters are absent in mice lacking MIWI2, indicating a role for MIWI2 in the metabolism of somatic small RNAs. Conclusions: The finding that piRNAs are processed from extended 3’ UTRs suggests a role for piRNAs in the remodeling of 3’ UTRs. The presence of both clusters and extended 3’UTRs in somatic cells, with evidence for involvement of MIWI2, indicates that this pathway is more broadly distributed than currently appreciated.

  • BMC Genomics 2015, null:462
  • 10 years ago

Comparative mitogenomic analysis of the superfamily Pentatomoidea (Insecta: Hemiptera: Heteroptera) and phylogenetic implications

Background: Insect mitochondrial genomes (mitogenomes) are the most extensively used genetic markers for evolutionary and population genetics studies of insects. The Pentatomoidea superfamily is economically important and the largest superfamily within Pentatomomorpha, with over 7,000 species. To better understand the diversity and evolution of pentatomoid species, we sequenced and annotated the mitogenomes of Eurydema gebleri and Rubiconia intermedia, and present the first comparative analysis of the 11 pentatomoid mitogenomes that have been sequenced to date. Results: We obtained the complete mitogenome of Eurydema gebleri (16,005 bp) and a nearly complete mitogenome of Rubiconia intermedia (14,967 bp). Our results show that gene content, gene arrangement, base composition, codon usage, and mitochondrial transcription termination factor sequences are highly conserved in pentatomoid species, especially for species in the same family. Evolutionary rate analyses of protein-coding genes reveal that the highest and lowest rates are found in atp8 and cox1, respectively, and that distinctive evolutionary patterns are significantly correlated with the G + C content of genes. We inferred the secondary structures of two rRNA genes for eleven pentatomoid species, and identified some conserved motifs of RNA structures in Pentatomoidea. All tRNA genes in pentatomoid mitogenomes have a canonical cloverleaf secondary structure, except for two tRNAs (trnS1 and trnV) which appear to lack the dihydrouridine arm. Regions that are A + T-rich have several distinct characteristics (e.g. size variation and abundant tandem repeats), and have potential as species or population level molecular markers. Phylogenetic analyses based on mitogenomic data strongly support the monophyly of Pentatomoidea, and the estimated phylogenetic relationships are: (Urostylididae + (Plataspidae + (Pentatomidae + (Cydnidae + (Dinidoridae + Tessaratomidae))))).
Conclusions: This comparative mitogenomic analysis sheds light on the architecture and evolution of mitogenomes in the superfamily Pentatomoidea. Mitogenomes can be effectively used to resolve phylogenetic relationships of pentatomomorphan insects at various taxonomic levels. Sequencing more mitogenomes at various taxonomic levels, particularly from closely related species, will improve the annotation accuracy of mitochondrial genes, as well as greatly enhance our understanding of mitogenomic evolution and phylogenetic relationships in pentatomoids.

  • BMC Genomics 2015, 16:460
  • 10 years ago

Genome-wide transcriptomic analysis of a superior biomass-degrading strain of A. fumigatus revealed active lignocellulose-degrading genes

Background: Various saprotrophic microorganisms, especially filamentous fungi, can efficiently degrade lignocellulose, which is one of the most abundant natural materials on earth. It consists of complex carbohydrates and aromatic polymers found in the plant cell wall and thus in plant debris. Aspergillus fumigatus Z5 was isolated from compost heaps and showed a highly efficient capability for degrading plant biomass.
Results: The 29-million base-pair genome of Z5 was sequenced, and 9540 protein-coding genes were predicted and annotated. Genome analysis revealed an impressive array of genes encoding cellulases, hemicellulases and pectinases involved in lignocellulosic biomass degradation. Transcriptional responses of A. fumigatus Z5 induced by sucrose, oat spelt xylan, Avicel PH-101 and rice straw were compared. There were 444, 1711 and 1386 significantly differentially expressed genes in xylan, cellulose and rice straw, respectively, when compared to sucrose as a control condition.
Conclusions: Combined analysis of the genomic and transcriptomic data provides a comprehensive understanding of the mechanisms by which A. fumigatus responds to the most abundant natural polysaccharides. This study provides a basis for further analysis of genes shown to be highly induced in the presence of polysaccharide substrates, as well as information that could prove useful for biomass degradation and heterologous protein expression.

  • BMC Genomics 2015, 16:459
  • 10 years ago

Large-scale identification of encystment-related proteins and genes in Pseudourostyla cristata

The transformation of a ciliate into a cyst is an advanced strategy against adverse conditions. However, the molecular mechanism for the encystment of free-living ciliates is poorly understood. A large-scale identification of encystment-related proteins and genes in ciliates would provide deeper insights into the molecular mechanisms of ciliate encystment. In this report, we identified the encystment-related proteins and genes in Pseudourostyla cristata with shotgun LC-MS/MS and scale qRT-PCR, respectively. A total of 668 proteins were detected in the resting cysts, 102 of which were highly credible proteins, whereas 88 of the 724 total proteins found in the vegetative cells were highly credible. Compared with the vegetative cell, 6 specific proteins were found in the resting cyst. However, the majority of highly credible proteins in the resting cyst and the vegetative cell were co-expressed. We compared 47 genes of the co-expressed proteins with known functions in both the cyst and the vegetative cell using scale qRT-PCR. Twenty-seven of the 47 genes were differentially expressed in the cyst compared with the vegetative cell. Many uncharacterized proteins were also identified. These results will help reveal the molecular mechanism of cyst formation in ciliates.

  • Scientific Reports 5
  • 10 years ago
  • Article