Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data

Summary: Advances in high-throughput sequencing technologies now allow for large-scale characterization of B cell immunoglobulin (Ig) repertoires. The high germline and somatic diversity of the Ig repertoire presents challenges for biologically meaningful analysis, which requires specialized computational methods. We have developed a suite of utilities, Change-O, which provides tools for advanced analyses of large-scale Ig repertoire sequencing data. Change-O includes tools for determining the complete set of Ig variable region gene segment alleles carried by an individual (including novel alleles), partitioning of Ig sequences into clonal populations, creating lineage trees, inferring somatic hypermutation targeting models, measuring repertoire diversity, quantifying selection pressure, and calculating sequence chemical properties. All Change-O tools utilize a common data format, which enables the seamless integration of multiple analyses into a single workflow.

Availability and Implementation: Change-O is freely available for non-commercial use and may be downloaded from http://clip.med.yale.edu/changeo.

Contact: steven.kleinstein@yale.edu

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

Identification of a critical determinant that enables efficient fatty acid synthesis in oleaginous fungi

Microorganisms are valuable resources for lipid production. What makes one microbe but not another able to efficiently synthesize and accumulate lipids is poorly understood. In the present study, global gene expression prior to and after the onset of lipogenesis was determined by transcriptomics using the oleaginous fungus Mortierella alpina as a model system. A core of 23 lipogenesis-associated genes was identified, and their expression patterns shared a high similarity among the oleaginous microbes Chlamydomonas reinhardtii, Mucor circinelloides and Rhizopus oryzae but were dissimilar to those of the non-oleaginous Aspergillus nidulans. Unexpectedly, glucose-6-phosphate dehydrogenase (G6PD) and 6-phosphogluconate dehydrogenase (PGD) in the pentose phosphate pathway (PPP) were found to be the NADPH producers responding to lipogenesis in the oleaginous microbes. Their role in lipogenesis was confirmed by a knockdown experiment. Our results demonstrate, for the first time, that the PPP plays a significant role during fungal lipogenesis. Up-regulation of NADPH production by the PPP, especially G6PD, may be one of the critical determinants that enable efficient fatty acid synthesis in oleaginous microbes.

  • Scientific Reports 5
  • 10 years ago
  • Article

A Parallel and Sensitive Software Tool for Methylation Analysis on Multicore Platforms

Motivation: DNA methylation analysis suffers from very long processing times, since the advent of Next-Generation Sequencers (NGS) has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently with either the size of the dataset or the length of the reads to be analyzed. Since it is expected that the sequencers will provide longer and longer reads in the near future, efficient and scalable methylation software should be developed.

Results: We present a new software tool, called HPG-Methyl, which efficiently maps bisulfite sequencing reads on DNA, analyzing DNA methylation. The strategy used by this software consists of leveraging the speed of the Burrows-Wheeler Transform to map a large number of DNA fragments (reads) rapidly, as well as the accuracy of the Smith-Waterman algorithm, which is exclusively employed to deal with the most ambiguous and shortest reads. Experimental results on platforms with Intel multicore processors show that HPG-Methyl significantly outperforms state-of-the-art software such as Bismark, BS-Seeker or BSMAP in both execution time and sensitivity, particularly for long bisulfite reads.
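As an illustration of the Smith-Waterman step mentioned above, the following is a minimal Python sketch of local alignment scoring. It is not HPG-Methyl's code (which is distributed as C libraries); the function name, the scoring values (match 2, mismatch -1, gap -2) and the linear gap penalty are assumptions chosen only for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative scoring scheme (assumed, not taken from HPG-Methyl):
    linear gap penalty, fixed match/mismatch scores.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # shared "ACGT" -> 8
```

The quadratic cost of this recurrence is one reason a mapper might reserve it for the reads that a faster index-based search cannot place confidently.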

Availability: Software in the form of C libraries and functions, together with instructions to compile and execute this software. Available by sftp to anonymous@clariano.uv.es (password "anonymous").

Contact: Juan.Orduna@uv.es

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

In situ characterizing membrane lipid phenotype of breast cancer cells using mass spectrometry profiling

Lipid composition in the cell membrane is closely associated with cell characteristics. Here, matrix-assisted laser desorption/ionization-Fourier transform ion cyclotron resonance mass spectrometry was employed to determine in situ the membrane components of human mammary epithelial cells (MCF-10 A) and six different breast cancer cell lines (i.e., BT-20, MCF-7, SK-BR-3, MDA-MB-231, MDA-MB-157, and MDA-MB-361) without any lipid extraction and separation. Partial least-squares discriminant analysis indicated that changes in the levels of these membrane lipids were closely correlated with the types of breast cell lines. Elevated levels of polyunsaturated lipids in MCF-10 A cells relative to the six breast cancer cell lines and in BT-20 cells relative to the other breast cancer cell lines were detected. Western blotting assays indicated that the expression of five lipogenesis-related enzymes (i.e., fatty acid synthase 1 (FASN1), stearoyl-CoA desaturase 1 (SCD1), stearoyl-CoA desaturase 5 (SCD5), choline kinase α (CKα), and sphingomyelin synthase 1) was associated with the types of the breast cells, and that the SCD1 level in MCF-7 cells was significantly increased relative to the other breast cell lines. Our findings suggest that elevated expression levels of FASN1, SCD1, SCD5, and CKα may be closely correlated with enhanced levels of saturated and monounsaturated lipids in breast cancer cell lines.

  • Scientific Reports 5
  • 10 years ago
  • Article

Population genetic structure of Oryza sativa in East and Southeast Asia and the discovery of elite alleles for grain traits

We investigated the nuclear simple sequence repeat (SSR) genotypes of 532 rice (Oryza sativa L.) accessions collected from East and Southeast Asia and detected abundant genetic diversity within the population. We identified 6 subpopulations and found a tendency towards directional evolution in O. sativa from low to high latitudes, with levels of linkage disequilibrium (LD) in the 6 subpopulations ranging from 10 to 30 cM. We then investigated the phenotypic data for grain length, grain width, grain thickness and 1,000-grain weight over 4 years. Using a genome-wide association analysis, we identified 17 marker-trait associations involving 14 SSR markers on 12 chromosome arms, and 8 of the 17 associations were novel. The elite alleles were mined based on the phenotypic effects of the detected quantitative trait loci (QTLs). These elite alleles could be used to improve target traits through optimal cross designs, with the expected results obtained by pyramiding or substituting the elite alleles per QTL (independent of possible epistatic effects). Together, these results provide an in-depth understanding of the genetic diversity pattern among rice-grain traits across a broad geographic scale, which has potential use in future research work, including studies related to germplasm conservation and molecular breeding by design.

  • Scientific Reports 5
  • 10 years ago
  • Article

Constructing lncRNA functional similarity network based on lncRNA-disease associations and disease semantic similarity

Increasing evidence has indicated that many lncRNAs play important roles in critical biological processes. Developing powerful computational models to construct lncRNA functional similarity networks based on heterogeneous biological datasets is one of the most important and popular topics in the fields of both lncRNAs and complex diseases. Functional similarity network construction could benefit model development for both lncRNA function inference and lncRNA-disease association identification. However, little effort has been made to analyze and calculate lncRNA functional similarity on a large scale. In this study, based on the assumption that functionally similar lncRNAs tend to be associated with similar diseases, we developed two novel lncRNA functional similarity calculation models (LNCSIM). LNCSIM was evaluated by introducing similarity scores into the model of Laplacian Regularized Least Squares for LncRNA–Disease Association (LRLSLDA) for lncRNA-disease association prediction. As a result, the new predictive models improved the performance of LRLSLDA in the leave-one-out cross validation of various known lncRNA-disease association datasets. Furthermore, some of the predictive results for colorectal cancer and lung cancer were verified by independent biological experimental studies. It is anticipated that LNCSIM could be a useful and important biological tool for human disease diagnosis, treatment, and prevention.

  • Scientific Reports 5
  • 10 years ago
  • Article

Phylogenetic relationships of subfamilies in the family Hesperiidae (Lepidoptera: Hesperioidea) from China

Hesperiidae is one of the largest families of butterflies. Our knowledge of the higher systematics of hesperiids from China is still very limited. We infer the phylogenetic relationships of the subfamilies of Chinese skippers based on three mitochondrial genes (cytochrome b (Cytb), the NADH dehydrogenase subunit 1 (ND1) and cytochrome oxidase I (COI)). In this study, 30 species in 23 genera were included in the Bayesian and maximum likelihood analyses. The subfamilies Coeliadinae, Eudaminae, Pyrginae and Heteropterinae were recovered as a monophyletic clade with strong support. The subfamily Hesperiinae formed a clade, but support for monophyly was weak. Our results imply that the Chinese Hesperiidae should be divided into five subfamilies: Coeliadinae, Eudaminae, Pyrginae, Heteropterinae and Hesperiinae. The relationships of the five subfamilies should be as follows: Coeliadinae + (Eudaminae + (Pyrginae + (Heteropterinae + Hesperiinae))).

  • Scientific Reports 5
  • 10 years ago
  • Article

Intracellular microRNA profiles form in the Xenopus laevis oocyte that may contribute to asymmetric cell division

Asymmetric distribution of fate determinants within cells is an essential biological strategy to prepare them for asymmetric division. In this work we measure the intracellular distribution of 12 maternal microRNAs (miRNA) along the animal-vegetal axis of the Xenopus laevis oocyte using qPCR tomography. We find the miRNAs have distinct intracellular profiles that resemble two out of the three profiles we previously observed for mRNAs. Our results suggest that miRNAs in addition to proteins and mRNAs may have asymmetric distribution within the oocyte and may contribute to asymmetric cell division as cell fate determinants.

  • Scientific Reports 5
  • 10 years ago
  • Article

PsANT, the adenine nucleotide translocase of Puccinia striiformis, promotes cell death and fungal growth

Adenine nucleotide translocase (ANT) is a constitutive mitochondrial component that is involved in ADP/ATP exchange and mitochondrion-mediated apoptosis in yeast and mammals. However, little is known about the function of ANT in pathogenic fungi. In this study, we identified an ANT gene of Puccinia striiformis f. sp. tritici (Pst), designated PsANT. The PsANT protein contains three typical conserved mitochondrion-carrier-protein (mito-carr) domains and shares more than 70% identity with its orthologs from other fungi, suggesting that ANT is conserved in fungi. Immuno-cytochemical localization confirmed the mitochondrial localization of PsANT in normal Pst hyphal cells or collapsed cells. Over-expression of PsANT indicated that PsANT promotes cell death in tobacco, wheat and fission yeast cells. Further study showed that the three mito-carr domains are all needed to induce cell death. qRT-PCR analyses revealed an in-planta induced expression of PsANT during infection. Knockdown of PsANT using a host-induced gene silencing system (HIGS) attenuated the growth and development of virulent Pst at the early infection stage but not enough to alter its pathogenicity. These results provide new insight into the function of PsANT in fungal cell death and growth and might be useful in the search for and design of novel disease control strategies.

  • Scientific Reports 5
  • 10 years ago
  • Article

Identification of New Candidate Genes and Chemicals Related to Esophageal Cancer Using a Hybrid Interaction Network of Chemicals and Proteins

by Yu-Fei Gao, Fei Yuan, Junbao Liu, Li-Peng Li, Yi-Chun He, Ru-Jian Gao, Yu-Dong Cai, Yang Jiang

Cancer is a serious disease responsible for many deaths every year in both developed and developing countries. One reason is that the mechanisms underlying most types of cancer are still mysterious, creating a great block for the design of effective treatments. In this study, we attempted to clarify the mechanism underlying esophageal cancer by searching for novel genes and chemicals. To this end, we constructed a hybrid network containing both proteins and chemicals, and generalized an existing computational method previously used to identify disease genes to identify new candidate genes and chemicals simultaneously. Based on jackknife test, our generalized method outperforms or at least performs at the same level as those obtained by a widely used method - the Random Walk with Restart (RWR). The analysis results of the final obtained genes and chemicals demonstrated that they highly shared gene ontology (GO) terms and KEGG pathways with direct and indirect associations with esophageal cancer. In addition, we also discussed the likelihood of selected candidate genes and chemicals being novel genes and chemicals related to esophageal cancer.
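For readers unfamiliar with the Random Walk with Restart (RWR) baseline named above, here is a minimal pure-Python sketch on a toy hybrid network. The node labels (G1, G2, C1, G3), the restart probability of 0.3 and the function name are hypothetical; the sketch assumes an undirected, unweighted network in which every node has at least one neighbour, and it is not the authors' generalized method.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adj maps each node to a list of its neighbours; seeds is the set of
    known disease-related nodes where the walker restarts.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Each node spreads its probability evenly over its neighbours.
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Toy path network: gene G1 - gene G2 - chemical C1 - gene G3
    toy = {"G1": ["G2"], "G2": ["G1", "C1"], "C1": ["G2", "G3"], "G3": ["C1"]}
    scores = random_walk_with_restart(toy, seeds={"G1"})
    print(sorted(scores, key=scores.get, reverse=True))  # nodes closest to the seed rank first
```

Non-seed nodes are then ranked by their steady-state probability, so candidates that are well connected to known disease nodes score highest.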

  • PloS one
  • 10 years ago

Comparative Analysis of Codon Usage Bias Patterns in Microsporidian Genomes

by Heng Xiang, Ruizhi Zhang, Robert R. Butler, Tie Liu, Li Zhang, Jean-François Pombert, Zeyang Zhou

The sub-3 Mbp genomes from microsporidian species of the Encephalitozoon genus are the smallest known among eukaryotes and paragons of genomic reduction and compaction in parasites. However, their diminutive stature is not characteristic of all Microsporidia, whose genome sizes vary by an order of magnitude. This large variability suggests that different evolutionary forces act on the group as a whole. In this study, we have compared the codon usage bias (CUB) between eight taxonomically distinct microsporidian genomes: Encephalitozoon intestinalis, Encephalitozoon cuniculi, Spraguea lophii, Trachipleistophora hominis, Enterocytozoon bieneusi, Nematocida parisii, Nosema bombycis and Nosema ceranae. While the CUB was found to be weak in all eight Microsporidia, nearly all (98%) of the optimal codons in S. lophii, T. hominis, E. bieneusi, N. parisii, N. bombycis and N. ceranae favor A/U in the third position, whereas most (64.6%) optimal codons in the Encephalitozoon species E. intestinalis and E. cuniculi are biased towards G/C. Although nucleotide composition biases are likely the main factor driving the CUB in Microsporidia according to correlation analyses, directed mutational pressure also likely affects the CUB as suggested by ENc-plots, correspondence and neutrality analyses. Overall, the Encephalitozoon genomes were found to be markedly different from the other microsporidians and, despite being the first sequenced representatives of this lineage, are uncharacteristic of the group as a whole. The disparities observed cannot be attributed solely to differences in host specificity and we hypothesize that other forces are at play in the lineage leading to Encephalitozoon species.
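The A/U versus G/C preference at the third codon position can be illustrated with a simple GC3 calculation. The sketch below is only a small Python example with a hypothetical function name and made-up input sequences; it is far simpler than the ENc-plot, correspondence and neutrality analyses used in the study.

```python
def gc3_fraction(cds):
    """Return the fraction of codons whose third base is G or C.

    Assumes cds is an in-frame coding sequence; trailing bases that do
    not form a full codon are ignored. RNA input (U) is accepted.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence is shorter than one codon")
    gc_third = sum(1 for codon in codons if codon[2] in "GC")
    return gc_third / len(codons)


if __name__ == "__main__":
    print(gc3_fraction("ATGGCCAAATTT"))  # codons ATG GCC AAA TTT -> 0.5
    print(gc3_fraction("GCAGCTGCA"))     # all third positions A/T -> 0.0
```

A genome whose coding sequences give low GC3 values has codons that mostly end in A/U, which is the pattern reported for six of the eight species.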

  • PloS one
  • 10 years ago

High-Throughput, Amplicon-Based Sequencing of the CREBBP Gene as a Tool to Develop a Universal Platform-Independent Assay

by Marc W. Fuellgrabe, Dietrich Herrmann, Henrik Knecht, Sven Kuenzel, Michael Kneba, Christiane Pott, Monika Brüggemann

High-throughput sequencing technologies are widely used to analyse genomic variants or rare mutational events in different fields of genomic research, with a fast development of new or adapted platforms and technologies, enabling amplicon-based analysis of single target genes or even whole genome sequencing within a short period of time. Each sequencing platform is characterized by well-defined types of errors, resulting from different steps in the sequencing workflow. Here we describe a universal method to prepare amplicon libraries that can be used for sequencing on different high-throughput sequencing platforms. We have sequenced distinct exons of the CREB binding protein (CREBBP) gene and analysed the output resulting from three major deep-sequencing platforms. Platform-specific errors were adjusted according to the result of sequence analysis from the remaining platforms. Additionally, bioinformatic methods are described to determine platform-dependent errors. Summarizing the results, we present a platform-independent, cost-efficient and time-saving method that can be used as an alternative to commercially available sample-preparation kits.

  • PloS one
  • 10 years ago

Identification and Expression Analysis of the Barley (Hordeum vulgare L.) Aquaporin Gene Family

by Runyararo M. Hove, Mark Ziemann, Mrinal Bhave

Aquaporins (AQPs) are major intrinsic proteins (MIPs) that mediate bidirectional flux of water and other substrates across cell membranes, and play critical roles in plant-water relations, dehydration stress responses and crop productivity. However, limited data are available as yet on the contributions of these proteins to the physiology of the major crop barley (Hordeum vulgare). The present work reports the identification and expression analysis of the barley MIP family. A comprehensive search of publicly available leaf mRNA-seq data, draft barley genome data, GenBank transcripts and sixteen new annotations together revealed that the barley MIP family comprises at least forty AQPs. Alternative splicing events were likely in two plasma membrane intrinsic protein (PIP) AQPs. Analyses of the AQP signature sequences and specificity-determining positions indicated a potential of several putative AQP isoforms to transport non-aqua substrates, including physiologically important substrates, and to respond to abiotic stresses. Analysis of our publicly available leaf mRNA-seq data identified notable differential expression of HvPIP1;2 and HvTIP4;1 under salt stress. Analyses of other gene expression resources also confirmed isoform-specific responses in different tissues and/or in response to salinity, as well as some potential inter-cultivar differences. The work reports a systematic and comprehensive analysis of most, if not all, barley AQP genes, their sequences, expression patterns in different tissues, and potential transport and stress response functions, and provides a strong framework for selection and/or development of stress-tolerant barley varieties. In addition, the barley data would be highly valuable for genetic studies of the evolutionarily closely related wheat (Triticum aestivum L.).

  • PloS one
  • 10 years ago

Discrete Logic Modelling Optimization to Contextualize Prior Knowledge Networks Using PRUNET

by Ana Rodriguez, Isaac Crespo, Ganna Androsova, Antonio del Sol

High-throughput technologies have led to the generation of an increasing amount of data in different areas of biology. Datasets capturing the cell’s response to its intra- and extra-cellular microenvironment allow such data to be incorporated as signed and directed graphs or influence networks. These prior knowledge networks (PKNs) represent our current knowledge of the causality of cellular signal transduction. New signalling data is often examined and interpreted in conjunction with PKNs. However, different biological contexts, such as cell type or disease states, may have distinct variants of signalling pathways, resulting in the misinterpretation of new data. The identification of inconsistencies between measured data and signalling topologies, as well as the training of PKNs using context-specific datasets (PKN contextualization), are necessary conditions to construct reliable, predictive models, which are current challenges in the systems biology of cell signalling. Here we present PRUNET, a user-friendly software tool designed to address the contextualization of a PKN to specific experimental conditions. As the input, the algorithm takes a PKN and the expression profile of two given stable steady states or cellular phenotypes. The PKN is iteratively pruned using an evolutionary algorithm to perform an optimization process. This optimization rests on a match between predicted attractors in a discrete logic model (Boolean) and a Booleanized representation of the phenotypes, within a population of alternative subnetworks that evolves iteratively. We validated the algorithm by applying PRUNET to four biological examples and using the resulting contextualized networks to predict missing expression values and to simulate well-characterized perturbations. PRUNET constitutes a tool for the automatic curation of a PKN to make it suitable for describing biological processes under particular experimental conditions. The general applicability of the implemented algorithm makes PRUNET suitable for a variety of biological processes, for instance cellular reprogramming or transitions between healthy and disease states.
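To make the notion of attractors in a Boolean model concrete, the following Python sketch enumerates the attractors of a tiny network by exhaustive simulation. It assumes synchronous updates and uses a hypothetical two-gene mutual-inhibition network; the names and rules are illustrative only, and PRUNET's evolutionary pruning of the network is not shown.

```python
from itertools import product


def find_attractors(rules, names):
    """Enumerate attractors of a Boolean network under synchronous update.

    rules maps each node name to a function of the current state (a dict
    of name -> 0/1). Each attractor is returned as a tuple of states,
    rotated so that its smallest state comes first.
    """
    def step(state):
        env = dict(zip(names, state))
        return tuple(int(rules[n](env)) for n in names)

    attractors = set()
    for start in product((0, 1), repeat=len(names)):
        seen = {}  # state -> position along the trajectory (insertion ordered)
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = step(state)
        # The first revisited state marks where the cycle begins.
        cycle = [s for s, idx in seen.items() if idx >= seen[state]]
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


if __name__ == "__main__":
    # Hypothetical toggle switch: A and B repress each other.
    toggle = {"A": lambda s: not s["B"], "B": lambda s: not s["A"]}
    for attractor in sorted(find_attractors(toggle, ["A", "B"])):
        print(attractor)
```

In this toy network the two fixed points, (0, 1) and (1, 0), play the role of the two stable phenotypes; a contextualization procedure would keep subnetworks whose fixed points match the Booleanized expression profiles.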

  • PloS one
  • 10 years ago

Optimization of sequence alignments according to the number of sequences vs. number of sites trade-off

Background: Comparative analysis of homologous sequences enables the understanding of evolutionary patterns at the molecular level, unraveling the functional constraints that shaped the underlying genes. Bioinformatic pipelines for comparative sequence analysis typically include procedures for (i) alignment quality assessment and (ii) control of sequence redundancy. An additional, underassessed step is the control of the amount and distribution of missing data in sequence alignments. While the number of sequences available for a given gene typically increases with time, the site-specific coverage of each alignment position remains highly variable because of differences in sequencing and annotation quality, or simply because of biological variation. For any given alignment-based analysis, the selection of sequences thus defines a trade-off between the species representation and the quantity of sites with sufficient coverage to be included in the subsequent analyses. Results: We introduce an algorithm for the optimization of sequence alignments according to the number of sequences vs. number of sites trade-off. The algorithm uses a guide tree to compute scores for each bipartition of the alignment, allowing the recursive selection of sequence subsets with optimal combinations of sequence and site numbers. By applying our methods to two large data sets of several thousands of gene families, we show that significant site-specific coverage increases can be achieved while controlling for the species representation. Conclusions: The algorithm introduced in this work allows the control of the distribution of missing data in any sequence alignment by removing sequences to increase the number of sites with a defined minimum coverage. We advocate that our missing data optimization procedure is an important step that should be considered in comparative analysis pipelines, together with alignment quality assessment and control of sampled diversity.
An open source C++ implementation is available at http://bioweb.me/physamp.
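The sequences-versus-sites trade-off can be seen on a toy alignment. The Python sketch below counts the columns that reach a minimum coverage and tries removing each sequence in turn; this brute-force search, the function names and the example data are assumptions made for illustration, and the guide-tree-based algorithm of the paper is not reproduced.

```python
def sites_with_coverage(alignment, min_cov):
    """Count columns whose fraction of non-missing characters is >= min_cov.

    alignment maps sequence names to equal-length aligned strings;
    '-' and '?' are treated as missing data.
    """
    if not alignment:
        return 0
    seqs = list(alignment.values())
    count = 0
    for col in range(len(seqs[0])):
        present = sum(1 for seq in seqs if seq[col] not in "-?")
        if present / len(seqs) >= min_cov:
            count += 1
    return count


def best_single_removal(alignment, min_cov):
    """Return (name, sites) for the single removal that most increases sites.

    name is None when no removal improves on the full alignment.
    """
    best_name, best_sites = None, sites_with_coverage(alignment, min_cov)
    for name in alignment:
        reduced = {k: v for k, v in alignment.items() if k != name}
        sites = sites_with_coverage(reduced, min_cov)
        if sites > best_sites:
            best_name, best_sites = name, sites
    return best_name, best_sites


if __name__ == "__main__":
    toy = {"sp1": "ACGTACGT", "sp2": "ACGTAC--", "sp3": "AC------", "sp4": "ACGTACGT"}
    print(sites_with_coverage(toy, 1.0))  # 2 fully covered columns
    print(best_single_removal(toy, 1.0))  # ('sp3', 6)
```

Dropping the fragmentary sequence sp3 triples the number of complete sites at the cost of one species, which is exactly the kind of trade-off the algorithm is designed to optimize.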

  • BMC Bioinformatics 2015, null:190
  • 10 years ago

WormGUIDES: an interactive single cell developmental atlas and tool for collaborative multidimensional data exploration

Background: Imaging and image analysis advances are yielding increasingly complete and complicated records of cellular events in tissues and whole embryos. The ability to follow hundreds to thousands of cells at the individual level demands a spatio-temporal data infrastructure: tools to assemble and collate knowledge about development spatially in a manner analogous to geographic information systems (GIS). Just as GIS indexes items or events based on their spatio-temporal or 4D location on the Earth these tools would organize knowledge based on location within the tissues or embryos. Developmental processes are highly context-specific, but the complexity of the 4D environment in which they unfold is a barrier to assembling an understanding of any particular process from diverse sources of information. In the same way that GIS aids the understanding and use of geo-located large data sets, software can, with a proper frame of reference, allow large biological data sets to be understood spatially. Intuitive tools are needed to navigate the spatial structure of complex tissue, collate large data sets and existing knowledge with this spatial structure and help users derive hypotheses about developmental mechanisms. Results: Toward this goal we have developed WormGUIDES, a mobile application that presents a 4D developmental atlas for Caenorhabditis elegans. The WormGUIDES mobile app enables users to navigate a 3D model depicting the nuclear positions of all cells in the developing embryo. The identity of each cell can be queried with a tap, and community databases searched for available information about that cell. Information about ancestry, fate and gene expression can be used to label cells and craft customized visualizations that highlight cells as potential players in an event of interest. Scenes are easily saved, shared and published to other WormGUIDES users. The mobile app is available for Android and iOS platforms. 
Conclusion: WormGUIDES provides an important tool for examining developmental processes and developing mechanistic hypotheses about their control. Critically, it provides the typical end user with an intuitive interface for developing and sharing custom visualizations of developmental processes. Equally important, because users can select cells based on their position and search for information about them, the app also serves as a spatially organized index into the large body of knowledge available to the C. elegans community online. Moreover, the app can be used to create and publish the result of exploration: interactive content that brings other researchers and students directly to the spatio-temporal point of insight. Ultimately the app will incorporate a detailed time lapse record of cell shape, beginning with neurons. This will add the key ability to navigate and understand the developmental events that result in the coordinated and precise emergence of anatomy, particularly the wiring of the nervous system.

  • BMC Bioinformatics 2015, null:189
  • 10 years ago

Light-weight reference-based compression of FASTQ data

Background: The exponential growth of next generation sequencing (NGS) data has posed big challenges to data storage, management and archiving. Data compression is one of the effective solutions, where reference-based compression strategies can typically achieve superior compression ratios compared to the ones not relying on any reference. Results: This paper presents a lossless light-weight reference-based compression algorithm, namely LW-FQZip, to compress FASTQ data. The three components of any given input, i.e., metadata, short reads and quality score strings, are first parsed into three data streams in which the redundant information is identified and eliminated independently. In particular, well-designed incremental and run-length-limited encoding schemes are utilized to compress the metadata and quality score streams, respectively. To handle the short reads, LW-FQZip uses a novel light-weight mapping model to quickly map them against external reference sequence(s) and produce concise alignment results for storage. The three processed data streams are then packed together with general-purpose compression algorithms like LZMA. LW-FQZip was evaluated on eight real-world NGS data sets and achieved compression ratios in the range of 0.111-0.201. This is comparable or superior to other state-of-the-art lossless NGS data compression algorithms. Conclusions: LW-FQZip is a program that enables efficient lossless FASTQ data compression. It contributes to the state-of-the-art applications for NGS data storage and transmission. LW-FQZip is freely available online at: http://csse.szu.edu.cn/staff/zhuzx/LWFQZip.
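Quality score strings often contain long runs of the same symbol, which is what makes run-length style encodings attractive. The Python sketch below shows plain run-length encoding with a lossless round trip; it is an assumption-laden simplification (hypothetical function names, made-up quality string) and does not reproduce the run-length-limited scheme of LW-FQZip, whose details are not given in the abstract.

```python
def rle_encode(quality):
    """Collapse a quality string into (symbol, run_length) pairs."""
    runs = []
    for ch in quality:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([ch, 1])  # start a new run
    return [(symbol, length) for symbol, length in runs]


def rle_decode(runs):
    """Rebuild the original quality string from (symbol, run_length) pairs."""
    return "".join(symbol * length for symbol, length in runs)


if __name__ == "__main__":
    quality = "IIIIHHF"
    encoded = rle_encode(quality)
    print(encoded)                         # [('I', 4), ('H', 2), ('F', 1)]
    print(rle_decode(encoded) == quality)  # True, the encoding is lossless
```

In a full pipeline the resulting stream would still be passed to a general-purpose compressor such as LZMA, as the abstract describes for the packed data streams.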

  • BMC Bioinformatics 2015, null:188
  • 10 years ago

Expansion of the HSFY gene family in pig lineages: HSFY expansion in suids

Background: Amplified gene families on sex chromosomes can harbour genes with important biological functions, especially relating to fertility. The Y-linked heat shock transcription factor (HSFY) family has become amplified on the Y chromosome of the domestic pig (Sus scrofa), in an apparently independent event to an HSFY expansion on the Y chromosome of cattle (Bos taurus). Although the biological functions of HSFY genes are poorly understood, they appear to be involved in gametogenesis in a number of mammalian species, and, in cattle, HSFY gene copy number may correlate with levels of fertility. Results: We have investigated the HSFY family in domestic pig, and other suid species including warthog, bushpig, babirusa and peccaries. The domestic pig contains at least two amplified variants of HSFY, distinguished predominantly by presence or absence of a SINE within the intron. Both these variants are expressed in testis, and both are present in approximately 50 copies each in a single cluster on the short arm of the Y. The longer form has multiple nonsense mutations rendering it likely non-functional, but many of the shorter forms still have coding potential. Other suid species also have these two variants of HSFY, and estimates of copy number suggest the HSFY family may have amplified independently twice during suid evolution. Conclusions: The HSFY genes have become amplified in multiple species lineages independently. HSFY is predominantly expressed in testis in domestic pig, a pattern conserved with cattle, in which HSFY may play a role in fertility. Further investigation of the potential associations of HSFY with fertility and testis development may be of agricultural interest.

  • BMC Genomics 2015, null:442
  • 10 years ago

Automated Profiling of Individual Cell-Cell Interactions from High-throughput Time-lapse Imaging Microscopy in Nanowell Grids (TIMING)

Motivation: There is a need for effective automated methods for profiling dynamic cell-cell interactions with single-cell resolution from high-throughput time-lapse imaging data, especially, the interactions between immune effector cells and tumor cells in adoptive immunotherapy.

Results: Fluorescently labeled human T cells, Natural Killer cells (NK), and various target cells (NALM6, K562, EL4) were co-incubated on PDMS arrays of sub-nanoliter wells (nanowells), and imaged using multi-channel time-lapse microscopy. The proposed cell segmentation and tracking algorithms account for cell variability and exploit the nanowell confinement property to increase the yield of correctly analyzed nanowells from 45% (existing algorithms) to 98% for wells containing one effector and a single target, enabling automated quantification of cell locations, morphologies, movements, interactions, and deaths without the need for manual proofreading. Automated analysis of recordings from 12 different experiments demonstrated automated nanowell delineation accuracy >99%, automated cell segmentation accuracy >95%, and automated cell tracking accuracy of 90%, with default parameters, despite variations in illumination, staining, imaging noise, cell morphology, and cell clustering. An example analysis revealed that NK cells efficiently discriminate between live and dead targets by altering the duration of conjugation. The data also demonstrated that cytotoxic cells display higher motility than non-killers, both before and during contact.

Supplements: A (Sample Data), B (Flowchart), C (Tracking Data), D (Software).

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs

Motivation: Genomics has revolutionised biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50.

Results: We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO.
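To illustrate the contrast drawn above, the following minimal Python sketch computes N50 from contig lengths alone and, separately, a simplified gene-content completeness fraction. It is not the BUSCO implementation: the function names `n50` and `completeness_fraction`, the set-based gene matching, and the toy numbers are all assumptions made for illustration.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def completeness_fraction(expected_genes, found_genes):
    """Fraction of expected single-copy orthologs recovered in an assembly.

    Hypothetical simplification: genes are matched by identifier only.
    """
    expected = set(expected_genes)
    if not expected:
        raise ValueError("expected gene set is empty")
    return len(expected & set(found_genes)) / len(expected)


# Toy inputs (made-up values, not from any real assembly).
print(n50([80, 70, 50, 40, 30, 20, 10]))                               # prints 70
print(completeness_fraction({"g1", "g2", "g3", "g4"}, {"g1", "g2"}))  # prints 0.5
```

N50 depends only on how the sequence is split into contigs, so two assemblies with the same N50 can differ widely in how many expected genes they contain; a gene-content measure captures that difference.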

Availability and Implementation: Software implemented in Python and datasets available for download from http://busco.ezlab.org.

Contact: Evgeny.Zdobnov@unige.ch

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE

UniAlign: Protein Structure Alignment Meets Evolution

Motivation: During evolution, functional sites on the protein surface, as well as the hydrophobic core that maintains structural integrity, are well conserved. However, available protein structure alignment methods align protein structures based solely on 3D geometric similarity, limiting their ability to detect functionally relevant correspondences between the residues of the proteins, especially for distantly related homologous proteins.

Results: In this paper, we propose a new protein pairwise structure alignment algorithm (UniAlign) that incorporates additional evolutionary information captured in the form of sequence similarity, sequence profiles, and residue conservation. We define a per-residue score (UniScore) as a weighted sum of these and other features and develop an iterative optimization procedure to search for an alignment with the best overall UniScore. Our extensive experiments on CDD, HOMSTRAD, and BAliBASE benchmark datasets show that UniAlign outperforms commonly used structure alignment methods. We further demonstrate UniAlign's ability to develop family-specific models to drastically improve the quality of the alignments.
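The weighted-sum scoring described above can be sketched as follows. This is only an illustration of the idea, not UniAlign's actual scoring function: the feature names, the weights, the numeric values, and the function names `per_residue_score` and `alignment_score` are hypothetical, and the iterative optimization step is not shown.

```python
def per_residue_score(features, weights):
    """Weighted sum of the feature values for one aligned residue pair."""
    return sum(weights[name] * value for name, value in features.items())


def alignment_score(aligned_pairs, weights):
    """Overall score of an alignment: sum of the per-residue scores."""
    return sum(per_residue_score(pair, weights) for pair in aligned_pairs)


# Hypothetical weights and feature values for two aligned residue pairs.
weights = {
    "geometric_similarity": 1.0,
    "sequence_similarity": 2.0,
    "profile_similarity": 1.0,
    "conservation": 4.0,
}
pairs = [
    {"geometric_similarity": 0.5, "sequence_similarity": 1.0,
     "profile_similarity": 0.5, "conservation": 0.25},
    {"geometric_similarity": 1.0, "sequence_similarity": 0.0,
     "profile_similarity": 0.25, "conservation": 0.5},
]
print(alignment_score(pairs, weights))
```

Under this formulation, a family-specific model would amount to choosing a different `weights` dictionary for each protein family.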

Availability: UniAlign is available as a web service at: http://sacan.biomed.drexel.edu/unialign

Contact: ahmet.sacan@drexel.edu

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

DISTMIX: Direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts

Motivation: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured Single Nucleotide Polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, are not publicly available. While there are much less demanding methods that avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary Statistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts.

Results: To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed-ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics and cohort allele frequencies at measured SNPs. Simulations show that the proposed method adequately controls the Type I error rates. 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of the computational resources.
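The idea of imputing a statistic directly can be illustrated with the textbook conditional mean of a multivariate normal: if Z-scores at nearby SNPs are correlated according to an LD matrix, the expected Z-score at an unmeasured SNP is ld_um · ld_mm⁻¹ · z_m. The sketch below assumes this formulation, and further assumes that the mixed-ethnicity LD matrix is a weighted average of per-population panel matrices whose weights have already been estimated (for example from cohort allele frequencies). The function names, weights, and toy numbers are hypothetical; this is not the DISTMIX code.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; regularisation needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mix_ld(panel_lds, weights):
    """Weighted average of per-population LD (correlation) matrices."""
    size = len(panel_lds[0])
    return [[sum(w * ld[i][j] for w, ld in zip(weights, panel_lds))
             for j in range(size)] for i in range(size)]


def impute_z(ld, measured_idx, target_idx, z_measured):
    """Conditional-mean estimate of the Z-score at one unmeasured SNP."""
    ld_mm = [[ld[i][j] for j in measured_idx] for i in measured_idx]
    ld_um = [ld[target_idx][j] for j in measured_idx]
    coeffs = solve(ld_mm, z_measured)  # ld_mm^{-1} z_m
    return sum(c * r for c, r in zip(coeffs, ld_um))


# Toy example: two populations, SNP 0 measured, SNP 1 unmeasured.
ld_pop_a = [[1.0, 0.8], [0.8, 1.0]]
ld_pop_b = [[1.0, 0.2], [0.2, 1.0]]
ld_cohort = mix_ld([ld_pop_a, ld_pop_b], [0.5, 0.5])  # assumed weights
print(impute_z(ld_cohort, [0], 1, [4.0]))
```

Only summary statistics and an LD matrix enter the calculation, which is why an approach of this kind needs no subject-level genotypes.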

Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix.

Contact: dlee4@vcu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality

Motivation: Since the invention of DNA sequencing in the 1970s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing.

Results: We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on ‘divide and conquer’: we ‘slice’ a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
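The slice, decode, and merge strategy described above can be sketched generically in Python. The decoder itself is passed in as a callable, and the merge rule used here (majority vote across slices, with ties left unresolved) is an assumption made for illustration rather than the authors' conflict-resolution procedure; all names are hypothetical.

```python
from collections import Counter


def make_slices(records, slice_size):
    """Split a list of records into consecutive slices of at most slice_size."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [records[i:i + slice_size] for i in range(0, len(records), slice_size)]


def merge_votes(per_slice_results):
    """Merge per-slice {key: label} results by majority vote (assumed rule).

    Keys whose top two labels tie are left unresolved (None).
    """
    votes = {}
    for result in per_slice_results:
        for key, label in result.items():
            votes.setdefault(key, Counter())[label] += 1
    merged = {}
    for key, counter in votes.items():
        ranked = counter.most_common(2)
        tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        merged[key] = None if tie else ranked[0][0]
    return merged


def decode_in_slices(records, slice_size, decode_slice):
    """Decode each slice independently, then merge the results."""
    results = [decode_slice(chunk) for chunk in make_slices(records, slice_size)]
    return merge_votes(results)


# Toy decoder: within a slice, the last label seen for a key wins.
def toy_decoder(chunk):
    return {key: label for key, label in chunk}


records = [("k1", "A"), ("k2", "B"), ("k1", "A"),
           ("k2", "C"), ("k1", "X"), ("k2", "B")]
print(decode_in_slices(records, 2, toy_decoder))
```

Because each slice is decoded on its own, an error confined to one slice can be outvoted by the others at the merge step.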

Availability and implementation: Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs

Contact: stelo@cs.ucr.edu or timothy.close@ucr.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

  • Bioinformatics
  • 10 years ago
  • ORIGINAL PAPER

DSigDB: drug signatures database for gene set analysis

Summary: We report the creation of the Drug Signatures Database (DSigDB), a new gene set resource that relates drugs/compounds and their target genes, for gene set enrichment analysis (GSEA). DSigDB currently holds 22 527 gene sets, consisting of 17 389 unique compounds covering 19 531 genes. We also developed an online DSigDB resource that allows users to search, view and download drugs/compounds and gene sets. DSigDB gene sets integrate seamlessly with GSEA software for linking gene expression with drugs/compounds for drug repurposing and translational research.

Availability and implementation: DSigDB is freely available for non-commercial use at http://tanlab.ucdenver.edu/DSigDB.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: aikchoon.tan@ucdenver.edu

  • Bioinformatics
  • 10 years ago
  • APPLICATIONS NOTE