Molecular Mechanisms of Congenital Hyperinsulinism due to Autosomal Dominant Mutations in ABCC8

Congenital Hyperinsulinism (CHI) is a rare heterogeneous disease characterised by unregulated insulin secretion. Dominant mutations in ABCC8 causing medically unresponsive CHI have been reported; however, the molecular mechanisms are not clear. The molecular basis of medically unresponsive CHI due to dominant ABCC8 mutations was studied in 10 patients who were unresponsive to diazoxide, 9 of whom required a near-total pancreatectomy and 1 a partial pancreatectomy. DNA sequencing revealed 7 dominant inactivating heterozygous missense mutations in ABCC8, including one novel and six previously reported but uncharacterised mutations. Two groups of mutations with different cellular mechanisms were characterised. Mutations in the transmembrane domain (TMD) were more responsive to channel activators such as diazoxide, MgADP and metabolic inhibition. Trafficking analysis showed that nucleotide binding domain two (NBD2) mutations are not retained in the endoplasmic reticulum (ER) and are present on the membrane, whereas the TMD mutations were retained in the ER. D1506E was the most severe SUR1 NBD2 mutation. Homologous expression of D1506E revealed a near absence of KATP currents in the presence of diazoxide and intracellular MgADP. Heterozygous expression of D1506E showed a strong dominant-negative effect on SUR1/Kir6.2 currents. Overall, we define two groups of mutations with different cellular mechanisms. In the first group, channel complexes with mutations in NBD2 of SUR1 traffic normally but cannot be activated by MgADP. In the second group, channels with mutations in the TMD of SUR1 are retained in the ER and have variable functional impairment.

  • Human Molecular Genetics
  • 9 years ago
  • Article

Characterization of New Isolates of Apricot vein clearing-associated virus and of a New Prunus-Infecting Virus: Evidence for Recombination as a Driving Force in Betaflexiviridae Evolution

by Armelle Marais, Chantal Faure, Eldar Mustafayev, Thierry Candresse

Double-stranded RNAs from Prunus samples gathered from various surveys were analyzed by a deep-sequencing approach. Contig annotations revealed the presence of a potential new viral species in an Azerbaijani almond tree (Prunus amygdalus) and its genome sequence was completed. Its genomic organization is similar to that of the recently described Apricot vein clearing associated virus (AVCaV), for which two new isolates were also characterized, in a similar fashion, from two Japanese plums (Prunus salicina) from a French germplasm collection. The four proteins encoded by the genome of the new virus have amino acid identity levels with those of AVCaV that fall clearly outside the species demarcation criteria. The new virus should therefore be considered a new species, for which the name Caucasus prunus virus (CPrV) has been proposed. Phylogenetic relationships and nucleotide comparisons suggested that, together with AVCaV, CPrV could define a new genus (proposed name: Prunevirus) in the family Betaflexiviridae. A molecular test targeting both members of the new genus was developed, allowing the detection of additional AVCaV isolates and therefore extending the known geographical distribution and host range of AVCaV. Moreover, the phylogenetic trees reconstructed with the amino acid sequences of replicase, movement and coat proteins of representative Betaflexiviridae members suggest that Citrus leaf blotch virus (CLBV, type member of the genus Citrivirus) may have evolved from a recombination event involving a Prunevirus, further highlighting the importance of recombination as a driving force in Betaflexiviridae evolution. The sequences reported in the present manuscript have been deposited in the GenBank database under accession numbers KM507061-KM504070.

  • PloS one
  • 9 years ago

Is Model Fitting Necessary for Model-Based fMRI?

by Robert C. Wilson, Yael Niv

Model-based analysis of fMRI data is an important tool for investigating the computational role of different brain regions. With this method, theoretical models of behavior can be leveraged to find the brain structures underlying variables from specific algorithms, such as prediction errors in reinforcement learning. One potential weakness with this approach is that models often have free parameters and thus the results of the analysis may depend on how these free parameters are set. In this work we asked whether this hypothetical weakness is a problem in practice. We first developed general closed-form expressions for the relationship between results of fMRI analyses using different regressors, e.g., one corresponding to the true process underlying the measured data and one a model-derived approximation of the true generative regressor. Then, as a specific test case, we examined the sensitivity of model-based fMRI to the learning rate parameter in reinforcement learning, both in theory and in two previously-published datasets. We found that even gross errors in the learning rate lead to only minute changes in the neural results. Our findings thus suggest that precise model fitting is not always necessary for model-based fMRI. They also highlight the difficulty in using fMRI data for arbitrating between different models or model parameters. While these specific results pertain only to the effect of learning rate in simple reinforcement learning models, we provide a template for testing for effects of different parameters in other models.
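The insensitivity to learning rate described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' closed-form analysis: it assumes a simple Rescorla-Wagner learner, a hypothetical reward probability of 0.7, and two hypothetical learning rates (0.1 and 0.5), and it omits haemodynamic convolution and noise. It only checks how strongly the two prediction-error regressors correlate.

```python
import math
import random


def prediction_errors(rewards, alpha, v0=0.0):
    """Rescorla-Wagner prediction errors delta_t = r_t - V_t for one learning rate."""
    value = v0
    errors = []
    for r in rewards:
        delta = r - value          # prediction error on this trial
        errors.append(delta)
        value += alpha * delta     # value update with learning rate alpha
    return errors


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical task: 2000 trials, reward delivered with probability 0.7.
rng = random.Random(0)
rewards = [1.0 if rng.random() < 0.7 else 0.0 for _ in range(2000)]

pe_low = prediction_errors(rewards, alpha=0.1)   # "wrong" learning rate
pe_high = prediction_errors(rewards, alpha=0.5)  # "true" learning rate
r = pearson(pe_low, pe_high)
print(f"correlation between regressors: {r:.2f}")
```

Because both regressors share the reward term on every trial, they stay strongly correlated even when the learning rates differ five-fold, which is one intuition for why a regression against either regressor would pick out similar signals.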

  • PLOS Computational Biology
  • 9 years ago

Hox cluster characterization of Banna caecilian (Ichthyophis bannanicus) provides hints for slow evolution of its genome

Background: Caecilians, with a discreet lifestyle, are the least explored group of amphibians. Although they have distinct traits, many aspects of their biology are poorly investigated. Obtaining caecilian genomic sequences will offer new perspectives and aid fundamental studies in caecilian biology. Caecilian genomic sequences are also important and practical for the comparative genomics of amphibians. Currently, however, only sparse genomic sequences of caecilians are available. Hox genes are an old family of transcription factors that play central roles in the establishment of the metazoan body plan. Understanding their structure and genomic organization may provide insights into the animal’s genome, which is valuable for animals without a sequenced genome. Results: We sequenced and characterized the Hox clusters of Banna caecilian (Ichthyophis bannanicus) with a strategy combining long range PCR and genome walking. We obtained the majority of the four caecilian Hox clusters and identified 39 Hox genes, 5 microRNA genes and 1 pseudogene (ψHoxD12). There remained seven intergenic gaps we were unable to fill. In the obtained sequences, the caecilian Hox clusters contained fewer repetitive sequences and more conserved noncoding elements (CNEs) than their frog counterparts. We found that caecilian and coelacanth shared many more CNEs than frog and coelacanth did. The relative rate of sequence evolution showed that caecilian Hox genes evolved significantly more slowly than those of the other tetrapod species used in this study and were comparable to the slowly evolving coelacanth Hox genes. The phylogenetic tree of the four Hox clusters also revealed shorter branch lengths, especially for the caecilian HoxA, HoxB and HoxD clusters. These features of the caecilian Hox clusters suggested a slowly evolving genome, which was supported by further analysis of a large orthologous protein dataset.
Conclusions: Our analyses greatly extended the knowledge about the caecilian Hox clusters from previous PCR surveys. From the obtained Hox sequences and the orthologous protein dataset, the caecilian Hox loci and its genome appear to be evolving comparatively slowly. As caecilians are the basal lineage of amphibians and land vertebrates, this characteristic of the caecilian genome is valuable for studies of the genome biology and evolution of amphibians and early tetrapods.

  • BMC Genomics 2015, null:468
  • 9 years ago

RNA-seq analysis of an apical meristem time series reveals a critical point in Arabidopsis thaliana flower initiation

Background: Floral transition is a critical event in the life cycle of a flowering plant as it determines its reproductive success. Despite extensive studies of specific genes that regulate this process, the global changes in transcript expression profiles at the point when a vegetative meristem transitions into an inflorescence have not been reported. We analyzed gene expression during Arabidopsis thaliana meristem development under long day conditions from day 7 to 16 after germination in one-day increments. Results: The dynamics of the expression of the main flowering regulators were consistent with previous reports: notably, the expression of FLOWERING LOCUS C (FLC) decreased over the course of the time series while expression of LEAFY (LFY) increased. This analysis revealed a developmental time point between 10 and 12 days after germination where FLC expression had decreased but LFY expression had not yet increased, which was characterized by a peak in the number of differentially expressed genes. Gene Ontology (GO) enrichment analysis of these genes identified an overrepresentation of genes related to the cell cycle. Conclusions: We discovered an unprecedented burst of differential expression of cell cycle related genes at one particular point during the transition to flowering. We suggest that an acceleration of the rate of cell divisions and partial synchronization of cell cycling take place at this point.

  • BMC Genomics 2015, null:466
  • 9 years ago

Identification of a common risk haplotype for canine idiopathic epilepsy in the ADAM23 gene

Background: Idiopathic epilepsy is a common neurological disease in humans and domestic dogs, but relatively few risk genes have been identified to date. The seizure characteristics, including focal and generalised seizures, are similar between the two species, with gene discovery facilitated by the reduced genetic heterogeneity of purebred dogs. We have recently identified a risk locus for idiopathic epilepsy in the Belgian Shepherd breed in a 4.4 megabase region on CFA37. Results: We have expanded a previous study, replicating the association with a combined analysis of 157 cases and 179 controls in three additional breeds: Schipperke, Finnish Spitz and Beagle (p_c = 2.9e-07, p_GWAS = 1.74e-02). A targeted resequencing of the 4.4 megabase region in twelve Belgian Shepherd cases and twelve controls with opposite haplotypes identified 37 case-specific variants within the ADAM23 gene. Twenty-seven variants were validated in 285 cases and 355 controls from four breeds, resulting in a strong replication of the ADAM23 locus (p_raw = 2.76e-15) and the identification of a common 28 kb risk haplotype in all four breeds. The risk haplotype was present at frequencies of 0.49-0.7 in the breeds, suggesting that ADAM23 is a low penetrance risk gene for canine epilepsy. Conclusions: These results implicate ADAM23 in common canine idiopathic epilepsy, although the causative variant remains to be identified. ADAM23 plays a role in synaptic transmission and interacts with known epilepsy genes, LGI1 and LGI2, and should be considered as a candidate gene for human epilepsies.

  • BMC Genomics 2015, null:465
  • 9 years ago

Distinguishing the rates of gene activation from phenotypic variations

Background: Stochastic genetic switching driven by intrinsic noise is an important process in gene expression. When the rates of gene activation/inactivation are relatively slow, fast, or medium compared with the synthesis/degradation rates of mRNAs and proteins, the variability of protein and mRNA levels may exhibit very different dynamical patterns. It is desirable to provide a systematic approach to identify the key dynamical features in the different regimes, with the aim of determining from phenotypic variations which regime a given gene regulatory network is in. Results: We studied a gene expression model with positive feedbacks when genetic switching rates vary over a wide range. With the goal of providing a method to distinguish the regime of the switching rates, we first focus on understanding the essential dynamics of the gene expression system in different cases. In the regime of slow switching rates, we found that the effective dynamics can be reduced to independent evolutions on two separate layers corresponding to gene activation and inactivation states, and the transitions between the two layers are rare events, after which the system goes mainly along deterministic ODE trajectories on a particular layer to reach new steady states. The energy landscape in this regime can be well approximated by a Gaussian mixture model. In the regime of intermediate switching rates, we analyzed the mean switching time to investigate the stability of the system in different parameter ranges. We also discussed the case of fast switching rates from the viewpoint of transition state theory. Based on the obtained results, we made a proposal to distinguish these three regimes in a simulation experiment. We identified the intermediate regime from the fact that the strength of cellular memory is lower than in the other two cases, and the fast and slow regimes can be distinguished by their different perturbation-response behavior with respect to perturbations of the switching rates. Conclusions: We proposed a simulation experiment to distinguish the slow, intermediate and fast regimes, which is the main point of our paper. In order to achieve this goal, we systematically studied the essential dynamics of the gene expression system when the switching rates are in different regimes. Our theoretical understanding provides new insights into gene expression experiments.

  • BMC Systems Biology 2015, null:29
  • 9 years ago

Functional marker development of miR1511-InDel and allelic diversity within the genus Glycine

Background: Single-stranded non-protein coding small RNAs, 18–25 nucleotides in length, are ubiquitous throughout plant genomes and are involved in post-transcriptional gene regulation. Several types of DNA markers have been reported for the detection of genetic diversity or sequence variation in soybean, one of the most important legume crops worldwide for seed protein and oil content. Recently, with the availability of public genomic databases, there has been a shift from the labor-intensive development of PCR-based markers to sequence-based genotyping and the development of functional markers within genes, often coupled with the use of RNA information. But thus far miRNA-based markers have only been developed in rice and tobacco. Here we report the first functional molecular miRNA marker in soybean, miR1511-InDel, for a specific single copy locus used to assess genetic variation in domesticated soybean (Glycine max [L.] Merr) and its wild progenitor (Glycine soja Sieb. & Zucc.). Results: We genotyped a total of 1,669 accessions of domesticated soybean (G. max) and its wild progenitor G. soja, which are native throughout China and parts of Korea, Japan and Russia. The results indicate that the miR1511 locus is distributed in cultivated soybean and has three alleles in annual wild soybean. Based on this result, we proposed that miR-InDel marker technology can be used to assess genetic variation. The inclusion of geo-reference data with miR1511-InDel marker data corroborated that accessions from the Yellow River basin (Huanghuai) exhibited high genetic diversity, which provides more molecular evidence for gene diversity in annual wild soybean and domestication of soybean. Conclusions: These results provide evidence for the use of the RNA marker miRNA1511-InDel as a soybean-specific functional marker for the study of genetic diversity, genotyping of germplasm and evolution studies. This is also the first report of a functional marker developed from a soybean miRNA located within the functional region of pre-miRNA1511.

  • BMC Genomics 2015, null:467
  • 9 years ago

iSuc-PseAAC: predicting lysine succinylation in proteins by incorporating peptide position-specific propensity

Lysine succinylation in proteins is one type of post-translational modification (PTM). Succinylation is associated with some diseases, and data on succinylated sites have only been obtained experimentally in recent years. It is highly desirable to develop computational methods to identify the candidate proteins and their sites. In view of this, a new predictor called iSuc-PseAAC was proposed by incorporating the peptide position-specific propensity into the general form of pseudo amino acid composition. The accuracy is 79.94%, sensitivity 51.07%, specificity 89.42% and MCC 0.431 in leave-one-out cross validation with the support vector machine algorithm. It was demonstrated by rigorous leave-one-out cross validation on a stringent benchmark dataset that the new predictor is quite promising and may become a useful high throughput tool in this area. Meanwhile, a user-friendly web-server for iSuc-PseAAC is accessible at http://app.aporc.org/iSuc-PseAAC/ . Users can easily obtain their desired results without the need to understand the complicated mathematical equations, which are presented in this paper only for completeness.
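For readers unfamiliar with the four figures quoted above, the following sketch shows how accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix using their standard definitions. The counts in the example are hypothetical and are not the iSuc-PseAAC benchmark results; the function name is likewise an illustration.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # fraction of true sites recovered
    specificity = tn / (tn + fp)   # fraction of non-sites correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=50, fn=50, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A pattern like the one in the abstract (high specificity, moderate sensitivity) yields a moderate MCC, because MCC penalises missed positives even when overall accuracy looks high.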

  • Scientific Reports 5
  • 9 years ago
  • Article

The Plasmodiophora brassicae genome reveals insights in its life cycle and ancestry of chitin synthases

Plasmodiophora brassicae causes clubroot, a major disease of Brassica oil and vegetable crops worldwide. P. brassicae is a Plasmodiophorid, an obligate biotrophic protist in the eukaryotic kingdom of Rhizaria. Here we present the 25.5 Mb genome draft of P. brassicae, developmental stage-specific transcriptomes and a transcriptome of Spongospora subterranea, the Plasmodiophorid causing powdery scab on potato. Like other biotrophic pathogens, both Plasmodiophorids are reduced in metabolic pathways. Phytohormones contribute to the gall phenotypes of infected roots. We report a protein (PbGH3) that can modify auxin and jasmonic acid. Plasmodiophorids contain chitin in the cell walls of the resilient resting spores. If recognized, chitin can trigger defense responses in plants. Interestingly, chitin-related enzymes of Plasmodiophorids form specific families, and the carbohydrate/chitin binding (CBM18) domain is enriched in the Plasmodiophorid secretome. Plasmodiophorid chitin synthases belong to two families, which were present before the split of the eukaryotic Stramenopiles/Alveolates/Rhizaria/Plantae and Metazoa/Fungi/Amoebozoa megagroups, suggesting that chitin synthesis is an ancient feature of eukaryotes. This exemplifies the importance of genomic data from unexplored eukaryotic groups, such as the Plasmodiophorids, for deciphering evolutionary relationships and gene diversification of early eukaryotes.

  • Scientific Reports 5
  • 9 years ago
  • Article

A neonicotinoid impairs olfactory learning in Asian honey bees (Apis cerana) exposed as larvae or as adults

Xenobiotics such as the neonicotinoid pesticide, imidacloprid, are used globally, but their effects on native bee species are poorly understood. We studied the effects of sublethal doses of imidacloprid on olfactory learning in the native honey bee species, Apis cerana, an important pollinator of agricultural and native plants throughout Asia. We provide the first evidence that imidacloprid can impair learning in A. cerana workers exposed as adults or as larvae. Adults that ingested a single imidacloprid dose as low as 0.1 ng/bee had significantly reduced olfactory learning acquisition, which was 1.6-fold higher in control bees. Longer-term learning (1-17 h after the last learning trial) was also impaired. Bees exposed as larvae to a total dose of 0.24 ng/bee did not have reduced survival to adulthood. However, these larval-treated bees had significantly impaired olfactory learning when tested as adults: control bees exhibited up to 4.8-fold better short-term learning acquisition, though longer-term learning was not affected. Thus, sublethal cognitive deficits elicited by neonicotinoids on a broad range of native bee species deserve further study.

  • Scientific Reports 5
  • 9 years ago
  • Article

Genome-wide profiling of in vivo RNA structure at single-nucleotide resolution using structure-seq

Structure-seq applies chemically based RNA structure probing on a genome-wide scale, allowing the in vivo structures of thousands of distinct transcripts to be discerned, and the meta-properties of RNA structure–function relationships to be unveiled.

  • Nature Protocols 10, 1050 (2015)
  • 9 years ago
  • Protocol

MiRBooking simulates the stoichiometric mode of action of microRNAs

In eukaryotes, gene expression is regulated by microRNAs (miRNAs), which bind to messenger RNAs (mRNAs) and interfere with their translation into proteins, either by promoting their degradation or by inducing their repression. We study the effect of miRNA interference on each gene using experimental methods, such as microarrays and RNA-seq at the mRNA level, or luciferase reporter assays and variations of SILAC at the protein level. Alternatively, computational predictions would provide clear benefits. However, no algorithm for this task has ever been proposed. Here, we introduce a new algorithm to predict genome-wide expression data from initial transcriptome abundance. The algorithm simulates the miRNA and mRNA hybridization competition that occurs in given cellular conditions, and derives the whole set of miRNA::mRNA interactions at equilibrium (microtargetome). Interestingly, solving the competition improves the accuracy of miRNA target predictions. Furthermore, this model implements a previously reported and fundamental property of the microtargetome: the binding between a miRNA and an mRNA depends on their sequence complementarity, but also on the abundance of all RNAs expressed in the cell, i.e. the stoichiometry of all the miRNA sites and all the miRNAs given their respective abundance. This model generalizes the miRNA-induced synchronistic silencing previously observed and described as sponges and competitive endogenous RNAs.

  • Nucleic Acids Research
  • 9 years ago
  • Computational Biology

The poor homology stringency in the heteroduplex allows strand exchange to incorporate desirable mismatches without sacrificing recognition in vivo

RecA family proteins are responsible for homology search and strand exchange. In bacteria, homology search begins after RecA binds an initiating single-stranded DNA (ssDNA) in the primary DNA-binding site, forming the presynaptic filament. Once the filament is formed, it interrogates double-stranded DNA (dsDNA). During the interrogation, bases in the dsDNA attempt to form Watson–Crick bonds with the corresponding bases in the initiating strand. Mismatch dependent instability in the base pairing in the heteroduplex strand exchange product could provide stringent recognition; however, we present experimental and theoretical results suggesting that the heteroduplex stability is insensitive to mismatches. We also present data suggesting that an initial homology test of 8 contiguous bases rejects most interactions containing more than 1/8 mismatches without forming a detectable 20 bp product. We propose that, in vivo, the sparsity of accidental sequence matches allows an initial 8 bp test to rapidly reject almost all non-homologous sequences. We speculate that once the initial test is passed, the mismatch insensitive binding in the heteroduplex allows short mismatched regions to be incorporated in otherwise homologous strand exchange products even though sequences with less homology are eventually rejected.
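The sparsity argument above can be made concrete with a back-of-the-envelope calculation. This is a sketch under a simplifying assumption that is not stated in the abstract: bases are treated as independent and uniformly distributed, so a given position matches with probability 1/4. The function names and the 1,000,000 bp target length are hypothetical.

```python
def accidental_match_probability(k):
    """Probability that k contiguous random bases all match a given k-mer,
    assuming independent, uniformly distributed bases (an idealisation)."""
    return 0.25 ** k


def expected_accidental_matches(target_length, k):
    """Expected number of exact k-mer matches along one strand of a random
    target of the given length (number of start positions times probability)."""
    positions = max(target_length - k + 1, 0)
    return positions * accidental_match_probability(k)


p8 = accidental_match_probability(8)
print(f"P(accidental 8 bp match) = {p8:.2e}")
print(f"fraction of random sites rejected = {1 - p8:.5f}")
print(f"expected matches in 1,000,000 bp = {expected_accidental_matches(1_000_000, 8):.1f}")
```

Under this idealisation an 8 bp test rejects all but about 1 in 65,536 random alignments, which is the sense in which accidental matches are sparse. Real genomes have non-uniform composition and repeats, so the numbers are indicative only.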

  • Nucleic Acids Research
  • 9 years ago
  • Nucleic Acid Enzymes

By the company they keep: interaction networks define the binding ability of transcription factors

Access to genome-wide data provides the opportunity to address questions concerning the ability of transcription factors (TFs) to assemble in distinct macromolecular complexes. Here, we introduce the PAnDA (Protein And DNA Associations) approach to characterize DNA associations with human TFs using expression profiles, protein–protein interactions and recognition motifs. Our method predicts TF binding events with >0.80 accuracy revealing cell-specific regulatory patterns that can be exploited for future investigations. Even when the precise DNA-binding motifs of a specific TF are not available, the information derived from protein-protein networks is sufficient to perform high-confidence predictions (area under the ROC curve of 0.89). PAnDA is freely available at http://service.tartaglialab.com/new_submission/panda.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

Transcriptome dynamics of the microRNA inhibition response

We report a high-resolution time series study of transcriptome dynamics following antimiR-mediated inhibition of miR-9 in a Hodgkin lymphoma cell line (the first such dynamic study of the microRNA inhibition response), revealing both general and specific aspects of the physiological response. We show that miR-9 inhibition induces a multiphasic transcriptome response, with a direct target perturbation before 4 h, earlier than previously reported, amplified by a downstream peak at ~32 h consistent with an indirect response due to secondary coherent regulation. Predictive modelling indicates a major role for miR-9 in post-transcriptional control of RNA processing and RNA binding protein regulation. Cluster analysis identifies multiple co-regulated gene regulatory modules. Functionally, we observe a shift over time from mRNA processing at early time points to translation at later time points. We validate the key observations with independent time series qPCR and we experimentally validate key predicted miR-9 targets. Methodologically, we developed sensitive functional data analytic predictive methods to analyse the weak response inherent in microRNA inhibition experiments. The methods of this study will be applicable to similar high-resolution time series transcriptome analyses and provide the context for more accurate experimental design and interpretation of future microRNA inhibition studies.

  • Nucleic Acids Research
  • 9 years ago
  • Computational Biology

Efficient conditional knockout targeting vector construction using co-selection BAC recombineering (CoSBR)

A simple and efficient strategy for Bacterial Artificial Chromosome (BAC) recombineering based on co-selection is described. We show that it is possible to efficiently modify two positions of a BAC simultaneously by co-transformation of a single-stranded DNA oligo and a double-stranded selection cassette. The use of co-selection BAC recombineering reduces the DNA manipulation needed to make a conditional knockout gene targeting vector to only two steps: a single round of BAC modification followed by a retrieval step.

  • Nucleic Acids Research
  • 9 years ago
  • Methods Online

Alignment of time course gene expression data and the classification of developmentally driven genes with hidden Markov models

Background: We consider data from a time course microarray experiment that was conducted on grapevines over the development cycle of the grape berries at two different vineyards in South Australia. Although the underlying biological process of berry development is the same at both vineyards, there are differences in the timing of the development due to local conditions. We aim to align the data from the two vineyards to enable an integrated analysis of the gene expression and use the alignment of the expression profiles to classify likely developmental function. Results: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. We show that our alignment method is robust against subsets of profiles that are not suitable for alignment, investigate alignment diagnostics under the model and demonstrate the classification of developmentally driven genes. Conclusions: The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and also gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no current annotated function that are likely to be controlled in a developmental manner.

  • BMC Bioinformatics 2015, null:196
  • 9 years ago

NCC-AUC: an AUC optimization method to identify multi-biomarker panel for cancer prognosis from genomic and clinical data

Motivation: In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by censored, small-sample-size, but high-dimensional genomic profiles or clinical data. Therefore, sophisticated models and algorithms are in pressing need.

Results: In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification named NCC-AUC (Nearest Centroid Classifier for AUC optimization). Our method is motivated by the connection between the AUC score for classification accuracy evaluation and Harrell’s concordance index in survival analysis. This connection allows us to convert the survival time regression problem to a binary classification problem. Then an optimization model is formulated to directly maximize AUC while minimizing the number of selected features, to construct a predictor in the nearest centroid classifier framework. NCC-AUC performs well when validated on both genomic data of breast cancer and clinical data of stage IB NSCLC (Non-Small-Cell Lung Cancer). For the genomic data, NCC-AUC outperforms SVM (Support Vector Machine) and SVM-RFE (Support Vector Machine-based Recursive Feature Elimination) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meanings. NCC-AUC also separates low and high risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and L1-Cox model (L1 penalized Cox model). These performance gains of NCC-AUC are quite robust across 5 subtypes of breast cancer. Further, in an independent clinical dataset, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox model and L1-Cox model in grouping patients into high and low risk categories.

Conclusion: In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panels from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis.

Availability: NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC.
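The connection between AUC and Harrell's concordance index mentioned in the Results can be seen by writing both as pairwise ranking statistics. The sketch below uses the textbook definitions only; it is not the NCC-AUC optimization model, and the function names and toy data are hypothetical.

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def harrell_c(times, events, risks):
    """Harrell's C: among comparable pairs, the fraction in which the subject
    with the shorter survival time has the higher predicted risk.
    A pair (i, j) is comparable when i has an observed event before time j."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data: higher score should mean higher risk / positive class.
print(auc_pairwise([0.9, 0.8, 0.4], [0.5, 0.3]))
print(harrell_c(times=[1, 2, 3], events=[1, 1, 1], risks=[3, 2, 1]))
```

Both functions count correctly ordered pairs, so labelling patients as high or low risk by a survival-time cut-off turns a concordance question into an AUC question, which is one way to read the conversion to binary classification described above.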

  • Bioinformatics
  • 9 years ago
  • ORIGINAL PAPER

atSNP: transcription factor binding affinity testing for regulatory SNP detection

Motivation: Genome-wide association studies revealed that most disease-associated single nucleotide polymorphisms (SNPs) are located in regulatory regions within introns or in regions between genes. Regulatory SNPs (rSNPs) are SNPs that affect gene regulation by changing transcription factor (TF) binding affinities to genomic sequences. Identifying potential rSNPs is crucial for understanding disease mechanisms. In silico methods that evaluate the impact of SNPs on TF binding affinities are not scalable for large-scale analysis.

Results: We describe atSNP (affinity testing for regulatory SNPs), a computationally efficient R package for identifying rSNPs in silico. atSNP implements an importance sampling algorithm coupled with a first-order Markov model for the background nucleotide sequences to test the significance of affinity scores and SNP-driven changes in these scores. Application of atSNP with >20K SNPs indicates that atSNP is the only available tool for such a large-scale task. atSNP provides user-friendly output in the form of both tables and composite logo plots for visualizing SNP-motif interactions. Evaluations of atSNP with known rSNP-TF interactions indicate that atSNP is able to prioritize motifs for a given set of SNPs with high accuracy.

Availability: http://github.com/chandlerzuo/atsnp.

Contact: keles@stat.wisc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
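As an illustration of the first-order Markov background model named in the Results, the sketch below estimates base-to-base transition probabilities from a background sequence and scores a short sequence under them. It is not taken from the atSNP R package and does not implement its importance sampling step; the function names, the pseudocount, the uniform initial-base probability and the toy sequences are all assumptions, and input is assumed to contain only A, C, G and T.

```python
import math

BASES = "ACGT"


def fit_first_order(background, pseudocount=1.0):
    """Estimate P(next base | previous base) from a background sequence.
    A pseudocount keeps unseen transitions from getting probability zero."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for prev, cur in zip(background, background[1:]):
        counts[prev][cur] += 1.0
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in BASES}
        for a in BASES
    }


def log_prob(seq, transitions, initial=0.25):
    """Log-probability of a sequence: uniform first base (an assumption),
    then one transition probability per adjacent pair of bases."""
    lp = math.log(initial)
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(transitions[prev][cur])
    return lp


# Toy background; real use would fit on genomic sequence.
transitions = fit_first_order("ACGTACGTAC")
reference = "ACGTA"
variant = "ACTTA"   # hypothetical single-base change at position 3
print(log_prob(reference, transitions), log_prob(variant, transitions))
```

A background model of this kind supplies the null distribution against which a motif affinity score, or a change in score between two alleles, can be judged.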

  • Bioinformatics
  • 9 years ago
  • APPLICATIONS NOTE

Genome stability: Chromothripsis and micronucleus formation

It has generally been assumed that cancers arise through the accumulation of individual mutations over time; however, recent cancer genome sequence analyses suggest that multiple mutations can arise simultaneously during a single event such as chromothripsis, which results in extensive genomic rearrangements that are usually

  • Nature Reviews Genetics 16, 376 (2015)
  • 9 years ago
  • Research Highlight