Epigenome editing made easy
Fusions of Cas9 to histone-modifying enzymes enable functional interrogation of the epigenome.
- Nature Biotechnology 33, 606 (2015)
The venomous cocktail of the vampire snail Colubraria reticulata (Mollusca, Gastropoda)
Background: Hematophagy arose independently multiple times during metazoan evolution, with several lineages of vampire animals particularly diversified in invertebrates. However, the biochemistry of hematophagy has been studied in only a few species of direct medical interest and is still underdeveloped in most invertebrates, as is the study of venom toxins in general. In cone snails, leeches, arthropods and snakes, the strong target specificity of venom toxins uniquely aligns them to industrial and academic pursuits (pharmacological applications, pest control etc.) and provides a biochemical tool for studying biological activities including cell signalling and immunological response. Neogastropod snails (cones, oyster drills etc.) are carnivorous and include active predators, scavengers, grazers on sessile invertebrates and hematophagous parasites; most of them use venoms to feed efficiently. It has been hypothesized that trophic innovations were the main drivers of the rapid radiation of Neogastropoda in the late Cretaceous. We present here the first molecular characterization of the alimentary secretion of a non-conoidean neogastropod, Colubraria reticulata. Colubrariids successfully feed on the blood of fishes through the secretion into the host of a complex mixture of anaesthetics and anticoagulants. We used an NGS RNA-Seq approach, integrated with differential expression analyses and custom searches for putative secreted feeding-related proteins, to describe in detail the salivary and mid-oesophageal transcriptomes of this Mediterranean vampire snail, with functional and evolutionary insights on major families of bioactive molecules.
Results: A remarkably low level of overlap was observed between the gene expression in the two target tissues, which also contained a high percentage of putatively secreted proteins when compared to the whole body. At least 12 families of feeding-related proteins were identified, including: 1) anaesthetics, such as ShK Toxin-containing proteins and turripeptides (ion-channel blockers), Cysteine-rich secretory proteins (CRISPs), Adenosine Deaminase (ADA); 2) inhibitors of primary haemostasis, such as novel vWFA domain-containing proteins, the Ectonucleotide pyrophosphatase/phosphodiesterase family member 5 (ENPP5) and the wasp Antigen-5; 3) anticoagulants, such as TFPI-like multiple Kunitz-type protease inhibitors, Peptidases S1 (PS1), CAP/ShKT domain-containing proteins, Astacin metalloproteases and Astacin/ShKT domain-containing proteins; 4) additional proteins, such as the Angiotensin-Converting Enzyme (ACE: vasopressive) and the cytolytic Porins.
Conclusions: Colubraria feeding physiology seems to involve inhibitors of both primary and secondary haemostasis, anaesthetics, a vasoconstrictive enzyme to reduce feeding time and tissue-degrading proteins such as Porins and Astacins. The complexity of the Colubraria venomous cocktail and its divergence from the arsenal of the few neogastropods studied to date (mostly conoideans) suggest that the biochemical diversification of neogastropods might be largely underestimated and worthy of extensive investigation.
Methods for the directed evolution of proteins
Directed evolution has proved to be an effective strategy for improving or altering the activity of biomolecules for industrial, research and therapeutic applications. The evolution of proteins in the laboratory requires methods for generating genetic diversity and for identifying protein variants with desired properties. This
Reconstructing ancient genomes and epigenomes
Research involving ancient DNA (aDNA) has experienced a true technological revolution in recent years through advances in the recovery of aDNA and, particularly, through applications of high-throughput sequencing. Formerly restricted to the analysis of only limited amounts of genetic information, aDNA studies have now progressed
Determinants of the rate of protein sequence evolution
The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s. Although the rate of protein sequence evolution depends primarily on the level of functional constraint, exactly what determines functional constraint has remained unclear. The increasing availability of
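Comparative transcriptome analysis reveals the influence of abscisic acid on the metabolism of pigments, ascorbic acid and folic acid during strawberry fruit ripening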
by Dongdong Li, Li Li, Zisheng Luo, Wangshu Mou, Linchun Mao, Tiejin Ying
A comprehensive investigation of abscisic acid (ABA) biosynthesis and its influence on other important phytochemicals is critical for understanding the versatile roles that ABA plays during strawberry fruit ripening. Using RNA-seq technology, we sampled strawberry fruit in response to ABA or nordihydroguaiaretic acid (NDGA; an ABA biosynthesis blocker) treatment during ripening and assessed the expression changes of genes involved in the metabolism of pigments, ascorbic acid (AsA) and folic acid in the receptacles. The transcriptome analysis identified many genes differentially expressed in response to ABA or NDGA treatment. In particular, genes in the anthocyanin biosynthesis pathway were actively regulated by ABA, with the exception of the gene encoding cinnamate 4-hydroxylase. Chlorophyll degradation was accelerated by ABA, mainly owing to higher expression of the gene encoding pheide a oxygenase. The decrease in β-carotene content was accelerated by ABA treatment and delayed by NDGA. A strong negative correlation was found between ABA and β-carotene content, indicating the importance of the requirement for ABA synthesis during fruit ripening. In addition, evaluation of the folate biosynthetic pathway indicates that ABA might have a minor function in this nutrient’s biosynthesis but might be involved in its homeostasis. Surprisingly, although AsA accumulated during fruit ripening, expression of genes involved in its biosynthesis in the receptacles was significantly lower in ABA-treated fruits. This transcriptome analysis expands our understanding of ABA’s role in phytochemical metabolism during strawberry fruit ripening, and the regulatory mechanisms of ABA on these pathways are discussed. Our study provides a wealth of genetic information on these metabolic pathways and may be helpful for molecular manipulation in the future.
Transcriptomic analysis reveals the influence of ABA on the metabolism of pigments, flavonoids and antioxidants in ripening tomato fruit
by Wangshu Mou, Dongdong Li, Zisheng Luo, Linchun Mao, Tiejin Ying
Abscisic acid (ABA) has been proven to be involved in the regulation of climacteric fruit ripening, but a comprehensive investigation of its influence on ripening-related processes is still lacking. By applying next-generation sequencing technology, we conducted a comparative analysis of the effects of exogenous ABA and NDGA (nordihydroguaiaretic acid, an inhibitor of ABA biosynthesis) on tomato fruit ripening. The high-throughput sequencing results showed that out of the 25728 genes expressed across all three samples, 10388 were identified as significantly differentially expressed genes. Exogenous ABA was found to enhance the transcription of genes involved in pigment metabolism, including carotenoid biosynthesis and chlorophyll degradation, whereas NDGA treatment inhibited these processes. The results also revealed the crucial role of ABA in flavonoid synthesis and regulation of the antioxidant system. Intriguingly, we also found that inhibition of endogenous ABA significantly enhanced the transcriptional abundance of genes involved in photosynthesis. Our results highlight the significance of ABA in regulating tomato ripening and provide insight into the regulatory mechanism of fruit maturation and senescence.
Genomic analysis and surveillance of the coronavirus dominant in ducks in China
by Qing-Ye Zhuang, Kai-Cheng Wang, Shuo Liu, Guang-Yu Hou, Wen-Ming Jiang, Su-Chun Wang, Jin-Ping Li, Jian-Min Yu, Ji-Ming Chen
The genetic diversity, evolution, distribution, and taxonomy of some coronaviruses dominant in birds other than chickens remain enigmatic. In this study we sequenced the genome of a newly identified coronavirus dominant in ducks (DdCoV), and performed a large-scale surveillance of coronaviruses in chickens and ducks using a conserved RT-PCR assay. The viral genome harbors a tandem repeat which is rare in vertebrate RNA viruses. The repeat is homologous to some proteins of various cellular organisms, but its origin remains unknown. Many substitutions, insertions, deletions, and some frameshifts and recombination events have occurred in the genome of the DdCoV, as compared with the coronavirus dominant in chickens (CdCoV). The distances between DdCoV and CdCoV are large enough to separate them into different species within the genus Gammacoronavirus. Our surveillance demonstrated that DdCoVs and CdCoVs belong to different lineages and occupy different ecological niches, further supporting that they should be classified into different species. Our surveillance also demonstrated that DdCoVs and CdCoVs are prevalent in live poultry markets in some regions of China. In conclusion, this study shed novel insight into the genetic diversity, evolution, distribution, and taxonomy of the coronaviruses circulating in chickens and ducks.
Yeast assay highlights the intrinsic genomic instability of human PML intron 6 over intron 3 and the role of replication fork proteins
by Roland Chanet, Guy Kienda, Amélie Heneman-Masurel, Laurence Vernis, Bruno Cassinat, Philippe Guardiola, Pierre Fenaux, Christine Chomienne, Meng-Er Huang
Human acute promyelocytic leukemia (APL) is characterized by a specific balanced translocation t(15;17)(q22;q21) involving the PML and RARA genes. In both de novo and therapy-related APL, the most frequent PML breakpoints are located within intron 6, and less frequently in intron 3; the precise mechanisms by which these breakpoints arise and preferentially in PML intron 6 remain unsolved. To investigate the intrinsic properties of the PML intron sequences in vivo, we designed Saccharomyces cerevisiae strains containing human PML intron 6 or intron 3 sequences inserted in yeast chromosome V and measured gross chromosomal rearrangements (GCR). This approach provided evidence that intron 6 had a superior instability over intron 3 due to an intrinsic property of the sequence and identified the 3’ end of intron 6 as the most susceptible to break. Using yeast strains invalidated for genes that control DNA replication, we show that this differential instability depended at least upon Rrm3, a DNA helicase, and Mrc1, the human claspin homolog. GCR induction by hydrogen peroxide, a general genotoxic agent, was also dependent on genetic context. We conclude that: 1) this yeast system provides an alternative approach to study in detail the properties of human sequences in a genetically controlled situation and 2) the different susceptibility to produce DNA breaks in intron 6 versus intron 3 of the human PML gene is likely due to an intrinsic property of the sequence and is under replication fork genetic control.
Comparative transcriptome analysis of white and purple potato to identify genes involved in anthocyanin biosynthesis
by Yuhui Liu, Kui Lin-Wang, Cecilia Deng, Ben Warran, Li Wang, Bin Yu, Hongyu Yang, Jing Wang, Richard V. Espley, Junlian Zhang, Di Wang, Andrew C. Allan
Introduction: The potato (Solanum tuberosum) cultivar ‘Xin Daping’ is tetraploid with white skin and white flesh, while the cultivar ‘Hei Meiren’ is also tetraploid with purple skin and purple flesh. Comparative transcriptome analysis of white and purple cultivars was carried out using high-throughput RNA sequencing in order to further understand the mechanism of anthocyanin biosynthesis in potato.
Methods and Results: By aligning transcript reads to the recently published diploid potato genome and de novo assembly, 209 million paired-end Illumina RNA-seq reads from these tetraploid cultivars were assembled on to 60,930 transcripts, of which 27,754 (45.55%) are novel transcripts and 9393 alternative transcripts. Using a comparison of the RNA-sequence datasets, multiple versions of the genes encoding anthocyanin biosynthetic steps and regulatory transcription factors were identified. Other novel genes potentially involved in anthocyanin biosynthesis in potato tubers were also discovered. Real-time qPCR validation of candidate genes revealed good correlation with the transcriptome data. SNPs (Single Nucleotide Polymorphism) and indels were predicted and validated for the transcription factors MYB AN1 and bHLH1 and the biosynthetic gene anthocyanidin 3-O-glucosyltransferase (UFGT).
Conclusions: These results contribute to our understanding of the molecular mechanism of white and purple potato development, by identifying differential responses of biosynthetic gene family members together with the variation in structural genes and transcription factors in this highly heterozygous crop. This provides an excellent platform and resource for future genetic and functional genomic research.
Natural selection and functional potentials of human noncoding elements revealed by analysis of next generation sequencing data
by Pankaj Jha, Dongsheng Lu, Shuhua Xu
Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potentials. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection, which typically indicates the functional potentials of certain genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) of Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variations of the noncoding genome within and between populations, and identified the natural selection footprints in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and there is substantial constraint of variations in NCS, while positive selection is more likely to be specific to some particular genomic regions and regional populations. Intriguingly, we observed a larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motifs, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants reported as risk alleles by genome-wide association studies showed signatures of negative selection. Our analyses provided compelling evidence of natural selection forces on noncoding sequences in the human genome and advanced our understanding of their functional potentials that play important roles in disease etiology and human evolution.
ONION: a functional approach for the integration of lipidomics and transcriptomics data
by Monika Piwowar, Wiktor Jurkowski
To date, the massive quantity of data generated by high-throughput techniques has not been matched by the bioinformatics treatment required to make full use of it. This is partially due to a mismatch in experimental and analytical study design but primarily due to a lack of adequate analytical approaches. When integrating multiple data types, e.g. transcriptomics and metabolomics, multidimensional statistical methods are currently the techniques of choice. Typical statistical approaches, such as canonical correlation analysis (CCA), that are applied to find associations between metabolites and genes fail because the number of observations (e.g. conditions, diet etc.) is small in comparison to data size (number of genes, metabolites). Modifications designed to cope with this issue are not ideal: they either add simulated data, which prevents p-value computation, or prune variables, which loses potentially valid information. Instead, our approach makes use of verified or putative molecular interactions or functional associations to guide analysis. The workflow includes dividing data sets to reach the expected data structure, statistical analysis within groups and interpretation of results. By applying pathway and network analysis, data obtained by various platforms are grouped with moderate stringency to avoid functional bias. As a consequence, CCA and other multivariate models can be applied to calculate robust statistics and provide easy-to-interpret associations between metabolites and genes to improve understanding of metabolic response. Effective integration of lipidomics and transcriptomics is demonstrated on publicly available murine nutrigenomics data sets. We are able to demonstrate that our approach improves detection of genes related to lipid metabolism, in comparison to applying statistics alone. This is measured by an increased percentage of explained variance (95% vs. 75–80%) and by identifying new metabolite-gene associations related to lipid metabolism.
Proteomic analysis of duodenal tissue from Escherichia coli F18-resistant and -susceptible weaned piglets
by Zhengchang Wu, Riwei Xia, Xuemei Yin, Yongjiu Huo, Guoqiang Zhu, Shenglong Wu, Wenbin Bao
Diarrhea and edema disease in weaned piglets due to infection by Escherichia coli F18 is a leading cause of economic loss in the pig industry. Resistance to E. coli F18 depends on expression of receptors on intestinal epithelial cells, and individual immunity. This study was conducted in Sutai pig E. coli F18-resistant and -susceptible full sib-pair individuals, identified on the basis of resource populations and verification of adhesion assays. The molecular mechanism underlying E. coli F18 resistance was investigated through analysis of the expression of E. coli F18 receptor associated and innate immunity proteins, using proteomics and bioinformatics techniques. Two-dimensional electrophoresis analysis revealed a total of 20 differentially expressed proteins in E. coli F18-resistant and -susceptible groups (10 upregulated and 10 downregulated). A total of 16 differentially expressed proteins were identified by MALDI TOF/TOF mass spectral analysis. According to gene ontology and pathway analysis, differentially expressed proteins were mainly involved in cell adhesion, immune response and other biologically relevant functions. Network analysis of interactions between differentially expressed proteins indicated a likelihood of their involvement in E. coli F18 infection. The expression levels of several important proteins including actin beta (ACTB), vinculin (VCL), heat stress proteins (HSPs) and transferrin (TF) in E. coli F18-resistant and -susceptible individuals were verified by Western blotting, supporting the identification of ACTB, VCL, HSPs and TF as promising candidate proteins for association with E. coli F18 susceptibility.
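Robust and automated three-dimensional segmentation of densely packed cell nuclei in different biological specimens with Lines-of-Sight decomposition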
Background: Due to the large amount of data produced by advanced microscopy, automated image analysis is crucial in modern biology. Most applications require reliable cell nuclei segmentation. However, in many biological specimens cell nuclei are densely packed and appear to touch one another in the images. Therefore, a major difficulty of three-dimensional cell nuclei segmentation is the decomposition of cell nuclei that apparently touch each other. Current methods are highly adapted to a certain biological specimen or a specific microscope. They do not ensure similarly accurate segmentation performance, i.e. their robustness for different datasets is not guaranteed. Hence, these methods require elaborate adjustments to each dataset. Results: We present an advanced three-dimensional cell nuclei segmentation algorithm that is accurate and robust. Our approach combines local adaptive pre-processing with decomposition based on Lines-of-Sight (LoS) to separate apparently touching cell nuclei into approximately convex parts. We demonstrate the superior performance of our algorithm using data from different specimens recorded with different microscopes. The three-dimensional images were recorded with confocal and light sheet-based fluorescence microscopes. The specimens are an early mouse embryo and two different cellular spheroids. We compared the segmentation accuracy of our algorithm with ground truth data for the test images and results from state-of-the-art methods. The analysis shows that our method is accurate throughout all test datasets (mean F-measure: 91 %) whereas the other methods each failed for at least one dataset (F-measure ≤ 69 %). Furthermore, nuclei volume measurements are improved for LoS decomposition. The state-of-the-art methods required laborious adjustments of parameter values to achieve these results. Our LoS algorithm did not require parameter value adjustments. The accurate performance was achieved with one fixed set of parameter values. 
Conclusion: We developed a novel and fully automated three-dimensional cell nuclei segmentation method incorporating LoS decomposition. LoS are easily accessible features that ensure correct splitting of apparently touching cell nuclei independent of their shape, size or intensity. Our method showed superior performance compared to state-of-the-art methods, performing accurately for a variety of test images. Hence, our LoS approach can be readily applied to quantitative evaluation in drug testing, developmental and cell biology.
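The F-measure quoted in this abstract can be illustrated with a minimal sketch. It assumes the common F1 definition (the harmonic mean of precision and recall over detected nuclei); the abstract does not state how detections were matched to ground truth, and the counts below are hypothetical, not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score: the harmonic mean of precision and recall."""
    detected = true_positives + false_positives   # nuclei the algorithm reported
    actual = true_positives + false_negatives     # nuclei present in the ground truth
    if detected == 0 or actual == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 91 correctly segmented nuclei, 9 spurious, 9 missed.
score = f_measure(91, 9, 9)
print(f"F-measure: {score:.2f}")
```

A single score of this kind penalizes both over-segmentation (extra fragments count as false positives) and under-segmentation (merged nuclei count as false negatives), which is why it suits the touching-nuclei problem described above.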
Dynamic changes in replication timing and gene expression during lineage specification of human pluripotent stem cells [Research]
Duplication of the genome in mammalian cells occurs in a defined temporal order referred to as its replication-timing (RT) program. RT changes dynamically during development, regulated in units of 400-800 kb referred to as replication domains (RDs). Changes in RT are generally coordinated with transcriptional competence and changes in sub-nuclear position. We generated genome-wide RT profiles for 26 distinct human cell types including embryonic stem cell (hESC)-derived, primary cells and established cell lines representing intermediate stages of endoderm, mesoderm, ectoderm and neural crest (NC) development. We identified clusters of RDs that replicate at unique times in each stage (RT signatures) and confirmed global consolidation of the genome into larger synchronously replicating segments during differentiation. Surprisingly, transcriptome data revealed that the well-accepted correlation between early replication and transcriptional activity was restricted to RT-constitutive genes, whereas two thirds of the genes that switched RT during differentiation were strongly expressed when late replicating in one or more cell types. Closer inspection revealed that transcription of this class of genes was frequently restricted to the lineage in which the RT switch occurred, but was induced prior to a late to early RT switch and/or down-regulated after an early to late RT switch. Analysis of transcriptional regulatory networks showed that this class of genes contains strong regulators of genes that were only expressed when early replicating. These results provide intriguing new insight into the complex relationship between transcription and RT regulation during human development.
Unmasking alternative splicing inside protein-coding exons defines exitrons and their role in proteome plasticity [Research]
Alternative splicing (AS) diversifies transcriptomes and proteomes and is widely recognized as a key mechanism for regulating gene expression. Previously, in an analysis of intron retention events in Arabidopsis, we found unusual AS events inside annotated protein-coding exons. Here, we also identify such AS events in human and use these two sets to analyse their features, regulation, functional impact, and evolutionary origin. As these events involve introns with features of both introns and protein-coding exons, we name them exitrons (exonic introns). Though exitrons were detected as a subset of retained introns, they are clearly distinguishable, and their splicing results in transcripts with different fates. About half of the 1002 Arabidopsis and 923 human exitrons have sizes of multiples of 3 nucleotides (nt). Splicing of these exitrons results in internally deleted proteins and affects protein domains, disordered regions, and various post-translational modification sites, thus broadly impacting protein function. Exitron splicing is regulated across tissues, in response to stress and in carcinogenesis. Intriguingly, annotated intronless genes can be also alternatively spliced via exitron usage. We demonstrate that at least some exitrons originate from ancestral coding exons. Based on our findings, we propose a "splicing memory" hypothesis whereby upon intron loss imprints of former exon borders defined by vestigial splicing regulatory elements could drive the evolution of exitron splicing. Altogether, our studies show that exitron splicing is a conserved strategy for increasing proteome plasticity in plants and animals, complementing the repertoire of AS events.
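The role of exitron length noted in this abstract can be sketched in a few lines. Removing a multiple of 3 nt from a coding exon keeps the downstream reading frame and yields an internally deleted protein, whereas any other length shifts the frame. The function names and the example lengths are hypothetical and only illustrate the arithmetic.

```python
def exitron_outcome(length_nt: int) -> str:
    """Classify the effect of splicing out an exitron of the given length."""
    if length_nt <= 0:
        raise ValueError("exitron length must be positive")
    if length_nt % 3 == 0:
        # Whole codons are removed, so the downstream frame is preserved.
        return "in-frame internal deletion"
    return "frameshift"


def in_frame_fraction(lengths) -> float:
    """Fraction of exitrons whose length is a multiple of 3 nt."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    kept = sum(1 for n in lengths if exitron_outcome(n) == "in-frame internal deletion")
    return kept / len(lengths)


# Hypothetical exitron lengths in nucleotides (not data from the study).
example_lengths = [90, 101, 243, 77]
print(in_frame_fraction(example_lengths))
```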
Using populations of human and microbial genomes for organism detection in metagenomes [Resource]
Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected.
Coordinated tissue-specific regulation of adjacent alternative 3' splice sites in C. elegans [Research]
Adjacent alternative 3' splice sites, those separated by ≤18 nucleotides, provide a unique problem in the study of alternative splicing regulation; there is overlap of the cis-elements that define the adjacent sites. Identification of the intron's 3' end depends upon sequence elements that define the branchpoint, polypyrimidine tract, and terminal AG dinucleotide. Starting with RNA-seq data from germline-enriched and somatic cell-enriched Caenorhabditis elegans samples, we identify hundreds of introns with adjacent alternative 3' splice sites. We identify 203 events that undergo tissue-specific alternative splicing. For these, the regulation is monodirectional, with somatic cells preferring to splice at the distal 3' splice site (furthest from the 5' end of the intron) and germline cells showing a distinct shift toward usage of the adjacent proximal 3' splice site (closer to the 5' end of the intron). Splicing patterns in somatic cells follow C. elegans consensus rules of 3' splice site definition; a short stretch of pyrimidines preceding an AG dinucleotide. Splicing in germline cells occurs at proximal 3' splice sites that lack a preceding polypyrimidine tract, and in three instances the germline-specific site lacks the AG dinucleotide. We provide evidence that use of germline-specific proximal 3' splice sites is conserved across Caenorhabditis species. We propose that there are differences between germline and somatic cells in the way that the basal splicing machinery functions to determine the intron terminus.
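The definition used in this abstract (alternative 3' splice sites separated by ≤18 nucleotides) can be expressed as a short sketch. It assumes plus-strand coordinates, so the smaller coordinate of each pair is the proximal site (closer to the intron's 5' end). The function name and the coordinates are hypothetical; this is not the authors' pipeline.

```python
from itertools import combinations

MAX_SEPARATION_NT = 18  # threshold stated in the abstract


def adjacent_site_pairs(acceptor_positions):
    """Return (proximal, distal) pairs of 3' splice sites at most 18 nt apart.

    Positions are assumed to be plus-strand coordinates of alternative
    3' splice sites observed for a single intron.
    """
    sites = sorted(set(acceptor_positions))
    return [
        (proximal, distal)
        for proximal, distal in combinations(sites, 2)
        if distal - proximal <= MAX_SEPARATION_NT
    ]


# Hypothetical acceptor coordinates for one intron.
print(adjacent_site_pairs([100, 106, 130, 148]))
```

On the minus strand the roles flip (the larger coordinate is proximal), so a real analysis would need strand-aware handling.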
DNA hairpins destabilize duplexes primarily by promoting melting rather than by inhibiting hybridization
The effect of secondary structure on DNA duplex formation is poorly understood. Using oxDNA, a nucleotide level coarse-grained model of DNA, we study how hairpins influence the rate and reaction pathways of DNA hybridization. We compare to experimental systems studied by Gao et al. (1) and find that 3-base pair hairpins reduce the hybridization rate by a factor of 2, and 4-base pair hairpins by a factor of 10, compared to DNA with limited secondary structure, which is in good agreement with experiments. By contrast, melting rates are accelerated by factors of ~100 and ~2000. This surprisingly large speed-up occurs because hairpins form during the melting process, and significantly lower the free energy barrier for dissociation. These results should assist experimentalists in designing sequences to be used in DNA nanotechnology, by putting limits on the suppression of hybridization reaction rates through the use of hairpins and offering the possibility of deliberately increasing dissociation rates by incorporating hairpins into single strands.
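The rate factors quoted in this abstract can be combined into a rough estimate of how much hairpins destabilize the duplex. The sketch assumes simple two-state kinetics, in which the equilibrium constant is the ratio of the hybridization rate to the melting rate, so a slowdown in one and a speed-up in the other multiply. The two-state assumption, the function names and the default temperature are illustrative choices, not values from the study.

```python
import math

GAS_CONSTANT_KCAL_PER_MOL_K = 1.987204e-3  # R in kcal/(mol*K)


def stability_shift_kT(hybridization_slowdown: float, melting_speedup: float) -> float:
    """Loss of duplex stability in units of k_B*T (two-state assumption).

    K_eq = k_hybridization / k_melting, so K_eq falls by the product of
    the two factors and the free energy shifts by the log of that product.
    """
    return math.log(hybridization_slowdown * melting_speedup)


def stability_shift_kcal(hybridization_slowdown: float,
                         melting_speedup: float,
                         temperature_k: float = 298.15) -> float:
    """Same shift in kcal/mol; the default temperature is an assumption."""
    return (GAS_CONSTANT_KCAL_PER_MOL_K * temperature_k
            * stability_shift_kT(hybridization_slowdown, melting_speedup))


# Rate factors reported in the abstract for 3-bp and 4-bp hairpins.
for label, slowdown, speedup in [("3-bp", 2, 100), ("4-bp", 10, 2000)]:
    print(label, round(stability_shift_kT(slowdown, speedup), 2), "kT")
```

In both cases most of the shift comes from the melting term, which matches the abstract's point that hairpins act mainly by accelerating dissociation.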
Shared resources, shared costs: leveraging biocuration resources
The manual curation of the information in biomedical resources is an expensive task. This article argues for the value of this approach in comparison with other apparently less costly options, such as automated annotation or text-mining, then discusses ways in which databases can make cost savings by sharing infrastructure and tool development. Sharing curation effort is a model already being adopted by several data resources. Approaches taken by two of these, the Gene Ontology annotation effort and the IntAct molecular interaction database, are reviewed in more detail. These models help to ensure long-term persistence of curated data and minimize redundant development of resources by multiple disparate groups.
Database URL: http://www.ebi.ac.uk/intact and http://www.ebi.ac.uk/GOA/
Ontology application at the ENCODE DCC
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC’s use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.
Database URL: https://www.encodeproject.org/
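The idea of an ontology-driven search described in this abstract can be sketched as follows: a query for a general term also returns experiments annotated with any more specific term beneath it. The tiny is-a hierarchy, the experiment names and the function names are hypothetical illustrations; they are not the ENCODE portal's data model or API.

```python
# Hypothetical is-a hierarchy: each term maps to its direct parent terms.
IS_A = {
    "hepatocyte": ["epithelial cell"],
    "epithelial cell": ["cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["cell"],
}


def ancestors(term: str) -> set:
    """Collect every term reachable by following is-a links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def search(experiments: dict, query_term: str) -> list:
    """Return experiments annotated with the query term or any descendant."""
    return [
        name
        for name, term in experiments.items()
        if term == query_term or query_term in ancestors(term)
    ]


# Hypothetical experiment annotations.
experiments = {"exp1": "hepatocyte", "exp2": "T cell", "exp3": "epithelial cell"}
print(search(experiments, "epithelial cell"))
```

Free-text annotation would miss "exp1" for this query; annotating with ontology terms is what makes the connection recoverable.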
GeneStoryTeller: a mobile app for quick and comprehensive information retrieval of human genes
In the last few years, mobile devices such as smartphones and tablets have become an integral part of everyday life, owing to the rapid development of their software and hardware and the increased portability they offer. Nevertheless, up to now, only a few apps capable of fast and robust access to services have been developed in the field of bioinformatics. We have developed GeneStoryTeller, a mobile application for Android platforms with which users can instantly retrieve information on any recorded human gene, derived from eight publicly available databases, as a summary story. Complementary information on gene–drug interactions, functional annotation and disease associations for each selected gene is also provided in the gene story. The most challenging part of developing GeneStoryTeller was balancing the storage of data locally within the app against obtaining updated content dynamically via a network connection. This was accomplished by implementing an administrative site where data are curated and synchronized with the application with minimal human intervention.
Database URL: http://bioserver-3.bioacademy.gr/Bioserver/GeneStoryTeller/.
An Ebola virus-centered knowledge base
Ebola virus (EBOV), of the family Filoviridae, is a NIAID category A, lethal human pathogen. It causes Ebola virus disease (EVD), a severe hemorrhagic fever with a cumulative death rate of 41% in the ongoing epidemic in West Africa. There is an ever-increasing need to consolidate and make available all the knowledge that we possess on EBOV, even if it is conflicting or incomplete. This would enable biomedical researchers to understand the molecular mechanisms underlying this disease and help develop tools for efficient diagnosis and effective treatment. In this article, we present our approach for the development of an Ebola virus-centered Knowledge Base (Ebola-KB) using Linked Data and Semantic Web Technologies. We retrieve and aggregate knowledge from several open data sources, web services and biomedical ontologies. This knowledge is transformed to RDF, linked to the Bio2RDF datasets and made available through a SPARQL 1.1 Endpoint. Ebola-KB can also be explored using an interactive Dashboard visualizing the different perspectives of this integrated knowledge. We showcase how different competency questions, asked by domain users researching the druggability of EBOV, can be formulated as SPARQL Queries or answered using the Ebola-KB Dashboard.
Database URL: http://ebola.semanticscience.org.
A general concept for consistent documentation of computational analyses
The ever-growing amount of data in the field of life sciences demands standardized ways of high-throughput computational analysis. This standardization requires a thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However, due to the heterogeneity in software setups and the high rate of change during tool development, reproducibility is hard to achieve. One reason is that there is no common agreement in the research community on how to document computational studies. In many cases, simple flat files or other unstructured text documents are provided by researchers as documentation, which are often missing software dependencies, versions and sufficient documentation to understand the workflow and parameter settings. As a solution we suggest a simple and modest approach for documenting and verifying computational analysis pipelines. We propose a two-part scheme that defines a computational analysis using a Process and an Analysis metadata document, which jointly describe all necessary details to reproduce the results. In this design we separate the metadata specifying the process from the metadata describing an actual analysis run, thereby reducing the effort of manual documentation to an absolute minimum. Our approach is independent of a specific software environment, results in human readable XML documents that can easily be shared with other researchers and allows an automated validation to ensure consistency of the metadata. Because our approach has been designed with little to no assumptions concerning the workflow of an analysis, we expect it to be applicable in a wide range of computational research fields.
Database URL: http://deep.mpi-inf.mpg.de/DAC/cmds/pub/pyvalid.zip
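The two-part scheme described in this abstract, one document for the process and one for a concrete analysis run, can be illustrated with a minimal sketch using Python's standard XML parser. All element names, attribute names and values below are hypothetical and do not reproduce the authors' actual schema; the point is only that a separate analysis document can be checked automatically against the process it claims to follow.

```python
import xml.etree.ElementTree as ET

# Hypothetical Process document: what the pipeline step is and what it needs.
PROCESS_XML = """<process name="align_reads" version="1.0">
  <software name="example-aligner" version="2.3"/>
  <parameter name="threads"/>
  <parameter name="reference"/>
</process>"""

# Hypothetical Analysis document: the values used in one concrete run.
ANALYSIS_XML = """<analysis process="align_reads" processVersion="1.0">
  <value parameter="threads">8</value>
  <value parameter="reference">genome.fa</value>
</analysis>"""


def validate(process_xml: str, analysis_xml: str) -> list:
    """Return a list of inconsistencies between the two documents."""
    process = ET.fromstring(process_xml)
    analysis = ET.fromstring(analysis_xml)
    problems = []
    if analysis.get("process") != process.get("name"):
        problems.append("analysis refers to a different process")
    if analysis.get("processVersion") != process.get("version"):
        problems.append("process version mismatch")
    declared = {p.get("name") for p in process.findall("parameter")}
    supplied = {v.get("parameter") for v in analysis.findall("value")}
    for name in sorted(declared - supplied):
        problems.append(f"missing value for parameter '{name}'")
    for name in sorted(supplied - declared):
        problems.append(f"undeclared parameter '{name}'")
    return problems


print(validate(PROCESS_XML, ANALYSIS_XML))  # an empty list means consistent
```

Keeping the process description separate means it is written once, while each run only records its own parameter values, which is the reduction in manual documentation effort the abstract refers to.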