Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares

PLOS Collections: Exploring Massive, Genome Scale Datasets with the GenometriCorr Package


Abstract

We have created a statistically grounded tool for determining the correlation of genome-wide data with other datasets or known biological features. It is intended to guide biological exploration of high-dimensional datasets rather than to provide immediate answers. The software enables several biologically motivated approaches to these data, and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features) for use as a metric of functional interaction. The software handles any type of pointwise or interval data. Instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association, which is intended to suggest potentially relevant relationships between the datasets.

Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor.
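To make "spatial correlation between two sets of genomic intervals" concrete, here is a minimal sketch of one simple statistic in this spirit, a relative distance: for each query point, take the distance to the nearer of its two flanking reference features and divide it by the gap between them. The result lies in [0, 0.5] and should be roughly uniform when the two sets are unrelated, so a Kolmogorov-Smirnov distance from Uniform(0, 0.5) gives a rough measure of attraction or repulsion. This is an illustration under simplifying assumptions (one chromosome, point positions only, hypothetical function names), not the GenometriCorr R implementation.

```python
import bisect


def relative_distances(query, reference):
    """For each query position lying between two reference positions, return
    min(distance to left, distance to right) / gap, a value in [0, 0.5].
    Query points outside the span of the reference set are skipped."""
    ref = sorted(reference)
    out = []
    for q in query:
        i = bisect.bisect_right(ref, q)  # ref[i-1] <= q < ref[i]
        if i == 0 or i == len(ref):
            continue  # no flanking pair of reference features
        left, right = ref[i - 1], ref[i]
        out.append(min(q - left, right - q) / (right - left))
    return out


def ks_vs_uniform_half(values):
    """Kolmogorov-Smirnov distance between the empirical CDF of `values`
    and the Uniform(0, 0.5) CDF, F(x) = 2x."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs):
        f = 2 * x
        d = max(d, (k + 1) / n - f, f - k / n)
    return d


# Hypothetical positions: query points that sit next to reference features
# produce relative distances skewed toward 0, i.e. a large KS distance.
ref = [0, 100, 200, 300]
near = [2, 98, 103, 297]
rd = relative_distances(near, ref)
print(rd, ks_vs_uniform_half(rd))
```

A real analysis would also need a significance test (for example by permutation) and per-chromosome handling, which this sketch leaves out.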

Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases
Scooped by Biswapriya Biswavas Misra

SteatoNet: The First Integrated Human Metabolic Model with Multi-layered Regulation to Investigate Liver-Associated Pathologies

PLOS Computational Biology is an open-access
Biswapriya Biswavas Misra's insight:

Current state-of-the-art mathematical models for investigating complex biological processes, in particular liver-associated pathologies, have limited expansiveness, flexibility and representation of integrated regulation, and they rely on the availability of detailed kinetic data. We generated SteatoNet, a multi-pathway, multi-tissue model and in silico platform to investigate hepatic metabolism and its associated deregulations. SteatoNet is based on object-oriented modelling, an approach most commonly applied in the automotive and process industries, whereby individual objects correspond to functional entities. Objects were compiled to feature two novel hepatic modelling aspects: the interaction of hepatic metabolic pathways with extra-hepatic tissues, and the inclusion of transcriptional and post-transcriptional regulation. Identifying SteatoNet at a normalised steady state circumvents the need to constrain kinetic parameters. Validation and identification of flux disturbances that have been proven experimentally in liver patients and animal models highlight the ability of SteatoNet to describe biological behaviour effectively. SteatoNet identifies crucial pathway branches (transport of glucose, lipids and ketone bodies) where changes in flux distribution drive the healthy liver towards hepatic steatosis, the primary stage of non-alcoholic fatty liver disease. Cholesterol metabolism and its transcriptional regulators are highlighted as novel steatosis factors. SteatoNet thus serves as an intuitive in silico platform to identify systemic changes associated with complex hepatic metabolic disorders.

Scooped by Biswapriya Biswavas Misra

A Systematic Computational Analysis of Biosynthetic Gene Cluster Evolution: Lessons for Engineering Biosynthesis

Biswapriya Biswavas Misra's insight:

Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities; 2) an important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability; and 3) individual BGC families evolve in distinct ways, suggesting that design strategies should take family-specific functional constraints into account. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.

Ken D'Amato's curator insight, December 17, 8:57 PM

This fit well with the keynote at AU!

Scooped by Biswapriya Biswavas Misra

De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in so-called cis-regulatory modules (CRMs) in DNA. Although knowledge of the CREs and CRMs in a genome is crucial for elucidating gene regulatory networks and understanding many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes because they are difficult to characterize by either computational or traditional experimental methods. However, the exponentially increasing amount of TF binding location data produced by the recent wide adoption of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, effectively mining these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution remains a highly challenging task.

Results

We have developed a novel graph-theoretic algorithm, DePCRM, for genome-wide de novo prediction of CREs and CRMs using a large number of ChIP datasets. DePCRM predicts CREs and CRMs by effectively identifying overrepresented combinatorial CRE motif patterns in multiple ChIP datasets. When applied to 168 ChIP datasets of 56 TFs from D. melanogaster, DePCRM identified 184 overrepresented CRE motifs and 746 overrepresented combinatorial patterns of these motifs, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs, as well as the CREs in a CRM taken as a whole, are more conserved than randomly selected sequences.

Conclusion

Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional. Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using a large number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/.
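The idea of finding motif combinations that recur across many ChIP datasets more often than chance can be illustrated with a motif co-occurrence graph. The sketch below is not DePCRM's graph-theoretic algorithm. It assumes each ChIP peak has already been reduced to a set of motif identifiers (hypothetical input) and keeps an edge between two motifs when they co-occur at least `min_ratio` times more often than expected if motifs were distributed across peaks independently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(peaks, min_ratio=1.5):
    """peaks: list of sets of motif IDs found in each ChIP peak (hypothetical).
    Returns {(motif_a, motif_b): observed/expected} for motif pairs whose
    co-occurrence exceeds the independence expectation by at least min_ratio."""
    n = len(peaks)
    single = Counter()  # number of peaks containing each motif
    pair = Counter()    # number of peaks containing each unordered motif pair
    for motifs in peaks:
        single.update(motifs)
        pair.update(combinations(sorted(motifs), 2))
    edges = {}
    for (a, b), observed in pair.items():
        # Expected count if a and b landed in peaks independently.
        expected = single[a] * single[b] / n
        ratio = observed / expected
        if ratio >= min_ratio:
            edges[(a, b)] = ratio
    return edges


peaks = [{"A", "B"}, {"A", "B"}, {"C"}, {"C", "D"}, {"A", "C"}]
print(cooccurrence_graph(peaks))
```

Connected groups of motifs in such a graph are one way to propose candidate modules; a real method would also need a significance test and control for peak length and motif frequency.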

 
Scooped by Biswapriya Biswavas Misra

Medicago truncatula Genome Database | Drupal

Biswapriya Biswavas Misra's insight:

Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. JCVI (formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The web site (http://www.MedicagoGenome.org) has seen major updates in the past year; it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant “mines” like ThaleMine and PhytoMine, and with other Model Organism Databases (MODs). In addition to these new features, we continue to provide keyword- and locus-identifier-based searches served via a Chado-backed Tripal instance, a BLAST search interface, and bulk downloads of datasets from the iPlant Data Store (iDS). Finally, we maintain an email helpdesk, backed by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific datasets from the community.

Scooped by Biswapriya Biswavas Misra

GBshape: a genome browser database for DNA shape annotations

Biswapriya Biswavas Misra's insight:

Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence alone does not explain why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape has recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species.
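Periodicity in a per-nucleotide shape track, such as the quantitative values GBshape offers for download, can be checked with a simple autocorrelation. The sketch below uses a synthetic profile with a 10-bp period as a stand-in for real downloaded values (hypothetical data, arbitrary units); a period close to the roughly 10.5-bp helical repeat of B-DNA would show up as a positive autocorrelation near that lag and a negative one at half of it. It is an illustration, not the analysis pipeline used by the GBshape authors.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of a numeric profile at a given lag (in bp)."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return num / denom


# Synthetic stand-in for a shape track (hypothetical): a 10-bp sinusoid
# around a baseline of 5.0.
profile = [5.0 + 0.5 * math.sin(2 * math.pi * i / 10) for i in range(200)]
for lag in (5, 10):
    print(lag, round(autocorrelation(profile, lag), 3))
```

With real data one would average over many aligned nucleosome-occupied sequences before looking for the periodic signal, since a single locus is noisy.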

Scooped by Biswapriya Biswavas Misra

Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew

The tree shrew (Tupaia belangeri) is a small mammal with a close relationship to primates, and it has been proposed as an alternative to primates as an experimental animal in biomedical research. The recent release of a high-quality Chinese tree shrew genome enables more researchers to use this species as a model animal in their studies. To make access to an extensively annotated genome database straightforward and easy, we have created the Tree shrew Database (TreeshrewDB). This web-based platform integrates the currently available data from the tree shrew genome, including an updated gene set, with systematic functional annotation and an mRNA expression pattern. In addition, to assist with automatic gene sequence analysis, we have integrated the common programs Blast, Muscle, GBrowse, GeneWise and codeml into TreeshrewDB. We have also developed a pipeline for the analysis of positive selection. The user-friendly interface of TreeshrewDB, which is available at http://www.treeshrewdb.org, should help in many areas of biological research on the tree shrew.
Rescooped by Biswapriya Biswavas Misra from Virology and Bioinformatics from Virology.ca

AliView: a fast and lightweight alignment viewer and editor for large data sets


AliView is an alignment viewer and editor designed to meet the requirements of next-generation sequencing era phylogenetic datasets. AliView handles alignments of unlimited size in the formats most commonly used, i.e. FASTA, Phylip, Nexus, Clustal and MSF. The intuitive graphical interface makes it easy to inspect, sort, delete, merge and realign sequences as part of the manual filtering process of large datasets. AliView also works as an easy-to-use alignment editor for small as well as large datasets. Availability and implementation: AliView is released as open-source software under the GNU General Public License, version 3.0 (GPLv3), and is available at GitHub (www.github.com/AliView). The program is cross-platform and extensively tested on Linux, Mac OS X and Windows systems. Downloads and help are available at http://ormbunkar.se/aliview CONTACT: anders.larsson@ebc.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.
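An alignment file differs from an ordinary sequence file in that every record has the same length, so that columns line up. The sketch below shows that property for FASTA input: a hypothetical helper parses aligned FASTA text (allowing sequences wrapped over several lines) and rejects input whose sequences differ in length. It is an illustration of the format only, not AliView code, and it ignores the other formats AliView reads (Phylip, Nexus, Clustal, MSF).

```python
def read_fasta_alignment(text):
    """Parse aligned FASTA text into {name: sequence}, preserving input order.
    The name is the first word of each '>' header line.
    Raises ValueError on duplicate names or unequal sequence lengths."""
    parts = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            name = fields[0]
            if name in parts:
                raise ValueError(f"duplicate sequence name: {name}")
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            parts[name].append(line)  # wrapped sequence lines are concatenated
    aligned = {n: "".join(chunks) for n, chunks in parts.items()}
    if len({len(s) for s in aligned.values()}) > 1:
        raise ValueError("sequences differ in length; not an alignment")
    return aligned


def column(aligned, i):
    """Return alignment column i (0-based) as a string, one character per sequence."""
    return "".join(s[i] for s in aligned.values())


example = """>seq1 first record
ACG
T-A
>seq2
AC-TTA
"""
aln = read_fasta_alignment(example)
print(column(aln, 2))  # prints G-
```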


Via Chris Upton + helpers
Scooped by Biswapriya Biswavas Misra

The big data challenges of connectomics

The structure of the nervous system is extraordinarily complicated because individual neurons are interconnected with hundreds or even thousands of other cells in networks that can extend over large volumes. Mapping such networks at the level of synaptic connections, a field called connectomics, began in the 1970s with the study of the small nervous system of a worm, and it has recently garnered general interest thanks to technical and computational advances that automate the collection of electron-microscopy data and offer the possibility of mapping even large mammalian brains. However, modern connectomics produces 'big data', unprecedented quantities of digital information at unprecedented rates, and, as genomics did in its time, it will require breakthrough algorithmic and computational solutions. Here we describe some of the key difficulties that may arise and provide suggestions for managing them.
Scooped by Biswapriya Biswavas Misra

The i5k Workspace@NAL—enabling genomic data access, visualization and curation of arthropod genomes

Biswapriya Biswavas Misra's insight:

The 5000 arthropod genomes initiative (i5k) has tasked itself with coordinating the sequencing of 5000 insect or related arthropod genomes. The resulting influx of data, mostly from small research groups or communities with little bioinformatics experience, will require visualization, dissemination and curation, preferably from a centralized platform. The National Agricultural Library (NAL) has implemented the i5k Workspace@NAL (http://i5k.nal.usda.gov/) to help meet the i5k initiative's genome hosting needs. Any i5k member is encouraged to contact the i5k Workspace with their genome project details. Once submitted, new content will be accessible via organism pages, genome browsers and BLAST search engines, which are implemented via the open-source Tripal framework, a web interface for the underlying Chado database schema. We also implement the Web Apollo software for groups that choose to curate gene models. New content will add to the existing body of 35 arthropod species, which include species relevant for many aspects of arthropod genomic research, including agriculture, invasion biology, systematics, ecology and evolution, and developmental research.

Scooped by Biswapriya Biswavas Misra

ProteoAnnotator - Open Source Proteogenomics Annotation Software Supporting PSI Standards.

The recent massive increase in capability for sequencing genomes is producing enormous advances in our understanding of biological systems. However, there is a bottleneck in genome annotation: determining the structure of all transcribed genes. Experimental data from MS studies can play a major role in confirming and correcting gene structure, an approach known as proteogenomics. However, there are technical and practical challenges to overcome, since proteogenomics requires pipelines comprising a complex set of interconnected modules as well as bespoke routines, for example for protein inference and statistics. We introduce a complete, open-source pipeline for proteogenomics, called ProteoAnnotator, which incorporates a graphical user interface and implements the Proteomics Standards Initiative mzIdentML standard for each analysis stage. All steps are included as standalone modules with the mzIdentML library, allowing other groups to reuse the whole pipeline or its constituent parts within other tools. We have developed new modules for pre-processing and combining multiple search databases, for performing peptide-level statistics on mzIdentML files, for scoring grouped protein identifications matched to a given genomic locus to validate that updates to the official gene models are statistically sound, and for mapping end results back onto the genome. ProteoAnnotator is available from http://www.proteoannotator.org/.
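The abstract mentions peptide-level statistics without saying which ones. A widely used approach in MS-based identification, shown here only as an assumed example and not as ProteoAnnotator's method, is target-decoy estimation: matches against real (target) and reversed or shuffled (decoy) sequences are ranked by score, the false discovery rate at each threshold is estimated as decoys divided by targets, and each match receives a q-value, the lowest FDR at which it would still be accepted. The input format below is hypothetical.

```python
def target_decoy_qvalues(psms):
    """psms: list of (score, is_decoy) pairs, higher score = better match.
    Returns (score, is_decoy, q_value) triples sorted by descending score.
    Ties in score are not treated specially in this sketch."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were placed at this match.
        fdrs.append(min(1.0, decoys / targets) if targets else 1.0)
    # q-value: running minimum of the FDR from the bottom of the list upward.
    qvals = []
    running = float("inf")
    for fdr in reversed(fdrs):
        running = min(running, fdr)
        qvals.append(running)
    qvals.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


psms = [(52.1, False), (48.7, False), (40.3, True), (39.9, False), (35.0, True)]
for score, is_decoy, q in target_decoy_qvalues(psms):
    print(score, "decoy" if is_decoy else "target", round(q, 3))
```

Filtering at, say, q ≤ 0.01 then keeps the target matches whose estimated FDR stays below 1%.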
Scooped by Biswapriya Biswavas Misra

The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate data of varying quality into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
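The four levels counted in the abstract (studies, Biosamples, sequencing projects and analysis projects) can be pictured as a nested data structure. The sketch below is a minimal model assuming each level nests inside the one before it, in the order the abstract lists them; the class and field names and the record IDs are hypothetical, not GOLD's schema or identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisProject:
    analysis_id: str


@dataclass
class SequencingProject:
    project_id: str
    analyses: list = field(default_factory=list)  # AnalysisProject items


@dataclass
class Biosample:
    biosample_id: str
    sequencing_projects: list = field(default_factory=list)  # SequencingProject items


@dataclass
class Study:
    study_id: str
    biosamples: list = field(default_factory=list)  # Biosample items


def level_counts(studies):
    """Count records at each of the four levels of the hierarchy."""
    counts = {"studies": 0, "biosamples": 0,
              "sequencing_projects": 0, "analysis_projects": 0}
    for study in studies:
        counts["studies"] += 1
        for sample in study.biosamples:
            counts["biosamples"] += 1
            for project in sample.sequencing_projects:
                counts["sequencing_projects"] += 1
                counts["analysis_projects"] += len(project.analyses)
    return counts


# Hypothetical records: one metagenome study with two biosamples.
study = Study("study-1", [
    Biosample("sample-1", [SequencingProject(
        "seq-1", [AnalysisProject("analysis-1"), AnalysisProject("analysis-2")])]),
    Biosample("sample-2", [SequencingProject(
        "seq-2", [AnalysisProject("analysis-3")])]),
])
print(level_counts([study]))
```

Keeping each level as its own record type lets one sample carry several sequencing runs, and one run carry several analyses (for example a reassembly), without duplicating the metadata above it.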
Scooped by Biswapriya Biswavas Misra

XGlycScan: An Open-source Software For N-linked Glycosite Assignmen... - PubMed - NCBI

Mass spectrometry-based glycoproteomics has become a major means of identifying and characterizing previously N-linked glycan-attached loci (glycosites). In the bottom-up approach, several factors, including but not limited to sample preparation, mass spectrometry analyses, and protein sequence database searches, result in previously N-linked peptide spectrum matches (PSMs) of varying lengths. Given that multiple PSMs can map to a glycosite, we reason that identified PSMs are varying-length peptide species of a unique set of glycosites. Because the associated spectra of these PSMs are typically summed separately, true glycosite-associated spectral counts are lost or complicated. Also, these varying-length peptide species complicate protein inference, as shorter peptide sequences are more likely to map to more proteins than longer peptides or actual glycosite sequences. Here, we present XGlycScan. XGlycScan maps varying-length peptide species to glycosites to facilitate accurate quantification of glycosite-associated spectral counts. We observed that this reduced the variability in reported identifications across mass spectrometry technical replicates of our sample dataset. We also observed that mapping identified peptides to glycosites provided an assessment of search-engine identification. Inherently, the glycosites reported by XGlycScan reduce the complexity of protein inference. We implemented XGlycScan in the platform-independent Java programming language and have made it available as open source. XGlycScan's source code is freely available at https://bitbucket.org/paiyetan/xglycscan/src and its compiled binaries and documentation can be freely downloaded at https://bitbucket.org/paiyetan/xglycscan/downloads. The graphical user interface version can also be found at https://bitbucket.org/paiyetan/xglycscangui/src (source) and https://bitbucket.org/paiyetan/xglycscangui/downloads (downloads).
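The central step described above, collapsing peptides of different lengths onto the glycosite they cover and summing their spectral counts per site, can be sketched in a few lines. This is a simplified illustration, not XGlycScan's Java code. It assumes that glycosites are N-X-S/T sequons (X not proline) in the protein sequence, that PSMs arrive as hypothetical (peptide, spectral count) pairs, and that a peptide credits its count to every sequon asparagine it covers in every protein where it occurs.

```python
import re
from collections import defaultdict

# N-X-S/T sequon with X != P; the lookahead also finds overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq):
    """0-based positions of sequon asparagines in a protein sequence."""
    return [m.start() for m in SEQUON.finditer(seq)]


def glycosite_counts(proteins, psms):
    """proteins: {protein_id: sequence}; psms: [(peptide, spectral_count)].
    Returns {(protein_id, 1-based position of N): summed spectral count}.
    Peptides shared by several proteins are credited to each of them."""
    sites = {pid: sequon_positions(seq) for pid, seq in proteins.items()}
    counts = defaultdict(int)
    for peptide, n_spectra in psms:
        if not peptide:
            continue
        for pid, seq in proteins.items():
            start = seq.find(peptide)
            while start != -1:
                end = start + len(peptide)
                for pos in sites[pid]:
                    if start <= pos < end:
                        counts[(pid, pos + 1)] += n_spectra
                start = seq.find(peptide, start + 1)
    return dict(counts)


proteins = {"P1": "MKNGTAANPSLLNVT"}  # hypothetical protein
psms = [("KNGTA", 2), ("NGT", 3), ("LNVT", 4), ("AANPS", 1)]
print(glycosite_counts(proteins, psms))
```

Here "KNGTA" and "NGT" are two length variants covering the same site, so their counts are merged, while "AANPS" contributes nothing because its N-P-S motif is not a sequon.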
Scooped by Biswapriya Biswavas Misra

canSAR: updated cancer research and drug discovery knowledgebase.

Nucleic Acids Res. 2014 Jan;42(Database issue):D1040-7. doi: 10.1093/nar/gkt1182. Epub 2013 Dec 3. Research Support, Non-U.S. Gov't
Biswapriya Biswavas Misra's insight:

canSAR (http://cansar.icr.ac.uk) is a public integrative cancer-focused knowledgebase for the support of cancer translational research and drug discovery. Through the integration of biological, pharmacological, chemical, structural biology and protein network data, it provides a single information portal to answer complex multidisciplinary questions including--among many others--what is known about a protein, in which cancers is it expressed or mutated, and what chemical tools and cell line models can be used to experimentally probe its activity? What is known about a drug, its cellular sensitivity profile and what proteins is it known to bind that may explain unusual bioactivity? Here we describe major enhancements to canSAR including new data, improved search and browsing capabilities and new target, cancer cell line, protein family and 3D structure summaries and tools.

Scooped by Biswapriya Biswavas Misra

Developmental Self-Construction and -Configuration of Functional Neocortical Neuronal Networks

PLOS Computational Biology is an open-access
Biswapriya Biswavas Misra's insight:

The prenatal development of neural circuits must provide sufficient configuration to support at least a set of core postnatal behaviors. Although knowledge of various genetic and cellular aspects of development is accumulating rapidly, there is less systematic understanding of how these various processes play together in order to construct such functional networks. Here we make some steps toward such understanding by demonstrating through detailed simulations how a competitive co-operative (‘winner-take-all’, WTA) network architecture can arise by development from a single precursor cell. This precursor is granted a simplified gene regulatory network that directs cell mitosis, differentiation, migration, neurite outgrowth and synaptogenesis. Once initial axonal connection patterns are established, their synaptic weights undergo homeostatic unsupervised learning that is shaped by wave-like input patterns. We demonstrate how this autonomous genetically directed developmental sequence can give rise to self-calibrated WTA networks, and compare our simulation results with biological data.
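The developmental simulation itself is far beyond a short example, but the winner-take-all computation that the grown network ends up performing can be sketched with a classic discrete-time scheme in the style of MAXNET, swapped in here purely for illustration. Every unit inhibits every other unit by a factor `eps` smaller than 1/N and negative activations are clipped to zero, so with a unique maximum only the initially strongest unit stays active. The function name and parameters are hypothetical.

```python
def maxnet(activations, eps, max_iter=1000):
    """Iterate a MAXNET-style winner-take-all network.

    Each step, every unit is inhibited by eps times the summed activity of all
    other units, and negative values are clipped to zero. With a unique maximum
    and 0 < eps < 1/N, only the initially strongest unit stays positive.
    Ties are never broken, so max_iter bounds the loop in that case.
    """
    n = len(activations)
    if n == 0:
        return []
    if not 0 < eps < 1 / n:
        raise ValueError("eps must satisfy 0 < eps < 1/N")
    x = [max(0.0, a) for a in activations]
    for _ in range(max_iter):
        if sum(1 for v in x if v > 0) <= 1:
            break  # at most one unit still active: competition resolved
        total = sum(x)
        x = [max(0.0, v - eps * (total - v)) for v in x]
    return x


state = maxnet([0.5, 0.9, 0.7], eps=0.2)
winners = [i for i, v in enumerate(state) if v > 0]
print(winners)  # [1]
```

Unlike the networks in the paper, this scheme has hand-set, uniform inhibitory weights; the paper's point is that comparable competitive connectivity can emerge from development and homeostatic learning rather than being wired in.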

ECOD: An Evolutionary Classification of Protein Domains

Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or “fold”). This distinction highlights cases of homology between domains of differing topology, aiding understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.
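The idea of grouping domains by homology rather than fold can be pictured as a nested mapping. This is a minimal sketch that collapses the classification to the two levels named in the abstract (sequence families inside evolutionary groups); ECOD's real schema, identifiers and file formats are not modeled here, and the domain, family and group names are hypothetical.

```python
from collections import defaultdict

def build_hierarchy(assignments):
    """Nest domains under families and families under evolutionary groups.

    assignments: iterable of (domain_id, family, group) tuples (hypothetical format).
    Returns {group: {family: [domain_id, ...]}}.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for domain_id, family, group in assignments:
        tree[group][family].append(domain_id)
    return tree

def group_of(tree, domain_id):
    """Return the evolutionary group containing `domain_id`, or None."""
    for group, families in tree.items():
        for members in families.values():
            if domain_id in members:
                return group
    return None

def are_homologous(tree, d1, d2):
    """Two domains count as homologous here if they share an evolutionary group,
    even when they sit in different sequence families."""
    g1 = group_of(tree, d1)
    return g1 is not None and g1 == group_of(tree, d2)
```

For example, domains in two different families of the same group are reported as homologous, while domains in different groups are not. A production lookup would keep a separate domain-to-group index instead of scanning the tree.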

HotSpotter: efficient visualization of driver mutations

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

Driver mutations are positively selected during the evolution of cancers. The relative frequency of a particular mutation within a gene is typically used as a criterion for identifying a driver mutation. However, driver mutations may occur with relative infrequency at a particular site, but cluster within a region of the gene. When analyzing across different cancers, particular mutation sites or mutations within a particular region of the gene may be of relatively low frequency in some cancers, but still provide selective growth advantage.

Results

This paper presents a method that allows rapid and easy visualization of mutation data sets and identification of potential gene mutation hotspot sites and/or regions. As an example, we identified hotspot regions in the NFE2L2 gene that are potentially functionally relevant in endometrial cancer, but would be missed using other analyses.

Conclusions

HotSpotter is a quick, easy-to-use visualization tool that delivers gene identities with associated mutation locations and frequencies overlaid upon a large cancer mutation reference set. This allows the user to identify potential driver mutations that are less frequent in a cancer or are localized in a hotspot region of relatively infrequent mutations.
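The notion of a hotspot region, where individually rare mutations cluster, can be made concrete with a simple sliding-window count. This is a minimal sketch of a generic clustering approach, not HotSpotter's own method: it assumes mutation positions are residue coordinates within one gene, and the window size and minimum count are hypothetical parameters. Overlapping windows that pass the threshold are merged into regions.

```python
from bisect import bisect_left, bisect_right

def hotspot_windows(positions, window=10, min_count=3):
    """Find windows of `window` consecutive residues, each starting at a mutated
    position, that contain at least `min_count` mutations (recurrent hits count).

    Returns a list of (start, end, count) with an inclusive `end`.
    """
    pos = sorted(positions)
    found = []
    for start in sorted(set(pos)):
        end = start + window - 1  # inclusive window end
        count = bisect_right(pos, end) - bisect_left(pos, start)
        if count >= min_count:
            found.append((start, end, count))
    return found

def merge_windows(windows):
    """Merge overlapping (start, end, count) windows into (start, end) regions."""
    merged = []
    for start, end, _ in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]
```

With positions `[12, 14, 15, 80, 200, 203, 205, 206]`, a 10-residue window and a threshold of 3, the sketch reports regions 12 to 21 and 200 to 212, while the isolated mutation at 80 is not flagged.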

BiomeNet: A Bayesian Model for Inference of Metabolic Divergence among Microbial Communities

Biswapriya Biswavas Misra's insight:

Metagenomics yields enormous numbers of microbial sequences that can be assigned a metabolic function. Using such data to infer community-level metabolic divergence is hindered by the lack of a suitable statistical framework. Here, we describe a novel hierarchical Bayesian model, called BiomeNet (Bayesian inference of metabolic networks), for inferring differential prevalence of metabolic subnetworks among microbial communities. To infer the structure of community-level metabolic interactions, BiomeNet applies a mixed-membership modelling framework to enzyme abundance information. The basic idea is that the mixture components of the model (metabolic reactions, subnetworks, and networks) are shared across all groups (microbiome samples), but the mixture proportions vary from group to group. Through this framework, the model can capture nested structures within the data. BiomeNet is unique in modeling each metagenome sample as a mixture of complex metabolic systems (metabosystems). The metabosystems are composed of mixtures of tightly connected metabolic subnetworks. BiomeNet differs from other unsupervised methods by allowing researchers to discriminate groups of samples through the metabolic patterns it discovers in the data, and by providing a framework for interpreting them. We describe a collapsed Gibbs sampler for inference of the mixture weights under BiomeNet, and we use simulation to validate the inference algorithm. Application of BiomeNet to human gut metagenomes revealed a metabosystem with greater prevalence among inflammatory bowel disease (IBD) patients. Based on the discriminatory subnetworks for this metabosystem, we inferred that the community is likely to be closely associated with the human gut epithelium, to be resistant to dietary interventions, and to interfere with human uptake of an antioxidant connected to IBD.
Because this metabosystem has a greater capacity to exploit host-associated glycans, we speculate that IBD-associated communities might arise from opportunist growth of bacteria that can circumvent the host's nutrient-based mechanism for bacterial partner selection.
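The mixed-membership idea (shared components, sample-specific proportions) and collapsed Gibbs sampling can be illustrated with the simplest model of that family, latent Dirichlet allocation (LDA). This is a minimal sketch with a single level of mixing, so it is much simpler than BiomeNet's nested reactions, subnetworks and metabosystems; it assumes each metagenome sample is reduced to a list of integer enzyme or reaction ids, and the hyperparameters `alpha` and `beta` are illustrative defaults.

```python
import numpy as np

def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iter=200, seed=0):
    """Plain LDA fitted by collapsed Gibbs sampling.

    docs: list of samples, each a list of integer feature ids in [0, vocab_size).
    Returns (theta, n_kw): per-sample component proportions and the
    component-by-feature count matrix from the final sweep.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # tokens in sample d assigned to component k
    n_kw = np.zeros((n_topics, vocab_size))  # times feature w is assigned to component k
    n_k = np.zeros(n_topics)                 # total tokens assigned to component k
    z = []
    for d, doc in enumerate(docs):           # random initial assignments
        z_d = rng.integers(n_topics, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional with the mixture weights integrated out.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    return theta, n_kw
```

"Collapsed" refers to integrating out the mixture weights analytically, so the sampler only tracks one component assignment per token; comparing rows of `theta` between groups of samples is the analogue of asking which components are more prevalent in, say, IBD patients.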

RaftProt: mammalian lipid raft proteome database

Biswapriya Biswavas Misra's insight:

RaftProt (http://lipid-raft-database.di.uq.edu.au/) is a database of mammalian lipid raft-associated proteins as reported in high-throughput mass spectrometry studies. Lipid rafts are specialized membrane microdomains enriched in cholesterol and sphingolipids, thought to act as dynamic signalling and sorting platforms. Given their fundamental roles in cellular regulation, there is a plethora of information on the size, composition and regulation of these membrane microdomains, including a large number of proteomics studies. To facilitate the mining and analysis of published lipid raft proteomics studies, we have developed a searchable database, RaftProt. In addition to browsing the studies, performing basic queries by protein and gene names, and searching experiments by cell, tissue and organism, we have implemented several advanced features to facilitate data mining. To address potential bias due to the biochemical preparation procedures used, we have captured the lipid raft preparation methods and implemented an advanced search option for methodology and sample treatment conditions, such as cholesterol depletion. Furthermore, we have identified a list of high-confidence proteins and enabled searching only from this list of likely bona fide lipid raft proteins. Given the apparent biological importance of lipid rafts and their associated proteins, this database would constitute a key resource for the scientific community.

SeqFindr 0.32.2 : Python Package Index

SeqFindr - easily create informative genomic feature plots. It's a bioinfomagician's tool to detect the presence or absence of genomic features given a database describing these features and a set of draft and/or complete genomes. We work with bacterial genomes and, as such, SeqFindr has only been tested with bacterial genomes.
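The presence/absence call behind such a plot can be sketched as a thresholded matrix of genomes against features. This is a minimal sketch, not SeqFindr's implementation: it assumes alignment hits have already been computed elsewhere (for example with a BLAST search) and summarized as percent identity and fraction of the feature covered; the thresholds, genome names and tuple layout are hypothetical.

```python
def presence_matrix(hits, genomes, features, min_identity=95.0, min_coverage=0.90):
    """Build a genomes x features presence/absence matrix.

    hits: iterable of (genome, feature, percent_identity, fraction_covered).
    A feature is called present in a genome if any hit passes both thresholds.
    Returns a list of rows (one per genome) with 1 = present, 0 = absent.
    """
    present = {(g, f): 0 for g in genomes for f in features}
    for genome, feature, identity, coverage in hits:
        key = (genome, feature)
        if key in present and identity >= min_identity and coverage >= min_coverage:
            present[key] = 1
    return [[present[(g, f)] for f in features] for g in genomes]
```

Each row can then be drawn as one line of a heatmap. Requiring both identity and coverage avoids calling a feature present on the strength of a short, highly similar fragment.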

Bayesian transcriptome assembly

RNA-seq allows for simultaneous transcript discovery and quantification, but reconstructing complete transcripts from such data remains difficult. Here, we introduce the Bayesembler, a novel probabilistic method for transcriptome assembly built on a Bayesian model of the RNA sequencing process. Under this model, samples from the posterior distribution over transcripts and their abundance values are obtained using Gibbs sampling. By using the frequency at which transcripts are observed during sampling to select the final assembly, we demonstrate marked improvements in sensitivity and precision over state-of-the-art assemblers on both simulated and real data. The Bayesembler is available at https://github.com/bioinformatics-centre/bayesembler.
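The final selection step, keeping transcripts according to how often they appear across posterior samples, can be sketched as follows. This is a minimal sketch under stated assumptions: each Gibbs sample is reduced to the set of transcript ids it contains, and the 0.5 frequency cut-off is a hypothetical parameter, not the Bayesembler's documented criterion.

```python
from collections import Counter

def select_by_frequency(samples, min_freq=0.5):
    """Keep transcripts present in at least `min_freq` of the posterior samples.

    samples: list of iterables of transcript ids, one per Gibbs sample.
    Returns the selected transcript ids in sorted order.
    """
    n = len(samples)
    # Count each transcript at most once per sample.
    counts = Counter(t for sample in samples for t in set(sample))
    return sorted(t for t, c in counts.items() if c / n >= min_freq)
```

Raising `min_freq` trades sensitivity for precision: only transcripts the sampler keeps returning to survive, which is the intuition behind using sampling frequency rather than a single best assembly.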

The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community.

HPD: an online integrated human pathway database enabling systems biology studies (Chowbina, Sudhir R)

This article is from BMC Bioinformatics, volume 10.
Biswapriya Biswavas Misra's insight:

Background: Pathway-oriented experimental and computational studies have led to a significant accumulation of biological knowledge concerning three major types of biological pathway events: molecular signaling events, gene regulation events, and metabolic reaction events. A pathway consists of a series of molecular pathway events that link molecular entities such as proteins, genes, and metabolites. As of April 2009, there were approximately 300 biological pathway resources according to the Pathguide database; however, these pathway databases generally have poor coverage or poor quality, and are difficult to integrate due to syntactic-level and semantic-level data incompatibilities.

Results: We developed the Human Pathway Database (HPD) by integrating heterogeneous human pathway data that are either curated at the NCI Pathway Interaction Database (PID), Reactome, BioCarta, KEGG or indexed from the Protein Lounge Web sites. Integration of pathway data at syntactic, semantic, and schematic levels was based on a unified pathway data model and data warehousing-based integration techniques. HPD provides a comprehensive online view that connects human proteins, genes, RNA transcripts, enzymes, signaling events, metabolic reaction events, and gene regulatory events. At the time of this writing HPD includes 999 human pathways and more than 59,341 human molecular entities. The HPD software provides both a user-friendly Web interface for online use and a robust relational database backend for advanced pathway querying. This pathway tool enables users to 1) search for human pathways from different resources by simply entering genes/proteins involved in pathways or words appearing in pathway names, 2) analyze pathway-protein association, 3) study pathway-pathway similarity, and 4) build integrated pathway networks. We demonstrated the usage and characteristics of the new HPD through three breast cancer case studies.

Conclusion: HPD (http://bio.informatics.iupui.edu/HPD) is a new resource for searching, managing, and studying human biological pathways. Users of HPD can search against large collections of human biological pathways, compare related pathways and their molecular entity compositions, and build high-quality, expanded-scope disease pathway models. The current HPD software can help users address a wide range of pathway-related questions in human disease biology studies.
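Pathway-pathway similarity based on molecular entity composition can be illustrated with the Jaccard index of member sets. This is a minimal sketch: the abstract does not state which similarity measure HPD uses, so Jaccard is an assumed, commonly used choice, and the pathway names `P1` to `P3` with their gene sets are toy data.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two member collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_similar(pathways, query):
    """Rank all other pathways by Jaccard similarity to pathway `query`.

    pathways: dict mapping pathway name -> iterable of member genes/proteins.
    Returns a list of (name, score) pairs, most similar first.
    """
    target = pathways[query]
    scores = [(name, jaccard(target, members))
              for name, members in pathways.items() if name != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With `P1 = {TP53, MDM2, CDKN1A}` and `P2 = {TP53, MDM2, ATM, CHEK2}`, two of five distinct members are shared, giving a similarity of 0.4. Jaccard ignores pathway topology, so it captures compositional overlap only.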

Proteome TopFIND 3.0 with TopFINDer and PathFINDer: database and analysis tools for the association of protein termini to pre- and post-translational events

The knowledgebase TopFIND is an analysis platform focussed on protein termini, their origin, their modification and hence their role in protein structure and function. Here, we present a major update to TopFIND, version 3, which includes a 70% increase in the underlying data to now cover 90 696 proteins, 165 044 N-termini, 130 182 C-termini, 14 382 cleavage sites and 33 209 substrate cleavages in H. sapiens, M. musculus, A. thaliana, S. cerevisiae and E. coli. New features include the mapping of protein termini and cleavage entries across protein isoforms and, significantly, the mapping of protein termini originating from alternative transcription and alternative translation start sites. Furthermore, two analysis tools for complex data analysis based on the TopFIND resource are now available online. TopFINDer, the TopFIND ExploRer, characterizes and annotates proteomics-derived N- or C-termini sets for their origin, sequence context and implications for protein structure and function; neo-termini are also linked to associated proteases. PathFINDer identifies indirect connections between a protease and a list of substrates or termini, thus supporting the evaluation of complex proteolytic processes in vivo. To demonstrate the utility of the tools, a recent N-terminomics data set of inflamed murine skin has been re-analyzed. Recapitulating the major findings of the original, manually performed analysis validates the utility of these new resources. The point of entry for the resource is http://clipserve.clip.ubc.ca/topfind, from where the graphical interface, all application programming interfaces (API) and the analysis tools are freely accessible.
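Finding indirect connections between a protease and its substrates is, at its core, a path search in a directed interaction graph. This is a minimal sketch using breadth-first search for shortest paths up to a length limit; it is not PathFINDer's documented algorithm, and the graph, node names and edge meaning (for example "cleaves" or "inhibits") are hypothetical.

```python
from collections import deque

def indirect_paths(graph, source, targets, max_len=3):
    """Shortest directed paths from `source` to each reachable target.

    graph: dict mapping node -> iterable of successor nodes.
    Only paths with at most `max_len` edges are considered.
    Returns {target: [source, ..., target]} for targets that were reached.
    """
    parents = {source: None}
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == max_len:  # do not extend paths beyond the limit
            continue
        for nxt in graph.get(node, ()):
            if nxt not in parents:  # first visit in BFS gives a shortest path
                parents[nxt] = node
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    paths = {}
    for target in targets:
        if target in parents and target != source:
            path = [target]
            while parents[path[-1]] is not None:
                path.append(parents[path[-1]])
            paths[target] = path[::-1]
    return paths
```

In a toy graph where `ProteaseA` cleaves `InhibitorX`, which normally blocks `ProteaseC`, the substrate of `ProteaseC` is reported as an indirect, three-step consequence of `ProteaseA`. The length limit keeps long, biologically implausible chains out of the results.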

DECIPHER (DatabasE of Genomic variants and Phenotype in Humans Using Ensembl Resources) Plugin | BioGPS

DECIPHER (DatabasE of Genomic variants and Phenotype in Humans Using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants. DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation. https://decipher.sanger.ac.uk/
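Showing a patient's variant in the context of other variation at the same locus reduces to an interval-overlap query. This is a minimal sketch, not DECIPHER's API: it assumes variants are given as (chromosome, start, end) with closed-interval coordinates, and the reference records, coordinates and the labels `benign` and `pathogenic` are toy stand-ins for normal and pathogenic variation.

```python
def overlapping(variant, reference):
    """Reference records that overlap the patient's variant.

    variant: (chrom, start, end); reference: iterable of (chrom, start, end, label).
    Intervals are closed, so touching endpoints count as overlap.
    """
    chrom, start, end = variant
    return [r for r in reference
            if r[0] == chrom and r[1] <= end and start <= r[2]]

def group_by_label(records):
    """Group (chrom, start, end, label) records by their label."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[3], []).append(rec)
    return groups
```

For a patient deletion at chr1:1000-5000, a pathogenic record at chr1:4000-8000 and a benign record at chr1:500-1200 both overlap and would be shown side by side, while records on other chromosomes or beyond position 5000 are excluded. Large reference sets would normally use an interval tree or a sorted index rather than a linear scan.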
