Databases & Softw...
Follow
Find
4.1K views | +1 today
 
Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares
Scoop.it!

Overview — A Python Environment for (phylogenetic) Tree Exploration

Overview — A Python Environment for (phylogenetic) Tree Exploration | Databases & Softwares | Scoop.it

Abstract

Mushrooms are high in protein content, which makes them potentially a good source of antihypertensive peptides. Among the mushrooms tested, protein extracts from Pleurotus cystidiosus (E1Pc and E5Pc) and Agaricus bisporus (E1Ab and E3Ab) had high levels of antihypertensive activity. The protein extracts were fractionated by reverse-phase high performance liquid chromatography (RPHPLC) into six fractions. Fraction 3 from E5Pc (E5PcF3) and fraction 6 from E3Ab (E3AbF6) had the highest antihypertensive activity. SDS PAGE analysis showed E5PcF3 consisted mainly of low molecular weight proteins while E3AbF6 contained a variety of high to low molecular weight proteins. There were 22 protein clusters detected by SELDI-TOF-MS analysis with five common peaks found in E5PcF3 and E3AbF6, which had m/z values in the range of 3940 to 11413. This study suggests that the antihypertensive activity in the two mushroom species could be due to proteins with molecular weight ranging from 3 to 10 kDa.

more...
No comment yet.
Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases
Your new post is loading...
Your new post is loading...
Scooped by Biswapriya Biswavas Misra
Scoop.it!

BFA: phenotype prediction integrating metabolic models with constraints derived from experimental data

Background

Flux analysis methods lie at the core of Metabolic Engineering (ME), providing methods for phenotype simulation that allow the determination of flux distributions under different conditions. Although many constraint-based modeling software tools have been developed and published, none provides a free user-friendly application that makes available the full portfolio of flux analysis methods.
Results

This work presents Constraint-based Flux Analysis (CBFA), an open-source software application for flux analysis in metabolic models that implements several methods for phenotype prediction, allowing users to define constraints associated with measured fluxes and/or flux ratios, together with environmental conditions (e.g. media) and reaction/gene knockouts. CBFA identifies the set of applicable methods based on the constraints defined from user inputs, encompassing algebraic and constraint-based simulation methods. The integration of CBFA within the OptFlux framework for ME enables the utilization of different model formats and standards and the integration with complementary methods for phenotype simulation and visualization of results.
Conclusions

A general-purpose and flexible application is proposed that is independent of the origin of the constraints defined for a given simulation. The aim is to provide a simple to use software tool focused on the application of several flux prediction methods.
Biswapriya Biswavas Misra's insight:
Background

Flux analysis methods lie at the core of Metabolic Engineering (ME), providing methods for phenotype simulation that allow the determination of flux distributions under different conditions. Although many constraint-based modeling software tools have been developed and published, none provides a free user-friendly application that makes available the full portfolio of flux analysis methods.

Results

This work presents Constraint-based Flux Analysis (CBFA), an open-source software application for flux analysis in metabolic models that implements several methods for phenotype prediction, allowing users to define constraints associated with measured fluxes and/or flux ratios, together with environmental conditions (e.g. media) and reaction/gene knockouts. CBFA identifies the set of applicable methods based on the constraints defined from user inputs, encompassing algebraic and constraint-based simulation methods. The integration of CBFA within the OptFlux framework for ME enables the utilization of different model formats and standards and the integration with complementary methods for phenotype simulation and visualization of results.

Conclusions

A general-purpose and flexible application is proposed that is independent of the origin of the constraints defined for a given simulation. The aim is to provide a simple to use software tool focused on the application of several flux prediction methods.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

solGS: a web-based tool for genomic selection

Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and statistical complexity, however, is a serious challenge in data management, analysis, and sharing. A bioinformatics infrastructure for data storage and access, and user-friendly web-based tool for analysis and sharing output is needed to make GS more practical for breeders.
Biswapriya Biswavas Misra's insight:
Background

Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and statistical complexity, however, is a serious challenge in data management, analysis, and sharing. A bioinformatics infrastructure for data storage and access, and user-friendly web-based tool for analysis and sharing output is needed to make GS more practical for breeders.

Results

We have developed a web-based tool, called solGS, for predicting genomic estimated breeding values (GEBVs) of individuals, using a Ridge-Regression Best Linear Unbiased Predictor (RR-BLUP) model. It has an intuitive web-interface for selecting a training population for modeling and estimating genomic estimated breeding values of selection candidates. It estimates phenotypic correlation and heritability of traits and selection indices of individuals. Raw data is stored in a generic database schema, Chado Natural Diversity, co-developed by multiple database groups. Analysis output is graphically visualized and can be interactively explored online or downloaded in text format. An instance of its implementation can be accessed at the NEXTGEN Cassava breeding database, http://cassavabase.org/solgs webcite.

Conclusions

solGS enables breeders to store raw data and estimate GEBVs of individuals online, in an intuitive and interactive workflow. It can be adapted to any breeding program.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

Statistical significance of variables driving systematic variation in high-dimensional data

Statistical significance of variables driving systematic variation in high-dimensional data | Databases & Softwares | Scoop.it
Biswapriya Biswavas Misra's insight:

Motivation: There are a number of well-established methods such as principal component analysis (PCA) for automatically capturing systematic variation due to latent variables in large-scale genomic data. PCA and related methods may directly provide a quantitative characterization of a complex biological variable that is otherwise difficult to precisely define or model. An unsolved problem in this context is how to systematically identify the genomic variables that are drivers of systematic variation captured by PCA. Principal components (PCs) (and other estimates of systematic variation) are directly constructed from the genomic variables themselves, making measures of statistical significance artificially inflated when using conventional methods due to over-fitting.

Results: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of PCs. The proposed method can greatly simplify complex significance testing problems encountered in genomics and can be used to identify the genomic variables significantly associated with latent variables. Using simulation, we demonstrate that our method attains accurate measures of statistical significance over a range of relevant scenarios. We consider yeast cell-cycle gene expression data, and show that the proposed method can be used to straightforwardly identify genes that are cell-cycle regulated with an accurate measure of statistical significance. We also analyze gene expression data from post-trauma patients, allowing the gene expression data to provide a molecularly driven phenotype. Using our method, we find a greater enrichment for inflammatory-related gene sets compared to the original analysis that uses a clinically defined, although likely imprecise, phenotype. The proposed method provides a useful bridge between large-scale quantifications of systematic variation and gene-level significance analyses.

Availability and implementation: An R software package, called jackstraw, is available in CRAN.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

goldilocks 0.0.52 : Python Package Index

quality
Biswapriya Biswavas Misra's insight:

Goldilocks is a Python package providing functionality for locating 'interesting' genomic regions for some definition of 'interesting'. You can import it to your scripts, pass it sequence data and search for subsequences that match some criteria across one or more samples.

Goldilocks was developed to support our work in the investigation of quality control for genetic sequencing. It was used to quickly locate regions on the human genome that expressed a desired level of variability, which were "just right" for later variant calling and comparison.

The package has since been made more flexible and can be used to find regions of interest based on other criteria such as GC-content, density of target k-mers, defined confidence metrics and missing nucleotides

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

SteatoNet: The First Integrated Human Metabolic Model with Multi-layered Regulation to Investigate Liver-Associated Pathologies

SteatoNet: The First Integrated Human Metabolic Model with Multi-layered Regulation to Investigate Liver-Associated Pathologies | Databases & Softwares | Scoop.it
PLOS Computational Biology is an open-access
Biswapriya Biswavas Misra's insight:

Current state-of-the-art mathematical models to investigate complex biological processes, in particular liver-associated pathologies, have limited expansiveness, flexibility, representation of integrated regulation and rely on the availability of detailed kinetic data. We generated the SteatoNet, a multi-pathway, multi-tissue model and in silico platform to investigate hepatic metabolism and its associated deregulations. SteatoNet is based on object-oriented modelling, an approach most commonly applied in automotive and process industries, whereby individual objects correspond to functional entities. Objects were compiled to feature two novel hepatic modelling aspects: the interaction of hepatic metabolic pathways with extra-hepatic tissues and the inclusion of transcriptional and post-transcriptional regulation. SteatoNet identification at normalised steady state circumvents the need for constraining kinetic parameters. Validation and identification of flux disturbances that have been proven experimentally in liver patients and animal models highlights the ability of SteatoNet to effectively describe biological behaviour. SteatoNet identifies crucial pathway branches (transport of glucose, lipids and ketone bodies) where changes in flux distribution drive the healthy liver towards hepatic steatosis, the primary stage of non-alcoholic fatty liver disease. Cholesterol metabolism and its transcription regulators are highlighted as novel steatosis factors. SteatoNet thus serves as an intuitive in silico platform to identify systemic changes associated with complex hepatic metabolic disorders.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

A Systematic Computational Analysis of Biosynthetic Gene Cluster Evolution: Lessons for Engineering Biosynthesis

A Systematic Computational Analysis of Biosynthetic Gene Cluster Evolution: Lessons for Engineering Biosynthesis | Databases & Softwares | Scoop.it
BGC
Biswapriya Biswavas Misra's insight:

Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities. 2) An important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability, 3) Individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.

more...
Ken D'Amato's curator insight, December 17, 8:57 PM

This fit well with the keynote at AU!

Scooped by Biswapriya Biswavas Misra
Scoop.it!

De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets

In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in the so-called cis-regulatory modules (CRMs) in DNA. Although the knowledge of CREs and CRMs in a genome is crucial to elucidate gene regulatory networks and understand many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes due to the difficulty to characterize them by either computational or traditional experimental methods. However, the exponentially increasing number of TF binding location data produced by the recent wide adaptation of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) technologies has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, how to effectively mine these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution is a highly challenging task.
Biswapriya Biswavas Misra's insight:
Abstract (provisional)Background

In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in the so-called cis-regulatory modules (CRMs) in DNA. Although the knowledge of CREs and CRMs in a genome is crucial to elucidate gene regulatory networks and understand many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes due to the difficulty to characterize them by either computational or traditional experimental methods. However, the exponentially increasing number of TF binding location data produced by the recent wide adaptation of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) technologies has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, how to effectively mine these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution is a highly challenging task.

Results

We have developed a novel graph-theoretic based algorithm DePCRM for genome-wide de novo predictions of CREs and CRMs using a large number of ChIP datasets. DePCRM predicts CREs and CRMs by identifying overrepresented combinatorial CRE motif patterns in multiple ChIP datasets in an effective way. When applied to 168 ChIP datasets of 56 TFs from D. melanogaster, DePCRM identified 184 and 746 overrepresented CRE motifs and their combinatorial patterns, respectively, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs as well as CREs as a whole in a CRM are more conserved than randomly selected sequences.

Conclusion

Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional. Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using larger number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/.

 
more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

Medicago truncatula Genome Database | Drupal

Medicago truncatula Genome Database | Drupal | Databases & Softwares | Scoop.it
Biswapriya Biswavas Misra's insight:

Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. JCVI (formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The web site (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant “mines” like ThaleMine and PhytoMine, and other Model Organism Databases (MODs). In addition to these new features, we continue to provide keyword and locus identifier based searches served via a Chado-backed Tripal Instance, a BLAST search interface, and bulk downloads of datasets from the iPlant Data Store (iDS). Finally, we maintain an email helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific datasets from the community.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

GBshape: a genome browser database for DNA shape annotations

GBshape: a genome browser database for DNA shape annotations | Databases & Softwares | Scoop.it
Biswapriya Biswavas Misra's insight:

Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew

Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew | Databases & Softwares | Scoop.it
The tree shrew (Tupaia belangeri) is a small mammal with a close relationship to primates and it has been proposed as an alternative experimental animal to primates in biomedical research. The recent release of a high-quality Chinese tree shrew genome enables more researchers to use this species as the model animal in their studies. With the aim to making the access to an extensively annotated genome database straightforward and easy, we have created the Tree shrew Database (TreeshrewDB). This is a web-based platform that integrates the currently available data from the tree shrew genome, including an updated gene set, with a systematic functional annotation and a mRNA expression pattern. In addition, to assist with automatic gene sequence analysis, we have integrated the common programs Blast, Muscle, GBrowse, GeneWise and codeml, into TreeshrewDB. We have also developed a pipeline for the analysis of positive selection. The user-friendly interface of TreeshrewDB, which is available at http://www.treeshrewdb.org, will undoubtedly help in many areas of biological research into the tree shrew.
Biswapriya Biswavas Misra's insight:

The tree shrew (Tupaia belangeri) is a small mammal with a close relationship to primates and it has been proposed as an alternative experimental animal to primates in biomedical research. The recent release of a high-quality Chinese tree shrew genome enables more researchers to use this species as the model animal in their studies. With the aim to making the access to an extensively annotated genome database straightforward and easy, we have created the Tree shrew Database (TreeshrewDB). This is a web-based platform that integrates the currently available data from the tree shrew genome, including an updated gene set, with a systematic functional annotation and a mRNA expression pattern. In addition, to assist with automatic gene sequence analysis, we have integrated the common programs Blast, Muscle, GBrowse, GeneWise and codeml, into TreeshrewDB. We have also developed a pipeline for the analysis of positive selection. The user-friendly interface of TreeshrewDB, which is available at http://www.treeshrewdb.org, will undoubtedly help in many areas of biological research into the tree shrew.

more...
No comment yet.
Rescooped by Biswapriya Biswavas Misra from Virology and Bioinformatics from Virology.ca
Scoop.it!

AliView: a fast and lightweight alignment viewer and editor for large data sets

AliView: a fast and lightweight alignment viewer and editor for large data sets | Databases & Softwares | Scoop.it

AliView is an alignment viewer and editor designed to meet the requirements of next-generation sequencing era phylogenetic datasets. AliView handles alignments of unlimited size in the formats most commonly used, i.e. FASTA, Phylip, Nexus, Clustal and MSF. The intuitive graphical interface makes it easy to inspect, sort, delete, merge and realign sequences as part of the manual filtering process of large datasets. AliView also works as an easy-to-use alignment editor for small as well as large datasets. Availability and implementation: AliView is released as open-source software under the GNU General Public License, version 3.0 (GPLv3), and is available at GitHub (www.github.com/AliView). The program is cross-platform and extensively tested on Linux, Mac OS X and Windows systems. Downloads and help are available at http://ormbunkar.se/aliview CONTACT: anders.larsson@ebc.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.


Via Chris Upton + helpers
more...
Scooped by Biswapriya Biswavas Misra
Scoop.it!

The big data challenges of connectomics

The structure of the nervous system is extraordinarily complicated because individual neurons are interconnected to hundreds or even thousands of other cells in networks that can extend over large volumes. Mapping such networks at the level of synaptic connections, a field called connectomics, began in the 1970s with a the study of the small nervous system of a worm and has recently garnered general interest thanks to technical and computational advances that automate the collection of electron-microscopy data and offer the possibility of mapping even large mammalian brains. However, modern connectomics produces 'big data', unprecedented quantities of digital information at unprecedented rates, and will require, as with genomics at the time, breakthrough algorithmic and computational solutions. Here we describe some of the key difficulties that may arise and provide suggestions for managing them.
Biswapriya Biswavas Misra's insight:

The structure of the nervous system is extraordinarily complicated because individual neurons are interconnected to hundreds or even thousands of other cells in networks that can extend over large volumes. Mapping such networks at the level of synaptic connections, a field called connectomics, began in the 1970s with a the study of the small nervous system of a worm and has recently garnered general interest thanks to technical and computational advances that automate the collection of electron-microscopy data and offer the possibility of mapping even large mammalian brains. However, modern connectomics produces 'big data', unprecedented quantities of digital information at unprecedented rates, and will require, as with genomics at the time, breakthrough algorithmic and computational solutions. Here we describe some of the key difficulties that may arise and provide suggestions for managing them.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

The i5k Workspace@NAL—enabling genomic data access, visualization and curation of arthropod genomes

The i5k Workspace@NAL—enabling genomic data access, visualization and curation of arthropod genomes | Databases & Softwares | Scoop.it
Biswapriya Biswavas Misra's insight:

The 5000 arthropod genomes initiative (i5k) has tasked itself with coordinating the sequencing of 5000 insect or related arthropod genomes. The resulting influx of data, mostly from small research groups or communities with little bioinformatics experience, will require visualization, dissemination and curation, preferably from a centralized platform. The National Agricultural Library (NAL) has implemented the i5k Workspace@NAL (http://i5k.nal.usda.gov/) to help meet the i5k initiative's genome hosting needs. Any i5k member is encouraged to contact the i5k Workspace with their genome project details. Once submitted, new content will be accessible via organism pages, genome browsers and BLAST search engines, which are implemented via the open-source Tripal framework, a web interface for the underlying Chado database schema. We also implement the Web Apollo software for groups that choose to curate gene models. New content will add to the existing body of 35 arthropod species, which include species relevant for many aspects of arthropod genomic research, including agriculture, invasion biology, systematics, ecology and evolution, and developmental research.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

CDvist: a webserver for identification and visualization of conserved domains in protein sequences

CDvist: a webserver for identification and visualization of conserved domains in protein sequences | Databases & Softwares | Scoop.it
Biswapriya Biswavas Misra's insight:

Summary: Identification of domains in protein sequences allows their assigning to biological functions. Several webservers exist for identification of protein domains using similarity searches against various databases of protein domain models. However, none of them provides comprehensive domain coverage while allowing bulk querying and their visualization schemes can be improved. To address these issues we developed CDvist (a comprehensive domain visualization tool), which combines the best available search algorithms and databases into a user-friendly framework. First, a given protein sequence is matched to domain models using high-specificity tools and only then unmatched segments are subjected to more sensitive algorithms resulting in a best possible comprehensive coverage. Bulk querying and rich visualization and download options provide improved functionality to domain architecture analysis.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

DGIdb 1.66 – Rails Frontend to The Genome Institute’s Drug Gene Interaction Database

DGIdb 1.66 – Rails Frontend to The Genome Institute’s Drug Gene Interaction Database | Databases & Softwares | Scoop.it
DGIdb 1.66
:: DESCRIPTION

The DGIdb (Drug-Gene Interaction database) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development.
:
Biswapriya Biswavas Misra's insight:

The DGIdb (Drug-Gene Interaction database) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data

The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data | Databases & Softwares | Scoop.it
Biswapriya Biswavas Misra's insight:

Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA but, no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MD-MB231 breast cancer cell-line delineated variant heterogeneity among the single-cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

MOROKOSHI: Transcriptome Database in Sorghum bicolor

MOROKOSHI: Transcriptome Database in Sorghum bicolor | Databases & Softwares | Scoop.it
In transcriptome analysis, accurate annotation of each transcriptional unit and its expression profile is essential. A full-length cDNA (FL-cDNA) collection facilitates the refinement of transcriptional annotation and accurate transcription start sites help to unravel transcriptional regulation. We constructed a normalized FL-cDNA library from eight growth stages of aerial tissues in Sorghum bicolor and isolated 37,607 clones. These clones were Sanger sequenced from both the 5' and 3' ends and in total 38,981 high-quality expressed sequence tags (ESTs) were obtained. About one-third of the transcripts of known genes were captured as FL-cDNA clone resources. In addition to these we also annotated 272 novel genes, 323 antisense transcripts and 1672 candidate isoforms. These clones are available from the RIKEN Bioresource Center.

After obtaining accurate annotation of transcriptional units, we performed expression profile analysis. We carried out spikelet-, seed- and stem-specific RNA-Seq analysis and confirmed the expression of 70.6% of the newly identified genes. We also downloaded 23 sorghum RNA-Seq samples that are publicly available and these are shown on a genome browser together with our original FL-cDNA and RNA-Seq data. Using our original and publicly available data, we made an expression profile of each gene and identified the top 20 genes with the most similar expression. In addition, we visualized their relationships in gene co-expression networks.

Users can access and compare various transcriptome data from Sorghum bicolor at http://sorghum.riken.jp.
Biswapriya Biswavas Misra's insight:

In transcriptome analysis, accurate annotation of each transcriptional unit and its expression profile is essential. A full-length cDNA (FL-cDNA) collection facilitates the refinement of transcriptional annotation and accurate transcription start sites help to unravel transcriptional regulation. We constructed a normalized FL-cDNA library from eight growth stages of aerial tissues in Sorghum bicolor and isolated 37,607 clones. These clones were Sanger sequenced from both the 5' and 3' ends and in total 38,981 high-quality expressed sequence tags (ESTs) were obtained. About one-third of the transcripts of known genes were captured as FL-cDNA clone resources. In addition to these we also annotated 272 novel genes, 323 antisense transcripts and 1672 candidate isoforms. These clones are available from the RIKEN Bioresource Center.

After obtaining accurate annotation of transcriptional units, we performed expression profile analysis. We carried out spikelet-, seed- and stem-specific RNA-Seq analysis and confirmed the expression of 70.6% of the newly identified genes. We also downloaded 23 sorghum RNA-Seq samples that are publicly available and these are shown on a genome browser together with our original FL-cDNA and RNA-Seq data. Using our original and publicly available data, we made an expression profile of each gene and identified the top 20 genes with the most similar expression. In addition, we visualized their relationships in gene co-expression networks.

Users can access and compare various transcriptome data from Sorghum bicolor at http://sorghum.riken.jp.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

Developmental Self-Construction and -Configuration of Functional Neocortical Neuronal Networks

Developmental Self-Construction and -Configuration of Functional Neocortical Neuronal Networks | Databases & Softwares | Scoop.it
PLOS Computational Biology is an open-access
Biswapriya Biswavas Misra's insight:

The prenatal development of neural circuits must provide sufficient configuration to support at least a set of core postnatal behaviors. Although knowledge of various genetic and cellular aspects of development is accumulating rapidly, there is less systematic understanding of how these various processes play together in order to construct such functional networks. Here we make some steps toward such understanding by demonstrating through detailed simulations how a competitive co-operative (‘winner-take-all’, WTA) network architecture can arise by development from a single precursor cell. This precursor is granted a simplified gene regulatory network that directs cell mitosis, differentiation, migration, neurite outgrowth and synaptogenesis. Once initial axonal connection patterns are established, their synaptic weights undergo homeostatic unsupervised learning that is shaped by wave-like input patterns. We demonstrate how this autonomous genetically directed developmental sequence can give rise to self-calibrated WTA networks, and compare our simulation results with biological data.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

ECOD: An Evolutionary Classification of Protein Domains

ECOD: An Evolutionary Classification of Protein Domains | Databases & Softwares | Scoop.it
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or “fold”). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.
Biswapriya Biswavas Misra's insight:

Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or “fold”). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

HotSpotter: efficient visualization of driver mutations

Driver mutations are positively selected during the evolution of cancers. The relative frequency of a particular mutation within a gene is typically used as a criterion for identifying a driver mutation. However, driver mutations may occur with relative infrequency at a particular site, but cluster within a region of the gene. When analyzing across different cancers, particular mutation sites or mutations within a particular region of the gene may be of relatively low frequency in some cancers, but still provide selective growth advantage.
Biswapriya Biswavas Misra's insight:
Abstract (provisional)Background

Driver mutations are positively selected during the evolution of cancers. The relative frequency of a particular mutation within a gene is typically used as a criterion for identifying a driver mutation. However, driver mutations may occur with relative infrequency at a particular site, but cluster within a region of the gene. When analyzing across different cancers, particular mutation sites or mutations within a particular region of the gene may be of relatively low frequency in some cancers, but still provide selective growth advantage.

Results

This paper presents a method that allows rapid and easy visualization of mutation data sets and identification of potential gene mutation hotspot sites and/or regions. As an example, we identified hotspot regions in the NFE2L2 gene that are potentially functionally relevant in endometrial cancer, but would be missed using other analyses.

Conclusions

HotSpotter is a quick, easy-to-use visualization tool that delivers gene identities with associated mutation locations and frequencies overlaid upon a large cancer mutation reference set. This allows the user to identify potential driver mutations that are less frequent in a cancer or are localized in a hotspot region of relatively infrequent mutations.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

BiomeNet: A Bayesian Model for Inference of Metabolic Divergence among Microbial Communities

BiomeNet: A Bayesian Model for Inference of Metabolic Divergence among Microbial Communities | Databases & Softwares | Scoop.it
PLOS Computational Biology is an open-access
Biswapriya Biswavas Misra's insight:

Metagenomics yields enormous numbers of microbial sequences that can be assigned a metabolic function. Using such data to infer community-level metabolic divergence is hindered by the lack of a suitable statistical framework. Here, we describe a novel hierarchical Bayesian model, called BiomeNet (Bayesian inference of metabolic networks), for inferring differential prevalence of metabolic subnetworks among microbial communities. To infer the structure of community-level metabolic interactions, BiomeNet applies a mixed-membership modelling framework to enzyme abundance information. The basic idea is that the mixture components of the model (metabolic reactions, subnetworks, and networks) are shared across all groups (microbiome samples), but the mixture proportions vary from group to group. Through this framework, the model can capture nested structures within the data. BiomeNet is unique in modeling each metagenome sample as a mixture of complex metabolic systems (metabosystems). The metabosystems are composed of mixtures of tightly connected metabolic subnetworks. BiomeNet differs from other unsupervised methods by allowing researchers to discriminate groups of samples through the metabolic patterns it discovers in the data, and by providing a framework for interpreting them. We describe a collapsed Gibbs sampler for inference of the mixture weights under BiomeNet, and we use simulation to validate the inference algorithm. Application of BiomeNet to human gut metagenomes revealed a metabosystem with greater prevalence among inflammatory bowel disease (IBD) patients. Based on the discriminatory subnetworks for this metabosystem, we inferred that the community is likely to be closely associated with the human gut epithelium, resistant to dietary interventions, and interfere with human uptake of an antioxidant connected to IBD. Because this metabosystem has a greater capacity to exploit host-associated glycans, we speculate that IBD-associated communities might arise from opportunist growth of bacteria that can circumvent the host's nutrient-based mechanism for bacterial partner selection.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

RaftProt: mammalian lipid raft proteome database

RaftProt: mammalian lipid raft proteome database | Databases & Softwares | Scoop.it
Biswapriya Biswavas Misra's insight:

RaftProt (http://lipid-raft-database.di.uq.edu.au/) is a database of mammalian lipid raft-associated proteins as reported in high-throughput mass spectrometry studies. Lipid rafts are specialized membrane microdomains enriched in cholesterol and sphingolipids thought to act as dynamic signalling and sorting platforms. Given their fundamental roles in cellular regulation, there is a plethora of information on the size, composition and regulation of these membrane microdomains, including a large number of proteomics studies. To facilitate the mining and analysis of published lipid raft proteomics studies, we have developed a searchable database RaftProt. In addition to browsing the studies, performing basic queries by protein and gene names, searching experiments by cell, tissue and organisms; we have implemented several advanced features to facilitate data mining. To address the issue of potential bias due to biochemical preparation procedures used, we have captured the lipid raft preparation methods and implemented advanced search option for methodology and sample treatment conditions, such as cholesterol depletion. Furthermore, we have identified a list of high confidence proteins, and enabled searching only from this list of likely bona fide lipid raft proteins. Given the apparent biological importance of lipid raft and their associated proteins, this database would constitute a key resource for the scientific community.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

SeqFindr 0.32.2 : Python Package Index

SeqFindr 0.32.2 : Python Package Index | Databases & Softwares | Scoop.it
SeqFindr - easily create informative genomic feature plots. It's a bioinfomagicians tool to detect the presence or absence of genomic features given a database describing these features & a set of draft and/or complete genomes. We work with bacterial genomes & as such SeqFindr has only been tested with bacterial genomes.
Biswapriya Biswavas Misra's insight:

SeqFindr - easily create informative genomic feature plots. It's a bioinfomagicians tool to detect the presence or absence of genomic features given a database describing these features & a set of draft and/or complete genomes. We work with bacterial genomes & as such SeqFindr has only been tested with bacterial genomes.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

Bayesian transcriptome assembly

RNA-seq allows for simultaneous transcript discovery and quantification, but reconstructing complete transcripts from such data remains difficult. Here, we introduce the Bayesembler, a novel probabilistic method for transcriptome assembly built on a Bayesian model of the RNA sequencing process. Under this model, samples from the posterior distribution over transcripts and their abundance values are obtained using Gibbs sampling. By using the frequency at which transcripts are observed during sampling to select the final assembly, we demonstrate marked improvements in sensitivity and precision over state-of-the-art assemblers on both simulated and real data. The Bayesembler is available at https://github.com/bioinformatics-centre/bayesembler.
Biswapriya Biswavas Misra's insight:

RNA-seq allows for simultaneous transcript discovery and quantification, but reconstructing complete transcripts from such data remains difficult. Here, we introduce the Bayesembler, a novel probabilistic method for transcriptome assembly built on a Bayesian model of the RNA sequencing process. Under this model, samples from the posterior distribution over transcripts and their abundance values are obtained using Gibbs sampling. By using the frequency at which transcripts are observed during sampling to select the final assembly, we demonstrate marked improvements in sensitivity and precision over state-of-the-art assemblers on both simulated and real data. The Bayesembler is available at https://github.com/bioinformatics-centre/bayesembler.

more...
No comment yet.
Scooped by Biswapriya Biswavas Misra
Scoop.it!

The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community.
Biswapriya Biswavas Misra's insight:

The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community.

more...
No comment yet.