Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares

FOG 1.0 / webFOG - A tool to Map Genomic Features on to Genes

:: DESCRIPTION

FOG helps map important genomic features, such as miRNAs, microarray primers or probes, ChIP-on-chip data, CpG islands and SNPs, to name a few, onto the latest version of the human genome, and can also annotate new features.


Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases

In-source fragmentation and correlation analysis as tools for metabolite identification exemplified with CE-TOF untargeted metabolomics

The role of non-targeted metabolomics with its discovery power is constantly growing in many different fields of science. However, its biggest advantage of uncovering the unexpected is turning into one of its biggest bottlenecks, particularly in metabolite identification. Among different methods for metabolite identification or ID confirmation, tandem MS analysis plays a very important role. However, this method is limited to certain types of MS analysers, making, for example, TOF-MS inaccessible for this type of metabolite identification. To overcome this, in-source fragmentation has been used to fragment molecules and obtain product ions. Since the molecule of interest is not isolated prior to its fragmentation, the acquired spectrum contains many different signals arising from the fragmentation of all compounds present in the sample. Therefore, to assign product ions to their precursors, a novel use of correlation analysis was tested, with r ≥ 0.9 taken as the criterion for assigning a product ion to its precursor. The method and the chosen cut-off were tested at three levels of sample complexity: a single standard, a mix of co-eluting standards and a plasma sample. The results obtained clearly demonstrated the effectiveness of the proposed methodology for metabolite ID confirmation. Moreover, the proposed strategy can be successfully applied to the semi-quantification of co-eluting molecules that have the same monoisotopic mass but differ in fragmentation pattern. The proposed methodology can greatly improve the robustness and throughput of identification in metabolomics studies using TOF-MS, which is crucial for obtaining meaningful and trustworthy results.
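The correlation criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names `pearson_r` and `assign_product_ions` and all intensity values are hypothetical, and it assumes each ion's intensity has been recorded across the same series of scans. Only the r ≥ 0.9 cut-off comes from the abstract.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sx * sy)

def assign_product_ions(precursor, candidates, cutoff=0.9):
    """Return names of candidate ions whose profile correlates with the precursor at r >= cutoff."""
    return [name for name, profile in candidates.items()
            if pearson_r(precursor, profile) >= cutoff]

# Hypothetical intensity profiles across five scans (made-up numbers).
precursor = [10.0, 40.0, 90.0, 45.0, 12.0]
candidates = {
    "fragment_A": [5.0, 21.0, 44.0, 23.0, 6.0],    # rises and falls with the precursor
    "fragment_B": [50.0, 30.0, 10.0, 35.0, 55.0],  # moves in the opposite direction
}
assigned = assign_product_ions(precursor, candidates)
```

Here only `fragment_A` would be attributed to the precursor, because its profile follows the precursor's elution profile while `fragment_B` does not.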

RNASeqBrowser: A genome browser for simultaneous visualization of raw strand specific RNAseq reads and UCSC genome browser custom tracks

Abstract
Background
Strand-specific RNAseq data are now more common in RNAseq projects, and visualizing RNAseq data has become an important part of sequencing data analysis. The most widely used visualization tool is the UCSC genome browser, which introduced the custom track concept, enabling researchers to simultaneously visualize gene expression at a particular locus from multiple experiments. The objective of our software tool is to provide a friendly interface for the visualization of RNAseq datasets.

Results
This paper introduces a visualization tool (RNASeqBrowser) that incorporates and extends the functionality of the UCSC genome browser. For example, RNASeqBrowser simultaneously displays read coverage, SNPs, InDels and raw read tracks with other BED and wiggle tracks -- all being dynamically built from the BAM file. Paired reads are also connected in the browser to enable easier identification of novel exon/intron borders and chimaeric transcripts. Strand specific RNAseq data is also supported by RNASeqBrowser that displays reads above (positive strand transcript) or below (negative strand transcripts) a central line. Finally, RNASeqBrowser was designed for ease of use for users with few bioinformatic skills, and incorporates the features of many genome browsers into one platform.

Conclusions
The features of RNASeqBrowser:
(1) RNASeqBrowser integrates the UCSC genome browser and NGS visualization tools such as IGV. It extends the functionality of the UCSC genome browser by adding several new types of tracks to show NGS data such as individual raw reads, SNPs and InDels.
(2) RNASeqBrowser can dynamically generate RNA secondary structure, which is useful for identifying non-coding RNA such as miRNA.
(3) Overlaying NGS wiggle data is helpful in displaying differential expression and is simple to implement in RNASeqBrowser.
(4) NGS data accumulate a large number of raw reads, so RNASeqBrowser collapses exact duplicate reads to reduce visualization space. Standard PCs can show many windows of individual raw NGS reads without much delay.
(5) Multiple popup windows of individual raw reads provide users with more viewing space. This avoids the approach of existing tools (such as IGV), which squeeze all raw reads into one window, and is helpful for visualizing multiple datasets simultaneously.
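The duplicate-collapsing idea in point (4) can be illustrated with a minimal Python sketch. The abstract does not define "exact duplicate", so this sketch assumes it means reads with the same chromosome, start position, strand and sequence; the function name `collapse_duplicates` and the read tuples are hypothetical and are not taken from RNASeqBrowser.

```python
from collections import Counter

def collapse_duplicates(reads):
    """Collapse reads sharing chromosome, start, strand and sequence into one entry with a count."""
    counts = Counter(reads)
    # Counter keeps first-seen order (Python 3.7+), so the output order is stable.
    return [(read, n) for read, n in counts.items()]

# Hypothetical reads as (chromosome, start, strand, sequence) tuples.
reads = [
    ("chr1", 100, "+", "ACGTAC"),
    ("chr1", 100, "+", "ACGTAC"),
    ("chr1", 100, "-", "ACGTAC"),  # same position, opposite strand: kept separate
    ("chr1", 250, "+", "TTGACA"),
    ("chr1", 100, "+", "ACGTAC"),
]
collapsed = collapse_duplicates(reads)
```

Five reads become three drawn rows, and the count can be shown next to each row so that no depth information is lost.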

RNASeqBrowser and its manual are freely available at http://www.australianprostatecentre.org/research/software/rnaseqbrowser or http://sourceforge.net/projects/rnaseqbrowser/

Metabolic Pathway Predictions for Metabolomics: A Molecular Structure Matching Approach

Metabolic pathways are composed of a series of chemical reactions occurring within a cell. In each pathway, enzymes catalyze the conversion of substrates into structurally similar products. Thus, structural similarity provides a potential means for mapping newly identified biochemical compounds to known metabolic pathways. In this paper, we present TrackSM, a cheminformatics tool designed to associate a chemical compound to a known metabolic pathway based on molecular structure matching techniques. Validation experiments show that TrackSM is capable of associating 93% of tested structures to their correct KEGG pathway class and 88% to their correct individual KEGG pathway. This suggests that TrackSM may be a valuable tool to aid in associating previously unknown small molecules to known biochemical pathways and improve our ability to link metabolomics, proteomic, and genomic data sets. TrackSM is freely available at http://metabolomics.pharm.uconn.edu/?q=Software.html.
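The idea of mapping a compound to a pathway by structural similarity can be sketched as follows. The abstract does not describe TrackSM's matching algorithm, so this sketch substitutes Tanimoto similarity over sets of structural features, a common cheminformatics measure; the feature labels, compound names, pathway labels and the function `best_pathway` are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def best_pathway(query, reference):
    """Return (pathway, score) for the reference compound most similar to the query."""
    best = max(reference, key=lambda name: tanimoto(query, reference[name][1]))
    pathway, features = reference[best]
    return pathway, tanimoto(query, features)

# Hypothetical reference compounds: name -> (pathway label, feature set).
reference = {
    "compound_1": ("pathway_X", {"ring6", "OH", "COOH", "C=C"}),
    "compound_2": ("pathway_Y", {"NH2", "COOH", "CH3"}),
}
query = {"ring6", "OH", "COOH"}
pathway, score = best_pathway(query, reference)
```

The query shares three of four features with `compound_1`, so it is assigned to that compound's pathway; a real tool would work on molecular graphs or fingerprints rather than hand-written feature sets.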

DoCM - Database of Curated Mutations

DoCM, the Database of Curated Mutations, is a highly curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification.

MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments

Abstract

Summary: MSstats is an R package for statistical relative quantification of proteins and peptides in mass spectrometry-based proteomics. Version 2.0 of MSstats supports label-free and label-based experimental workflows, and data dependent, targeted and data independent spectral acquisition. It takes as input identified and quantified spectral peaks, and outputs a list of differentially abundant peptides or proteins, or summaries of peptide or protein relative abundance. MSstats relies on a flexible family of linear mixed models.

Availability: The code, the documentation, and example datasets are available open-source at www.msstats.org under the Artistic-2.0 license. The package can be downloaded from www.msstats.org or from Bioconductor (www.bioconductor.org), and used in an R command line workflow. The package can also be accessed as an external tool in Skyline (Broudy et al., 2013) and used via a graphical user interface.

Contact: ovitek@purdue.edu

dbSUPER: an integrated database of super-enhancers in mouse and human genome

database
Biswapriya Biswavas Misra's insight:

Super-enhancer is a newly proposed concept, which refers to clusters of enhancers that can drive cell-type-specific gene expression and are crucial in cell identity. Many disease-associated sequence variations are enriched in the super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers for more than 100 cell types in human and mouse. However, no centralized resource to integrate all these findings is available yet. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource for further study of transcriptional control of cell identity and disease by archiving computationally produced data. These data can easily be sent to the Galaxy, GREAT and Cistrome web servers for further downstream analysis. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive searching and browsing. dbSUPER provides downloadable and exportable features in a variety of data formats, and the data can be visualized in the UCSC genome browser, with custom tracks added automatically. Further, dbSUPER lists genes associated with super-enhancers and links to various databases, including GeneCards, UniProt and Entrez. Our database also provides an overlap analysis tool to check the overlap of user-defined regions with the current database. We believe dbSUPER is a valuable resource for the bioinformatics and genetics research community.
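An overlap check between user-defined regions and stored regions, as mentioned above, might look like the following Python sketch. It is not dbSUPER's implementation: the function names, region names and coordinates are hypothetical, and it assumes BED-style half-open intervals in which the end coordinate is excluded.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def find_overlaps(user_regions, db_regions):
    """Pair each user region with the database regions it overlaps on the same chromosome."""
    hits = []
    for chrom, start, end in user_regions:
        for db_chrom, db_start, db_end, name in db_regions:
            if chrom == db_chrom and overlaps(start, end, db_start, db_end):
                hits.append(((chrom, start, end), name))
    return hits

# Hypothetical database entries: (chromosome, start, end, name).
db_regions = [("chr1", 1000, 5000, "SE_1"), ("chr2", 200, 900, "SE_2")]
user_regions = [("chr1", 4500, 6000), ("chr1", 5000, 5500), ("chr2", 100, 250)]
hits = find_overlaps(user_regions, db_regions)
```

The second user region starts exactly where `SE_1` ends, so under the half-open convention it does not count as an overlap. The nested loop is fine for small inputs; a sorted sweep or interval tree would be the usual choice for genome-scale region sets.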

 

DISSECT

DISSECT was designed to perform common genomic analyses on large supercomputers, allowing very large datasets to be analyzed. Its capabilities include analysis using mixed linear models, principal components analysis and genome-wide association analysis (testing markers individually or together in large groups), among others. It is designed to be as easy to use as other common software tools such as PLINK or REACTA/GCTA. In addition, although it can run on supercomputers, it can also be used on a single computer without problems.

DoGSD: the dog and wolf genome SNP database

The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data easier to use and more widely available, we propose DoGSD (http://dogsd.big.ac.cn), the first Canidae-specific database, which focuses on whole-genome SNP data from domesticated dogs and grey wolves. DoGSD is a web-based, open-access resource comprising ∼19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and SNPs collected from the three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistic Fst. In addition, DoGSD integrates closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog- and wolf-related studies.
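For readers unfamiliar with Fst, the statistic can be illustrated with a minimal Python sketch for a single biallelic SNP. The abstract does not say which estimator DoGSD uses, so this sketch assumes Wright's basic definition, Fst = (H_T - H_S) / H_T, with two equally weighted populations; the function name `fst` and the allele frequencies are made up.

```python
def fst(p1, p2):
    """Wright's F_ST for one biallelic SNP from allele frequencies in two populations.

    Uses F_ST = (H_T - H_S) / H_T with equal population weights.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    if h_t == 0:
        return 0.0  # SNP is monomorphic in the pooled sample
    return (h_t - h_s) / h_t

fixed_difference = fst(1.0, 0.0)  # allele fixed in one population, absent in the other
no_difference = fst(0.3, 0.3)     # identical frequencies in both populations
moderate = fst(0.8, 0.2)
```

Values near 1 indicate strong differentiation between the two populations at that SNP, and values near 0 indicate little or none. Sample-size-corrected estimators are normally preferred for real data.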

South Green


SeqFindr 0.34.0 : Python Package Index

SeqFindr - easily create informative genomic feature plots. It’s a bioinfomagician’s tool to detect the presence or absence of genomic features given a database describing these features & a set of draft and/or complete genomes. We work with bacterial genomes & as such SeqFindr has only been tested with bacterial genomes.
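The presence/absence idea can be shown with a small Python sketch. This is not SeqFindr's code or command-line interface: the function `presence_matrix`, the gene names and the genome names are hypothetical, and the sketch assumes the set of features detected in each genome has already been produced by a sequence similarity search.

```python
def presence_matrix(features, genome_hits):
    """Build a genome -> [0/1 per feature] table from the sets of features found in each genome."""
    return {genome: [1 if f in found else 0 for f in features]
            for genome, found in genome_hits.items()}

# Hypothetical feature database and per-genome search results.
features = ["geneA", "geneB", "geneC"]
genome_hits = {
    "genome_1": {"geneA", "geneC"},
    "genome_2": {"geneB"},
}
matrix = presence_matrix(features, genome_hits)
```

Each row of the table corresponds to one genome and each column to one feature, which is the shape a presence/absence plot would be drawn from.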
Rescooped by Biswapriya Biswavas Misra from Plant-Microbe Symbioses

MTGD: The Medicago truncatula Genome Database

Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The website (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant ‘mines’ such as ThaleMine and PhytoMine, and other model organism databases (MODs). In addition to these new features, we continue to provide keyword- and locus identifier-based searches served via a Chado-backed Tripal Instance, a BLAST search interface and bulk downloads of data sets from the iPlant Data Store (iDS). Finally, we maintain an E-mail helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific data sets from the community.

Via Jean-Michel Ané

Legacies of inspiring leadership: Prof. Maqs

Biswapriya Biswavas Misra's insight:

Professor Maqsudul Alam was an internationally renowned microbiologist trained at the famous Max Planck Institute, Germany. He specialised in genomic research, more commonly called “synthetic biology”. After much discussion at various levels, including with the then Malaysian Prime Minister Tun Abdullah Ahmad Badawi, Maqs (as he was fondly called) agreed to set up the country’s first Centre for Chemical Biology at USM (CCB@USM) dedicated to natural rubber genomic research, especially Hevea brasiliensis.

Heading one out of seven of the world’s first APEX projects to ensure that USM continued to escalate intellectually, he led the charge to decode the genomic constituent of natural rubber within a period of no more than three years, working with a specially assembled talented team of graduate and post-doctoral students, as well as staff.

The mission saw a race between at least five countries worldwide, some major producers of natural rubber, others major producers of natural rubber products globally.

A new entrant, Malaysia faced a particularly challenging task given a late start and from scratch. Notwithstanding that, under Maqs’ no-nonsense stewardship and mentoring, not only were world-class facilities completed in record time but the rubber genomic sequences of some two billion bases were also decoded by CCB@USM. All these happened in less than 20 months, putting Malaysia and USM on the world map of pioneering genomic research in natural rubber internationally.

It was a highly instructive time for those who believed that Malaysia can race from behind to be a world leader as new knowledge creators as called for in Challenge number six of Wawasan 2020, provided there is courage “to challenge the status quo”, the then Chief Secretary Tan Sri Mohd Sidek Hassan’s mantra that has rubbed off onto the USM culture of transforming the university, beginning with several of the APEX world’s first initiatives.

By October 2009, Malaysia had gained recognition as the first country to decode the genome of natural rubber. As a result, CCB@USM received invitations to assist a South-South collaborative effort in agrogenomic work in jute, fungus, dates and more from the different parts of the developing world. The Bangladeshi government commissioned a national project, under the auspices of Prime Minister Sheikh Hasina, to enhance the use of jute through genomic research under Maqs’ leadership. The genome sequence of the Tosha jute plant was uncovered in less than a year later in June 2010.

Unfortunately, despite Maqs’ research contributions being a boon to Malaysia’s socio-economic well-being, especially the trillion ringgit natural rubber industry, including that of the smallholders (who are facing an uncertain future), the country is unable to further harness his unique talent and rare expertise.

Maqs passed away on Dec 20 at Queen’s Medical Center, Honolulu, Hawaii within a day of that of Ani Arope. Both luminaries will be dearly missed and fondly remembered for their courageous leadership in challenging the status quo in the pursuit of truth and knowledge. May they rest in peace. Al-Fatihah.


CDvist: a webserver for identification and visualization of conserved domains in protein sequences

Biswapriya Biswavas Misra's insight:

Summary: Identification of domains in protein sequences allows them to be assigned biological functions. Several webservers exist for identification of protein domains using similarity searches against various databases of protein domain models. However, none of them provides comprehensive domain coverage while allowing bulk querying, and their visualization schemes can be improved. To address these issues, we developed CDvist (a comprehensive domain visualization tool), which combines the best available search algorithms and databases into a user-friendly framework. First, a given protein sequence is matched to domain models using high-specificity tools; only then are unmatched segments subjected to more sensitive algorithms, resulting in the best possible comprehensive coverage. Bulk querying and rich visualization and download options provide improved functionality for domain architecture analysis.
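The two-stage strategy described above depends on knowing which stretches of the sequence the first pass left uncovered. A minimal Python sketch of that step follows; it is not CDvist's code, and the function name `unmatched_segments`, the `min_len` threshold and the example coordinates are assumptions made for illustration.

```python
def unmatched_segments(length, matched, min_len=1):
    """Return 1-based inclusive (start, end) gaps not covered by matched domain intervals."""
    gaps, pos = [], 1
    for start, end in sorted(matched):
        if start - pos >= min_len:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)  # skip past this hit, handling overlapping hits
    if length - pos + 1 >= min_len:
        gaps.append((pos, length))
    return gaps

# Hypothetical first-pass domain hits on a 300-residue protein.
first_pass = [(20, 110), (100, 150), (220, 280)]
gaps = unmatched_segments(300, first_pass, min_len=15)
```

Each returned gap would then be passed to the slower, more sensitive search, so that the expensive step runs only on regions the high-specificity pass could not explain.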


Metabolome searcher: a high throughput tool for metabolite identification and metabolic pathway mapping directly from mass spectrometry and using genome restriction

Abstract

Background

Mass spectrometric analysis of microbial metabolism provides a long list of possible compounds. Restricting the identification of the possible compounds to those produced by the specific organism would benefit the identification process. Currently, identification of mass spectrometry (MS) data is commonly done using empirically derived compound databases. Unfortunately, most databases contain relatively few compounds, leaving long lists of unidentified molecules. Incorporating genome-encoded metabolism enables MS output identification that may not be included in databases. Using an organism’s genome as a database restricts metabolite identification to only those compounds that the organism can produce.

Results

To address the challenge of metabolomic analysis from MS data, a web-based application to directly search genome-constructed metabolic databases was developed. The user query returns a genome-restricted list of possible compound identifications along with the putative metabolic pathways based on the name, formula, SMILES structure, and the compound mass as defined by the user. Multiple queries can be done simultaneously by submitting a text file created by the user or obtained from the MS analysis software. The user can also provide parameters specific to the experiment’s MS analysis conditions, such as mass deviation, adducts, and detection mode during the query so as to provide additional levels of evidence to produce the tentative identification. The query results are provided as an HTML page and downloadable text file of possible compounds that are restricted to a specific genome. Hyperlinks provided in the HTML file connect the user to the curated metabolic databases housed in ProCyc, a Pathway Tools platform, as well as the KEGG Pathway database for visualization and metabolic pathway analysis.
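A mass query with a user-set mass deviation and adduct, as described above, can be sketched in Python. This is not Metabolome Searcher's code: the function `match_mass`, the `ADDUCTS` table and the short compound list are hypothetical, and the masses are approximate monoisotopic values entered by hand for illustration.

```python
# Approximate adduct mass shifts in Da for singly charged ions.
ADDUCTS = {"[M+H]+": 1.007276, "[M+Na]+": 22.989218, "[M-H]-": -1.007276}

def match_mass(observed_mz, compounds, adduct="[M+H]+", ppm=10.0):
    """Return compounds whose adduct m/z lies within `ppm` of the observed m/z."""
    shift = ADDUCTS[adduct]
    hits = []
    for name, mono_mass in compounds.items():
        expected = mono_mass + shift
        error_ppm = (observed_mz - expected) / expected * 1e6
        if abs(error_ppm) <= ppm:
            hits.append((name, round(error_ppm, 2)))
    return hits

# Approximate monoisotopic masses (Da) for a hypothetical genome-restricted compound list.
compounds = {"glucose": 180.063388, "citrate": 192.027003, "alanine": 89.047678}
hits = match_mass(181.0707, compounds, adduct="[M+H]+", ppm=10.0)
```

Restricting `compounds` to what the organism's genome can encode is what keeps the candidate list short; the ppm window and adduct choice then act as the experiment-specific filters the abstract mentions.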

Conclusions

Metabolome Searcher, a web-based tool, facilitates putative compound identification of MS output based on genome-restricted metabolic capability. This enables researchers to rapidly extend the possible identifications of large data sets for metabolites that are not in compound databases. Putative compound names with their associated metabolic pathways from metabolomics data sets are returned to the user for additional biological interpretation and visualization. This novel approach enables compound identification by restricting the possible masses to those encoded in the genome.
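
The kind of lookup described in the Results (a measured mass combined with an adduct, a detection mode, and a mass deviation) can be sketched roughly as follows. This is only an illustration and not the Metabolome Searcher implementation: the compound list, the adduct table, the ppm tolerance, and the function name `search_mass` are all assumptions made for the example.

```python
# Hypothetical genome-restricted compound list: name -> monoisotopic mass (Da).
# In a real tool this would be derived from the organism's metabolic database.
GENOME_COMPOUNDS = {
    "D-glucose": 180.06339,
    "D-fructose": 180.06339,
    "pyruvate": 88.01604,
    "citrate": 192.02700,
}

# Mass shifts (Da) of a few common singly charged adducts, by detection mode.
ADDUCTS = {
    "positive": {"[M+H]+": 1.007276, "[M+Na]+": 22.989218},
    "negative": {"[M-H]-": -1.007276},
}


def search_mass(observed_mz, mode, ppm_tolerance=10.0):
    """Return (compound, adduct, ppm_error) candidates within the tolerance."""
    hits = []
    for adduct, shift in ADDUCTS[mode].items():
        for name, mass in GENOME_COMPOUNDS.items():
            expected_mz = mass + shift
            # Mass deviation expressed in parts per million.
            ppm_error = (observed_mz - expected_mz) / expected_mz * 1e6
            if abs(ppm_error) <= ppm_tolerance:
                hits.append((name, adduct, round(ppm_error, 2)))
    return sorted(hits)


# An observed positive-mode peak that matches the hexose isomers as [M+H]+.
candidates = search_mass(181.0707, "positive")
for name, adduct, ppm_error in candidates:
    print(name, adduct, ppm_error)
```

Note that the two hexose isomers are both returned: a mass search alone cannot separate compounds with the same formula, which is why the identifications are described as putative.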


Genomic data assimilation using a higher moment filtering technique for restoration of gene regulatory networks

Background

As a result of recent advances in biotechnology, many findings related to intracellular systems have been published, e.g., transcription factor (TF) information. Although we can reproduce biological systems by incorporating such findings and describing their dynamics as mathematical equations, simulation results can be inconsistent with data from biological observations if there are inaccurate or unknown parts in the constructed system. To complete such systems, relationships among genes have been inferred through several computational approaches, which typically apply abstractions, e.g., linearization, to handle the heavy computational cost of evaluating biological systems. However, since these approximations can generate false regulations, computational methods that can infer regulatory relationships based on less abstract models incorporating existing knowledge are strongly needed.

Results

We propose a new data assimilation algorithm that utilizes a simple nonlinear regulatory model and a state space representation to infer gene regulatory networks (GRNs) using time-course observation data. For the estimation of the hidden state variables and the parameter values, we developed a novel method termed a higher moment ensemble particle filter (HMEnPF) that can retain the first four moments of the conditional distributions through filtering steps. Starting from the original model, e.g., derived from the literature, the proposed algorithm can sequentially evaluate candidate models, which are generated by partially changing the current best model, to find the model that can best predict the data. For the performance evaluation, we generated six synthetic data sets based on two real biological networks and evaluated the effectiveness of the proposed algorithm by improving the networks inferred by previous methods. We then applied the algorithm to time-course observation data of rat skeletal muscle stimulated with corticosteroid. Since a corticosteroid pharmacogenomic pathway, its kinetics/dynamics, and TF candidate genes have been partially elucidated, we incorporated these findings and inferred an extended pathway of rat pharmacogenomics.

Conclusions

Through the simulation study, the proposed algorithm outperformed previous methods and successfully improved the regulatory structure inferred by the previous methods. Furthermore, the proposed algorithm could extend a corticosteroid-related pathway, which has been partially elucidated, by incorporating several information sources.

Identification of AMP-activated protein kinase targets by a consensus sequence search of the proteome

Biswapriya Biswavas Misra's insight:
Abstract

Background

AMP-activated protein kinase (AMPK) is a heterotrimeric serine/threonine protein kinase that is activated by cellular perturbations associated with ATP depletion or stress. While AMPK modulates the activity of a variety of targets containing a specific phosphorylation consensus sequence, the number of AMPK targets and their influence over cellular processes is currently thought to be limited.

Results

We queried the human and the mouse proteomes for proteins containing AMPK phosphorylation consensus sequences. Integration of this database into Gaggle software facilitated the construction of probable AMPK-regulated networks based on known and predicted molecular associations. In vitro kinase assays were conducted for preliminary validation of 12 novel AMPK targets across a variety of cellular functional categories, including transcription, translation, cell migration, protein transport, and energy homeostasis. Following initial validation, pathways that include NAD synthetase 1 (NADSYN1) and protein kinase B (AKT2) were hypothesized and experimentally tested to provide a mechanistic basis for AMPK regulation of cell migration and maintenance of cellular NAD+ concentrations during catabolic processes.

Conclusions

This study delineates an approach that encompasses both in silico procedures and in vitro experiments to produce testable hypotheses for AMPK regulation of cellular processes.
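
The in silico step, querying a proteome for a phosphorylation consensus sequence, can be illustrated with a short motif scan. The abstract does not state the motif, so the pattern below is an assumption: a simplified form with a hydrophobic residue at position -5, a basic residue at -4 or -3, the phosphorylated serine/threonine at 0, and a hydrophobic residue at +4. The toy sequences and the function name are likewise hypothetical, and this is not the authors' pipeline.

```python
import re

# Assumed, simplified consensus (not taken from the abstract):
#   -5 hydrophobic, -4 or -3 basic, 0 = S/T, +4 hydrophobic.
# The lookahead lets overlapping candidate sites be reported.
MOTIF = re.compile(r"(?=([MLIFV](?:[RKH].|.[RKH])..[ST]...[MLIFV]))")


def find_candidate_sites(proteome):
    """Return (protein_id, 1-based site position, residue, matched window)."""
    sites = []
    for protein_id, sequence in proteome.items():
        for match in MOTIF.finditer(sequence):
            site_index = match.start() + 5  # 0-based index of the S/T residue
            sites.append(
                (protein_id, site_index + 1, sequence[site_index], match.group(1))
            )
    return sites


# Toy sequences invented for the example; real input would be a FASTA proteome.
toy_proteome = {
    "toy_protein_1": "GGLRRASSGGGLGG",
    "toy_protein_2": "GGGGSGGGGTGGGG",
}

sites = find_candidate_sites(toy_proteome)
for site in sites:
    print(site)
```

A scan like this only produces candidates. As the abstract notes, the predicted targets still needed network context and in vitro kinase assays for validation.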

 

BioTechniques - SIFTER-T: A scalable and optimized framework for the SIFTER phylogenomic method of probabilistic protein domain annotation

Statistical Inference of Function Through Evolutionary Relationships (SIFTER) is a powerful computational platform for probabilistic protein domain annotation. Nevertheless, SIFTER is not widely used, likely due to usability and scalability issues. Here we present SIFTER-T (SIFTER Throughput-optimized), a substantial improvement over SIFTER's original proof-of-principle implementation. SIFTER-T is optimized for better performance, allowing it to be used at the genome-wide scale. Compared to SIFTER 2.0, SIFTER-T achieved an 87-fold performance improvement using published test data sets for the known annotations recovering module and a 72.3% speed increase for the gene tree generation module in quad-core machines, as well as a major decrease in memory usage during the realignment phase. Memory optimization allowed an expanded set of proteins to be handled by SIFTER's probabilistic method. The improvement in performance and automation that we achieved allowed us to build a web server to bring the power of Bayesian phylogenomic inference to the genomics community. SIFTER-T and its online interface are freely available under GNU license at http://labpib.fmrp.usp.br/methods/SIFTER-t/ and https://github.com/dcasbioinfo/SIFTER-t.

Visualising associations between paired ‘omics’ data sets

Abstract
Background
Each omics platform is now able to generate a large amount of data. Genomics, proteomics, metabolomics, and interactomics data are compiled at an ever-increasing pace and now form a core part of the fundamental systems biology framework. Recently, several integrative approaches have been proposed to extract meaningful information. However, these approaches lack visualisation outputs to fully unravel the complex associations between different biological entities.

Results
The multivariate statistical approaches ‘regularized Canonical Correlation Analysis’ and ‘sparse Partial Least Squares regression’ were recently developed to integrate two types of highly dimensional ‘omics’ data and to select relevant information. Using the results of these methods, we propose to revisit a few graphical outputs to better understand the relationships between two ‘omics’ data sets and to better visualise the correlation structure between the different biological entities. These graphical outputs include Correlation Circle plots, Relevance Networks and Clustered Image Maps. We demonstrate the usefulness of such graphical outputs on several biological data sets and further assess their biological relevance using gene ontology analysis.

Conclusions
Such graphical outputs are undoubtedly useful to aid the interpretation of these promising integrative analysis tools and will certainly help in addressing fundamental biological questions and understanding systems as a whole.

Availability
The graphical tools described in this paper are implemented in the freely available R package mixOmics and in its associated web application.

DGIdb - Mining the Druggable Genome

Search Interactions: search for drug-gene interactions by gene name.

The COG database: a tool for genome-scale analysis of protein functions and evolution.

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
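
One way to picture the "consistency of genome-specific best hits" criterion is as a search for triangles: three proteins from three different genomes that are each other's best hits. The sketch below makes that simplifying assumption (strictly reciprocal best hits, no merging of triangles into larger clusters) and uses invented protein identifiers, so it should be read as an illustration of the idea rather than the COG construction procedure.

```python
from itertools import combinations


def reciprocal_best_hit_triangles(genome_of, best_hit):
    """Find trios from three genomes that are mutually reciprocal best hits.

    genome_of: protein id -> genome id
    best_hit:  (protein id, target genome id) -> best-matching protein id
    """

    def mutual(p, q):
        # p's best hit in q's genome is q, and q's best hit in p's genome is p.
        return (
            best_hit.get((p, genome_of[q])) == q
            and best_hit.get((q, genome_of[p])) == p
        )

    triangles = []
    for trio in combinations(sorted(genome_of), 3):
        if len({genome_of[p] for p in trio}) < 3:
            continue  # a triangle must span three different genomes
        if all(mutual(p, q) for p, q in combinations(trio, 2)):
            triangles.append(trio)
    return triangles


# Invented example: a2 is a paralog whose best hits are not reciprocated.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
best_hit = {
    ("a1", "B"): "b1", ("b1", "A"): "a1",
    ("a1", "C"): "c1", ("c1", "A"): "a1",
    ("b1", "C"): "c1", ("c1", "B"): "b1",
    ("a2", "B"): "b1", ("a2", "C"): "c1",
}

triangles = reciprocal_best_hit_triangles(genome_of, best_hit)
print(triangles)
```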

South Green

South Green is a bioinformatics platform applied to the genomic resource analysis of southern and Mediterranean plants.

DoCM - Database of Curated Mutations

DoCM, the Database of Curated Mutations, is a highly curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification.

MP3: A Software Tool for the Prediction of Pathogenic Proteins in Genomic and Metagenomic Data

Biswapriya Biswavas Misra's insight:

The identification of virulent proteins in any de novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenomes of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both of these tasks is the identification of virulent proteins, since a significant proportion of genomic and metagenomic proteins are novel and as yet unannotated. The currently available tools for identifying virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed MP3, a standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 is developed using an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on a blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51–100 amino acids and Blind B: 30–50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100–150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.

  

Metabolomics Society Webinar on Thursday 29 January 2015 (7:30 AM - 8:30 AM EST) by Dr. Oscar Yanes

Biswapriya Biswavas Misra's insight:
Dear Metabolomics Community,

The Early-career Members Network (EMN), on behalf of the Metabolomics Society, is planning to establish a series of online webinars from January 2015 onwards.

We would like to formally invite you to our first session of our series coming to you live on Thursday 29 January 2015 (7:30 AM - 8:30 AM EST). Session 1 of the EMN webinar series will feature our expert speaker Dr. Oscar Yanes (http://www.yaneslab.com) who will provide a cutting edge 20 minute presentation regarding the complex and multidisciplinary nature of metabolomics. The experiences and research conducted in Dr. Yanes' laboratory will provide an invaluable insight into the challenges faced in modern metabolomics practice. In addition, there will be an opportunity to pose key questions to Dr. Yanes at the end of the session.

Please, register using the following link: https://attendee.gotowebinar.com/register/2409752719256762369

The first webinar is freely available for everyone courtesy of the Metabolomics Society and will be uploaded to the society's website. All subsequent sessions from our series will be available for members of the Metabolomics Society only, with the opportunity to revisit live recorded sessions at your own convenience.

We look forward to having you join us!

Sincerely,
The EMN

CBFA: phenotype prediction integrating metabolic models with constraints derived from experimental data

Background

Flux analysis methods lie at the core of Metabolic Engineering (ME), providing methods for phenotype simulation that allow the determination of flux distributions under different conditions. Although many constraint-based modeling software tools have been developed and published, none provides a free user-friendly application that makes available the full portfolio of flux analysis methods.

Results

This work presents Constraint-based Flux Analysis (CBFA), an open-source software application for flux analysis in metabolic models that implements several methods for phenotype prediction, allowing users to define constraints associated with measured fluxes and/or flux ratios, together with environmental conditions (e.g. media) and reaction/gene knockouts. CBFA identifies the set of applicable methods based on the constraints defined from user inputs, encompassing algebraic and constraint-based simulation methods. The integration of CBFA within the OptFlux framework for ME enables the utilization of different model formats and standards and the integration with complementary methods for phenotype simulation and visualization of results.

Conclusions

A general-purpose and flexible application is proposed that is independent of the origin of the constraints defined for a given simulation. The aim is to provide a simple-to-use software tool focused on the application of several flux prediction methods.