Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares
Scoop.it!

GRAST 1.0 - Genome Reduction Analysing Software Tool

3D molecular model · Alignment / BLAST · Assembly Tools · Bio-chemical / Engineering · Bioinformatics Platform · Cluster Analysis · Cytometry / Cell · DNA / Genome Analysis · Education & Fun · File Conversion · Genetics & ...

Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases

Proteome profile of the endomembrane of developing coleoptiles from switchgrass (Panicum virgatum).

The cost-effective production of biofuels from lignocellulosic material will likely require manipulation of plant biomass, specifically cell walls. The North American native prairie grass Panicum virgatum (switchgrass) is seen as a potential biofuel crop with an array of genetic resources currently being developed. We have characterized the endomembrane proteome of switchgrass coleoptiles to provide additional information to the switchgrass community. In total, we identified 1750 unique proteins from two biological replicates. These data have been deposited in the ProteomeXchange with the identifier PXD001351 (http://proteomecentral.proteomexchange.org/dataset/PXD001351).

PHABULOSA Controls the Quiescent Center-Independent Root Meristem Activities in Arabidopsis thaliana

Author Summary Plant roots are programmed to grow continuously into the soil, searching for nutrients and water. The iterative process of cell division, elongation, and differentiation contributes to root growth. The quiescent center (QC) is known to maintain the root meristem, and thus ensure root growth. In this study, we report a novel aspect of root growth regulation controlled independently of the QC by PHABULOSA (PHB). In shr mutant plants, PHB, which in the meristem is actively restr
Biswapriya Biswavas Misra's insight:

Plant growth depends on stem cell niches in meristems. In the root apical meristem, the quiescent center (QC) cells form a niche together with the surrounding stem cells. Stem cells produce daughter cells that are displaced into a transit-amplifying (TA) domain of the root meristem. TA cells divide several times to provide cells for growth. SHORTROOT (SHR) and SCARECROW (SCR) are key regulators of the stem cell niche. Cytokinin controls TA cell activities in a dose-dependent manner. Although the regulatory programs in each compartment of the root meristem have been identified, it is still unclear how they coordinate with one another. Here, we investigate how PHABULOSA (PHB), under the posttranscriptional control of SHR and SCR, regulates TA cell activities. The root meristem and growth defects in shr or scr mutants were significantly recovered in the shr phb or scr phb double mutant, respectively. This rescue in root growth occurs in the absence of a QC. Conversely, when the modified PHB, which is highly resistant to microRNA, was expressed throughout the stele of the wild-type root meristem, root growth became very similar to that observed in shr; however, the identity of the QC was unaffected. Interestingly, a moderate increase in PHB resulted in a root meristem phenotype similar to that observed following the application of high levels of cytokinin. Our protoplast assay and transgenic approach using ARR10 suggest that the depletion of TA cells by high PHB in the stele occurs via the repression of B-ARR activities. This regulatory mechanism seems to help maintain cytokinin homeostasis in the meristem. Taken together, our study suggests that PHB can dynamically regulate TA cell activities in a QC-independent manner, and that the SHR-PHB pathway enables a robust root growth system by coordinating the stem cell niche and TA domain.

 
Metabolome searcher: a high throughput tool for metabolite identification and metabolic pathway mapping directly from mass spectrometry and using genome restriction

Abstract
Background

Mass spectrometric analysis of microbial metabolism provides a long list of possible compounds. Restricting the identification of the possible compounds to those produced by the specific organism would benefit the identification process. Currently, identification of mass spectrometry (MS) data is commonly done using empirically derived compound databases. Unfortunately, most databases contain relatively few compounds, leaving long lists of unidentified molecules. Incorporating genome-encoded metabolism enables MS output identification that may not be included in databases. Using an organism’s genome as a database restricts metabolite identification to only those compounds that the organism can produce.

Results

To address the challenge of metabolomic analysis from MS data, a web-based application to directly search genome-constructed metabolic databases was developed. The user query returns a genome-restricted list of possible compound identifications along with the putative metabolic pathways based on the name, formula, SMILES structure, and the compound mass as defined by the user. Multiple queries can be done simultaneously by submitting a text file created by the user or obtained from the MS analysis software. The user can also provide parameters specific to the experiment’s MS analysis conditions, such as mass deviation, adducts, and detection mode during the query so as to provide additional levels of evidence to produce the tentative identification. The query results are provided as an HTML page and downloadable text file of possible compounds that are restricted to a specific genome. Hyperlinks provided in the HTML file connect the user to the curated metabolic databases housed in ProCyc, a Pathway Tools platform, as well as the KEGG Pathway database for visualization and metabolic pathway analysis.
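
The mass-based part of such a query can be sketched as follows. This is only an illustration of matching an observed m/z against a genome-restricted compound list using an adduct and a mass deviation in ppm; it is not Metabolome Searcher's code. The compound table, the function name `match_mz` and the choice of adducts are assumptions made for the example.

```python
# Adduct mass shifts in Da (standard values for singly charged ions).
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M-H]-": -1.007276,
}

# Hypothetical genome-restricted compound table: name -> monoisotopic mass (Da).
GENOME_COMPOUNDS = {
    "D-glucose": 180.06339,
    "pyruvate": 88.01604,
    "citrate": 192.02700,
}


def match_mz(observed_mz, adduct, ppm_tolerance, compounds=GENOME_COMPOUNDS):
    """Return (name, ppm_error) pairs for compounds within the ppm tolerance."""
    shift = ADDUCTS[adduct]
    hits = []
    for name, mass in compounds.items():
        expected_mz = mass + shift
        ppm_error = (observed_mz - expected_mz) / expected_mz * 1e6
        if abs(ppm_error) <= ppm_tolerance:
            hits.append((name, ppm_error))
    # Closest match first.
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

For instance, `match_mz(181.0707, "[M+H]+", 10)` returns only the D-glucose entry, because no other compound in this small table lies within 10 ppm once the proton mass is added.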

Conclusions

Metabolome Searcher, a web-based tool, facilitates putative compound identification of MS output based on genome-restricted metabolic capability. This enables researchers to rapidly extend the possible identifications of large data sets for metabolites that are not in compound databases. Putative compound names with their associated metabolic pathways from metabolomics data sets are returned to the user for additional biological interpretation and visualization. This novel approach enables compound identification by restricting the possible masses to those encoded in the genome.

Genomic data assimilation using a higher moment filtering technique for restoration of gene regulatory networks

Background

As a result of recent advances in biotechnology, many findings related to intracellular systems have been published, e.g., transcription factor (TF) information. Although we can reproduce biological systems by incorporating such findings and describing their dynamics as mathematical equations, simulation results can be inconsistent with data from biological observations if there are inaccurate or unknown parts in the constructed system. For the completion of such systems, relationships among genes have been inferred through several computational approaches, which typically apply several abstractions, e.g., linearization, to handle the heavy computational cost in evaluating biological systems. However, since these approximations can generate false regulations, computational methods that can infer regulatory relationships based on less abstract models incorporating existing knowledge are strongly needed.

Results

We propose a new data assimilation algorithm that utilizes a simple nonlinear regulatory model and a state space representation to infer gene regulatory networks (GRNs) using time-course observation data. For the estimation of the hidden state variables and the parameter values, we developed a novel method termed a higher moment ensemble particle filter (HMEnPF) that can retain the first four moments of the conditional distributions through filtering steps. Starting from the original model, e.g., derived from the literature, the proposed algorithm can sequentially evaluate candidate models, which are generated by partially changing the current best model, to find the model that can best predict the data. For the performance evaluation, we generated six synthetic data sets based on two real biological networks and evaluated the effectiveness of the proposed algorithm by improving the networks inferred by previous methods. We then applied the algorithm to time-course observation data of rat skeletal muscle stimulated with corticosteroid. Since a corticosteroid pharmacogenomic pathway, its kinetics/dynamics and TF candidate genes have been partially elucidated, we incorporated these findings and inferred an extended pathway of rat pharmacogenomics.

Conclusions

Through the simulation study, the proposed algorithm outperformed previous methods and successfully improved the regulatory structure inferred by the previous methods. Furthermore, the proposed algorithm could extend a corticosteroid-related pathway, which has been partially elucidated, by incorporating several information sources.
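
To give a feel for filtering in a state space model, here is a minimal sketch of a plain bootstrap particle filter for a single hidden variable. It is not the HMEnPF described above and makes no attempt to retain higher moments. The function name, the Gaussian observation model and all default parameter values are assumptions chosen for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, transition, obs_std,
                              n_particles=500, init_std=5.0, seed=0):
    """Estimate a scalar hidden state from noisy observations.

    transition(x, rng) draws the next state given x (process noise included).
    Each observation is assumed to be the state plus Gaussian noise (obs_std).
    Returns the weighted posterior mean after each observation.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, init_std) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: propagate each particle through the (possibly nonlinear) model.
        particles = [transition(x, rng) for x in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Any state transition can be passed in as `transition`, for example a random walk such as `lambda x, rng: x + rng.gauss(0.0, 0.5)` or a saturating regulatory term; which form suits a given network is a modelling decision that this sketch does not address.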

Identification of AMP-activated protein kinase targets by a consensus sequence search of the proteom

Abstract
Background

AMP-activated protein kinase (AMPK) is a heterotrimeric serine/threonine protein kinase that is activated by cellular perturbations associated with ATP depletion or stress. While AMPK modulates the activity of a variety of targets containing a specific phosphorylation consensus sequence, the number of AMPK targets and their influence over cellular processes is currently thought to be limited.

Results

We queried the human and the mouse proteomes for proteins containing AMPK phosphorylation consensus sequences. Integration of this database into Gaggle software facilitated the construction of probable AMPK-regulated networks based on known and predicted molecular associations. In vitro kinase assays were conducted for preliminary validation of 12 novel AMPK targets across a variety of cellular functional categories, including transcription, translation, cell migration, protein transport, and energy homeostasis. Following initial validation, pathways that include NAD synthetase 1 (NADSYN1) and protein kinase B (AKT2) were hypothesized and experimentally tested to provide a mechanistic basis for AMPK regulation of cell migration and maintenance of cellular NAD+ concentrations during catabolic processes.
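
A proteome-wide consensus search of this kind can be sketched with a regular expression. The abstract does not state the consensus sequence used, so the pattern below is a simplified placeholder (hydrophobic residue at -5, basic residue at -4 or -3, serine or threonine at 0, hydrophobic residue at +4) and should be treated as an assumption, as should the function name and the protein identifiers.

```python
import re

# Simplified, illustrative consensus (an assumption, not taken from the study):
# hydrophobic at -5, basic at -4 or -3, S/T at 0, hydrophobic at +4.
ILLUSTRATIVE_MOTIF = r"[MLIFV](?:[RKH].|.[RKH])..[ST]...[MLIFV]"


def find_motif_sites(proteome, motif=ILLUSTRATIVE_MOTIF):
    """Map protein id -> 1-based positions of the candidate S/T residue."""
    # A lookahead lets overlapping motif instances all be reported.
    pattern = re.compile(f"(?=({motif}))")
    sites = {}
    for protein_id, sequence in proteome.items():
        # The S/T sits 5 residues after the motif start; +1 makes it 1-based.
        positions = [m.start() + 6 for m in pattern.finditer(sequence.upper())]
        if positions:
            sites[protein_id] = positions
    return sites
```

Proteins with no match are left out of the result, so the returned mapping is a shortlist of candidates for follow-up rather than a list of confirmed targets.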

Conclusions

This study delineates an approach that encompasses both in silico procedures and in vitro experiments to produce testable hypotheses for AMPK regulation of cellular processes.

 
BioTechniques - SIFTER-T: A scalable and optimized framework for the SIFTER phylogenomic method of probabilistic protein domain annotation

Statistical Inference of Function Through Evolutionary Relationships (SIFTER) is a powerful computational platform for probabilistic protein domain annotation. Nevertheless, SIFTER is not widely used, likely due to usability and scalability issues. Here we present SIFTER-T (SIFTER Throughput-optimized), a substantial improvement over SIFTER's original proof-of-principle implementation. SIFTER-T is optimized for better performance, allowing it to be used at the genome-wide scale. Compared to SIFTER 2.0, SIFTER-T achieved an 87-fold performance improvement using published test data sets for the known annotations recovering module and a 72.3% speed increase for the gene tree generation module in quad-core machines, as well as a major decrease in memory usage during the realignment phase. Memory optimization allowed an expanded set of proteins to be handled by SIFTER's probabilistic method. The improvement in performance and automation that we achieved allowed us to build a web server to bring the power of Bayesian phylogenomic inference to the genomics community. SIFTER-T and its online interface are freely available under GNU license at http://labpib.fmrp.usp.br/methods/SIFTER-t/ and https://github.com/dcasbioinfo/SIFTER-t.

Visualising associations between paired `omics' data sets

Abstract
Background
Each omics platform is now able to generate a large amount of data. Genomics, proteomics, metabolomics and interactomics data are compiled at an ever increasing pace and now form a core part of the fundamental systems biology framework. Recently, several integrative approaches have been proposed to extract meaningful information. However, these approaches lack visualisation outputs to fully unravel the complex associations between different biological entities.

Results
The multivariate statistical approaches ‘regularized Canonical Correlation Analysis’ and ‘sparse Partial Least Squares regression’ were recently developed to integrate two types of highly dimensional ‘omics’ data and to select relevant information. Using the results of these methods, we propose to revisit a few graphical outputs to better understand the relationships between two ‘omics’ data sets and to better visualise the correlation structure between the different biological entities. These graphical outputs include Correlation Circle plots, Relevance Networks and Clustered Image Maps. We demonstrate the usefulness of such graphical outputs on several biological data sets and further assess their biological relevance using gene ontology analysis.

Conclusions
Such graphical outputs are undoubtedly useful to aid the interpretation of these promising integrative analysis tools and will certainly help in addressing fundamental biological questions and understanding systems as a whole.

Availability
The graphical tools described in this paper are implemented in the freely available R package mixOmics and in its associated web application.

DGIdb - Mining the Druggable Genome

Search Interactions: search for drug-gene interactions by gene name

The COG database: a tool for genome-scale analysis of protein functions and evolution.

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
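
One simplified reading of the consistency criterion is that proteins from three different genomes belong together when each is the best hit of the others. The sketch below searches for such triangles. It assumes symmetric best hits and ignores paralogs and the merging of triangles into larger groups, so it should not be taken as the COG construction procedure itself; the function name and input layout are hypothetical.

```python
from itertools import combinations


def best_hit_triangles(genome_of, best_hit):
    """Find protein triples from three genomes that are mutual best hits.

    genome_of: protein id -> genome id.
    best_hit: (protein id, target genome id) -> best-scoring protein there.
    """
    def mutual(a, b):
        # a's best hit in b's genome is b, and vice versa.
        return (best_hit.get((a, genome_of[b])) == b
                and best_hit.get((b, genome_of[a])) == a)

    triangles = []
    for a, b, c in combinations(sorted(genome_of), 3):
        if len({genome_of[a], genome_of[b], genome_of[c]}) < 3:
            continue  # a triangle must span three different genomes
        if mutual(a, b) and mutual(b, c) and mutual(a, c):
            triangles.append((a, b, c))
    return triangles
```

The exhaustive loop over triples is cubic in the number of proteins, which is acceptable for a toy input but would need an indexed approach at genome scale.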

South Green

South Green is a bioinformatics platform applied to the genomic resource analysis of southern and Mediterranean plants.

DoCM - Database of Curated Mutations

DoCM, the Database of Curated Mutations, is a highly curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification.

MP3: A Software Tool for the Prediction of Pathogenic Proteins in Genomic and Metagenomic Data

The identification of virulent proteins in any de-novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenome of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both the above tasks is the identification of virulent proteins, since a significant proportion of genomic and metagenomic proteins are novel and as yet unannotated. The currently available tools which carry out the identification of virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed an MP3 standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 is developed using an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on a blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51–100 amino acids and Blind B: 30–50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100–150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.
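
For readers less familiar with the four reported measures, the sketch below computes them from a confusion matrix. The function name is hypothetical, and the counts used in the note that follows are an assumption (a balanced set of 100 pathogenic and 100 non-pathogenic proteins), since the abstract does not give the composition of the blind dataset.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)          # fraction of true positives recovered
    specificity = tn / (tn + fp)          # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

With the assumed counts `classification_metrics(tp=92, fn=8, tn=100, fp=0)`, sensitivity is 0.92, specificity is 1.0, accuracy is 0.96 and MCC rounds to 0.92, which is consistent with the values quoted for the complete-protein blind dataset.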

  
Metabolomics Society Webinar on Thursday 29 January 2015 (7:30 AM - 8:30 AM EST) by Dr. Oscar Yanes

Biswapriya Biswavas Misra's insight:
Dear Metabolomics Community,

The Early-career Members Network (EMN), on behalf of the Metabolomics Society, is planning to establish a series of online webinars from January 2015 onwards.

We would like to formally invite you to the first session of our series, coming to you live on Thursday 29 January 2015 (7:30 AM - 8:30 AM EST). Session 1 of the EMN webinar series will feature our expert speaker Dr. Oscar Yanes (http://www.yaneslab.com), who will provide a cutting-edge 20-minute presentation regarding the complex and multidisciplinary nature of metabolomics. The experiences and research conducted in Dr. Yanes' laboratory will provide an invaluable insight into the challenges faced in modern metabolomics practice. In addition, there will be an opportunity to pose key questions to Dr. Yanes at the end of the session.

Please register using the following link: https://attendee.gotowebinar.com/register/2409752719256762369

The first webinar is freely available for everyone courtesy of the Metabolomics Society and will be uploaded to the society's website. All subsequent sessions from our series will be available for members of the Metabolomics Society only, with the opportunity to revisit live recorded sessions at your own convenience.

We look forward to having you join us!

Sincerely,
The EMN

Predicted protein-protein interactions in the moss Physcomitrella patens: a new bioinformatic resource

Abstract
Background

Physcomitrella patens, a haploid dominant plant, is fast becoming a useful molecular genetics and bioinformatics tool due to its key phylogenetic position as a bryophyte in the post-genomic era. Genome sequences from select reference species were compared bioinformatically to Physcomitrella patens using reciprocal blasts with the InParanoid software package. A reference protein interaction database assembled using MySQL by compiling BioGrid, BIND, DIP, and Intact databases was queried for moss orthologs existing for both interacting partners. This method has been used to successfully predict interactions for a number of angiosperm plants.

Results

The first predicted protein-protein interactome for a bryophyte based on the interolog method contains 67,740 unique interactions from 5,695 different Physcomitrella patens proteins. Most conserved interactions among proteins were those associated with metabolic processes. Over-represented Gene Ontology categories are reported here.
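
The interolog idea can be sketched in a few lines: an interaction known in a reference species is transferred to the target species only when both partners have orthologs there. The function name, the data layout and the identifiers in the note below are hypothetical, and the sketch leaves out ortholog confidence scores and any filtering the authors may have applied.

```python
def predict_interologs(reference_interactions, orthologs):
    """Transfer interactions to a target species via ortholog mapping.

    reference_interactions: iterable of (protein_a, protein_b) pairs
        from reference species.
    orthologs: reference protein id -> set of target-species ortholog ids.
    Returns a set of unordered target pairs (frozensets); a frozenset with a
    single member represents a predicted self-interaction.
    """
    predicted = set()
    for ref_a, ref_b in reference_interactions:
        # Both partners need at least one ortholog, otherwise nothing is added.
        for tgt_a in orthologs.get(ref_a, ()):
            for tgt_b in orthologs.get(ref_b, ()):
                predicted.add(frozenset((tgt_a, tgt_b)))
    return predicted
```

Given a reference pair `("AT1", "AT2")` with orthologs `{"AT1": {"Pp1"}, "AT2": {"Pp2a", "Pp2b"}}`, two moss interactions are predicted, Pp1 with Pp2a and Pp1 with Pp2b, which shows how one reference interaction can expand into several predictions when a partner has multiple orthologs.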

Conclusion

Adding moss, a plant representative diverged from angiosperms by 200 million years, to interactomic research greatly expands the possibility of conducting comparative analyses, giving insight into the network evolution of land plants. This work helps demonstrate the utility of “guilt-by-association” models for predicting protein interactions, providing provisional roadmaps that can be explored using experimental approaches. Included with this dataset is a method for characterizing subnetworks and investigating specific processes, such as the Calvin-Benson-Bassham cycle.
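
The interolog idea described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the protein identifiers, the `orthologs` mapping and the function name `predict_interologs` are hypothetical, and the ortholog table is assumed to have been produced beforehand (for example by InParanoid).

```python
from itertools import product


def predict_interologs(reference_interactions, orthologs):
    """Transfer an interaction to the target species only when BOTH
    interacting partners have at least one ortholog there."""
    predicted = set()
    for a, b in reference_interactions:
        # product() yields nothing if either partner lacks an ortholog.
        for x, y in product(orthologs.get(a, ()), orthologs.get(b, ())):
            # Sort each pair so that A-B and B-A are counted once.
            predicted.add(tuple(sorted((x, y))))
    return predicted


# Hypothetical identifiers, for illustration only.
reference_interactions = [("AT1", "AT2"), ("AT2", "AT3"), ("AT4", "AT5")]
orthologs = {"AT1": ["Pp1"], "AT2": ["Pp2a", "Pp2b"], "AT3": ["Pp3"]}

interologs = predict_interologs(reference_interactions, orthologs)
print(sorted(interologs))
```

The AT4-AT5 interaction is dropped because neither partner has a moss ortholog in the toy table, which mirrors the requirement that orthologs exist for both interacting partners.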

Scooped by Biswapriya Biswavas Misra

In-source fragmentation and correlation analysis as tools for metabolite identification exemplified with CE-TOF untargeted metabolomics

The role of non-targeted metabolomics, with its discovery power, is constantly growing in many different fields of science. However, its biggest advantage, uncovering the unexpected, is turning into one of its biggest bottlenecks, particularly in metabolite identification. Among the different methods for metabolite identification or ID confirmation, tandem MS analysis plays a very important role. However, this method is limited to certain types of MS analysers, making, for example, TOF-MS unsuitable for this type of metabolite identification. To overcome this, in-source fragmentation has been used to fragment molecules and obtain product ions. Since the molecule of interest is not isolated prior to its fragmentation, the acquired spectrum contains many different signals arising from the fragmentation of all compounds present in the sample. Therefore, to assign product ions to their precursors, a novel use of correlation analysis was tested, with r ≥ 0.9 as the criterion for assigning a product ion to the precursor. This method and the chosen cut-off were tested at three different levels of sample complexity: a single standard, a mix of co-eluting standards, and a plasma sample. The results obtained demonstrated the effectiveness of the proposed methodology for metabolite ID confirmation. Moreover, the proposed strategy can be successfully applied for semi-quantification of co-eluting molecules that have the same monoisotopic mass but differ in fragmentation pattern. The proposed methodology can greatly improve the robustness and throughput of identification in metabolomics studies that use TOF-MS, which is crucial to obtaining meaningful and trustworthy results.
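
As a rough illustration of the correlation cut-off described above, the Python sketch below assigns candidate product ions to a precursor when the Pearson correlation of their intensity traces is at least 0.9. The abstract does not say over which dimension the intensities are correlated; here it is assumed that each trace holds intensities over the same consecutive scans. The ion names, the numbers and the helper `assign_product_ions` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat trace carries no correlation information
    return cov / (sd_x * sd_y)


def assign_product_ions(precursor, candidates, cutoff=0.9):
    """Return the names of candidate ions whose trace correlates with
    the precursor trace at r >= cutoff."""
    return [name for name, trace in candidates.items()
            if pearson_r(precursor, trace) >= cutoff]


# Hypothetical intensity traces over five consecutive scans.
precursor = [10.0, 40.0, 90.0, 40.0, 10.0]
candidates = {
    "ion_A": [2.0, 8.0, 18.0, 8.0, 2.0],       # tracks the precursor profile
    "ion_B": [50.0, 35.0, 10.0, 35.0, 50.0],   # opposite profile
}

assigned = assign_product_ions(precursor, candidates)
print(assigned)
```

Only `ion_A` passes the cut-off in this toy case, because its profile rises and falls with the precursor while `ion_B` does the opposite.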
Scooped by Biswapriya Biswavas Misra

RNASeqBrowser: A genome browser for simultaneous visualization of raw strand specific RNAseq reads and UCSC genome browser custom tracks

Abstract
Background
Strand-specific RNAseq data is now more common in RNAseq projects. Visualizing RNAseq data has become an important part of the analysis of sequencing data. The most widely used visualization tool is the UCSC genome browser, which introduced the custom track concept that enables researchers to simultaneously visualize gene expression at a particular locus from multiple experiments. The objective of our software tool is to provide a friendly interface for visualization of RNAseq datasets.

Results
This paper introduces a visualization tool (RNASeqBrowser) that incorporates and extends the functionality of the UCSC genome browser. For example, RNASeqBrowser simultaneously displays read coverage, SNPs, InDels and raw read tracks with other BED and wiggle tracks -- all being dynamically built from the BAM file. Paired reads are also connected in the browser to enable easier identification of novel exon/intron borders and chimaeric transcripts. RNASeqBrowser also supports strand-specific RNAseq data, displaying reads above (positive-strand transcripts) or below (negative-strand transcripts) a central line. Finally, RNASeqBrowser was designed for ease of use for users with few bioinformatic skills, and incorporates the features of many genome browsers into one platform.

Conclusions
The features of RNASeqBrowser: (1) RNASeqBrowser integrates the UCSC genome browser with NGS visualization tools such as IGV. It extends the functionality of the UCSC genome browser by adding several new types of tracks to show NGS data such as individual raw reads, SNPs and InDels. (2) RNASeqBrowser can dynamically generate RNA secondary structure, which is useful for identifying non-coding RNA such as miRNA. (3) Overlaying NGS wiggle data is helpful in displaying differential expression and is simple to implement in RNASeqBrowser. (4) NGS data accumulates a lot of raw reads, so RNASeqBrowser collapses exact duplicate reads to reduce visualization space. Ordinary PCs can show many windows of individual NGS raw reads without much delay. (5) Multiple popup windows of individual raw reads provide users with more viewing space. This avoids the approach of existing tools (such as IGV), which squeeze all raw reads into one window, and will be helpful for visualizing multiple datasets simultaneously.

RNASeqBrowser and its manual are freely available at http://www.australianprostatecentre.org/research/software/rnaseqbrowser or http://sourceforge.net/projects/rnaseqbrowser/
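
Feature (4), collapsing exact duplicate reads, can be illustrated with a short Python sketch. How RNASeqBrowser itself defines an "exact duplicate" is not stated in the abstract; the sketch assumes two reads are duplicates when chromosome, start position, strand and sequence all match, and the read tuples are made up for the example.

```python
from collections import Counter


def collapse_duplicates(reads):
    """Collapse exact duplicate reads into (read, count) pairs.

    Each read is a (chrom, start, strand, sequence) tuple; this notion
    of "exact duplicate" is an assumption made for the sketch.
    """
    counts = Counter(reads)
    # Counter keeps first-seen order, so the display order is stable.
    return list(counts.items())


# Hypothetical reads, for illustration only.
reads = [
    ("chr1", 100, "+", "ACGTAC"),
    ("chr1", 100, "+", "ACGTAC"),
    ("chr1", 100, "-", "ACGTAC"),  # same position, opposite strand
    ("chr1", 250, "+", "TTGACA"),
    ("chr1", 100, "+", "ACGTAC"),
]

collapsed = collapse_duplicates(reads)
for read, n in collapsed:
    print(read, "x", n)
```

Five reads are drawn as three rows, each annotated with a count, so no information about read depth is lost while the display space shrinks.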
Scooped by Biswapriya Biswavas Misra

Metabolic Pathway Predictions for Metabolomics: A Molecular Structure Matching Approach

Metabolic pathways are composed of a series of chemical reactions occurring within a cell. In each pathway, enzymes catalyze the conversion of substrates into structurally similar products. Thus, structural similarity provides a potential means for mapping newly identified biochemical compounds to known metabolic pathways. In this paper, we present TrackSM, a cheminformatics tool designed to associate a chemical compound with a known metabolic pathway based on molecular structure matching techniques. Validation experiments show that TrackSM is capable of associating 93% of tested structures with their correct KEGG pathway class and 88% with their correct individual KEGG pathway. This suggests that TrackSM may be a valuable tool to aid in associating previously unknown small molecules with known biochemical pathways and improve our ability to link metabolomic, proteomic, and genomic data sets. TrackSM is freely available at http://metabolomics.pharm.uconn.edu/?q=Software.html.
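
The general idea of mapping a compound to a pathway by structural similarity can be sketched as a nearest-neighbour lookup. The abstract does not describe TrackSM's actual matching algorithm, so the sketch below swaps in a plain Tanimoto coefficient over sets of structural features; the feature labels, compound names and pathway names are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_pathway(query_features, reference_compounds):
    """Return (pathway, similarity) of the most similar reference compound."""
    best = max(reference_compounds,
               key=lambda ref: tanimoto(query_features, ref["features"]))
    return best["pathway"], tanimoto(query_features, best["features"])


# Hypothetical reference compounds with made-up structural features.
reference_compounds = [
    {"name": "cmpd1", "pathway": "Pathway A",
     "features": {"hydroxyl", "carboxyl", "ring6", "amine"}},
    {"name": "cmpd2", "pathway": "Pathway B",
     "features": {"phosphate", "ring5"}},
]
query_features = {"hydroxyl", "carboxyl", "ring6"}

pathway, score = predict_pathway(query_features, reference_compounds)
print(pathway, score)
```

A real tool would work from molecular graphs or fingerprints computed by a cheminformatics library; the set-based features here only stand in for that step.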
Scooped by Biswapriya Biswavas Misra

DoCM - Database of Curated Mutations

DoCM, the Database of Curated Mutations, is a highly curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification.
Scooped by Biswapriya Biswavas Misra

MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments

Abstract

Summary: MSstats is an R package for statistical relative quantification of proteins and peptides in mass spectrometry-based proteomics. Version 2.0 of MSstats supports label-free and label-based experimental workflows, and data dependent, targeted and data independent spectral acquisition. It takes as input identified and quantified spectral peaks, and outputs a list of differentially abundant peptides or proteins, or summaries of peptide or protein relative abundance. MSstats relies on a flexible family of linear mixed models.

Availability: The code, the documentation, and example datasets are available open-source at www.msstats.org under the Artistic-2.0 license. The package can be downloaded from www.msstats.org or from Bioconductor (www.bioconductor.org) and used in an R command-line workflow. The package can also be accessed as an external tool in Skyline (Broudy et al., 2013) and used via a graphical user interface.

Contact: ovitek@purdue.edu
Scooped by Biswapriya Biswavas Misra

dbSUPER: an integrated database of super-enhancers in mouse and human genome

database
Biswapriya Biswavas Misra's insight:

The super-enhancer is a newly proposed concept that refers to clusters of enhancers that can drive cell-type-specific gene expression and are crucial to cell identity. Many disease-associated sequence variations are enriched in the super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers for more than 100 cell types in human and mouse. However, no centralized resource integrating all these findings is available yet. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource for further study of transcriptional control of cell identity and disease by archiving computationally produced data. This data can easily be sent to the Galaxy, GREAT and Cistrome web servers for further downstream analysis. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive searching and browsing. dbSUPER provides downloadable and exportable features in a variety of data formats, and its data can be visualized in the UCSC genome browser, with custom tracks added automatically. Further, dbSUPER lists genes associated with super-enhancers and links to various databases, including GeneCards, UniProt and Entrez. Our database also provides an overlap analysis tool to check the overlap of user-defined regions with the current database. We believe dbSUPER is a valuable resource for the bioinformatics and genetics research community.
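
The overlap analysis mentioned above boils down to an interval intersection test. The Python sketch below shows one way to do it, assuming BED-style half-open coordinates (start inclusive, end exclusive); the coordinates and the function names are hypothetical and say nothing about how dbSUPER implements its own tool.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) regions share at least one base.

    Coordinates are assumed to be half-open, as in BED files.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def find_overlaps(user_regions, db_regions):
    """Return every (user_region, db_region) pair that overlaps."""
    return [(u, d) for u in user_regions for d in db_regions
            if overlaps(u, d)]


# Hypothetical coordinates, for illustration only.
db_regions = [("chr1", 1000, 5000), ("chr2", 200, 900)]
user_regions = [("chr1", 4500, 6000), ("chr1", 5000, 7000), ("chr3", 1, 100)]

hits = find_overlaps(user_regions, db_regions)
print(hits)
```

The second user region starts exactly where the database region ends, so under half-open coordinates it does not count as an overlap. The nested loop is fine for small inputs; large region sets would call for sorted sweeps or an interval tree.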

 
Scooped by Biswapriya Biswavas Misra

DISSECT

DISSECT was designed to perform common genomic analyses on large supercomputers, allowing very large datasets to be analyzed. DISSECT's capabilities include analysis using mixed linear models, principal components analysis, and genome-wide association analysis (testing markers individually or together in large groups), among others. It is designed to be as easy to use as other common software tools such as PLINK or REACTA/GCTA. In addition, despite its capability of working on supercomputers, it can also be used on single computers without problems.
Scooped by Biswapriya Biswavas Misra

DoGSD: the dog and wolf genome SNP database

The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more user-friendly and readily available, we propose DoGSD (http://dogsd.big.ac.cn), the first Canidae-specific database, which focuses on whole-genome SNP data from domesticated dogs and grey wolves. DoGSD is a web-based, open-access resource comprising ∼19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and SNPs collected from the three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistic Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies.
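
For readers unfamiliar with Fst, the Python sketch below computes the classic heterozygosity-based form, Fst = (H_T - H_S) / H_T, for one biallelic SNP in two equally weighted populations. The abstract does not say which Fst estimator DoGSD uses, so this is a textbook illustration rather than the database's method, and the allele frequencies are made up.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def fst(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # within populations
    h_t = heterozygosity((p1 + p2) / 2.0)                  # pooled population
    if h_t == 0.0:
        return 0.0  # site is monomorphic overall, no differentiation to measure
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in dogs and wolves at one SNP.
print(fst(0.8, 0.2))
print(fst(0.5, 0.5))
print(fst(1.0, 0.0))
```

Identical frequencies give 0 (no differentiation) and fixed opposite alleles give 1 (complete differentiation). Estimators used in practice also correct for sample size, which this sketch ignores.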
Scooped by Biswapriya Biswavas Misra

South Green

Scooped by Biswapriya Biswavas Misra

SeqFindr 0.34.0 : Python Package Index

SeqFindr - easily create informative genomic feature plots. It’s a bioinfomagician’s tool to detect the presence or absence of genomic features given a database describing these features & a set of draft and/or complete genomes. We work with bacterial genomes & as such SeqFindr has only been tested with bacterial genomes.
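
The presence/absence idea can be pictured as a genome-by-feature matrix. The Python sketch below builds one from similarity search results that are assumed to exist already; it does not use or reproduce SeqFindr's own interface, and the hit records, the `min_identity` threshold and the function name are hypothetical.

```python
def presence_absence(hits, genomes, features, min_identity=0.95):
    """Build a genome-by-feature 0/1 matrix from search hits.

    hits is an iterable of (genome, feature, identity) records; a
    feature counts as present when identity >= min_identity.
    """
    present = {(g, f) for g, f, identity in hits if identity >= min_identity}
    return {g: [int((g, f) in present) for f in features] for g in genomes}


# Hypothetical search results, for illustration only.
genomes = ["strain_1", "strain_2"]
features = ["geneA", "geneB", "geneC"]
hits = [
    ("strain_1", "geneA", 0.99),
    ("strain_1", "geneC", 0.80),  # below threshold, treated as absent
    ("strain_2", "geneB", 0.97),
    ("strain_2", "geneC", 1.00),
]

matrix = presence_absence(hits, genomes, features)
for genome, row in matrix.items():
    print(genome, row)
```

Each row can then be drawn as one line of a feature plot, with filled cells for present features.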
Rescooped by Biswapriya Biswavas Misra from Plant-Microbe Symbioses

MTGD: The Medicago truncatula Genome Database

Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The website (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant ‘mines’ such as ThaleMine and PhytoMine, and other model organism databases (MODs). In addition to these new features, we continue to provide keyword- and locus identifier-based searches served via a Chado-backed Tripal Instance, a BLAST search interface and bulk downloads of data sets from the iPlant Data Store (iDS). Finally, we maintain an E-mail helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific data sets from the community.

Via Jean-Michel Ané