Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares

GeneCite 3.0 - High-throughput Literature and Pathway Mining

:: DESCRIPTION

GeneCite is a generalized query application that lets you specify sophisticated sets of queries and generates a table of the number of bio-related records found for each query. The table can be presented as a Web page or in a standard spreadsheet format, so that you can view the full output of only those queries that return an interesting number of records. Currently, GeneCite allows you to submit ‘term’-type queries to the web-based PubMed and UniSTS databases, as well as to the PathwayScreen database stored in Microsoft Access. You provide the application with partial queries in standard ASCII text files. These partial queries can then be combined in various ways to produce the set of queries sent to the databases, called a search. The program stores the number of citations returned for each query. The result of a search is a column (one-dimensional) or a table (two-dimensional) of those counts, depending on the type of search selected.
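
The two-dimensional search described above can be pictured as a cross product of two lists of partial queries, with one count stored per combination. The sketch below is only an illustration and not GeneCite's own code: the function names, the `AND` join and the `count_hits` callback are assumptions, and the stand-in counter searches a tiny in-memory list rather than PubMed or UniSTS.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def build_count_table(
    row_terms: List[str],
    col_terms: List[str],
    count_hits: Callable[[str], int],
) -> Dict[Tuple[str, str], int]:
    """Combine every row term with every column term and store the hit count."""
    table = {}
    for row, col in product(row_terms, col_terms):
        query = f"({row}) AND ({col})"  # assumed way of joining partial queries
        table[(row, col)] = count_hits(query)
    return table


# Stand-in for a real database call (hypothetical toy records).
TOY_RECORDS = [
    "BRCA1 mutation in breast cancer",
    "TP53 and apoptosis",
    "BRCA1 and apoptosis signalling",
]


def toy_count_hits(query: str) -> int:
    # Recover the two terms from the "(a) AND (b)" form built above.
    left, right = query.split(" AND ")
    terms = [left.strip("()"), right.strip("()")]
    return sum(all(t.lower() in rec.lower() for t in terms) for rec in TOY_RECORDS)


counts = build_count_table(
    ["BRCA1", "TP53"], ["apoptosis", "breast cancer"], toy_count_hits
)
```

Passing the counter in as a callback keeps the table-building logic independent of any particular database, so a one-dimensional search is simply the same call with a single column term.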

Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases

SeqFindr 0.32.2 : Python Package Index

SeqFindr - easily create informative genomic feature plots. It's a bioinfomagician's tool to detect the presence or absence of genomic features, given a database describing these features and a set of draft and/or complete genomes. We work with bacterial genomes, and as such SeqFindr has only been tested with bacterial genomes.

Bayesian transcriptome assembly

RNA-seq allows for simultaneous transcript discovery and quantification, but reconstructing complete transcripts from such data remains difficult. Here, we introduce the Bayesembler, a novel probabilistic method for transcriptome assembly built on a Bayesian model of the RNA sequencing process. Under this model, samples from the posterior distribution over transcripts and their abundance values are obtained using Gibbs sampling. By using the frequency at which transcripts are observed during sampling to select the final assembly, we demonstrate marked improvements in sensitivity and precision over state-of-the-art assemblers on both simulated and real data. The Bayesembler is available at https://github.com/bioinformatics-centre/bayesembler.
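
The selection step mentioned above, choosing the final assembly from how often each transcript appears across posterior samples, can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the sample sets, the transcript names and the 0.5 cutoff are made up for illustration and are not taken from the Bayesembler.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def inclusion_frequencies(samples: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of samples in which each transcript is present."""
    samples = list(samples)
    counts = Counter(t for sample in samples for t in sample)
    return {t: n / len(samples) for t, n in counts.items()}


def select_assembly(samples: Iterable[Set[str]], min_freq: float = 0.5) -> List[str]:
    """Keep transcripts seen in at least min_freq of the samples (assumed cutoff)."""
    freqs = inclusion_frequencies(samples)
    return sorted(t for t, f in freqs.items() if f >= min_freq)


# Hypothetical output of four sampling iterations: each set lists the
# transcripts present in that sample.
gibbs_samples = [{"tx1", "tx2"}, {"tx1"}, {"tx1", "tx3"}, {"tx1", "tx2"}]
assembly = select_assembly(gibbs_samples)
```

Here `tx1` appears in every sample and `tx2` in half of them, so both pass the assumed cutoff, while `tx3` is dropped.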

The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community.

HPD: an online integrated human pathway database enabling systems biology studies. : Chowbina, Sudhir R

This article is from BMC Bioinformatics, volume 10.

Background: Pathway-oriented experimental and computational studies have led to a significant accumulation of biological knowledge concerning three major types of biological pathway events: molecular signaling events, gene regulation events, and metabolic reaction events. A pathway consists of a series of molecular pathway events that link molecular entities such as proteins, genes, and metabolites. There are approximately 300 biological pathway resources as of April 2009 according to the Pathguide database; however, these pathway databases generally have poor coverage or poor quality, and are difficult to integrate, due to syntactic-level and semantic-level data incompatibilities. Results: We developed the Human Pathway Database (HPD) by integrating heterogeneous human pathway data that are either curated at the NCI Pathway Interaction Database (PID), Reactome, BioCarta, KEGG or indexed from the Protein Lounge Web sites. Integration of pathway data at syntactic, semantic, and schematic levels was based on a unified pathway data model and data warehousing-based integration techniques. HPD provides a comprehensive online view that connects human proteins, genes, RNA transcripts, enzymes, signaling events, metabolic reaction events, and gene regulatory events. At the time of this writing HPD includes 999 human pathways and more than 59,341 human molecular entities. The HPD software provides both a user-friendly Web interface for online use and a robust relational database backend for advanced pathway querying. This pathway tool enables users to 1) search for human pathways from different resources by simply entering genes/proteins involved in pathways or words appearing in pathway names, 2) analyze pathway-protein association, 3) study pathway-pathway similarity, and 4) build integrated pathway networks. We demonstrated the usage and characteristics of the new HPD through three breast cancer case studies. 
Conclusion: HPD (http://bio.informatics.iupui.edu/HPD) is a new resource for searching, managing, and studying human biological pathways. Users of HPD can search against large collections of human biological pathways, compare related pathways and their molecular entity compositions, and build high-quality, expanded-scope disease pathway models. The current HPD software can help users address a wide range of pathway-related questions in human disease biology studies.


Proteome TopFIND 3.0 with TopFINDer and PathFINDer: database and analysis tools for the association of protein termini to pre- and post-translational events

The knowledgebase TopFIND is an analysis platform focussed on protein termini, their origin, modification and hence their role in protein structure and function. Here, we present a major update to TopFIND, version 3, which includes a 70% increase in the underlying data to now cover 90 696 proteins, 165 044 N-termini, 130 182 C-termini, 14 382 cleavage sites and 33 209 substrate cleavages in H. sapiens, M. musculus, A. thaliana, S. cerevisiae and E. coli. New features include the mapping of protein termini and cleavage entries across protein isoforms and, significantly, the mapping of protein termini originating from alternative transcription and alternative translation start sites. Furthermore, two analysis tools for complex data analysis based on the TopFIND resource are now available online: TopFINDer, the TopFIND ExploRer, characterizes and annotates proteomics-derived N- or C-termini sets for their origin, sequence context and implications for protein structure and function. Neo-termini are also linked to associated proteases. PathFINDer identifies indirect connections between a protease and a list of substrates or termini, thus supporting the evaluation of complex proteolytic processes in vivo. To demonstrate the utility of the tools, a recent N-terminomics data set of inflamed murine skin has been re-analyzed. Recapitulating the major findings of the original manual analysis validates the utility of these new resources. The point of entry for the resource is http://clipserve.clip.ubc.ca/topfind, from where the graphical interface, all application programming interfaces (API) and the analysis tools are freely accessible.
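
One way to picture an indirect connection between a protease and a substrate is as a path through a directed network of cleavage relationships. The sketch below uses a plain breadth-first search over a made-up network; the node names, the edge meaning and the `find_path` function are hypothetical, and this is not a description of how PathFINDer itself is implemented.

```python
from collections import deque
from typing import Dict, List, Optional


def find_path(
    network: Dict[str, List[str]], start: str, target: str
) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of edges from start to target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in network.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection found


# Hypothetical edges: "A -> B" means protease A cleaves B.
toy_network = {
    "ProteaseA": ["InhibitorX", "ProteaseB"],
    "ProteaseB": ["SubstrateY"],
    "InhibitorX": [],
}
path = find_path(toy_network, "ProteaseA", "SubstrateY")
```

In this toy network `ProteaseA` does not cleave `SubstrateY` directly, but the search finds the two-step route through `ProteaseB`.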

DECIPHER (DatabasE of Genomic variants and Phenotype in Humans Using Ensembl Resources) Plugin | BioGPS

DECIPHER (DatabasE of Genomic variants and Phenotype in Humans Using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants. DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation. https://decipher.sanger.ac.uk/

DupChecker: a bioconductor package for checking high-throughput genomic data redundancy in meta-analysis

Abstract
Background

Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when using publicly available databases for meta-analysis, duplication of samples is a frequently encountered problem, especially for gene expression data. Not removing duplicates could lead to false-positive findings, misleading clustering patterns or model over-fitting in the subsequent data analysis.
Results

We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real-data example demonstrates the usage and output of the package.
Conclusions

Researchers may not pay enough attention to checking and removing duplicated samples, and the resulting data contamination could make the results or conclusions of a meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.
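
The MD5 fingerprinting idea from the Results above can be sketched in a few lines. DupChecker itself is an R/Bioconductor package; the Python below is only an illustration of the principle, using the standard `hashlib` module, and the file names and byte contents are made up.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def md5_fingerprint(raw: bytes) -> str:
    """Hex MD5 digest of a raw data file's bytes."""
    return hashlib.md5(raw).hexdigest()


def find_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose contents share the same MD5 fingerprint."""
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[md5_fingerprint(content)].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical raw files from two studies; two of them are byte-identical.
toy_files = {
    "study1/sampleA.CEL": b"raw intensities 1 2 3",
    "study2/sample7.CEL": b"raw intensities 1 2 3",
    "study2/sample8.CEL": b"raw intensities 4 5 6",
}
duplicates = find_duplicates(toy_files)
```

Because the fingerprint depends only on file content, the same sample is caught even when it is deposited under different names in different studies. Files that differ by a single byte get different digests, so this approach only detects exact copies.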

Welcome to Tomato Genomic Resources Database: Home Page

Tomato Genomic Resources Database: An Integrated Repository of Useful Tomato Genomic Information for Basic and Applied Research.

Eggplant Genome DataBase

Eggplant Genome Database at Kazusa DNA Research Institute.
Biswapriya Biswavas Misra's insight:
The eggplant (Solanum melongena L.) is one of the most important vegetable crop species in Japan as well as in other Asian, Middle and Near Eastern, Mediterranean and African countries. Eggplant belongs to the Solanaceae family, which includes tomato, potato and pepper, but unlike these relatives it is native to the Old World. To bring this unique solanaceous species onto the genomics stage and give it a key role in molecular genetic and physiological studies, whole-genome sequencing of eggplant was carried out to construct a draft genome dataset.

Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

Fiona: a parallel and automatic strategy for read error correction

RT @druvus: Fiona: a parallel and automatic strategy for read error correction http://t.co/voU7wksk48

Via Mel Melendrez-Vallard
Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

PRISE2: Software for designing sequence-selective PCR primers and probes

Background:
PRISE2 is a new software tool for designing sequence-selective PCR primers and probes.

Via Mel Melendrez-Vallard
Rescooped by Biswapriya Biswavas Misra from Virology and Bioinformatics from Virology.ca

AliView: a fast and lightweight alignment viewer and editor for large data sets

AliView is an alignment viewer and editor designed to meet the requirements of next-generation sequencing era phylogenetic datasets. AliView handles alignments of unlimited size in the formats most commonly used, i.e. FASTA, Phylip, Nexus, Clustal and MSF. The intuitive graphical interface makes it easy to inspect, sort, delete, merge and realign sequences as part of the manual filtering process of large datasets. AliView also works as an easy-to-use alignment editor for small as well as large datasets.
Availability and implementation: AliView is released as open-source software under the GNU General Public License, version 3.0 (GPLv3), and is available at GitHub (www.github.com/AliView). The program is cross-platform and extensively tested on Linux, Mac OS X and Windows systems. Downloads and help are available at http://ormbunkar.se/aliview
Contact: anders.larsson@ebc.uu.se
Supplementary information: Supplementary data are available at Bioinformatics online.


Via Chris Upton + helpers

The big data challenges of connectomics

The structure of the nervous system is extraordinarily complicated because individual neurons are interconnected to hundreds or even thousands of other cells in networks that can extend over large volumes. Mapping such networks at the level of synaptic connections, a field called connectomics, began in the 1970s with the study of the small nervous system of a worm and has recently garnered general interest thanks to technical and computational advances that automate the collection of electron-microscopy data and offer the possibility of mapping even large mammalian brains. However, modern connectomics produces 'big data', unprecedented quantities of digital information at unprecedented rates, and will require, as with genomics at the time, breakthrough algorithmic and computational solutions. Here we describe some of the key difficulties that may arise and provide suggestions for managing them.
The i5k Workspace@NAL—enabling genomic data access, visualization and curation of arthropod genomes

The 5000 arthropod genomes initiative (i5k) has tasked itself with coordinating the sequencing of 5000 insect or related arthropod genomes. The resulting influx of data, mostly from small research groups or communities with little bioinformatics experience, will require visualization, dissemination and curation, preferably from a centralized platform. The National Agricultural Library (NAL) has implemented the i5k Workspace@NAL (http://i5k.nal.usda.gov/) to help meet the i5k initiative's genome hosting needs. Any i5k member is encouraged to contact the i5k Workspace with their genome project details. Once submitted, new content will be accessible via organism pages, genome browsers and BLAST search engines, which are implemented via the open-source Tripal framework, a web interface for the underlying Chado database schema. We also implement the Web Apollo software for groups that choose to curate gene models. New content will add to the existing body of 35 arthropod species, which include species relevant for many aspects of arthropod genomic research, including agriculture, invasion biology, systematics, ecology and evolution, and developmental research.


ProteoAnnotator - Open Source Proteogenomics Annotation Software Supporting PSI Standards.

The recent massive increase in capability for sequencing genomes is producing enormous advances in our understanding of biological systems. However, there is a bottleneck in genome annotation - determining the structure of all transcribed genes. Experimental data from MS studies can play a major role in confirming and correcting gene structure - proteogenomics. However, there are some technical and practical challenges to overcome, since proteogenomics requires pipelines comprising a complex set of interconnected modules as well as bespoke routines, for example in protein inference and statistics. We are introducing a complete, open source pipeline for proteogenomics, called ProteoAnnotator, which incorporates a graphical user interface and implements the Proteomics Standards Initiative mzIdentML standard for each analysis stage. All steps are included as standalone modules with the mzIdentML library, allowing other groups to re-use the whole pipeline or constituent parts within other tools. We have developed new modules for pre-processing and combining multiple search databases, for performing peptide-level statistics on mzIdentML files, for scoring grouped protein identifications matched to a given genomic locus to validate that updates to the official gene models are statistically sound, and for mapping end results back onto the genome. ProteoAnnotator is available from http://www.proteoannotator.org/. This article is protected by copyright. All rights reserved.

The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
Biswapriya Biswavas Misra's insight:

The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.


XGlycScan: An Open-source Software For N-linked Glycosite Assignmen... - PubMed - NCBI

Mass spectrometry-based glycoproteomics has become a major means of identifying and characterizing previously N-linked glycan-attached loci (glycosites). In the bottom-up approach, several factors, including but not limited to sample preparation, mass spectrometry analyses, and protein sequence database searches, result in previously N-linked peptide spectrum matches (PSMs) of varying lengths. Given that multiple PSMs can map to a glycosite, we reason that identified PSMs are varying-length peptide species of a unique set of glycosites. Because the associated spectra of these PSMs are typically summed separately, true glycosite-associated spectra counts are lost or complicated. Also, these varying-length peptide species complicate protein inference, as smaller peptide sequences are more likely to map to more proteins than larger peptides or actual glycosite sequences. Here, we present XGlycScan. XGlycScan maps varying-length peptide species to glycosites to facilitate an accurate quantification of glycosite-associated spectra counts. We observed that this reduced the variability in reported identifications of mass spectrometry technical replicates of our sample dataset. We also observed that mapping identified peptides to glycosites provided an assessment of search-engine identification. Inherently, XGlycScan-reported glycosites reduce the complexity in protein inference. We implemented XGlycScan in the platform-independent Java programming language and have made it available as open source. XGlycScan's source code is freely available at https://bitbucket.org/paiyetan/xglycscan/src and its compiled binaries and documentation can be freely downloaded at https://bitbucket.org/paiyetan/xglycscan/downloads. The source code and downloads for the graphical user interface version can be found at https://bitbucket.org/paiyetan/xglycscangui/src and https://bitbucket.org/paiyetan/xglycscangui/downloads, respectively.
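
The idea of collapsing varying-length peptides onto shared glycosites can be illustrated with a minimal Python sketch. This is not XGlycScan's code (which is written in Java) and makes several assumptions not stated in the abstract: glycosites are taken to be the asparagine of an N-X-S/T sequon (X not proline), a peptide counts toward a site if it covers that asparagine, and only the first occurrence of a peptide in the protein is considered. The function names and the toy sequence are hypothetical.

```python
import re
from collections import defaultdict

# Assumed site definition: N-X-S/T sequon, X any residue except proline.
# The lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosites(protein):
    """Return 1-based positions of the asparagine in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def count_spectra_per_glycosite(protein, psms):
    """Sum spectral counts over all peptides that cover each glycosite.

    psms is an iterable of (peptide_sequence, spectral_count) pairs.
    Simplification: only the first occurrence of a peptide is used.
    """
    sites = glycosites(protein)
    counts = defaultdict(int)
    for peptide, n_spectra in psms:
        start = protein.find(peptide)  # 0-based start, -1 if absent
        if start < 0:
            continue
        end = start + len(peptide)  # 0-based, exclusive
        for site in sites:
            if start < site <= end:  # site is 1-based
                counts[site] += n_spectra
    return dict(counts)


# Hypothetical toy protein with sequons at positions 4 and 11.
protein = "MKANGSLLQRNVTAPK"
psms = [("ANGSLLQR", 3), ("KANGSLLQR", 2), ("NVTAPK", 4)]
site_counts = count_spectra_per_glycosite(protein, psms)
print(site_counts)
```

Here the two peptides of different length that cover the same asparagine contribute to one combined count instead of being reported as separate identifications.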


canSAR: updated cancer research and drug discovery knowledgebase.

Nucleic Acids Res. 2014 Jan;42(Database issue):D1040-7. doi: 10.1093/nar/gkt1182. Epub 2013 Dec 3. Research Support, Non-U.S. Gov't
Biswapriya Biswavas Misra's insight:

canSAR (http://cansar.icr.ac.uk) is a public integrative cancer-focused knowledgebase for the support of cancer translational research and drug discovery. Through the integration of biological, pharmacological, chemical, structural biology and protein network data, it provides a single information portal to answer complex multidisciplinary questions including--among many others--what is known about a protein, in which cancers is it expressed or mutated, and what chemical tools and cell line models can be used to experimentally probe its activity? What is known about a drug, its cellular sensitivity profile and what proteins is it known to bind that may explain unusual bioactivity? Here we describe major enhancements to canSAR including new data, improved search and browsing capabilities and new target, cancer cell line, protein family and 3D structure summaries and tools.


Investigation of therapeutic effectiveness of active components in Sini decoction by a comprehensive GC/LC-MS based metabolomics and network pharmacology approaches

As a classical formula, Sini decoction (SND) has been fully proved to be clinically effective in treating doxorubicin (DOX)-induced cardiomyopathy. Current chemomics and pharmacology proved that the total alkaloids (TA), total gingerols (TG), and total flavones and total saponins (TFS) are the major active ingredients of Aconitum carmichaeli, Zingiber officinale and Glycyrrhiza uralensis in SND, respectively. Our animal experiments in this study demonstrated that the above active ingredients (TAGFS) were more effective than formulas formed by any one or two of the three individual components and nearly as effective as SND. However, very little is known about the action mechanisms of TAGFS. Thus, this study aimed to use, for the first time, the combination of GC/LC-MS-based metabolomics and network pharmacology to solve this problem. By metabolomics, it was found that TAGFS worked by regulating six primary pathways. Then, network pharmacology was applied to search for specific targets. Seventeen potential cardiovascular-related targets were found through molecular docking, 11 of which were identified by references, which demonstrated the therapeutic effectiveness of TAGFS by network pharmacology. Among these targets, four targets, namely phosphoinositide 3-kinase gamma, insulin receptor, ornithine aminotransferase and glucokinase, were involved in the pathways TAGFS regulated. What is more, phosphoinositide 3-kinase gamma, insulin receptor and glucokinase were proved to be targets of active components in SND. In addition, our data indicated TA as the principal ingredient in the SND formula, whereas TG and TFS served as adjuvant ingredients. We therefore suggest that dissecting the mode of action of clinically effective formulae with the combined use of metabolomics and network pharmacology may be a good strategy in exploring action mechanisms of Traditional Chinese Medicine.


Multicriteria global optimization for biocircuit design

Abstract
Background

One of the challenges in Synthetic Biology is to design circuits with increasing levels of complexity. While circuits in Biology are complex and subject to natural tradeoffs, most synthetic circuits are simple in terms of the number of regulatory regions, and have been designed to meet a single design criterion.
Results

In this contribution we introduce a multiobjective formulation for the design of biocircuits. We set up the basis for an advanced optimization tool for the modular and systematic design of biocircuits capable of handling high levels of complexity and multiple design criteria. Our methodology combines the efficiency of global Mixed Integer Nonlinear Programming solvers with multiobjective optimization techniques. Through a number of examples we show the capability of the method to generate non-intuitive designs with a desired functionality, setting the desired level of complexity a priori.
Conclusions

The methodology presented here can be used for biocircuit design and also to explore and identify different design principles for synthetic gene circuits. The presence of more than one competing objective provides a realistic design setting where every solution represents an optimal trade-off between different criteria.
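
To make "optimal trade-off" concrete, here is a minimal Python sketch of Pareto (non-dominated) filtering for two objectives that are both minimized. It is only an illustration of the trade-off concept; it is not the authors' Mixed Integer Nonlinear Programming approach, and the candidate designs and their objective values are invented for the example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical designs scored as (objective 1, objective 2), both to minimize.
designs = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (4.0, 1.0), (5.0, 2.0)]
front = pareto_front(designs)
print(front)
```

Each design left in `front` can only improve one objective by giving up some of the other, which is the sense in which every solution is an optimal trade-off.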


GOLD | Home

GOLD (Genomes Online Database) is a World Wide Web resource for comprehensive access to information regarding genome and metagenome sequencing projects, and their associated metadata, around the world.


MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations

We have optimized and extended the widely used annotation engine MAKER to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, ncRNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software toolkit, MAKER-P, using the A. thaliana and Z. mays genomes. Here we demonstrate the ability of the MAKER-P toolkit to automatically update, extend, and revise the A. thaliana annotations in light of newly available data, and to annotate pseudogenes and ncRNAs absent from the TAIR10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even A. thaliana, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center (TACC). We show that this public resource can de novo annotate the entire Arabidopsis and Zea mays genomes in less than three hours, and produce annotations of comparable quality to those of the current TAIR10 and Z. mays V2 annotation builds.

Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme

High-throughput transcriptome sequencing (RNA-seq) technology promises to discover novel protein-coding and non-coding transcripts, particularly the identification of long non-coding RNAs (lncRNAs) from de novo sequencing data.

Via Mel Melendrez-Vallard