Databases & Softwares
Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares

RNIE: genome-wide prediction of bacterial ... [Nucleic Acids Res. 2011] - PubMed - NCBI

Abstract

Bacterial Rho-independent terminators (RITs) are important genomic landmarks involved in gene regulation and terminating gene expression. In this investigation we present RNIE, a probabilistic approach for predicting RITs. The method is based upon covariance models, which have been known for many years to be the most accurate computational tools for predicting homology in structural non-coding RNAs. We show that RNIE has superior performance in model species from a spectrum of bacterial phyla. Further analysis of species where a low number of RITs were predicted revealed a highly conserved structural sequence motif enriched near the genic termini of the pathogenic actinobacterium Mycobacterium tuberculosis. This motif, together with classical RITs, accounts for up to 90% of all the significantly structured regions from the termini of M. tuberculosis genic elements. The software, predictions and alignments described in the paper are available from http://github.com/ppgardne/RNIE.

Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases
Rescooped by Biswapriya Biswavas Misra from Plant Genomics

Updates in Metabolomics Tools and Resources: 2014–2015 - Misra - ELECTROPHORESIS


Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platform (mass spectrometry [MS] or nuclear magnetic resonance spectroscopy [NMR]-based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex data sets, which create the need for more and better processing and analysis software and in-silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources, in the form of tools, software, and databases, is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS- and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.


Via Biswapriya B Misra, Biswapriya Biswavas Misra
Biswapriya B Misra's curator insight, October 25, 2015 1:40 PM
Keywords:
Annotation, Databases, Data analysis, Data processing, Data visualization, Mass spectrometry, Metabolites, Metabolomics, NMR, Statistics, Software tools


MTD: a mammalian transcriptomic database to explore gene expression and regulation

Biswapriya Biswavas Misra's insight:

A systematic transcriptome survey is essential for the characterization and comprehension of the molecular basis underlying phenotypic variations. Recently developed RNA-seq methodology has facilitated efficient data acquisition and information mining of transcriptomes in multiple tissues/cell lines. Current mammalian transcriptomic databases are either tissue-specific or species-specific, and they lack in-depth comparative features across tissues and species. Here, we present a mammalian transcriptomic database (MTD) that is focused on mammalian transcriptomes, and the current version contains data from humans, mice, rats and pigs. Regarding the core features, the MTD browses genes based on their neighboring genomic coordinates or joint KEGG pathway and provides expression information on exons, transcripts and genes by integrating them into a genome browser. We developed a novel nomenclature for each transcript that considers its genomic position and transcriptional features. The MTD allows a flexible search of genes or isoforms with user-defined transcriptional characteristics and provides both table-based descriptions and associated visualizations. To elucidate the dynamics of gene expression regulation, the MTD also enables comparative transcriptomic analysis both within and between species. The MTD thus constitutes a valuable resource for transcriptomic and evolutionary studies. The MTD is freely accessible at http://mtd.cbi.ac.cn.


OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis

Biswapriya Biswavas Misra's insight:
Abstract
Background

Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences.

Results

We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests.

Conclusions

We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6−2 times more sensitive) and are more efficient (170−200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.


FRAMA: from RNA-seq data to annotated mRNA assemblies

BMC Genomics is an open access journal publishing original peer-reviewed research articles in all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work. BMC series - open, inclusive and trusted.
Biswapriya Biswavas Misra's insight:
Background

Advances in second-generation sequencing of RNA made a near-complete characterization of transcriptomes affordable. However, the reconstruction of full-length mRNAs via de novo RNA-seq assembly is still difficult due to the complexity of eukaryote transcriptomes with highly similar paralogs and multiple alternative splice variants. Here, we present FRAMA, a genome-independent annotation tool for de novo mRNA assemblies that addresses several post-assembly tasks, such as reduction of contig redundancy, ortholog assignment, correction of misassembled transcripts, scaffolding of fragmented transcripts and coding sequence identification.

Results

We applied FRAMA to assemble and annotate the transcriptome of the naked mole-rat and assess the quality of the obtained compilation of transcripts with the aid of publicly available naked mole-rat gene annotations.

Based on a de novo transcriptome assembly (Trinity), FRAMA annotated 21,984 naked mole-rat mRNAs (12,100 full-length CDSs), corresponding to 16,887 genes. The scaffolding of 3488 genes increased the median sequence information 1.27-fold. In total, FRAMA detected and corrected 4774 misassembled genes, which were predominantly caused by fusion of genes. A comparison with three different sources of naked mole-rat transcripts reveals that FRAMA’s gene models are better supported by RNA-seq data than any other transcript set. Further, our results demonstrate the competitiveness of FRAMA with state-of-the-art genome-based transcript reconstruction approaches.

Conclusion

FRAMA realizes the de novo construction of a low-redundant transcript catalog for eukaryotes, including the extension and refinement of transcripts. Thereby, results delivered by FRAMA provide the basis for comprehensive downstream analyses like gene expression studies or comparative transcriptomics. FRAMA is available at https://github.com/gengit/FRAMA.

 

Neptune: A Tool for Rapid Microbial Genomic Signature Discovery

Biswapriya Biswavas Misra's insight:

Neptune locates genomic signatures using an exact k-mer matching strategy while accommodating k-mer mismatches. The software identifies sequences that are sufficiently represented within "inclusion targets" and sufficiently absent from "exclusion targets". The signature discovery process is accomplished using probabilistic models instead of heuristic strategies. We have evaluated Neptune on Listeria monocytogenes and Escherichia coli genome data sets and found that signatures identified from these experiments are sensitive and specific to their respective data sets. In addition, the identified loci provide a catalog of differential loci for research of group-specific traits. Neptune has broad implications in microbial characterization for public health applications due to its efficient ad hoc signature discovery based upon differential genomics.
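The inclusion/exclusion idea in this abstract can be illustrated with a small set-based sketch. This is not Neptune's implementation: it uses exact k-mer matching only, replaces the probabilistic models with two fixed fraction thresholds, and every name in it (`kmers`, `candidate_kmers`, `min_inclusion_fraction`, `max_exclusion_fraction`) is hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_kmers(inclusion, exclusion, k,
                    min_inclusion_fraction=1.0,
                    max_exclusion_fraction=0.0):
    """Keep k-mers present in enough inclusion sequences and in few
    enough exclusion sequences (illustrative thresholds, not Neptune's
    probabilistic criteria)."""
    inc_counts = Counter()
    for seq in inclusion:
        inc_counts.update(kmers(seq, k))  # count each k-mer once per sequence

    exc_counts = Counter()
    for seq in exclusion:
        exc_counts.update(kmers(seq, k))

    selected = set()
    for kmer, n in inc_counts.items():
        inc_frac = n / len(inclusion)
        exc_frac = exc_counts[kmer] / len(exclusion) if exclusion else 0.0
        if inc_frac >= min_inclusion_fraction and exc_frac <= max_exclusion_fraction:
            selected.add(kmer)
    return selected


if __name__ == "__main__":
    inclusion = ["ACGTACGT", "TTACGTAA"]
    exclusion = ["CCACGTCC"]
    print(sorted(candidate_kmers(inclusion, exclusion, k=4)))
```

With the toy sequences above, `ACGT` is shared by both inclusion sequences but also occurs in the exclusion sequence, so only the remaining shared 4-mers are kept.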


Integrated analysis of shotgun proteomic data with PatternLab for proteomics 4.0

Biswapriya Biswavas Misra's insight:

PatternLab for proteomics is an integrated computational environment that unifies several previously published modules for the analysis of shotgun proteomic data. The contained modules allow for formatting of sequence databases, peptide spectrum matching, statistical filtering and data organization, extracting quantitative information from label-free and chemically labeled data, and analyzing statistics for differential proteomics. PatternLab also has modules to perform similarity-driven studies with de novo sequencing data, to evaluate time-course experiments and to highlight the biological significance of data with regard to the Gene Ontology database. The PatternLab for proteomics 4.0 package brings together all of these modules in a self-contained software environment, which allows for complete proteomic data analysis and the display of results in a variety of graphical formats. All updates to PatternLab, including new features, have been previously tested on millions of mass spectra. PatternLab is easy to install, and it is freely available from http://patternlabforproteomics.org.


GenomicInteractions: An R/Bioconductor package for manipulating and investigating chromatin interaction data

Biswapriya Biswavas Misra's insight:
Abstract
Background

Precise quantitative and spatiotemporal control of gene expression is necessary to ensure proper cellular differentiation and the maintenance of homeostasis. The relationship between gene expression and the spatial organisation of chromatin is highly complex, interdependent and not completely understood. The development of experimental techniques to interrogate both the higher-order structure of chromatin and the interactions between regulatory elements has recently led to important insights into how gene expression is controlled. The ability to gain these and future insights is critically dependent on computational tools for the analysis and visualisation of data produced by these techniques.

Results and conclusion

We have developed GenomicInteractions, a freely available R/Bioconductor package designed for processing, analysis and visualisation of data generated from various types of chromosome conformation capture experiments. The package allows the easy annotation and summarisation of large genome-wide datasets at both the level of individual interactions and sets of genomic features, and provides several different methods for interrogating and visualising this type of data. We demonstrate this package’s utility by showing example analyses performed on interaction datasets generated using Hi-C and ChIA-PET.


IRES-dependent translated genes in fungi: computational prediction, phylogenetic conservation and functional association

Biswapriya Biswavas Misra's insight:
Abstract
Background

The initiation of translation via cellular internal ribosome entry sites plays an important role in the stress response and certain physiological conditions in which canonical cap-dependent translation initiation is compromised. Currently, only a limited number of these regulatory elements have been experimentally identified. Notably, cellular internal ribosome entry sites lack conservation of both the primary sequence and mRNA secondary structure, rendering their identification difficult. Despite their biological importance, the currently available computational strategies to predict them have had limited success. We developed a bioinformatic method based on a support vector machine for the prediction of internal ribosome entry sites in fungi using the 5’-UTR sequences of 20 non-redundant fungal organisms. Additionally, we performed a comparative analysis and characterization of the functional relationships among the gene products predicted to be translated by this cap-independent mechanism.

Results

Using our method, we predicted 6,532 internal ribosome entry sites in 20 non-redundant fungal organisms. Some orthologous groups were enriched with our positive predictions. This is the case for the HSP70 chaperone family, which remarkably has two verified internal ribosome entry sites, one in humans and the other in flies. A second example is the orthologous group of the eIF4G repression protein Sbp1p, which has two homologous genes known to be translated by this cap-independent mechanism, one in mice and the other in yeast. These examples emphasize the wide conservation of these regulatory elements as a result of selective pressure. In addition, we performed a protein-protein interaction network characterization of the gene products of our positive predictions using Saccharomyces cerevisiae as a model, which revealed a highly connected and modular topology, suggesting a functional association. A remarkable example of this functional association is our prediction of internal ribosome entry site elements in three components of the RNA polymerase II mediator complex.

Conclusions

We developed a method for the prediction of cellular internal ribosome entry sites that may guide experimental and bioinformatic analyses to increase our understanding of protein translation regulation. Our analysis suggests that fungi show evolutionary conservation and functional association of proteins translated by this cap-independent mechanism.

 

De novo transcriptome sequence and identification of major bast-related genes involved in cellulose biosynthesis in jute (Corchorus capsularis L.)

Biswapriya Biswavas Misra's insight:
Abstract
Background

Jute fiber, extracted from stem bast, is called golden fiber. Discovering the genes associated with jute development at the vegetative growth stage is essential for fiber improvement. However, only 858 EST sequences of jute were deposited in the GenBank database. The publicly available data are far from sufficient to understand the molecular mechanism of fiber biosynthesis. It is therefore imperative to conduct transcriptome sequencing for jute, which can be used to discover new genes, especially genes involved in cellulose biosynthesis.

Results

A total of 79,754,600 clean reads (7.98 Gb) were generated using Illumina paired-end sequencing. De novo assembly yielded 48,914 unigenes with an average length of 903 bp. By sequence similarity searching for known proteins, 27,962 (57.16 %) unigenes were annotated for their function. Out of these annotated unigenes, 21,856 and 11,190 unigenes were assigned to gene ontology (GO) and euKaryotic Ortholog Groups (KOG), respectively. Searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) indicated that 14,216 unigenes were mapped to 268 KEGG pathways. Moreover, 5 SuSy, 3 UGPase, 9 CesA, 18 CSL, 2 Kor (Korrigan), and 12 Cobra unigenes involved in cellulose biosynthesis were identified. Among these, comp11264_c0 (SuSy), comp24568_c0 (UGPase), comp11363_c0 (CesA), comp11363_c1 (CesA), comp24217_c0 (CesA), and comp23531_c0 (CesA) displayed relatively high expression levels in stem bast as measured by FPKM and RT-qPCR, indicating that they may be valuable for dissecting the mechanism of cellulose biosynthesis in jute. In addition, a total of 12,518 putative gene-associated SNPs were called from these assembled unigenes.

Conclusion

We characterized the transcriptome of jute, identified a broad set of unigenes associated with vegetative growth and development, developed large-scale SNPs, and analyzed the expression patterns of genes involved in cellulose biosynthesis for bast fiber. Together, these results provide a valuable genomics resource that will accelerate the understanding of the mechanism of fiber development in jute.


Leveraging global gene expression patterns to predict expression of unmeasured genes

Biswapriya Biswavas Misra's insight:
Abstract
Background

Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes.

Results

We developed a greedy gene set selection (GGS) algorithm which returns a DM set of user specified size based on a specific correlation threshold (|rP|) and minimum number of DM genes that must be correlated to an unmeasured gene in order to infer the value of the unmeasured gene (redundancy). We evaluated GGS in the Cancer Genome Atlas (TCGA) HGSC data across 144 combinations of DM size, redundancy (1–3), and |rP| (0.60, 0.65, 0.70). Across the parameter sweep, GGS allows on average 9 times more gene expression information to be captured compared to the DM set alone. GGS successfully augments prognostic HGSC gene sets; the addition of 20 GGS selected genes more than doubles the number of genes whose expression is predictable. Moreover, the expression prediction is highly accurate. After training regression models for the predictable gene set using 2/3 of the TCGA data, the average accuracy (ranked correlation of true and predicted values) in the 1/3 testing partition and four independent populations is above 0.65 and approaches 0.8 for conservative parameter sets. We observe similar accuracies in the TCGA HGSC RNA-sequencing data. Specifically, the prediction accuracy increases with increasing redundancy and increasing |rP|.

Conclusions

GGS-selected genes, which maximize expression information about unmeasured genes, can be combined with candidate gene sets as a cost effective way to increase the amount of gene expression information obtained in large studies. This method can be applied to any organism, model system, disease, or tissue type for which whole genome gene expression data exists.
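The greedy selection described in the Results can be sketched as a coverage problem. The abstract does not give the selection criterion, so the scoring rule below (capped support summed over unmeasured genes) is an assumption, as are the function name `greedy_gene_selection`, its parameters, and the toy correlation matrix; it is an illustration of the idea rather than the published GGS algorithm.

```python
def greedy_gene_selection(corr, dm_size, threshold=0.6, redundancy=1):
    """Greedily pick dm_size directly measured (DM) genes.

    corr is a symmetric matrix (list of lists) of pairwise correlations.
    An unmeasured gene counts as predictable once at least `redundancy`
    selected genes correlate with it at |r| >= threshold.
    """
    n = len(corr)
    # linked[g]: other genes whose |correlation| with g meets the threshold
    linked = [
        {h for h in range(n) if h != g and abs(corr[g][h]) >= threshold}
        for g in range(n)
    ]
    selected = []
    support = [0] * n  # number of selected genes linked to each gene

    def score_with(candidate):
        # Assumed objective: capped support summed over unmeasured genes.
        chosen = set(selected) | {candidate}
        total = 0
        for g in range(n):
            if g in chosen:
                continue
            s = support[g] + (1 if g in linked[candidate] else 0)
            total += min(s, redundancy)
        return total

    for _ in range(min(dm_size, n)):
        remaining = [g for g in range(n) if g not in selected]
        # Ties are broken in favour of the lowest gene index.
        best = max(remaining, key=lambda g: (score_with(g), -g))
        selected.append(best)
        for h in linked[best]:
            support[h] += 1

    predictable = {
        g for g in range(n)
        if g not in selected and support[g] >= redundancy
    }
    return selected, predictable


if __name__ == "__main__":
    # Toy data: gene 0 correlates strongly with genes 1, 2 and 3.
    corr = [
        [1.0, 0.8, 0.7, 0.9, 0.1],
        [0.8, 1.0, 0.2, 0.1, 0.0],
        [0.7, 0.2, 1.0, 0.3, 0.1],
        [0.9, 0.1, 0.3, 1.0, 0.2],
        [0.1, 0.0, 0.1, 0.2, 1.0],
    ]
    print(greedy_gene_selection(corr, dm_size=1))
```

Capping the support at `redundancy` keeps the first picks informative when `redundancy` is greater than 1, where a plain count of already-predictable genes would be zero for every candidate.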


Neptune: A Tool for Rapid Genomic Signature Discovery

bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution
Biswapriya Biswavas Misra's insight:

Neptune locates genomic signatures using an exact k-mer matching strategy while accommodating k-mer mismatches. The software identifies sequences that are sufficiently represented within inclusion targets and sufficiently absent from exclusion targets. The signature discovery process is accomplished using probabilistic models instead of heuristic strategies. We have evaluated Neptune on Listeria monocytogenes and Escherichia coli data sets and found that signatures identified from these experiments are highly sensitive and specific to their respective data sets. Neptune has broad implications in bacterial characterization for public health applications due to its efficient signature discovery based upon differential genomics. In addition, the identified loci may also provide a source material for research leading to investigations of group-specific traits.


A survey of computational tools for downstream analysis of proteomic and other omic datasets

Proteomics is an expanding area of research into biological systems with significance for biomedical and therapeutic applications ranging from understanding the molecular basis of diseases to testing new treatments, studying the toxicity of drugs, or biotechnological improvements in agriculture. Progress in proteomic technologies and growing interest have resulted in rapid accumulation of proteomic data, and consequently, a great number of tools have become available. In this paper, we review the well-known and ready-to-use tools for classification, clustering and validation, interpretation, and generation of biological information from experimental data. We suggest some rules of thumb for the reader on choosing the most suitable learning method for a particular dataset and conclude with pathway and functional analysis and then provide information about submitting final results to a repository.

TransVar: a multilevel variant annotator for precision genomics

Biswapriya Biswavas Misra's insight:

To facilitate standardization and reveal inconsistencies in existing variant annotations, we have designed a novel variant annotator, TransVar (http://www.transvar.net), to perform three main functions supporting diverse reference genomes and transcript databases (Fig. 1a): (i) forward annotation, which annotates all potential effects of a genomic variant on mRNAs and…


Seqping: Gene Prediction Pipeline for Plant Genomes using Self-Trained Gene Models and Transcriptomic Data

bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution
Biswapriya Biswavas Misra's insight:

Although various software tools are available for gene prediction, none of the currently available gene finders has a universal hidden Markov model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion. Here, we report an automated pipeline that performs gene prediction using self-trained HMMs and transcriptomic data. The program processes the genome and transcriptome sequences of a target species through a GlimmerHMM, SNAP, and AUGUSTUS training pipeline that ends with the program MAKER2 combining the predictions from the three models in association with the transcriptomic evidence. The pipeline generates species-specific HMMs and is able to predict genes that are not biased to other model organisms. Our evaluation of the program revealed that it performed better than the use of the closest related HMM from a standalone program.


BISQUE 1.0 - a human Genomic, Proteomic, and Transcriptomic Conversion tool

Biswapriya Biswavas Misra's insight:

BISQUE (The Biological Sequence Exchange) is a bioinformatics tool enabling locus and variant-specific conversion among human gene, transcript, and protein identifiers from several popular databases.

Scooped by Biswapriya Biswavas Misra

TROM: A Testing-based Method for Finding Transcriptomic Similarity of Biological Samples

Biswapriya Biswavas Misra's insight:

Comparative transcriptomics has gained increasing popularity in genomic research thanks to the development of high-throughput technologies, including microarrays and next-generation RNA sequencing, that have generated large amounts of transcriptomic data. An important question is to understand the conservation and differentiation of biological processes in different species. We propose a testing-based method, TROM (Transcriptome Overlap Measure), for comparing transcriptomes within or between different species, and provide a different perspective to interpret transcriptomic similarity in contrast to traditional correlation analyses. Specifically, the TROM method focuses on identifying associated genes that capture molecular characteristics of biological samples, and subsequently comparing the biological samples by testing the overlap of their associated genes. We use simulation and real data studies to demonstrate that TROM is more powerful in identifying similar transcriptomes and more robust to stochastic gene expression noise than Pearson and Spearman correlations. We apply TROM to compare the developmental stages of six Drosophila species, C. elegans, S. purpuratus, D. rerio and mouse liver, and find interesting correspondence patterns that imply conserved gene expression programs in the development of these species. The TROM method is available as an R package on CRAN, with manuals and source code available at jingyi.li/software-and-data/trom.html.
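
The abstract does not say which statistical test TROM applies to the overlap of associated genes. One common choice for this kind of question is an upper-tail hypergeometric test, and the sketch below assumes that choice. The function name, the gene identifiers, and the population size are illustrative and do not come from the TROM package.

```python
from math import comb


def overlap_pvalue(genes_a, genes_b, population_size):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    Assumption: both sets are drawn from the same background of
    `population_size` genes (for example, all genes measured in both samples).
    """
    a, b = set(genes_a), set(genes_b)
    if max(len(a), len(b)) > population_size:
        raise ValueError("gene sets cannot be larger than the population")
    observed = len(a & b)
    largest_possible = min(len(a), len(b))
    # P(X >= observed) when len(b) genes are drawn without replacement
    # from a population that contains len(a) "marked" genes.
    tail = sum(
        comb(len(a), i) * comb(population_size - len(a), len(b) - i)
        for i in range(observed, largest_possible + 1)
    )
    return tail / comb(population_size, len(b))


# Hypothetical associated-gene sets for two samples.
sample_1 = {"g1", "g2", "g3"}
sample_2 = {"g2", "g3", "g4"}
print(overlap_pvalue(sample_1, sample_2, population_size=10))
```

A small p-value means the two samples share more associated genes than expected by chance, which is the sense in which an overlap test can stand in for a correlation coefficient.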

Scooped by Biswapriya Biswavas Misra

svclassify: a method to establish benchmark structural variant calls

BMC Genomics is an open access journal publishing original peer-reviewed research articles in all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work. BMC series - open, inclusive and trusted.
Biswapriya Biswavas Misra's insight:
Abstract
Background

The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives.

Results

We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.

Conclusions

We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7 % concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.

Scooped by Biswapriya Biswavas Misra

Network-based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis

Biswapriya Biswavas Misra's insight:

High-throughput mRNA sequencing (RNA-Seq) is widely used for transcript quantification of gene isoforms. Since RNA-Seq data alone is often not sufficient to accurately identify the read origins from the isoforms for quantification, we propose to explore protein domain-domain interactions as prior knowledge for integrative analysis with RNA-Seq data. We introduce a Network-based method for RNA-Seq-based Transcript Quantification (Net-RSTQ) to integrate protein domain-domain interaction network with short read alignments for transcript abundance estimation. Based on our observation that the abundances of the neighboring isoforms by domain-domain interactions in the network are positively correlated, Net-RSTQ models the expression of the neighboring transcripts as Dirichlet priors on the likelihood of the observed read alignments against the transcripts in one gene. The transcript abundances of all the genes are then jointly estimated with alternating optimization of multiple EM problems. In simulation Net-RSTQ effectively improved isoform transcript quantifications when isoform co-expressions correlate with their interactions. qRT-PCR results on 25 multi-isoform genes in a stem cell line, an ovarian cancer cell line, and a breast cancer cell line also showed that Net-RSTQ estimated more consistent isoform proportions with RNA-Seq data. In the experiments on the RNA-Seq data in The Cancer Genome Atlas (TCGA), the transcript abundances estimated by Net-RSTQ are more informative for patient sample classification of ovarian cancer, breast cancer and lung cancer. All experimental results collectively support that Net-RSTQ is a promising approach for isoform quantification. Net-RSTQ toolbox is available at http://compbio.cs.umn.edu/Net-RSTQ
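
To make the role of the Dirichlet prior concrete, here is a minimal sketch of MAP-EM for the isoform proportions of a single gene. It is not the Net-RSTQ implementation: the prior parameters `alpha` are passed in as fixed numbers, whereas Net-RSTQ derives them from neighboring isoforms in the domain-domain interaction network and alternates over all genes. The sketch also assumes isoforms of equal length and a 0/1 read-compatibility matrix. All names are hypothetical.

```python
def map_em_isoforms(compat, alpha, n_iter=200):
    """Estimate isoform proportions for one gene by MAP-EM.

    compat[r][i] is 1 if read r is compatible with isoform i, else 0.
    alpha[i] is the Dirichlet prior parameter for isoform i
    (alpha[i] == 1 for all i reduces this to plain maximum likelihood).
    """
    n_iso = len(alpha)
    p = [1.0 / n_iso] * n_iso
    for _ in range(n_iter):
        # E-step: share each read among its compatible isoforms.
        expected = [0.0] * n_iso
        for row in compat:
            weights = [p[i] * row[i] for i in range(n_iso)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # read is compatible with no isoform that has mass
            for i in range(n_iso):
                expected[i] += weights[i] / norm
        # M-step: mode of the Dirichlet posterior, clipped at zero.
        post = [max(expected[i] + alpha[i] - 1.0, 0.0) for i in range(n_iso)]
        total = sum(post)
        if total == 0.0:
            break
        p = [x / total for x in post]
    return p


# Four reads that cannot distinguish two isoforms; the prior favors isoform 0.
ambiguous_reads = [[1, 1]] * 4
print(map_em_isoforms(ambiguous_reads, alpha=[3.0, 1.0]))
```

With uninformative reads the estimate is driven by the prior, which illustrates why network-derived priors can help when read alignments alone are ambiguous.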

Scooped by Biswapriya Biswavas Misra

OASIS: web-based platform for exploring cancer multi-omics data : Nature Methods : Nature Publishing Group

Biswapriya Biswavas Misra's insight:

Genomics data are valuable for studying the molecular bases of diseases as well as developing effective treatments. Advances in high-throughput technologies have given rise to a proliferation of cancer genomics data in both academia and industry. There is an ever-increasing demand for broadly available informatics tools integrated with…

Scooped by Biswapriya Biswavas Misra

Comprehensive characterization of a time-course transcriptional response induced by autotoxins in Panax ginseng using RNA-Seq

BMC Genomics is an open access journal publishing original peer-reviewed research articles in all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work. BMC series - open, inclusive and trusted.
Biswapriya Biswavas Misra's insight:
Abstract
Background

As a valuable medicinal plant, the yield of Panax ginseng is seriously affected by autotoxicity, which is a common phenomenon due to continuous cropping. However, the mechanism of autotoxicity in P. ginseng is still unknown.

Results

In total, high-throughput sequencing of 18 RNA-Seq libraries produced 996,000,000 100-nt reads that were assembled into 72,732 contigs. Compared with control, 3697 and 2828 genes were significantly up- and down-regulated across different tissues and time points, respectively. Gene Ontology enrichment analysis showed that ‘enzyme inhibitor activity’, ‘carboxylesterase activity’, ‘pectinesterase activity’, ‘centrosome cycle and duplication’ and ‘mitotic spindle elongation’ were enriched for the up-regulated genes. Transcription factors including AP2s/ERFs, MYBs, and WRKYs were up-regulated in roots after benzoic acid treatment. Moreover, reactive oxygen species, peroxidases and superoxide dismutase contigs were up-regulated in roots after benzoic acid treatment. Physiological and biochemical indexes showed that the proline and malondialdehyde contents were restored to lower levels at a later stage after benzoic acid treatment. Benzoic acid inhibited root hair development in a dose-dependent manner, and several differentially expressed genes potentially involved in hair development were identified. Several key contigs in the flavonoid and ginsenoside biosynthesis pathways were repressed. Finally, 58,518 alternative splicing (AS) events from 12,950 genes were found after benzoic acid treatment. Interestingly, contigs in the ginsenoside biosynthetic pathway underwent AS, providing useful information about post-transcriptional regulation in P. ginseng.

Conclusions

This study revealed the stress-response molecular mechanisms in P. ginseng induced by benzoic acid.

Scooped by Biswapriya Biswavas Misra

ChromContact: A web tool for analyzing spatial contact of chromosomes from Hi-C data

BMC Genomics is an open access journal publishing original peer-reviewed research articles in all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work. BMC series - open, inclusive and trusted.
Biswapriya Biswavas Misra's insight:
Abstract
Background

Hi-C analysis has revealed the three-dimensional architecture of chromosomes in the nucleus. Although Hi-C data contains valuable information on long-range interactions of chromosomes, the data is not yet widely utilized by molecular biologists because of the quantity of data.

Results

We developed a web tool, ChromContact, to utilize the information obtained by Hi-C. The web tool is designed to be simple and easy to use. By specifying a locus of interest, ChromContact calculates contact profiles and generates links to the UCSC Genome Browser, enabling users to visually examine the contact information with various annotations.

Conclusion

ChromContact provides a wide range of molecular biologists with a user-friendly means to access high-resolution Hi-C data. One possible application of ChromContact is investigating novel long-range promoter-enhancer interactions. This facilitates the functional interpretation of statistically significant markers identified by GWAS, or of ChIP-seq peaks, that are located far from any annotated genes. ChromContact is freely accessible at http://bioinfo.sls.kyushu-u.ac.jp/chromcontact/.
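
The abstract does not describe how ChromContact computes a contact profile. As a rough illustration of the idea only, the sketch below assumes a symmetric, binned Hi-C contact matrix for one chromosome and returns the row for the bin that contains the locus of interest, normalized to fractions. The function name, the matrix, and the bin size are made up for the example.

```python
def contact_profile(matrix, bin_size, position):
    """Return (bin_start, contact_fraction) pairs for the bin holding `position`.

    Assumptions: `matrix` is a square list of lists of contact counts for a
    single chromosome, and bin j covers [j * bin_size, (j + 1) * bin_size).
    """
    index = position // bin_size
    if not 0 <= index < len(matrix):
        raise ValueError("position falls outside the contact matrix")
    row = matrix[index]
    total = sum(row)
    return [
        (j * bin_size, (count / total) if total else 0.0)
        for j, count in enumerate(row)
    ]


# Toy 3-bin matrix with 1 kb bins; the locus at 1,500 bp lies in bin 1.
toy_matrix = [
    [10, 5, 0],
    [5, 8, 2],
    [0, 2, 6],
]
for start, fraction in contact_profile(toy_matrix, bin_size=1000, position=1500):
    print(start, round(fraction, 3))
```

Each output row could then be turned into a genome-browser track so that contacts can be viewed alongside annotations.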

 
Scooped by Biswapriya Biswavas Misra

De novo transcriptome sequence and identification of major bast-related genes involved in cellulose biosynthesis in jute (Corchorus capsularis L.)

Biswapriya Biswavas Misra's insight:
Abstract
Background

Jute fiber, extracted from stem bast, is called golden fiber. Discovering the genes associated with jute development at the vegetative growth stage is essential for fiber improvement. However, only 858 EST sequences of jute have been deposited in the GenBank database. The publicly available data are far from sufficient to understand the molecular mechanism of fiber biosynthesis. It is therefore imperative to conduct transcriptome sequencing for jute, which can be used to discover new genes, especially genes involved in cellulose biosynthesis.

Results

A total of 79,754,600 clean reads (7.98 Gb) were generated using Illumina paired-end sequencing. De novo assembly yielded 48,914 unigenes with an average length of 903 bp. By sequence similarity searching for known proteins, 27,962 (57.16 %) unigenes were annotated for their function. Out of these annotated unigenes, 21,856 and 11,190 unigenes were assigned to gene ontology (GO) and euKaryotic Ortholog Groups (KOG), respectively. Searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) indicated that 14,216 unigenes were mapped to 268 KEGG pathways. Moreover, 5 SuSy, 3 UGPase, 9 CesA, 18 CSL, 2 Kor (Korrigan), and 12 Cobra unigenes involved in cellulose biosynthesis were identified. Among these, the unigenes comp11264_c0 (SuSy), comp24568_c0 (UGPase), comp11363_c0 (CesA), comp11363_c1 (CesA), comp24217_c0 (CesA), and comp23531_c0 (CesA) displayed relatively high expression levels in stem bast as measured by FPKM and RT-qPCR, indicating that they may be valuable for dissecting the mechanism of cellulose biosynthesis in jute. In addition, a total of 12,518 putative gene-associated SNPs were called from these assembled unigenes.

Conclusion

We characterized the transcriptome of jute, surveyed a broad set of unigenes associated with vegetative growth and development, developed large-scale SNPs, and analyzed the expression patterns of genes involved in cellulose biosynthesis for bast fiber. Together, these provide a valuable genomics resource, which will accelerate the understanding of the mechanism of fiber development in jute.
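
The abstract reports expression as FPKM without spelling out the calculation. As a reminder, FPKM is fragments per kilobase of transcript per million mapped fragments. The sketch below uses that standard definition; the fragment count and library size in the example are invented and are not values from the paper.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # fragments / (length / 1e3) / (total / 1e6), written as one expression.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical unigene: 903 bp long, 1,200 fragments, 40 million mapped fragments.
print(round(fpkm(1200, 903, 40_000_000), 2))
```

Because the value is scaled by both transcript length and library size, FPKM values can be compared across unigenes of different lengths within a sample.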

Scooped by Biswapriya Biswavas Misra

iRDA: a new filter towards predictive, stable, and enriched candidate genes

Abstract
Background

Gene expression profiling using high-throughput screening (HTS) technologies allows clinical researchers to find prognostic gene signatures that could better discriminate between different phenotypes and serve as potential biological markers in disease diagnoses. In recent years, many feature selection methods have been devised for finding such discriminative genes, and more recently information theoretic filters have also been introduced for capturing feature-to-class relevance and feature-to-feature correlations in microarray-based classification.

Methods

In this paper, we present and fully formulate a new multivariate filter, iRDA, for the discovery of HTS gene-expression candidate genes. The filter constitutes a four-step framework and includes feature relevance, feature redundancy, and feature interdependence in the context of feature-pairs. The method is based upon approximate Markov blankets, information theory, and several heuristic search strategies with forward, backward and insertion phases, and it targets higher-order gene interactions.

Results

To show the strengths of iRDA, three performance measures, two evaluation schemes, two stability index sets, and the gene set enrichment analysis (GSEA) are all employed in our experimental studies. Its effectiveness has been validated by using seven well-known cancer gene-expression benchmarks and four other disease experiments, including a comparison to three popular information theoretic filters. In terms of classification performance, candidate genes selected by iRDA perform better than the sets discovered by the other three filters. Two stability measures indicate that iRDA is the most robust with the least variance. GSEA shows that iRDA produces more statistically enriched gene sets on five out of the six benchmark datasets.

Conclusions

Judged by classification performance, stability, and enrichment analysis, iRDA is a promising filter for finding predictive, stable, and enriched gene-expression candidate genes.
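
Information theoretic filters typically start from a feature-to-class relevance score. The sketch below shows only that first ingredient, using mutual information between a discretized expression profile and the class labels. It does not reproduce iRDA's redundancy, interdependence, or Markov blanket steps, and the gene names and values are invented for the example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def rank_by_relevance(expression, labels):
    """Sort gene names by decreasing mutual information with the class labels."""
    scores = {gene: mutual_information(values, labels) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical discretized expression (0 = low, 1 = high) for four samples.
labels = ["tumor", "tumor", "normal", "normal"]
expression = {
    "geneA": [1, 1, 0, 0],  # tracks the class exactly
    "geneB": [1, 0, 1, 0],  # independent of the class
}
print(rank_by_relevance(expression, labels))
```

A relevance ranking alone tends to select redundant genes, which is the gap that multivariate filters with redundancy and interdependence terms are designed to close.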
Scooped by Biswapriya Biswavas Misra

iJGVD: an integrative Japanese genome variation database based on whole-genome sequencing

Biswapriya Biswavas Misra's insight:

The integrative Japanese Genome Variation Database (iJGVD; http://ijgvd.megabank.tohoku.ac.jp/) provides genomic variation data detected by whole-genome sequencing (WGS) of Japanese individuals. Specifically, the database contains variants detected by WGS of 1,070 individuals who participated in a genome cohort study of the Tohoku Medical Megabank Project. In the first release, iJGVD includes >4,300,000 autosomal single nucleotide variants (SNVs) whose minor allele frequencies are >5.0%.
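
The 5.0% cutoff refers to minor allele frequency (MAF). A minimal sketch of that filter is shown below, assuming diploid autosomes and a called genotype for every individual, so that 1,070 individuals contribute 2,140 alleles per site. The function names and the example counts are hypothetical and are not taken from iJGVD.

```python
def minor_allele_frequency(alt_allele_count, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("alt_allele_count must lie between 0 and total_alleles")
    frequency = alt_allele_count / total_alleles
    return min(frequency, 1.0 - frequency)


def passes_maf_filter(alt_allele_count, n_individuals, threshold=0.05):
    """True if the site's MAF exceeds the threshold (diploid autosomes assumed)."""
    return minor_allele_frequency(alt_allele_count, 2 * n_individuals) > threshold


# Hypothetical alternate-allele counts observed among 1,070 individuals.
for count in (50, 150, 2100):
    print(count, passes_maf_filter(count, n_individuals=1070))
```

Taking the minimum of the allele frequency and its complement means the filter treats a variant the same way whichever allele happens to be the reference.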

Scooped by Biswapriya Biswavas Misra

Computational discovery of transcription factors associated with drug response

Biswapriya Biswavas Misra's insight:

This study integrates gene expression, genotype and drug response data in lymphoblastoid cell lines with transcription factor (TF)-binding sites from ENCODE (Encyclopedia of DNA Elements) in a novel methodology that elucidates regulatory contexts associated with cytotoxicity. The method, GENMi (Gene Expression iN the Middle), postulates that single-nucleotide polymorphisms within TF-binding sites putatively modulate the TF's regulatory activity, and the resulting variation in gene expression leads to variation in drug response. Analysis of 161 TFs and 24 treatments revealed 334 significantly associated TF–treatment pairs. Investigation of 20 selected pairs yielded literature support for 13 of these associations, often from studies where perturbation of the TF expression changes drug response. Experimental validation of significant GENMi associations in taxanes and anthracyclines across two triple-negative breast cancer cell lines corroborates our findings. The method is shown to be more sensitive than an alternative, genome-wide association study-based approach that does not use gene expression. These results demonstrate the utility of GENMi in identifying TFs that influence drug response and provide a number of candidates for further testing.

Scooped by Biswapriya Biswavas Misra

Conditional robustness analysis for fragility discovery and target identification in biochemical networks and in cancer systems biology

Biswapriya Biswavas Misra's insight:
Abstract
Background

The study of cancer therapy is a key issue in the field of oncology research and the development of target therapies is one of the main problems currently under investigation. This is particularly relevant in different types of tumor where traditional chemotherapy approaches often fail, such as lung cancer.

Results

We started from the general definition of robustness introduced by Kitano and applied it to the analysis of dynamical biochemical networks, proposing a new algorithm based on moment independent analysis of input/output uncertainty. The framework utilizes novel computational methods which enable evaluating the model fragility with respect to quantitative performance measures and parameters such as reaction rate constants and initial conditions. The algorithm generates a small subset of parameters that can be used to act on complex networks and to obtain the desired behaviors. We have applied the proposed framework to the EGFR-IGF1R signal transduction network, a crucial pathway in lung cancer, as an example of Cancer Systems Biology application in drug discovery. Furthermore, we have tested our framework on a pulse generator network as an example of Synthetic Biology application, thus proving the suitability of our methodology to the characterization of the input/output synthetic circuits.

Conclusions

The achieved results are of immediate practical application in computational biology, and while we demonstrate their use in two specific examples, they can in fact be used to study a wider class of biological systems.

Keywords:

Robustness analysis; Cancer robustness; Target therapies; Lung cancer; Drug discovery; Cancer systems biology; EGFR-IGF1R networks
