Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares

PLOS Collections: CAVER 3.0: A Tool for the Analysis of Transport Pathways in Dynamic Protein Structures

Abstract

Tunnels and channels facilitate the transport of small molecules, ions and water solvent in a large variety of proteins. Characteristics of individual transport pathways, including their geometry, physico-chemical properties and dynamics, are instrumental for understanding the structure-function relationships of these proteins, for the design of new inhibitors and for the construction of improved biocatalysts. CAVER is a software tool widely used for the identification and characterization of transport pathways in static macromolecular structures. Herein we present a new version of CAVER enabling automatic analysis of tunnels and channels in large ensembles of protein conformations. CAVER 3.0 implements new algorithms for the calculation and clustering of pathways. A trajectory from a molecular dynamics simulation serves as the typical input, while detailed characteristics and summary statistics of the time evolution of individual pathways are provided in the outputs. To illustrate the capabilities of CAVER 3.0, the tool was applied to the analysis of a molecular dynamics simulation of the microbial enzyme haloalkane dehalogenase DhaA. CAVER 3.0 reliably identified all previously published DhaA tunnels, including the tunnels closed in DhaA crystal structures, and reliably estimated their importance. The results clearly demonstrate that analysis of molecular dynamics simulations is essential for the estimation of pathway characteristics and elucidation of the structural basis of tunnel gating. CAVER 3.0 paves the way for the study of important biochemical phenomena in the areas of molecular transport, molecular recognition and enzymatic catalysis. The software is freely available as a multiplatform command-line application at http://www.caver.cz.

Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases

The tree fruit Genome Database Resources (tfGDR)


Expression-based network biology identifies immune-related functional modules involved in plant defense

Abstract (provisional)

Background

Plants respond to diverse environmental cues including microbial perturbations by coordinated regulation of thousands of genes. These intricate transcriptional regulatory interactions depend on the recognition of specific promoter sequences by regulatory transcription factors. The combinatorial and cooperative action of multiple transcription factors defines a regulatory network that enables plant cells to respond to distinct biological signals. The identification of immune-related modules in large-scale transcriptional regulatory networks can reveal the mechanisms by which exposure to a pathogen elicits a precise phenotypic immune response.

Results

We have generated a large-scale immune co-expression network using a comprehensive set of Arabidopsis thaliana (hereafter Arabidopsis) transcriptomic data, which covers a wide spectrum of immune responses to pathogens or pathogen-mimicking stimuli. We employed both linear and non-linear models to generate the Arabidopsis immune co-expression regulatory (AICR) network. We computed network topological properties and ascertained that this newly constructed immune network is densely connected, possesses hubs, exhibits high modularity, and displays hallmarks of a "real" biological network. We partitioned the network and identified 156 novel modules related to immune functions. Gene Ontology (GO) enrichment analyses provided insight into the key biological processes involved in determining finely tuned immune responses. We also developed novel software called OCCEAN (One Click Cis-regulatory Elements ANalysis) to discover statistically enriched promoter elements in the upstream regulatory regions of Arabidopsis at a whole-genome level. We demonstrated that OCCEAN exhibits higher precision than the existing promoter element discovery tools. In light of known and newly discovered cis-regulatory elements, we evaluated the biological significance of two key immune-related functional modules and proposed mechanism(s) to explain how large sets of diverse GO genes function coherently to mount effective immune responses.

Conclusions

We used a network-based, top-down approach to discover immune-related modules from transcriptomic data in Arabidopsis. Detailed analyses of these functional modules reveal new insight into the topological properties of immune co-expression networks and provide a comprehensive understanding of multifaceted plant defense responses. We present evidence that our newly developed software, OCCEAN, could become a popular tool for the Arabidopsis research community and could potentially be extended to analyze other eukaryotic genomes.
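
As a rough illustration of the co-expression idea in this abstract, the sketch below links two genes when the absolute Pearson correlation of their expression profiles meets a cutoff, then counts edges per gene to spot hub candidates. It is not the authors' method: the abstract does not specify its linear and non-linear models, and the gene names, expression values and the 0.9 cutoff here are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, cutoff=0.9):
    """Return (gene1, gene2, r) for pairs whose |r| meets the assumed cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= cutoff:
            edges.append((g1, g2, r))
    return edges


def degrees(edges):
    """Count how many edges touch each gene; high counts suggest hubs."""
    deg = {}
    for g1, g2, _ in edges:
        deg[g1] = deg.get(g1, 0) + 1
        deg[g2] = deg.get(g2, 0) + 1
    return deg


# Hypothetical expression profiles across five conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # tracks geneA closely
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # anti-correlated with geneA
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to the others
}

edges = coexpression_edges(profiles)
print(degrees(edges))
```

With these toy profiles, geneA, geneB and geneC end up mutually connected, while geneD stays below the cutoff and does not appear in the network. A real analysis would also need multiple-testing control and a principled choice of cutoff.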


Analytical utility of mass spectral binning in proteomic experiments by SPectral Immonium Ion Detection (SPIID).


Unambiguous identification of tandem mass spectra is a cornerstone in mass spectrometry (MS)-based proteomics. As the study of post-translational modifications (PTMs) by shotgun proteomics progresses in depth and coverage, the ability to correctly identify PTM-bearing peptides is essential, increasing the demand for advanced data interpretation. Several PTMs are known to generate unique fragment ions during tandem mass spectrometry (MS/MS), the so-called diagnostic ions, which unequivocally identify that a given mass spectrum relates to a specific PTM. Although such ions hold tremendous analytical advantages, algorithms to decipher MS/MS spectra for the presence of diagnostic ions in an unbiased manner are currently lacking. Here, we present a systematic spectral pattern-based approach for the discovery of diagnostic ions and new fragmentation mechanisms in shotgun proteomics datasets. The developed software tool is designed to analyze large sets of high-resolution peptide fragmentation spectra independent of the fragmentation method, instrument type or protease employed. To benchmark the software tool, we have analyzed large HCD datasets of samples containing phosphorylation, ubiquitylation, SUMOylation, formylation and lysine acetylation. Using the developed software tool, we are able to identify known diagnostic ions by comparing histograms of modified and unmodified peptide spectra. Since the investigated tandem mass spectra are acquired with high mass accuracy, unambiguous interpretation and determination of the chemical composition is feasible for the majority of detected fragment ions. Collectively, we present a freely available software tool that allows for comprehensive and automatic analysis of analogous product ions in tandem mass spectra, and systematic mapping of fragmentation mechanisms related to common amino acids.
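
The histogram comparison mentioned above can be sketched in a few lines. This is a minimal illustration of the general idea rather than the SPIID implementation: fragment m/z values are binned, the fraction of spectra hitting each bin is computed for modified and unmodified peptides, and bins that are much more frequent in the modified set are reported. The function names, the 0.01 bin width, the 0.5 frequency-difference threshold and all m/z values are assumptions made for this example.

```python
from collections import Counter


def bin_frequencies(spectra, bin_width=0.01):
    """Fraction of spectra with at least one peak in each m/z bin."""
    counts = Counter()
    for peaks in spectra:
        # A set ensures each spectrum contributes at most once per bin.
        counts.update({round(mz / bin_width) for mz in peaks})
    n = len(spectra)
    return {b: c / n for b, c in counts.items()}


def candidate_diagnostic_bins(modified, unmodified, bin_width=0.01, min_diff=0.5):
    """Bins whose frequency in modified spectra exceeds unmodified by min_diff."""
    mod = bin_frequencies(modified, bin_width)
    unmod = bin_frequencies(unmodified, bin_width)
    hits = []
    for b, freq in mod.items():
        diff = freq - unmod.get(b, 0.0)
        if diff >= min_diff:
            hits.append((round(b * bin_width, 2), diff))
    return sorted(hits)


# Illustrative peak lists (m/z only); values are invented, not from the paper.
modified = [
    [126.09, 175.12, 300.20],
    [126.09, 210.15],
    [126.09, 175.12],
]
unmodified = [
    [175.12, 300.20],
    [210.15, 175.12],
    [129.10, 175.12],
]

print(candidate_diagnostic_bins(modified, unmodified))  # [(126.09, 1.0)]
```

Only the bin that appears in every modified spectrum and in no unmodified spectrum passes the threshold. Real data would need intensity filtering and a bin width matched to the instrument's mass accuracy.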


Arabidopsis Information Portal


Welcome to the Arabidopsis Information Portal (AIP), a new resource to bring together the ever-increasing amounts of Arabidopsis data into a single, user-friendly location using the latest web technologies and web services. It will adopt a modular, federated model which ensures that responsibility for generation and maintenance of valuable data remains in the hands of the individual data providers and spreads the burden of supporting such resources across a potentially wider range of funding agencies and countries. The AIP will be developed by a team with deep experience in scientific infrastructure, data integration, and community engagement, and will take advantage of significant NSF investments in the plant biology research community. Key elements of the new AIP include the development of a modular, community-extensible web-based interface that will include user work spaces that can be configured with data retrieval, analysis, and visualization applications; implementation of an Arabidopsis-specific instance of InterMine, a data integration platform that is widely accepted in the animal model organism database community; and the design and construction of a web services layer that facilitates data access, integration with iPlant Collaborative resources, federation with other data providers, and development of analytical workflows. The project will implement a sustainability strategy that embraces adoption of existing scientific infrastructure, use of virtualization, federated provision of data, collaborative development of new resources, and pursuit of alternative funding sources. Not only will AIP modernize the bioinformatics capacity of the Arabidopsis community, it will provide a foundation for multi-agency, multi-national collaboration in building and funding biological informatics capabilities.


Network topology-based detection of differential gene regulation and regulatory switches in cell metabolism and signaling

Abstract

Background

Common approaches to pathway analysis treat pathways merely as lists of genes disregarding their topological structures, that is, ignoring the genes' interactions on which a pathway's cellular function depends. In contrast, PathWave has been developed for the analysis of high-throughput gene expression data that explicitly takes the topology of networks into account to identify both global dysregulation of and localized (switch-like) regulatory shifts within metabolic and signaling pathways. For this purpose, it applies adjusted wavelet transforms on optimized 2D grid representations of curated pathway maps.

Results

Here, we present the new version of PathWave with several substantial improvements including a new method for optimally mapping pathway networks onto compact 2D lattice grids, a more flexible and user-friendly interface, and pre-arranged 2D grid representations. These pathway representations are assembled for several species now comprising H. sapiens, M. musculus, D. melanogaster, D. rerio, C. elegans, and E. coli. We show that PathWave is more sensitive than common approaches and apply it to RNA-seq expression data, identifying crucial metabolic pathways in lung adenocarcinoma, as well as microarray expression data, identifying pathways involved in longevity of Drosophila.

Conclusions

PathWave is a generic method for pathway analysis complementing established tools like GSEA, and the update comprises efficient new features. In contrast to the tested commonly applied approaches which do not take network topology into account, PathWave enables identifying pathways that are either known to be involved in or very likely associated with such diverse conditions as human lung cancer or aging of D. melanogaster. The PathWave R package is freely available at http://www.ichip.de/software/pathwave.html.
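
To give a feel for why a wavelet transform on a 2D grid can separate global from localized (switch-like) changes, here is a minimal sketch of one level of a plain, unnormalized Haar transform. It is not PathWave's adjusted transform or its grid-mapping method, and it is written in Python rather than R; the grids below are hypothetical expression changes placed on a 4x4 lattice, and the function names are invented for this example.

```python
def haar_1d(values):
    """One Haar level: pairwise means followed by pairwise half-differences."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


def haar_2d(grid):
    """Apply haar_1d to every row, then to every column of the result."""
    rows = [haar_1d(row) for row in grid]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Hypothetical grid: four neighbouring nodes all up-regulated together.
coherent = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Hypothetical grid: neighbouring nodes regulated in opposite directions.
switch = [
    [1, -1, 0, 0],
    [1, -1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(haar_2d(coherent)[0])  # [1.0, 0.0, 0.0, 0.0]
print(haar_2d(switch)[0])    # [0.0, 0.0, 1.0, 0.0]
```

The coherent block lands in an average coefficient, whereas the opposing pattern cancels in the average and shows up only in a detail coefficient. A gene-list method that simply sums changes over the pathway would miss the second case. The sketch assumes even grid dimensions.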


GigaDB Dataset: The Rice 3000 Genomes Project Data.

Rice, Oryza sativa L., is the staple food for half the world’s population. By 2030, rice production must increase by at least 25% to keep pace with population growth. Accelerated genetic gains in rice improvement are needed to mitigate the effects of climate change and loss of arable land and to ensure global food supply.
Here, we include data from an international effort resequencing a core collection of 3,000 rice accessions from 89 countries as a global public good. The 3,000 sequenced rice genomes had an average sequencing depth of 14X, average genome coverage and mapping rates of 94.0% and 92.5%, respectively.
This data provides a foundation for large-scale discovery of novel alleles for important rice phenotypes using various bioinformatics and/or genetic approaches. It also serves to understand at a higher level of detail the genomic diversity within O. sativa. With the release of the sequencing data, the project calls for the global rice community to take advantage of this data as a foundation for establishing a global, public rice genetic/genomic database and information platform for advancing rice breeding technology for future rice improvement.


The Impacts of Read Length and Transcriptome Complexity for De Novo Assembly: A Simulation Study

Transcriptome assembly using RNA-seq data, particularly in non-model organisms, has been dramatically improved, but only recently have the pre-assembly procedures, such as sequencing depth and error correction, been studied. Increasing read length is viewed as a crucial condition to further improve transcriptome assembly, but it is unknown whether the read length really matters. In addition, though many assembly tools are available now, it is unclear whether the existing assemblers perform well enough for all data with different transcriptome complexities. In this paper, we studied these two open problems using two high-performing assemblers, Velvet/Oases and Trinity, on several simulated datasets of human, mouse and S. cerevisiae. The results suggest that (1) the read length of paired reads does not matter once it exceeds a certain threshold, and interestingly, the threshold is distinct in different organisms; and (2) the quality of de novo assembly decreases sharply as transcriptome complexity increases, with all existing de novo assemblers tending to fail whenever the genes contain a large number of alternative splicing events.


ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases

Abstract

Background

Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge.

Results

We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready.

Conclusions

We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.


SFGD: a comprehensive platform for mining functional information from soybean transcriptome data and its use in identifying acyl-lipid metabolism pathways

Background

Soybean (Glycine max L.) is one of the world’s most important leguminous crops producing high-quality protein and oil. Increasing the relative oil concentration in soybean seeds is many researchers’ goal, but a complete analysis platform of functional annotation for the genes involved in the soybean acyl-lipid pathway is still lacking. Following the success of soybean whole-genome sequencing, functional annotation has become a major challenge for the scientific community. Whole-genome transcriptome analysis is a powerful way to predict genes with biological functions. It is essential to build a comprehensive analysis platform for integrating soybean whole-genome sequencing data, the available transcriptome data and protein information. This platform could also be used to identify acyl-lipid metabolism pathways.

Description

In this study, we describe our construction of the Soybean Functional Genomics Database (SFGD) using Generic Genome Browser (Gbrowse) as the core platform. We integrated microarray expression profiling with 255 samples from 14 groups’ experiments and mRNA-seq data with 30 samples from four groups’ experiments, including spatial and temporal transcriptome data for different soybean development stages and environmental stresses. The SFGD includes a gene co-expression regulatory network containing 23,267 genes and 1,873 miRNA-target pairs, and a group of acyl-lipid pathways containing 221 enzymes and more than 1,550 genes. The SFGD also provides some key analysis tools, i.e. BLAST search, expression pattern search and cis-element significance analysis, as well as gene ontology information search and single nucleotide polymorphism display.

Conclusion

The SFGD is a comprehensive database integrating genome and transcriptome data, as well as soybean acyl-lipid metabolism pathways. It provides useful toolboxes for biologists to improve the accuracy and robustness of soybean functional genomics analysis, further improving understanding of gene regulatory networks for effective crop improvement. The SFGD is publicly accessible at http://bioinformatics.cau.edu.cn/SFGD/, with all data available for downloading.


A Multistep Screening Method to Identify Genes Using Evolutionary Transcriptome of Plants

Abstract

We introduced a multistep screening method to identify the genes in plants using microarrays and ribonucleic acid (RNA)-seq transcriptome data. Our method describes the process for identifying genes using the salt-tolerance response pathways of the potato (Solanum tuberosum) plant. Gene expression was analyzed using microarrays and RNA-seq experiments that examined three potato lines (high, intermediate, and low salt tolerance) under conditions of salt stress. We screened the orthologous genes and pathway genes involved in salinity-related biosynthetic pathways, and identified nine potato genes that were candidates for salinity-tolerance pathways. The nine genes were selected to characterize their phylogenetic reconstruction with homologous genes of Arabidopsis thaliana, and a Circos diagram was generated to understand the relationships among the selected genes. The involvement of the selected genes in salt-tolerance pathways was verified by reverse transcription polymerase chain reaction analysis. One candidate potato gene was selected for physiological validation by generating dehydration-responsive element-binding 1 (DREB1)-overexpressing transgenic potato plants. The DREB1 overexpression lines exhibited increased salt tolerance and plant growth when compared to that of the control. Although the nine genes identified by our multistep screening method require further characterization and validation, this study demonstrates the power of our screening strategy after the initial identification of genes using microarrays and RNA-seq experiments.


MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments

Abstract

Summary: MSstats is an R package for statistical relative quantification of proteins and peptides in mass spectrometry-based proteomics. Version 2.0 of MSstats supports label-free and label-based experimental workflows, and data dependent, targeted and data independent spectral acquisition. It takes as input identified and quantified spectral peaks, and outputs a list of differentially abundant peptides or proteins, or summaries of peptide or protein relative abundance. MSstats relies on a flexible family of linear mixed models.

Availability: The code, the documentation, and example datasets are available open-source at www.msstats.org under the Artistic-2.0 license. The package can be downloaded from www.msstats.org or from Bioconductor (www.bioconductor.org), and used in an R command-line workflow. The package can also be accessed as an external tool in Skyline (Broudy et al., 2013) and used via a graphical user interface.


KEMREP: A New Qualitative Method for the Assessment of an Analyst’s Ability to Generate a Metabolomics Data Matrix by Gas Chromatography–Mass Spectrometry

The analytical procedures required to generate a quantified metabolomics data matrix involve many widely different potential sources of error, complicating the generation of reliable data. The methods generally used to assess the precision of such data all have distinct merits but some clear limitations as well. In this paper we describe KEMREP (kernel method for the assessment of repeatability and reproducibility), a new method aimed specifically at analyzing the reliability of metabolomics data. Repeatability and reproducibility were assessed on metabolomics data matrices generated by gas chromatography-mass spectrometry (GC-MS), produced by and between analysts and across laboratories, using cerebrospinal fluid (CSF) and urine as biological samples for analysis. KEMREP provides a visual overlay of the smoothed and scaled versions of the data from repeated samples for a direct and easy qualitative assessment of repeatability or reproducibility of a distinct chromatographic region (univariate) or of the experiment as a whole (multivariate). The KEMREP method can also be extended by the imposition of confidence bounds, which provide lower and upper limits that indicate quantitatively whether the experiment was repeatable or reproducible at a predefined input coefficient of variation (CV). KEMREP is thus a novel approach which supplements existing methods of assessing the reliability of metabolomics data; provides a benchmark for assessing the quality of practical work performed by analysts; monitors the sequence of data pre-treatment steps; and tests the robustness of an experimentally designed protocol for metabolomics.
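
The "smoothed and scaled" overlay and the CV criterion can be illustrated with a small sketch. This is not the published KEMREP algorithm: it uses a generic Gaussian kernel smoother over index positions, scales each trace to a maximum of 1, and computes a pointwise coefficient of variation across replicates. The function names, bandwidth, 20% CV limit and intensity values are all assumptions for illustration.

```python
from math import exp
from statistics import mean, stdev


def gaussian_smooth(signal, bandwidth=1.0):
    """Kernel-weighted average of the signal at each index (Gaussian kernel)."""
    smoothed = []
    for i in range(len(signal)):
        weights = [exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(len(signal))]
        total = sum(weights)
        smoothed.append(sum(w * s for w, s in zip(weights, signal)) / total)
    return smoothed


def scale_to_unit_max(signal):
    """Divide by the largest value so traces can be overlaid on one axis."""
    peak = max(signal)
    return [s / peak for s in signal]


def pointwise_cv(replicates):
    """Coefficient of variation (%) at each position across replicate traces."""
    return [100 * stdev(vals) / mean(vals) for vals in zip(*replicates)]


# Hypothetical intensities for one chromatographic region, three replicates.
replicates = [
    [10.0, 40.0, 90.0, 45.0, 12.0],
    [11.0, 42.0, 95.0, 43.0, 10.0],
    [9.0, 38.0, 88.0, 47.0, 13.0],
]

processed = [scale_to_unit_max(gaussian_smooth(r)) for r in replicates]
cv = pointwise_cv(processed)
cv_limit = 20.0  # assumed acceptance limit in percent
print("repeatable at the assumed limit:", max(cv) <= cv_limit)
```

Plotting the three `processed` traces on one axis would give the qualitative overlay, while `cv` gives a simple quantitative check. Note that `pointwise_cv` assumes strictly positive means; baseline regions near zero would need separate handling.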


A survey of plant and algal genomes and transcriptomes reveals new insights into the evolution and function of the cellulose synthase superfamily

Background

Enzymes of the cellulose synthase (CesA) family and CesA-like (Csl) families are responsible for the synthesis of celluloses and hemicelluloses, and thus are of great interest to bioenergy research. We studied the occurrences and phylogenies of CesA/Csl families in diverse plants and algae by comprehensive data mining of 82 genomes and transcriptomes.

Results

We found that 1) charophytic green algae (CGA) have orthologous genes in CesA, CslC and CslD families; 2) liverwort genes are found in the CesA, CslA, CslC and CslD families; 3) The fern Pteridium aquilinum not only has orthologs in these conserved families but also in the CslB, CslH and CslE families; 4) basal angiosperms, e.g. Aristolochia fimbriata, have orthologs in these families too; 5) gymnosperms have genes forming clusters ancestral to CslB/H and to CslE/J/G respectively; 6) CslG is found in switchgrass and basal angiosperms; 7) CslJ is widely present in dicots and monocots; 8) CesA subfamilies have already diversified in ferns.

Conclusions

We speculate that: (i) ferns and horsetails might both have CslH enzymes, responsible for the synthesis of mixed-linkage glucans and (ii) CslD and similar genes might be responsible for the synthesis of mannans in CGA. Our findings led to a more detailed model of cell wall evolution and suggested that gene loss played an important role in the evolution of Csl families. We also demonstrated the usefulness of transcriptome data in the study of plant cell wall evolution and diversity.


REGNET: mining context-specific human transcription networks using composite genomic information

Abstract (provisional)

Background

Genome-wide expression profiles reflect the transcriptional networks specific to the given cell context. However, most statistical models try to estimate the average connectivity of the networks from a collection of gene expression data, and are unable to characterize the context-specific transcriptional regulations. We propose an approach for mining context-specific transcription networks from a large collection of gene expression fold-change profiles and composite gene-set information.

Results

Using a composite gene-set analysis method, we combine the information of transcription factor binding sites, Gene Ontology or pathway gene sets and gene expression fold-change profiles for a variety of cell conditions. We then collected all the significant patterns and constructed a database of context-specific transcription networks for human (REGNET). As a result, context-specific roles of transcription factors as well as their functional targets are readily explored. To validate the approach, nine predicted targets of E2F1 in HeLa cells were tested using chromatin immunoprecipitation assay. Among them, five (Gadd45b, Dusp6, Mll5, Bmp2 and E2f3) were successfully bound by E2F1. c-JUN and the EMT transcription networks were also validated from literature.

Conclusions

REGNET is a useful tool for exploring the ternary relationships among the transcription factors, their functional targets and the corresponding cell conditions. It is able to provide useful clues for novel cell-specific transcriptional regulations. The REGNET database is available at http://mgrc.kribb.re.kr/regnet


CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data

Abstract (provisional)

Background

miRNAs play a key role in normal physiology and various diseases. miRNA profiling through next-generation sequencing (miRNA-seq) has become the main platform for biological research and biomarker discovery. However, analyzing miRNA sequencing data is challenging as it requires a significant amount of computational resources and bioinformatics expertise. Several web-based analytical tools have been developed, but they are limited to processing one or a pair of samples at a time and are not suitable for a large-scale study. Lack of flexibility and reliability of these web applications is also a common issue.

Results

We developed a Comprehensive Analysis Pipeline for microRNA Sequencing data (CAP-miRSeq) that integrates read pre-processing, alignment, mature/precursor/novel miRNA detection and quantification, data visualization, variant detection in miRNA coding regions, and more flexible differential expression analysis between experimental conditions. Depending on their computational infrastructure, users can install the package locally or deploy it in the Amazon Cloud, running samples sequentially or in parallel to speed up the analysis of large numbers of samples. In either case, summary and expression reports for all samples are generated for easier quality assessment and downstream analyses. Using well-characterized data, we demonstrated the pipeline's superior performance, flexibility, and practical use in research and biomarker discovery.

Conclusions

CAP-miRSeq is a powerful and flexible tool for processing and analyzing miRNA-seq data, scalable from a few to hundreds of samples. The results are presented in a convenient way for investigators or analysts to conduct further investigation and discovery.

 
Scooped by Biswapriya Biswavas Misra

CoSBI – Identify Combinatorial Chromatin Modification Patterns across Genomic Loci

CoSBI (Coherent and Shifted Bicluster Identification) is a scalable subspace clustering algorithm to identify the complete set of combinatorial chromatin modification patterns across the entire...
Scooped by Biswapriya Biswavas Misra

SparkSeq: fast, scalable, cloud-ready tool for the interactive genomic data analysis with nucleotide precision

Biswapriya Biswavas Misra's insight:
Abstract

Many time-consuming analyses of next generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics due to their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying.

The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them interactively. SparkSeq opens up the possibility of customised ad hoc secondary analyses and iterative machine learning algorithms. This paper demonstrates its scalability and overall very fast performance by running analyses of sequencing datasets. Tests of SparkSeq also show that the use of cache and the HDFS block size can be tuned for optimal performance on multiple worker nodes.
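To make the MapReduce idea behind "nucleotide precision" concrete, here is a minimal sketch in plain Python of a per-base coverage count. It does not use SparkSeq or Apache Spark; the read format `(chrom, start, length)` and the function names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import chain

def map_read(read):
    """Map step: emit ((chrom, pos), 1) for every base a read covers.

    `read` is a hypothetical (chrom, start, length) tuple, not a SparkSeq type.
    """
    chrom, start, length = read
    return [((chrom, pos), 1) for pos in range(start, start + length)]

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per (chrom, pos) key."""
    coverage = Counter()
    for key, count in pairs:
        coverage[key] += count
    return coverage

def per_base_coverage(reads):
    """Run the map step on every read, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_read(r) for r in reads))

reads = [("chr1", 100, 5), ("chr1", 102, 5), ("chr2", 10, 3)]
coverage = per_base_coverage(reads)
```

In a distributed framework the map and reduce steps would run on separate worker nodes; here they run in a single process so that only the data flow is visible.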

Availability and Implementation: Available under open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/

Scooped by Biswapriya Biswavas Misra

Proteomics DB

Biswapriya Biswavas Misra's insight:
ProteomicsDB is a joint effort of the Technische Universität München (TUM) and SAP AG. It is dedicated to expedite the identification of the human proteome and its use across the scientific community.
Scooped by Biswapriya Biswavas Misra

HeteroGenome: database of genome periodicity

Biswapriya Biswavas Misra's insight:

We present the first release of the HeteroGenome database collecting latent periodicity regions in genomes. Tandem repeats and highly divergent tandem repeats along with the regions of a new type of periodicity, known as profile periodicity, have been collected for the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster. We obtained data with the aid of a spectral-statistical approach to search for reliable latent periodicity regions (with periods up to 2000 bp) in DNA sequences. The original two-level mode of data presentation (a broad view of the region of latent periodicity and a second level indicating conservative fragments of its structure) was further developed to enable us to obtain the estimate, without redundancy, that latent periodicity regions make up ∼10% of the analyzed genomes. Analysis of the quantitative and qualitative content of the located periodicity regions on all chromosomes of the analyzed organisms revealed dominant characteristic types of periodicity in the genomes. The density distribution pattern of latent periodicity regions along a chromosome unambiguously characterizes each chromosome in a genome.
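As a rough illustration of how a candidate period can be scored in a DNA sequence, the sketch below computes a chi-square statistic on the table of nucleotide versus position-within-period and normalises it by the degrees of freedom. This is a generic textbook-style statistic chosen for illustration, not the authors' spectral-statistical method, and the function names are hypothetical.

```python
from collections import Counter

def periodicity_score(seq, period):
    """Chi-square per degree of freedom for the (base x phase) table.

    A high score means the base composition depends strongly on the
    position modulo `period`, which hints at periodicity of that length.
    """
    n = len(seq)
    table = Counter((base, i % period) for i, base in enumerate(seq))
    base_totals = Counter(seq)
    phase_totals = Counter(i % period for i in range(n))
    dof = (len(base_totals) - 1) * (len(phase_totals) - 1)
    if dof == 0:
        return 0.0
    chi2 = 0.0
    for base, b_total in base_totals.items():
        for phase, p_total in phase_totals.items():
            expected = b_total * p_total / n
            chi2 += (table[(base, phase)] - expected) ** 2 / expected
    return chi2 / dof

def best_period(seq, max_period):
    """Return the candidate period in 2..max_period with the highest score."""
    return max(range(2, max_period + 1), key=lambda p: periodicity_score(seq, p))

toy = "ACG" * 20  # a perfect tandem repeat with period 3
period = best_period(toy, 6)
```

Dividing by the degrees of freedom keeps multiples of the true period (6 in the toy case) from outscoring the period itself, since larger periods produce larger tables.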

Database URL: http://www.jcbi.ru/lp_baze/

Scooped by Biswapriya Biswavas Misra

BMC Genomics | Abstract | Predicting the fungal CUG codon translation with Bagheera

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

Many eukaryotes have been shown to use alternative schemes to the universal genetic code. While most Saccharomycetes, including Saccharomyces cerevisiae, use the standard genetic code translating the CUG codon as leucine, some yeasts, including many but not all of the "Candida", translate the same codon as serine. It has been proposed that the change in codon identity was accomplished by an almost complete loss of the original CUG codons, making the CUG positions within the extant species highly discriminative for the one or other translation scheme.

Results

In order to improve the prediction of genes in yeast species by providing the correct CUG decoding scheme we implemented a web server, called Bagheera, that allows determining the most probable CUG codon translation for a given transcriptome or genome assembly based on extensive reference data. As reference data we use 2071 manually assembled and annotated sequences from 38 cytoskeletal and motor proteins belonging to 79 yeast species. The web service includes a pipeline, which starts with predicting and aligning homologous genes to the reference data. CUG codon positions within the predicted genes are analysed with respect to amino acid similarity and CUG codon conservation in related species. In addition, the tRNACAG gene is predicted in genomic data and compared to known leu-tRNACAG and ser-tRNACAG genes. Bagheera can also be used to evaluate any mRNA and protein sequence data with the codon usage of the respective species. The usage of the system has been demonstrated by analysing six genomes not included in the reference data.
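The amino acid similarity check at CUG positions can be pictured as a simple vote over aligned reference residues. The sketch below is a toy version of that idea and not Bagheera's actual scoring: the residue groupings `LEU_LIKE` and `SER_LIKE`, the input format, and the function name are all assumptions.

```python
# Assumption: residue groups treated as similar to leucine or to serine.
LEU_LIKE = set("LIVMF")
SER_LIKE = set("STAGN")

def vote_cug_translation(aligned_columns):
    """Vote on the CUG decoding from reference residues.

    `aligned_columns` is a list of strings; each string holds the reference
    amino acids aligned to one CUG position of the query (hypothetical input).
    Returns (call, leucine_votes, serine_votes).
    """
    leu = sum(res in LEU_LIKE for col in aligned_columns for res in col)
    ser = sum(res in SER_LIKE for col in aligned_columns for res in col)
    if leu == ser:
        return "undetermined", leu, ser
    return ("leucine" if leu > ser else "serine"), leu, ser

# Three CUG positions, each with the residues found in the reference alignment.
call = vote_cug_translation(["SSTS", "SAL", "GS"])
```

A realistic scorer would weight residues with a substitution matrix and account for CUG conservation in related species, as the abstract describes.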

Conclusions

Gene prediction and consecutive comparison with reference data from other Saccharomycetes are sufficient to predict the most probable decoding scheme for CUG codons. This approach has been implemented into Bagheera (http://www.motorprotein.de/bagheera).

Scooped by Biswapriya Biswavas Misra

Parenclitic networks: uncovering new functions in biological data


We introduce a novel method to represent time independent, scalar data sets as complex networks. We apply our method to investigate gene expression in the response to osmotic stress of Arabidopsis thaliana. In the proposed network representation, the most important genes for the plant response turn out to be the nodes with highest centrality in appropriately reconstructed networks. We also performed a target experiment, in which the predicted genes were artificially induced one by one, and the growth of the corresponding phenotypes compared to that of the wild-type. The joint application of the network reconstruction method and of the in vivo experiments allowed identifying 15 previously unknown key genes, and provided models of their mutual relationships. This novel representation extends the use of graph theory to data sets hitherto considered outside of the realm of its application, vastly simplifying the characterization of their underlying structure.
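One common way to turn scalar measurements into a network of this kind is to fit a reference relationship for every pair of features and then weight each link by how far a given sample deviates from it. The sketch below follows that idea with a straight-line fit and a node-strength centrality; the abstract does not specify the model or the centrality measure, so both are assumptions, as are all names and the toy data.

```python
from itertools import combinations
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def deviation_edges(reference, subject):
    """Link weight = subject's distance from the reference line,
    in units of the reference residual spread."""
    edges = {}
    for a, b in combinations(sorted(reference), 2):
        slope, intercept = fit_line(reference[a], reference[b])
        residuals = [y - (slope * x + intercept)
                     for x, y in zip(reference[a], reference[b])]
        spread = pstdev(residuals) or 1.0  # avoid dividing by zero
        deviation = subject[b] - (slope * subject[a] + intercept)
        edges[(a, b)] = abs(deviation) / spread
    return edges

def node_strength(edges):
    """Sum of incident link weights per node (a simple centrality)."""
    strength = {}
    for (a, b), weight in edges.items():
        strength[a] = strength.get(a, 0.0) + weight
        strength[b] = strength.get(b, 0.0) + weight
    return strength

# Hypothetical expression values: four reference samples and one stressed sample.
reference = {"g1": [1, 2, 3, 4], "g2": [2.1, 3.9, 6.2, 7.8], "g3": [1.0, 1.1, 0.9, 1.0]}
subject = {"g1": 2.5, "g2": 5.0, "g3": 5.0}
edges = deviation_edges(reference, subject)
strength = node_strength(edges)
top_gene = max(strength, key=strength.get)
```

In the toy data only g3 departs from its reference behaviour, so it ends up as the most central node.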

Biswapriya Biswavas Misra's insight:

Neurospora crassa has a long history as an excellent model for genetic, cellular, and biochemical research. Although this fungus is known as a saprotroph, it normally appears on burned vegetations or trees after forest fires. However, due to a lack of experimental evidence, the nature of its association with living plants remains enigmatic. Here we report that Scots pine (Pinus sylvestris) is a host plant for N. crassa. The endophytic lifestyle of N. crassa was found in its interaction with Scots pine. Moreover, the fungus can switch to a pathogenic state when its balanced interaction with the host is disrupted. Our data reveal previously unknown lifestyles of N. crassa, which are likely controlled by both environmental and host factors. Switching among the endophytic, pathogenic, and saprotrophic lifestyles confers upon fungi phenotypic plasticity in adapting to changing environments and drives the evolution of fungi and associated plants.

Scooped by Biswapriya Biswavas Misra

MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments

Biswapriya Biswavas Misra's insight:

SUMMARY: MSstats is an R package for statistical relative quantification of proteins and peptides in mass spectrometry-based proteomics. Version 2.0 of MSstats supports label-free and label-based experimental workflows, and data dependent, targeted and data independent spectral acquisition. It takes as input identified and quantified spectral peaks, and outputs a list of differentially abundant peptides or proteins, or summaries of peptide or protein relative abundance. MSstats relies on a flexible family of linear mixed models.

AVAILABILITY: The code, the documentation, and example datasets are available open-source at www.msstats.org under the Artistic-2.0 license. The package can be downloaded from www.msstats.org or from Bioconductor www.bioconductor.org, and used in an R command line workflow. The package can also be accessed as an external tool in Skyline (Broudy et al., 2013) and used via a graphical user interface.

CONTACT: ovitek@purdue.edu.

Rescooped by Biswapriya Biswavas Misra from plant cell genetics

Genome-wide analysis of heat-sensitive alternative splicing in Physcomitrella patens


Via Jean-Pierre Zryd
Scooped by Biswapriya Biswavas Misra

Transcriptome sequencing and metabolite analysis reveals the role of delphinidin metabolism in flower colour in grape hyacinth.

Biswapriya Biswavas Misra's insight:
Abstract

Grape hyacinth (Muscari) is an important ornamental bulbous plant with an extraordinary blue colour. Muscari armeniacum, whose flowers can be naturally white, provides an opportunity to unravel the complex metabolic networks underlying certain biochemical traits, especially colour. A blue flower cDNA library of M. armeniacum and a white flower library of M. armeniacum f. album were used for transcriptome sequencing. A total of 89 926 uni-transcripts were isolated, 143 of which could be identified as putative homologues of colour-related genes in other species. Based on a comprehensive analysis relating colour compounds to gene expression profiles, the mechanism of colour biosynthesis was studied in M. armeniacum. Furthermore, a new hypothesis explaining the lack of colour phenotype of the grape hyacinth flower is proposed. Alteration of the substrate competition between flavonol synthase (FLS) and dihydroflavonol 4-reductase (DFR) may lead to elimination of blue pigmentation while the multishunt from the limited flux in the cyanidin (Cy) synthesis pathway seems to be the most likely reason for the colour change in the white flowers of M. armeniacum. Moreover, mass sequence data obtained by the deep sequencing of M. armeniacum and its white variant provided a platform for future function and molecular biological research on M. armeniacum.

 
Scooped by Biswapriya Biswavas Misra

RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE)

Biswapriya Biswavas Misra's insight:
Abstract

Background

Next generation sequencing based technologies are being extensively used to study transcriptomes. Among these, cap analysis of gene expression (CAGE) is specialized in detecting the most 5’ ends of RNA molecules. After the sequenced reads are mapped back to a reference genome, CAGE data highlight the transcriptional start sites (TSSs) and their usage at single nucleotide resolution.

Results

We propose a pipeline to group single nucleotide TSSs into larger reproducible peaks and compare their usage across biological states. Importantly, our pipeline discovers broad peaks as well as the fine structure of individual transcriptional start sites embedded within them. We assess the performance of our approach on a large CAGE dataset including 156 primary cell types and two cell lines with biological replicates. We demonstrate that genes have complicated structures of transcription initiation events. In particular, we discover that narrow peaks embedded in broader regions of transcriptional activity can be differentially used even if the larger region is not.
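The notion of narrow peaks nested inside broad ones can be illustrated with a simple gap-based grouping of TSS positions run at two distance thresholds. This is only a sketch of the general idea, not the RECLU algorithm, which also assesses reproducibility across replicates; the input format, function name, and thresholds are assumptions.

```python
def cluster_tss(tss_counts, max_gap):
    """Group TSS positions into peaks.

    `tss_counts` maps a genomic position to its CAGE tag count (hypothetical
    input). Positions at most `max_gap` bases apart join the same peak.
    Returns a list of (start, end, total_count) tuples.
    """
    peaks = []
    for pos in sorted(tss_counts):
        if peaks and pos - peaks[-1][1] <= max_gap:
            start, _, total = peaks[-1]
            peaks[-1] = (start, pos, total + tss_counts[pos])
        else:
            peaks.append((pos, pos, tss_counts[pos]))
    return peaks

counts = {100: 5, 101: 8, 103: 2, 115: 4, 116: 6, 300: 3}
broad = cluster_tss(counts, max_gap=20)   # coarse view
narrow = cluster_tss(counts, max_gap=2)   # fine structure
```

With these toy counts the broad peak spanning 100 to 116 contains two narrow peaks, each of which could then be tested separately for differential usage.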

Conclusions

By examining the reproducible fine-scale organization of TSSs, we can detect many differentially regulated peaks that previous approaches missed.
