Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares

PyCogent - Analyze genome sequences with this Python tool.

PyCogent is an open source library for genomic biology.

It includes controllers for third-party applications, connectors to remote databases, and generalized probabilistic techniques for working with biological sequences.

PyCogent can also perform codon alignments, and new methods for analyzing genomic data are added frequently.

Requirements:

- Python 2.6 or higher
- NumPy 1.3 or higher

PyCogent 1.5.3 is licensed as freeware for the Windows operating system and is provided as a free download for all users.

 

Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases

GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build across the individual studies. Imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays, or SNP-calling approaches, resulting in the use of different genome builds and allele definitions. Incorrectly assuming identical allele definitions among combined GWAS leads to a large portion of discarded genotypes or to incorrect association findings. There is no published tool that predicts and converts among all major allele definitions.

Results

In this study, we have developed a tool, GACT (Genome build and Allele definition Conversion Tool), that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality. Our results indicated that including singletons in the reference had detrimental effects, while ambiguous SNPs had no measurable effect. Unexpectedly, excluding genotypes with a missing rate > 0.001 (40% of study SNPs) caused no significant decrease in imputation quality, especially for rare SNPs; quality was even significantly higher than for imputation with singletons in the reference.
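
To make the allele-definition problem concrete, here is a minimal Python sketch of one conversion step: expressing a study's alleles on the reference strand. It is not GACT's code. The function names are hypothetical, and the sketch assumes that "ambiguous SNPs" means A/T and C/G pairs, whose complement is the same pair, and that every SNP is biallelic with A/C/G/T alleles.

```python
# Hypothetical helpers for illustration only; this is not GACT's implementation.
# Assumes biallelic SNPs whose alleles are single A/C/G/T bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(alleles):
    """Return the allele pair as read from the opposite DNA strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def is_strand_ambiguous(alleles):
    """True for A/T and C/G SNPs, whose complement is the same allele pair."""
    first, second = alleles
    return COMPLEMENT[first] == second


def harmonize(study_alleles, reference_alleles):
    """Express study alleles on the reference strand, or return None."""
    if is_strand_ambiguous(study_alleles):
        return None  # strand cannot be inferred from the alleles alone
    if set(study_alleles) == set(reference_alleles):
        return tuple(study_alleles)  # already on the reference strand
    flipped = flip_strand(study_alleles)
    if set(flipped) == set(reference_alleles):
        return flipped  # study was reported on the opposite strand
    return None  # alleles match the reference on neither strand


if __name__ == "__main__":
    print(harmonize(("A", "G"), ("T", "C")))  # opposite-strand report
    print(harmonize(("A", "T"), ("A", "T")))  # strand-ambiguous SNP
```

Returning None for ambiguous pairs is a conservative choice made for this sketch; a real pipeline might instead resolve them with allele frequencies or array annotation.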

Conclusion

GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions. It can accurately predict and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP arrays or deep sequencing, particularly for data from dbGaP and other public databases. GACT software: www.uvm.edu/genomics/software/gact


Analysis of the Protein Domain and Domain Architecture Content in Fungi and Its Application in the Search of New Antifungal Targets

PLOS Computational Biology is an open-access
Biswapriya Biswavas Misra's insight:
Abstract

Over the past several years fungal infections have shown an increasing incidence in the susceptible population, and caused high mortality rates. In parallel, multi-resistant fungi are emerging in human infections. Therefore, the identification of new potential antifungal targets is a priority. The first task of this study was to analyse the protein domain and domain architecture content of the 137 fungal proteomes (corresponding to 111 species) available in UniProtKB (UniProt KnowledgeBase) by January 2013. The resulting list of core and exclusive domain and domain architectures is provided in this paper. It delineates the different levels of fungal taxonomic classification: phylum, subphylum, order, genus and species. The analysis highlighted Aspergillus as the most diverse genus in terms of exclusive domain content. In addition, we also investigated which domains could be considered promiscuous in the different organisms. As an application of this analysis, we explored three different ways to detect potential targets for antifungal drugs. First, we compared the domain and domain architecture content of the human and fungal proteomes, and identified those domains and domain architectures only present in fungi. Secondly, we looked for information regarding fungal pathways in public repositories, where proteins containing promiscuous domains could be involved. Three pathways were identified as a result: lovastatin biosynthesis, xylan degradation and biosynthesis of siroheme. Finally, we classified a subset of the studied fungi in five groups depending on their occurrence in clinical samples. We then looked for exclusive domains in the groups that were more relevant clinically and determined which of them had the potential to bind small molecules. Overall, this study provides a comprehensive analysis of the available fungal proteomes and shows three approaches that can be used as a first step in the detection of new antifungal targets.
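
The "core" and "exclusive" domain lists described above reduce to set operations once each proteome is represented as a set of domain identifiers. The following Python sketch illustrates that idea under stated assumptions: the function names and the toy identifiers are hypothetical, and the paper's actual pipeline (including how domain architectures are encoded) is not reproduced here.

```python
# Illustrative sketch; proteome names and domain identifiers are made up.


def core_domains(proteomes):
    """Domains present in every proteome of the group."""
    domain_sets = list(proteomes.values())
    return set.intersection(*domain_sets) if domain_sets else set()


def exclusive_domains(group, outgroup):
    """Domains found in at least one group proteome and in no outgroup proteome."""
    in_group = set().union(*group.values())
    in_outgroup = set().union(*outgroup.values())
    return in_group - in_outgroup


if __name__ == "__main__":
    fungi = {
        "fungus_1": {"domA", "domB", "domC"},
        "fungus_2": {"domA", "domC", "domD"},
    }
    human = {"human": {"domA", "domE"}}
    print(sorted(core_domains(fungi)))              # shared by all fungal proteomes
    print(sorted(exclusive_domains(fungi, human)))  # fungal domains absent from human
```

The same two functions can be applied at any taxonomic level by changing which proteomes are placed in the group and which in the outgroup.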


HelicoBase: a Helicobacter genomic resource and analysis platform

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

Helicobacter is a genus of helical Gram-negative bacteria that has been associated with a wide spectrum of human diseases. Although much research has been done on Helicobacter and many genomes have been sequenced, there is currently no specialized Helicobacter genomic resource and analysis platform to facilitate analysis of these genomes. With the increasing number of Helicobacter genomes being sequenced, comparative genomic analysis of members of this genus will provide further insights into their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of diseases caused by Helicobacter pathogens.

Description

To facilitate the ongoing research on Helicobacter, a specialized central repository and analysis platform for the Helicobacter research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data, particularly comparative analysis. Here we present HelicoBase, a user-friendly Helicobacter resource platform with diverse functionality for the analysis of Helicobacter genomic data for the Helicobacter research communities. HelicoBase hosts a total of 13 species and 166 genome sequences of Helicobacter spp. Genome annotations such as gene/protein sequences, protein function and sub-cellular localisation are also included. Our web implementation supports diverse query types and seamless searching of annotations using an AJAX-based real-time searching system. JBrowse is also incorporated to allow rapid and seamless browsing of Helicobacter genomes and annotations. Advanced bioinformatics analysis tools consisting of standard BLAST for similarity search, VFDB BLAST for sequence similarity search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis are also included to facilitate the analysis of Helicobacter genomic data.

Conclusions

HelicoBase offers access to a range of genomic resources as well as tools for the analysis of Helicobacter genome data. HelicoBase can be accessed at http://helicobacter.um.edu.my.

 

The tree fruit Genome Database Resources (tfGDR)


Expression-based network biology identifies immune-related functional modules involved in plant defense

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

Plants respond to diverse environmental cues including microbial perturbations by coordinated regulation of thousands of genes. These intricate transcriptional regulatory interactions depend on the recognition of specific promoter sequences by regulatory transcription factors. The combinatorial and cooperative action of multiple transcription factors defines a regulatory network that enables plant cells to respond to distinct biological signals. The identification of immune-related modules in large-scale transcriptional regulatory networks can reveal the mechanisms by which exposure to a pathogen elicits a precise phenotypic immune response.

Results

We have generated a large-scale immune co-expression network using a comprehensive set of Arabidopsis thaliana (hereafter Arabidopsis) transcriptomic data, which covers a wide spectrum of immune responses to pathogens or pathogen-mimicking stimuli. We employed both linear and non-linear models to generate the Arabidopsis immune co-expression regulatory (AICR) network. We computed network topological properties and ascertained that this newly constructed immune network is densely connected, possesses hubs, exhibits high modularity, and displays hallmarks of a "real" biological network. We partitioned the network and identified 156 novel modules related to immune functions. Gene Ontology (GO) enrichment analyses provided insight into the key biological processes involved in determining finely tuned immune responses. We also developed novel software called OCCEAN (One Click Cis-regulatory Elements ANalysis) to discover statistically enriched promoter elements in the upstream regulatory regions of Arabidopsis at a whole-genome level. We demonstrated that OCCEAN exhibits higher precision than the existing promoter element discovery tools. In light of known and newly discovered cis-regulatory elements, we evaluated the biological significance of two key immune-related functional modules and proposed mechanism(s) to explain how large sets of diverse GO genes function coherently to mount effective immune responses.
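
As a rough illustration of how a co-expression network can be built, the Python sketch below links two genes when the absolute Pearson correlation of their expression profiles reaches a threshold. This covers only a simple linear model; the non-linear models, the threshold actually used for the AICR network, and the module detection step are not described in enough detail here to reproduce. The function names, the 0.9 cutoff, and the toy data are assumptions.

```python
from itertools import combinations
from math import sqrt

# Illustrative sketch of a linear (Pearson) co-expression network.
# Names, threshold, and data are hypothetical, not taken from the study.


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


if __name__ == "__main__":
    toy = {
        "geneA": [1, 2, 3, 4],
        "geneB": [2, 4, 6, 8],   # rises with geneA
        "geneC": [4, 3, 2, 1],   # falls as geneA rises
        "geneD": [1, 3, 2, 3],   # only loosely related
    }
    for g1, g2, r in coexpression_edges(toy):
        print(g1, g2, round(r, 2))
```

Using the absolute value keeps strongly anti-correlated genes in the network; whether that is appropriate depends on the biological question.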

Conclusions

We used a network-based, top-down approach to discover immune-related modules from transcriptomic data in Arabidopsis. Detailed analyses of these functional modules provide new insight into the topological properties of immune co-expression networks and a comprehensive understanding of multifaceted plant defense responses. We present evidence that our newly developed software, OCCEAN, could become a popular tool for the Arabidopsis research community and could potentially be extended to analyze other eukaryotic genomes.


Analytical utility of mass spectral binning in proteomic experiments by SPectral Immonium Ion Detection (SPIID).

Biswapriya Biswavas Misra's insight:

Unambiguous identification of tandem mass spectra is a cornerstone in mass spectrometry (MS)-based proteomics. As the study of post-translational modifications (PTMs) by shotgun proteomics progresses in depth and coverage, the ability to correctly identify PTM-bearing peptides is essential, increasing the demand for advanced data interpretation. Several PTMs are known to generate unique fragment ions during tandem mass spectrometry (MS/MS), the so-called diagnostic ions, which unequivocally identify that a given mass spectrum relates to a specific PTM. Although such ions hold tremendous analytical advantages, algorithms to decipher MS/MS spectra for the presence of diagnostic ions in an unbiased manner are currently lacking. Here, we present a systematic spectral pattern-based approach for the discovery of diagnostic ions and new fragmentation mechanisms in shotgun proteomics datasets. The developed software tool is designed to analyze large sets of high-resolution peptide fragmentation spectra independent of the fragmentation method, instrument type or protease employed. To benchmark the software tool we have analyzed large HCD datasets of samples containing phosphorylation, ubiquitylation, SUMOylation, formylation and lysine acetylation. Using the developed software tool we are able to identify known diagnostic ions by comparing histograms of modified and unmodified peptide spectra. Since the investigated tandem mass spectra data are acquired with high mass accuracy, unambiguous interpretation and determination of the chemical composition for the majority of detected fragment ions is feasible. Collectively we present a freely available software tool that allows for comprehensive and automatic analysis of analogous product ions in tandem mass spectra, and systematic mapping of fragmentation mechanisms related to common amino acids.
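
The histogram comparison mentioned above can be sketched in a few lines of Python: bin fragment m/z values, count how many spectra hit each bin, and flag bins that are much more frequent in modified than in unmodified spectra. This is only an illustration of spectral binning, not the SPIID algorithm. The function names, the 0.01 bin width, the ratio cutoff, the pseudocount, and the toy m/z values are all assumptions.

```python
from collections import Counter

# Illustrative sketch of spectral binning; not the SPIID implementation.
# Each spectrum is a list of fragment m/z values (toy numbers, not measured data).


def bin_spectra(spectra, bin_width=0.01):
    """Count, per m/z bin, the number of spectra with at least one peak in that bin."""
    counts = Counter()
    for peaks in spectra:
        counts.update({round(mz / bin_width) for mz in peaks})
    return counts


def candidate_diagnostic_bins(modified, unmodified, bin_width=0.01, min_ratio=3.0):
    """Return (bin m/z, frequency in modified spectra) for enriched bins."""
    mod = bin_spectra(modified, bin_width)
    unmod = bin_spectra(unmodified, bin_width)
    hits = []
    for bin_index, n_mod in mod.items():
        freq_mod = n_mod / len(modified)
        # Pseudocount avoids division by zero for bins never seen in unmodified spectra.
        freq_unmod = (unmod.get(bin_index, 0) + 1) / (len(unmodified) + 1)
        if freq_mod / freq_unmod >= min_ratio:
            hits.append((round(bin_index * bin_width, 4), freq_mod))
    return sorted(hits)


if __name__ == "__main__":
    modified = [[100.00, 126.09], [126.09, 200.50], [126.09, 150.25], [100.00, 126.09]]
    unmodified = [[100.00, 175.12], [100.00, 200.50], [150.25, 100.00], [100.00]]
    print(candidate_diagnostic_bins(modified, unmodified))
```

A bin that passes this filter is only a candidate; with high-mass-accuracy data its elemental composition would still need to be checked before calling it a diagnostic ion.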


Arabidopsis Information Portal

Biswapriya Biswavas Misra's insight:

Welcome to the Arabidopsis Information Portal (AIP), a new resource to bring together the ever-increasing amounts of Arabidopsis data into a single, user-friendly location using the latest web technologies and web services. It will adopt a modular, federated model which ensures that responsibility for generation and maintenance of valuable data remains in the hands of the individual data providers and spreads the burden of supporting such resources across a potentially wider range of funding agencies and countries. The AIP will be developed by a team with deep experience in scientific infrastructure, data integration, and community engagement, and will take advantage of significant NSF investments in the plant biology research community. Key elements of the new AIP include: the development of a modular, community-extensible web-based interface with user work spaces that can be configured with data retrieval, analysis, and visualization applications; the implementation of an Arabidopsis-specific instance of InterMine, a data integration platform that is widely accepted in the animal model organism database community; and the design and construction of a web services layer that facilitates data access, integration with iPlant Collaborative resources, federation with other data providers, and development of analytical workflows. The project will implement a sustainability strategy that embraces adoption of existing scientific infrastructure, use of virtualization, federated provision of data, collaborative development of new resources, and pursuit of alternative funding sources. Not only will AIP modernize the bioinformatics capacity of the Arabidopsis community, it will provide a foundation for multi-agency, multi-national collaboration in building and funding biological informatics capabilities.


Network topology-based detection of differential gene regulation and regulatory switches in cell metabolism and signaling

Biswapriya Biswavas Misra's insight:
Abstract

Background

Common approaches to pathway analysis treat pathways merely as lists of genes disregarding their topological structures, that is, ignoring the genes' interactions on which a pathway's cellular function depends. In contrast, PathWave has been developed for the analysis of high-throughput gene expression data that explicitly takes the topology of networks into account to identify both global dysregulation of and localized (switch-like) regulatory shifts within metabolic and signaling pathways. For this purpose, it applies adjusted wavelet transforms on optimized 2D grid representations of curated pathway maps.
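
To show why a wavelet transform on a 2D grid can separate global from localized changes, here is a minimal Python sketch of one level of a plain, unnormalized 2D Haar transform. PathWave is described as using adjusted wavelet transforms on optimized grids, so this is a simplified stand-in rather than its method; the function name and the toy grid are hypothetical.

```python
# Illustrative sketch: one level of an unnormalized (averaging) 2D Haar transform.
# Grid cells stand for expression changes of reactions placed on a 2D lattice.


def haar_level(grid):
    """Return (approximation, detail) for a grid with even dimensions.

    Each 2x2 block yields one average and a tuple of three differences:
    (left vs right, top vs bottom, diagonal).
    """
    rows, cols = len(grid), len(grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid dimensions must be even")
    approx, detail = [], []
    for r in range(0, rows, 2):
        approx_row, detail_row = [], []
        for c in range(0, cols, 2):
            a, b = grid[r][c], grid[r][c + 1]
            d, e = grid[r + 1][c], grid[r + 1][c + 1]
            approx_row.append((a + b + d + e) / 4)
            detail_row.append((
                (a - b + d - e) / 4,  # left column minus right column
                (a + b - d - e) / 4,  # top row minus bottom row
                (a - b - d + e) / 4,  # diagonal contrast
            ))
        approx.append(approx_row)
        detail.append(detail_row)
    return approx, detail


if __name__ == "__main__":
    grid = [
        [2, 2, 0, 0],
        [2, 2, 0, 0],
        [1, -1, 0, 4],
        [1, -1, 0, 4],
    ]
    approx, detail = haar_level(grid)
    print(approx)        # block averages: uniform shifts show up here
    print(detail[1][0])  # neighbouring cells with opposite sign: a switch-like pattern
```

In the toy grid, the top-left block is uniformly shifted and appears only in the approximation, while the bottom-left block averages to zero and is visible only through its detail coefficient.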

Results

Here, we present the new version of PathWave with several substantial improvements including a new method for optimally mapping pathway networks onto compact 2D lattice grids, a more flexible and user-friendly interface, and pre-arranged 2D grid representations. These pathway representations are assembled for several species, now comprising H. sapiens, M. musculus, D. melanogaster, D. rerio, C. elegans, and E. coli. We show that PathWave is more sensitive than common approaches and apply it to RNA-seq expression data, identifying crucial metabolic pathways in lung adenocarcinoma, as well as microarray expression data, identifying pathways involved in longevity of Drosophila.

Conclusions

PathWave is a generic method for pathway analysis complementing established tools like GSEA, and the update comprises efficient new features. In contrast to the tested commonly applied approaches, which do not take network topology into account, PathWave enables identifying pathways that are either known to be involved in or very likely associated with such diverse conditions as human lung cancer or aging of D. melanogaster. The PathWave R package is freely available at http://www.ichip.de/software/pathwave.html.


GigaDB Dataset: The Rice 3000 Genomes Project Data.

Biswapriya Biswavas Misra's insight:


Rice, Oryza sativa L., is the staple food for half the world’s population. By 2030, rice production must increase by at least 25% to keep pace with population growth. Accelerated genetic gains in rice improvement are needed to mitigate the effects of climate change and loss of arable land and to ensure global food supply.
Here, we include data from an international effort resequencing a core collection of 3,000 rice accessions from 89 countries as a global public good. The 3,000 sequenced rice genomes had an average sequencing depth of 14X, average genome coverage and mapping rates of 94.0% and 92.5%, respectively.
This data provides a foundation for large-scale discovery of novel alleles for important rice phenotypes using various bioinformatics and/or genetic approaches. It also serves to understand at a higher level of detail the genomic diversity within O. sativa. With the release of the sequencing data, the project calls for the global rice community to take advantage of this data as a foundation for establishing a global, public rice genetic/genomic database and information platform for advancing rice breeding technology for future rice improvement.


The Impacts of Read Length and Transcriptome Complexity for De Novo Assembly: A Simulation Study

PLOS ONE: an inclusive, peer-reviewed, open-access resource from the PUBLIC LIBRARY OF SCIENCE. Reports of well-performed scientific studies from all disciplines freely available to the whole world.
Biswapriya Biswavas Misra's insight:

Transcriptome assembly using RNA-seq data, particularly in non-model organisms, has been dramatically improved, but only recently have pre-assembly factors, such as sequencing depth and error correction, been studied. Increasing read length is viewed as a crucial condition to further improve transcriptome assembly, but it is unknown whether the read length really matters. In addition, though many assembly tools are now available, it is unclear whether the existing assemblers perform well enough for all data with different transcriptome complexities. In this paper, we studied these two open problems using two high-performing assemblers, Velvet/Oases and Trinity, on several simulated datasets of human, mouse and S. cerevisiae. The results suggest that (1) the read length of paired reads does not matter once it exceeds a certain threshold, and interestingly, the threshold differs between organisms; and (2) the quality of de novo assembly decreases sharply as transcriptome complexity increases: all existing de novo assemblers tend to break down whenever the genes contain a large number of alternative splicing events.


ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases

Biswapriya Biswavas Misra's insight:
Abstract

Background

Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge.

Results

We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready.
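
The kind of enrichment pattern described above is commonly summarized as an average coverage profile around a set of anchor positions (for example, transcription start sites). The Python sketch below shows that aggregation on a toy per-base coverage vector. It is not ngs.plot's code or interface; the function name, arguments, and data are hypothetical.

```python
# Illustrative sketch of an average coverage profile; not ngs.plot's implementation.


def average_profile(coverage, anchors, flank):
    """Mean per-base coverage in a window of +/- flank bases around each anchor."""
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos in anchors:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(coverage):
            continue  # skip windows that run off the sequence
        for offset, depth in enumerate(coverage[start:end]):
            totals[offset] += depth
        used += 1
    if used == 0:
        raise ValueError("no anchor has a complete window")
    return [total / used for total in totals]


if __name__ == "__main__":
    coverage = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0]  # toy read depth per base
    print(average_profile(coverage, anchors=[2, 7], flank=1))
```

A real tool would also have to handle strand orientation, regions of unequal length, and normalization by library size, all of which are omitted here.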

Conclusions

We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.


SFGD: a comprehensive platform for mining functional information from soybean transcriptome data and its use in identifying acyl-lipid metabolism pathways

Biswapriya Biswavas Misra's insight:
Background

Soybean (Glycine max L.) is one of the world’s most important leguminous crops producing high-quality protein and oil. Increasing the relative oil concentration in soybean seeds is many researchers’ goal, but a complete analysis platform of functional annotation for the genes involved in the soybean acyl-lipid pathway is still lacking. Following the success of soybean whole-genome sequencing, functional annotation has become a major challenge for the scientific community. Whole-genome transcriptome analysis is a powerful way to predict genes with biological functions. It is essential to build a comprehensive analysis platform for integrating soybean whole-genome sequencing data, the available transcriptome data and protein information. This platform could also be used to identify acyl-lipid metabolism pathways.

Description

In this study, we describe our construction of the Soybean Functional Genomics Database (SFGD) using Generic Genome Browser (Gbrowse) as the core platform. We integrated microarray expression profiling with 255 samples from 14 groups’ experiments and mRNA-seq data with 30 samples from four groups’ experiments, including spatial and temporal transcriptome data for different soybean development stages and environmental stresses. The SFGD includes a gene co-expression regulatory network containing 23,267 genes and 1873 miRNA-target pairs, and a group of acyl-lipid pathways containing 221 enzymes and more than 1550 genes. The SFGD also provides some key analysis tools, i.e. BLAST search, expression pattern search and cis-element significance analysis, as well as gene ontology information search and single nucleotide polymorphism display.

Conclusion

The SFGD is a comprehensive database that integrates genome and transcriptome data and also covers soybean acyl-lipid metabolism pathways. It provides useful toolboxes for biologists to improve the accuracy and robustness of soybean functional genomics analysis, further improving understanding of gene regulatory networks for effective crop improvement. The SFGD is publicly accessible at http://bioinformatics.cau.edu.cn/SFGD/, with all data available for downloading.


A Multistep Screening Method to Identify Genes Using Evolutionary Transcriptome of Plants
Biswapriya Biswavas Misra's insight:
Abstract

We introduced a multistep screening method to identify the genes in plants using microarrays and ribonucleic acid (RNA)-seq transcriptome data. Our method describes the process for identifying genes using the salt-tolerance response pathways of the potato (Solanum tuberosum) plant. Gene expression was analyzed using microarrays and RNA-seq experiments that examined three potato lines (high, intermediate, and low salt tolerance) under conditions of salt stress. We screened the orthologous genes and pathway genes involved in salinity-related biosynthetic pathways, and identified nine potato genes that were candidates for salinity-tolerance pathways. The nine genes were selected to characterize their phylogenetic reconstruction with homologous genes of Arabidopsis thaliana, and a Circos diagram was generated to understand the relationships among the selected genes. The involvement of the selected genes in salt-tolerance pathways was verified by reverse transcription polymerase chain reaction analysis. One candidate potato gene was selected for physiological validation by generating dehydration-responsive element-binding 1 (DREB1)-overexpressing transgenic potato plants. The DREB1 overexpression lines exhibited increased salt tolerance and plant growth when compared to that of the control. Although the nine genes identified by our multistep screening method require further characterization and validation, this study demonstrates the power of our screening strategy after the initial identification of genes using microarrays and RNA-seq experiments.


Large-Scale Structure of a Network of Co-Occurring MeSH Terms: Statistical Analysis of Macroscopic Properties

PLOS ONE: an inclusive, peer-reviewed, open-access resource from the PUBLIC LIBRARY OF SCIENCE. Reports of well-performed scientific studies from all disciplines freely available to the whole world.
Biswapriya Biswavas Misra's insight:

Abstract

 

Concept associations can be represented by a network that consists of a set of nodes representing concepts and a set of edges representing their relationships. Complex networks exhibit some common topological features including small diameter, high degree of clustering, power-law degree distribution, and modularity. We investigated the topological properties of a network constructed from co-occurrences between MeSH descriptors in the MEDLINE database. We conducted the analysis on two networks, one constructed from all MeSH descriptors and another using only major descriptors. Network reduction was performed using Pearson's chi-square test for independence. To characterize topological properties of the network we adopted some specific measures, including diameter, average path length, clustering coefficient, and degree distribution. For the full MeSH network the average path length was 1.95 edges with a diameter of three edges and a clustering coefficient of 0.26. The Kolmogorov-Smirnov test rejected the power law as a plausible model for the degree distribution. For the major MeSH network the average path length was 2.63 edges with a diameter of seven edges and a clustering coefficient of 0.15. The Kolmogorov-Smirnov test failed to reject the power law as a plausible model. The power-law exponent was 5.07. In both networks it was evident that nodes with a lower degree exhibit higher clustering than those with a higher degree. After a simulated attack, in which we removed the 10% of nodes with the highest degrees, the giant component of each of the two networks contained about 90% of all nodes. Because of its small average path length and high degree of clustering, the MeSH network is small-world. A power-law distribution is not a plausible model for the degree distribution. The network is highly modular, highly resistant to targeted and random attack, and minimally disassortative.
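
For readers unfamiliar with the measures named above, the following Python sketch computes the average clustering coefficient, average shortest path length, and diameter of a small undirected graph using breadth-first search. It assumes the clustering coefficient is the mean of local coefficients, which the abstract does not specify, and it assumes a connected graph. The function names and the toy graph are hypothetical and unrelated to the MeSH data.

```python
from collections import deque
from itertools import combinations

# Illustrative sketch of basic network measures on an undirected graph stored
# as {node: set_of_neighbours}. Assumes the graph is connected.


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def bfs_distances(adj, source):
    """Shortest path length in edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_and_diameter(adj):
    """Average shortest path length and diameter over all node pairs."""
    lengths = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)


if __name__ == "__main__":
    # Triangle A-B-C with a pendant node D attached to C.
    graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
    print(average_clustering(graph))
    print(path_length_and_diameter(graph))
```

Running one breadth-first search per node is fine for a toy graph; a network the size of the full MeSH co-occurrence graph would call for a dedicated graph library.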

 


LipidWrapper: An Algorithm for Generating Large-Scale Membrane Models of Arbitrary Geometry

PLOS Computational Biology is an open-access
Biswapriya Biswavas Misra's insight:
Abstract

As ever larger and more complex biological systems are modeled in silico, approximating physiological lipid bilayers with simple planar models becomes increasingly unrealistic. In order to build accurate large-scale models of subcellular environments, models of lipid membranes with carefully considered, biologically relevant curvature will be essential. In the current work, we present a multi-scale utility called LipidWrapper capable of creating curved membrane models with geometries derived from various sources, both experimental and theoretical. To demonstrate its utility, we use LipidWrapper to examine an important mechanism of influenza virulence. A copy of the program can be downloaded free of charge under the terms of the open-source FreeBSD License from http://nbcr.ucsd.edu/lipidwrapper. LipidWrapper has been tested on all major computer operating systems.


piRNAQuest: searching the piRNAome for silencers

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

PIWI-interacting RNA (piRNA) is a novel and emerging class of small non-coding RNA (sncRNA). Ranging in length from 26-32 nucleotides, this sncRNA is a potent player in guiding the vital regulatory processes within a cellular system. In spite of having such a wide role within cellular systems, piRNAs are not organized and classified well enough for a researcher to pull out the biologically relevant information concerning this class.

Description: Here we present piRNAQuest, a unified and comprehensive database of 41749 human, 890078 mouse and 66758 rat piRNAs obtained from NCBI and different small RNA sequence experiments. This database provides piRNA annotation based on their localization in gene, intron, intergenic, CDS, 5′ UTR, 3′ UTR and repetitive regions, which has not been done so far. We have also annotated piRNA clusters and have elucidated characteristic motifs within them. We have looked for the presence of piRNAs and piRNA clusters in pseudogenes, which are known to regulate the expression of protein coding transcripts by generating small RNAs. All these will help researchers progress towards solving the unanswered queries on piRNA biogenesis and their mode of action. Further, expression profiles for piRNAs in different tissues and from different developmental stages have been provided. In addition, we have provided several tools like 'homology search', 'dynamic cluster search' and 'pattern search'. Overall, piRNAQuest will serve as a useful resource for exploring the human, mouse and rat piRNAome. The database is freely accessible and available at http://bicresources.jcbose.ac.in/zhumur/pirnaquest/.

Conclusion

piRNAs play a remarkable role in stem cell self-renewal and various vital processes of developmental biology. Although researchers are mining different features on piRNAs, the exact regulatory mechanism is still fuzzy. Thus, understanding the true potential of these small regulatory molecules with respect to their origin, localization and mode of biogenesis is crucial. piRNAQuest will provide us with a better insight on piRNA origin and function which will help to explore the true potential of these sncRNAs.
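Annotation by localization, as described above, comes down to testing whether a piRNA's genomic interval overlaps annotated features. The sketch below illustrates that idea only; it is not piRNAQuest's implementation, and the coordinates, the feature list, and the function name `annotate_pirna` are hypothetical.

```python
# Hypothetical annotated features: (chromosome, start, end, feature class).
# Coordinates are 1-based and inclusive; none of them come from piRNAQuest.
FEATURES = [
    ("chr1", 100, 500, "gene"),
    ("chr1", 100, 150, "5' UTR"),
    ("chr1", 151, 300, "CDS"),
    ("chr1", 301, 400, "intron"),
    ("chr1", 401, 500, "3' UTR"),
]


def annotate_pirna(pirna, features):
    """Return the feature classes overlapping a piRNA interval.

    pirna is (chromosome, start, end). Two closed intervals overlap when
    each one starts no later than the other ends. A piRNA that overlaps
    no feature is labelled intergenic.
    """
    chrom, start, end = pirna
    hits = set()
    for f_chrom, f_start, f_end, f_class in features:
        if f_chrom == chrom and f_start <= end and start <= f_end:
            hits.add(f_class)
    return sorted(hits) or ["intergenic"]
```

For example, `annotate_pirna(("chr1", 140, 166), FEATURES)` reports the gene, its 5' UTR and its CDS, because the 27-nucleotide interval spans the UTR/CDS boundary. With hundreds of thousands of piRNAs, a linear scan per query would be replaced by an interval index in practice.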


REGNET: mining context-specific human transcription networks using composite genomic information

Biswapriya Biswavas Misra's insight:

Abstract (provisional)

Background

Genome-wide expression profiles reflect the transcriptional networks specific to the given cell context. However, most statistical models try to estimate the average connectivity of the networks from a collection of gene expression data, and are unable to characterize the context-specific transcriptional regulations. We propose an approach for mining context-specific transcription networks from a large collection of gene expression fold-change profiles and composite gene-set information.

Results

Using a composite gene-set analysis method, we combine the information of transcription factor binding sites, Gene Ontology or pathway gene sets and gene expression fold-change profiles for a variety of cell conditions. We then collected all the significant patterns and constructed a database of context-specific transcription networks for human (REGNET). As a result, context-specific roles of transcription factors as well as their functional targets are readily explored. To validate the approach, nine predicted targets of E2F1 in HeLa cells were tested using a chromatin immunoprecipitation assay. Among them, five (Gadd45b, Dusp6, Mll5, Bmp2 and E2f3) were successfully bound by E2F1. The c-JUN and EMT transcription networks were also validated from the literature.

Conclusions

REGNET is a useful tool for exploring the ternary relationships among the transcription factors, their functional targets and the corresponding cell conditions. It is able to provide useful clues for novel cell-specific transcriptional regulations. The REGNET database is available at http://mgrc.kribb.re.kr/regnet
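The abstract does not spell out how the composite gene-set analysis scores a pattern. As a rough illustration of the general idea of testing whether a transcription factor's targets are over-represented among genes that change in one cell condition, here is a plain hypergeometric enrichment test. This is a swapped-in, simpler technique rather than REGNET's method, and the function names and gene labels are placeholders.

```python
from math import comb


def hypergeom_sf(overlap, population, set_size, draws):
    """P(X >= overlap) for a hypergeometric draw without replacement.

    population: number of background genes
    set_size:   number of background genes that are targets of the factor
    draws:      number of background genes that changed in the condition
    """
    upper = min(set_size, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(set_size, k) * comb(population - set_size, draws - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


def target_enrichment(tf_targets, changed_genes, background):
    """Return (overlap size, enrichment p-value) for one factor and condition."""
    background = set(background)
    targets = set(tf_targets) & background
    changed = set(changed_genes) & background
    overlap = len(targets & changed)
    p_value = hypergeom_sf(overlap, len(background), len(targets), len(changed))
    return overlap, p_value
```

A small p-value for a given factor and condition would flag that pairing as a candidate context-specific relationship. Scanning many factors and conditions this way would also call for multiple-testing correction, which the sketch leaves out.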


CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

miRNAs play a key role in normal physiology and various diseases. miRNA profiling through next generation sequencing (miRNA-seq) has become the main platform for biological research and biomarker discovery. However, analyzing miRNA sequencing data is challenging as it needs a significant amount of computational resources and bioinformatics expertise. Several web-based analytical tools have been developed, but they are limited to processing one or a pair of samples at a time and are not suitable for a large-scale study. Lack of flexibility and reliability are also common issues with these web applications.

Results

We developed a Comprehensive Analysis Pipeline for microRNA Sequencing data (CAP-miRSeq) that integrates read pre-processing, alignment, mature/precursor/novel miRNA detection and quantification, data visualization, variant detection in miRNA coding regions, and more flexible differential expression analysis between experimental conditions. Depending on their computational infrastructure, users can install the package locally or deploy it in the Amazon Cloud to run samples sequentially or in parallel, which speeds up analyses of large numbers of samples. In either case, summary and expression reports for all samples are generated for easier quality assessment and downstream analyses. Using well-characterized data, we demonstrated the pipeline's superior performance, flexibility, and practical use in research and biomarker discovery.

Conclusions

CAP-miRSeq is a powerful and flexible tool for users to process and analyze miRNA-seq data, scalable from a few to hundreds of samples. The results are presented in a convenient way for investigators or analysts to conduct further investigation and discovery.

 

CoSBI – Identify Combinatorial Chromatin Modification Patterns across Genomic Loci

CoSBI (Coherent and Shifted Bicluster Identification) is a scalable subspace clustering algorithm to identify the complete set of combinatorial chromatin modification patterns across the entire...

SparkSeq: fast, scalable, cloud-ready tool for the interactive genomic data analysis with nucleotide precision

Biswapriya Biswavas Misra's insight:
Abstract

Many time-consuming analyses of next generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics due to their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying.

The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customised ad hoc secondary analyses and iterative machine learning algorithms. This paper demonstrates its scalability and overall very fast performance by running analyses of sequencing datasets. Tests of SparkSeq also show that the use of cache and the HDFS block size can be tuned for optimal performance on multiple worker nodes.

Availability and Implementation: Available under open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/
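To make the MapReduce idea concrete, the sketch below computes per-base read coverage with an explicit map step and reduce step. It uses only the Python standard library and runs on a single machine; it is not SparkSeq's API (SparkSeq pipelines are written in Scala) and it does not call Apache Spark. The read tuples and function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_read(read):
    """Map step: emit a count of 1 for every position the read covers.

    read is (chromosome, 1-based start, aligned length).
    """
    chrom, start, length = read
    return Counter({(chrom, pos): 1 for pos in range(start, start + length)})


def reduce_counts(left, right):
    """Reduce step: merge two partial coverage tables by summing counts."""
    return left + right


def coverage(reads):
    """Per-base coverage as {(chromosome, position): depth}."""
    return reduce(reduce_counts, map(map_read, reads), Counter())
```

Because the reduce step is associative and commutative, partial tables could be merged in any order on any worker, which is the property a distributed framework relies on when it spreads the work across nodes.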


Proteomics DB

ProteomicsDB
Biswapriya Biswavas Misra's insight:
ProteomicsDB is a joint effort of the Technische Universität München (TUM) and SAP AG. It is dedicated to expediting the identification of the human proteome and its use across the scientific community.

HeteroGenome: database of genome periodicity

Biswapriya Biswavas Misra's insight:

We present the first release of the HeteroGenome database collecting latent periodicity regions in genomes. Tandem repeats and highly divergent tandem repeats along with the regions of a new type of periodicity, known as profile periodicity, have been collected for the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster. We obtained data with the aid of a spectral-statistical approach to search for reliable latent periodicity regions (with periods up to 2000 bp) in DNA sequences. The original two-level mode of data presentation (a broad view of the region of latent periodicity and a second level indicating conservative fragments of its structure) was further developed to enable us to obtain the estimate, without redundancy, that latent periodicity regions make up ∼10% of the analyzed genomes. Analysis of the quantitative and qualitative content of located periodicity regions on all chromosomes of the analyzed organisms revealed dominant characteristic types of periodicity in the genomes. The pattern of density distribution of latent periodicity regions on a chromosome unambiguously characterizes each chromosome in the genome.

Database URL: http://www.jcbi.ru/lp_baze/
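The spectral-statistical approach itself is not described in this abstract. As a much simpler stand-in that conveys what a periodicity of length p means in a DNA sequence, the sketch below scores a candidate period by the fraction of positions whose base matches the base p positions downstream. The function names are hypothetical, and this score would miss the profile periodicity that the database also covers.

```python
def period_match_score(seq, period):
    """Fraction of positions i with seq[i] == seq[i + period].

    A perfect tandem repeat with this period scores 1.0; divergent
    repeats score lower.
    """
    if not 0 < period < len(seq):
        raise ValueError("period must be between 1 and len(seq) - 1")
    pairs = len(seq) - period
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + period])
    return matches / pairs


def best_period(seq, max_period):
    """Smallest period, up to max_period, with the highest match score."""
    limit = min(max_period, len(seq) - 1)
    return max(range(1, limit + 1), key=lambda p: period_match_score(seq, p))
```

Multiples of the true period score equally well on a perfect repeat, so `best_period` returns the smallest candidate among ties.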


BMC Genomics | Abstract | Predicting the fungal CUG codon translation with Bagheera

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

Many eukaryotes have been shown to use alternative schemes to the universal genetic code. While most Saccharomycetes, including Saccharomyces cerevisiae, use the standard genetic code translating the CUG codon as leucine, some yeasts, including many but not all of the "Candida", translate the same codon as serine. It has been proposed that the change in codon identity was accomplished by an almost complete loss of the original CUG codons, making the CUG positions within the extant species highly discriminative for the one or other translation scheme.

Results

In order to improve the prediction of genes in yeast species by providing the correct CUG decoding scheme we implemented a web server, called Bagheera, that allows determining the most probable CUG codon translation for a given transcriptome or genome assembly based on extensive reference data. As reference data we use 2071 manually assembled and annotated sequences from 38 cytoskeletal and motor proteins belonging to 79 yeast species. The web service includes a pipeline, which starts with predicting and aligning homologous genes to the reference data. CUG codon positions within the predicted genes are analysed with respect to amino acid similarity and CUG codon conservation in related species. In addition, the tRNACAG gene is predicted in genomic data and compared to known leu-tRNACAG and ser-tRNACAG genes. Bagheera can also be used to evaluate any mRNA and protein sequence data with the codon usage of the respective species. The usage of the system has been demonstrated by analysing six genomes not included in the reference data.

Conclusions

Gene prediction and consecutive comparison with reference data from other Saccharomycetes are sufficient to predict the most probable decoding scheme for CUG codons. This approach has been implemented into Bagheera (http://www.motorprotein.de/bagheera).
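To show why the decoding scheme matters for gene prediction, the sketch below translates the same coding sequence under the standard genetic code and under a variant in which CUG encodes serine. It does not reproduce Bagheera's prediction pipeline; it only illustrates the consequence of choosing one scheme or the other. The table is built from the standard code's conventional TCAG ordering, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Alternative scheme used by some yeasts: CUG (CTG in DNA) read as serine.
CUG_AS_SERINE = {**STANDARD, "CTG": "S"}


def translate(cds, table):
    """Translate a coding sequence (DNA or RNA letters) codon by codon."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(table[cds[i:i + 3]] for i in range(0, usable, 3))


def cug_codon_positions(cds):
    """Zero-based codon indices at which the codon is CUG."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [i // 3 for i in range(0, usable, 3) if cds[i:i + 3] == "CTG"]
```

For example, `translate("AUGCUGUAA", STANDARD)` gives `"ML*"`, while the same call with `CUG_AS_SERINE` gives `"MS*"`. The positions returned by `cug_codon_positions` are the ones a comparison against reference alignments would need to examine.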


Parenclitic networks: uncovering new functions in biological data


We introduce a novel method to represent time independent, scalar data sets as complex networks. We apply our method to investigate gene expression in the response to osmotic stress of Arabidopsis thaliana. In the proposed network representation, the most important genes for the plant response turn out to be the nodes with highest centrality in appropriately reconstructed networks. We also performed a target experiment, in which the predicted genes were artificially induced one by one, and the growth of the corresponding phenotypes compared to that of the wild-type. The joint application of the network reconstruction method and of the in vivo experiments allowed identifying 15 previously unknown key genes, and provided models of their mutual relationships. This novel representation extends the use of graph theory to data sets hitherto considered outside of the realm of its application, vastly simplifying the characterization of their underlying structure.

Biswapriya Biswavas Misra's insight:

Neurospora crassa has a long history as an excellent model for genetic, cellular, and biochemical research. Although this fungus is known as a saprotroph, it normally appears on burned vegetation or trees after forest fires. However, due to a lack of experimental evidence, the nature of its association with living plants remains enigmatic. Here we report that Scots pine (Pinus sylvestris) is a host plant for N. crassa. The endophytic lifestyle of N. crassa was found in its interaction with Scots pine. Moreover, the fungus can switch to a pathogenic state when its balanced interaction with the host is disrupted. Our data reveal previously unknown lifestyles of N. crassa, which are likely controlled by both environmental and host factors. Switching among the endophytic, pathogenic, and saprotrophic lifestyles confers upon fungi phenotypic plasticity in adapting to changing environments and drives the evolution of fungi and associated plants.


MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments

Biswapriya Biswavas Misra's insight:

SUMMARY: MSstats is an R package for statistical relative quantification of proteins and peptides in mass spectrometry-based proteomics. Version 2.0 of MSstats supports label-free and label-based experimental workflows, and data dependent, targeted and data independent spectral acquisition. It takes as input identified and quantified spectral peaks, and outputs a list of differentially abundant peptides or proteins, or summaries of peptide or protein relative abundance. MSstats relies on a flexible family of linear mixed models. AVAILABILITY: The code, the documentation, and example datasets are available open-source at www.msstats.org under the Artistic-2.0 license. The package can be downloaded from www.msstats.org or from Bioconductor www.bioconductor.org, and used in an R command line workflow. The package can also be accessed as an external tool in Skyline (Broudy et al., 2013) and used via a graphical user interface. CONTACT: ovitek@purdue.edu.
