Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares

jORCA 201207 - Easily integrating bioinformatics web services

:: DESCRIPTION

jORCA is a desktop client that efficiently integrates different types of web-service repositories by mapping their metadata onto a general definition, which supports scalable service discovery and flexible inter-communication between tools. jORCA handles repository heterogeneity through its Modular-API, which provides a uniform view of metadata (e.g. GRID-based services, WSDL services, BioMoby and others), making the integration of bioinformatics web services easier.

Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases
Scooped by Biswapriya Biswavas Misra

Phenolyzer: phenotype-based prioritization of candidate genes for human diseases

Prior biological knowledge and phenotype information may help to identify disease genes from human whole-genome and whole-exome sequencing studies. We developed Phenolyzer (http://phenolyzer.usc.edu), a tool that uses prior information to implicate genes involved in diseases. Phenolyzer exhibits superior performance over competing methods for prioritizing Mendelian and complex disease genes, based on disease or phenotype terms entered as free text.
Rescooped by Biswapriya Biswavas Misra from Plant roots and rhizosphere

CRISPR/Cas9-Mediated Genome Editing in Soybean Hairy Roots

As a new technology for gene editing, the CRISPR (clustered regularly interspaced short palindromic repeat)/Cas (CRISPR-associated) system has been rapidly and widely used for genome engineering in various organisms. In the present study, we successfully applied the type II CRISPR/Cas9 system to generate and estimate genome editing in the desired target genes in soybean (Glycine max (L.) Merrill.). The single-guide RNA (sgRNA) and Cas9 cassettes were assembled on one vector to improve transformation efficiency, and we designed one sgRNA that targeted a transgene (bar) and six sgRNAs that targeted different sites of two endogenous soybean genes (GmFEI2 and GmSHR). The targeted DNA mutations were detected in soybean hairy roots. The results demonstrated that this customized CRISPR/Cas9 system shared the same efficiency for both endogenous and exogenous genes in soybean hairy roots. We also performed experiments to test the potential of the CRISPR/Cas9 system to simultaneously edit two endogenous soybean genes using only one customized sgRNA. Overall, generating and detecting CRISPR/Cas9-mediated genome modifications in target genes of soybean hairy roots can rapidly assess the efficiency of each target locus. The target sites with higher efficiencies can be used for regular soybean transformation. Furthermore, this method provides a powerful tool for root-specific functional genomics studies in soybean.

Via Christophe Jacquet
Scooped by Biswapriya Biswavas Misra

NanoCAGE-XL and CapFilter: an approach to genome wide identification of high confidence transcription start sites

Background

Identifying the transcription start sites (TSS) of genes is essential for characterizing promoter regions. Several protocols have been developed to capture the 5′ end of transcripts via Cap Analysis of Gene Expression (CAGE) or linker-ligation strategies such as Paired-End Analysis of Transcription Start Sites (PEAT), but often require large amounts of tissue. More recently, nanoCAGE was developed for sequencing on the Illumina GAIIx to overcome these difficulties.
Results

Here we present the first publicly available adaptation of nanoCAGE for sequencing on recent ultra-high throughput platforms such as Illumina HiSeq-2000, and CapFilter, a computational pipeline that greatly increases confidence in TSS identification. We report excellent gene coverage, reproducibility, and precision in transcription start site discovery for samples from Arabidopsis thaliana roots.
Conclusion

nanoCAGE-XL together with CapFilter allows for genome wide identification of high confidence transcription start sites in large eukaryotic genomes.
Scooped by Biswapriya Biswavas Misra

Transcriptome analysis of the Holly mangrove Acanthus ilicifolius and its terrestrial relative, Acanthus leucostachyus, provides insights into adaptatio...

Background

Acanthus is a unique genus consisting of both true mangrove and terrestrial species; thus, it represents an ideal system for studying the origin and adaptive evolution of mangrove plants to intertidal environments. However, little is known regarding these two aspects of the mangrove species in Acanthus. In this study, we sequenced the transcriptomes of pooled root and leaf tissues from a mangrove species, Acanthus ilicifolius, and its terrestrial congener, A. leucostachyus, to illustrate the origin of the mangrove species in this genus and their adaptive evolution to harsh habitats.

Results

We obtained 73,039 and 69,580 contigs with N50 values of 741 and 1557 bp for A. ilicifolius and A. leucostachyus, respectively. Phylogenetic analyses based on four nuclear segments and three chloroplast fragments revealed that mangroves and terrestrial species in Acanthus fell into different clades, indicating a single origin of the mangrove species in Acanthus. Based on 6634 orthologs, A. ilicifolius and A. leucostachyus were found to be highly divergent, with a peak of synonymous substitution rate (Ks) distribution of 0.145 and an estimated divergence time of approximately 16.8 million years ago (MYA). The transgression in the Early to Middle Miocene may be the major reason for the entry of the mangrove lineage of Acanthus into intertidal environments. Gene ontology (GO) classifications of the full transcriptomes did not show any apparent differences between A. ilicifolius and A. leucostachyus, suggesting the absence of gene components specific to the mangrove transcriptomes. A total of 99 genes in A. ilicifolius were identified with signals of positive selection. Twenty-three of the 99 positively selected genes (PSGs) were found to be involved in salt, heat and ultraviolet stress tolerance, seed germination and embryo development under periodic inundation. These stress-tolerance related PSGs may be crucial for the adaptation of the mangrove species in this genus to stressful marine environments and may contribute to speciation in Acanthus.

Conclusions

We characterized the transcriptomes of one mangrove species of Acanthus, A. ilicifolius, and its terrestrial relative, A. leucostachyus, and provided insights into the origin of the mangrove Acanthus species and their adaptive evolution to abiotic stresses in intertidal environments.
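
The N50 values quoted in the Results above (741 and 1557 bp) summarize the contig length distribution of each assembly. As a minimal sketch of how such a figure is commonly computed: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The helper name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with invented contig lengths (in bp).
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

Because N50 depends only on the sorted lengths, two assemblies with similar contig counts (as here, 73,039 versus 69,580) can still differ markedly in N50 when one has more long contigs.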

 
Scooped by Biswapriya Biswavas Misra

Deep sequencing of the Mexican avocado transcriptome, an ancient angiosperm with a high content of fatty acids

Background

Avocado (Persea americana) is an economically important tropical fruit considered to be a good source of fatty acids. Despite its importance, the molecular and cellular characterization of biochemical and developmental processes in avocado is limited due to the lack of transcriptome and genomic information.

Results

The transcriptomes of seeds, roots, stems, leaves, aerial buds and flowers were determined using different sequencing platforms. The transcriptomes of three different stages of fruit ripening (pre-climacteric, climacteric and post-climacteric) were also analyzed. The analysis of the RNA-seq atlas presented here reveals strong differences in gene expression patterns between different organs, especially between root and flower, but also reveals similarities among the gene expression patterns in other organs, such as stem, leaves and aerial buds (vegetative organs) or seed and fruit (storage organs). Important regulators, functional categories, and differentially expressed genes involved in avocado fruit ripening were identified. Additionally, to demonstrate the utility of the avocado gene expression atlas, we investigated the expression patterns of genes implicated in fatty acid metabolism and fruit ripening.

Conclusions

A description of transcriptomic changes occurring during fruit ripening was obtained in Mexican avocado, contributing to a dynamic view of the expression patterns of genes involved in fatty acid biosynthesis and the fruit ripening process.

Scooped by Biswapriya Biswavas Misra

miRegulome: a knowledge-base of miRNA regulomics and analysis : Scientific Reports


miRNAs regulate post-transcriptional gene expression by targeting multiple mRNAs and hence can modulate multiple signalling pathways, biological processes, and patho-physiologies. Therefore, an understanding of miRNA regulatory networks is essential in order to modulate the functions of a miRNA. Several existing databases focus on specific aspects of miRNA regulation. However, an integrated resource on the miRNA regulome is currently not available to facilitate the exploration and understanding of miRNA regulomics. miRegulome attempts to bridge this gap. The current version, miRegulome v1.0, provides details on the entire regulatory modules of miRNAs altered in response to chemical treatments and transcription factors, based on validated data manually curated from published literature. Modules of miRegulome (upstream regulators, downstream targets, miRNA-regulated pathways, functions, diseases, etc.) are hyperlinked to an appropriate external resource and are displayed visually to provide a comprehensive understanding. Four analysis tools are incorporated to identify relationships among different modules based on user-specified datasets. miRegulome and its tools are helpful in understanding the biology of miRNAs and will also facilitate the discovery of biomarkers and therapeutics. With added features in upcoming releases, miRegulome will be an essential resource to the scientific community. Availability: http://bnet.egr.vcu.edu/miRegulome.

Scooped by Biswapriya Biswavas Misra

High-throughput and quantitative genome-wide messenger RNA sequencing for molecular phenotyping

Background

We present a genome-wide messenger RNA (mRNA) sequencing technique that converts small amounts of RNA from many samples into molecular phenotypes. It encompasses all steps from sample preparation to sequence analysis and is applicable to baseline profiling or perturbation measurements.

Results

Multiplex sequencing of transcript 3′ ends identifies differential transcript abundance independent of gene annotation. We show that increasing biological replicate number while maintaining the total amount of sequencing identifies more differentially abundant transcripts.

Conclusions

This method can be implemented on polyadenylated RNA from any organism with an annotated reference genome and in any laboratory with access to Illumina sequencing.

 
Scooped by Biswapriya Biswavas Misra

Stochastic simulation of Boolean rxncon models: towards quantitative analysis of large signaling networks

Background

Cellular decision-making is governed by molecular networks that are highly complex. An integrative understanding of these networks on a genome wide level is essential to understand cellular health and disease. In most cases however, such an understanding is beyond human comprehension and requires computational modeling. Mathematical modeling of biological networks at the level of biochemical details has hitherto relied on state transition models. These are typically based on enumeration of all relevant model states, and hence become very complex unless severely – and often arbitrarily – reduced. Furthermore, the parameters required for genome wide networks will remain underdetermined for the conceivable future. Alternatively, networks can be simulated by Boolean models, although these typically sacrifice molecular detail as well as distinction between different levels or modes of activity. However, the modeling community still lacks methods that can simulate genome scale networks on the level of biochemical reaction detail in a quantitative or semi quantitative manner.
Results

Here, we present a probabilistic bipartite Boolean modeling method that addresses these issues. The method is based on the reaction-contingency formalism, and enables fast simulation of large networks. We demonstrate its scalability by applying it to the yeast mitogen-activated protein kinase (MAPK) network consisting of 140 proteins and 608 nodes.
Conclusion

The probabilistic Boolean model can be generated and parameterized automatically from a rxncon network description, using only two global parameters, and its qualitative behavior is robust against order of magnitude variation in these parameters. Our method can hence be used to simulate the outcome of large signal transduction network reconstruction, with little or no overhead in model creation or parameterization.
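
To make the idea of a probabilistic bipartite Boolean simulation more concrete, here is a toy sketch. It is not the rxncon method itself: the three-state network, the reaction names, and the two global probabilities `p_fire` and `p_decay` are all hypothetical choices made for illustration, and are not claimed to be the two global parameters defined in the paper. The sketch only shows the general pattern of alternating reaction nodes and state nodes, updating them stochastically, and averaging over many runs to obtain a semi-quantitative readout.

```python
import random

# Hypothetical toy network: each reaction node lists the state nodes it
# requires and the single state node it produces. Names are illustrative.
REACTIONS = {
    "r_bind": {"requires": ["ligand"], "produces": "receptor_bound"},
    "r_phos": {"requires": ["receptor_bound"], "produces": "kinase_P"},
}
PRODUCED_STATES = ("receptor_bound", "kinase_P")


def simulate(steps, p_fire, p_decay, seed=0):
    """Run one synchronous stochastic trajectory; return the state history."""
    rng = random.Random(seed)
    states = {"ligand": True, "receptor_bound": False, "kinase_P": False}
    history = []
    for _ in range(steps):
        nxt = dict(states)
        # A reaction whose requirements hold fires with probability p_fire.
        for rxn in REACTIONS.values():
            if all(states[s] for s in rxn["requires"]) and rng.random() < p_fire:
                nxt[rxn["produces"]] = True
        # A state that was on decays with probability p_decay.
        # Decay is applied last, so it overrides a firing in the same step.
        for name in PRODUCED_STATES:
            if states[name] and rng.random() < p_decay:
                nxt[name] = False
        states = nxt
        history.append(dict(states))
    return history


def mean_activity(name, runs, steps, p_fire, p_decay):
    """Fraction of independent runs in which `name` is on at the final step."""
    hits = sum(simulate(steps, p_fire, p_decay, seed=i)[-1][name]
               for i in range(runs))
    return hits / runs


print(mean_activity("kinase_P", runs=200, steps=20, p_fire=0.5, p_decay=0.1))
```

Averaging Boolean trajectories in this way yields a value between 0 and 1 per node, which is one simple route from a purely qualitative model to a semi-quantitative one.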
Scooped by Biswapriya Biswavas Misra

IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites

In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules.
Scooped by Biswapriya Biswavas Misra

MetazSecKB: the human and animal secretome and subcellular proteome knowledgebase


The subcellular location of a protein is a key factor in determining the molecular function of the protein in an organism. MetazSecKB is a secretome and subcellular proteome knowledgebase specifically designed for metazoans, i.e. humans and other animals. The protein sequence data, consisting of over 4 million entries with 121 species having a complete proteome, were retrieved from UniProtKB. Protein subcellular locations, including secreted and 15 other subcellular locations, were assigned based on either curated experimental evidence or prediction using seven computational tools. The protein or subcellular proteome data can be searched and downloaded using several different types of identifiers, gene name or keyword(s), and species. BLAST search and community annotation of subcellular locations are also supported. Our primary analysis revealed that the proteome sizes, secretome sizes and other subcellular proteome sizes vary tremendously among animal species. The proportions of secretomes vary from 3 to 22% (average 8%) in metazoan species. The proportions of other major subcellular proteomes ranged from approximately 21–43% (average 31%) in cytoplasm, 20–37% (average 30%) in nucleus, 3–19% (average 12%) as plasma membrane proteins and 3–9% (average 6%) in mitochondria. We also compared the protein families in secretomes of different primates. The Gene Ontology and protein family domain analysis of human secreted proteins revealed that these proteins play important roles in regulation of human structure development, signal transduction, immune systems and many other biological processes.

Scooped by Biswapriya Biswavas Misra

POTION: an end-to-end pipeline for positive Darwinian selection detection in genome-scale data through phylogenetic comparison of protein-coding genes

Background

Detection of genes evolving under positive Darwinian evolution in genome-scale data is nowadays a prevailing strategy in comparative genomics studies to identify genes potentially involved in adaptation processes. Despite the large number of studies aiming to detect and contextualize such gene sets, there is virtually no software available to perform this task in a general, automatic, large-scale and reliable manner. This certainly occurs due to the computational challenges involved in this task, such as the appropriate modeling of data under analysis, the computation time to perform several of the required steps when dealing with genome-scale data and the highly error-prone nature of the sequence and alignment data structures needed for genome-wide positive selection detection.

Results

We present POTION, an open source, modular and end-to-end software for genome-scale detection of positive Darwinian selection in groups of homologous coding sequences. Our software represents a key step towards genome-scale, automated detection of positive selection, from predicted coding sequences and their homology relationships to high-quality groups of positively selected genes. POTION reduces false positives through several sophisticated sequence and group filters based on numeric, phylogenetic, quality and conservation criteria to remove spurious data and through multiple hypothesis corrections, and considerably reduces computation time thanks to a parallelized design. Our software achieved a high classification performance when used to evaluate a curated dataset of Trypanosoma brucei paralogs previously surveyed for positive selection. When used to analyze predicted groups of homologous genes of 19 strains of Mycobacterium tuberculosis as a case study, we demonstrated that the filters implemented in POTION remove sources of error that commonly inflate errors in positive selection detection. A thorough literature review found no other software similar to POTION in terms of customization, scale and automation.

Conclusion

To the best of our knowledge, POTION is the first tool to allow users to construct and check hypotheses regarding the occurrence of site-based evidence of positive selection in non-curated, genome-scale data within a feasible time frame and with no human intervention after initial configuration. POTION is available at http://www.lmb.cnptia.embrapa.br/share/POTION/.
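
The abstract above mentions multiple hypothesis corrections without naming a procedure. As a hedged illustration of what such a correction looks like, the sketch below implements the Benjamini-Hochberg adjustment, one widely used option. Choosing this particular procedure is an assumption made for the example, not a statement about what POTION uses, and the function name and p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented per-gene p-values from a hypothetical selection test.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw={p:.3f} adjusted={q:.3f}")
```

In a genome-scale scan, thousands of gene groups are tested at once, so filtering on adjusted rather than raw p-values is what keeps the expected share of false discoveries under control.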

 
Scooped by Biswapriya Biswavas Misra

Analyzing allele specific RNA expression using mixture models

Background

Measuring allele-specific RNA expression provides valuable insights into cis-acting genetic and epigenetic regulation of gene expression. Widespread adoption of high-throughput sequencing technologies for studying RNA expression (RNA-Seq) permits measurement of allelic RNA expression imbalance (AEI) at heterozygous single nucleotide polymorphisms (SNPs) across the entire transcriptome, and this approach has become especially popular with the emergence of large databases, such as GTEx. However, the existing binomial-type methods used to model allelic expression from RNA-seq assume a strong negative correlation between reference and variant allele reads, which may not be reasonable biologically.

Results

Here we propose a new strategy for AEI analysis using RNA-seq data. Under the null hypothesis of no AEI, a group of SNPs (possibly across multiple genes) is considered comparable if their respective total sums of the allelic reads are of similar magnitude. Within each group of “comparable” SNPs, we identify SNPs with AEI signal by fitting a mixture of folded Skellam distributions to the absolute values of read differences. By applying this methodology to RNA-Seq data from human autopsy brain tissues, we identified numerous instances of moderate to strong imbalanced allelic RNA expression at heterozygous SNPs. Findings with SLC1A3 mRNA exhibiting known expression differences are discussed as examples.

Conclusion

The folded Skellam mixture model searches for SNPs with significant difference between reference and variant allele reads (adjusted for different library sizes), using information from a group of “comparable” SNPs across multiple genes. This model is particularly suitable for performing AEI analysis on genes with few heterozygous SNPs available from RNA-seq, and it can fit over-dispersed read counts without specifying the direction of the correlation between reference and variant alleles.
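
For readers unfamiliar with the distribution named above: a Skellam variable is the difference of two independent Poisson counts, and "folding" it means taking the absolute value of that difference. The sketch below computes the folded Skellam probability mass function using only the standard library, by summing products of Poisson probabilities. It covers a single component only; the mixture fitting described in the abstract is not shown. The function names, the truncation at 200 terms (adequate for small means such as those used here), and the example means are assumptions made for illustration.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), with mu > 0, computed in log space."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def skellam_pmf(k, mu1, mu2, terms=200):
    """P(X - Y = k) for independent X ~ Poisson(mu1), Y ~ Poisson(mu2).

    Sums over y with x = y + k; the sum is truncated after `terms` values.
    """
    start = max(0, -k)  # keep x = y + k non-negative
    return sum(poisson_pmf(y + k, mu1) * poisson_pmf(y, mu2)
               for y in range(start, start + terms))


def folded_skellam_pmf(d, mu1, mu2):
    """P(|X - Y| = d) for an absolute read difference d >= 0."""
    if d < 0:
        return 0.0
    if d == 0:
        return skellam_pmf(0, mu1, mu2)
    return skellam_pmf(d, mu1, mu2) + skellam_pmf(-d, mu1, mu2)


# Hypothetical mean read counts for the reference and variant alleles.
for d in range(5):
    print(d, folded_skellam_pmf(d, mu1=5.0, mu2=3.0))
```

Folding discards the sign of the difference, so the pmf is unchanged when the two means are swapped. This matches the abstract's point that the model does not require specifying the direction of the relationship between reference and variant allele reads.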

Scooped by Biswapriya Biswavas Misra

The Medicago sativa gene index 1.2: a web-accessible gene expression atlas for investigating expression differences between Medicago sativa subspecies

Background

Alfalfa (Medicago sativa L.) is the primary forage legume crop species in the United States and plays essential economic and ecological roles in agricultural systems across the country. Modern alfalfa is the result of hybridization between tetraploid M. sativa ssp. sativa and M. sativa ssp. falcata. Due to its large and complex genome, there are few genomic resources available for alfalfa improvement.

Results

A de novo transcriptome assembly from two alfalfa subspecies, M. sativa ssp. sativa (B47) and M. sativa ssp. falcata (F56), was developed using Illumina RNA-seq technology. Transcripts from roots, nitrogen-fixing root nodules, leaves, flowers, elongating stem internodes, and post-elongation stem internodes were assembled into the Medicago sativa Gene Index 1.2 (MSGI 1.2), representing 112,626 unique transcript sequences. Nodule-specific transcripts and transcripts involved in cell wall biosynthesis were identified. Statistical analyses identified 20,447 transcripts differentially expressed between the two subspecies. Pair-wise comparisons of each tissue combination identified 58,932 sequences differentially expressed in B47 and 69,143 sequences differentially expressed in F56. Comparing transcript abundance in floral tissues of B47 and F56 identified expression differences in sequences involved in anthocyanin and carotenoid synthesis, which determine flower pigmentation. Single nucleotide polymorphisms (SNPs) unique to each M. sativa subspecies (110,241) were identified.

Conclusions

The Medicago sativa Gene Index 1.2 increases the expressed sequence data available for alfalfa by ninefold and can be expanded as additional experiments are performed. The MSGI 1.2 transcriptome sequences, annotations, expression profiles, and SNPs were assembled into the Alfalfa Gene Index and Expression Database (AGED) at http://plantgrn.noble.org/AGED/, a publicly available genomic resource for alfalfa improvement and legume research.

 
Scooped by Biswapriya Biswavas Misra

BEACON: automated tool for Bacterial GEnome Annotation ComparisON

Background

Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs).

Results

The presence of many AMs and the development of new ones create a need to (a) compare different annotations for a single genome and (b) generate an annotation by combining individual ones. To address these issues we developed an Automated Tool for

Scooped by Biswapriya Biswavas Misra

Agalma 0.6.0 - A Transcriptome Assembly and Phylogenetic Analysis Environment

Agalma is an automated tool that constructs matrices for phylogenomic analyses. It builds alignments of homologous genes and preliminary species trees from genomic and transcriptome data.

Scooped by Biswapriya Biswavas Misra

Evolutionary patchwork of an insecticidal toxin shared between plant-associated pseudomonads and the insect pathogens Photorhabdus and Xenorhabdus

Biswapriya Biswavas Misra's insight:
Background

Root-colonizing fluorescent pseudomonads are known for their excellent abilities to protect plants against soil-borne fungal pathogens. Some of these bacteria produce an insecticidal toxin (Fit) suggesting that they may exploit insect hosts as a secondary niche. However, the ecological relevance of insect toxicity and the mechanisms driving the evolution of toxin production remain puzzling.

Results

Screening a large collection of plant-associated pseudomonads for insecticidal activity and presence of the Fit toxin revealed that Fit is highly indicative of insecticidal activity and predicts that Pseudomonas protegens and P. chlororaphis are exclusive Fit producers. A comparative evolutionary analysis of Fit toxin-producing Pseudomonas including the insect-pathogenic bacteria Photorhabdus and Xenorhabdus, which produce the Fit-related Mcf toxin, showed that fit genes are part of a dynamic genomic region with substantial presence/absence polymorphism and local variation in GC base composition. The patchy distribution and phylogenetic incongruence of fit genes indicate that the Fit cluster evolved via horizontal transfer, followed by functional integration of vertically transmitted genes, generating a unique Pseudomonas-specific insect toxin cluster.

Conclusions

Our findings suggest that multiple independent evolutionary events led to the formation of at least three versions of the Mcf/Fit toxin, highlighting the dynamic nature of insect toxin evolution.

 
Scooped by Biswapriya Biswavas Misra

NanoCAGE-XL and CapFilter: an approach to genome wide identification of high confidence transcription start sites

Biswapriya Biswavas Misra's insight:
Background

Identifying the transcription start sites (TSS) of genes is essential for characterizing promoter regions. Several protocols have been developed to capture the 5′ end of transcripts via Cap Analysis of Gene Expression (CAGE) or linker-ligation strategies such as Paired-End Analysis of Transcription Start Sites (PEAT), but these often require large amounts of tissue. More recently, nanoCAGE was developed for sequencing on the Illumina GAIIx to overcome these difficulties.

Results

Here we present the first publicly available adaptation of nanoCAGE for sequencing on recent ultra-high throughput platforms such as Illumina HiSeq-2000, and CapFilter, a computational pipeline that greatly increases confidence in TSS identification. We report excellent gene coverage, reproducibility, and precision in transcription start site discovery for samples from Arabidopsis thaliana roots.

Conclusion

nanoCAGE-XL together with CapFilter allows for genome wide identification of high confidence transcription start sites in large eukaryotic genomes.

Scooped by Biswapriya Biswavas Misra

Development of genome-wide insertion and deletion markers for maize, based on next-generation sequencing data

Biswapriya Biswavas Misra's insight:
Background

Insertions and deletions (indels) are the most abundant form of structural variation in all genomes. Indels have been increasingly recognized as an important source of molecular markers due to high-density occurrence, cost-effectiveness, and ease of genotyping. Coupled with developments in bioinformatics, next-generation sequencing (NGS) platforms enable the discovery of millions of indel polymorphisms by comparing the whole genome sequences of individuals within a species.

Results

A total of 1,973,746 unique indels were identified in 345 maize genomes, with an overall density of 958.79 indels/Mbp, and an average allele number of 2.76, ranging from 2 to 107. There were 264,214 indels with polymorphism information content (PIC) values greater than or equal to 0.5, accounting for 13.39 % of overall indels. Of these highly polymorphic indels, we designed primer pairs for 83,481 and 29,403 indels with major allele differences (i.e. the size difference between the most and second most frequent alleles) greater than or equal to 3 and 8 bp, respectively, based on the differing resolution capabilities of gel electrophoresis. The accuracy of our indel markers was experimentally validated, and among 100 indel markers, average accuracy was approximately 90 %. In addition, we also validated the polymorphism of the indel markers. Of 100 highly polymorphic indel markers, all had polymorphisms with average PIC values of 0.54.
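
The PIC threshold used above can be illustrated with a short calculation. The sketch below assumes the widely used Botstein-style definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2; the abstract does not say which formula the authors applied, and the function name and allele frequencies are hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content of one marker.

    Assumes the Botstein-style formula; allele_freqs must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two random alleles are identical.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term for uninformative matings between heterozygotes.
    correction = sum(
        2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2)
    )
    return 1.0 - homozygosity - correction


# Hypothetical markers: one with two alleles, one with three.
two_alleles = pic([0.5, 0.5])
three_alleles = pic([0.5, 0.3, 0.2])
print(round(two_alleles, 4), round(three_alleles, 4))
```

Under this definition a two-allele marker cannot exceed 0.375, so a marker that passes the 0.5 cut-off must carry at least three alleles.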

Conclusions

The maize genome is rich in indel polymorphisms. Intriguingly, the level of polymorphism in genic regions of the maize genome was higher than that in intergenic regions. The polymorphic indel markers developed from this study may enhance the efficiency of genetic research and marker-assisted breeding in maize.

Scooped by Biswapriya Biswavas Misra

Changepoint detection in base-resolution methylome data reveals a robust signature of methylated domain landscape

Biswapriya Biswavas Misra's insight:
Background

Base-resolution methylome data generated by whole-genome bisulfite sequencing (WGBS) is often used to segment the genome into domains with distinct methylation levels. However, most segmentation methods include many parameters to be carefully tuned and/or fail to exploit the unsurpassed resolution of the data. Furthermore, there is no simple method that displays the composition of the domains to grasp global trends in each methylome.

Results

We propose to use changepoint detection for domain demarcation based on base-resolution methylome data. While the proposed method segments the methylome in a largely comparable manner to conventional approaches, it has only a single parameter to be tuned. Furthermore, it fully exploits the base-resolution of the data to enable simultaneous detection of methylation changes in even contrasting size ranges, such as focal hypermethylation and global hypomethylation in cancer methylomes. We also propose a simple plot termed methylated domain landscape (MDL) that globally displays the size, the methylation level and the number of the domains thus defined, thereby enabling one to intuitively grasp trends in each methylome. Since the pattern of MDL often reflects cell lineages and is largely unaffected by data size, it can serve as a novel signature of the methylome.
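
To make the idea of single-parameter changepoint segmentation concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not specify: it assumes a generic binary segmentation with a squared-error cost, where `penalty` plays the role of the single tuning parameter. All names and the toy methylation values are hypothetical.

```python
from itertools import accumulate


def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean of values[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def find_changepoints(values, penalty):
    """Binary segmentation: split a segment while the cost drop exceeds penalty."""
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))
    changepoints = []
    stack = [(0, len(values))]
    while stack:
        start, end = stack.pop()
        if end - start < 2:
            continue
        whole = segment_cost(prefix, prefix_sq, start, end)
        best_gain, best_split = 0.0, None
        for split in range(start + 1, end):
            gain = (
                whole
                - segment_cost(prefix, prefix_sq, start, split)
                - segment_cost(prefix, prefix_sq, split, end)
            )
            if gain > best_gain:
                best_gain, best_split = gain, split
        if best_split is not None and best_gain > penalty:
            changepoints.append(best_split)
            stack.append((start, best_split))
            stack.append((best_split, end))
    return sorted(changepoints)


# Toy per-CpG methylation levels: a methylated block followed by an unmethylated one.
levels = [0.9] * 10 + [0.1] * 10
print(find_changepoints(levels, penalty=0.5))
```

A larger `penalty` yields fewer, broader domains; a smaller one lets short focal changes through. This scan is quadratic in segment length, so it is meant only for illustration on small inputs.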

Conclusions

Changepoint detection in base-resolution methylome data followed by MDL plotting provides a novel method for methylome characterization and will facilitate global comparison among various WGBS data differing in size and even species origin.

Scooped by Biswapriya Biswavas Misra

Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling

Biswapriya Biswavas Misra's insight:
Background

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq™ Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-Hiseq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer.

Results

Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3 % across the four samples, whereas the concordance of co-detected variant loci reached 99 %. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5 %) was higher than that of SNPs specific to TargetSeq-Proton (60.0 %) or specific to SureSelect-HiSeq (88.3 %). With regard to 1-bp small insertions and deletions (InDels), the Sanger validation rates of concordant variants (100.0 %) and SureSelect-HiSeq-specific variants (89.6 %) were higher than that of TargetSeq-Proton-specific variants (15.8 %).
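
The two comparison metrics reported above can be sketched with plain set operations. This is an illustration only: it assumes the shared rate is the number of loci called by both platforms divided by the number called by either, and that concordance is the fraction of shared loci with identical genotypes. The abstract does not define either quantity precisely, and the function name and toy calls are hypothetical.

```python
def compare_calls(calls_a, calls_b):
    """Compare two call sets given as {(chrom, pos): genotype} dictionaries."""
    loci_a, loci_b = set(calls_a), set(calls_b)
    shared = loci_a & loci_b
    union = loci_a | loci_b
    same_genotype = sum(1 for locus in shared if calls_a[locus] == calls_b[locus])
    return {
        # Assumed definition: loci called by both / loci called by either.
        "shared_rate": len(shared) / len(union) if union else 0.0,
        # Assumed definition: identical genotypes among shared loci.
        "concordance": same_genotype / len(shared) if shared else 0.0,
        "only_a": sorted(loci_a - loci_b),
        "only_b": sorted(loci_b - loci_a),
    }


# Toy call sets standing in for the two platforms.
proton = {("chr1", 100): "0/1", ("chr1", 200): "1/1", ("chr2", 50): "0/1"}
hiseq = {("chr1", 100): "0/1", ("chr1", 200): "0/1", ("chr3", 10): "1/1"}
summary = compare_calls(proton, hiseq)
print(summary)
```

The `only_a` and `only_b` lists correspond to the platform-specific variants that the study sent for Sanger validation.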

Conclusions

In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by either platform. However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, in particular, InDels in Ion Proton exome sequencing.

 
Scooped by Biswapriya Biswavas Misra

Cellular phenotype database: a repository for systems microscopy data

Biswapriya Biswavas Misra's insight:

Motivation: The Cellular Phenotype Database (CPD) is a repository for data derived from high-throughput systems microscopy studies. The aims of this resource are: (i) to provide easy access to cellular phenotype and molecular localization data for the broader research community; (ii) to facilitate integration of independent phenotypic studies by means of data aggregation techniques, including use of an ontology and (iii) to facilitate development of analytical methods in this field.

Results: In this article we present CPD, its data structure and user interface, propose a minimal set of information describing RNA interference experiments, and suggest a generic schema for management and aggregation of outputs from phenotypic or molecular localization experiments. The database has a flexible structure for management of data from heterogeneous sources of systems microscopy experimental outputs generated by a variety of protocols and technologies and can be queried by gene, reagent, gene attribute, study keywords, phenotype or ontology terms.

Availability and implementation: CPD is developed as part of the Systems Microscopy Network of Excellence and is accessible at http://www.ebi.ac.uk/fg/sym.

Scooped by Biswapriya Biswavas Misra

BioWardrobe: an integrated platform for analysis of epigenomics and transcriptomics data

High-throughput sequencing has revolutionized biology by enhancing our ability to perform genome-wide studies. However, due to lack of bioinformatics expertise, modern technologies are still beyond the capabilities of many laboratories. Herein, we present the BioWardrobe platform, which allows users to store, visualize and analyze epigenomics and transcriptomics data using a biologist-friendly web interface, without the need for programming expertise. Predefined pipelines allow users to download data, visualize results on a genome browser, calculate RPKMs (reads per kilobase per million) and identify peaks. Advanced capabilities include differential gene expression and binding analysis, and creation of average tag-density profiles and heatmaps. BioWardrobe can be found at http://biowardrobe.com.
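
For readers unfamiliar with the RPKM unit mentioned above, the standard definition can be written in a few lines. This is a generic sketch of the formula, not BioWardrobe's code; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Divide by length in kilobases (bp / 1e3) and by library size in millions (reads / 1e6).
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2 kb transcript in a library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Normalizing by both transcript length and library size is what makes values comparable across genes and across samples of different sequencing depth.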

Scooped by Biswapriya Biswavas Misra

MAC: identifying and correcting annotation for multi-nucleotide variations

Biswapriya Biswavas Misra's insight:
Background

Next-Generation Sequencing (NGS) technologies have rapidly advanced our understanding of human variation in cancer. To accurately translate the raw sequencing data into practical knowledge, annotation tools, algorithms and pipelines must be developed that keep pace with the rapidly evolving technology. Currently, a challenge exists in accurately annotating multi-nucleotide variants (MNVs). These tandem substitutions, when affecting multiple nucleotides within a single protein codon of a gene, result in a translated amino acid involving all nucleotides in that codon. Most existing variant callers report an MNV as individual single-nucleotide variants (SNVs), often resulting in multiple triplet codon sequences and incorrect amino acid predictions. To correct potentially misannotated MNVs among reported SNVs, a primary challenge resides in haplotype phasing, that is, determining whether the neighboring SNVs are co-located on the same chromosome.

Results

Here we describe MAC (Multi-Nucleotide Variant Annotation Corrector), an integrative pipeline developed to correct potentially mis-annotated MNVs. MAC was designed as an application that only requires a SNV file and the matching BAM file as data inputs. Using an example data set containing 3024 SNVs and the corresponding whole-genome sequencing BAM files, we show that MAC identified eight potentially mis-annotated SNVs, and accurately updated the amino acid predictions for seven of the variant calls.
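
The annotation problem that MAC addresses can be shown on a single codon. The sketch below is not MAC's implementation: it skips the BAM-based phasing step entirely and simply assumes both SNVs sit on the same haplotype. The reference codon, the SNVs and the helper names are hypothetical; the codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


ref_codon = "GGT"              # glycine
snvs = [(0, "A"), (1, "C")]    # two adjacent SNVs, assumed to be in phase

# SNV-by-SNV annotation: each change is translated against the reference codon.
separate = [CODON_TABLE[apply_snvs(ref_codon, [snv])] for snv in snvs]
# MNV-aware annotation: both changes are applied before translating.
combined = CODON_TABLE[apply_snvs(ref_codon, snvs)]

print(CODON_TABLE[ref_codon], separate, combined)  # G ['S', 'A'] T
```

Annotated one at a time, the two SNVs predict Gly to Ser and Gly to Ala; applied together they give Gly to Thr, which neither single-SNV call reports.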

Conclusions

MAC can identify and correct amino acid predictions that result from MNVs affecting multiple nucleotides within a single protein codon, which cannot be handled by most existing SNV-based variant pipelines. The MAC software is freely available and represents a useful tool for the accurate translation of genomic sequence to protein function.

Scooped by Biswapriya Biswavas Misra

Protein functional features are reflected in the patterns of mRNA translation speed

Biswapriya Biswavas Misra's insight:

Background

The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These “synonymous mRNAs” may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency also play a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of “silent” single-nucleotide-polymorphisms (SNPs). In this work we perform the first large-scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins.

Results

We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein’s important structural and functional features.

Conclusions

These findings support the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein’s functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.
