Scooped by Biswapriya Biswavas Misra
onto Databases & Softwares

Engineering of plants with improved properties as biofuels feedstocks by vessel-specific complementation of xylan biosynthesis mutants

Abstract (provisional)

Background

Cost-efficient generation of second-generation biofuels requires plant biomass that can easily be degraded into sugars and further fermented into fuels. However, lignocellulosic biomass is inherently recalcitrant toward deconstruction technologies due to the abundant lignin and cross-linked hemicelluloses. Furthermore, lignocellulosic biomass has a high content of pentoses, which are more difficult to ferment into fuels than hexoses. Engineered plants with decreased amounts of xylan in their secondary walls have the potential to render plant biomass a more desirable feedstock for biofuel production.

Results

Xylan is the major non-cellulosic polysaccharide in secondary cell walls, and the xylan-deficient irregular xylem (irx) mutants irx7, irx8 and irx9 exhibit severe dwarf growth phenotypes. The main reason for the growth phenotype appears to be xylem vessel collapse and the resulting impaired transport of water and nutrients. We developed a xylan-engineering approach to reintroduce xylan biosynthesis specifically into the xylem vessels in the Arabidopsis irx7, irx8 and irx9 mutant backgrounds by driving the expression of the respective glycosyltransferases with the vessel-specific promoters of the VND6 and VND7 transcription factor genes. The growth phenotype, stem breaking strength, and irx morphology were recovered to varying degrees. Some of the plants even exhibited increased stem strength compared to the wild type. We obtained Arabidopsis plants with up to a 23% reduction in xylose levels and an 18% reduction in lignin content compared to wild-type plants that nevertheless exhibited wild-type growth patterns and morphology, as well as normal xylem vessels. These plants showed a 42% increase in saccharification yield after hot water pretreatment. The VND7 promoter yielded a more complete complementation of the irx phenotype than the VND6 promoter.

Conclusions

Spatial and temporal deposition of xylan in the secondary cell wall of Arabidopsis can be manipulated by using the promoter regions of vessel-specific genes to express xylan biosynthetic genes. The expression of xylan specifically in the xylem vessels is sufficient to complement the irx phenotype of xylan-deficient mutants, while maintaining low overall amounts of xylan and lignin in the cell wall. This engineering approach has the potential to yield bioenergy crop plants that are more easily deconstructed and fermented into biofuels.

Databases & Softwares
Genomic, Proteomic, Transcriptomic, Metabolomic Softwares and Databases
Scooped by Biswapriya Biswavas Misra

Welcome to Tomato Genomic Resources Database: Home Page

Tomato Genomic Resources Database: An Integrated Repository of Useful Tomato Genomic Information for Basic and Applied Research.
Scooped by Biswapriya Biswavas Misra

Eggplant Genome DataBase

Eggplant Genome Database at Kazusa DNA Research Institute.
Biswapriya Biswavas Misra's insight:
The eggplant (Solanum melongena L.) is one of the most important vegetable crop species in Japan as well as in other Asian, Middle and Near Eastern, Mediterranean and African countries. Eggplant belongs to the Solanaceae family, which includes tomato, potato and pepper, but unlike these relatives, it is endemic to the Old World. To bring this unique solanaceous species into genomics and make it a useful subject for molecular genetic and physiological studies, the eggplant whole genome was sequenced to construct a draft genome dataset.
Scooped by Biswapriya Biswavas Misra

DupChecker: a bioconductor package for checking high-throughput genomic data redundancy in meta-analysis

Abstract (provisional)
Background

Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when using publicly available databases for meta-analysis, duplication of samples is a frequently encountered problem, especially for gene expression data. Failing to remove duplicates can lead to false-positive findings, misleading clustering patterns, or model over-fitting in the subsequent data analysis.
Results

We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real data example demonstrates the usage and output of the package.
Conclusions

Researchers may not pay enough attention to checking and removing duplicated samples, and then data contamination could make the results or conclusions from meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.
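The fingerprinting idea in this abstract can be illustrated with a minimal Python sketch. This is not DupChecker's R interface; the function names and the in-memory sample contents below are hypothetical and chosen only for illustration. Samples whose raw bytes hash to the same MD5 digest are grouped as likely duplicates.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def md5_fingerprint(raw: bytes) -> str:
    """Return the hexadecimal MD5 digest of a sample's raw bytes."""
    return hashlib.md5(raw).hexdigest()


def find_duplicates(samples: Dict[str, bytes]) -> List[List[str]]:
    """Group sample names that share an identical MD5 fingerprint."""
    groups = defaultdict(list)
    for name, raw in samples.items():
        groups[md5_fingerprint(raw)].append(name)
    # Only fingerprints seen more than once indicate duplicated samples.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical raw file contents standing in for expression data files.
samples = {
    "study_A/sample1.CEL": b"probe intensities 1 2 3",
    "study_B/sample9.CEL": b"probe intensities 1 2 3",
    "study_B/sample2.CEL": b"probe intensities 4 5 6",
}
duplicates = find_duplicates(samples)
print(duplicates)
```

In practice the bytes would be read from the raw data files on disk; hashing the raw content rather than the file name is what lets the same sample be recognized under different identifiers.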
Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

Fiona: a parallel and automatic strategy for read error correction

RT @druvus: Fiona: a parallel and automatic strategy for read error correction http://t.co/voU7wksk48

Via Mel Melendrez-Vallard
Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

PRISE2: Software for designing sequence-selective PCR primers and probes

Background:
PRISE2 is a new software tool for designing sequence-selective PCR primers and probes.

Via Mel Melendrez-Vallard
Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures

Sushi.R looks pretty sweet. http://t.co/aXHXaejzjr

Via Mel Melendrez-Vallard
Scooped by Biswapriya Biswavas Misra

RPPanalyzer Toolbox: An improved R package for analysis of reverse phase protein array data

Analysis of large-scale proteomic data sets requires specialized software tools, tailored toward the requirements of individual approaches. Here we introduce an extension of an open-source software solution for analyzing reverse phase protein array (RPPA) data. The R package RPPanalyzer was designed for data preprocessing followed by basic statistical analyses and proteomic data visualization. In this update, we merged relevant data preprocessing steps into a single user-friendly function and included a new method for background noise correction as well as new methods for noise estimation and averaging of replicates to transform data in such a way that they can be used as input for a new time course plotting function. We demonstrate the robustness of our enhanced RPPanalyzer platform by analyzing longitudinal RPPA data of MET receptor signaling upon stimulation with different hepatocyte growth factor concentrations.
Scooped by Biswapriya Biswavas Misra

InvFEST, a database integrating information of polymorphic inversions in the human genome. - Abstract - Europe PubMed Central

Abstract: The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data...
Scooped by Biswapriya Biswavas Misra

ExaBayes: Massively Parallel Bayesian Tree Inference for the Whole-Genome Era

RT @RamiroHojas: Nice, a software to run Bayesian phylogenetic analyses for genomic datasets! http://t.co/IhAgHZn5pc
Scooped by Biswapriya Biswavas Misra

PeptideManager: a peptide selection tool for targeted proteomic studies involving mixed samples from different species. - ncbi.nlm.nih.gov

Scooped by Biswapriya Biswavas Misra

ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes.

Abstract: ARG-ANNOT (Antibiotic Resistance Gene-ANNOTation) is a new bioinformatic tool that was created to detect existing and putative new antibiotic...
Scooped by Biswapriya Biswavas Misra

RAMONA: a web application for gene set analysis on multilevel omics data

Summary: Decreasing costs of modern high-throughput experiments allow for the simultaneous analysis of altered gene activity on various molecular levels. However, these multi-omics approaches lead to a large amount of data which is hard to interpret for a non-bioinformatician. Here, we present the remotely accessible multilevel ontology analysis (RAMONA). It offers an easy-to-use interface for the simultaneous gene set analysis of combined omics datasets and is an extension of the previously introduced MONA approach. RAMONA is based on a Bayesian enrichment method for the inference of overrepresented biological processes among given gene sets. Overrepresentation is quantified by interpretable term probabilities. It is able to handle data from various molecular levels, while in parallel coping with redundancies arising from gene set overlaps and related multiple testing problems. The comprehensive output of RAMONA is easy to interpret and thus allows for functional insight into the affected biological processes. With RAMONA, we provide an efficient implementation of the Bayesian inference problem such that ontologies consisting of thousands of terms can be processed in the order of seconds.
Scooped by Biswapriya Biswavas Misra

Large-Scale Structure of a Network of Co-Occurring MeSH Terms: Statistical Analysis of Macroscopic Properties

PLOS ONE: an inclusive, peer-reviewed, open-access resource from the PUBLIC LIBRARY OF SCIENCE. Reports of well-performed scientific studies from all disciplines freely available to the whole world.
Biswapriya Biswavas Misra's insight:

Abstract

Concept associations can be represented by a network that consists of a set of nodes representing concepts and a set of edges representing their relationships. Complex networks exhibit some common topological features including small diameter, high degree of clustering, power-law degree distribution, and modularity. We investigated the topological properties of a network constructed from co-occurrences between MeSH descriptors in the MEDLINE database. We conducted the analysis on two networks, one constructed from all MeSH descriptors and another using only major descriptors. Network reduction was performed using Pearson's chi-square test for independence. To characterize the topological properties of the network we adopted some specific measures, including diameter, average path length, clustering coefficient, and degree distribution. For the full MeSH network the average path length was 1.95 edges with a diameter of three edges and a clustering coefficient of 0.26. The Kolmogorov-Smirnov test rejected the power law as a plausible model for the degree distribution. For the major MeSH network the average path length was 2.63 edges with a diameter of seven edges and a clustering coefficient of 0.15. The Kolmogorov-Smirnov test failed to reject the power law as a plausible model. The power-law exponent was 5.07. In both networks it was evident that nodes with a lower degree exhibit higher clustering than those with a higher degree. After a simulated attack, in which we removed the 10% of nodes with the highest degrees, the giant component of each of the two networks contained about 90% of all nodes. Because of its small average path length and high degree of clustering, the MeSH network is small-world. A power-law distribution is not a plausible model for the degree distribution. The network is highly modular, highly resistant to targeted and random attack, and shows minimal disassortativity.
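The measures named in this abstract (average path length, diameter, clustering coefficient) can be sketched on a toy graph. The four-node network below is a hypothetical stand-in, not MeSH data, and the clustering value computed here is the average of local clustering coefficients, which is one common definition and may differ from the one the authors used.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_stats(adj):
    """Return (average path length, diameter) over all connected node pairs."""
    lengths = []
    for source in adj:
        dist = bfs_distances(adj, source)
        lengths.extend(d for target, d in dist.items() if target != source)
    return sum(lengths) / len(lengths), max(lengths)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy undirected graph: a triangle A-B-C with a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
avg_len, diameter = path_stats(adj)
clustering = average_clustering(adj)
print(avg_len, diameter, clustering)
```

A short average path length combined with a clustering coefficient well above that of a comparable random graph is the usual informal signature of a small-world network.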

 

Scooped by Biswapriya Biswavas Misra

Multicriteria global optimization for biocircuit design

Abstract
Background

One of the challenges in Synthetic Biology is to design circuits with increasing levels of complexity. While circuits in Biology are complex and subject to natural tradeoffs, most synthetic circuits are simple in terms of the number of regulatory regions, and have been designed to meet a single design criterion.
Results

In this contribution we introduce a multiobjective formulation for the design of biocircuits. We set up the basis for an advanced optimization tool for the modular and systematic design of biocircuits capable of handling high levels of complexity and multiple design criteria. Our methodology combines the efficiency of global Mixed Integer Nonlinear Programming solvers with multiobjective optimization techniques. Through a number of examples we show the capability of the method to generate non-intuitive designs with a desired functionality, with the desired level of complexity set a priori.
Conclusions

The methodology presented here can be used for biocircuit design and also to explore and identify different design principles for synthetic gene circuits. The presence of more than one competing objective provides a realistic design setting where every solution represents an optimal trade-off between different criteria.
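The notion that every solution represents an optimal trade-off between competing criteria can be illustrated with a small Pareto-front filter. This sketch is not the Mixed Integer Nonlinear Programming method described in the abstract; the candidate designs and their two objective values (both to be minimized) are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return {
        name: objectives
        for name, objectives in designs.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in designs.items()
            if other_name != name
        )
    }


# Hypothetical designs scored as (objective 1, objective 2), lower is better.
designs = {
    "A": (1, 9),
    "B": (3, 4),
    "C": (4, 5),  # dominated by B
    "D": (6, 1),
    "E": (7, 2),  # dominated by D
}
front = pareto_front(designs)
print(sorted(front))
```

Each surviving design is better than every other survivor in at least one objective, so choosing among them requires a preference between the criteria rather than further optimization.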
Scooped by Biswapriya Biswavas Misra

GOLD | Home

GOLD (Genomes Online Database) is a World Wide Web resource for comprehensive access to information regarding genome and metagenome sequencing projects, and their associated metadata, around the world.
Scooped by Biswapriya Biswavas Misra

MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations

We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, ncRNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software toolkit, MAKER-P, using the A. thaliana and Z. mays genomes. Here we demonstrate the ability of the MAKER-P toolkit to automatically update, extend, and revise the A. thaliana annotations in light of newly available data, and to annotate pseudogenes and ncRNAs absent from the TAIR10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even A. thaliana, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P at the Texas Advanced Computing Center (TACC). We show that this public resource can de novo annotate the entire Arabidopsis and Zea mays genomes in less than three hours, and produce annotations of comparable quality to those of the current TAIR10 and Z. mays V2 annotation builds.
Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme

High-throughput transcriptome sequencing (RNA-seq) technology promises to discover novel protein-coding and non-coding transcripts, particularly the identification of long non-coding RNAs (lncRNAs) from de novo sequencing data.

Via Mel Melendrez-Vallard
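The title of this item refers to a k-mer scheme. The excerpt does not describe PLEK's improved scheme, so the following is only a sketch of plain k-mer frequency counting, the basic building block such approaches start from; the function name and example sequence are hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequence, k):
    """Return the relative frequency of each overlapping k-mer in a sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical transcript fragment; overlapping 2-mers are AC, CG, GA, AC, CG.
freqs = kmer_frequencies("ACGACG", 2)
print(freqs)
```

Frequency vectors of this kind, computed for several values of k, are a common alignment-free feature representation for sequence classifiers.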
Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision http://t.co/1UfXL3r4Ud

Via Mel Melendrez-Vallard
Rescooped by Biswapriya Biswavas Misra from Bioinformatics Software: Sequence Analysis

[1409.7208] MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly
http://t.co/hZE5V294CR (ht @homolog_us)

Via Mel Melendrez-Vallard
Scooped by Biswapriya Biswavas Misra

TRIPATH: A Biological Genetic and Genomic Database of Three Economically Important Fungal Pathogen of Wheat - Rust: Smut: Bunt.

Biswapriya Biswavas Misra's insight:

Wheat, the major source of vegetable protein in the human diet, provides staple food globally for a large proportion of the human population. With a higher protein content than other major cereals, wheat has great socio-economic importance. For wheat, however, three important fungal pathogens (rust, smut and bunt) are a major cause of significant yield losses throughout the world. Researchers are putting up a strong fight against devastating wheat pathogens, and have made progress in tracking and controlling disease outbreaks from East Africa to South Asia. The aim of the present work was therefore to develop a fungal pathogen database dedicated to wheat, gathering information about different pathogen species and linking them to their biological classification, distribution and control. Towards this end, we developed an open-access database, TRIPATH: a biological, genetic and genomic database of economically important wheat fungal pathogens (rust, smut and bunt). Data collected from peer-reviewed publications and fungal pathogens were added to the customizable database through an extended relational design. The strength of this resource is in providing rapid retrieval of information from large volumes of text at a high degree of accuracy. The TRIPATH database is freely accessible.

Scooped by Biswapriya Biswavas Misra

cddApp 1.1 – Integration between Cytoscape and the NCBI Conserved Domain Database

cddApp 1.1: cddApp is a Cytoscape3 extension that supports the annotation of protein networks with information about domains and specific functional sites (features) from the National Center for Biotech...
Scooped by Biswapriya Biswavas Misra

BambooGDB: a bamboo genome database with functional annotation and an analysis platform.

Abstract: Bamboo, as one of the most important non-timber forest products and fastest-growing plants in the world, represents the only major lineage of...
Scooped by Biswapriya Biswavas Misra

Tomato genomic resources database: an integrated repository of useful tomato genomic... - Abstract - Europe PubMed Central

Abstract: Tomato Genomic Resources Database (TGRD) allows interactive browsing of tomato genes, micro RNAs, simple sequence repeats (SSRs), important...
Scooped by Biswapriya Biswavas Misra

XSAnno: a framework for building ortholog models in cross-species transcriptome comparisons

Abstract
Background

The accurate characterization of RNA transcripts and expression levels across species is critical for understanding transcriptome evolution. As available RNA-seq data accumulate rapidly, there is a great demand for tools that build gene annotations for cross-species RNA-seq analysis. However, prevailing methods of ortholog annotation for RNA-seq analysis between closely-related species do not take inter-species variation in mappability into consideration.
Results

Here we present XSAnno, a computational framework that integrates previous approaches with multiple filters to improve the accuracy of inter-species transcriptome comparisons. The implementation of this approach in comparing RNA-seq data of human, chimpanzee, and rhesus macaque brain transcriptomes has reduced the false discovery of differentially expressed genes, while maintaining a low false negative rate.
Conclusion

The present study demonstrates the utility of the XSAnno pipeline in building ortholog annotations and improving the accuracy of cross-species transcriptome comparisons.
Scooped by Biswapriya Biswavas Misra

GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies

Biswapriya Biswavas Misra's insight:
Abstract (provisional)

Background

Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build across individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in the use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions.

Results

In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with a missing rate > 0.001 (40% of study SNPs) showed no significant decrease in imputation quality (quality was even significantly higher than for imputation with singletons in the reference), especially for rare SNPs.

Conclusion

GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP arrays or deep sequencing, particularly for data from dbGaP and other public databases. GACT software: www.uvm.edu/genomics/software/gact
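One small piece of what allele-definition conversion involves can be sketched as strand flipping. This is not GACT's implementation or interface; the function names are hypothetical. The sketch also shows why A/T and C/G SNPs are called ambiguous: their alleles look the same on either strand, so the strand cannot be inferred from the alleles alone.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(alleles):
    """Report a SNP's two alleles on the opposite DNA strand."""
    return tuple(COMPLEMENT[allele] for allele in alleles)


def is_ambiguous(alleles):
    """True for A/T and C/G SNPs, whose alleles are each other's complement."""
    first, second = alleles
    return COMPLEMENT[first] == second


print(flip_strand(("A", "G")))   # alleles as seen on the reverse strand
print(is_ambiguous(("A", "T")))  # strand cannot be told from alleles alone
print(is_ambiguous(("A", "G")))
```

For a non-ambiguous SNP, a mismatch with the reference alleles that disappears after flipping indicates that the two data sets were reported on opposite strands.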
