The science toolbox
Rescooped by Niklaus Grunwald from Plant Pathogenomics
onto The science toolbox

bioRxiv: Crowdsourced analysis of ash and ash dieback through the Open Ash Dieback project: A year 1 report on datasets and analyses contributed by a self-organising community (2014)

Ash dieback is a fungal disease of ash trees caused by Hymenoscyphus pseudoalbidus that has swept across Europe in the last two decades and is a significant threat to the ash population. This emergent pathogen has been relatively poorly studied and little is known about its genetic make-up. In response to the arrival of this dangerous pathogen in the UK we took the unusual step of providing an open access database and initial sequence datasets to the scientific community for analysis prior to performing an analysis of our own. Our goal was to crowdsource genomic and other analyses and create a community analysing this pathogen. In this report on the evolution of the community and data and analysis obtained in the first year of this activity, we describe the nature and the volume of the contributions and reveal some preliminary insights into the genome and biology of H. pseudoalbidus that emerged. In particular our nascent community generated a first-pass genome assembly containing abundant collapsed AT-rich repeats indicating a typically complex genome structure. Our open science and crowdsourcing effort has brought a wealth of new knowledge about this emergent pathogen within a short time-frame. Our community endeavour highlights the positive impact that open, collaborative approaches can have on fast, responsive modern science.


Via Kamoun Lab @ TSL
Niklaus Grunwald's insight:

An example of crowdsourcing genomics ...

The science toolbox
Publishing, open source tools, bioinformatics, biology.
Scooped by Niklaus Grunwald

How to avoid the stigma of a retracted paper? Don't call it a retraction

New terms would make it easier for researchers to correct the literature after an honest mistake
Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

asymptoticMK: A Web-Based Tool for the Asymptotic McDonald–Kreitman Test

The McDonald–Kreitman (MK) test is a widely used method for quantifying the role of positive selection in molecular evolution. One key shortcoming of this test lies in its sensitivity to the presence of slightly deleterious mutations, which can severely bias its estimates. An asymptotic version of the MK test was recently introduced that addresses this problem by evaluating polymorphism levels for different mutation frequencies separately, and then extrapolating a function fitted to that data. Here, we present asymptoticMK, a web-based implementation of this asymptotic MK test. Our web service provides a simple R-based interface into which the user can upload the required data (polymorphism and divergence data for the genomic test region and a neutrally evolving reference region). The web service then analyzes the data and provides plots of the test results. This service is free to use, open-source, and available at <http://benhaller.com/messerlab/asymptoticMK.html>. We provide results from simulations to illustrate the performance and robustness of the asymptoticMK test under a wide range of model parameters.
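As a rough illustration of the quantity behind the test described above, the sketch below computes a per-frequency estimate alpha(x) = 1 - (d0/d) * (p(x)/p0(x)) from binned polymorphism counts, where d and p(x) refer to the test region and d0 and p0(x) to the neutrally evolving reference region. The function name, the input layout, and the toy numbers are assumptions made for illustration. This is not the asymptoticMK code, and the curve fitting and extrapolation step that the asymptotic test performs is left out.

```python
def alpha_by_frequency(d, d0, p, p0):
    """Return alpha(x) for each frequency bin, or None where it is undefined.

    d, d0 : divergence counts in the test and neutral reference regions
    p, p0 : polymorphism counts per frequency bin (test, neutral)
    """
    if d <= 0:
        raise ValueError("test-region divergence must be positive")
    if len(p) != len(p0):
        raise ValueError("p and p0 must have the same number of bins")
    alphas = []
    for px, p0x in zip(p, p0):
        if p0x == 0:
            # No neutral polymorphism in this bin: the ratio is undefined.
            alphas.append(None)
        else:
            alphas.append(1.0 - (d0 / d) * (px / p0x))
    return alphas


# Hypothetical counts, bins ordered from low to high derived-allele frequency.
d, d0 = 200, 100
p = [40, 20, 10]
p0 = [20, 20, 20]
alphas = alpha_by_frequency(d, d0, p, p0)
print(alphas)
```

In this toy input the estimate rises with frequency, which is the pattern expected when slightly deleterious mutations inflate polymorphism in the low-frequency bins.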

Via Pierre Gladieux
Scooped by Niklaus Grunwald

Science Needs a Solution for the Temptation of Positive Results

A few years back, scientists at the biotechnology company Amgen set out to replicate 53 landmark studies that argued for new approaches to treat cancers using both existing and new molecules. They were able to replicate the findings of the original research only 11 percent of the time. Science has a reproducibility problem. And the ramifications are widespread.

Scooped by Niklaus Grunwald

Best practices for population genetic analyses | Phytopathology

Population genetic analysis is a powerful tool to understand how pathogens emerge and adapt. However, determining the genetic structure of populations requires complex knowledge on a range of subtle skills that are often not explicitly stated in book chapters or review articles on population genetics. What is a good sampling strategy? How many isolates should I sample? How do I include positive and negative controls in my molecular assays? What marker system should I use? This review will attempt to address many of these practical questions that are often not readily answered from reading books or reviews on the topic, but emerge from discussions with colleagues and from practical experience. A further complication for microbial or pathogen populations is the frequent observation of clonality or partial clonality. Clonality invariably makes analyses of population data difficult because many assumptions underlying the theory from which analysis methods were derived are often violated. This review provides practical guidance on how to navigate through the complex web of data analyses of pathogens that may violate typical population genetics assumptions. We also provide resources and examples for analysis in the R programming environment.
Scooped by Niklaus Grunwald

Critical Review on the Use of Support Values in Tree Viewers and Bioinformatics Toolkits | Molecular Biology and Evolution | Oxford Academic

Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Most empirical evolutionary data studies contain a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that arise when displaying branch values on trees after rerooting. Branch values are typically stored as node labels in the widely-used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when rerooting trees. This depends on the mostly implicit semantics that tools deploy to interpret node labels. We reviewed ten tree viewers and ten bioinformatics toolkits that can display and reroot trees. We found that 14 out of 20 of these tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements in eight tools. We suggest tools should provide options that explicitly force users to define the semantics of node labels.
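The rerooting problem described above can be made concrete with a small sketch. Trees are written by hand as nested tuples of the form (children, label), which is a representation assumed here for illustration and not the data model of any of the reviewed tools. Each labelled internal node is mapped to the split of leaves that its parent branch induces, so that two rootings of the same tree can be compared.

```python
def leaf_set(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    children, _label = node
    result = frozenset()
    for child in children:
        result |= leaf_set(child)
    return result


def split_supports(tree):
    """Map each labelled node to the leaf split of the branch above it."""
    all_leaves = leaf_set(tree)
    supports = {}

    def visit(node):
        if isinstance(node, str):
            return
        children, label = node
        if label is not None:
            inside = leaf_set(node)
            split = frozenset([inside, all_leaves - inside])
            supports[split] = label
        for child in children:
            visit(child)

    visit(tree)
    return supports


# ((((A,B)90,C)70,D),E); as the original rooting
original = ([([([(["A", "B"], 90), "C"], 70), "D"], None), "E"], None)
# Rerooted on the branch to A, values moved with their branches:
# (A,(B,(C,(D,E)70)90));
branch_aware = (["A", (["B", (["C", (["D", "E"], 70)], 90)], None)], None)
# Same rerooting, values left on their original nodes:
# (A,(B,(C,(D,E))70)90);
node_bound = (["A", (["B", (["C", (["D", "E"], None)], 70)], 90)], None)

print(split_supports(original) == split_supports(branch_aware))
print(split_supports(original) == split_supports(node_bound))
```

Only the branch-aware rerooting keeps each value on the split it was computed for. In the node-bound version the value 70 ends up on the split AB|CDE, which carried 90 in the original tree.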
Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

The K=2 conundrum

Assessments of population genetic structure have become an increasing focus as they can provide valuable insight into patterns of migration and gene flow. STRUCTURE, the most highly cited of several clustering-based methods, was developed to provide robust estimates without the need for populations to be determined a priori. STRUCTURE introduces the problem of selecting the optimal number of clusters and as a result the ΔK method was proposed to assist in the identification of the ‘true’ number of clusters. In our review of 1,264 studies using STRUCTURE to explore population subdivision, studies that used ΔK were more likely to identify K=2 (54%, 443/822) than studies that did not use ΔK (21%, 82/386). A troubling finding was that very few studies performed the hierarchical analysis recommended by the authors of both ΔK and STRUCTURE to fully explore population subdivision. Furthermore, extensions of earlier simulations indicate that, with a representative number of markers, ΔK frequently identifies K=2 as the top level of hierarchical structure, even when more subpopulations are present. This review suggests that many studies may have been over- or underestimating population genetic structure; both scenarios have serious consequences, particularly with respect to conservation and management. We recommend publication standards for population structure results so that readers can assess the implications of the results given their own understanding of the species biology.
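For readers unfamiliar with ΔK, the sketch below shows one common formulation, assumed here: the absolute second-order difference of the mean log likelihood L(K) across replicate runs, divided by the standard deviation of L(K) at that K. The function name, the input layout, and the log-likelihood values are hypothetical, and this is not code from STRUCTURE or from the reviewed studies.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    log_likelihoods: dict mapping K to a list of L(K) values, one per run.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue  # second difference needs both neighbours
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(log_likelihoods[k])
        if sd > 0:
            result[k] = second_diff / sd
    return result


# Hypothetical replicate runs for K = 1..4.
runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-880, -882, -878],
    4: [-875, -879, -871],
}
dk = delta_k(runs)
print(dk)
```

Because the statistic needs L(K-1) and L(K+1), no value is produced for the smallest or largest K tested. With runs starting at K=1, as here, K=2 is the lowest value ΔK can ever select.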


Via Pierre Gladieux
Scooped by Niklaus Grunwald

Judging science: Assessing the importance of scientific work | The Economist

ONE role academic journals have come to play that was not, as it were, part of their original job-description of disseminating scientific results (see article), is as indicators of a researcher’s prowess, and thus determinants of academic careers.
Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks

Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These IBD blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulae for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite likelihood approach to fit these formulae. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the POPRES data set. We show that ancestry diffusing with a rate of σ ≈ 50–100 km/√gen during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance.

Via Pierre Gladieux
Scooped by Niklaus Grunwald

Metacoder: An R package for visualization and manipulation of community taxonomic diversity data

Community-level data, the type generated by an increasing number of metabarcoding studies, is often graphed as stacked bar charts or pie graphs that use color to represent taxa. These graph types do not convey the hierarchical structure of taxonomic classifications and are limited by the use of color for categories. As an alternative, we developed metacoder, an R package for easily parsing, manipulating, and graphing publication-ready plots of hierarchical data. Metacoder includes a dynamic and flexible function that can parse most text-based formats that contain taxonomic classifications, taxon names, taxon identifiers, or sequence identifiers. Metacoder can then subset, sample, and order this parsed data using a set of intuitive functions that take into account the hierarchical nature of the data. Finally, an extremely flexible plotting function enables quantitative representation of up to 4 arbitrary statistics simultaneously in a tree format by mapping statistics to the color and size of tree nodes and edges. Metacoder also allows exploration of barcode primer bias by integrating functions to run digital PCR. Although it has been designed for data from metabarcoding research, metacoder can easily be applied to any data that has a hierarchical component such as gene ontology or geographic location data. Our package complements currently available tools for community analysis and is provided open source with an extensive online user manual.
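The hierarchical view that the abstract contrasts with stacked bar charts can be sketched in a few lines. The example below is written in Python and does not use metacoder or its API. It assumes, for illustration only, that each observation is a semicolon-separated classification string with a read count, and it sums counts into every ancestor taxon so that each node of the taxonomy tree carries a statistic that could be mapped to size or color.

```python
from collections import Counter


def taxon_totals(observations, sep=";"):
    """Sum counts for every taxon and all of its ancestors.

    observations: iterable of (classification string, count) pairs.
    Returns a Counter keyed by the lineage tuple of each taxon.
    """
    totals = Counter()
    for lineage, count in observations:
        ranks = [name.strip() for name in lineage.split(sep) if name.strip()]
        for depth in range(1, len(ranks) + 1):
            totals[tuple(ranks[:depth])] += count
    return totals


# Hypothetical metabarcoding counts.
observations = [
    ("Fungi;Ascomycota;Helotiales", 30),
    ("Fungi;Ascomycota;Hypocreales", 10),
    ("Fungi;Basidiomycota", 5),
    ("Stramenopiles;Oomycota;Peronosporales", 15),
]
totals = taxon_totals(observations)
for lineage, count in sorted(totals.items()):
    print(" > ".join(lineage), count)
```

Keying by the full lineage rather than the taxon name alone keeps identically named taxa in different branches apart.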
Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

Multispecies coalescent delimits structure, not species

The multispecies coalescent model underlies many approaches used for species delimitation. In previous work assessing the performance of species delimitation under this model, speciation was treated as an instantaneous event rather than as an extended process involving distinct phases of speciation initiation (structuring) and completion. Here, we use data under simulations that explicitly model speciation as an extended process rather than an instantaneous event and carry out species delimitation inference on these data under the multispecies coalescent. We show that the multispecies coalescent diagnoses genetic structure, not species, and that it does not statistically distinguish structure associated with population isolation vs. species boundaries. Because of the misidentification of population structure as putative species, our work raises questions about the practice of genome-based species discovery, with cascading consequences in other fields. Specifically, all fields that rely on species as units of analysis, from conservation biology to studies of macroevolutionary dynamics, will be impacted by inflated estimates of the number of species, especially as genomic resources provide unprecedented power for detecting increasingly finer-scaled genetic structure under the multispecies coalescent. As such, our work also represents a general call for systematic study to reconsider a reliance on genomic data alone. Until new methods are developed that can discriminate between structure due to population-level processes and that due to species boundaries, genomic-based results should only be considered a hypothesis that requires validation of delimited species with multiple data types, such as phenotypic and ecological information.


Via Pierre Gladieux
Scooped by Niklaus Grunwald

Positive biodiversity-productivity relationship predominant in global forests

### INTRODUCTION

The biodiversity-productivity relationship (BPR; the effect of biodiversity on ecosystem productivity) is foundational to our understanding of the global extinction crisis and its impacts on the functioning of natural ecosystems. The BPR has been a prominent research topic within ecology in recent decades, but it is only recently that we have begun to develop a global perspective.

### RATIONALE

Forests are the most important global repositories of terrestrial biodiversity, but deforestation, forest degradation, climate change, and other factors are threatening approximately one half of tree species worldwide. Although there have been substantial efforts to strengthen the preservation and sustainable use of forest biodiversity throughout the globe, the consequences of this diversity loss pose a major uncertainty for ongoing international forest management and conservation efforts. The forest BPR represents a critical missing link for accurate valuation of global biodiversity and successful integration of biological conservation and socioeconomic development. Until now, there have been limited tree-based diversity experiments, and the forest BPR has only been explored within regional-scale observational studies. Thus, the strength and spatial variability of this relationship remains unexplored at a global scale.

### RESULTS

We explored the effect of tree species richness on tree volume productivity at the global scale using repeated forest inventories from 777,126 permanent sample plots in 44 countries containing more than 30 million trees from 8737 species spanning most of the global terrestrial biomes. Our findings reveal a consistent positive concave-down effect of biodiversity on forest productivity across the world, showing that a continued biodiversity loss would result in an accelerating decline in forest productivity worldwide.

The BPR shows considerable geospatial variation across the world. The same percentage of biodiversity loss would lead to a greater relative (that is, percentage) productivity decline in the boreal forests of North America, Northeastern Europe, Central Siberia, East Asia, and scattered regions of South-central Africa and South-central Asia. In the Amazon, West and Southeastern Africa, Southern China, Myanmar, Nepal, and the Malay Archipelago, however, the same percentage of biodiversity loss would lead to greater absolute productivity decline.

### CONCLUSION

Our findings highlight the negative effect of biodiversity loss on forest productivity and the potential benefits from the transition of monocultures to mixed-species stands in forestry practices. The BPR we discover across forest ecosystems worldwide corresponds well with recent theoretical advances, as well as with experimental and observational studies on forest and nonforest ecosystems. On the basis of this relationship, the ongoing species loss in forest ecosystems worldwide could substantially reduce forest productivity and thereby forest carbon absorption rate to compromise the global forest carbon sink. We further estimate that the economic value of biodiversity in maintaining commercial forest productivity alone is $166 billion to $490 billion per year. Although representing only a small percentage of the total value of biodiversity, this value is two to six times as much as it would cost to effectively implement conservation globally. These results highlight the necessity to reassess biodiversity valuation and the potential benefits of integrating and promoting biological conservation in forest resource management and forestry practices worldwide.

Figure: Global effect of tree species diversity on forest productivity.
Ground-sourced data from 777,126 global forest biodiversity permanent sample plots (dark blue dots, left), which cover a substantial portion of the global forest extent (white), reveal a consistent positive and concave-down biodiversity-productivity relationship across forests worldwide (red line with pink bands representing 95% confidence interval, right).



The biodiversity-productivity relationship (BPR) is foundational to our understanding of the global extinction crisis and its impacts on ecosystem functioning. Understanding BPR is critical for the accurate valuation and effective conservation of biodiversity. Using ground-sourced data from 777,126 permanent plots, spanning 44 countries and most terrestrial biomes, we reveal a globally consistent positive concave-down BPR, showing that continued biodiversity loss would result in an accelerating decline in forest productivity worldwide. The value of biodiversity in maintaining commercial forest productivity alone—US$166 billion to 490 billion per year according to our estimation—is more than twice what it would cost to implement effective global conservation. This highlights the need for a worldwide reassessment of biodiversity values, forest management strategies, and conservation priorities.

Scooped by Niklaus Grunwald

Science’s 1%: How income inequality is getting worse in research

Wages for top scientists are shooting skywards while others are being left behind.
Scooped by Niklaus Grunwald

Agricultural R&D is on the move

Big shifts in where research and development in food and agriculture is carried out will shape future global food production, write Philip G.
Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

GenomeHubs: simple containerized setup of a custom Ensembl database and web server for any species

As the generation and use of genomic datasets is becoming increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations and support export to flat files, including EMBL format for submission of assemblies and annotations to International Nucleotide Sequence Database Collaboration.

Via Pierre Gladieux
Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

Ten simple rules to consider regarding preprint submission

For the purposes of these rules, a preprint is defined as a complete written description of a body of scientific work that has yet to be published in a journal. Typically, a preprint is a research article, editorial, review, etc. that is ready to be submitted to a journal for peer review or is under review. It could also be a commentary, a report of negative results, a large data set and its description, and more. Finally, it could also be a paper that has been peer reviewed and either is awaiting formal publication by a journal or was rejected, but the authors are willing to make the content public. In short, a preprint is a research output that has not completed a typical publication pipeline but is of value to the community and deserving of being easily discovered and accessed. We also note that the term preprint is an anomaly, since there may not be a print version at all. The rules that follow relate to all these preprint types unless otherwise noted. In 1991, physics (and later, other disciplines, including mathematics, computer science, and quantitative biology) began a tradition of making preprints available through arXiv [1]. arXiv currently contains well over 1 million preprints. While late to the game [2], the availability of preprints in biomedicine has gained significant community attention recently [3,4] and led to the formation of a scientist-driven effort, ASAPbio [5], to promote their use. As a result of an ASAPbio meeting held in February of 2016, a paper was published [6] that describes the pros and cons of preprints from the perspective of the stakeholders—scientists, publishers, and funders. Here, we formulate the message specifically for scientists in the form of ten simple rules for considering using preprints as a communication mechanism.

Via Pierre Gladieux
Rescooped by Niklaus Grunwald from Plant Microbe Interactions, and some more...

A Phylogenetic Method To Perform Genome-Wide Association Studies In Microbes That Accounts For Population Structure And Recombination

Genome-Wide Association Studies (GWAS) in microbial organisms have the potential to vastly improve the way we understand, manage, and treat infectious diseases. Yet, GWAS methods established thus far remain insufficiently able to capitalise on the growing wealth of bacterial and viral genetic sequence data. Facing clonal population structure and homologous recombination, existing GWAS methods struggle to achieve both the precision necessary to reject spurious findings and the power required to detect associations in microbes. In this paper, we introduce a novel phylogenetic approach that has been tailor-made for microbial GWAS, which is applicable to organisms ranging from purely clonal to frequently recombining, and to both binary and continuous phenotypes. Our approach is robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Thorough testing via application to simulated data provides strong support for the power and specificity of our approach and demonstrates the advantages offered over alternative cluster-based and dimension-reduction methods. Two applications to Neisseria meningitidis illustrate the versatility and potential of our method, confirming previously-identified penicillin resistance loci and resulting in the identification of both well-characterised and novel drivers of invasive disease. Our method is implemented as an open-source R package called treeWAS which is freely available at https://github.com/caitiecollins/treeWAS.

Via Ryohei Thomas Nakano
Scooped by Niklaus Grunwald

Bayesian Analysis of Evolutionary Divergence with Genomic Data under Diverse Demographic Models | Molecular Biology and Evolution | Oxford Academic

We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation-with-Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method’s accuracy, scalability, and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes (P. t.) troglodytes and P. t. verus.
Scooped by Niklaus Grunwald

BuddySuite: Command-Line Toolkits for Manipulating Sequences, Alignments, and Phylogenetic Trees | Molecular Biology and Evolution | Oxford Academic

The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite.
Scooped by Niklaus Grunwald

Open science: The findings of medical research are disseminated too slowly | The Economist

ON JANUARY 1st the Bill & Melinda Gates Foundation did something that may help to change the practice of science.
Scooped by Niklaus Grunwald

Medical research: The shackles of scientific journals | The Economist

SCIENCE advances fastest when data and conclusions are shared as quickly as possible. Yet it is common practice for medical researchers to hoard results for months or years until research is published in an academic journal. Even then, the data underpinning a study are often not made public.
Scooped by Niklaus Grunwald

Normalization and microbial differential abundance strategies depend upon data characteristics

Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. In particular, library sizes often vary over several ranges of magnitude, and the data contains many zeros. Although we are typically interested in comparing relative abundance of taxa in the ecosystem of two or more groups, we can only measure the taxon relative abundance in specimens obtained from the ecosystems. Because the comparison of taxon relative abundance in the specimen is not equivalent to the comparison of taxon relative abundance in the ecosystems, this presents a special challenge. Second, because the relative abundance of taxa in the specimen (as well as in the ecosystem) sum to 1, these are compositional data. Because the compositional data are constrained by the simplex (sum to 1) and are not unconstrained in the Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges impact the performance of existing normalization methods and differential abundance analyses. Effects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. Rarefying more clearly clusters samples according to biological origin than other normalization techniques do for ordination metrics based on presence or absence. Alternate normalization measures are potentially vulnerable to artifacts due to library size. Effects on differential abundance testing: We build on a previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that the false discovery rates of many differential abundance-testing methods are not increased by rarefying itself, although of course rarefying results in a loss of sensitivity due to elimination of a portion of available data. 
For groups with large (~10×) differences in the average library size, rarefying lowers the false discovery rate. DESeq2, without addition of a constant, increased sensitivity on smaller datasets (<20 samples per group) but tends towards a higher false discovery rate with more samples, very uneven (~10×) library sizes, and/or compositional effects. For drawing inferences regarding taxon abundance in the ecosystem, analysis of composition of microbiomes (ANCOM) is not only very sensitive (for >20 samples per group) but also critically the only method tested that has a good control of false discovery rate. These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.
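To make the rarefying step in the abstract concrete, here is a minimal sketch of subsampling each library to a common depth without replacement. The function name `rarefy`, the toy count vectors, and the chosen depth are illustrative assumptions, not taken from the paper, which may implement rarefying differently.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a vector of taxon counts to `depth` reads without replacement.

    Returns None when the library holds fewer than `depth` reads, which
    mirrors the usual practice of dropping samples below the chosen depth.
    """
    if sum(counts) < depth:
        return None
    # Expand the counts into one entry per read, labelled by taxon index.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = Counter(rng.sample(reads, depth))
    return [kept.get(taxon, 0) for taxon in range(len(counts))]


# Hypothetical libraries whose sizes differ by roughly 10x and 100x.
samples = {
    "A": [500, 300, 200, 0],  # 1000 reads
    "B": [50, 30, 19, 1],     # 100 reads
    "C": [5, 3, 2, 0],        # 10 reads, below the chosen depth
}
depth = 100
rarefied = {name: rarefy(counts, depth) for name, counts in samples.items()}
```

In this toy case sample A loses 900 of its 1000 reads and sample C is dropped entirely, which illustrates the loss of sensitivity the abstract attributes to rarefying.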
Scooped by Niklaus Grunwald

Consensus statement: Virus taxonomy in the age of metagenomics : Nature Reviews Microbiology : Nature Research

Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

Towards an integrated ecosystem of R packages for the analysis of population genetic data

Population genetics is facing a number of challenges and opportunities from the next-generation sequencing revolution, both technical and scientific. On the technical side, how to efficiently deal with the massive amounts of data generated by next-generation sequencing will remain an ongoing problem. This includes the question of how data generated by different technologies are best analysed in combination. On the scientific side, the different drivers of genomic selection in different scenarios of population evolution have traditionally been investigated separately, using different genetic markers. Today’s possibilities for genetic data collection offer the opportunity to assess how different portions of a genome are linked and evolve under different selective pressures in different environments. It will be particularly exciting to see how R and its packages will evolve to meet these challenges in the years to come.

Via Pierre Gladieux
Scooped by Niklaus Grunwald

Genome sequences of six Phytophthora species threatening forest ecosystems

The Phytophthora genus comprises some of the most destructive plant pathogens, which attack a wide range of hosts including economically valuable tree species, both angiosperm and gymnosperm. Many known species of Phytophthora are invasive and have been introduced through nursery and agricultural trade. As part of a larger project aimed at utilizing genomic data for forest disease diagnostics, pathogen detection and monitoring (The TAIGA project: Tree Aggressors Identification using Genomic Approaches; http://taigaforesthealth.com/), we sequenced the genomes of six Phytophthora species that are important invasive pathogens of trees and a serious threat to the international trade of forest products. These genomic data were used to develop highly sensitive and specific detection assays, to compare genomes, and to make evolutionary inferences, and they will be useful to the broader plant and tree health community. These WGS data have been deposited in the International Nucleotide Sequence Database Collaboration (DDBJ/ENA/GenBank) under the accession numbers AUPN01000000, AUVH01000000, AUWJ02000000, AUUF02000000, AWVV02000000 and AWVW02000000.
Scooped by Niklaus Grunwald

Altmetrics: diversifying the understanding of influential scholarship

The increase in the availability of data about how research is discussed, used, rated, recommended, saved and read online has allowed researchers to reconsider the mechanisms by which scholarship is evaluated. It is now possible to better track the influence of research beyond academia, though the measures by which we can do so are not yet mature enough to stand on their own. In this article, we examine a new class of data (commonly called “altmetrics”), describe its benefits and limitations, and offer recommendations for its use and interpretation in the context of research assessment. This article is published as part of a collection on the future of research assessment.