The science toolbox
Publishing, open source tools, bioinformatics, biology.

Rescooped by Niklaus Grunwald from Plant pathogenic fungi

Scaling up: A guide to high‐throughput genomic approaches for biodiversity analysis

The purpose of this review is to present the most common and emerging DNA‐based methods used to generate data for biodiversity and biomonitoring studies. As environmental assessment and monitoring programmes may require biodiversity information at multiple levels, we pay particular attention to the DNA metabarcoding method and discuss a number of bioinformatic tools and considerations for producing DNA‐based indicators using operational taxonomic units (OTUs), taxa at a variety of ranks and community composition. By developing the capacity to harness the advantages provided by the newest technologies, investigators can “scale up” by increasing the number of samples and replicates processed, the frequency of sampling over time and space, and even the depth of sampling such as by sequencing more reads per sample or more markers per sample. The ability to scale up is made possible by the reduced hands‐on time and cost per sample provided by the newest kits, platforms and software tools. Results gleaned from broad‐scale monitoring will provide opportunities to address key scientific questions linked to biodiversity and its dynamics across time and space as well as being more relevant for policymakers, enabling science‐based decision‐making, and provide a greater socio‐economic impact. As genomic approaches are continually evolving, we provide this guide to methods used in biodiversity genomics.

Via Francis Martin, Steve Marek

Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

Analysis of Human Sequence Data Reveals Two Pulses of Archaic Denisovan Admixture

Anatomically modern humans interbred with Neanderthals and with a related archaic population known as Denisovans. Genomes of several Neanderthals and one Denisovan have been sequenced, and these reference genomes have been used to detect introgressed genetic material in present-day human genomes. Segments of introgression also can be detected without use of reference genomes, and doing so can be advantageous for finding introgressed segments that are less closely related to the sequenced archaic genomes. We apply a new reference-free method for detecting archaic introgression to 5,639 whole-genome sequences from Eurasia and Oceania. We find Denisovan ancestry in populations from East and South Asia and Papuans. Denisovan ancestry comprises two components with differing similarity to the sequenced Altai Denisovan individual. This indicates that at least two distinct instances of Denisovan admixture into modern humans occurred, involving Denisovan populations that had different levels of relatedness to the sequenced Altai Denisovan.


Via Pierre Gladieux

Rescooped by Niklaus Grunwald from Adaptive Evolution and Speciation

STAG: Species Tree Inference from All Genes - biorxiv

Species tree inference is fundamental to our understanding of the evolution of life on earth. However, species tree inference from molecular sequence data is complicated by gene duplication events that limit the availability of suitable data for phylogenetic reconstruction. Here we propose a novel method for species tree inference called STAG that is specifically designed to leverage data from multi-copy gene families. By application to 12 real species datasets sampled from across the eukaryotic domain we demonstrate that species trees inferred from multi-copy gene families are comparable in accuracy to species trees inferred from single-copy orthologues. We further show that the ability to utilise data from multi-copy gene families increases the amount of data available for species tree inference by an average of 8-fold. We reveal that on real species datasets STAG has higher accuracy than other leading methods for species tree inference, including concatenated alignments of protein sequences, ASTRAL and NJst. Finally we show that STAG is fast, memory efficient and scalable and thus suitable for analysis of large multispecies datasets.

Via Ronny Kellner

Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

Genomic evidence of speciation reversal in ravens

Many species, including humans, have emerged via complex reticulate processes involving hybridisation. Under certain circumstances, hybridisation can cause distinct lineages to collapse into a single lineage with an admixed mosaic genome. Most known cases of such ‘speciation reversal’ or ‘lineage fusion’ involve recently diverged lineages and anthropogenic perturbation. Here, we show that in western North America, Common Ravens (Corvus corax) have admixed mosaic genomes formed by the fusion of non-sister lineages (‘California’ and ‘Holarctic’) that diverged ~1.5 million years ago. Phylogenomic analyses and concordant patterns of geographic structuring in mtDNA, genome-wide SNPs and nuclear introns demonstrate long-term admixture and random interbreeding between the non-sister lineages. In contrast, our genomic data support reproductive isolation between Common Ravens and Chihuahuan Ravens (C. cryptoleucus) despite extensive geographic overlap and a sister relationship between Chihuahuan Ravens and the California lineage. These data suggest that the Common Raven genome was formed by secondary lineage fusion and most likely represents a case of ancient speciation reversal that occurred without anthropogenic causes.


Via Pierre Gladieux

Scooped by Niklaus Grunwald

Data visualization tools drive interactivity and reproducibility in online publishing

New tools for building interactive figures and software make scientific data more accessible, and reproducible.

Scooped by Niklaus Grunwald

Bayesian Inference of Species Networks from Multilocus Sequence Data | Molecular Biology and Evolution | Oxford Academic

Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large data sets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.

Scooped by Niklaus Grunwald

A fast likelihood solution to the genetic clustering problem

Abstract
1. The investigation of genetic clusters in natural populations is a ubiquitous problem in a range of fields relying on the analysis of genetic data, such as molecular ecology, conservation biology and microbiology. Typically, genetic clusters are defined as distinct panmictic populations, or parental groups in the context of hybridisation. Two types of methods have been developed for identifying such clusters: model-based methods, which are usually computer-intensive but yield results which can be interpreted in the light of an explicit population genetic model, and geometric approaches, which are less interpretable but remarkably faster.

2. Here, we introduce snapclust, a fast maximum-likelihood solution to the genetic clustering problem, which allies the advantages of both model-based and geometric approaches. Our method relies on maximising the likelihood of a fixed number of panmictic populations using a combination of a geometric approach and fast likelihood optimization using the Expectation-Maximization (EM) algorithm. It can be used for assigning genotypes to populations and, optionally, for identifying various types of hybrids between two parental populations. Several goodness-of-fit statistics can also be used to guide the choice of the retained number of clusters.

3. Using extensive simulations, we show that snapclust performs comparably to current gold-standards for genetic clustering as well as hybrid detection, with some advantages for identifying hybrids after several backcrosses, while being orders of magnitude faster than other model-based methods. We also illustrate how snapclust can be used for identifying the optimal number of clusters, and for subsequently assigning individuals to various hybrid classes simulated from an empirical microsatellite dataset.

4. snapclust is implemented in the package adegenet for the free software R, and is therefore easily integrated into existing pipelines for genetic data analysis. It can be applied to any kind of codominant markers, and can easily be extended to more complex models including, for instance, varying ploidy levels. Given its flexibility and computer-efficiency, it provides a useful complement to the existing toolbox for the study of genetic diversity in natural populations.
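
The EM idea described in point 2 can be illustrated with a minimal Python sketch. This is not the adegenet implementation: it assumes diploid, biallelic loci coded as allele counts (0, 1 or 2), Hardy-Weinberg proportions within each cluster, equal cluster weights, and a user-supplied starting grouping that stands in for the geometric initialisation. The function name `em_genetic_clusters` and the `floor` parameter are hypothetical.

```python
import math

def em_genetic_clusters(genotypes, init_groups, k, n_iter=20, floor=0.01):
    """Assign individuals to k panmictic clusters by EM (illustrative sketch).

    genotypes   : list of individuals, each a list of allele counts (0, 1, 2)
    init_groups : starting cluster index for each individual (assumed given)
    floor       : keeps allele frequencies away from 0 and 1 (assumption)
    """
    n, n_loci = len(genotypes), len(genotypes[0])
    # Start from hard memberships taken from the initial grouping.
    resp = [[1.0 if init_groups[i] == c else 0.0 for c in range(k)]
            for i in range(n)]
    for _ in range(n_iter):
        # M-step: membership-weighted allele frequency per cluster and locus.
        freqs = []
        for c in range(k):
            weight = sum(resp[i][c] for i in range(n))
            row = []
            for l in range(n_loci):
                count = sum(resp[i][c] * genotypes[i][l] for i in range(n))
                p = count / (2.0 * weight) if weight > 0 else 0.5
                row.append(min(max(p, floor), 1.0 - floor))
            freqs.append(row)
        # E-step: membership probabilities from Hardy-Weinberg likelihoods.
        for i in range(n):
            logs = []
            for c in range(k):
                ll = 0.0
                for l in range(n_loci):
                    g, p = genotypes[i][l], freqs[c][l]
                    ll += (math.log(math.comb(2, g))
                           + g * math.log(p) + (2 - g) * math.log(1.0 - p))
                logs.append(ll)
            top = max(logs)
            weights = [math.exp(x - top) for x in logs]
            total = sum(weights)
            resp[i] = [w / total for w in weights]
    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    return labels, resp

# Made-up data: three individuals rich in the counted allele, three poor in it.
# The last individual is deliberately mislabelled in the starting grouping.
genotypes = [[2, 2, 2, 2], [2, 2, 1, 2], [2, 1, 2, 2],
             [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
labels, _ = em_genetic_clusters(genotypes, [0, 0, 0, 1, 1, 0], k=2)
print(labels)
```

In this toy case the mislabelled individual should be reassigned to the second cluster after the first iteration. Real data would call for the R function itself, which also handles hybrid classes and goodness-of-fit statistics.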

Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

Bayesian molecular dating: opening up the black box

Molecular dating analyses allow evolutionary timescales to be estimated from genetic data, offering an unprecedented capacity for investigating the evolutionary past of all species. These methods require us to make assumptions about the relationship between genetic change and evolutionary time, often referred to as a ‘molecular clock’. Although initially regarded with scepticism, molecular dating has now been adopted in many areas of biology. This broad uptake has been due partly to the development of Bayesian methods that allow complex aspects of molecular evolution, such as variation in rates of change across lineages, to be taken into account. But in order to do this, Bayesian dating methods rely on a range of assumptions about the evolutionary process, which vary in their degree of biological realism and empirical support. These assumptions can have substantial impacts on the estimates produced by molecular dating analyses. The aim of this review is to open the ‘black box’ of Bayesian molecular dating and have a look at the machinery inside. We explain the components of these dating methods, the important decisions that researchers must make in their analyses, and the factors that need to be considered when interpreting results. We illustrate the effects that the choices of different models and priors can have on the outcome of the analysis, and suggest ways to explore these impacts. We describe some major research directions that may improve the reliability of Bayesian dating. The goal of our review is to help researchers to make informed choices when using Bayesian phylogenetic methods to estimate evolutionary rates and timescales.


Via Pierre Gladieux

Scooped by Niklaus Grunwald

Scientists have isolated the very first rust pathogen gene that wheat plants detect to 'switch on' resistance

Famine may be largely a thing of the past, but in recent years the re-emergence of a disease that can kill wheat, which provides a fifth of humanity's food, has threatened food security; now a breakthrough is being announced.

Scooped by Niklaus Grunwald

It's Gonna Get a Lot Easier to Break Science Journal Paywalls

Scientific search engines are the Napster of academic papers—and they're only getting more powerful.

Scooped by Niklaus Grunwald

Rapid re-identification of human samples using portable DNA sequencing

DNA re-identification is used for a broad suite of applications, ranging from cell line authentication to forensics. However, current re-identification schemes suffer from high latency and limited access. Here, we describe a rapid, inexpensive, and portable strategy to robustly re-identify human DNA called 'MinION sketching'. MinION sketching requires as few as 3 min of sequencing and 60-300 random SNPs to re-identify a sample enabling near real-time applications of DNA re-identification. Our method capitalizes on the rapidly growing availability of genomic reference data for cell lines, tissues in biobanks, and individuals. This empowers the application of MinION sketching in research and clinical settings for periodic cell line and tissue authentication. Importantly, our method enables considerably faster and more robust cell line authentication relative to current practices and could help to minimize the amount of irreproducible research caused by mix-ups and contamination in human cell and tissue cultures.

Rescooped by Niklaus Grunwald from Amazing Science

Slaughterbots: Disturbing video depicts near-future ubiquitous lethal autonomous weapons

In response to growing concerns about autonomous weapons, the Campaign to Stop Killer Robots, a coalition of AI researchers and advocacy organizations, has released a fictional video that depicts a disturbing future in which lethal autonomous weapons have become cheap and ubiquitous worldwide.

UC Berkeley AI researcher Stuart Russell presented the video at the United Nations Convention on Certain Conventional Weapons in Geneva, hosted by the Campaign to Stop Killer Robots earlier this week. Russell, in an appearance at the end of the video, warns that the technology described in the film already exists and that the window to act is closing fast.

Support for a ban against autonomous weapons has been mounting. On Nov. 2, more than 200 Canadian scientists and more than 100 Australian scientists in academia and industry penned open letters to Prime Ministers Justin Trudeau and Malcolm Turnbull urging them to support the ban. Earlier this summer, more than 130 leaders of AI companies signed a letter in support of this week’s discussions. These letters follow a 2015 open letter released by the Future of Life Institute and signed by more than 20,000 AI/robotics researchers and others, including Elon Musk and Stephen Hawking.

“Many of the world’s leading AI researchers worry that if these autonomous weapons are ever developed, they could dramatically lower the threshold for armed conflict, ease and cheapen the taking of human life, empower terrorists, and create global instability,” according to an article published by the Future of Life Institute, which funded the video. “The U.S. and other nations have used drones and semi-automated systems to carry out attacks for several years now, but fully removing a human from the loop is at odds with international humanitarian and human rights law.”

“The Campaign to Stop Killer Robots is not trying to stifle innovation in artificial intelligence and robotics and it does not wish to ban autonomous systems in the civilian or military world,” explained Noel Sharkey of the International Committee for Robot Arms Control. “Rather, we see an urgent need to prevent automation of the critical functions for selecting targets and applying violent force without human deliberation and to ensure meaningful human control for every attack.”


Via Dr. Stefan Gruenwald

Scooped by Niklaus Grunwald

The Fallacy of Open-Access Publication

It’s clearly not open to all if scholars are required to pay to publish their results.

Rescooped by Niklaus Grunwald from Bioinformatics and genomics

davidemms/OrthoFinder: Accurate inference of orthologous gene groups made easy. "OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthologous gene group inf...

Via Jesper Svedberg

Scooped by Niklaus Grunwald

Runs of homozygosity: windows into population history and trait architecture

The inheritance of identical haplotypes from a common ancestor creates long tracts of homozygous genotypes known as runs of homozygosity (ROH).

ROH are ubiquitous across human populations, and they correlate with pedigree inbreeding. Larger populations have fewer, shorter ROH, whereas isolated or bottlenecked populations have more, somewhat longer ROH. Admixed groups have the fewest ROH, whereas consanguineous communities carry very long ROH. Native American populations have the highest burdens of ROH in the world.

ROH can be detected in microarray or whole-genome sequencing (WGS) data, using either observational approaches, for example, that implemented in PLINK, or model-based approaches. Simulations show that PLINK outperforms many other methods.

ROH are non-randomly distributed across the genome, being more prevalent in areas of low recombination, but are also concentrated in small regions called ROH islands.

Quantitative traits related to stature and cognition have been robustly associated with ROH burden, implying recessive variants contribute to their genetic architecture. Case–control analyses of ROH, on the other hand, appear more easily confounded by socioeconomic or cultural factors.

Both megacohorts and special populations are now being used to investigate diverse aspects of the scope and mechanism of inbreeding depression in humans.
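
To make the notion of an observational ROH scan concrete, here is a minimal Python sketch. It is a deliberately simplified illustration and not PLINK's algorithm: it assumes genotypes coded 0, 1, 2 (with 1 meaning heterozygous), allows no heterozygous or missing calls inside a run, and uses hypothetical thresholds and a hypothetical function name, `find_roh`.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=1000):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    genotypes : per-SNP genotype codes along one chromosome (1 = heterozygous)
    positions : base-pair position of each SNP, in ascending order
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            # Homozygous call: open a run if none is in progress.
            if start is None:
                start = i
        elif start is not None:
            # Heterozygous call closes the current run.
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(genotypes) - 1))

    # Keep only runs that satisfy both the SNP-count and length thresholds.
    kept = []
    for first, last in runs:
        n_snps = last - first + 1
        length = positions[last] - positions[first]
        if n_snps >= min_snps and length >= min_length_bp:
            kept.append((positions[first], positions[last], n_snps))
    return kept

# Made-up example: two qualifying runs separated by heterozygous sites.
genotypes = [0, 2, 2, 0, 1, 2, 1, 0, 0, 0, 2]
positions = [100, 600, 1500, 2400, 3000, 3500, 4200, 5000, 6100, 7300, 9000]
print(find_roh(genotypes, positions))  # [(100, 2400, 4), (5000, 9000, 4)]
```

Production tools typically tolerate a small number of heterozygous or missing calls per window, which this sketch omits for brevity.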

Scooped by Niklaus Grunwald

Taxa: An R package implementing data standards and methods for taxonomic data - F1000Research

The taxa R package provides a set of tools for defining and manipulating taxonomic data. The recent and widespread application of DNA sequencing to community composition studies is making large data sets with taxonomic information commonplace. However, compared to typical tabular data, this information is encoded in many different ways and the hierarchical nature of taxonomic classifications makes it difficult to work with. There are many R packages that use taxonomic data to varying degrees but there is currently no cross-package standard for how this information is encoded and manipulated. We developed the R package taxa to provide a robust and flexible solution to storing and manipulating taxonomic data in R and any application-specific information associated with it. Taxa provides parsers that can read common sources of taxonomic information (taxon IDs, sequence IDs, taxon names, and classifications) from nearly any format while preserving associated data. Once parsed, the taxonomic data and any associated data can be manipulated using a cohesive set of functions modeled after the popular R package dplyr. These functions take into account the hierarchical nature of taxa and can modify the taxonomy or associated data in such a way that both are kept in sync. Taxa is currently being used by the metacoder and taxize packages, which provide broadly useful functionality that we hope will speed adoption by users and developers.

Scooped by Niklaus Grunwald

Genome-Enhanced Detection and Identification (GEDI) of plant pathogens [PeerJ]

Plant diseases caused by fungi and Oomycetes represent worldwide threats to crops and forest ecosystems. Effective prevention and appropriate management of emerging diseases rely on rapid detection and identification of the causal pathogens. The increase in genomic resources makes it possible to generate novel genome-enhanced DNA detection assays that can exploit whole genomes to discover candidate genes for pathogen detection. A pipeline was developed to identify genome regions that discriminate taxa or groups of taxa and can be converted into PCR assays. The modular pipeline is comprised of four components: (1) selection and genome sequencing of phylogenetically related taxa, (2) identification of clusters of orthologous genes, (3) elimination of false positives by filtering, and (4) assay design. This pipeline was applied to some of the most important plant pathogens across three broad taxonomic groups: Phytophthoras (Stramenopiles, Oomycota), Dothideomycetes (Fungi, Ascomycota) and Pucciniales (Fungi, Basidiomycota). Comparison of 73 fungal and Oomycete genomes led to the discovery of 5,939 gene clusters that were unique to the targeted taxa and an additional 535 that were common at higher taxonomic levels. Approximately 28% of the 299 tested were converted into qPCR assays that met our set of specificity criteria. This work demonstrates that a genome-wide approach can efficiently identify multiple taxon-specific genome regions that can be converted into highly specific PCR assays. The possibility to easily obtain multiple alternative regions to design highly specific qPCR assays should be of great help in tackling challenging cases for which higher taxon-resolution is needed.
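
The idea of gene clusters "unique to the targeted taxa" can be sketched in a few lines of Python. This is only an illustration of the set logic, not the published pipeline: the function name `taxon_specific_clusters`, the cluster identifiers and the genome labels are all hypothetical, and the rule used here (present in every target genome and in no other genome) is an assumption.

```python
def taxon_specific_clusters(cluster_members, target_genomes):
    """Return IDs of ortholog clusters found in all targets and nowhere else.

    cluster_members : dict mapping cluster ID -> set of genomes containing it
    target_genomes  : genomes that make up the taxon of interest
    """
    targets = set(target_genomes)
    specific = []
    for cluster_id, genomes in cluster_members.items():
        # Exact match: every target has the cluster and no non-target does.
        if set(genomes) == targets:
            specific.append(cluster_id)
    return sorted(specific)

# Made-up example data.
clusters = {
    "OG0001": {"P_ramorum", "P_lateralis"},
    "OG0002": {"P_ramorum", "P_lateralis", "P_sojae"},
    "OG0003": {"P_ramorum"},
}
print(taxon_specific_clusters(clusters, ["P_ramorum", "P_lateralis"]))  # ['OG0001']
```

Candidates selected this way would still need the filtering and assay-design steps that the abstract lists as components (3) and (4).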

Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

Reliable ABC model choice via random forests | Bioinformatics | Oxford Academic

Motivation: Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques.

Results: We propose a novel approach based on a machine learning tool named random forests (RF) to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with RF and postponing the approximation of the posterior probability of the selected model for a second stage also relying on RF. Compared with earlier implementations of ABC model choice, the ABC RF approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least 50) and (iv) it includes an approximation of the posterior probability of the selected model. The call to RF will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets.

Availability and implementation: The proposed methodology is implemented in the R package abcrf available on the CRAN.

Contact: jean-michel.marin@umontpellier.fr

Supplementary information: Supplementary data are available at Bioinformatics online.


Via Pierre Gladieux

Scooped by Niklaus Grunwald

dartr: An r package to facilitate analysis of SNP data generated from reduced representation genome sequencing

Although vast technological advances have been made and genetic software packages are growing in number, it is not a trivial task to analyse SNP data. We announce a new R package, dartr, enabling the analysis of single nucleotide polymorphism data for population genomic and phylogenomic applications. dartr provides user-friendly functions for data quality control and marker selection, and permits rigorous evaluations of conformation to Hardy–Weinberg equilibrium, gametic-phase disequilibrium and neutrality. The package reports standard descriptive statistics, permits exploration of patterns in the data through principal components analysis and conducts standard F-statistics, as well as basic phylogenetic analyses, population assignment, isolation by distance and exports data to a variety of commonly used downstream applications (e.g., newhybrids, faststructure and phylogeny applications) outside of the R environment. The package serves two main purposes. First, it offers a user-friendly approach to lower the hurdle to analyse such data; the package therefore comes with a detailed tutorial targeted to the R beginner to allow data analysis without requiring deep knowledge of R. Second, we use a single, well-established format, genlight from the adegenet package, as input for all our functions to avoid data reformatting. By strictly using the genlight format, we hope to facilitate this format as the de facto standard of future software developments and hence reduce the format jungle of genetic data sets. The dartr package is available via the R CRAN network and GitHub.
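
As an illustration of what an evaluation of conformation to Hardy–Weinberg equilibrium involves, here is a minimal Python sketch of the textbook chi-square test for one biallelic SNP. It is not the package's code: the function name `hwe_chi_square` is hypothetical, and the sketch assumes genotype counts are already tallied and that the large-sample chi-square approximation (1 degree of freedom) is acceptable.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing genotype counts with HWE expectations.

    n_aa, n_ab, n_bb : observed counts of the AA, AB and BB genotypes
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Sum of (observed - expected)^2 / expected over the three genotypes.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Made-up counts: close to HWE, so the statistic is small.
stat = hwe_chi_square(298, 489, 213)
print(round(stat, 3), "reject HWE at 5%" if stat > 3.841 else "consistent with HWE")
```

The threshold 3.841 is the 5% critical value of the chi-square distribution with one degree of freedom. For small samples or rare alleles an exact test is usually preferred over this approximation.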


Rescooped by Niklaus Grunwald from Pathogens, speciation, domestication, genomics, fungi, biotic interactions

The Python Graph Gallery

Visualizing data - with Python

Via Pierre Gladieux

Scooped by Niklaus Grunwald

In defence of model‐based inference in phylogeography

Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, either when used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics.


Rescooped by Niklaus Grunwald from MycorWeb Plant-Microbe Interactions

CONSTAX: a tool for improved taxonomic resolution of environmental fungal ITS sequences

Background
One of the most crucial steps in high-throughput sequence-based microbiome studies is the taxonomic assignment of sequences belonging to operational taxonomic units (OTUs). Without taxonomic classification, functional and biological information of microbial communities cannot be inferred or interpreted. The internal transcribed spacer (ITS) region of the ribosomal DNA is the conventional marker region for fungal community studies. While bioinformatics pipelines that cluster reads into OTUs have received much attention in the literature, less attention has been given to the taxonomic classification of these sequences, upon which biological inference is dependent.


Results
Here we compare how three common fungal OTU taxonomic assignment tools (RDP Classifier, UTAX, and SINTAX) handle ITS fungal sequence data. The classification power, defined as the proportion of assigned OTUs at a given taxonomic rank, varied among the classifiers. Classifiers were generally consistent (assignment of the same taxonomy to a given OTU) across datasets and ranks; a small number of OTUs were assigned unique classifications across programs. We developed CONSTAX (CONSensus TAXonomy), a Python tool that compares taxonomic classifications of the three programs and merges them into an improved consensus taxonomy. This tool also produces summary classification outputs that are useful for downstream analyses.


Conclusions
Our results demonstrate that independent taxonomy assignment tools classify unique members of the fungal community, and greater classification power is realized by generating consensus taxonomy of available classifiers with CONSTAX.
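
The notion of merging several classifiers into a consensus taxonomy can be sketched in Python as follows. The abstract does not state the merge rules, so this is not the tool's actual logic: it assumes a simple rank-by-rank vote in which a label is accepted when at least `min_votes` classifiers agree, and the function name `consensus_taxonomy` and the example assignments are hypothetical.

```python
from collections import Counter

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def consensus_taxonomy(classifications, min_votes=2):
    """Merge per-classifier lineages into one consensus lineage.

    classifications : one list per classifier, ordered by RANKS, with None
                      wherever that classifier made no assignment
    min_votes       : classifiers that must agree on a label (assumed rule)
    """
    consensus = []
    for rank_index in range(len(RANKS)):
        votes = Counter(
            lineage[rank_index]
            for lineage in classifications
            if rank_index < len(lineage) and lineage[rank_index] is not None
        )
        if not votes:
            break  # nobody assigned this rank, so stop descending
        label, count = votes.most_common(1)[0]
        if count < min_votes:
            break  # no sufficient agreement at this rank
        consensus.append(label)
    return consensus

# Made-up assignments for one OTU from three classifiers.
rdp = ["Fungi", "Ascomycota", "Sordariomycetes", "Hypocreales", "Nectriaceae", "Fusarium", None]
utax = ["Fungi", "Ascomycota", "Sordariomycetes", "Hypocreales", None, None, None]
sintax = ["Fungi", "Ascomycota", "Sordariomycetes", "Hypocreales", "Nectriaceae", None, None]
print(consensus_taxonomy([rdp, utax, sintax]))
```

With `min_votes=2` the example resolves the OTU to family level; setting `min_votes=1` would accept the single genus call, trading confidence for classification power.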


Via Francis Martin

Scooped by Niklaus Grunwald

Explore the skills that can open career doors after your doctoral training

The skills you develop as a Ph.D. student and those you need for jobs have some overlap, but there are also important gaps

Rescooped by Niklaus Grunwald from Adaptive Evolution and Speciation

Synima: a Synteny imaging tool for annotated genome assemblies - BMC Bioinformatics

Background
Ortholog prediction and synteny visualization across whole genomes are valuable methods for detecting and representing a range of evolutionary processes such as genome expansion, chromosomal rearrangement, and chromosomal translocation. Few standalone methods are currently available to visualize synteny across any number of annotated genomes.

Results
Here, I present a Synteny Imaging tool (Synima) written in Perl, which uses the graphical features of R. Synima takes orthologues computed from reciprocal best BLAST hits or OrthoMCL, and DAGchainer, and outputs an overview of genome-wide synteny in PDF. Each of these programs is included with the Synima package, along with a pipeline for their use. Synima has a range of graphical parameters including size, colours, order, and labels, which are specified in a config file that is generated by the first run of Synima and can be subsequently edited. Synima runs quickly on a command line to generate informative and publication quality figures. Synima is open source and freely available from https://github.com/rhysf/Synima under the MIT License.

Conclusions
Synima should be a valuable tool for visualizing synteny between two or more annotated genome assemblies.


Via Ronny Kellner

Scooped by Niklaus Grunwald

A Bayesian method for detecting pairwise associations in compositional data

Data from many fields are available primarily in the form of proportions, also referred to as compositions, which impose mathematical constraints on identifying interactions among components in the underlying systems. In particular, correlations cannot be calculated directly from proportions or from count data that give rise to them. Methods that work around this difficulty generally do so by imposing strong assumptions about the distribution of underlying data or associated correlations, and these in turn often prevent quantifying uncertainty in the resulting estimates of correlation. We developed a statistical model (BAnOCC: Bayesian Analysis of Compositional Covariance) that both estimates correlations between counts or proportions and provides a posterior distribution for each correlation that quantifies how uncertain the estimate is. BAnOCC does well at controlling the number of false positives in simulated data and can be practically applied to a wide range of proportional data types.
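
Why correlations cannot be read directly off proportions can be shown with a small Python example. This is an illustration of the sum constraint only and has nothing to do with the BAnOCC model itself: the counts are made up, and the helper names `pearson` and `to_proportions` are hypothetical. With just two components, each proportion is one minus the other, so their correlation is forced to -1 whatever the underlying counts do.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def to_proportions(rows):
    """Divide each sample's counts by the sample total (closure)."""
    return [[value / sum(row) for value in row] for row in rows]

# Made-up counts for two components across five samples.
counts = [(10, 40), (20, 10), (30, 30), (40, 20), (50, 50)]
props = to_proportions(counts)

r_counts = pearson([c[0] for c in counts], [c[1] for c in counts])
r_props = pearson([p[0] for p in props], [p[1] for p in props])
print(round(r_counts, 3), round(r_props, 3))  # 0.3 -1.0
```

The raw counts are mildly positively correlated, yet the proportions are perfectly negatively correlated. With more components the distortion is less extreme but still present, which is the difficulty the abstract describes.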
