Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects.
Scott Jackson, Jeremy Schmutz, Phillip McClean and colleagues report the genome sequence of the common bean (Phaseolus vulgaris) and resequenced wild individuals and landraces from Mesoamerican and Andean gene pools, showing that common bean underwent two independent domestications.
Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus.
To feed the world's rapidly expanding population in the coming decades, agriculture must produce more. Big data holds one of the keys for farmers, but it's also a weapon that could be used against them.
Genotyping by sequencing (GBS) is a next generation sequencing based method that takes advantage of reduced representation to enable high throughput genotyping of large numbers of individuals at a large number of SNP markers. The relatively straightforward, robust, and cost-effective GBS protocol is currently being applied in numerous species by a large number of researchers. Herein we describe a bioinformatics pipeline, tassel-gbs, designed for the efficient processing of raw GBS sequence data into SNP genotypes. The tassel-gbs pipeline successfully fulfills the following key design criteria: (1) Ability to run on the modest computing resources that are typically available to small breeding or ecological research programs, including desktop or laptop machines with only 8–16 GB of RAM, (2) Scalability from small to extremely large studies, where hundreds of thousands or even millions of SNPs can be scored in up to 100,000 individuals (e.g., for large breeding programs or genetic surveys), and (3) Applicability in an accelerated breeding context, requiring rapid turnover from tissue collection to genotypes. Although a reference genome is required, the pipeline can also be run with an unfinished “pseudo-reference” consisting of numerous contigs. We describe the tassel-gbs pipeline in detail and benchmark it based upon a large scale, species wide analysis in maize (Zea mays), where the average error rate was reduced to 0.0042 through application of population genetic-based SNP filters. Overall, the GBS assay and the tassel-gbs pipeline provide robust tools for studying genomic diversity.
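The abstract mentions population genetic-based SNP filters without describing them. As a rough illustration only, and not the tassel-gbs implementation, the sketch below filters SNPs on two commonly used criteria: call rate and minor allele frequency. The function names, the genotype coding (alternate-allele counts 0/1/2, `None` for a missing call) and both thresholds are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Assumed coding: number of alternate alleles (0, 1, 2); None marks a missing call.
Genotypes = List[Optional[int]]


def call_rate(genos: Genotypes) -> float:
    """Fraction of individuals with a non-missing genotype call."""
    if not genos:
        return 0.0
    called = [g for g in genos if g is not None]
    return len(called) / len(genos)


def minor_allele_frequency(genos: Genotypes) -> float:
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(
    snps: Dict[str, Genotypes],
    min_maf: float = 0.05,        # hypothetical threshold
    min_call_rate: float = 0.8,   # hypothetical threshold
) -> List[str]:
    """Return the names of SNPs that pass both filters."""
    return [
        name
        for name, genos in snps.items()
        if call_rate(genos) >= min_call_rate
        and minor_allele_frequency(genos) >= min_maf
    ]


if __name__ == "__main__":
    toy = {
        "snp_a": [0, 1, 2, 1, None],           # well called, polymorphic
        "snp_b": [0, 0, 0, 0, 0],              # monomorphic
        "snp_c": [None, None, None, 1, 2],     # mostly missing
    }
    print(filter_snps(toy))
```

In a real study the thresholds would depend on the population and sequencing depth; the values above are placeholders.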
Synthetic biology is an emerging field uniting scientists from all disciplines with the aim of designing or re-designing biological processes. Initially, synthetic biology breakthroughs came from microbiology, chemistry, physics, computer science, materials science, mathematics, and engineering disciplines. A transition to multicellular systems is the next logical step for synthetic biologists and plants will provide an ideal platform for this new phase of research. This meeting report highlights some of the exciting plant synthetic biology projects, and tools and resources, presented and discussed at the 2013 GARNet workshop on plant synthetic biology.
Natural variants of crops are generated from wild progenitor plants under both natural and human selection. Diverse crops that are able to adapt to various environmental conditions are valuable resources for crop improvements to meet the food demands of the increasing human population. With the completion of reference genome sequences, the advent of high-throughput sequencing technology now enables rapid and accurate resequencing of a large number of crop genomes to detect the genetic basis of phenotypic variations in crops. Comprehensive maps of genome variations facilitate genome-wide association studies of complex traits and functional investigations of evolutionary changes in crops. These advances will greatly accelerate studies on crop designs via genomics-assisted breeding. Here, we first discuss crop genome studies and describe the development of sequencing-based genotyping and genome-wide association studies in crops. We then review sequencing-based crop domestication studies and offer a perspective on genomics-driven crop designs.
In plants, male sterility can be caused either by mitochondrial genes with coupled nuclear genes or by nuclear genes alone; the resulting conditions are known as cytoplasmic male sterility (CMS) and genic male sterility (GMS), respectively. CMS and GMS facilitate hybrid seed production for many crops and thus allow breeders to harness yield gains associated with hybrid vigor (heterosis). In CMS, layers of interaction between mitochondrial and nuclear genes control its male specificity, occurrence, and restoration of fertility. Environment-sensitive GMS (EGMS) mutants may involve epigenetic control by noncoding RNAs and can revert to fertility under different growth conditions, making them useful breeding materials in the hybrid seed industry. Here, we review recent research on CMS and EGMS systems in crops, summarize general models of male sterility and fertility restoration, and discuss the evolutionary significance of these reproductive systems.
The human microbiota represents the trillions of bacteria that live on the skin, in the oral, nasal, and aural cavities, and throughout the gastrointestinal tract. The species that live in the gastrointestinal tract, the gut microbiota, closely interact with host cells and have a profound impact on health. To develop tools to effectively monitor the gut microbiota and ultimately help in disease diagnosis, we have engineered Escherichia coli to sense and record environmental stimuli, and demonstrated that E. coli with such memory systems can survive and function in the mammalian gut. This work demonstrates that E. coli can be engineered into living diagnostics capable of nondestructively probing the mammalian gut.
We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit, MAKER-P, using the Arabidopsis (Arabidopsis thaliana) and maize (Zea mays) genomes. Here, we demonstrate the ability of the MAKER-P tool kit to automatically update, extend, and revise the Arabidopsis annotations in light of newly available data and to annotate pseudogenes and noncoding RNAs absent from The Arabidopsis Information Resource 10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even Arabidopsis, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center. We show that this public resource can de novo annotate the entire Arabidopsis and maize genomes in less than 3 h and produce annotations of comparable quality to those of the current Arabidopsis Information Resource 10 and maize V2 annotation builds.
PLOS Biology is an open-access, peer-reviewed journal that features works of exceptional significance in all areas of biological science, from molecules to ecosystems, including works at the interface with other disciplines.
Millions of tons. That’s how much plastic should be floating in the world’s oceans, given our ubiquitous use of the stuff. But a new study finds that 99% of this plastic is missing. One disturbing possibility: Fish are eating it.
If that’s the case, “there is potential for this plastic to enter the global ocean food web,” says Carlos Duarte, an oceanographer at the University of Western Australia, Crawley. “And we are part of this food web.”
Humans produce almost 300 million tons of plastic each year. Most of this ends up in landfills or waste pits, but a 1970s National Academy of Sciences study estimated that 0.1% of all plastic washes into the oceans from land, carried by rivers, floods, or storms, or dumped by maritime vessels. Some of this material becomes trapped in Arctic ice and some, landing on beaches, can even turn into rocks made of plastic. But the vast majority should still be floating out there in the sea, trapped in midocean gyres, large eddies in the center of oceans, like the Great Pacific Garbage Patch.
To figure out how much refuse is floating in those garbage patches, four ships of the Malaspina expedition, a global research project studying the oceans, fished for plastic across all five major ocean gyres in 2010 and 2011. After months of trailing fine mesh nets around the world, the vessels came up light, by a lot. Instead of the millions of tons scientists had expected, the researchers calculated the global load of ocean plastic to be only about 40,000 tons at most, they report online today in the Proceedings of the National Academy of Sciences. “We can’t account for 99% of the plastic that we have in the ocean,” says Duarte, the team’s leader.
He suspects that a lot of the missing plastic has been eaten by marine animals. When plastic is floating out on the open ocean, waves and radiation from the sun can fragment it into smaller and smaller particles, until it gets so small it begins to look like fish food—especially to small lanternfish, a widespread small marine fish known to ingest plastic.
“Yes, animals are eating it,” says oceanographer Peter Davison of the Farallon Institute for Advanced Ecosystem Research in Petaluma, California, who was not involved in the study. “That much is indisputable.”
But, he says, it’s hard to know at this time what the biological consequences are. Toxic ocean pollutants like DDT, PCBs, or mercury cling to the surface of plastics, causing them to “suck up all the pollutants in the water and concentrate them.” When animals eat the plastic, that poison could be going into the fish and traveling up the food chain to market species like tuna or swordfish. Or, Davison says, toxins in the fish “may dissolve back into the water … or for all we know they’re puking [the plastic] or pooping it out, and there’s no long-term damage. We just don’t know.”
The last several years have witnessed an explosion in the understanding and use of novel, versatile trans-acting elements. TALEs, CRISPR/Cas, and sRNAs can be easily fashioned to bind any specific sequence of DNA (TALEs, CRISPR/Cas) or RNA (sRNAs) because of the simple rules governing their interactions with nucleic acids. This unique property enables these tools to repress the expression of genes at the transcriptional or post-transcriptional levels, respectively, without prior manipulation of cis-acting and/or chromosomal target DNA sequences. These tools are now being harnessed by synthetic biologists, particularly those in the eukaryotic community, for genome-wide regulation, editing, or epigenetic studies. Here we discuss the exciting opportunities for using TALEs, CRISPR/Cas, and sRNAs as synthetic trans-acting regulators in prokaryotes.
High-throughput phenotyping is emerging as an important technology to dissect phenotypic components in plants. Efficient image processing and feature extraction are prerequisites to quantify plant growth and performance based on phenotypic traits. Issues include data management, image analysis, and result visualization of large-scale phenotypic data sets. Here, we present Integrated Analysis Platform (IAP), an open-source framework for high-throughput plant phenotyping. IAP provides user-friendly interfaces, and its core functions are highly adaptable. Our system supports image data transfer from different acquisition environments and large-scale image analysis for different plant species based on real-time imaging data obtained from different spectra. Due to the huge amount of data to manage, we utilized a common data structure for efficient storage and organization of both input data and result data. We implemented a block-based method for automated image processing to extract a representative list of plant phenotypic traits. We also provide tools for built-in data plotting and result export. For validation of IAP, we performed an example experiment with 33 maize (Zea mays ‘Fernandez’) plants, which were grown for 9 weeks in an automated greenhouse with nondestructive imaging. Subsequently, the image data were subjected to automated analysis with the maize pipeline implemented in our system. We found that the computed digital volume and number of leaves correlate with our manually measured data with high accuracy, up to 0.98 and 0.95, respectively. In summary, IAP provides a broad set of functions for import/export, management, and automated analysis of high-throughput plant phenotyping data, and its analysis results are highly reliable.
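The validation step compares image-derived traits with manual measurements. The abstract does not say which statistic yields the values 0.98 and 0.95; assuming a Pearson correlation coefficient, such a comparison could be sketched as follows. The function name and the toy numbers are hypothetical and are not taken from the IAP study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Made-up leaf counts for four plants: manual count vs. image-based estimate.
    manual_leaves = [5, 6, 7, 9]
    computed_leaves = [5, 6, 8, 9]
    print(f"r = {pearson_r(manual_leaves, computed_leaves):.3f}")
```

A coefficient close to 1 indicates that the automated trait tracks the manual measurement closely, but it does not by itself show that the two agree in absolute value.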
Cloud computing offers some great opportunities for science, but most cloud computing platforms are I/O and memory limited, and hence are poor matches for data-intensive computing. After 4 years of research software development we are now instrumenting and benchmarking our analysis pipelines; numbers, lessons learned, and future plans will be discussed. Everything is open source.
Plant stature and development are governed by cell proliferation and directed cell growth. These parameters are determined largely by cell wall characteristics. Cellulose microfibrils, composed of hydrogen-bonded β-1,4 glucans, are key components for anisotropic growth in plants. Cellulose is synthesized by plasma membrane–localized cellulose synthase complexes. In higher plants, these complexes are assembled into hexagonal rosettes in intracellular compartments and secreted to the plasma membrane. Here, the complexes typically track along cortical microtubules, which may guide cellulose synthesis, until the complexes are inactivated or internalized. Determining the regulatory aspects that control the behavior of cellulose synthase complexes is vital to understanding directed cell and plant growth and to tailoring cell wall content for industrial products, including paper, textiles, and fuel. In this review, we summarize and discuss cellulose synthesis and regulatory aspects of the cellulose synthase complex, focusing on Arabidopsis thaliana.
Drought is one of the most important environmental stresses affecting the productivity of most field crops. Elucidation of the complex mechanisms underlying drought resistance in crops will accelerate the development of new varieties with enhanced drought resistance. Here, we provide a brief review on the progress in genetic, genomic, and molecular studies of drought resistance in major crops. Drought resistance is regulated by numerous small-effect loci and hundreds of genes that control various morphological and physiological responses to drought. This review focuses on recent studies of genes that have been well characterized as affecting drought resistance and genes that have been successfully engineered in staple crops. We propose that one significant challenge will be to unravel the complex mechanisms of drought resistance in crops through more intensive and integrative studies in order to find key functional components or machineries that can be used as tools for engineering and breeding drought-resistant crops.
Genetically engineered crops were first commercialized in 1994 and since then have been rapidly adopted, enabling growers to more effectively manage pests and increase crop productivity while ensuring food, feed, and environmental safety. The development of these crops is complex and based on rigorous science that must be well coordinated to create a plant with desired beneficial phenotypes. This article describes the general process by which a genetically engineered crop is developed from an initial concept to a commercialized product.
CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) systems in bacteria and archaea employ CRISPR RNAs to specifically recognize the complementary DNA of foreign invaders, leading to sequence-specific cleavage or degradation of the target DNA. Recent work has shown that the accidental or intentional targeting of the bacterial genome is cytotoxic and can lead to cell death. Here, we have demonstrated that genome targeting with CRISPR-Cas systems can be employed for the sequence-specific and titratable removal of individual bacterial strains and species. Using the type I-E CRISPR-Cas system in Escherichia coli as a model, we found that this effect could be elicited using native or imported systems and was similarly potent regardless of the genomic location, strand, or transcriptional activity of the target sequence. Furthermore, the specificity of targeting with CRISPR RNAs could readily distinguish between even highly similar strains in pure or mixed cultures. Finally, varying the collection of delivered CRISPR RNAs could quantitatively control the relative number of individual strains within a mixed culture. Critically, the observed selectivity and programmability of bacterial removal would be virtually impossible with traditional antibiotics, bacteriophages, selectable markers, or tailored growth conditions. Once delivery challenges are addressed, we envision that this approach could offer a novel means to quantitatively control the composition of environmental and industrial microbial consortia and may open new avenues for the development of “smart” antibiotics that circumvent multidrug resistance and differentiate between pathogenic and beneficial microorganisms.
Genome duplication with hybridization, or allopolyploidization, occurs commonly in plants, and is considered to be a strong force for generating new species. However, genome-wide quantification of homeolog expression ratios has been technically hindered by the high homology between homeologous gene pairs. To quantify the homeolog expression ratio using RNA-seq obtained from polyploids, a new method named HomeoRoq was developed, in which the genomic origin of sequencing reads was estimated using mismatches between the read and each parental genome. To verify this method, we first assembled the two diploid parental genomes of Arabidopsis halleri subsp. gemmifera and Arabidopsis lyrata subsp. petraea (Arabidopsis petraea subsp. umbrosa), then generated a synthetic allotetraploid, mimicking the natural allopolyploid Arabidopsis kamchatica. The quantified ratios corresponded well to those obtained by Pyrosequencing. We found that the ratios of homeologs before and after cold stress treatment were highly correlated (r = 0.870). This highlights the presence of nonstochastic polyploid gene regulation despite previous research identifying stochastic variation in expression. Moreover, our new statistical test incorporating overdispersion identified 226 homeologs (1.11% of 20 369 expressed homeologs) with significant ratio changes, many of which were related to stress responses. HomeoRoq would contribute to the study of the genes responsible for polyploid-specific environmental responses.
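The core idea, estimating a read's genomic origin from its mismatches against each parental genome, can be illustrated with a deliberately simplified sketch. This is not the HomeoRoq implementation: the classification rule (fewer mismatches wins, ties are treated as uninformative), the function names and the input format are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def assign_origin(mismatches_a: int, mismatches_b: int) -> str:
    """Assign a read to the parent it matches better; ties are uninformative."""
    if mismatches_a < mismatches_b:
        return "A"
    if mismatches_b < mismatches_a:
        return "B"
    return "common"


def homeolog_ratio(reads: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Share of informative reads assigned to parent A for one homeolog pair.

    Each read is given as (mismatches vs. parent A, mismatches vs. parent B).
    Returns None when no read discriminates between the parents.
    """
    counts = Counter(assign_origin(a, b) for a, b in reads)
    informative = counts["A"] + counts["B"]
    if informative == 0:
        return None
    return counts["A"] / informative


if __name__ == "__main__":
    # Hypothetical mismatch counts for four reads mapped to one gene.
    toy_reads = [(0, 2), (1, 1), (3, 0), (0, 1)]
    print(homeolog_ratio(toy_reads))

    # Arithmetic check of the proportion reported in the abstract.
    print(f"{100 * 226 / 20369:.2f}%")
```

The last line confirms that 226 of 20 369 expressed homeologs corresponds to the 1.11% stated in the abstract.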