Gymnosperms are a group of land plants comprising the extant taxa cycads, Ginkgo, gnetophytes and conifers. Gymnosperms first appeared more than 300 million years ago (Myr ago), well before the angiosperm lineage separated from the stem group of extant gymnosperms. The major radiation of conifer families occurred 250–65 Myr ago, and during their evolution the morphology of conifers has changed relatively little. There are approximately 630 conifer species, representing about 70 currently recognized genera, which dominate many terrestrial ecosystems, especially in the Northern Hemisphere. Conifers were also dominant both before and after the major mass extinction events at the end of the Permian and Cretaceous periods, around 250 and 65 Myr ago, respectively. Conifers are of immense ecological and economic value; coniferous forests cover enormous areas in the Northern Hemisphere, and conifers are keystone species in many other ecosystems. Conifers contribute a large fraction of terrestrial photosynthesis and biomass. Their cultural and economic values are also paramount; early civilizations used conifers for firewood, tools and artefacts, and today several national economies depend on commodities produced from conifers. However, despite their abundance and importance, our understanding of conifer genomes is limited. Most conifers have 12 chromosomes (2n = 24), probably reflecting the ancestral karyotype. These chromosomes are typically of similar size, each roughly comparable in size to the human genome, and contain high proportions of repetitive elements. The gene space of conifer genomes has not been well characterized, although several reports have suggested that gene families in conifers may be larger than their angiosperm counterparts and that conifer genomes contain numerous pseudogenes.
Because conifer genomes are among the largest of all organisms, typically 20–30 gigabase pairs (Gb), genome-wide analyses of conifers are particularly challenging. Thus, no full genome sequence of a gymnosperm species is available at present, whereas 30 angiosperm and more basal plant genomes have been sequenced. However, size is not the only challenge to sequencing presented by conifer genomes. Conifers are typically outbreeding, produce wind-dispersed pollen and have very large effective population sizes, and their genomes are highly heterozygous, although their nucleotide substitution rates are lower than those of most angiosperms, perhaps owing to their long lifespans (decades to centuries). Furthermore, inbreeding depression precludes the production of inbred lines that could facilitate genome assembly.